# Week 26 — Policy Gradients & Variance Reduction

*Last updated:* 2025-09-09

## Objectives
- [ ] Understand policy gradients & variance reduction
- [ ] Complete guided exercises (theory → code → evaluation)
- [ ] Apply learning in a small project or lab
- [ ] Reflect using self-assessment checklist

## Mini-Theory (Deep Dive)
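- **REINFORCE:** the Monte Carlo policy-gradient estimator. Each step's score `∇θ log πθ(a_t | s_t)` is weighted by the return `G_t` collected from that step onward.
- **Baselines:** subtracting a state-dependent baseline `b(s_t)` from `G_t` keeps the gradient estimate unbiased and can substantially reduce its variance. The baseline is typically a learned value estimate `V(s_t)`.
- **Entropy regularization:** adding a bonus proportional to the policy's entropy discourages premature collapse to a near-deterministic policy, which helps keep exploration alive.
- **Advantage estimation (GAE) intuition:** Generalized Advantage Estimation averages the TD errors `δ_t = r_t + γV(s_{t+1}) − V(s_t)` with weights `(γλ)^k`. Setting `λ = 0` recovers the one-step TD error (lower variance, more bias from an imperfect `V`). Setting `λ = 1` recovers the Monte Carlo return minus `V(s_t)` (unbiased, higher variance).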

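To make the GAE intuition concrete, here is a minimal NumPy sketch for a single trajectory. It assumes you already have rewards `r_0, ..., r_{T-1}` and value estimates `V(s_0), ..., V(s_T)` from some critic, with `V(s_T) = 0` if the episode terminated. It uses the backward recursion `A_t = δ_t + γλ A_{t+1}`. The helper name `compute_gae` is hypothetical, not a library API.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single trajectory (hypothetical helper).

    rewards: length T, r_0 .. r_{T-1}
    values:  length T + 1, V(s_0) .. V(s_T); pass V(s_T) = 0.0 if s_T is terminal
    Returns (advantages, value_targets), each of length T.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    advantages = np.zeros(T)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    value_targets = advantages + values[:T]  # often used as critic regression targets
    return advantages, value_targets

# Tiny example: a 3-step episode that ends in a terminal state
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]
for lam in (0.0, 0.95, 1.0):
    adv, _ = compute_gae(rewards, values, gamma=0.9, lam=lam)
    print(f"lam={lam}:", np.round(adv, 3))
```

In this example, `λ = 0` returns the one-step TD errors (0.86, −0.13, 1.7). With `λ = 1`, the advantages equal the discounted returns minus `V(s_t)` (2.12, 1.4, 1.7). Intermediate values of `λ` blend the two.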
## Guided Exercises
The following exercises are structured to help you learn by doing. Each has **starter code**, **hints**, and **checks**.

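```python
# Exercise: Implement tabular Q-learning on a simple 1-D chain "gridworld"
import random

import numpy as np

random.seed(0)  # fixed seed so runs are reproducible

n_states, n_actions = 16, 4          # states 0..15; state 15 is the goal
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration rate
max_steps = 500                      # safety cap on episode length

def step(s, a):
    """Action 0 moves right, actions 1-3 move left.

    Moves are clipped at both ends (no wrap-around), so the goal can only be
    reached by moving right along the chain.
    """
    s2 = min(max(s + (1 if a == 0 else -1), 0), n_states - 1)
    done = s2 == n_states - 1
    r = 1.0 if done else -0.01       # +1 at the goal, small step cost elsewhere
    return s2, r, done

for ep in range(300):
    s = 0
    for _ in range(max_steps):
        # Epsilon-greedy: explore with probability eps, otherwise act greedily
        if random.random() < eps:
            a = random.randrange(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning target; do not bootstrap past a terminal state
        target = r if done else r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break
print("Q[0]:", Q[0])
```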

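The Q-learning exercise above is value-based. As a policy-gradient counterpart, the sketch below trains a tabular softmax policy with REINFORCE on a similar chain. It subtracts a learned per-state baseline and adds an entropy bonus. Everything here is an illustrative assumption rather than part of the original exercise: the 6-state chain with two actions, the step sizes `LR_PI` and `LR_V`, and the entropy weight `BETA`. It also keeps the `γ^t` factor from the textbook REINFORCE update, which many implementations drop.

```python
import numpy as np

# Hypothetical setup: a 6-state chain, action 0 = right, action 1 = left.
N_STATES, N_ACTIONS = 6, 2
GOAL = N_STATES - 1
GAMMA = 0.95
LR_PI, LR_V = 0.1, 0.1   # step sizes for the policy logits and the baseline
BETA = 0.01              # entropy-bonus weight
MAX_STEPS = 100
rng = np.random.default_rng(0)

def step(s, a):
    """Move right (a == 0) or left, clipped at the ends; +1 at GOAL, -0.01 otherwise."""
    s2 = min(max(s + (1 if a == 0 else -1), 0), GOAL)
    done = s2 == GOAL
    return s2, (1.0 if done else -0.01), done

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

theta = np.zeros((N_STATES, N_ACTIONS))  # tabular softmax policy logits
V = np.zeros(N_STATES)                   # tabular baseline b(s), an estimate of V(s)

for episode in range(1000):
    # 1) Roll out one episode with the current stochastic policy
    states, actions, rewards = [], [], []
    s = 0
    for _ in range(MAX_STEPS):
        a = int(rng.choice(N_ACTIONS, p=softmax(theta[s])))
        s2, r, done = step(s, a)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s2
        if done:
            break

    # 2) Discounted returns G_t, computed backwards
    returns = np.zeros(len(rewards))
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + GAMMA * G
        returns[t] = G

    # 3) Advantages use the baseline from before this episode's update
    advantages = returns - V[states]

    # 4) REINFORCE step with baseline and entropy bonus
    for t, (s, a) in enumerate(zip(states, actions)):
        p = softmax(theta[s])
        grad_logp = -p
        grad_logp[a] += 1.0                    # grad of log softmax: onehot(a) - p
        H = -np.sum(p * np.log(p))
        grad_H = -p * (np.log(p) + H)          # grad of entropy w.r.t. the logits
        theta[s] += LR_PI * (GAMMA**t * advantages[t] * grad_logp + BETA * grad_H)

    # 5) Move the baseline toward the observed returns
    for t, s in enumerate(states):
        V[s] += LR_V * (returns[t] - V[s])

probs_right = np.array([softmax(theta[s])[0] for s in range(GOAL)])
print("P(right | s):", np.round(probs_right, 2))
```

The baseline depends only on the state and on earlier episodes, so it should not change the expected gradient; it only affects variance. Setting `LR_V = 0` keeps the baseline at zero and gives plain REINFORCE, which makes an easy comparison for how noisy the updates become.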
## Project Work
- This week connects to: `syllabus/phase-05-rl-control.md`
- Implement the **Build** task described in the project README. Tie your notebook experiments into that code (e.g., import your module or save artifacts for the project).

### Deliverable
- A short write-up (5–10 bullets) on **what worked, what didn’t, and what you’ll try next**.

## Self-Assessment Checklist
- [ ] I can explain the key concepts of **Policy Gradients & Variance Reduction** in my own words.
- [ ] I completed the guided exercises and validated outputs.
- [ ] I produced a small artifact (code, plot, or report) and linked it to the project.
- [ ] I captured 3–5 learnings and 2 next steps.

---
**Tip:** Keep each week to ~10 hours: ~3h study, ~3h coding, ~3h project, ~1h reflection.