# Companion Interactive Labs for *Complete Reinforcement Learning Journey: From Basics to RLHF*

Don't just read about algorithms: watch them think.
Each chapter of the book has a companion browser-based interactive lab where you can:
- 🔵 Step through algorithms cell by cell and see values update in real time
- 🎛️ Tweak parameters (γ, ε, learning rate) with sliders and instantly see the effect
- 🤖 Watch agents navigate grids, solve problems, and learn from mistakes
- 📊 Inspect Q-values, policy arrows, and convergence logs live
No installation required. Open in any browser. Works on desktop and mobile.
| Chapter | Lab | Concepts Covered | Try It |
|---|---|---|---|
| Ch 2 | MDP Explorer | States, actions, rewards, transitions, deterministic vs stochastic, policy, value function | ▶ Launch |
| Ch 3 | Policy Iteration on FrozenLake | Bellman equations, policy evaluation (sweep by sweep), policy improvement, convergence | ▶ Launch |
| Ch 4 | Monte Carlo Blackjack (coming soon) | First-visit MC, exploring starts, episode replay | ⏳ |
| Ch 5 | TD Learning & SARSA (coming soon) | TD(0), SARSA, Q-learning, cliff walking | ⏳ |
| Ch 6 | DQN on CartPole (coming soon) | Experience replay, target networks, training curves | ⏳ |
| Ch 7 | Policy Gradients (coming soon) | REINFORCE, baselines, variance reduction | ⏳ |
## Ch 2 · MDP Explorer

Understand the building blocks of every RL algorithm. Explore a 5×5 Gridworld MDP interactively.
| Mode | What You Learn |
|---|---|
| 🔍 Explore | Click any cell → see its state (r,c), reward, transition probabilities for each action, and Q-values |
| π Policy | See policy arrows on every cell. Click to cycle through actions and build your own policy |
| V Value | Color-coded heatmap of V(s): green = high value, red = low value |
- Deterministic vs stochastic: slide slip from 0 to 0.6 and watch the transition probabilities change
- Click any cell → full breakdown of transitions, rewards, and Q(s,a) for all 4 actions
- ⚡ Solve Optimal Policy → finds π* and shows the value heatmap
- 🤖 Run Robot → animated step-by-step episode
- 🤖×10 Run 10 Episodes → shows the success rate (deterministic vs stochastic)
- ✏️ Edit Grid → paint walls, pits, goals, and start positions to create your own MDP
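The transition breakdown the Explore mode shows for each cell can be reproduced in a few lines. Below is a minimal sketch of a gridworld with a slip parameter; the grid size, slip model (slip probability split evenly over the four directions), and all function names are illustrative assumptions, not the lab's actual code:

```python
import random

N = 5                      # 5x5 grid; states are (row, col) tuples
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def clip_move(state, action):
    """Apply a move, bouncing off the grid edges."""
    (r, c), (dr, dc) = state, ACTIONS[action]
    return (max(0, min(N - 1, r + dr)), max(0, min(N - 1, c + dc)))

def step(state, action, slip=0.0):
    """Sample one transition: with probability `slip`, a random direction is taken."""
    if random.random() < slip:
        action = random.choice(list(ACTIONS))
    return clip_move(state, action)

def transition_probs(state, action, slip=0.0):
    """Exact P(s' | s, a): intended move gets 1 - slip, plus slip/4 per direction."""
    probs = {}
    for a, p in [(action, 1 - slip)] + [(a, slip / 4) for a in ACTIONS]:
        nxt = clip_move(state, a)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs
```

Sliding the slip slider in the lab corresponds to changing the `slip` argument here: at `slip=0` the intended next cell gets probability 1, and as slip grows, probability mass leaks to the neighboring cells.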
- Click cells → inspect state, action, reward, transition
- ⚡ Solve with slip=0 → observe the shortest path
- Set slip=0.3 → Solve again → the policy becomes cautious near pits!
- Run 🤖 with slip=0 → it always reaches the goal
- Run 🤖×10 with slip=0.3 → some episodes fail!
- Compare γ=0.3 vs γ=0.99 → the value function changes dramatically
- Edit the grid: add more pits near the goal → watch the policy adapt
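The γ comparison in that experiment can also be checked numerically. Here is a hedged sketch on a tiny deterministic chain (the chain layout, reward placement, and iteration count are illustrative assumptions, not the lab's grid): a state far from the goal is worth roughly γ^(distance−1), so a myopic agent barely sees the goal at all.

```python
def value_iteration(n_states=5, gamma=0.99, iters=500):
    """Deterministic chain: each state moves right; reward 1 on entering the terminal end."""
    V = [0.0] * n_states
    for _ in range(iters):
        for s in range(n_states - 1):          # last state is terminal, V stays 0
            reward = 1.0 if s + 1 == n_states - 1 else 0.0
            V[s] = reward + gamma * V[s + 1]   # single action, so the max is trivial
    return V

print(value_iteration(gamma=0.3)[0])    # ≈ 0.3**3  = 0.027
print(value_iteration(gamma=0.99)[0])   # ≈ 0.99**3 ≈ 0.970
```

The same effect drives the dramatic heatmap change in the lab: with γ=0.3 the value signal from the goal decays to near zero within a few cells, while with γ=0.99 it propagates across the whole grid.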
## Ch 3 · Policy Iteration on FrozenLake

Step through the Policy Iteration algorithm on a 4×4 FrozenLake grid.
| Button | What Happens |
|---|---|
| ▶ One Eval Sweep | Each cell lights up blue as its value updates via the Bellman equation |
| ⏩ Full Evaluation | Runs evaluation sweeps until V^π converges |
| ⚡ Improve Policy | Arrows change one by one to the greedy action, with a green flash on changes |
| ▶▶ Auto-Run | Runs the full evaluate → improve loop with pauses between iterations |
| 🤖 Run Robot | An animated robot walks the grid following the current policy |
| ↺ Reset | Start fresh with new parameters |
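What the One Eval Sweep button animates is a single Bellman expectation backup over every state. A minimal sketch for a tabular MDP is below; the `P[s][a]` format of `(prob, next_state, reward, done)` tuples mirrors Gymnasium's FrozenLake `env.P`, and the function and variable names are assumptions:

```python
def eval_sweep(V, policy, P, gamma=0.99):
    """One in-place sweep of iterative policy evaluation.
    P[s][a] is a list of (prob, next_state, reward, done) tuples."""
    delta = 0.0
    for s in range(len(V)):
        a = policy[s]
        # Bellman expectation backup: expected reward plus discounted next-state value.
        v_new = sum(p * (r + gamma * V[s2] * (not done))
                    for p, s2, r, done in P[s][a])
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    return delta    # max change this sweep; near 0 means V^pi has converged
```

Calling `eval_sweep` repeatedly until `delta` drops below a tolerance is what the Full Evaluation button does in one shot.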
- Press ▶ One Sweep → watch cells light up one by one
- Press ▶ again → values get more accurate with each sweep
- Press ⏩ Full Eval → converge V^π completely
- Press ⚡ → watch the arrows change direction!
- Repeat ⏩ → ⚡ until π* is found
- Press 🤖 → watch the robot navigate!
- Try γ=0.5 vs γ=0.99 → compare the policies
- Try slip=0 vs slip=0.5 → deterministic vs stochastic
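The evaluate-then-improve loop that Auto-Run animates can be written as a self-contained sketch. As above, the `P[s][a]` format of `(prob, next_state, reward, done)` tuples mirrors Gymnasium's FrozenLake `env.P`; everything else (names, tolerance) is an assumption:

```python
def policy_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Alternate full policy evaluation with greedy improvement until the policy is stable."""
    V = [0.0] * n_states
    policy = [0] * n_states

    def q(s, a):
        """Action value under the current V estimate."""
        return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])

    while True:
        # Policy evaluation: sweep until V^pi converges (the "Full Evaluation" phase).
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: make each state greedy (the "Improve Policy" phase).
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q(s, a))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:                 # no arrow changed: pi* found
            return policy, V
```

Each pass through the outer loop corresponds to one pause in the Auto-Run animation, and the `stable` check is exactly the "no green flashes" stopping condition.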
## Repository Structure

```
rl-book-labs/
├── README.md
├── ch2/
│   └── index.html   # MDP Explorer (5×5 Gridworld)
├── ch3/
│   └── index.html   # Policy Iteration on FrozenLake
├── ch4/             # (coming soon)
├── ch5/             # (coming soon)
└── ch6/             # (coming soon)
```
Each lab is a single HTML file: no build step, no dependencies, no frameworks. Just open it in any browser.
## About the Book

*Complete Reinforcement Learning Journey: From Basics to RLHF*

The only book that takes you from "What is a Markov Decision Process?" all the way to "How do we align language models with human values?", with intuition, math, code, and interactive labs at every step.
- 📐 Intuition → Math → Code triple for every concept
- 🤖 DeliBot running example that grows with the theory
- 🧠 Think Like an Agent boxes for building intuition
- ⚠️ Common Misconceptions boxes to prevent errors
- 🔬 Interactive Labs (this repo!) for hands-on learning
- 📝 Quizzes with detailed answer keys for each chapter
## Contributing

Found a bug in a lab? Have an idea for a new visualization? Contributions are welcome!

- Fork the repo
- Create a branch (`git checkout -b feature/new-lab`)
- Commit your changes
- Open a Pull Request
## License

MIT License: free to use, modify, and distribute.

Built with ❤️ as a companion to the book.
"The best way to learn an algorithm is to watch it think."