mlnjsh/Projects

🎯 Beamer + Manim Animated Presentations — Mathematical animations embedded in LaTeX slides for lectures & talks.
🎮 RL Book Interactive Labs

Companion interactive apps for Complete Reinforcement Learning Journey: From Basics to RLHF

Don't just read about algorithms — watch them think.

License: MIT · GitHub Pages


🧪 What Is This?

Each chapter of the book has a companion browser-based interactive lab where you can:

  • 🔵 Step through algorithms cell-by-cell and see values update in real time
  • 🎛️ Tweak parameters (γ, ε, learning rate) with sliders and instantly see the effect
  • 🤖 Watch agents navigate grids, solve problems, and learn from mistakes
  • 📊 Inspect Q-values, policy arrows, and convergence logs live

No installation required. Open in any browser. Works on desktop and mobile.


📚 Available Labs

| Chapter | Lab | Concepts Covered | Try It |
|---|---|---|---|
| Ch 2 | MDP Explorer | States, actions, rewards, transitions, deterministic vs stochastic, policy, value function | ▶ Launch |
| Ch 3 | Policy Iteration on FrozenLake | Bellman equations, policy evaluation (sweep-by-sweep), policy improvement, convergence | ▶ Launch |
| Ch 4 | Monte Carlo Blackjack (coming soon) | First-visit MC, exploring starts, episode replay | — |
| Ch 5 | TD Learning & SARSA (coming soon) | TD(0), SARSA, Q-learning, cliff walking | — |
| Ch 6 | DQN on CartPole (coming soon) | Experience replay, target networks, training curves | — |
| Ch 7 | Policy Gradients (coming soon) | REINFORCE, baselines, variance reduction | — |

๐ŸŒ Ch2: MDP Explorer

Understand the building blocks of every RL algorithm. Explore a 5ร—5 Gridworld MDP interactively.

Three Modes

| Mode | What You Learn |
|---|---|
| 🔍 Explore | Click any cell → see its state (r,c), reward, transition probabilities for each action, and Q-values |
| π Policy | See policy arrows on every cell. Click to cycle actions and build your own policy |
| V Value | Color-coded heatmap of V(s). Green = high value, red = low value |
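The Q-values the Explore mode displays follow from V by a one-step lookahead. A generic sketch (not the lab's code) with made-up numbers for a toy two-state MDP:

```python
# One-step lookahead: Q(s,a) = sum over s' of P(s'|s,a) * (R(s,a,s') + gamma * V(s')).
# Toy example with invented numbers -- not the lab's actual grid.

def q_value(transitions, V, gamma=0.9):
    """transitions: list of (prob, next_state, reward) tuples for one (s, a) pair."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)

V = {0: 0.5, 1: 1.0}
# An action that reaches state 1 (reward 1) with prob 0.8, else stays in state 0:
q = q_value([(0.8, 1, 1.0), (0.2, 0, 0.0)], V)
print(q)   # 0.8 * (1 + 0.9*1.0) + 0.2 * (0 + 0.9*0.5) = 1.61
```

Clicking a cell in Explore mode effectively shows this computation for all four actions.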

Key Features

  • Deterministic vs Stochastic — slide the slip probability from 0 to 0.6 and watch transition probabilities change
  • Click any cell → full breakdown of transitions, rewards, and Q(s,a) for all 4 actions
  • ⚡ Solve Optimal Policy — finds π* and shows the value heatmap
  • 🤖 Run Robot — animated step-by-step episode
  • 🤖×10 Run 10 Episodes — shows success rate (deterministic vs stochastic)
  • ✏️ Edit Grid — paint walls, pits, goals, and start positions to create your own MDP

What to Try

  1. Click cells → inspect State, Action, Reward, Transition
  2. ⚡ Solve with slip=0 → observe shortest path
  3. Set slip=0.3 → Solve again → policy becomes cautious near pits!
  4. Run 🤖 with slip=0 → always reaches goal
  5. Run 🤖×10 with slip=0.3 → some episodes fail!
  6. Compare γ=0.3 vs γ=0.99 → value function changes dramatically
  7. Edit grid: add more pits near the goal → watch policy adapt
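Step 6's γ comparison is easy to reproduce offline. A minimal sketch on a 5-state corridor (a toy stand-in for the lab's 5×5 grid, not its actual code): with a single goal reward, values converge to powers of γ, so small γ makes values vanish away from the goal while γ near 1 keeps them nearly flat:

```python
# Why the gamma slider changes V so much: with one reward at the goal,
# V(s) converges to gamma ** (distance_to_goal - 1). Solved by value iteration.

def corridor_values(gamma, n=5, iters=100):
    """Value iteration on a 1-D corridor; state n-1 is the terminal goal (V = 0)."""
    V = [0.0] * n
    for _ in range(iters):
        new_V = V[:]
        for s in range(n - 1):                       # skip the terminal state
            left, right = max(s - 1, 0), s + 1
            reward = 1.0 if right == n - 1 else 0.0  # reward on entering the goal
            new_V[s] = max(gamma * V[left], reward + gamma * V[right])
        V = new_V
    return V

print(corridor_values(0.3))    # sharp decay away from the goal: [0.027, 0.09, 0.3, 1.0, 0.0]
print(corridor_values(0.99))   # nearly flat: every state still looks valuable
```

The same effect drives the lab's heatmap: at γ=0.3 only cells next to the goal light up green.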

🚀 Ch3: Policy Iteration Visualizer

Step through the Policy Iteration algorithm on a 4×4 FrozenLake grid.

What You Can Do

| Button | What Happens |
|---|---|
| ① One Eval Sweep | Each cell lights up blue as its value updates via the Bellman equation |
| ① Full Evaluation | Runs all sweeps until V^π converges |
| ② Improve Policy | Arrows change one-by-one to the greedy action — green flash on changes |
| ▶▶ Auto-Run | Runs the full evaluate → improve loop with pauses between iterations |
| 🤖 Run Robot | Animated robot walks the grid following the current policy |
| ↺ Reset | Start fresh with new parameters |
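The evaluate → improve loop these buttons animate can be sketched in plain Python. A minimal, deterministic version on the standard 4×4 FrozenLake map (the lab also models slip; layout and γ here are illustrative):

```python
# Minimal policy iteration on a deterministic 4x4 FrozenLake-style grid.
# S = start, F = frozen, H = hole (terminal), G = goal (terminal, reward 1).

GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(s, a):
    """Deterministic transition, clamped to the grid; reward 1 on reaching G."""
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return nr * N + nc, (1.0 if GRID[nr][nc] == "G" else 0.0)

def eval_sweep(V, policy, gamma=0.9):
    """One 'Eval Sweep': Bellman backup of every non-terminal state under the policy."""
    new_V = V[:]
    for s in range(N * N):
        if GRID[s // N][s % N] in "HG":
            continue                       # terminal states keep V = 0
        s2, r = step(s, policy[s])
        new_V[s] = r + gamma * V[s2]
    return new_V

def improve(V, gamma=0.9):
    """'Improve Policy': switch every state to the greedy action w.r.t. V."""
    return [max(range(4), key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
            for s in range(N * N)]

# The evaluate -> improve loop that Auto-Run animates:
V, policy = [0.0] * 16, [3] * 16           # start with "always go right"
for _ in range(10):
    for _ in range(50):                    # 'Full Evaluation'
        V = eval_sweep(V, policy)
    policy = improve(V)
print(policy)                              # one greedy action index per cell
```

Following policy[s] from the start cell reaches G while avoiding every H, which is what the 🤖 button replays step by step.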

What to Try

  1. Press ① One Sweep — watch cells light up one by one
  2. Press ① again — values get more accurate each sweep
  3. Press ① Full Eval — converge V^π completely
  4. Press ② — watch arrows change direction!
  5. Repeat ①→② until π* is found
  6. Press 🤖 — watch the robot navigate!
  7. Try γ=0.5 vs γ=0.99 — compare policies
  8. Try Slip=0 vs Slip=0.5 — deterministic vs stochastic

๐Ÿ—๏ธ Project Structure

rl-book-labs/
โ”œโ”€โ”€ README.md
โ”œโ”€โ”€ ch2/
โ”‚   โ””โ”€โ”€ index.html          # MDP Explorer (5ร—5 Gridworld)
โ”œโ”€โ”€ ch3/
โ”‚   โ””โ”€โ”€ index.html          # Policy Iteration on FrozenLake
โ”œโ”€โ”€ ch4/                    # (coming soon)
โ”œโ”€โ”€ ch5/                    # (coming soon)
โ””โ”€โ”€ ch6/                    # (coming soon)

Each lab is a single HTML file โ€” no build step, no dependencies, no frameworks. Just open in any browser.


🎓 About the Book

Complete Reinforcement Learning Journey: From Basics to RLHF

The only book that takes you from "What is a Markov Decision Process?" all the way to "How do we align language models with human values?" — with intuition, math, code, and interactive labs at every step.

Key Features

  • 📖 Intuition → Math → Code triple for every concept
  • 🤖 DeliBot running example that grows with the theory
  • 🧠 Think Like an Agent boxes for building intuition
  • ⚠️ Common Misconceptions boxes to prevent errors
  • 🔬 Interactive Labs (this repo!) for hands-on learning
  • 📝 Quizzes with detailed answer keys for each chapter

๐Ÿค Contributing

Found a bug in a lab? Have an idea for a new visualization? Contributions are welcome!

  1. Fork the repo
  2. Create a branch (git checkout -b feature/new-lab)
  3. Commit your changes
  4. Open a Pull Request

📄 License

MIT License — free to use, modify, and distribute.


Built with ❤️ as a companion to the book.
"The best way to learn an algorithm is to watch it think."
