This repository hosts chapter-wise companion code for the book *Reinforcement Learning Explained*. It provides clean, minimal, and well-tested implementations of key reinforcement learning concepts, organized by chapter:
- Chapter 2: The RL Problem Formulation
- Chapter 3: Multi-Armed Bandits
- Chapter 4: Dynamic Programming Approaches
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 7: TD Control — SARSA and Q-Learning
- Chapter 8: Eligibility Traces and TD(λ)
- Chapter 9: Model-Based RL and Planning
- Chapter 10: Function Approximation Basics
- Chapter 11: Policy Gradient Fundamentals (REINFORCE)
| Chapter | Title | Status | Notes |
|---|---|---|---|
| 1 | Introduction | ✅ Complete | Book only (no code needed) |
| 2 | The RL Problem Formulation | ✅ Complete | GridWorld, evaluation, policies, examples |
| 3 | Multi-Armed Bandits | ✅ Complete | Bandit envs, ε-greedy, UCB, Thompson Sampling |
| 4 | Dynamic Programming Approaches | ✅ Complete | Policy Iteration, Value Iteration |
| 5 | Monte Carlo Methods | ✅ Complete | Prediction, Control, On/Off-Policy |
| 6 | Temporal-Difference Learning | ✅ Complete | TD(0), n-step TD, prediction examples |
| 7 | TD Control | ✅ Complete | SARSA, Q-learning, Cliff-Walking, exploration |
| 8 | Eligibility Traces and TD(λ) | ✅ Complete | TD(λ), SARSA(λ), True Online TD(λ), gridworld demos |
| 9 | Model-Based RL and Planning | ✅ Complete | Dyna-Q, planning with rollouts, gridworld demos |
| 10 | Function Approximation Basics | ✅ Complete | Linear approx, tile coding, TD(0), SARSA, Mountain Car |
| 11 | Policy Gradient Fundamentals | ✅ Complete | REINFORCE, baselines, softmax & Gaussian policies |
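
To give a flavor of the algorithms the chapters cover, below is a minimal, self-contained sketch of ε-greedy action selection on a Gaussian bandit (the Chapter 3 topic). It is written independently of this repository: the function name, parameters, and reward model are illustrative choices, not the repo's actual API.

```python
import numpy as np

def run_epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Sketch of an epsilon-greedy bandit agent; not the repository's implementation."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)   # incremental sample-average estimate of each arm's mean reward
    n = np.zeros(k)   # number of times each arm has been pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))        # explore: pick a random arm
        else:
            a = int(np.argmax(q))           # exploit: pick the current best estimate
        r = rng.normal(true_means[a], 1.0)  # Gaussian reward with unit variance
        n[a] += 1
        q[a] += (r - q[a]) / n[a]           # incremental mean update
        total += r
    return q, total / steps

estimates, avg_reward = run_epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, avg_reward)
```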
Repository layout:

```text
rl-fundamentals-code/
├─ ch2_rl_formulation/          # Chapter 2
├─ ch3_multi_armed_bandits/     # Chapter 3
├─ ch4_dynamic_programming/     # Chapter 4
├─ ch5_monte_carlo/             # Chapter 5
├─ ch6_td_learning/             # Chapter 6
├─ ch7_td_control/              # Chapter 7
├─ ch8_td_lambda/               # Chapter 8
├─ ch9_model_based_planning/    # Chapter 9
├─ ch10_function_approx/        # Chapter 10
├─ ch11_policy_gradient/        # Chapter 11
├─ utils/
└─ .github/workflows/
```
To run all tests:
```bash
python -m pytest -q
```

- GitHub Actions workflows (`.github/workflows/*.yml`) automatically run the tests for each chapter on every push and pull request.
- This ensures correctness and reproducibility of the examples.
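
To run a single chapter's tests, point pytest at that chapter's directory, e.g. `python -m pytest ch3_multi_armed_bandits -q`. For reference, a minimal workflow of this kind might look like the sketch below; the file name, action versions, Python version, and `requirements.txt` step are illustrative assumptions, not necessarily the repository's actual configuration.

```yaml
# Illustrative sketch of a test workflow (e.g. .github/workflows/tests.yml).
# The repository's real workflows may differ.
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumes a requirements.txt at the repository root (an assumption, not confirmed).
      - run: pip install -r requirements.txt
      - run: python -m pytest -q
```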
If you use this code or the accompanying book in your research or teaching, please cite:
Book (forthcoming):

```bibtex
@book{baride2025rlexplained,
  author    = {Srikanth Baride and Rodrigue Rizk and K. C. Santosh},
  title     = {Reinforcement Learning Explained},
  publisher = {CRC Press | Taylor \& Francis Group},
  year      = {2025},
  isbn      = {9781041252993},
  note      = {Accepted for publication; preprint available at \url{https://github.com/srikanthbaride/rl-explained-preprint}}
}
```
Companion Code (GitHub):

```bibtex
@misc{baride2025rlcode,
  author       = {Srikanth Baride},
  title        = {Reinforcement Learning Explained --- Companion Code},
  year         = {2025},
  howpublished = {\url{https://github.com/srikanthbaride/Reinforcement-Learning-Explained-Code}},
  note         = {Accessed: YYYY-MM-DD}
}
```

MIT License © 2025 Srikanth Baride