- Python replication of all the plots from Reinforcement Learning: An Introduction
- Solutions to all of the exercises
- Anki flashcards summarizing the book
1. Python replication of all the plots

To reproduce a figure, say Figure 2.2, run:

```bash
cd chapter2
python figures.py 2.2
```
- Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
- Figure 2.3: Optimistic initial action-value estimates
- Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
- Figure 2.5: Average performance of the gradient bandit algorithm
- Figure 2.6: A parameter study of the various bandit algorithms
- Figure 4.2: Jack’s car rental problem (value function, policy)
- Figure 4.3: The solution to the gambler’s problem (value function, policy)
- Figure 5.1: Approximate state-value functions for the blackjack policy
- Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
- Figure 5.3: Weighted importance sampling
- Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates
- Figure 5.5: A couple of right turns for the racetrack task (1, 2, 3)
- Figure 6.1: Changes recommended in the driving home example by Monte Carlo methods (left) and TD methods (right)
- Example 6.2: Random walk (comparison)
- Figure 6.2: Performance of TD(0) and constant MC under batch training on the random walk task
- Example 6.5: Windy Gridworld
- Example 6.6: Cliff Walking
- Figure 6.3: Interim and asymptotic performance of TD control methods (comparison)
- Figure 6.5: Comparison of Q-learning and Double Q-learning
- Figure 7.2: Performance of n-step TD methods on 19-state random walk (comparison)
- Figure 7.4: Gridworld example of the speedup of policy learning due to the use of n-step methods
- Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps
- Figure 8.3: Policies found by planning and nonplanning Dyna-Q agents
- Figure 8.4: Average performance of Dyna agents on a blocking task
- Figure 8.5: Average performance of Dyna agents on a shortcut task
- Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task
- Figure 8.7: Comparison of efficiency of expected and sample updates
- Figure 8.8: Relative efficiency of different update distributions
- Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
- Figure 9.2: Semi-gradient n-step TD algorithm on the 1000-state random walk task
- Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task (comparison)
- Figure 9.10: State aggregation vs. tile coding on the 1000-state random walk task (comparison)
- Figure 10.1: The cost-to-go function for the Mountain Car task in one run (428 steps; 12, 104, 1000, 9000 episodes)
- Figure 10.2: Learning curves for semi-gradient Sarsa on the Mountain Car task
- Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
- Figure 10.4: Effect of alpha and n on early performance of n-step semi-gradient Sarsa
- Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task
- Figure 11.2: Demonstration of instability on Baird’s counterexample
- Figure 11.5: The behavior of the TDC algorithm on Baird’s counterexample
- Figure 11.6: The behavior of the ETD algorithm in expectation on Baird’s counterexample
- Figure 12.3: Off-line λ-return algorithm on 19-state random walk
- Figure 12.6: TD(λ) algorithm on 19-state random walk
- Figure 12.8: True online TD(λ) algorithm on 19-state random walk
- Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car
- Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car
- Figure 13.1: REINFORCE on the short-corridor gridworld
- Figure 13.2: REINFORCE with baseline on the short-corridor gridworld
2. Solutions to all of the exercises (text answers)

To reproduce the results of an exercise, say Exercise 2.5, run:

```bash
cd chapter2
python figures.py ex2.5
```
- Exercise 2.5: Difficulties that sample-average methods have for nonstationary problems
- Exercise 2.11: Figure analogous to Figure 2.6 for the nonstationary case
- Exercise 4.7: Modified Jack's car rental problem (value function, policy)
- Exercise 4.9: Gambler's problem with ph = 0.25 (value function, policy) and ph = 0.55 (value function, policy)
- Exercise 6.4: Wider range of alpha values
- Exercise 6.5: High alpha, effect of initialization
- Exercise 6.9: Windy Gridworld with King’s Moves
- Exercise 6.10: Stochastic Wind
- Exercise 6.13: Double Expected Sarsa vs. Expected Sarsa
- Exercise 7.2: Sum of TD errors vs. n-step TD on 19-state random walk
- Exercise 7.3: 19 states vs. 5 states, left-side outcome of -1
- Exercise 7.7: Off-policy action-value prediction on a not-so-random walk
- Exercise 7.10: Off-policy action-value prediction on a not-so-random walk
- Exercise 8.1: n-step Sarsa on the maze task
- Exercise 8.4: Gridworld experiment to test the exploration bonus
3. Anki flashcards (cf. this blog)
Dependencies:
- numpy
- matplotlib
- seaborn
All of the code and answers are mine, except for Mountain Car's tile coding (URL in the book).
This README is inspired by ShangtongZhang's repo.
- All of the chapters are self-contained.
- The environments use a gym-like API:

  ```python
  s = env.reset()                # reset the environment and return the initial state
  s_p, r, d, info = env.step(a)  # next state, reward, done flag, info dict
  ```
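As a quick illustration of that interface, here is a minimal random-rollout sketch. It works with any environment that follows the API above; the `n_actions` argument is an assumption of this example, since each chapter's environment exposes its action set in its own way.

```python
import numpy as np

def random_rollout(env, n_actions, max_steps=1000, seed=0):
    """Run one episode with uniformly random actions using the gym-like API above."""
    rng = np.random.default_rng(seed)
    s = env.reset()                      # initial state
    total_return, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        a = rng.integers(n_actions)      # sample a random action index
        s, r, done, _ = env.step(a)      # next state, reward, done flag (info dict ignored)
        total_return += r
        steps += 1
    return total_return, steps
```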
The whole project (plots, exercises, and Anki cards, including reviewing them) took about 400 hours of focused work.