Reinforcement learning: a computational approach to understanding and automating goal-directed learning and decision making
-
maze
Training agents to navigate a maze with Q-learning, and comparing Expected SARSA, Dyna-Q and Dyna-Q+
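
As a rough illustration of how these methods differ, the sketch below contrasts the bootstrap targets of Q-learning and Expected SARSA and shows one way a Dyna-Q planning loop and the Dyna-Q+ exploration bonus could look. It is not taken from this repository: the function names, the epsilon-greedy behaviour policy, the deterministic `model` dictionary and the parameter names (`alpha`, `gamma`, `kappa`, `tau`) are all assumptions made for the example, and terminal states are not handled.

```python
import math
import random
from collections import defaultdict


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties split evenly)."""
    n = len(q_values)
    best = max(q_values)
    greedy = [i for i, q in enumerate(q_values) if q == best]
    probs = [epsilon / n] * n
    for i in greedy:
        probs[i] += (1.0 - epsilon) / len(greedy)
    return probs


def q_learning_target(reward, next_q, gamma):
    """Q-learning bootstraps from the greedy (max) next action value."""
    return reward + gamma * max(next_q)


def expected_sarsa_target(reward, next_q, gamma, epsilon):
    """Expected SARSA bootstraps from the policy-weighted average next value."""
    probs = epsilon_greedy_probs(next_q, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, next_q))


def dyna_q_plus_reward(reward, kappa, tau):
    """Dyna-Q+ adds a bonus that grows with the time tau since the last visit."""
    return reward + kappa * math.sqrt(tau)


def dyna_q_planning(q, model, n_steps, alpha, gamma, rng):
    """Dyna-Q planning: replay n_steps remembered transitions from the model.

    `model` maps (state, action) -> (reward, next_state); this deterministic
    dictionary model is an assumption of the sketch.
    """
    keys = list(model)
    for _ in range(n_steps):
        state, action = rng.choice(keys)
        reward, next_state = model[(state, action)]
        target = q_learning_target(reward, q[next_state], gamma)
        q[state][action] += alpha * (target - q[state][action])
```

With `next_q = [1.0, 3.0]`, the Q-learning target uses only the value 3.0, while the Expected SARSA target averages both values under the policy, which tends to reduce the variance of the update.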
-
mountain-car
On-policy control with function approximation: an underpowered car climbs a hill using episodic semi-gradient one-step SARSA
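
The update at the heart of episodic semi-gradient one-step SARSA can be sketched as follows. This is a minimal example, assuming a linear action-value function over binary features (for instance the active tiles of a tile coder); the helper names `q_value` and `sarsa_update` and the list-of-lists weight layout are hypothetical, not this repository's code.

```python
def q_value(weights, action, active):
    """Linear action value: sum of the weights of the active binary features."""
    return sum(weights[action][i] for i in active)


def sarsa_update(weights, active, action, reward,
                 next_active, next_action, alpha, gamma, terminal=False):
    """One semi-gradient SARSA step; returns the TD error.

    For binary features the gradient of q with respect to the weights is 1
    for every active feature, so each active weight moves by alpha * delta.
    """
    target = reward
    if not terminal:
        # Bootstrap from the action actually chosen in the next state.
        target += gamma * q_value(weights, next_action, next_active)
    delta = target - q_value(weights, action, active)
    for i in active:
        weights[action][i] += alpha * delta
    return delta
```

On the final step of an episode, passing `terminal=True` drops the bootstrap term, so the target is just the last reward.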
-
pendulum
Average Reward Softmax Actor-Critic on a continuing task.
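
A single update of an average-reward softmax actor-critic might look like the sketch below. It assumes a tabular (one-hot) state representation and stores everything in a plain dictionary; the keys `avg_reward`, `v`, `theta` and the step sizes `alpha_r`, `alpha_w`, `alpha_theta` are names chosen for this illustration only.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(agent, state, action, reward, next_state):
    """One average-reward actor-critic update; returns the TD error."""
    # Differential TD error: the reward is measured relative to the
    # running estimate of the average reward instead of being discounted.
    delta = (reward - agent["avg_reward"]
             + agent["v"][next_state] - agent["v"][state])
    agent["avg_reward"] += agent["alpha_r"] * delta
    # Critic: tabular state-value update.
    agent["v"][state] += agent["alpha_w"] * delta
    # Actor: gradient of ln pi(action | state) for a softmax policy is
    # 1[b == action] - pi(b | state) for each preference b.
    probs = softmax(agent["theta"][state])
    for b in range(len(probs)):
        grad = (1.0 if b == action else 0.0) - probs[b]
        agent["theta"][state][b] += agent["alpha_theta"] * delta * grad
    return delta
```

Because the task is continuing, there is no terminal state and no discount factor; the average-reward estimate plays the role of a baseline for every reward.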
-
lunar-lander
Deep RL using experience replay and Expected SARSA
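
To make the two ingredients concrete, here is a small sketch of a fixed-size replay buffer and of Expected SARSA targets computed for a sampled minibatch. It is illustrative only: the softmax policy with temperature `tau`, the `q_fn` callable standing in for the action-value network, and the tuple layout of a transition are assumptions, not details confirmed by this repository.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def append(self, state, action, reward, terminal, next_state):
        self.buffer.append((state, action, reward, terminal, next_state))

    def sample(self, batch_size):
        """Uniformly sample a minibatch without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def softmax(values, tau):
    """Softmax over action values with temperature tau."""
    scaled = [v / tau for v in values]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_sarsa_targets(batch, q_fn, gamma, tau):
    """Targets r + gamma * sum_a pi(a|s') q(s', a); just r at terminal steps."""
    targets = []
    for _, _, reward, terminal, next_state in batch:
        if terminal:
            targets.append(reward)
            continue
        next_q = q_fn(next_state)  # q_fn is a stand-in for the network
        probs = softmax(next_q, tau)
        expected = sum(p * q for p, q in zip(probs, next_q))
        targets.append(reward + gamma * expected)
    return targets
```

Sampling past transitions uniformly breaks the correlation between consecutive steps, and each stored transition can be reused for several updates.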
-
pong
Pong from Pixels using policy gradients
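
Two building blocks of a policy-gradient (REINFORCE-style) Pong agent can be sketched as below: discounted returns that reset whenever a point is scored, and the gradient of the log-probability with respect to the policy's output logit for a two-action (up/down) sigmoid policy. The function names and the reset-on-nonzero-reward rule are assumptions of this example and are specific to Pong, where a nonzero reward marks the end of a rally.

```python
def discount_rewards(rewards, gamma=0.99):
    """Discounted return at every step, computed backwards through time."""
    discounted = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            # A point was scored: do not leak returns across rallies.
            running = 0.0
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted


def logit_gradients(actions, probs, advantages):
    """Per-step gradient of advantage * ln pi(action) w.r.t. the logit.

    actions are 0/1 labels of the sampled action and probs are the policy's
    probabilities of action 1; for a sigmoid output the gradient of the
    log-probability with respect to the logit is (action - prob).
    """
    return [(a - p) * adv for a, p, adv in zip(actions, probs, advantages)]
```

Actions taken shortly before a won point receive a positive weight and are made more likely, while actions preceding a lost point are made less likely.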
References
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (second edition)