RL Sketchpad

Implementations of various reinforcement learning (RL) algorithms.

Contents:

Reinforcement Learning: An Introduction (2nd ed, 2018) by Sutton and Barto
UCL Course on RL (2016) YouTube lectures by David Silver
Mini Posts - other algorithms

Reinforcement Learning: An Introduction (2nd ed, 2018) by Sutton and Barto

Implementations of selected algorithms from the book. I tried to keep the code snippets minimal and faithful to the book.

Part I: Tabular Solution Methods
Chapter 2: Multi-armed Bandits
- 2.4: Simple Bandit - fig. 2.1, 2.2
- 2.6: Tracking Bandit - fig. 2.3
Chapter 4: Dynamic Programming
- 4.1: Iterative Policy Evaluation - FrozenLake-v0
- 4.3: Policy Iteration - FrozenLake-v0
- 4.4: Value Iteration - FrozenLake-v0
Chapter 5: Monte Carlo Methods
- 5.1: First-Visit MC Prediction - Blackjack-v0, fig. 5.1
- 5.3: Monte Carlo ES Control - Blackjack-v0, fig. 5.2
- 5.4: On-Policy First-Visit MC Control - Blackjack-v0
Chapter 6: Temporal-Difference Learning
- 6.1: TD Prediction - Blackjack-v0, example 6.2
- Also: Running-Mean MC Prediction
- 6.4: Sarsa - WindyGridworld, example 6.5
- 6.5: Q-Learning - CliffWalking, example 6.6
Part II: Approximate Solution Methods
Chapter 9: On-Policy Prediction with Approximation
- 9.3a: Gradient Monte Carlo - example 9.1, fig. 9.1
- 9.3b: Semi-Gradient TD - example 9.2, fig. 9.2 (left)
- 9.5a: Linear Models - Polynomial and Fourier Bases - fig. 9.5
- 9.5b: Linear Models - Tile Coding - fig. 9.10
- 9.7: Neural Network with Memory Replay
Chapter 10: On-Policy Control with Approximation
- 10.1: Episodic Semi-Gradient Sarsa - MountainCar, fig. 10.1, 10.2
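
To give a flavour of the tabular methods listed under Part I, here is a minimal sketch of value iteration (section 4.4 of the book). It is not the repository's code: the repository lists FrozenLake-v0 for this algorithm, while this sketch assumes a hypothetical four-state deterministic chain so that it runs without any dependencies. The names `step`, `q_value` and `value_iteration` are illustrative only.

```python
# Hypothetical toy MDP (an assumption, not from the repository):
# states 0..3 in a row, state 3 is terminal, reward is -1 per step.
N_STATES = 4
TERMINAL = 3
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Deterministic model: return (next_state, reward)."""
    if action == 1:
        next_state = min(state + 1, TERMINAL)
    else:
        next_state = max(state - 1, 0)
    return next_state, -1.0


def q_value(V, state, action, gamma):
    """One-step lookahead: r + gamma * V(s')."""
    next_state, reward = step(state, action)
    return reward + gamma * V[next_state]


def value_iteration(gamma=1.0, theta=1e-8):
    V = [0.0] * N_STATES  # the terminal state's value stays at 0
    non_terminal = [s for s in range(N_STATES) if s != TERMINAL]
    while True:
        delta = 0.0
        for s in non_terminal:
            # Bellman optimality backup, updating V in place
            best = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma))
              for s in non_terminal}
    return V, policy


V, policy = value_iteration()
```

On this chain the values are the negated number of steps to the terminal state, and the greedy policy always moves right.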

UCL Course on RL (2016) YouTube lectures by David Silver

A more in-depth explanation of selected concepts from David Silver's lectures and the Sutton and Barto book.

Lecture 3 - Dynamic Programming
- Dynamic Programming - Iterative Policy Evaluation, Policy Iteration, Value Iteration
Lecture 4 - Model-Free Prediction
- MC and TD Prediction
- N-Step and TD(λ) Prediction - Forward TD(λ) and Backward TD(λ) with Eligibility Traces
Lecture 5 - Model-Free Control
- On-Policy Control - MC, TD, N-Step, Forward TD(λ), Backward TD(λ) with Eligibility Traces
- Off-Policy Control - Expectation Based - Q-Learning, Expected SARSA, Tree Backup
- Off-Policy Control - Importance Sampling - I.S. SARSA, N-Step I.S. SARSA, Off-Policy MC Control
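
The one-step control methods named above differ mainly in the target they bootstrap from. The sketch below compares the Sarsa, Q-Learning and Expected Sarsa targets for a single transition. It assumes an epsilon-greedy target policy for Expected Sarsa, and all function names are hypothetical rather than taken from the notebooks.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy: bootstrap from the action actually taken in the next state."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(q_next)


def expected_sarsa_target(reward, gamma, q_next, epsilon):
    """Bootstrap from the expectation over the policy's next actions."""
    probs = epsilon_greedy_probs(q_next, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, q_next))


# Action values in the next state (made-up numbers for illustration)
q_next = [1.0, 3.0]
t_sarsa = sarsa_target(0.0, 1.0, q_next, next_action=0)
t_q = q_learning_target(0.0, 1.0, q_next)
t_expected = expected_sarsa_target(0.0, 1.0, q_next, epsilon=0.5)
```

With `epsilon=0` the Expected Sarsa target reduces to the Q-Learning target, which is one way to see why both belong to the expectation-based family.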

Mini Posts

ANN and Correlated Data - simplest possible example showing why memory replay is necessary
Minimal TF Keras - fit sine wave
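
For readers unfamiliar with memory replay, the idea is to store transitions and train on random mini-batches, which breaks the correlation between consecutive samples. A minimal sketch of a uniform replay buffer follows. The class `ReplayBuffer` and the transition layout are assumptions made for illustration, not the code from the post.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer that drops the oldest transition when full."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def append(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates the batch
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical transitions: (state, action, reward, next_state, done)
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.append((t, 0, 0.0, t + 1, False))

batch = buffer.sample(2)
```

After five appends only the three most recent transitions remain, and each call to `sample` draws a fresh random mini-batch from them.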

marcinbogdanski/rl-sketchpad

Folders and files

Latest commit

History

Repository files navigation

RL Sketchpad

Reinforcement Learning: An Introduction (2nd ed, 2018) by Sutton and Barto

UCL Course on RL (2016) Youtube lectures by David Silver

Mini Posts

About

Resources

Stars

Watchers

Forks

Languages