This repository contains implementations of reinforcement learning (RL) techniques on a Grid World:
- Markov Decision Process
  - Policy Evaluation
  - Policy Iteration
  - Value Iteration
- Model-Free
  - First Visit Monte Carlo (MC) Policy Evaluation
  - Temporal Difference (TD) Estimation
  - On-policy ε-greedy First Visit MC control
  - MC ε-greedy First Visit Iterative Optimisation
  - SARSA: On-policy TD control
  - Q-Learning: Off-policy TD control
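
To give a feel for one of the techniques listed above, here is a minimal sketch of value iteration on a small grid. It is not taken from this repository's code: the 4x4 layout, the terminal state in the bottom-right corner, the reward of -1 per move, and the names `step` and `value_iteration` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 4x4 grid: states 0..15, terminal state in the bottom-right corner.
N = 4
TERMINAL = N * N - 1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic transition; a move off the grid leaves the state unchanged."""
    if state == TERMINAL:
        return state, 0.0
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    return new_row * N + new_col, -1.0  # assumed reward: -1 per move


def value_iteration(gamma=1.0, tol=1e-8):
    """Apply the Bellman optimality backup in place until the values stop changing."""
    values = np.zeros(N * N)
    while True:
        delta = 0.0
        for s in range(N * N):
            if s == TERMINAL:
                continue  # the terminal state keeps value 0
            outcomes = (step(s, a) for a in range(len(ACTIONS)))
            best = max(r + gamma * values[s2] for s2, r in outcomes)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration()
print(values.reshape(N, N))
```

Under these assumptions each state's optimal value is the negative of its Manhattan distance to the terminal state, which makes a quick sanity check for the result.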