DQN: Deep Q-Network
This project implements a variation of DQN (as described in the original DeepMind paper) and uses it to learn policies for three reinforcement learning problems in OpenAI Gym: CartPole-v0, LunarLander-v2 and MountainCar-v0.
For a detailed writeup on the algorithm and project see this post.
The differences between this algorithm and DQN are:
1. no convolutional layers in the neural networks.
2. no preprocessing function (phi).
3. squared (L2) loss by default.
Items 1 and 2 are not needed for the selected Gym environments, since these already provide feature-based representations of states. Item 3 is simply how I chose to train these agents, but the training algorithm allows replacing the squared loss with any other loss, including the Huber loss.
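As a sketch of what swapping the loss means (function names here are illustrative, not the project's actual API), the default squared loss and a Huber alternative over TD errors could look like:

```python
import numpy as np

def l2_loss(td_errors):
    # Mean squared TD error: the default loss in this project.
    return np.mean(np.square(td_errors))

def huber_loss(td_errors, delta=1.0):
    # Quadratic for |error| <= delta, linear beyond it; less
    # sensitive to large TD errors than the squared loss.
    abs_err = np.abs(td_errors)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return np.mean(0.5 * quadratic ** 2 + delta * linear)

errors = np.array([0.5, 2.0])
print(l2_loss(errors))     # penalizes the 2.0 error heavily
print(huber_loss(errors))  # grows only linearly past delta
```

Both functions take the same TD-error array, so one can be substituted for the other in a training loop without further changes.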
Training was done over combinations of epsilon annealing schedules and total training steps. For all environments, three policies (models) are included: best performing, worst performing and one generated from a combination of hyperparameters which proved successful in generating good policies for the three environments.
- `dqn.py`: implementation of DQN. Contains both training and testing functions.
- `train.py`: training on all 3 environments at once.
- `test.py`: testing on all 3 environments at once.
- `plot.py`: plotting of training statistics.
- `CartPole-v0`: trained models along with training and testing stats for this environment.
- `LunarLander-v2`: trained models along with training and testing stats for this environment.
- `MountainCar-v0`: trained models along with training and testing stats for this environment.
- OpenAI Gym (see installation note below)
- matplotlib (if you want to plot training stats)
After downloading or cloning the repo:
pip install -r requirements.txt
note: in order to install Box2D (used by Gym for LunarLander), you might have to:
- download the Gym sources
- compile and install Gym locally from sources:
pip install -e <path to gym>
- install Box2D from Gym:
pip install -e '<path to gym>/[box2d]'