This project provides a PyTorch implementation of the Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) reinforcement learning algorithms to solve the CartPole-v1 environment provided by OpenAI's Gym library.
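As a quick illustration of how the two algorithms differ, the sketch below shows how the bootstrap targets are typically computed: DQN uses the target network both to select and to evaluate the next action, while DDQN selects the action with the online network and evaluates it with the target network. The function names are illustrative and are not taken from this repository's code.

```python
import torch

# Illustrative target computation (not the repository's exact code).
# reward and done are float tensors of shape (batch,); done is 0.0 or 1.0.

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    with torch.no_grad():
        # Target network both selects and evaluates the next action
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * next_q * (1.0 - done)

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        # Online network selects the action, target network evaluates it
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * next_q * (1.0 - done)
```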
The following dependencies are required to run this project:
- python 3.8.10
- gym
- torch
- matplotlib
- gym[classic_control]
You can install these packages using pip:
pip3 install gym torch matplotlib
pip3 install gym[classic_control]
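If you want to verify the installation before training, a short check like the following can help (this snippet is illustrative and not part of the repository):

```python
import gym
import torch

# Quick sanity check of the installation (illustrative; not part of the repository).
env = gym.make("CartPole-v1")
env.reset()
print("Observation space:", env.observation_space.shape)
print("Number of actions:", env.action_space.n)
print("CUDA available:", torch.cuda.is_available())
env.close()
```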
Navigate to the CartPole directory and run the train.py script:
$ cd CartPole
$ python3 train.py
This starts the training process for the agent. When training finishes, the program plots the reward history over episodes and saves the trained model as DQN.pth in the models directory.
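During training, the agent presumably balances exploration and exploitation with an epsilon-greedy policy controlled by the Epsilon and Epsilon decay hyperparameters listed below. The following is a minimal sketch of such a selection step; the network and function names are hypothetical and not taken from train.py.

```python
import random
import gym
import torch
import torch.nn as nn

# Minimal epsilon-greedy action selection sketch (hypothetical names; train.py may differ).
env = gym.make("CartPole-v1")
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

def select_action(state, epsilon):
    if random.random() < epsilon:
        return env.action_space.sample()        # explore: random action
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
        return int(q_values.argmax())           # exploit: greedy action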
You can tune the following hyperparameters to better train the model:
- Memory_size (default: 10000)
- Sample_size (default: 64)
- Number of episodes (default: 1000)
- Number of steps (default: 500)
- Gamma (default: 0.99)
- Epsilon (default: 1)
- Epsilon decay (default: 0.99)
- Target network update period (default: 10)
- Model to use (default: DDQN)
These hyperparameters are stored in the config/hyperparameters.py file.
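As a rough illustration, such a file might look like the sketch below; the variable names are hypothetical and the actual file may differ.

```python
# Hypothetical sketch of config/hyperparameters.py (actual names and layout may differ).
MEMORY_SIZE = 10000      # replay buffer capacity
SAMPLE_SIZE = 64         # minibatch size sampled from the buffer
N_EPISODES = 1000        # number of training episodes
N_STEPS = 500            # maximum steps per episode
GAMMA = 0.99             # discount factor
EPSILON = 1.0            # initial exploration rate
EPSILON_DECAY = 0.99     # multiplicative epsilon decay factor
TARGET_UPDATE = 10       # episodes between target network updates
MODEL = "DDQN"           # which agent to train: "DQN" or "DDQN"
```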
To evaluate the performance of the trained model, you can run the evaluate.py script with several arguments:
- --render to render the environment
- --untrained to use an untrained network
- --model with a string argument to choose the model to evaluate (DQN or DDQN)
python3 evaluate.py --render --model DDQN
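Internally, these flags are presumably handled with standard argument parsing; a sketch along these lines would reproduce the interface described above (illustrative, not the actual evaluate.py):

```python
import argparse

# Illustrative argument parsing matching the documented flags (evaluate.py may differ).
parser = argparse.ArgumentParser(description="Evaluate a trained CartPole agent")
parser.add_argument("--render", action="store_true", help="render the environment")
parser.add_argument("--untrained", action="store_true", help="use an untrained network")
parser.add_argument("--model", type=str, default="DDQN", choices=["DQN", "DDQN"],
                    help="which saved model to evaluate (default chosen here for illustration)")
args = parser.parse_args()
```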
Alternatively, you can run this repository inside a Docker container without installing the required packages. To do so, pull the image from Docker Hub:
docker pull fenixkz/cartpole_ddqn:torch
Then run the bash script start_docker.sh:
chmod +x start_docker.sh
./start_docker.sh
Note that if you want to use CUDA inside the Docker container, you need to have the NVIDIA Container Toolkit installed. The installation process can be found here.
Also, this script runs the command xhost +local:root, which allows any local user to access your X server. This is a potential security risk, so be sure you understand the implications.
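If you want to revoke this access again after you are done, you can run xhost -local:root.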
Here is an example of the potential results (DQN):
In this instance, the maximum number of steps the model balanced the pole in evaluation mode was 235.
Here is an example of training results for DDQN:
In this case, the pole remained balanced for hours in evaluation mode.