Deep Recurrent Q-Learning (DRQN) with PyTorch

This repository is dedicated to implementing Deep Recurrent Q-Learning (DRQN) using PyTorch, inspired by the paper Deep Recurrent Q-Learning for Partially Observable MDPs.
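
Concretely, DRQN replaces part of the feed-forward Q-network with a recurrent layer so that Q-value estimates can integrate information across timesteps. Below is a minimal sketch of such a recurrent Q-network for a low-dimensional observation; the layer sizes and the choice of an LSTM are illustrative assumptions, not necessarily the exact architecture used in this repository.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Recurrent Q-network: observation sequence -> LSTM -> Q-values per action."""

    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.fc_in = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); hidden: optional (h, c) LSTM state
        x = torch.relu(self.fc_in(obs_seq))
        x, hidden = self.lstm(x, hidden)
        q_values = self.fc_out(x)          # (batch, seq_len, n_actions)
        return q_values, hidden
```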

Prerequisites

To get started with this project, ensure you have the following installed:

  • PyTorch (version 1.5.0)
  • OpenAI Gym (version 0.17.1)
  • TensorBoard (version 2.1.0)

Training Environment

We use OpenAI Gym's CartPole-v1 environment to train our DRQN model.

CartPole-v1 Environment


Partially Observable Markov Decision Process (POMDP)

In the CartPole-v1 environment, the agent receives the cart's position and velocity along with the pole's angle and angular velocity. For our DRQN implementation, we define the observable state as only the cart's position and the pole's angle, deliberately omitting both velocity components to create a POMDP.

CartPole State
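
As a rough illustration, the 4-dimensional CartPole state (cart position, cart velocity, pole angle, pole angular velocity) can be reduced to its two position-like components before being fed to the agent. The helper below is a hypothetical sketch written for this README, not necessarily how the repository performs the masking; the indices follow Gym's documented CartPole state layout.

```python
import gym
import numpy as np

def to_partial_obs(full_state: np.ndarray) -> np.ndarray:
    """Keep only cart position (index 0) and pole angle (index 2),
    dropping both velocity components to make the problem a POMDP."""
    return full_state[[0, 2]]

env = gym.make("CartPole-v1")
state = env.reset()           # full 4-dimensional state
obs = to_partial_obs(state)   # 2-dimensional partial observation
```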


Stable Recurrent Updates

We employ two methods for stable recurrent updates in our DRQN model:

1. Bootstrapped Sequential Updates

  • Randomly select episodes from the replay memory.
  • Start updating from the beginning of each episode.
  • Generate targets at each timestep from the target Q-network.
  • Carry forward the RNN's hidden state throughout the episode (see the sketch below).
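
The sketch below illustrates one sequential update, assuming the DRQN network sketched earlier, a target network of the same shape, and an episodic replay memory that yields whole episodes as tensors of shape (1, T, ...). It is an illustration of the procedure above, not the repository's exact training code.

```python
import torch
import torch.nn.functional as F

def sequential_update(q_net, target_net, optimizer, episode, gamma=0.99):
    """One bootstrapped sequential update over a full episode.
    `episode` holds tensors shaped (1, T, ...): obs, next_obs (float),
    action (long), reward, done (float). The RNN hidden state is carried
    forward across the whole episode by running it in one pass."""
    obs, action = episode["obs"], episode["action"]          # (1, T, obs_dim), (1, T)
    reward, next_obs, done = episode["reward"], episode["next_obs"], episode["done"]

    # Run the online network over the episode from its beginning.
    q_all, _ = q_net(obs)                                     # (1, T, n_actions)
    q_taken = q_all.gather(2, action.unsqueeze(-1)).squeeze(-1)

    # Targets at every timestep come from the target network.
    with torch.no_grad():
        q_next, _ = target_net(next_obs)                      # (1, T, n_actions)
        target = reward + gamma * q_next.max(dim=2).values * (1 - done)

    loss = F.smooth_l1_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```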

2. Bootstrapped Random Updates

  • Randomly select episodes from the replay memory.
  • Begin updating from random points within the episode for a predetermined number of timesteps (lookup_step).
  • Zero out the RNN's initial state at the start of each update.
  • Again, targets at each timestep are generated from the target Q-network (see the sketch below).
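
The random-update variant can be sketched in the same style: pick a random start point, train on a window of lookup_step timesteps, and let the recurrent state start from zero. The tensor layout and network names are the same assumptions as in the previous sketch.

```python
import random
import torch
import torch.nn.functional as F

def random_update(q_net, target_net, optimizer, episode, lookup_step=20, gamma=0.99):
    """One bootstrapped random update: train on a random sub-sequence of
    length `lookup_step`, starting from a zeroed RNN hidden state."""
    T = episode["obs"].shape[1]
    start = random.randint(0, max(T - lookup_step, 0))
    end = start + lookup_step

    obs      = episode["obs"][:, start:end]
    action   = episode["action"][:, start:end]
    reward   = episode["reward"][:, start:end]
    next_obs = episode["next_obs"][:, start:end]
    done     = episode["done"][:, start:end]

    # Passing hidden=None makes the LSTM start from a zero initial state.
    q_all, _ = q_net(obs, hidden=None)
    q_taken = q_all.gather(2, action.unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        q_next, _ = target_net(next_obs, hidden=None)
        target = reward + gamma * q_next.max(dim=2).values * (1 - done)

    loss = F.smooth_l1_loss(q_taken, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```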

DRQN Parameters

The main DRQN parameters control the choice of update method and the length of lookup_step. Longer lookup_step values have been shown to yield better results.
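
For orientation, a hedged example of what such a configuration might look like is given below; the parameter names and values are illustrative assumptions, not the repository's exact settings.

```python
# Illustrative DRQN hyperparameters (assumed names/values, not the repo's exact config).
config = {
    "random_update": True,      # True: bootstrapped random updates; False: sequential updates
    "lookup_step": 20,          # length of the sub-sequence used per random update
    "batch_size": 8,            # episodes sampled per update
    "learning_rate": 1e-3,
    "gamma": 0.99,
    "buffer_size": 10000,       # episodes kept in the replay memory
    "target_update_period": 4,  # how often the target network is synced
}
```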


Comparative Analysis: DQN vs. DRQN

We compare three scenarios:

  • DQN with a fully observed MDP.
  • DQN under POMDP conditions.
  • DRQN under POMDP conditions.

Reward Log

Observations:

  • DQN with full observability (orange) achieves the highest rewards.
  • DQN under POMDP constraints (blue) struggles to reach high rewards.
  • DRQN in POMDP scenarios (red) demonstrates decent performance, despite limited observability.

TODO

  • Implement the random update mechanism in DRQN.

Feel free to contribute or suggest improvements to this project!
