
Playing-Attari-Breakout-using-DeepQNetworks-

PyTorch implementation of the Deep Q-Learning (DQN) algorithm to play the Atari Breakout game.

To learn the basics of deep Q-learning, see:

Demystifying Deep Reinforcement Learning

Requirements

How to run:

  • To train DQN: python main.py --train_dqn
  • To test DQN: python main.py --test_dqn --resume (path to model weights)
  • Saved Model Weights

Algorithm:

Reference paper: Human-Level Control Through Deep Reinforcement Learning. [Link]

Implementation Details:

Model:

Experience Replay Buffer

We store states, actions, and rewards in memory for experience replay, and sample random minibatches from this buffer to train the model. This decorrelates the inputs. The number of frames that can be stored in the buffer depends on the size of your RAM / GPU memory. In my implementation, I used a cyclic replay buffer of 0.4M frames. [1] For the first 50,000 steps of training we do not update the model; these steps are used only to fill the replay buffer to an initial capacity.

An important consideration is how much RAM 0.4M frames will consume. If we store scaled versions of the frames (dtype np.float32) in the buffer, each frame costs about 0.12 MB, so 0.4M frames would require around 45 GB, which we obviously don't want. To use memory efficiently, do not scale the frames: store each frame as np.uint8 in the buffer and convert to float32 only when it is needed. In np.uint8 format, 0.4M frames require roughly 10-11 GB.
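Below is a minimal sketch of such a buffer: states are kept as uint8 and only converted to scaled float32 at sampling time. The ReplayBuffer class and its method names are illustrative and not the exact code in this repository.

```python
import numpy as np

class ReplayBuffer:
    """Cyclic replay buffer that stores stacked frames as uint8 to save memory."""

    def __init__(self, capacity, state_shape=(4, 84, 84)):
        self.capacity = capacity
        self.states = np.zeros((capacity, *state_shape), dtype=np.uint8)
        self.next_states = np.zeros((capacity, *state_shape), dtype=np.uint8)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.bool_)
        self.pos = 0
        self.size = 0

    def push(self, state, action, reward, next_state, done):
        # Overwrite the oldest entry once the buffer is full (cyclic behaviour).
        self.states[self.pos] = state
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.next_states[self.pos] = next_state
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        # Convert to float32 and scale to [0, 1] only at sampling time.
        states = self.states[idx].astype(np.float32) / 255.0
        next_states = self.next_states[idx].astype(np.float32) / 255.0
        return states, self.actions[idx], self.rewards[idx], next_states, self.dones[idx]
```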

Preprocessing:

attari_wrapper_openai.py modifies the original Atari environment to add the functionality described in DeepMind's paper. It also applies a preprocessing function that converts the original 210x160x3 frame to an 84x84 grayscale frame and stacks the 4 most recent frames, producing an input of shape 4x84x84 that is forwarded to the CNN model. Make sure you do not set the scale parameter to True, to avoid memory issues.
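For reference, the preprocessing boils down to something like the sketch below: grayscale, resize to 84x84, keep uint8, and stack the last 4 frames. The function and class names here are illustrative and not taken from attari_wrapper_openai.py.

```python
import collections
import cv2
import numpy as np

def preprocess_frame(frame):
    """Convert a raw 210x160x3 Atari frame to an 84x84 grayscale uint8 frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.uint8)

class FrameStack:
    """Keeps the 4 most recent preprocessed frames as a 4x84x84 state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = collections.deque(maxlen=k)

    def reset(self, frame):
        # Fill the stack with copies of the first frame at episode start.
        f = preprocess_frame(frame)
        for _ in range(self.k):
            self.frames.append(f)
        return self.state()

    def step(self, frame):
        self.frames.append(preprocess_frame(frame))
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=0)  # shape (4, 84, 84), dtype uint8
```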

Training:

Using a linearly decaying epsilon greedy policy to take actions:

Epsilon is held at an initial value of 1.0 for the first 50,000 steps; over the next 1M steps it is linearly decreased to a final value of 0.01, where it stays until training terminates.
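As a sketch, this schedule and the resulting action selection can be written as follows (names and defaults are illustrative, not the repo's exact constants):

```python
import random
import torch

def epsilon_at(step, eps_start=1.0, eps_final=0.01,
               warmup_steps=50_000, decay_steps=1_000_000):
    """Epsilon stays at 1.0 during warm-up, then decays linearly to 0.01."""
    if step < warmup_steps:
        return eps_start
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return eps_start + progress * (eps_final - eps_start)

def select_action(policy_net, state, step, num_actions):
    # With probability epsilon act randomly, otherwise act greedily w.r.t. Q-values.
    if random.random() < epsilon_at(step):
        return random.randrange(num_actions)
    with torch.no_grad():
        q_values = policy_net(state.unsqueeze(0))  # state: float32 tensor (4, 84, 84)
        return int(q_values.argmax(dim=1).item())
```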

Optimizing:

From step 50,000 onwards we start to optimize the model. Every 4 steps we sample a random minibatch of frames, compute the loss using the policy network and the target network, and update the policy network. The code is implemented in the optimize_model() method. I ran training for a total of around 5M steps (roughly 50k episodes). A sketch of such an update is given below.
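This is a hedged sketch of what an optimize_model() step typically looks like in PyTorch; the batch size, discount factor, and argument names are illustrative assumptions (it also reuses the ReplayBuffer sketch from above), not necessarily the values used in this repository.

```python
import torch
import torch.nn.functional as F

def optimize_model(policy_net, target_net, optimizer, buffer,
                   batch_size=32, gamma=0.99, device="cuda"):
    """One DQN update: sample a minibatch, build the TD target with the target
    network, and minimize the Huber loss against the policy network's Q-values."""
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)

    states = torch.as_tensor(states, device=device)
    actions = torch.as_tensor(actions, device=device).unsqueeze(1)
    rewards = torch.as_tensor(rewards, device=device)
    next_states = torch.as_tensor(next_states, device=device)
    dones = torch.as_tensor(dones, dtype=torch.float32, device=device)

    # Q(s, a) for the actions that were actually taken.
    q_values = policy_net(states).gather(1, actions).squeeze(1)

    # Bellman target: r + gamma * max_a' Q_target(s', a'), zeroed at episode ends.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1.0 - dones)

    loss = F.smooth_l1_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```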

Results:

Fig 1: Reward per episode

Fig 2: Average reward per 100 episodes

Fig 3: Episode length per episode

Fig 4: Average episode length per 100 episodes

Fig 5: Loss per episode

Fig 6: Average loss per 100 episodes

Additional Tip

You can run this code on Google Colab; it takes around 4-5 hours of training to achieve the above results. Make sure you do not change the seed if you want to reproduce them.

References
