Deep Reinforcement Learning Algorithms and Code - Explanations of research papers and their implementations (all algorithm implementations are in PyTorch)
REINFORCE: Vanilla Policy Gradient
DQN: Deep Q-Learning, Mnih et al., 2013 (see the sketch after this list)
A3C/A2C: Asynchronous Methods for Deep RL, Mnih et al., 2016
PPO: Proximal Policy Optimization, Schulman et al., 2017
DDPG: Deep Deterministic Policy Gradient, Lillicrap et al., 2015
(Folder General: general tips on Deep Reinforcement Learning)
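To give a flavor of what these folders contain, here is a minimal sketch of the core DQN temporal-difference update (Q-network, target network, bootstrapped targets). It is illustrative only: the network sizes, hyperparameters, and the placeholder mini-batch are assumptions for the example, not the code actually used in the DQN folder.

```python
# Minimal sketch of the DQN temporal-difference update -- illustrative only.
# Shapes and hyperparameters are assumptions for this example, not the
# settings used in this repository's DQN folder.
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99

q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step on a sampled mini-batch of transitions."""
    # Q(s, a) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at terminal states.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a random placeholder mini-batch of 32 transitions.
batch = 32
loss = dqn_update(
    states=torch.randn(batch, obs_dim),
    actions=torch.randint(0, n_actions, (batch,)),
    rewards=torch.randn(batch),
    next_states=torch.randn(batch, obs_dim),
    dones=torch.zeros(batch),
)
print(f"TD loss: {loss:.4f}")
```

A full DQN additionally needs an experience replay buffer, an epsilon-greedy exploration schedule, and periodic target-network synchronization; the folder's implementation covers those pieces.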
From OpenAI's "Spinning Up as a Deep RL Researcher (or Practitioner)": how to start in Deep RL, assuming you have a solid background in mathematics (1, 2), a general knowledge of Deep Learning, and familiarity with at least one Deep Learning library (such as PyTorch or TensorFlow):
Which algorithms? You should probably start with vanilla policy gradient (also called REINFORCE), DQN, A2C (the synchronous version of A3C), PPO (the variant with the clipped objective), and DDPG, approximately in that order. The simplest versions of all of these can be written in just a few hundred lines of code (ballpark 250-300), and some of them even less (for example, a no-frills version of VPG can be written in about 80 lines). Write single-threaded code before you try writing parallelized versions of these algorithms. (Do try to parallelize at least one.)
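To make the "about 80 lines" estimate concrete, here is a minimal, no-frills REINFORCE (vanilla policy gradient) sketch. It is only an illustration: the environment (CartPole-v1), network size, learning rate, and episode count are arbitrary choices for this example, not necessarily what the REINFORCE folder uses, and it assumes the classic Gym API (reset returns an observation, step returns a 4-tuple).

```python
# Minimal REINFORCE (vanilla policy gradient) sketch -- illustrative only.
# Assumes the classic Gym API; all hyperparameters are example choices.
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Small policy network: observation -> action logits.
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, n_actions),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns, computed backwards from the end of the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # Policy gradient loss: maximize sum of log-prob weighted by return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (episode + 1) % 50 == 0:
        print(f"episode {episode + 1}: return = {sum(rewards):.1f}")
```

Everything else in the list above (DQN, A2C, PPO, DDPG) builds on this loop by adding value functions, replay buffers, multiple workers, or constrained/clipped updates.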
Further algorithms to study (suggested at the OpenAI Hackathon):
TRPO: Schulman et al., 2015
C51: Bellemare et al., 2017
QR-DQN: Dabney et al., 2017
SVG: Heess et al., 2015
I2A: Weber et al., 2017
MBMF: Nagabandi et al., 2017
AlphaZero: Silver et al., 2017
Start with the simplest algorithm (REINFORCE). First read the paper carefully, then read the implementation and try to rewrite the code from scratch. Take care not to overfit to implementation details or to paper details.
My framework of choice is PyTorch, which is distributed under a free license (modified BSD).
The implementations were taken from various sources, with a focus on simplicity and ease of understanding (including Udacity's repository for the Deep Reinforcement Learning Nanodegree). There are numerous implementations available, including very good modular ones, but my purpose is mastering RL theory and algorithms; creating modular code is a secondary goal.
There are minor corrections to the implementations, aimed at making them easier to understand and consistent with one another.