A list of references to my reimplementations of RL algorithms:
- Asynchronous Methods for Deep Reinforcement Learning (A3C) (arxiv, my code)
- Advantage Actor Critic (A2C) (my code)
- Proximal Policy Optimization Algorithms (PPO) (arxiv, my code)
- Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) (arxiv, my code)
- Continuous Deep Q-Learning with Model-based Acceleration (NAF) (arxiv, my code)
- Move TRPO into the a2c-ppo-acktr code and implement it as a Hessian-free optimizer (just as ACKTR is implemented as a K-FAC optimizer)
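A Hessian-free TRPO step never builds the Fisher matrix F explicitly. It solves F x = g with conjugate gradient using only Fisher-vector products, then rescales x so that the quadratic KL estimate ½ xᵀFx equals the trust-region radius. The following is a minimal sketch of that idea, not code from the repository. The names `conjugate_gradient`, `trpo_step`, `mvp` and `max_kl` are hypothetical, and plain Python lists stand in for PyTorch tensors.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(mvp: Callable[[Vector], Vector], b: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve A x = b using only products v -> A v.

    A is assumed symmetric positive definite and is never materialized;
    `mvp` (hypothetical) stands in for a Fisher- or Hessian-vector product.
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, starting from x = 0
    p = list(b)  # current search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if rs_old < tol:
            break
        Ap = mvp(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def trpo_step(mvp: Callable[[Vector], Vector], grad: Vector,
              max_kl: float = 0.01, iters: int = 10) -> Vector:
    """Natural-gradient direction scaled so that 0.5 * s^T F s == max_kl (hypothetical helper)."""
    direction = conjugate_gradient(mvp, grad, iters)
    shs = dot(direction, mvp(direction))  # direction^T F direction
    scale = (2.0 * max_kl / shs) ** 0.5
    return [scale * d for d in direction]


if __name__ == "__main__":
    # Toy SPD matrix standing in for the Fisher matrix
    A = [[4.0, 1.0], [1.0, 3.0]]
    fvp = lambda v: [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    print(conjugate_gradient(fvp, [1.0, 2.0]))  # approximately [1/11, 7/11]
    print(trpo_step(fvp, [1.0, 2.0], max_kl=0.01))
```

In an actual PyTorch port, `mvp` would be computed by differentiating the KL divergence twice with autograd, so each conjugate gradient iteration costs roughly one extra backward pass rather than storing an n×n matrix. This is what makes the method "Hessian-free".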