pytorch-rl

A list of references to my reimplementations of RL algorithms:

Asynchronous Methods for Deep Reinforcement Learning (A3C) (arxiv, my code)
Advantage Actor Critic (A2C) (my code)
Proximal Policy Optimization Algorithms (PPO) (arxiv, my code)
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) (arxiv, my code)
Trust Region Policy Optimization (TRPO) (arxiv, my code)
Continuous Deep Q-Learning with Model-based Acceleration (NAF) (arxiv, my code)

TODO (volunteers are welcome)

Move TRPO into the a2c-ppo-acktr code and implement it as a Hessian-free optimizer (in the same way that ACKTR is implemented as a KFAC optimizer).
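For anyone picking up this TODO, here is a rough sketch of the core idea behind a Hessian-free optimizer: solve `H x = g` with conjugate gradient, using only matrix-vector products and never forming `H` explicitly. It is not code from any of the linked repositories. The names `conjugate_gradient`, `hvp`, and `dot` are made up for illustration, and it uses plain Python lists instead of PyTorch tensors so it runs without dependencies. In a TRPO setting, `hvp` would presumably compute a Fisher-vector product through autograd, and `g` would be the policy gradient.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(hvp, g, iters=10, tol=1e-10):
    """Approximately solve H x = g using only products H v.

    hvp: hypothetical callable mapping a vector v to H v
         (H is assumed symmetric positive definite).
    g:   right-hand side as a list of floats.
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - H x, which equals g because x starts at 0
    p = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        hp = hvp(p)
        alpha = rr / dot(p, hp)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rr_new = dot(r, r)
        beta = rr_new / rr  # keeps the new direction conjugate to the old one
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Toy example: a fixed 2x2 matrix stands in for the Hessian / Fisher matrix.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [1.0, 2.0]


def toy_hvp(v):
    # Stand-in for an autograd-based Hessian-vector product.
    return [dot(row, v) for row in H]


x = conjugate_gradient(toy_hvp, g)
print(x)  # close to [1/11, 7/11], the exact solution of H x = g
```

The optimizer only ever calls `hvp`, so swapping the toy matrix for an autograd-based product should not require changes to the solver itself.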
