Machine learning 1992 Q-learning
NIPS 13 Playing Atari with Deep Reinforcement Learning
DQN *Nature 2015 Human-level control through deep reinforcement learning
AAAI 15 Deep Recurrent Q-Learning for Partially Observable MDPs
*ICML 16 Continuous Deep Q-Learning with Model-based Acceleration
DDQN *AAAI 16 Deep Reinforcement Learning with Double Q-Learning
ICLR 16 Prioritized experience replay
Arxiv 15 Dueling Network Architectures for Deep Reinforcement Learning
HER NIPS 17 Hindsight Experience Replay
C51 ICML 17 A Distributional Perspective on Reinforcement Learning
ICLR 18 Distributed Prioritized Experience Replay
AAAI 18 Rainbow: Combining Improvements in Deep Reinforcement Learning
*REINFORCE: ML 1992 Simple statistical gradient-following algorithms for connectionist reinforcement learning
NIPS 2000 Policy gradient methods for reinforcement learning with function approximation
*NIPS 16 Safe and efficient off-policy reinforcement learning
GAE ICLR 16 High-Dimensional Continuous Control Using Generalized Advantage Estimation
ICML 17 Constrained Policy Optimization
ICML 13 Guided Policy Search
TRPO: ICML 15 Trust Region Policy Optimization
PPO: Arxiv 17 Proximal Policy Optimization Algorithms
ICLR 18 The Mirage of Action-Dependent Baselines in Reinforcement Learning
NIPS 2000 Actor-critic algorithms
*NIPS 2000 Policy gradient methods for reinforcement learning with function approximation
ACC 2012 Model-free reinforcement learning with continuous action in practice
ICML 12 Off-policy actor-critic
SIAM 2003 On actor-critic algorithms
*A2C/A3C ICML 16 Asynchronous methods for deep reinforcement learning
ICLR 16 High-Dimensional Continuous Control Using Generalized Advantage Estimation
ACER ICLR 17 Sample Efficient Actor-Critic with Experience Replay
Q-Prop ICLR 17 Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
SAC ICML 18 Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
*DPG ICML 14 Deterministic policy gradient algorithms
*DDPG ICLR 16 Continuous control with deep reinforcement learning
ACKTR NIPS 17 Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
TD3 ICML 18 Addressing Function Approximation Error in Actor-Critic Methods
Arxiv 16 Progressive Neural Networks
ICLR 16 Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
*Arxiv 16 RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
ICML 17 Model-agnostic meta-learning for fast adaptation of deep networks
*SNAIL ICLR 18 A Simple Neural Attentive Meta-Learner
Arxiv 17 Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
CogSci 17 Learning to Reinforcement Learn
ICLR 19 Unsupervised Meta-Learning for Reinforcement Learning
NIPS 18 Some Considerations on Learning to Explore via Meta-Reinforcement Learning
AlphaZero Arxiv 17 Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Nature 2017 Mastering the Game of Go without Human Knowledge
I2A NIPS 17 Imagination-Augmented Agents for Deep Reinforcement Learning
MBMF Arxiv 17 Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
MBVE ICML 18 Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning
Arxiv 18 World Models
C51 ICML 17 A Distributional Perspective on Reinforcement Learning
IQN ICML 18 Implicit Quantile Networks for Distributional Reinforcement Learning
QR-DQN AAAI 18 Distributional Reinforcement Learning with Quantile Regression
Dopamine Arxiv 18 Dopamine: A Research Framework for Deep Reinforcement Learning
Arxiv 18 IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Arxiv 19 Horizon: Facebook’s Open Source Applied Reinforcement Learning Platform
PCL NIPS 17 Bridging the Gap Between Value and Policy Based Reinforcement Learning
Trust-PCL ICLR 18 Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Arxiv 17 Equivalence Between Policy Gradients and Soft Q-Learning
IPG Arxiv 17 Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
PGQL ICLR 17 Combining Policy Gradient and Q-learning
Reactor ICLR 18 The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning
NIPS 16 Strategic Attentive Writer for Learning Macro-Actions
ICML 17 FeUdal Networks for Hierarchical Reinforcement Learning
NIPS 18 Data-Efficient Hierarchical Reinforcement Learning
RLLab ICML 16 Benchmarking Deep Reinforcement Learning for Continuous Control
Arxiv 17 Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control
WSDM 18 Offline A/B testing for Recommender Systems
ICML 16 Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
ICML 16 Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
NIPS 17 Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation
NIPS 18 Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
WSDM 19 When People Change their Mind: Off-Policy Evaluation in Non-stationary Recommendation Environments
Arxiv 19 Off-Policy Evaluation via Off-Policy Classification
On-policy:
VPG, TRPO, PPO
Off-policy:
DDPG, TD3, SAC
https://spinningup.openai.com/en/latest/spinningup/keypapers.html
https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html