Deep RL

Jul

Jun

April-May

March 2019


Feb 2019


Jan 2019


2018

  • Accelerated Methods for Deep Reinforcement Learning. arxiv
  • A Deep Reinforcement Learning Chatbot (Short Version). arxiv
  • AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search. arxiv
  • A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. arxiv
  • Composable Deep Reinforcement Learning for Robotic Manipulation. arxiv
  • Cooperative Multi-Agent Reinforcement Learning for Low-Level Wireless Communication. arxiv
  • Deep Reinforcement Fuzzing. arxiv
  • Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis. arxiv
  • Deep Reinforcement Learning For Sequence to Sequence Models. arxiv code
  • Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods. arxiv
  • Deep Reinforcement Learning in Portfolio Management. arxiv code
  • Deep Reinforcement Learning using Capsules in Advanced Game Environments. arxiv
  • Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft. arxiv
  • Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes. arxiv code
  • Diversity is All You Need: Learning Skills without a Reward Function. arxiv
  • Faster Deep Q-learning using Neural Episodic Control. arxiv
  • Feedback-Based Tree Search for Reinforcement Learning. arxiv
  • Feudal Reinforcement Learning for Dialogue Management in Large Domains. arxiv
  • Forward-Backward Reinforcement Learning. arxiv
  • Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies. arxiv
  • IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. arxiv
  • Kickstarting Deep Reinforcement Learning. arxiv
  • Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. arxiv
  • Meta Reinforcement Learning with Latent Variable Gaussian Processes. arxiv
  • Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches. arxiv
  • Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations. arxiv
  • Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. arxiv
  • Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. arxiv
  • Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. arxiv
  • Reinforcement Learning from Imperfect Demonstrations. arxiv
  • Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. arxiv
  • RUDDER: Return Decomposition for Delayed Rewards. arxiv code
  • Semi-parametric Topological Memory for Navigation. arxiv tensorflow
  • Shared Autonomy via Deep Reinforcement Learning. arxiv
  • Setting up a Reinforcement Learning Task with a Real-World Robot. arxiv
  • Simple random search provides a competitive approach to reinforcement learning. arxiv code
  • Unsupervised Meta-Learning for Reinforcement Learning. arxiv
  • Using reinforcement learning to learn how to play text-based games. arxiv

2017

  • A Deep Reinforcement Learning Chatbot. arxiv
  • A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem. arxiv code
  • A Deep Reinforced Model for Abstractive Summarization. arxiv
  • A Distributional Perspective on Reinforcement Learning. arxiv
  • A Laplacian Framework for Option Discovery in Reinforcement Learning. arxiv
  • Boosting the Actor with Dual Critic. arxiv
  • Bridging the Gap Between Value and Policy Based Reinforcement Learning. arxiv
  • Car Racing using Reinforcement Learning. pdf
  • Cold-Start Reinforcement Learning with Softmax Policy Gradients. arxiv
  • Curiosity-driven Exploration by Self-supervised Prediction. arxiv tensorflow
  • Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. arxiv code
  • DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. arxiv code
  • Deep Reinforcement Learning: An Overview. arxiv
  • Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. arxiv code
  • Deep reinforcement learning from human preferences. arxiv
  • Deep Reinforcement Learning that Matters. arxiv code
  • Device Placement Optimization with Reinforcement Learning. arxiv
  • Distributional Reinforcement Learning with Quantile Regression. arxiv
  • End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning. arxiv
  • Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arxiv
  • Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning. arxiv
  • Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arxiv
  • Learning how to Active Learn: A Deep Reinforcement Learning Approach. arxiv tensorflow
  • Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning. arxiv tensorflow
  • MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence. arxiv code
  • Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arxiv
  • Micro-Objective Learning: Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals. arxiv
  • Neural Architecture Search with Reinforcement Learning. arxiv tensorflow
  • Neural Map: Structured Memory for Deep Reinforcement Learning. arxiv
  • Observational Learning by Reinforcement Learning. arxiv
  • Overcoming Exploration in Reinforcement Learning with Demonstrations. arxiv
  • Practical Network Blocks Design with Q-Learning. arxiv
  • Rainbow: Combining Improvements in Deep Reinforcement Learning. arxiv
  • Reinforcement Learning for Architecture Search by Network Transformation. arxiv code
  • Reinforcement Learning via Recurrent Convolutional Neural Networks. arxiv code
  • Reinforcement Learning with a Corrupted Reward Channel. arxiv
  • Reinforcement Learning with Deep Energy-Based Policies. arxiv code
  • Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads. arxiv
  • Robust Deep Reinforcement Learning with Adversarial Attacks. arxiv
  • Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv
  • Shallow Updates for Deep Reinforcement Learning. arxiv code
  • Stochastic Neural Networks for Hierarchical Reinforcement Learning. pdf code
  • Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing. arxiv code
  • Task-Oriented Query Reformulation with Reinforcement Learning. arxiv code
  • Teaching a Machine to Read Maps with Deep Reinforcement Learning. arxiv code
  • TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning. arxiv code
  • Value Prediction Network. arxiv
  • Variational Deep Q Network. arxiv
  • Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation. arxiv
  • Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. arxiv

2016

  • Asynchronous Methods for Deep Reinforcement Learning. [arxiv] ⭐
  • Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto, et al., ICLR. [arxiv]
  • A New Softmax Operator for Reinforcement Learning. [url]
  • Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML. [arxiv]
  • Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR. [arxiv]
  • Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR. [arxiv]
  • Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv. [url]
  • Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML. [arxiv]
  • Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML. [arxiv]
  • Continuous control with deep reinforcement learning. [arxiv] ⭐
  • Deep Successor Reinforcement Learning. [arxiv]
  • Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop. [arxiv]
  • Deep Exploration via Bootstrapped DQN. [arxiv] ⭐
  • Deep Reinforcement Learning for Dialogue Generation. [arxiv] tensorflow
  • Deep Reinforcement Learning in Parameterized Action Space. [arxiv] ⭐
  • Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. [url]
  • Designing Neural Network Architectures using Reinforcement Learning. arxiv code
  • Dialogue manager domain adaptation using Gaussian process reinforcement learning. [arxiv]
  • End-to-End Reinforcement Learning of Dialogue Agents for Information Access. [arxiv]
  • Generating Text with Deep Reinforcement Learning. [arxiv]
  • Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv. [arxiv]
  • Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv. [arxiv]
  • Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv. [arxiv]
  • Hierarchical Object Detection with Deep Reinforcement Learning. [arxiv]
  • High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR. [arxiv]
  • Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI. [arxiv]
  • Interactive Spoken Content Retrieval by Deep Reinforcement Learning. [arxiv]
  • Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv. [url]
  • Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv. [url]
  • Learning to compose words into sentences with reinforcement learning. [url]
  • Loss is its own Reward: Self-Supervision for Reinforcement Learning. [arxiv]
  • Model-Free Episodic Control. [arxiv]
  • Mastering the game of Go with deep neural networks and tree search. [nature] ⭐
  • MazeBase: A Sandbox for Learning from Games. [arxiv]
  • Neural Architecture Search with Reinforcement Learning. [pdf]
  • Neural Combinatorial Optimization with Reinforcement Learning. [arxiv]
  • Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning. [url]
  • Online Sequence-to-Sequence Active Learning for Open-Domain Dialogue Generation. arXiv. [arxiv]
  • Policy Distillation, A. A. Rusu et al., ICLR. [arxiv]
  • Prioritized Experience Replay. [arxiv] ⭐
  • Reinforcement Learning Using Quantum Boltzmann Machines. [arxiv]
  • Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al. [arxiv]
  • Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. [arxiv]
  • Sample-efficient Deep Reinforcement Learning for Dialog Control. [url]
  • Self-Correcting Models for Model-Based Reinforcement Learning. [url]
  • Unifying Count-Based Exploration and Intrinsic Motivation. [arxiv]
  • Value Iteration Networks. [arxiv]

2015

  • ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources. arxiv
  • Action-Conditional Video Prediction using Deep Networks in Atari Games. arxiv
  • Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning. arxiv
  • [DDPG] Continuous control with deep reinforcement learning. arxiv
  • [NAF] Continuous Deep Q-Learning with Model-based Acceleration. arxiv
  • Dueling Network Architectures for Deep Reinforcement Learning. arxiv
  • Deep Reinforcement Learning with an Action Space Defined by Natural Language. arxiv
  • Deep Reinforcement Learning with Double Q-learning. arxiv
  • Deep Recurrent Q-Learning for Partially Observable MDPs. arxiv
  • DeepMPC: Learning Deep Latent Features for Model Predictive Control. pdf
  • Deterministic Policy Gradient Algorithms. pdf
  • End-to-End Training of Deep Visuomotor Policies. arxiv
  • Giraffe: Using Deep Reinforcement Learning to Play Chess. arxiv
  • Generating Text with Deep Reinforcement Learning. arxiv
  • How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies. arxiv
  • Human-level control through deep reinforcement learning. nature
  • Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arxiv
  • Learning Simple Algorithms from Examples. arxiv
  • Language Understanding for Text-based Games Using Deep Reinforcement Learning. pdf
  • Learning Continuous Control Policies by Stochastic Value Gradients. pdf
  • Multiagent Cooperation and Competition with Deep Reinforcement Learning. arxiv
  • Maximum Entropy Deep Inverse Reinforcement Learning. arxiv
  • Massively Parallel Methods for Deep Reinforcement Learning. pdf ⭐
  • On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arxiv
  • Playing Atari with Deep Reinforcement Learning. arxiv
  • Recurrent Reinforcement Learning: A Hybrid Approach. arxiv
  • Strategic Dialogue Management via Deep Reinforcement Learning. arxiv
  • Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control. arxiv
  • Trust Region Policy Optimization. pdf
  • Universal Value Function Approximators. pdf
  • Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arxiv

2014

  • Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. [url]

2013


Surveys

Foundational Papers

  • Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper] (discusses issues in RL such as the "credit assignment problem")
  • An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper] (earliest publication on temporal-difference (TD) learning rule)

Methods

  • Dynamic Programming (DP):

    • Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
  • Monte Carlo:

    • Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
    • Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
  • Temporal-Difference:

    • Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
  • Q-Learning (Off-policy TD algorithm):

    • Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
  • Sarsa (On-policy TD algorithm):

    • On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
    • Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
  • R-Learning (learning of relative values)

  • Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration)

    • Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
    • Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
  • Policy Search / Policy Gradient

    • Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
    • Natural Actor-Critic, ECML, 2005. [Paper]
    • Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
    • Relative Entropy Policy Search, AAAI, 2010. [Paper]
    • Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
    • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
    • PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
    • Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
    • Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
  • Hierarchical RL

    • Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
    • Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
  • Deep Learning + Reinforcement Learning (A sample of recent works on DL+RL)

    • Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
    • Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
    • End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
    • Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
    • Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
    • Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]

Game Playing

Traditional Games

  • Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
  • Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
  • Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]

Computer Games

Robotics

  • Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
  • Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
  • Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
  • Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
  • Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
  • Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
  • Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
  • Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]

Control

  • An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
  • Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

Operations Research

  • Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
  • Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

  • Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]

Blogs