Deep RL

Jul 2019

Benchmarking Model-Based Reinforcement Learning
Learning World Graphs to Accelerate Hierarchical Reinforcement Learning
Perspective Taking in Deep Reinforcement Learning Agents
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Dynamic Face Video Segmentation via Reinforcement Learning
Striving for Simplicity in Off-policy Deep Reinforcement Learning
Intrinsic Motivation Driven Intuitive Physics Learning using Deep Reinforcement Learning with Intrinsic Reward Normalization
A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
Attentive Multi-Task Deep Reinforcement Learning
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
Google Research Football: A Novel Reinforcement Learning Environment
Deep Reinforcement Learning in Financial Markets
Dynamic Input for Deep Reinforcement Learning in Autonomous Driving
Characterizing Attacks on Deep Reinforcement Learning
Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey
VRLS: A Unified Reinforcement Learning Scheduler for Vehicle-to-Vehicle Communications
Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges
Arena: a toolkit for Multi-Agent Reinforcement Learning
GPU-Accelerated Atari Emulation for Reinforcement Learning
Photonic architecture for reinforcement learning

Jun 2019

Towards Empathic Deep Q-Learning
Ranking Policy Gradient
Hyp-RL: Hyperparameter Optimization by Reinforcement Learning
Modern Deep Reinforcement Learning Algorithms
A Framework for Automatic Question Generation from Text using Deep Reinforcement Learning
Deep Reinforcement Learning for Unmanned Aerial Vehicle-Assisted Vehicular Networks
Is multiagent deep reinforcement learning the answer or the question? A brief survey
Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement Learning
Cooperative Lane Changing via Deep Reinforcement Learning
A Hierarchical Architecture for Sequential Decision-Making in Autonomous Driving using Deep Reinforcement Learning
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Language as an Abstraction for Hierarchical Deep Reinforcement Learning
Autonomous Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking
A Survey of Reinforcement Learning Informed by Natural Language
Load Balancing for Ultra-Dense Networks: A Deep Reinforcement Learning Based Approach
Deep Reinforcement Learning Architecture for Continuous Power Allocation in High Throughput Satellites
Harnessing Reinforcement Learning for Neural Motion Planning

April-May 2019

Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
On the Generalization Gap in Reparameterizable Reinforcement Learning
Targeted Attacks on Deep Reinforcement Learning Agents through Adversarial Observations
Inverse Reinforcement Learning in Contextual MDPs
Teaching on a Budget in Multi-Agent Deep Reinforcement Learning
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning
Generation of Policy-Level Explanations for Reinforcement Learning
A Control-Model-Based Approach for Reinforcement Learning
Interactive Teaching Algorithms for Inverse Reinforcement Learning
Snooping Attacks on Deep Reinforcement Learning

March 2019

IRLAS: Inverse Reinforcement Learning for Architecture Search
Learning Hierarchical Teaching in Cooperative Multiagent Reinforcement Learning
M3RL: Mind-aware Multi-agent Management Reinforcement Learning
Concurrent Meta Reinforcement Learning
Horizon: Facebook's Open Source Applied Reinforcement Learning Platform
Using Natural Language for Reward Shaping in Reinforcement Learning
Model-Based Reinforcement Learning for Atari
RLOC: Neurobiologically Inspired Hierarchical Reinforcement Learning Algorithm for Continuous Control of Nonlinear Dynamical Systems
Hacking Google reCAPTCHA v3 using Reinforcement Learning
Reinforcement Learning and Inverse Reinforcement Learning with System 1 and System 2
Deep Reinforcement Learning with Feedback-based Exploration
Deep Reinforcement Learning for Autonomous Driving
Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention
Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction
Explaining Reinforcement Learning to Mere Mortals: An Empirical Study
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
On the use of Deep Autoencoders for Efficient Embedded Reinforcement Learning
Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Feb 2019

Distributional reinforcement learning with linear function approximation
Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance
Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning
Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications
Reinforcement Learning for Optimal Load Distribution Sequencing in Resource-Sharing System
Learning to Schedule Communication in Multi-agent Reinforcement Learning
On Reinforcement Learning for Full-length Game of StarCraft
Implicit Policy for Reinforcement Learning
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Visual Rationalizations in Deep Reinforcement Learning for Atari Games
Statistics and Samples in Distributional Reinforcement Learning
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Investigating Generalisation in Continuous Deep Reinforcement Learning
Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning
Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning
Towards the Next Generation Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking
Parenting: Safe Reinforcement Learning from Human Input
Reinforcement Learning Without Backpropagation or a Clock
Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning
A new Potential-Based Reward Shaping for Reinforcement Learning Agent
How to Combine Tree-Search Methods in Reinforcement Learning
Unsupervised Basis Function Adaptation for Reinforcement Learning
Communication Topologies Between Learning Agents in Deep Reinforcement Learning
Logically-Constrained Reinforcement Learning
Hyperbolic Embeddings for Learning Options in Hierarchical Reinforcement Learning
ProLoNets: Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning
A Framework for Automated Cellular Network Tuning with Reinforcement Learning
Deep Reinforcement Learning for Search, Recommendation, and Online Advertising: A Survey
The Value Function Polytope in Reinforcement Learning
Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations
Deep Reinforcement Learning Based High-level Driving Behavior Decision-making Model in Heterogeneous Traffic
Active Perception in Adversarial Scenarios using Maximum Entropy Deep Reinforcement Learning
Verifiably Safe Off-Model Reinforcement Learning
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
Optimal Tap Setting of Voltage Regulation Transformers Using Batch Reinforcement Learning
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Exploration versus exploitation in reinforcement learning: a stochastic control approach
ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning
End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning
WiseMove: A Framework for Safe Deep Reinforcement Learning for Autonomous Driving
Emergence of Hierarchy via Reinforcement Learning Using a Multiple Timescale Stochastic RNN

Jan 2019

Federated Reinforcement Learning
Verifiable Reinforcement Learning via Policy Extraction
QFlow: A Reinforcement Learning Approach to High QoE Video Streaming over Wireless Networks
Complementary reinforcement learning towards explainable agents
The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition
Hierarchical Reinforcement Learning for Multi-agent MOBA Game
Reinforcement Learning of Markov Decision Processes with Peak Constraints
Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
A Short Survey on Probabilistic Reinforcement Learning
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Recurrent Control Nets for Deep Reinforcement Learning
Amplifying the Imitation Effect for Reinforcement Learning of UCAV's Mission Execution
Multi-agent Reinforcement Learning Embedded Game for the Optimization of Building Energy Control and Power System Planning
Representation Learning on Graphs: A Reinforcement Learning Application
Evolutionarily-Curated Curriculum Learning for Deep Reinforcement Learning Agents
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
AlphaSeq: Sequence Discovery with Deep Reinforcement Learning
Exploration versus exploitation in reinforcement learning: a stochastic control approach
Multi-Agent Deep Reinforcement Learning for Dynamic Power Allocation in Wireless Networks
Energy-Efficient Thermal Comfort Control in Smart Buildings via Deep Reinforcement Learning
Relative Importance Sampling For Off-Policy Actor-Critic in Deep Reinforcement Learning
AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning
Improving Coordination in Multi-Agent Deep Reinforcement Learning through Memory-driven Communication
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
Accelerated Methods for Deep Reinforcement Learning
Motion Perception in Reinforcement Learning with Dynamic Objects
A New Tensioning Method using Deep Reinforcement Learning for Surgical Pattern Cutting
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Deterministic Implementations for Reproducibility in Deep Reinforcement Learning
Uncertainty-Based Out-of-Distribution Detection in Deep Reinforcement Learning
Risk-Aware Active Inverse Reinforcement Learning
A dual mode adaptive basal-bolus advisor based on reinforcement learning
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
Deep Reinforcement Learning for Imbalanced Classification
Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization
Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning
Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning
Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals
A Critical Investigation of Deep Reinforcement Learning for Navigation
Accelerating Goal-Directed Reinforcement Learning by Model Characterization
Machine Teaching in Hierarchical Genetic Reinforcement Learning: Curriculum Design of Reward Functions for Swarm Shepherding
Reinforcement Learning Using Quantum Boltzmann Machines
Communication-Efficient Distributed Reinforcement Learning
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation
Human-Like Autonomous Car-Following Model with Deep Reinforcement Learning
Adversarial Text Generation Without Reinforcement Learning
End-to-End Video Captioning with Multitask Reinforcement Learning

2018

Accelerated Methods for Deep Reinforcement Learning. arxiv
A Deep Reinforcement Learning Chatbot (Short Version). arxiv
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search. arxiv ⭐
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. arxiv
Composable Deep Reinforcement Learning for Robotic Manipulation. arxiv
Cooperative Multi-Agent Reinforcement Learning for Low-Level Wireless Communication. arxiv
Deep Reinforcement Fuzzing. arxiv
Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis. arxiv
Deep Reinforcement Learning For Sequence to Sequence Models. arxiv code
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods. arxiv
Deep Reinforcement Learning in Portfolio Management. arxiv code
Deep Reinforcement Learning using Capsules in Advanced Game Environments. arxiv
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft. arxiv
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes. arxiv code
Diversity is All You Need: Learning Skills without a Reward Function. arxiv
Faster Deep Q-learning using Neural Episodic Control. arxiv
Feedback-Based Tree Search for Reinforcement Learning. arxiv
Feudal Reinforcement Learning for Dialogue Management in Large Domains. arxiv
Forward-Backward Reinforcement Learning. arxiv
Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies. arxiv
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. arxiv
Kickstarting Deep Reinforcement Learning. arxiv
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. arxiv
Meta Reinforcement Learning with Latent Variable Gaussian Processes. arxiv
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches. arxiv
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations. arxiv
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. arxiv
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. arxiv
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. arxiv
Reinforcement Learning from Imperfect Demonstrations. arxiv
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. arxiv
RUDDER: Return Decomposition for Delayed Rewards. arxiv code
Semi-parametric Topological Memory for Navigation. arxiv tensorflow
Shared Autonomy via Deep Reinforcement Learning. arxiv
Setting up a Reinforcement Learning Task with a Real-World Robot. arxiv
Simple random search provides a competitive approach to reinforcement learning. arxiv code
Unsupervised Meta-Learning for Reinforcement Learning. arxiv
Using reinforcement learning to learn how to play text-based games. arxiv

2017

A Deep Reinforcement Learning Chatbot. arxiv
A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem. arxiv code
A Deep Reinforced Model for Abstractive Summarization. arxiv
A Distributional Perspective on Reinforcement Learning. arxiv
A Laplacian Framework for Option Discovery in Reinforcement Learning. arxiv ⭐
Boosting the Actor with Dual Critic. arxiv
Bridging the Gap Between Value and Policy Based Reinforcement Learning. arxiv
Car Racing using Reinforcement Learning. pdf
Cold-Start Reinforcement Learning with Softmax Policy Gradients. arxiv
Curiosity-driven Exploration by Self-supervised Prediction. arxiv tensorflow
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. arxiv code
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. arxiv code
Deep Reinforcement Learning: An Overview. arxiv
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. arxiv code
Deep reinforcement learning from human preferences. arxiv
Deep Reinforcement Learning that Matters. arxiv code
Device Placement Optimization with Reinforcement Learning. arxiv
Distributional Reinforcement Learning with Quantile Regression. arxiv
End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning. arxiv
Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arxiv
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning. arxiv
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arxiv
Learning how to Active Learn: A Deep Reinforcement Learning Approach. arxiv tensorflow
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning. arxiv tensorflow
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence. arxiv code ⭐
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arxiv
Micro-Objective Learning: Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals. arxiv
Neural Architecture Search with Reinforcement Learning. arxiv tensorflow
Neural Map: Structured Memory for Deep Reinforcement Learning. arxiv
Observational Learning by Reinforcement Learning. arxiv
Overcoming Exploration in Reinforcement Learning with Demonstrations. arxiv
Practical Network Blocks Design with Q-Learning. arxiv
Rainbow: Combining Improvements in Deep Reinforcement Learning. arxiv
Reinforcement Learning for Architecture Search by Network Transformation. arxiv code
Reinforcement Learning via Recurrent Convolutional Neural Networks. arxiv code
Reinforcement Learning with a Corrupted Reward Channel. arxiv ⭐
Reinforcement Learning with Deep Energy-Based Policies. arxiv code
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads. arxiv
Robust Deep Reinforcement Learning with Adversarial Attacks. arxiv
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv
Shallow Updates for Deep Reinforcement Learning. arxiv code
Stochastic Neural Networks for Hierarchical Reinforcement Learning. pdf code
Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing. arxiv code
Task-Oriented Query Reformulation with Reinforcement Learning. arxiv code
Teaching a Machine to Read Maps with Deep Reinforcement Learning. arxiv code
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning. arxiv code
Value Prediction Network. arxiv
Variational Deep Q Network. arxiv
Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation. arxiv
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. arxiv

2016

Asynchronous Methods for Deep Reinforcement Learning. [arxiv] ⭐
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto, et al., ICLR. [arxiv]
A New Softmax Operator for Reinforcement Learning. [url]
Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML. [arxiv]
Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR. [arxiv]
Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR. [arxiv] ⭐
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv. [url]
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML. [arxiv]
Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML. [arxiv]
Continuous control with deep reinforcement learning. [arxiv] ⭐
Deep Successor Reinforcement Learning. [arxiv]
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop. [arxiv]
Deep Exploration via Bootstrapped DQN. [arxiv] ⭐
Deep Reinforcement Learning for Dialogue Generation. [arxiv] tensorflow
Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. [url]
Designing Neural Network Architectures using Reinforcement Learning. arxiv code
Dialogue manager domain adaptation using Gaussian process reinforcement learning. [arxiv]
End-to-End Reinforcement Learning of Dialogue Agents for Information Access. [arxiv]
Generating Text with Deep Reinforcement Learning. [arxiv]
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv. [arxiv]
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv. [arxiv]
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv. [arxiv]
Hierarchical Object Detection with Deep Reinforcement Learning. [arxiv]
High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR. [arxiv]
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI. [arxiv]
Interactive Spoken Content Retrieval by Deep Reinforcement Learning. [arxiv]
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv. [url]
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv. [url]
Learning to compose words into sentences with reinforcement learning. [url]
Loss is its own Reward: Self-Supervision for Reinforcement Learning. [arxiv]
Model-Free Episodic Control. [arxiv]
Mastering the game of Go with deep neural networks and tree search. [nature] ⭐
MazeBase: A Sandbox for Learning from Games. [arxiv]
Neural Architecture Search with Reinforcement Learning. [pdf]
Neural Combinatorial Optimization with Reinforcement Learning. [arxiv]
Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning. [url]
Online Sequence-to-Sequence Active Learning for Open-Domain Dialogue Generation. arXiv. [arxiv]
Policy Distillation, A. A. Rusu et al., ICLR. [arxiv]
Prioritized Experience Replay. [arxiv] ⭐
Reinforcement Learning Using Quantum Boltzmann Machines. [arxiv]
Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al. [arxiv]
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. [arxiv]
Sample-efficient Deep Reinforcement Learning for Dialog Control. [url]
Self-Correcting Models for Model-Based Reinforcement Learning. [url]
Unifying Count-Based Exploration and Intrinsic Motivation. [arxiv]
Value Iteration Networks. [arxiv]

2015

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources. arxiv
Action-Conditional Video Prediction using Deep Networks in Atari Games. arxiv ⭐
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning. arxiv ⭐
[DDPG] Continuous control with deep reinforcement learning. arxiv ⭐
[NAF] Continuous Deep Q-Learning with Model-based Acceleration. arxiv ⭐
Dueling Network Architectures for Deep Reinforcement Learning. arxiv ⭐
Deep Reinforcement Learning with an Action Space Defined by Natural Language. arxiv
Deep Reinforcement Learning with Double Q-learning. arxiv ⭐
Deep Recurrent Q-Learning for Partially Observable MDPs. arxiv ⭐
DeepMPC: Learning Deep Latent Features for Model Predictive Control. pdf
Deterministic Policy Gradient Algorithms. pdf ⭐
End-to-End Training of Deep Visuomotor Policies. arxiv ⭐
Giraffe: Using Deep Reinforcement Learning to Play Chess. arxiv
Generating Text with Deep Reinforcement Learning. arxiv
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies. arxiv
Human-level control through deep reinforcement learning. nature ⭐
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arxiv ⭐
Learning Simple Algorithms from Examples. arxiv
Language Understanding for Text-based Games Using Deep Reinforcement Learning. pdf ⭐
Learning Continuous Control Policies by Stochastic Value Gradients. pdf ⭐
Multiagent Cooperation and Competition with Deep Reinforcement Learning. arxiv
Maximum Entropy Deep Inverse Reinforcement Learning. arxiv
Massively Parallel Methods for Deep Reinforcement Learning. pdf ⭐
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arxiv
Playing Atari with Deep Reinforcement Learning. arxiv
Recurrent Reinforcement Learning: A Hybrid Approach. arxiv
Strategic Dialogue Management via Deep Reinforcement Learning. arxiv
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control. arxiv
Trust Region Policy Optimization. pdf ⭐
Universal Value Function Approximators. pdf
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arxiv

2014

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. [url]

2013

Evolving large-scale neural networks for vision-based reinforcement learning. [idsia] ⭐
Playing Atari with Deep Reinforcement Learning. [toronto] ⭐

Surveys

Reinforcement Learning: A Survey, JAIR, 1996. [arxiv]
A Tutorial Survey of Reinforcement Learning, Sadhana, 1994. [Paper]
Reinforcement Learning in Robotics, A Survey, IJRR, 2013. [Paper]
A Brief Survey of Deep Reinforcement Learning, 2017. [arxiv]
A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation, 2018. [arxiv]
Universal Reinforcement Learning Algorithms: Survey and Experiments, 2017. [arxiv]
Bayesian Reinforcement Learning: A Survey, 2016.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Benchmarking Reinforcement Learning Algorithms on Real-World Robots
Deep Reinforcement Learning: An Overview

Foundational Papers

Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [Paper] (discusses issues in RL such as the "credit assignment problem")
An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [Paper] (earliest publication on temporal-difference (TD) learning rule)

Methods

Dynamic Programming (DP):
- Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
Monte Carlo:
- Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
- Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
Temporal-Difference:
- Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44, 1988. [Paper]
Q-Learning (Off-policy TD algorithm):
- Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
Sarsa (On-policy TD algorithm):
- On-line Q-learning using connectionist systems, Technical Report, Cambridge Univ., 1994. [Report]
- Generalization in Reinforcement Learning: Successful examples using sparse coding, NIPS, 1996. [Paper]
R-Learning (learning of relative values):
- A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration):
- Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
- Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
Policy Search / Policy Gradient:
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
- Natural Actor-Critic, ECML, 2005. [Paper]
- Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
- Relative Entropy Policy Search, AAAI, 2010. [Paper]
- Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
- Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
- PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
- Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
- Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
Hierarchical RL:
- Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
- Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
Deep Learning + Reinforcement Learning (a sample of recent works on DL+RL):
- Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
- Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
- End-to-End Training of Deep Visuomotor Policies. ArXiv, 16 Oct 2015. [ArXiv]
- Prioritized Experience Replay, ArXiv, 18 Nov 2015. [ArXiv]
- Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, ArXiv, 22 Sep 2015. [ArXiv]
- Asynchronous Methods for Deep Reinforcement Learning, ArXiv, 4 Feb 2016. [ArXiv]

Game Playing

Traditional Games

Backgammon - "TD-Gammon" game play using TD(λ) (Tesauro, ACM 1995) [Paper]
Chess - "KnightCap" program using TD(λ) (Baxter, arXiv 1999) [arXiv]
Chess - Giraffe: Using deep reinforcement learning to play chess (Lai, arXiv 2015) [arXiv]

Computer Games

Human-level Control through Deep Reinforcement Learning (Mnih, Nature 2015) [Paper] [Code] [Video]
Flappy Bird Reinforcement Learning [Video]
MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Stanley, Evolutionary Computation 2002) [Paper] [Video]

Robotics

Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]

Control

An Application of Reinforcement Learning to Aerobatic Helicopter Flight (Abbeel, NIPS 2006) [Paper] [Video]
Autonomous helicopter control using Reinforcement Learning Policy Search Methods (Bagnell, ICRA 2001) [Paper]

Operations Research

Scaling Average-reward Reinforcement Learning for Product Delivery (Proper, AAAI 2004) [Paper]
Cross Channel Optimized Marketing by Reinforcement Learning (Abe, KDD 2004) [Paper]

Human Computer Interaction

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System (Singh, JAIR 2002) [Paper]

Blogs

Reinforcement Learning (RL)
Simple Reinforcement Learning with Tensorflow Part 0-8
Deep_reinforcement_learning_Course
Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG)
Machine Learning for Humans, Part 5: Reinforcement Learning
Deep reinforcement learning: where to start
Learning Policies For Learning Policies — Meta Reinforcement Learning (RL²) in Tensorflow
Introduction to Various Reinforcement Learning Algorithms. Part II (TRPO, PPO)
reinforcementlearning.ai-depot.com
