Off-Policy Deep Reinforcement Learning without Exploration

PyTorch implementation of BCQ, the code corresponding to the paper. If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 0.4 and Python 2.7.

Overview

The main algorithm, Batch-Constrained deep Q-learning (BCQ), can be found in BCQ.py.

If you are interested in reproducing some of the results from the paper, an expert policy (DDPG) must first be trained by running train_expert.py, which saves the expert model. A new buffer can then be collected by running generate_buffer.py, either with the default settings or by adjusting the settings in the code, as sketched below.
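
A minimal sketch of that workflow, assuming each script is run with its default settings (the environment and hyperparameters are set inside the scripts). The final step assumes main.py trains BCQ on the generated buffer; check the script before running.

```
# Train a DDPG expert and save its model (defaults set inside the script)
python train_expert.py

# Roll out the saved expert to collect a buffer of transitions
python generate_buffer.py

# Assumed: train BCQ offline on the collected buffer
python main.py
```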

If you are interested in standard forward RL with DDPG or TD3, check out my other GitHub repository.
