# Reward Estimation for Variance Reduction in Deep RL

Link to OpenReview submission

## Installation

Our code is based primarily on ikostrikov's pytorch-rl repo. Follow the installation instructions there.

## How to run

To replicate the exact results from the paper, you need to run all 270 runs individually with:

```
python main.py --run-index [0-269]
```
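To queue up every configuration, a minimal shell loop is enough (a sketch, assuming sequential execution is acceptable and that each invocation takes a single integer index):

```
# Launch all 270 runs one after another; parallelize externally if desired.
for i in $(seq 0 269); do
    python main.py --run-index "$i"
done
```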

To run the standard A2C (Baseline) on Pong, use the following:

```
python main.py --env-name PongNoFrameskip-v4
```

To run A2C with the reward prediction auxiliary task (Baseline+) on Pong, use the following:

```
python main.py --env-name PongNoFrameskip-v4 --gamma 0.0 0.99
```

To run A2C with reward prediction (Ours) on Pong, use the following:

```
python main.py --env-name PongNoFrameskip-v4 --reward-predictor --gamma 0.0 0.99
```

## Visualization

Run `visualize.py` to visualize training performance (requires Visdom).
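Visdom serves plots through a local web server, so it needs to be running before the plotting script. A minimal sketch of the workflow (assuming the default Visdom host and port, and that `visualize.py` needs no extra arguments):

```
# Start the Visdom server in one terminal (serves on http://localhost:8097 by default)
python -m visdom.server

# In another terminal, generate the performance plots
python visualize.py
```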
