This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

facebookresearch/reward-estimator-iclr

Reward Estimation for Variance Reduction in Deep RL

Link to OpenReview submission

Installation

Our code is based primarily on ikostrikov's pytorch-rl repo; follow the installation instructions there.

How to run

To replicate the exact results from the paper, run all 270 configurations individually with:

python main.py --run-index [0-269]
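One run per index can be launched in a loop, for example with a small shell sketch (illustrative only; it prints the 270 commands rather than executing them, and assumes main.py is invoked from the repo root — drop the echo to actually run each configuration):

```shell
# Print the launch command for every run index 0..269.
# Remove "echo" to execute the runs sequentially.
for i in $(seq 0 269); do
  echo python main.py --run-index "$i"
done
```

In practice you would likely distribute these indices across machines or a job scheduler rather than running them back to back.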

To run the standard A2C (Baseline) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4

To run A2C with the reward prediction auxiliary task (Baseline+) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4 --gamma 0.0 0.99

To run A2C with reward prediction (Ours) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4 --reward-predictor --gamma 0.0 0.99
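The two values passed to --gamma suggest two return targets with different discount factors: with gamma = 0.0 the discounted return collapses to the immediate reward (i.e., a reward-prediction target), while gamma = 0.99 is the usual A2C return. A minimal sketch of this standard discounted-return computation (illustrative only, not code from this repo):

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for a finite episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

rewards = [1.0, 0.0, 0.0, 1.0]
# gamma = 0.0: each target is just the immediate reward.
print(discounted_returns(rewards, 0.0))   # [1.0, 0.0, 0.0, 1.0]
# gamma = 0.99: the usual bootstrapped-style discounted return.
print(discounted_returns(rewards, 0.99))
```

With gamma = 0.99 the first entry is 1 + 0.99 * (0.99 * 1) ≈ 1.9703, showing how distant rewards feed back into early targets while gamma = 0.0 ignores them entirely.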

Visualization

Run visualize.py to visualize performance (requires Visdom).
