Reward Estimation for Variance Reduction in Deep RL

Link to OpenReview submission

Installation

Our code is based primarily on ikostrikov's pytorch-rl repo. Follow the installation instructions there.

How to run

To replicate the exact results from the paper, launch all 270 runs individually, passing each run index from 0 to 269 in turn:

python main.py --run-index [0-269]
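
Launching 270 runs by hand is tedious, so a small driver script may help. The following is a minimal sketch, not part of this repo: the names build_commands and run_all are hypothetical, and it assumes only what is stated above, namely that main.py accepts --run-index with values 0 through 269. Runs are launched one after another; nothing is assumed about GPU placement or parallelism.

```python
import subprocess
import sys

N_RUNS = 270  # run indices 0..269, as stated in the README


def build_commands(n_runs=N_RUNS, python=sys.executable):
    """Return one argv list per run index (hypothetical helper)."""
    return [[python, "main.py", "--run-index", str(i)] for i in range(n_runs)]


def run_all(commands, dry_run=True):
    """Run the commands sequentially.

    With dry_run=True (the default) the commands are only printed,
    so the list can be inspected before anything is launched.
    """
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError at the first failing run
            subprocess.run(cmd, check=True)


commands = build_commands()
print(len(commands), "commands prepared")
```

Calling run_all(commands) prints every command without executing it; run_all(commands, dry_run=False) from the repo root would start the runs in order.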

To run standard A2C (Baseline) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4

To run A2C with the reward prediction auxiliary task (Baseline+) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4 --gamma 0.0 0.99
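
This README does not explain why passing the two values 0.0 and 0.99 to --gamma corresponds to a reward prediction task. One plausible reading, which is an assumption and not confirmed here, is that a value estimate is learned for each discount factor. Under that reading, the sketch below shows why a discount of 0 amounts to predicting the immediate reward: the discounted return collapses to the reward at the current step. The function discounted_returns and the reward sequence are illustrative only and are not taken from the repo.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Made-up Pong-like reward sequence (illustrative, not real data).
rewards = [0.0, 0.0, 1.0, 0.0, -1.0]

# With gamma = 0 the return at each step is just that step's reward.
print(discounted_returns(rewards, 0.0))

# With gamma = 0.99 later rewards leak into earlier steps.
print(discounted_returns(rewards, 0.99))
```

With gamma = 0 the targets are identical to the rewards themselves, so a value estimate trained on them is effectively a reward predictor, while gamma = 0.99 gives the usual long-horizon value target.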

To run A2C with reward prediction (Ours) on Pong, use the following:

python main.py --env-name PongNoFrameskip-v4 --reward-predictor --gamma 0.0 0.99

Visualization

Run visualize.py to visualize performance (requires Visdom).
