NIPS 2017 Value Prediction Network

Introduction

This repository implements the Value Prediction Network (Oh et al., NIPS 2017) in TensorFlow.

@inproceedings{Oh2017VPN,
  title={Value Prediction Network},
  author={Junhyuk Oh and Satinder Singh and Honglak Lee},
  booktitle={NIPS},
  year={2017}
}

Our code is based on OpenAI's A3C implementation.

Dependencies

Training

The following command trains a value prediction network (VPN) with a plan depth of 3 on the deterministic Collect domain:

python train.py --config config/collect_deterministic.xml --branch 4,4,4 --alg VPN
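The command above pairs a plan depth of 3 with `--branch 4,4,4`. A minimal sketch of how such a value could be interpreted follows, assuming (not confirmed by this README) that each comma-separated entry is the number of actions expanded at one planning step, so the list length equals the plan depth. `parse_branch` and `num_leaf_paths` are hypothetical helpers for illustration, not functions from `train.py`.

```python
from typing import List


def parse_branch(branch: str) -> List[int]:
    """Parse a comma-separated --branch value such as "4,4,4".

    Assumption: entry i is the number of actions expanded at planning
    step i, so len(result) is the plan depth.
    """
    factors = [int(b) for b in branch.split(",") if b.strip()]
    if not factors or any(f < 1 for f in factors):
        raise ValueError(f"invalid --branch value: {branch!r}")
    return factors


def num_leaf_paths(factors: List[int]) -> int:
    """Count leaf paths if every node at step i expands factors[i] actions."""
    total = 1
    for f in factors:
        total *= f
    return total


factors = parse_branch("4,4,4")
print(len(factors), num_leaf_paths(factors))  # plan depth and leaf-path count
```

Under this assumption, `4,4,4` gives depth 3 and 4 × 4 × 4 = 64 leaf paths per decision, which illustrates why deeper or wider plans become expensive quickly.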

The train_vpn script contains the commands for reproducing the main results of the paper.

Notes

  • TensorBoard shows the performance of the epsilon-greedy policy. This is NOT the learning curve reported in the paper, because epsilon is annealed from 1.0 to 0.05 over the first 1e6 steps. Instead, [logdir]/eval.csv records the performance of the agent under the greedy policy.
  • Our code supports multi-GPU training. GPU IDs can be specified with the --gpu option (e.g., --gpu 0,1,2,3).
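The difference between the TensorBoard curve and eval.csv comes down to epsilon. The sketch below assumes a linear schedule between the two endpoints given above (the README states only 1.0, 0.05 and 1e6 steps, not the shape of the decay); `epsilon_at` and `select_action` are hypothetical names, not code from this repository. Setting epsilon to 0 recovers the greedy policy that eval.csv reports.

```python
import random
from typing import Sequence

EPS_START = 1.0        # initial exploration rate (from the README)
EPS_END = 0.05         # final exploration rate (from the README)
ANNEAL_STEPS = 1_000_000  # annealing horizon (from the README)


def epsilon_at(step: int) -> float:
    """Epsilon at a training step, assuming linear annealing, then held constant."""
    frac = min(max(step, 0) / ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: a uniformly random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(epsilon_at(0), epsilon_at(500_000), epsilon_at(2_000_000))
print(select_action([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))  # greedy choice
```

Because training episodes keep taking random actions for at least 5% of steps even after annealing, the epsilon-greedy returns logged to TensorBoard will generally sit below the greedy evaluation numbers.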