Introduction

This repository implements the Value Prediction Network (Oh et al., NIPS 2017) in TensorFlow.

@inproceedings{Oh2017VPN,
  title={Value Prediction Network},
  author={Junhyuk Oh and Satinder Singh and Honglak Lee},
  booktitle={NIPS},
  year={2017}
}

Our code is based on OpenAI's A3C implementation.

Dependencies

TensorFlow
Beautiful Soup
Golang
six (for py2/3 compatibility)
tmux (the start script opens up a tmux session with multiple windows)
htop (shown in one of the tmux windows)
gym
gym[atari]
universe
opencv-python
numpy
scipy

Training

The following command trains a value prediction network (VPN) with a plan depth of 3 on the deterministic Collect domain:

python train.py --config config/collect_deterministic.xml --branch 4,4,4 --alg VPN

The train_vpn script contains the commands for reproducing the main result of the paper.
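As a rough illustration of how the --branch 4,4,4 argument relates to a plan depth of 3, the sketch below assumes that each comma-separated entry is the branching factor for one planning step. That reading is inferred from the command above and is not confirmed here; parse_branch and count_leaves are hypothetical helpers, not functions from this repository.

```python
def parse_branch(spec):
    """Split a comma-separated string such as "4,4,4" into a list of ints.

    Hypothetical helper: the repository's own parsing may differ.
    """
    factors = [int(tok) for tok in spec.split(",") if tok.strip()]
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("expected positive integers, got %r" % spec)
    return factors


def count_leaves(factors):
    """Number of leaves in a lookahead tree with the given per-step branching."""
    n = 1
    for f in factors:
        n *= f
    return n


branch = parse_branch("4,4,4")
plan_depth = len(branch)       # assumed: one entry per planning step
leaves = count_leaves(branch)  # size of the fully expanded lookahead tree
print(plan_depth, leaves)
```

Under this assumption, a longer list would mean deeper planning, and the number of simulated leaves grows with the product of the entries.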

Notes

TensorBoard shows the performance of the epsilon-greedy policy. This is NOT the learning curve reported in the paper, because epsilon decreases from 1.0 to 0.05 over the first 1e6 steps. Instead, [logdir]/eval.csv shows the performance of the agent using the greedy policy.
Our code supports multi-GPU training. You can specify GPU IDs with the --gpu option (e.g., --gpu 0,1,2,3).
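To see why the TensorBoard curve lags behind the greedy evaluation early in training, here is a minimal sketch of an epsilon schedule that goes from 1.0 to 0.05 over the first 1e6 steps. The linear shape is an assumption (the notes above give only the endpoints and the duration), and epsilon_at is a hypothetical function, not part of this repository.

```python
def epsilon_at(step, start=1.0, end=0.05, anneal_steps=1000000):
    """Exploration rate at a given training step.

    Assumes linear annealing from `start` to `end` over `anneal_steps`,
    then a constant `end` afterwards.
    """
    clipped = min(max(step, 0), anneal_steps)
    frac = clipped / float(anneal_steps)
    return start + frac * (end - start)


# Print the exploration rate at the start, halfway, at the end, and beyond.
for step in (0, 500000, 1000000, 2000000):
    print(step, epsilon_at(step))
```

With a schedule like this, roughly half of the actions are still random at the halfway point, so the epsilon-greedy return understates what the greedy policy in eval.csv achieves.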
