PDDM

This repository combines two papers, PDDM and C-51 (a distributional RL algorithm), to learn risk-free actions in complex environments:

Deep Dynamics Models for Learning Dexterous Manipulation
Anusha Nagabandi, Kurt Konolige, Sergey Levine, Vikash Kumar.

A Distributional Perspective on Reinforcement Learning
Marc G. Bellemare, Will Dabney, Rémi Munos

Please note that this is research code, and as such, is still under construction. This code implements the model-based RL algorithm presented in PDDM and combines it with distributional rewards from C-51.

Contents of this README:

A. Getting started
B. Quick Overview
C. Train and visualize some tests
D. Run experiments

A. Getting started

1) Mujoco:

Download and install MuJoCo (v1.5) to ~/.mujoco, following the official installation instructions
(including setting LD_LIBRARY_PATH in your ~/.bashrc file).

2) If using GPU:

Set up the CUDA and cuDNN versions that match your system specs.
Recommended: CUDA 8, 9, or 10.
Also, add the following to your ~/.bashrc:

alias MJPL='LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-367/libGL.so'

3) Set up this repo:

Without GPU support:

cd <path_to_pddm>
conda env create -f environment.yml
source activate pddm-env
pip install -e .

Or, for use with GPU:

cd <path_to_pddm>
conda env create -f environment_gpu.yml
source activate pddm-gpu-env
pip install -e .

Notes:
a) For environment_gpu.yml to work, you'll need a working GPU and a CUDA/cuDNN installation first.
b) Depending on your CUDA/cuDNN versions, you might need to change the tensorflow-gpu version specified in environment_gpu.yml. Suggestions are 1.13.1 for CUDA 10, 1.12.0 for CUDA 9, or 1.4.1 for CUDA 8.
c) Before running any code, type the following into your terminal to activate the conda environment (use pddm-gpu-env instead if you created the GPU environment):
source activate pddm-env
d) The MJPL prefix before the Python visualization commands is needed only when working with a GPU.

B. Quick Overview

The overall procedure implemented in this code is an iterative process of learning a dynamics model and then running an MPC controller that uses the model to select actions. The code starts by initializing a dataset of randomly collected rollouts (i.e., collected with a random policy), and then iteratively (a) trains a model on the dataset and (b) collects rollouts (using MPC with that model) and aggregates them into the dataset.

The process of (model training + rollout collection) serves as a single iteration in this code. In other words, the rollouts from iter 0 are the result of planning under a model which was trained on randomly collected data, and the model saved at iter 3 is one that has been trained 4 times (on random data at iter 0, and on on-policy data for iters 1,2,3).
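The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in this repo: the 1-D toy dynamics, the linear least-squares model, the random-shooting planner, and every function name (true_step, collect_rollout, train_model, mpc_policy, run) are hypothetical stand-ins for the MuJoCo environments, the neural network model, and the planner used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(s, a):
    # Hypothetical toy dynamics standing in for a real environment.
    return s + 0.5 * a

def collect_rollout(policy, horizon=20):
    # Run one episode and return (state, action, next_state) transitions.
    s, data = rng.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        a = policy(s)
        s_next = true_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data

def train_model(dataset):
    # Fit a linear model of the state change: (s' - s) ~ w[0]*s + w[1]*a.
    d = np.asarray(dataset)
    X, y = d[:, :2], d[:, 2] - d[:, 0]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mpc_policy(w, n_candidates=200, plan_horizon=5):
    # Random-shooting MPC: sample action sequences, roll them out under the
    # learned model, and execute the first action of the cheapest sequence.
    def policy(s):
        actions = rng.uniform(-1.0, 1.0, size=(n_candidates, plan_horizon))
        states = np.full(n_candidates, s)
        cost = np.zeros(n_candidates)
        for t in range(plan_horizon):
            states = states + w[0] * states + w[1] * actions[:, t]
            cost += states ** 2  # toy objective: drive the state to 0
        return actions[np.argmin(cost), 0]
    return policy

def run(num_iters=4, rollouts_per_iter=5):
    # Initial dataset: rollouts collected with a random policy.
    random_policy = lambda s: rng.uniform(-1.0, 1.0)
    dataset = []
    for _ in range(rollouts_per_iter):
        dataset.extend(collect_rollout(random_policy))
    models = []
    for it in range(num_iters):
        w = train_model(dataset)            # (a) train on all data so far
        models.append(w)                    # "model saved at iter `it`"
        for _ in range(rollouts_per_iter):  # (b) collect with MPC, aggregate
            dataset.extend(collect_rollout(mpc_policy(w)))
    return models, dataset

models, dataset = run()
```

With num_iters=4, models[0] is fit on random data only, while models[3] is the result of the fourth training pass, on random data plus the on-policy rollouts from iterations 0 to 2, which matches the iteration counting described above.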

For the available parameters, see the files in the configs folder, as well as the list of parameters in convert_to_parser_args.py.

D. Run experiments

Train:

python train.py --config ../config/dclaw_turn.txt --output_dir ../output --use_gpu
python train.py --config ../config/baoding.txt --output_dir ../output --use_gpu
python train.py --config ../config/cube.txt --output_dir ../output --use_gpu
