UPESI

Code for the paper Not Only Domain Randomization: Universal Policy with Embedding System Identification.

Installation

This repo uses an environment named robolite, a modified version of robosuite that supports domain randomisation and inverse kinematics (IK). The same modified environment is also used in another project. To install it, go to robolite and clone it, then run the following inside the cloned folder (./robolite):

pip install -r requirements.txt
pip install -e .

Citation:

Please cite our paper if you make use of this repo:

@article{ding2021not,
  title={Not Only Domain Randomization: Universal Policy with Embedding System Identification},
  author={Ding, Zihan},
  journal={arXiv preprint arXiv:2109.13438},
  year={2021}
}

Training Procedure

For the universal policy (UP) with embedding system identification (ESI), we use the following commands.

First, a pretrained model is needed for each environment to roll out samples for later use (learning the dynamics prediction in our method):

Get pretrained model

Remember to disable parameter randomization (set randomized_params=None in ./default_params.py) when training this policy.
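To make the effect of that setting concrete, here is a minimal sketch of uniform domain randomization, assuming parameters are sampled per episode from (low, high) ranges. The parameter names, ranges, and the helper sample_env_params are hypothetical and not taken from ./default_params.py; the point is only that randomized_params=None leaves every parameter at its nominal value.

```python
import random

# Hypothetical nominal physics parameters and randomization ranges;
# the real names and ranges live in ./default_params.py and may differ.
NOMINAL_PARAMS = {"damping": 0.05, "gravity": -9.81, "pole_mass": 1.0}
RANDOMIZED_PARAMS = {"damping": (0.01, 0.1), "pole_mass": (0.5, 1.5)}


def sample_env_params(randomized_params, rng=random):
    """Return one set of physics parameters for an episode.

    With randomized_params=None every parameter keeps its nominal value,
    which is the setting used for training the pretrained policy.
    """
    params = dict(NOMINAL_PARAMS)
    if randomized_params is None:
        return params
    for name, (low, high) in randomized_params.items():
        # Uniform domain randomization within the given range.
        params[name] = rng.uniform(low, high)
    return params
```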

python train.py --train --env inverteddoublependulum --process 1 --alg td3

This is an example for the InvertedDoublePendulum environment, trained with the TD3 algorithm. After training, the weights are saved in the data folder; replace the model path in the later scripts with the path of your own weights to make them run.
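For reference, the two ingredients that set TD3's critic target apart from DDPG are clipped double-Q learning and target policy smoothing. The NumPy sketch below follows the standard TD3 formulation; the function names and defaults (noise_std=0.2, noise_clip=0.5) are illustrative and are not read from this repo's rl/ code, and delayed policy updates are not shown.

```python
import numpy as np


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=None):
    """Target policy smoothing: add clipped Gaussian noise to pi_target(s')."""
    rng = np.random.default_rng() if rng is None else rng
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(target_action)),
                    -noise_clip, noise_clip)
    return np.clip(target_action + noise, -max_action, max_action)


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target.

    next_q1 / next_q2 are the two target critics evaluated at
    (s', smoothed_target_action(pi_target(s'))).
    """
    min_q = np.minimum(next_q1, next_q2)  # pessimistic estimate curbs overestimation
    return reward + gamma * (1.0 - done) * min_q
```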

Go to the directory:

 cd dynamics_predict

Collect training and testing dataset

python train_dynamics.py --collect_train_data --env ENV_NAME
python train_dynamics.py --collect_test_data --env ENV_NAME
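The collection step rolls out the pretrained policy under randomized physics parameters and stores transitions. The sketch below shows one plausible dataset layout, (state, action, params, next_state), using a hypothetical 1-D point-mass environment with a randomized mass; the environment, the layout, and collect_transitions are assumptions for illustration, not the format written by train_dynamics.py.

```python
import numpy as np


class ToyPointMass:
    """Hypothetical 1-D point-mass stand-in for a randomized environment."""

    def __init__(self, mass, dt=0.05):
        self.mass = mass
        self.dt = dt
        self.state = np.zeros(2)  # [position, velocity]

    def reset(self, rng):
        self.state = rng.uniform(-1.0, 1.0, size=2)
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        vel = vel + self.dt * action / self.mass  # heavier mass, smaller response
        pos = pos + self.dt * vel
        self.state = np.array([pos, vel])
        return self.state.copy()


def collect_transitions(policy, n_envs=4, steps=10, seed=0):
    """Roll out `policy` in envs with randomly sampled masses and return
    arrays of states, actions, params and next states."""
    rng = np.random.default_rng(seed)
    states, actions, params, next_states = [], [], [], []
    for _ in range(n_envs):
        mass = rng.uniform(0.5, 2.0)  # one randomized parameter per env
        env = ToyPointMass(mass)
        s = env.reset(rng)
        for _ in range(steps):
            a = policy(s)
            s_next = env.step(a)
            states.append(s)
            actions.append([a])
            params.append([mass])
            next_states.append(s_next)
            s = s_next
    return (np.array(states), np.array(actions),
            np.array(params), np.array(next_states))
```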

Normalize data

Run

 cd ../data/dynamics_data
 jupyter notebook

then open data_process_*ENV_NAME*.ipynb and run through each cell.
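The exact normalization is defined in the notebook. Assuming it is a per-dimension z-score, a minimal sketch looks like the following; the statistics are fitted on the training split and reused for the test split, and denormalize maps model outputs back to physical units. All function names here are hypothetical.

```python
import numpy as np


def fit_normalizer(x, eps=1e-8):
    """Per-dimension mean and std, computed on the training split only."""
    mean = x.mean(axis=0)
    std = x.std(axis=0) + eps  # eps avoids division by zero for constant dims
    return mean, std


def normalize(x, mean, std):
    return (x - mean) / std


def denormalize(x_norm, mean, std):
    return x_norm * std + mean
```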

Train dynamics embedding (encoder, decoder and dynamics prediction model)

Return to the terminal in dynamics_predict/.

Run the following to launch training:

python train_dynamics.py --train_embedding --env Env_NAME

and launch TensorBoard with tensorboard --logdir runs to monitor the training process.
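As a rough picture of what the three models compute, the NumPy sketch below shows forward passes and losses only, with no training loop. It assumes the encoder maps the randomized physics parameters to a low-dimensional embedding z, the decoder reconstructs the parameters from z, and the dynamics predictor maps (s, a, z) to s'. The tiny MLPs, layer sizes, and unweighted MSE losses are assumptions, not the architecture or objective in train_dynamics.py.

```python
import numpy as np


def init_mlp(rng, n_in, n_hidden, n_out):
    """Weights for a one-hidden-layer MLP."""
    return [(rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)),
            (rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out))]


def mlp(params, x):
    """Tanh hidden layer followed by a linear output layer."""
    (w1, b1), (w2, b2) = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


def embedding_losses(enc, dec, dyn, s, a, phys_params, s_next):
    """Reconstruction loss of the encoder-decoder and prediction loss of the
    dynamics model conditioned on the embedding z."""
    z = mlp(enc, phys_params)
    recon_loss = np.mean((mlp(dec, z) - phys_params) ** 2)
    pred = mlp(dyn, np.concatenate([s, a, z], axis=1))
    pred_loss = np.mean((pred - s_next) ** 2)
    return recon_loss, pred_loss
```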

Test learned encoder and dynamics predictor

Test the performance of the learned encoder and dynamics predictor by applying them in ESI on the collected test data:

jupyter notebook

then open test_dynamics_*ENV_NAME*.ipynb and run through each cell; this includes a Bayesian optimization (BO) process.
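The idea behind ESI with BO can be sketched as searching the embedding space for the z whose dynamics predictions best match observed transitions. The sketch below is a minimal, self-written stand-in: a Gaussian process with an RBF kernel and a lower-confidence-bound acquisition over a grid on a 1-D embedding. The notebook may use a BO library, a different acquisition function, and a multi-dimensional embedding; predict(s, a, z) is a hypothetical dynamics predictor.

```python
import numpy as np


def rbf_kernel(x1, x2, length=0.3, var=1.0):
    d = x1[:, None] - x2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """GP posterior mean and std at x_query (1-D inputs, zero prior mean)."""
    k = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf_kernel(x_obs, x_query)
    mean = k_s.T @ np.linalg.solve(k, y_obs)
    v = np.linalg.solve(k, k_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(k_s * v, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def make_esi_loss(predict, s, a, s_next):
    """ESI objective: MSE of the dynamics predictor on observed transitions,
    as a function of the embedding z."""
    return lambda z: float(np.mean((predict(s, a, z) - s_next) ** 2))


def esi_bayes_opt(loss_fn, low=-1.0, high=1.0, n_init=3, n_iter=15,
                  kappa=2.0, seed=0):
    """Minimize loss_fn(z) over a 1-D embedding with GP-LCB on a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(low, high, 201)
    z_obs = rng.uniform(low, high, n_init)
    y_obs = np.array([loss_fn(z) for z in z_obs])
    for _ in range(n_iter):
        # Standardize observed losses so the unit-variance GP prior fits.
        y_norm = (y_obs - y_obs.mean()) / (y_obs.std() + 1e-9)
        mu, sigma = gp_posterior(z_obs, y_norm, grid)
        z_next = grid[np.argmin(mu - kappa * sigma)]  # lower confidence bound
        z_obs = np.append(z_obs, z_next)
        y_obs = np.append(y_obs, loss_fn(z_next))
    best = np.argmin(y_obs)
    return z_obs[best], y_obs[best]
```

In this sketch the dynamics predictor stays frozen and only z is searched, which is why BO (few, expensive objective evaluations, no gradients required) is a reasonable fit.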

Train UP

cd ..
python train.py --train --env *ENV_NAME*dynamics --process NUM

Select the encoder-decoder type in ./environment/*ENV_NAME*dynamics.py to match the one used in ./dynamics_predict/train_dynamics.py.
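The encoder-decoder types must match because the universal policy is conditioned on the embedding. Assuming the conditioning is a plain concatenation of observation and embedding, the input could be built as below. up_policy_input and the linear tanh policy head are hypothetical stand-ins for the environment wrapper and the actor network trained by train.py.

```python
import numpy as np


def up_policy_input(obs, z):
    """Universal-policy input: observation concatenated with the dynamics
    embedding z (estimated by ESI at test time)."""
    return np.concatenate([np.asarray(obs, dtype=float).ravel(),
                           np.asarray(z, dtype=float).ravel()])


def linear_up_policy(weights, bias, obs, z, max_action=1.0):
    """Hypothetical linear policy head with tanh squashing."""
    return max_action * np.tanh(weights @ up_policy_input(obs, z) + bias)
```

If the encoder changes its embedding size, the policy's input dimension changes with it, which is one concrete reason the two files must agree.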

Test ESI with UP against other methods

cd dynamics_predict
python compare_methods_*ENV_NAME*.py
