OUTPACE

This is a PyTorch implementation of OUTPACE from our paper: "Outcome-directed Reinforcement Learning by Uncertainty & Temporal Distance-Aware Curriculum Goal Generation" (ICLR 2023 Spotlight).

By Daesol Cho*, Seungjae Lee* (*Equally contributed), and H. Jin Kim

Our paper is available on arXiv, and a project website is also available.

Setup Instructions

Create a conda environment:

conda env create -f outpace.yml
conda activate outpace

Add the necessary paths:

conda develop meta-nml

Install subfolder dependencies:

cd meta-nml && pip install -r requirements.txt
cd ..
chmod +x install.sh
./install.sh

Install PyTorch (tested with PyTorch 1.12.1 and CUDA 11.3).
Set config_path: see config/paths/template.yaml.
To run the robot arm environments, install metaworld:

pip install git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld
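
Once the setup steps above are done, it can help to confirm which PyTorch build is active in the environment before launching training. The snippet below is a minimal sketch rather than part of this repository: the function name `describe_torch_install` is hypothetical, and it only reports what it finds instead of enforcing the tested versions (PyTorch 1.12.1, CUDA 11.3).

```python
import importlib.util


def describe_torch_install():
    """Summarize the local PyTorch install (hypothetical helper, not part of this repo)."""
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` is reported as false, the `CUDA_VISIBLE_DEVICES=0` prefix used in the training commands will not select a GPU.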

Usage

Training and Evaluation

PointUMaze-v0

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=PointUMaze-v0 aim_disc_replay_buffer_capacity=10000 save_buffer=true adam_eps=0.01

PointNMaze-v0

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=PointNMaze-v0 aim_disc_replay_buffer_capacity=10000 adam_eps=0.01

PointSpiralMaze-v0

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=PointSpiralMaze-v0 aim_disc_replay_buffer_capacity=20000 save_buffer=true aim_discriminator_cfg.lambda_coef=50

AntMazeSmall-v0

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=AntMazeSmall-v0 aim_disc_replay_buffer_capacity=50000

sawyer_peg_pick_and_place

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=sawyer_peg_pick_and_place aim_disc_replay_buffer_capacity=30000 normalize_nml_obs=true normalize_f_obs=false normalize_rl_obs=false adam_eps=0.01

sawyer_peg_push

CUDA_VISIBLE_DEVICES=0 python outpace_train.py env=sawyer_peg_push aim_disc_replay_buffer_capacity=30000 normalize_nml_obs=true normalize_f_obs=false normalize_rl_obs=false adam_eps=0.01 hgg_kwargs.match_sampler_kwargs.hgg_L=0.5
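
The commands above differ only in their per-environment overrides, so they can be collected in one table and generated programmatically. The following is a rough sketch, not a script shipped with this repository: `ENV_OVERRIDES`, `build_command`, and `format_command` are hypothetical names, and the override values are copied verbatim from the commands listed above.

```python
import shlex

# Per-environment overrides, copied from the commands listed in this README.
ENV_OVERRIDES = {
    "PointUMaze-v0": [
        "aim_disc_replay_buffer_capacity=10000",
        "save_buffer=true",
        "adam_eps=0.01",
    ],
    "PointNMaze-v0": [
        "aim_disc_replay_buffer_capacity=10000",
        "adam_eps=0.01",
    ],
    "PointSpiralMaze-v0": [
        "aim_disc_replay_buffer_capacity=20000",
        "save_buffer=true",
        "aim_discriminator_cfg.lambda_coef=50",
    ],
    "AntMazeSmall-v0": [
        "aim_disc_replay_buffer_capacity=50000",
    ],
    "sawyer_peg_pick_and_place": [
        "aim_disc_replay_buffer_capacity=30000",
        "normalize_nml_obs=true",
        "normalize_f_obs=false",
        "normalize_rl_obs=false",
        "adam_eps=0.01",
    ],
    "sawyer_peg_push": [
        "aim_disc_replay_buffer_capacity=30000",
        "normalize_nml_obs=true",
        "normalize_f_obs=false",
        "normalize_rl_obs=false",
        "adam_eps=0.01",
        "hgg_kwargs.match_sampler_kwargs.hgg_L=0.5",
    ],
}


def build_command(env_name):
    """Return the argument list for training on one environment."""
    return ["python", "outpace_train.py", f"env={env_name}", *ENV_OVERRIDES[env_name]]


def format_command(env_name, gpu_id=0):
    """Return a shell-ready command string, including the GPU selection prefix."""
    return f"CUDA_VISIBLE_DEVICES={gpu_id} " + shlex.join(build_command(env_name))


if __name__ == "__main__":
    # Print every command so it can be reviewed or pasted into a shell.
    for name in ENV_OVERRIDES:
        print(format_command(name))
```

Printing the commands instead of executing them keeps the sketch free of side effects. The argument list from `build_command` could be passed to `subprocess.run` with `CUDA_VISIBLE_DEVICES` set in the `env` argument if direct execution is preferred.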

Our code is sourced and modified from the official implementations of MURAL, AIM, and HGG. We also use mujoco-maze and metaworld to validate our proposed method.

Citation

If you use this repo in your research, please consider citing the paper as follows.

@inproceedings{choandlee2023outcome,
  title={Outcome-directed Reinforcement Learning by Uncertainty \& Temporal Distance-Aware Curriculum Goal Generation},
  author={Cho, Daesol and Lee, Seungjae and Kim, H Jin},
  booktitle={Proceedings of International Conference on Learning Representations},
  pages={},
  year={2023},
  organization={}
}

