Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

Code for the AAAI'24 paper "Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations".

Installation

First install MuJoCo. For tasks that differ in reward functions (Cheetah, Ant), install MuJoCo150 or later. Set LD_LIBRARY_PATH to point to both the MuJoCo binaries ($HOME/.mujoco/mujoco200/bin) and the GPU drivers.

Then create the conda environment:

conda env create -f environment.yaml

For Hopper and Walker environments, MuJoCo131 is required. Simply install it the same way as MuJoCo200. To switch between different MuJoCo versions:

export MUJOCO_PY_MJPRO_PATH=~/.mujoco/mjpro${VERSION_NUM}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mjpro${VERSION_NUM}/bin
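When launching runs from a Python script instead of an interactive shell, the two exports above can be reproduced per process. The sketch below is a hypothetical helper (mujoco_env is not part of this repository). It only mirrors the variable names and the ~/.mujoco/mjpro${VERSION_NUM} layout shown above and does not verify that MuJoCo is actually installed there.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def mujoco_env(version_num: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return a copy of an environment with the two
    variables from the export lines above set for ~/.mujoco/mjpro<version_num>."""
    env = dict(os.environ if base_env is None else base_env)
    home = env.get("HOME", str(Path.home()))
    mjpro = Path(home) / ".mujoco" / f"mjpro{version_num}"
    env["MUJOCO_PY_MJPRO_PATH"] = str(mjpro)
    # Append the bin directory, like $LD_LIBRARY_PATH:~/.mujoco/mjpro${VERSION_NUM}/bin,
    # but without a leading ':' when LD_LIBRARY_PATH was previously empty.
    bin_dir = str(mjpro / "bin")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{existing}:{bin_dir}" if existing else bin_dir
    return env
```

Usage: pass the result to a child process, e.g. subprocess.run(cmd, env=mujoco_env("131"), check=True) for Hopper and Walker. Handing the variables to a new process is preferable to editing os.environ in place, because the dynamic loader generally reads LD_LIBRARY_PATH only when a process starts.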

Data Generation

Example of training behavior policies on multiple tasks:

python policy_train.py ./configs/ant-dir.json --gpu 0

This runs SAC to train one policy per task. To change the directory where the trained policies are saved, modify self.work_dir of Workspace in rlkit/torch/sac/pytorch_sac/train.py.
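To train behavior policies for several environments in one go, the command above can be repeated per config file. The following is a minimal sketch, assuming every *.json file in ./configs is a valid input for policy_train.py (not verified here); policy_train_commands and run_all are hypothetical helpers. Runs are sequential so that they do not compete for the same GPU.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def policy_train_commands(config_dir: str = "./configs", gpu: int = 0) -> List[List[str]]:
    """Hypothetical helper: one policy_train.py command per JSON config, in sorted order."""
    configs = sorted(Path(config_dir).glob("*.json"))
    return [
        [sys.executable, "policy_train.py", str(cfg), "--gpu", str(gpu)]
        for cfg in configs
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing run instead of silently continuing
            subprocess.run(cmd, check=True)
```

Calling run_all(policy_train_commands()) first prints the planned commands; pass dry_run=False to actually launch them.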

Generate trajectories from trained policies:

python policy_eval.py --config ./configs/ant-dir.json

Data will be saved in self.work_dir/gentle_data/$env_name/$goal_idx{i}
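Before moving on to training, it can help to check that data exists for every task. The sketch below assumes that the $goal_idx{i} placeholder expands to directories named goal_idx0, goal_idx1, and so on, and that you know the env_name and number of tasks for your config; task_data_dirs and missing_tasks are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
from typing import List


def task_data_dirs(work_dir: str, env_name: str, num_tasks: int) -> List[Path]:
    """Expected per-task data directories under work_dir/gentle_data/env_name/."""
    # ASSUMPTION: "$goal_idx{i}" expands to goal_idx0, goal_idx1, ...
    root = Path(work_dir) / "gentle_data" / env_name
    return [root / f"goal_idx{i}" for i in range(num_tasks)]


def missing_tasks(work_dir: str, env_name: str, num_tasks: int) -> List[int]:
    """Indices of tasks whose data directory does not exist yet."""
    dirs = task_data_dirs(work_dir, env_name, num_tasks)
    return [i for i, path in enumerate(dirs) if not path.is_dir()]
```

An empty list from missing_tasks means every expected task directory is present; it does not check the contents of those directories.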

Training GENTLE

The configuration files for running GENTLE are in ./configs. For example, to train GENTLE on Ant-Dir, first pretrain the dynamics model:

python pretrain_dynamics.py ./configs/ant-dir.json

Then run:

python train_gentle.py ./configs/ant-dir.json

Logs will be written to ./logs/ant-dir/gentle/
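The two training steps must run in order, since train_gentle.py is run after the dynamics model has been pretrained. A minimal sketch of chaining them for one config is shown below. gentle_steps, expected_log_dir and run_gentle are hypothetical helpers, and the log location assumes the ./logs/<config name>/gentle/ pattern of the Ant-Dir example holds for other configs, which is not confirmed here.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def gentle_steps(config: str) -> List[List[str]]:
    """The two commands above, in order: dynamics pretraining, then GENTLE training."""
    return [
        [sys.executable, "pretrain_dynamics.py", config],
        [sys.executable, "train_gentle.py", config],
    ]


def expected_log_dir(config: str) -> Path:
    # ASSUMPTION: logs go to ./logs/<config file stem>/gentle/, as in the ant-dir example
    return Path("logs") / Path(config).stem / "gentle"


def run_gentle(config: str) -> Path:
    for cmd in gentle_steps(config):
        # check=True: do not start GENTLE training if dynamics pretraining fails
        subprocess.run(cmd, check=True)
    return expected_log_dir(config)
```

For example, run_gentle("./configs/ant-dir.json") runs both steps and returns logs/ant-dir/gentle.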

Reference

@inproceedings{gentle,
  author={Renzhe Zhou and Chen-Xiao Gao and Zongzhang Zhang and Yang Yu},
  title={Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations},
  booktitle={AAAI Conference on Artificial Intelligence (AAAI)},
  year={2024}
}
