
Hierarchical Adversarial Inverse Reinforcement Learning

Codebase for our paper: Hierarchical Adversarial Inverse Reinforcement Learning


The following parts are included:

  • Benchmarks built with MuJoCo, including Hopper, Walker, Ant box-pushing, and Point maze.
  • An implementation of the hierarchical imitation learning (HIL) algorithm proposed in our paper.
  • Implementations of state-of-the-art (SOTA) imitation learning (IL) and HIL algorithms as baselines, including GAIL, AIRL, Option-GAIL, and Directed-Info GAIL.

The paper is available at: https://arxiv.org/abs/2210.01969

Please cite this paper:

@article{DBLP:journals/corr/abs-2210-01969,
  author       = {Jiayu Chen and
                  Tian Lan and
                  Vaneet Aggarwal},
  title        = {Hierarchical Adversarial Inverse Reinforcement Learning},
  journal      = {CoRR},
  volume       = {abs/2210.01969},
  year         = {2022},
  url          = {https://doi.org/10.48550/arXiv.2210.01969},
  doi          = {10.48550/arXiv.2210.01969}
}

How to configure the environment (an example installation command follows the list):

  • Ubuntu 18.04
  • Python 3.6
  • PyTorch 1.6
  • TensorBoard 2.5
  • mujoco_py >= 1.5
  • gym == 0.19.0
  • matplotlib
  • tqdm
  • seaborn
  • ...
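
A minimal setup sketch, assuming a conda/pip-based install. The environment name 'hierairl' and the exact version pins below are assumptions inferred from the list above; mujoco_py additionally requires a local MuJoCo installation configured separately.

# create and activate an isolated Python 3.6 environment (conda assumed here)
conda create -n hierairl python=3.6
conda activate hierairl
# install the dependencies listed above (pins are illustrative, not prescriptive)
pip install torch==1.6.0 tensorboard==2.5.0 "mujoco_py>=1.5" gym==0.19.0 matplotlib tqdm seaborn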

Experiments with Hopper

  • First, enter the folder 'HierAIRL_Hopper'.

  • To run the code with specific algorithms:

# Option-GAIL:
python ./run_baselines.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag option-gail-1k --algo option_gail

# GAIL:
python ./run_baselines.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag gail-1k --algo gail

# DI-GAIL:
python ./run_baselines.py --env_type mujoco --env_name Hopper-v2 --n_pretrain_epoch 50 --n_demo 1000 --device "cuda:0" --tag d_info_gail-1k --algo DI_gail

# Option-AIRL:
python ./run_main.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag option-airl-1k --algo option_airl

# H-AIRL:
python ./run_main.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag hier-airl-1k --algo hier_airl

# H-GAIL:
python ./run_main.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag hier-gail-1k --algo hier_gail

  • To run the code with a random seed Y (we simply choose 0, 1, or 2), append '--seed=Y' to the command; an example is given after this list. The same applies to the other tasks below.

  • For the hyperparameters, please refer to 'HierAIRL_Hopper/default_config.py'. The same applies to the other tasks below.
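
For example, a sample invocation combining the H-AIRL command above with the documented seed flag (seed 0):

python ./run_main.py --env_type mujoco --env_name Hopper-v2 --n_demo 1000 --device "cuda:0" --tag hier-airl-1k --algo hier_airl --seed=0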

Experiments with Walker

  • First, enter the folder 'HierAIRL_Walker'.

  • To run the code with specific algorithms:

# Option-GAIL:
python ./run_baselines.py --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail

# GAIL:
python ./run_baselines.py --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail

# DI-GAIL:
python ./run_baselines.py --env_type mujoco --env_name Walker2d-v2 --n_pretrain_epoch 50 --n_demo 5000 --device "cuda:0" --tag d_info_gail-5k --algo DI_gail

# Option-AIRL:
python ./run_main.py --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag option-airl-5k --algo option_airl

# H-AIRL:
python ./run_main.py --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl

# H-GAIL:
python ./run_main.py --env_type mujoco --env_name Walker2d-v2 --n_demo 5000 --device "cuda:0" --tag hier-gail-5k --algo hier_gail

Experiments with AntPusher

  • First, enter the folder 'HierAIRL_Ant'.

  • To run the code with specific algorithms:

# Option-GAIL:
python ./run_baselines.py --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag option-gail-10k --algo option_gail

# GAIL:
python ./run_baselines.py --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag gail-10k --algo gail

# DI-GAIL:
python ./run_baselines.py --env_type mujoco --env_name AntPusher-v0 --n_pretrain_epoch 100 --n_demo 10000 --device "cuda:0" --tag d_info_gail-10k --algo DI_gail

# Option-AIRL:
python ./run_main.py --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag option-airl-10k --algo option_airl

# H-AIRL:
python ./run_main.py --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag hier-airl-10k --algo hier_airl

# H-GAIL:
python ./run_main.py --env_type mujoco --env_name AntPusher-v0 --n_demo 10000 --device "cuda:0" --tag hier-gail-10k --algo hier_gail

Experiments on Point Room/Corridor

  • First, enter the folder 'HierAIRL_Point'.

  • To reproduce the visualization results of the expert trajectories, please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.

python ./plot_options_exp.py --env_type mujoco --env_name XXX

  • To reproduce the visualization results of the trajectories of the learned agents, please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.

python ./plot_options.py --env_type mujoco --env_name XXX

  • To reproduce the agents learned with H-AIRL (i.e., the checkpoints), please run the following command, where XXX can be Point4Rooms-v1 or PointCorridor-v1. The results will be available in the folder 'result'.

python ./run_main.py --env_type mujoco --env_name XXX --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl
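
For instance, with XXX set to Point4Rooms-v1, the H-AIRL training command reads:

python ./run_main.py --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl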

Transfer Learning Results on Point Room

  • First, enter the folder 'HierAIRL_Point_Room_transfer'.

  • To run the code with specific algorithms, please run the following commands, where X can be 0, 1, or 2.

# Option-GAIL:
python ./run_baselines.py --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail --seed X

# GAIL:
python ./run_baselines.py --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail --seed X

# H-AIRL:
python ./run_main.py --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 0

# H-AIRL initialized with the checkpoint trained in another task:
python ./run_main.py --env_type mujoco --env_name Point4Rooms-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 1

Transfer Learning Results on Point Corridor

  • First, enter the folder 'HierAIRL_Point_Corridor_transfer'.

  • To run the code with specific algorithms, please run the following commands, where X can be 0, 1, or 2.

# Option-GAIL:
python ./run_baselines.py --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag option-gail-5k --algo option_gail --seed X

# GAIL:
python ./run_baselines.py --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag gail-5k --algo gail --seed X

# H-AIRL:
python ./run_main.py --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 0

# H-AIRL initialized with the checkpoint trained in another task:
python ./run_main.py --env_type mujoco --env_name PointCorridor-v1 --n_demo 5000 --device "cuda:0" --tag hier-airl-5k --algo hier_airl --seed X --init 1
