GitHub - Garyzyr001/rethinking-airl: Code for the paper "Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof"

Setup

You can install the required Python libraries with pip install -r requirements.txt. Note that you need a MuJoCo license. Please follow the instructions in [mujoco-py](https://github.com/openai/mujoco-py) for help.

Example

Train expert

You can train experts using soft actor-critic (SAC) [1,2].

python train_expert.py --cuda --env_id PointMaze-Right --num_steps 1000000 --seed 0

The run directory is named after the seed and the start time, for instance "seed0-20230805-1354".
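The run name and the checkpoint path passed to the demonstration script appear to follow a seed-plus-timestamp pattern. The sketch below reproduces that pattern under the assumption that the layout is `logs/<env_id>/<algo>/seed<seed>-<YYYYMMDD-HHMM>/model/step<N>/actor.pth`, which is inferred only from the example paths in this README. The helper names `run_name` and `actor_path` are hypothetical and are not part of the repository.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_name(seed: int, started: datetime) -> str:
    # Assumed pattern "seed<seed>-<YYYYMMDD-HHMM>", inferred from the example name.
    return f"seed{seed}-{started.strftime('%Y%m%d-%H%M')}"


def actor_path(env_id: str, algo: str, seed: int, started: datetime, step: int) -> str:
    # Assumed directory layout, inferred from the example --weight path;
    # it is not read from the repository's code.
    run = run_name(seed, started)
    path = PurePosixPath("logs") / env_id / algo / run / "model" / f"step{step}" / "actor.pth"
    return str(path)


if __name__ == "__main__":
    started = datetime(2023, 8, 5, 13, 54)
    print(run_name(0, started))
    print(actor_path("PointMaze-Right", "sac", 0, started, 1_000_000))
```

A helper like this can be handy for locating the checkpoint of a finished run without copying the timestamp by hand.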

Collect demonstrations

You need to collect demonstrations using the trained expert's weights. Note that --std specifies the standard deviation of the Gaussian noise added to the action, and --p_rand specifies the probability that the expert acts randomly. We set std to 0.01 so that the collected trajectories are not too similar.

python collect_demo.py \
    --cuda --env_id PointMaze-Right \
    --weight logs/PointMaze-Right/sac/seed0-20230805-1354/model/step1000000/actor.pth \
    --buffer_size 1000000 --std 0.01 --p_rand 0.0 --seed 0
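To make the roles of --std and --p_rand concrete, here is a minimal sketch of one plausible way to perturb an expert action. It assumes that a random action is drawn uniformly from the action bounds, that the Gaussian noise is applied independently per action dimension, and that actions are clipped to [-1, 1]. The function name `noisy_expert_action` and these details are assumptions for illustration; collect_demo.py may implement them differently.

```python
import random


def noisy_expert_action(expert_action, std, p_rand, low=-1.0, high=1.0, rng=random):
    """Return a perturbed copy of the expert's action (illustrative only).

    Assumed semantics: with probability p_rand return a uniformly random
    action; otherwise add N(0, std^2) noise per dimension and clip the
    result to [low, high].
    """
    if rng.random() < p_rand:
        return [rng.uniform(low, high) for _ in expert_action]
    return [min(high, max(low, a + rng.gauss(0.0, std))) for a in expert_action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Settings matching the command above: small noise, no random actions.
    print(noisy_expert_action([0.5, -0.5], std=0.01, p_rand=0.0, rng=rng))
```

With std 0.01 and p_rand 0.0, each action stays close to the expert's choice while repeated rollouts still differ slightly.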

Train AIRL

Expert demonstrations are provided in buffers/. You can train SAC-AIRL in the source environment using these demonstrations. For example,

python train_airl.py \
    --cuda --env_id PointMaze-Right \
    --buffer buffers/PointMaze-Right/size1000000_std0.01_prand0.0.pth \
    --num_steps 1500000 --eval_interval 5000 --rollout_length 64 --seed 0 \
    --algo 'airl_sac' --epoch_disc 1 --epoch_policy 32 --batch_size 64 --cuda_id 0

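The run directory is named after the seed and the start time, for instance "seed0-20231226-1522". The --load_seed and --load_time flags in the next step refer to these two parts.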
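For readers unfamiliar with what the discriminator learns, the sketch below follows the original AIRL formulation: f(s, a, s') = g + gamma * h(s') - h(s), D = exp(f) / (exp(f) + pi(a|s)), and a policy reward of log D - log(1 - D). It works on scalar values rather than networks. The function names and the default gamma are assumptions, and the code in airl/ may differ in details such as the reward form or the handling of terminal states.

```python
import math


def airl_logit(g, h, h_next, log_pi, gamma=0.99, done=False):
    # f(s, a, s') = g + gamma * h(s') - h(s); the shaping term for s' is
    # dropped at terminal transitions (an assumption of this sketch).
    f = g + gamma * (0.0 if done else 1.0) * h_next - h
    # D = exp(f) / (exp(f) + pi) equals sigmoid(f - log pi).
    return f - log_pi


def discriminator_prob(logit):
    # Probability that the transition came from the expert.
    return 1.0 / (1.0 + math.exp(-logit))


def airl_reward(logit):
    # log D - log(1 - D) simplifies to the logit itself.
    return logit


if __name__ == "__main__":
    logit = airl_logit(g=1.0, h=0.5, h_next=0.25, log_pi=math.log(0.5), gamma=0.9)
    print(discriminator_prob(logit), airl_reward(logit))
```

Working with the logit directly avoids computing log(1 - D) when D is close to 1.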

Train Reward Transfer Imitation Learning

You can re-optimize the policy in the new environment using the reward learned in the source environment. For example,

python train_transfer_imitation.py \
    --cuda --env_id PointMaze-Left \
    --num_steps 1000000 --seed 0 \
    --algo 'transfer_sac' --airl_env_id PointMaze-Right \
    --load_seed 0 --load_time 20231226-1522 --cuda_id 0
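Conceptually, reward transfer means that the policy in the new environment is trained on the learned reward instead of the environment's own reward. The sketch below shows that idea on a list of transitions, assuming a state-only learned reward. `relabel_with_learned_reward` and `toy_reward` are hypothetical names, and `toy_reward` is only a stand-in for the reward network loaded from the AIRL run; the actual logic in train_transfer_imitation.py may be organized differently.

```python
def relabel_with_learned_reward(transitions, reward_fn):
    """Replace the environment reward in each transition with reward_fn(state).

    Each transition is a tuple (state, action, env_reward, next_state, done).
    """
    relabeled = []
    for state, action, _env_reward, next_state, done in transitions:
        relabeled.append((state, action, reward_fn(state), next_state, done))
    return relabeled


def toy_reward(state):
    # Hypothetical stand-in for the learned reward: negative squared
    # distance to an assumed goal position.
    goal = (1.0, 0.0)
    return -sum((s - g) ** 2 for s, g in zip(state, goal))


if __name__ == "__main__":
    batch = [((0.0, 0.0), (0.1, 0.0), 5.0, (0.1, 0.0), False)]
    print(relabel_with_learned_reward(batch, toy_reward))
```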

Train TD3-AIRL

You can train TD3-AIRL in the source environment using the demonstrations in buffers/. For example,

python train_airl.py \
    --cuda --env_id PointMaze-Right \
    --buffer buffers/PointMaze-Right/size1000000_std0.01_prand0.0.pth \
    --num_steps 1500000 --eval_interval 5000 --rollout_length 128 --seed 0 \
    --algo 'airl_td3' --epoch_disc 1 --epoch_policy 16 --batch_size 128 --cuda_id 0
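The SAC-AIRL and TD3-AIRL commands share most flags and differ in only a few values. The sketch below records the algorithm-specific values from the two example commands in this README and lists the differences. The dictionary and function names are hypothetical conveniences and are not part of the repository.

```python
import shlex

# Values copied from the two example commands in this README.
SAC_AIRL = {"algo": "airl_sac", "rollout_length": 64,
            "epoch_disc": 1, "epoch_policy": 32, "batch_size": 64}
TD3_AIRL = {"algo": "airl_td3", "rollout_length": 128,
            "epoch_disc": 1, "epoch_policy": 16, "batch_size": 128}


def differing_flags(a, b):
    # Map each flag whose value differs to the pair (value in a, value in b).
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}


def to_command(script, flags, switches=()):
    # Build a shell-safe command string from boolean switches and key/value flags.
    parts = ["python", script] + [f"--{s}" for s in switches]
    for key, value in flags.items():
        parts += [f"--{key}", str(value)]
    return shlex.join(parts)


if __name__ == "__main__":
    print(differing_flags(SAC_AIRL, TD3_AIRL))
    print(to_command("train_airl.py", {"env_id": "PointMaze-Right", **TD3_AIRL},
                     switches=("cuda",)))
```

Keeping the per-algorithm settings in one place makes it easier to see that only the algorithm name, rollout length, policy epochs, and batch size change between the two runs.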

References

[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).

[2] Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).
