Garyzyr001/rethinking-airl

Setup

You can install the required Python libraries with pip install -r requirements.txt. Note that you need a MuJoCo license; please follow the instructions in [mujoco-py](https://github.com/openai/mujoco-py) for help.
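
Once the libraries are installed, a quick way to confirm that mujoco-py works is to create any stock MuJoCo environment and step it once. This is only a sanity-check sketch; the environment ID below is a standard Gym task, not one of this repository's mazes.

import gym
import mujoco_py  # raises here if the MuJoCo binaries or license key are missing

env = gym.make("InvertedPendulum-v2")  # any MuJoCo task confirms the install
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("MuJoCo OK, observation shape:", obs.shape)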

Example

Train expert

You can train experts using soft actor-critic (SAC) [1,2].

python train_expert.py --cuda --env_id PointMaze-Right --num_steps 1000000 --seed 0

The logs for this run are saved in a directory named with the seed and a timestamp, e.g., "seed0-20230805-1354".

Collect demonstrations

Next, collect demonstrations using the trained expert's weights. Note that --std specifies the standard deviation of the Gaussian noise added to the expert's actions, and --p_rand specifies the probability that the expert acts randomly. We set --std to 0.01 so that the collected trajectories are not all too similar.

python collect_demo.py \
    --cuda --env_id PointMaze-Right \
    --weight logs/PointMaze-Right/sac/seed0-20230805-1354/model/step1000000/actor.pth \
    --buffer_size 1000000 --std 0.01 --p_rand 0.0 --seed 0
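
In rough terms, the two flags shape each collected action as sketched below. This is illustrative only; collect_demo.py may implement the details differently.

import numpy as np

def noisy_expert_action(expert_action, action_space, std=0.01, p_rand=0.0):
    # With probability p_rand, ignore the expert and act uniformly at random.
    if np.random.rand() < p_rand:
        return action_space.sample()
    # Otherwise, add zero-mean Gaussian noise with standard deviation std
    # and clip back into the valid action range.
    action = expert_action + np.random.randn(*expert_action.shape) * std
    return np.clip(action, action_space.low, action_space.high)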

Train AIRL

Expert demonstrations are provided in buffers/; alternatively, use the demonstrations collected in the previous step. You can train SAC-AIRL in the source environment with these demonstrations. For example,

python train_airl.py \
    --cuda --env_id PointMaze-Right \
    --buffer buffers/PointMaze-Right/size1000000_std0.01_prand0.0.pth \
    --num_steps 1500000 --eval_interval 5000 --rollout_length 64 --seed 0 \
    --algo 'airl_sac' --epoch_disc 1 --epoch_policy 32 --batch_size 64 --cuda_id 0

The logs for this run are saved in a directory named with the seed and a timestamp, e.g., "seed0-20231226-1522"; the timestamp part is what --load_time expects in the transfer step below.
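
AIRL learns its reward through a structured discriminator. The sketch below follows the standard formulation from the original AIRL paper, where the discriminator logit is f(s, s') - log pi(a|s) with f(s, s') = g(s) + gamma * h(s') - h(s); the network sizes and exact inputs used in this repository may differ.

from torch import nn

class AIRLDiscriminator(nn.Module):
    def __init__(self, state_dim, gamma=0.99, hidden=64):
        super().__init__()
        # g(s): disentangled reward term, h(s): potential-based shaping term.
        self.g = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.h = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.gamma = gamma

    def f(self, states, dones, next_states):
        # f(s, s') = g(s) + gamma * h(s') - h(s)
        return self.g(states) + (1.0 - dones) * self.gamma * self.h(next_states) - self.h(states)

    def forward(self, states, dones, log_pis, next_states):
        # Discriminator logit; trained with a logistic loss on expert vs. policy transitions.
        return self.f(states, dones, next_states) - log_pis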

Train Reward Transfer Imitation Learning

You can re-optimize the policy in a new (target) environment using the reward learned in the source environment. For example,

python train_transfer_imitation.py \
    --cuda --env_id PointMaze-Left \
    --num_steps 1000000 --seed 0 \
    --algo 'transfer_sac' --airl_env_id PointMaze-Right \
    --load_seed 0 --load_time 20231226-1522 --cuda_id 0
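
Conceptually, transfer re-runs the policy optimizer in the target environment while relabeling every transition with the reward recovered in the source environment, rather than the target environment's own reward. The sketch below is a placeholder illustration; the names policy.sample, disc.f, and buffer.append are assumptions, not the repository's exact API.

def relabeled_step(env, state, policy, disc, buffer):
    # Act with the policy being re-optimized in the target environment (e.g., PointMaze-Left).
    action, _ = policy.sample(state)
    next_state, _, done, _ = env.step(action)  # the target env's own reward is discarded
    # Relabel the transition with the reward recovered by AIRL in the source environment.
    reward = disc.f(state, done, next_state)
    buffer.append(state, action, reward, done, next_state)
    return env.reset() if done else next_state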

Train TD3-AIRL

You can train TD3-AIRL in the source environment using the demonstrations in buffers/. For example,

python train_airl.py \
    --cuda --env_id PointMaze-Right \
    --buffer buffers/PointMaze-Right/size1000000_std0.01_prand0.0.pth \
    --num_steps 1500000 --eval_interval 5000 --rollout_length 128 --seed 0 \
    --algo 'airl_td3' --epoch_disc 1 --epoch_policy 16 --batch_size 128 --cuda_id 0

References

[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." arXiv preprint arXiv:1801.01290 (2018).

[2] Haarnoja, Tuomas, et al. "Soft actor-critic algorithms and applications." arXiv preprint arXiv:1812.05905 (2018).

About

Code for the paper "Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery"
