An imitation-from-observation algorithm that trains agents to perform tasks using only a limited number of pixel-based expert observations, based on a behavioral learning principle.
An encoder that takes videos of agent trajectories and embeds them in a "behavioral space" is trained with contrastive learning, enforcing that successful trajectories lie close together. We use it to encode the N expert videos into a region of the behavioral space (shown in blue). The reward function is the distance of the agent's trajectory to the set of expert trajectories. As the agent progresses, its current trajectories (shown in red) are incorporated into the contrastive learning as "negative" examples.
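A minimal sketch of these two ingredients, assuming PyTorch; the function names, shapes, and the specific InfoNCE form with agent clips as extra negatives are illustrative stand-ins, not the exact losses in `cmc_model.py`:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(expert_a, expert_b, agent_emb, temperature=0.1):
    """InfoNCE-style loss over trajectory embeddings (illustrative).

    expert_a, expert_b: (N, d) two embeddings of the same N expert
    trajectories (e.g. different frame subsamples) -- the positives.
    agent_emb: (M, d) embeddings of the agent's own trajectories,
    used as extra negatives that accumulate as training progresses.
    """
    expert_a = F.normalize(expert_a, dim=-1)
    candidates = F.normalize(torch.cat([expert_b, agent_emb]), dim=-1)
    logits = expert_a @ candidates.t() / temperature  # (N, N + M)
    # the positive for anchor i is column i (its own second view);
    # all other expert clips and all agent clips act as negatives
    targets = torch.arange(expert_a.shape[0])
    return F.cross_entropy(logits, targets)

def behavioral_reward(agent_traj_emb, expert_embs):
    """Reward = negative distance from the agent's trajectory embedding
    (d,) to the closest of the N expert embeddings (N, d)."""
    return -torch.norm(expert_embs - agent_traj_emb, dim=-1).min()
```

Here the distance to the set of expert trajectories is taken as the distance to the nearest expert embedding; other aggregations (e.g. distance to the mean embedding) would fit the same description.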
Demonstration videos: the expert is on the left, the IfO agent on the right.
conda env create -f env.yml
conda activate ifobl
- Train expert
python train.py task=reacher_hard exp_group=reacher_hard exp_id=1
Watch training on TensorBoard:
tensorboard --logdir exp_local
- Generate 5000 expert videos
export PYTHONPATH="${PYTHONPATH}:`pwd`" && python scripts/generate_dmc_video.py --env reacher_hard2 --episode_len 60
Use the `--num-train` and `--num-valid` flags to change the number of training and validation videos to generate, respectively; an example follows.
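For example, to generate a 90/10 train/validation split (the flag names come from the script above; the counts here are illustrative):

```sh
export PYTHONPATH="${PYTHONPATH}:`pwd`" && python scripts/generate_dmc_video.py --env reacher_hard2 --episode_len 60 --num-train 4500 --num-valid 500
```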
- Pretrain image and video encoders
python train_cmc.py task=reacher_hard2 exp_id=1
Watch training on TensorBoard:
tensorboard --logdir cmc_exp_local
- Train agent
python train_rlv2.py task=reacher_hard2
Watch training on TensorBoard:
tensorboard --logdir rlv2_exp_local
Evaluation videos are generated in the `rlv2_exp_local/reacher_hard/<exp_id>/train_video` directory.
- `cmc_model.py`: contains the models, neural networks and losses used to train the trajectory encoder
- `drqv2.py`: contains the implementations of the policy and q-value functions used to train the agents and experts
- `rl_model.py`: contains the implementations of the policy and q-value functions used to train the state-based agents
- `final_run`: contains the scripts to train the experts and agents for other tasks such as Walker run, Hopper stand and Finger turn
- `dmc.py`: contains environment creation functions and environment wrappers
- `scripts/generate_dmc_video.py`: shows how to use trained agents at test time (see the sketch after this list)
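A hypothetical test-time rollout in the spirit of that script, assuming the DrQ-v2 conventions this repo builds on; the snapshot path, payload keys, and the `dmc.make` / `agent.act` signatures are all assumptions:

```python
import torch
import dmc  # this repo's environment creation helpers

# load a training snapshot (path and payload layout are assumptions)
payload = torch.load('exp_local/reacher_hard/1/snapshot.pt')
agent = payload['agent']

env = dmc.make('reacher_hard2', frame_stack=3, action_repeat=2, seed=1)
time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    with torch.no_grad():
        # eval_mode disables exploration noise in DrQ-v2-style agents
        action = agent.act(time_step.observation, step=0, eval_mode=True)
    time_step = env.step(action)
    episode_return += time_step.reward
print('episode return:', episode_return)
```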
- We reuse Denis Yarats's [DrQ-v2](https://github.com/facebookresearch/drqv2) code to train our RL agents.