Augmenting GAIL with BC for sample efficient imitation learning

Official implementation of the paper Augmenting GAIL with BC for sample efficient imitation learning in PyTorch.

It builds upon the PyTorch implementation of popular RL algorithms repository (readme below).


  1. Install the required packages from the requirements.txt file.
  2. Install this package with pip install -e .

Reproducing results

  • To reproduce the results for GAIL, run the corresponding script. Be sure to change the default log and model paths in a2c_ppo_acktr/ first. The general invocation is
./<method>.sh <Env> <steps>

where <method> corresponds to the following experiment/baseline:

method        Experiment/Baseline
gail          GAIL
baselinesbc   BC pretraining + GAIL finetuning
bcgail        Our method
redsail       RED & SAIL
alphamujoco   Ablation on the effect of \alpha
bcnogail      Ablation on the effect of BC + untrained GAIL

Use the following number of steps for each MuJoCo environment:

Environment     Steps
Ant-v2          3000000
HalfCheetah-v2  3000000
Hopper-v2       1000000
Walker2d-v2     3000000
Reacher-v2      2000000
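The table above can be encoded as a lookup. The helper below is a hypothetical convenience (it is not part of the repository) that maps each MuJoCo environment to its step budget and formats the `./<method>.sh <Env> <steps>` invocation shown earlier:

```python
# Hypothetical helper, not part of the repo: map each environment to the
# step budget from the table above and build the launch command string.
ENV_STEPS = {
    "Ant-v2": 3_000_000,
    "HalfCheetah-v2": 3_000_000,
    "Hopper-v2": 1_000_000,
    "Walker2d-v2": 3_000_000,
    "Reacher-v2": 2_000_000,
}

def launch_command(method: str, env: str) -> str:
    """Format a ./<method>.sh <Env> <steps> invocation."""
    return f"./{method}.sh {env} {ENV_STEPS[env]}"
```

For example, `launch_command("bcgail", "Hopper-v2")` yields `./bcgail.sh Hopper-v2 1000000`.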

If you like this work and want to use it in your research, please consider citing our paper (and the repository if you use the code; bibtex below):

      title={Augmenting GAIL with BC for sample efficient imitation learning}, 
      author={Rohit Jena and Changliu Liu and Katia Sycara},


Please use the hyperparameters from this readme. With other hyperparameters, things might not work (it's RL, after all)!

This is a PyTorch implementation of

  • Advantage Actor Critic (A2C), a synchronous deterministic version of A3C
  • Proximal Policy Optimization PPO
  • Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation ACKTR
  • Generative Adversarial Imitation Learning GAIL
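As an illustration of the advantage estimation that A2C and PPO share here when `--use-gae` is set, below is a minimal pure-Python sketch of Generalized Advantage Estimation (GAE); the function name and list-based signature are my own, not the repository's API:

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Sketch of GAE: rewards and values are per-step lists of the same
    length; last_value bootstraps the step after the rollout ends."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # TD error at step t
        delta = rewards[t] + gamma * next_value - values[t]
        # exponentially weighted sum of TD errors
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

With a single step, the advantage reduces to the one-step TD error, which is a quick sanity check on any implementation.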

Also see the OpenAI posts: A2C/ACKTR and PPO for more information.

This implementation is inspired by the OpenAI baselines for A2C, ACKTR and PPO. It uses the same hyperparameters and model architecture, since they were well tuned for Atari games.

Please use this bibtex if you want to cite this repository in your publications:

  author = {Kostrikov, Ilya},
  title = {PyTorch Implementations of Reinforcement Learning Algorithms},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{}},

Supported (and tested) environments (via OpenAI Gym)

I highly recommend PyBullet as a free open source alternative to MuJoCo for continuous control tasks.

All environments are operated using exactly the same Gym interface. See their documentation for a comprehensive list.

To use the DeepMind Control Suite environments, set the flag --env-name dm.<domain_name>.<task_name>, where domain_name and task_name are the name of a domain (e.g. hopper) and a task within that domain (e.g. stand) from the DeepMind Control Suite. Refer to their repo and their tech report for a full list of available domains and tasks. Other than setting the task, the API for interacting with the environment is exactly the same as for all the Gym environments thanks to dm_control2gym.
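The `dm.<domain_name>.<task_name>` naming convention above can be parsed mechanically. The small function below is an illustrative sketch (not the repository's code) of how such a flag decomposes into a domain and task:

```python
def parse_dm_env(env_name: str):
    """Split an --env-name value like 'dm.hopper.stand' into
    (domain_name, task_name); raises ValueError on other formats."""
    parts = env_name.split(".")
    if len(parts) != 3 or parts[0] != "dm":
        raise ValueError(f"not a DeepMind Control env name: {env_name!r}")
    _, domain_name, task_name = parts
    return domain_name, task_name
```

For instance, `parse_dm_env("dm.hopper.stand")` returns `("hopper", "stand")`.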


To install the requirements:

# PyTorch
conda install pytorch torchvision -c soumith

# Baselines for Atari preprocessing
git clone
cd baselines
pip install -e .

# Other requirements
pip install -r requirements.txt


Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first. Also see a todo list below.

I'm also looking for volunteers to run all experiments on Atari and MuJoCo (with multiple random seeds).


It's extremely difficult to reproduce results for Reinforcement Learning methods. See "Deep Reinforcement Learning that Matters" for more information. I tried to reproduce the OpenAI results as closely as possible. However, major differences in performance can be caused even by minor differences in the TensorFlow and PyTorch libraries.


  • Improve this README file. Rearrange images.
  • Improve the performance of KFAC
  • Run evaluation for all games and algorithms


To visualize the results, use visualize.ipynb.




python --env-name "PongNoFrameskip-v4"


python --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 4 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.01
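The `--clip-param 0.1` flag in the PPO command above controls the clipped surrogate objective. As a rough sketch (a simplified per-sample version, not the repository's vectorized PyTorch code), the loss looks like:

```python
def ppo_clip_loss(ratio, advantage, clip_param=0.1):
    """PPO clipped surrogate loss for one sample.
    ratio = pi_new(a|s) / pi_old(a|s)."""
    unclipped = ratio * advantage
    # clamp the ratio to [1 - clip_param, 1 + clip_param]
    clipped = max(min(ratio, 1 + clip_param), 1 - clip_param) * advantage
    # take the pessimistic (smaller) objective, negated because we minimize
    return -min(unclipped, clipped)
```

The clipping removes the incentive to move the policy ratio outside the trust interval, which is what makes multiple epochs of minibatch updates (here `--ppo-epoch`) safe.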


python --env-name "PongNoFrameskip-v4" --algo acktr --num-processes 32 --num-steps 20


Please always try to use the --use-proper-time-limits flag. It properly handles partial trajectories.


python --env-name "Reacher-v2" --num-env-steps 1000000


python --env-name "Reacher-v2" --algo ppo --use-gae --log-interval 1 --num-steps 2048 --num-processes 1 --lr 3e-4 --entropy-coef 0 --value-loss-coef 0.5 --ppo-epoch 10 --num-mini-batch 32 --gamma 0.99 --gae-lambda 0.95 --num-env-steps 1000000 --use-linear-lr-decay --use-proper-time-limits


ACKTR requires some modifications to be made specifically for MuJoCo. But at the moment I want to keep this code as unified as possible, so I'm still looking for better ways to integrate it into the codebase.


Load a pretrained model from my Google Drive.

Pretrained models for other games are also available on request. Send me an email or create an issue, and I will upload them.

Disclaimer: I might have used different hyper-parameters to train these models.


python --load-dir trained_models/a2c --env-name "PongNoFrameskip-v4"


python --load-dir trained_models/ppo --env-name "Reacher-v2"

