
A Ranking Game for Imitation Learning

SOTA algorithms for learning from demonstration + (optionally) learning from preferences

Accepted at Transactions on Machine Learning Research (TMLR), 2023.

Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

Paper, Blog Post


Citing

If you find this code useful, please cite it in your paper:

@article{
sikchi2022a,
title={A Ranking Game for Imitation Learning},
author={Harshit Sikchi and Akanksha Saran and Wonjoon Goo and Scott Niekum},
journal={Transactions on Machine Learning Research},
year={2022},
url={https://openreview.net/forum?id=d3rHk4VAf0},
note={}
}

Install

via Conda

conda env create -f environment.yml

Install D4RL from source from https://github.com/rail-berkeley/d4rl (before installing, comment out the dm_control dependency in its setup.py).

Download expert data

Download the expert data and preference data from https://drive.google.com/drive/folders/1KJayG61KqiHqtRbxnUrGPBxTX2oDccSn?usp=sharing and extract them inside the rank-game/expert_data/ folder.

File Structure

  • rank-game (Core method)
    • Training code (Online Imitation Learning from Observations): rank_game_lfo.py
    • Training code (Online Imitation Learning from Demonstrations): rank_game_lfd.py
    • Ranking Loss (PAL/RAL): reward_agent/ranking_losses.py (a sketch of the ranking-loss idea follows this list)
    • Policy Agent: policy_agent/sac.py
  • Environments: envs/
  • Configurations: configs/
  • Expert Data: expert_data/ (needs to be downloaded; see the Download expert data section above)
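
As a quick orientation, the sketch below illustrates the supervised ranking-loss idea behind reward_agent/ranking_losses.py (PAL and RAL refer to the Policy-as-Leader and Reward-as-Leader variants of the ranking game): per-step rewards of the lower-ranked behavior are regressed towards 0 and those of the higher-ranked behavior towards a constant k. This is a minimal illustration, not the repository implementation; the reward-network interface and tensor shapes are assumptions.

# Minimal sketch (not the repository code) of the supervised ranking loss:
# rewards of the lower-ranked behavior are pushed towards 0 and rewards of
# the higher-ranked behavior towards a constant k. The interface
# reward_net(obs, act) -> per-step reward is an assumption.
import torch
import torch.nn.functional as F

def ranking_loss(reward_net, worse_obs, worse_act, better_obs, better_act, k=1.0):
    r_worse = reward_net(worse_obs, worse_act)     # less-preferred behavior
    r_better = reward_net(better_obs, better_act)  # preferred behavior
    return (F.mse_loss(r_worse, torch.zeros_like(r_worse))
            + F.mse_loss(r_better, torch.full_like(r_better, k)))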

Instructions

  • All experiments should be run from the repository root folder.
  • Config files in configs/ are used to specify hyperparameters for each environment.
  • To reproduce the results in our paper, keep all other values in the .yml files consistent with the hyperparameters given in the paper.

Running Experiments

Imitation Learning with rank-game

The following command will reproduce the results in our paper without changing the config:

python rank_game_lfo.py --config=<configs/env_name.yml> --seed=<seed> --exp_name=<experiment name> --obj=<rank-pal-auto/rank-ral-auto> --regularization=<regularization> --expert_episodes=<number of expert episodes>

Note that, as indicated in the paper, training the reward function until the cross-validation loss saturates also works well but takes longer to train. Hence, our default configs train the reward function for a number of iterations that scales linearly with the size of the preference dataset.
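
As a purely illustrative note on that default schedule, the number of reward-update iterations might be computed along these lines (base_iters and iters_per_pair are hypothetical names, not config keys from this repository):

def reward_training_iterations(num_preference_pairs, base_iters=100, iters_per_pair=2):
    # Scale reward-training iterations linearly with the preference dataset size
    # instead of training until the cross-validation loss saturates.
    return base_iters + iters_per_pair * num_preference_pairs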
