Accepted in Transactions of Machine Learning Research (TMLR) 2023.
Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum
If you find this code useful, please cite it in your paper:
@article{
sikchi2022a,
title={A Ranking Game for Imitation Learning},
author={Harshit Sikchi and Akanksha Saran and Wonjoon Goo and Scott Niekum},
journal={Transactions on Machine Learning Research},
year={2022},
url={https://openreview.net/forum?id=d3rHk4VAf0},
note={}
}
conda env create -f environment.yml
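Then activate the environment. The name `rank-game` below is an assumption; use whatever `name:` field is set in `environment.yml`:

```bash
conda activate rank-game
```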
Install D4RL from source: https://github.com/rail-berkeley/d4rl (comment out the dm_control dependency in setup.py before installing).
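A minimal sketch of the from-source install (assuming an editable install; paths are illustrative):

```bash
git clone https://github.com/rail-berkeley/d4rl.git
cd d4rl
# Comment out the dm_control requirement in setup.py before installing.
pip install -e .
cd ..
```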
Download the expert data and preference data from this [link](https://drive.google.com/drive/folders/1KJayG61KqiHqtRbxnUrGPBxTX2oDccSn?usp=sharing) and extract them inside the rank-game/expert_data/ folder.
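If you prefer to fetch the data from the command line, one option is the `gdown` tool (an assumption on our part, not part of the original setup; downloading manually from the link above works just as well):

```bash
pip install gdown
# Download the Google Drive folder directly into rank-game/expert_data/
gdown --folder "https://drive.google.com/drive/folders/1KJayG61KqiHqtRbxnUrGPBxTX2oDccSn" -O expert_data/
```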
- rank-game (Core method)
  - Training code (Online Imitation Learning from Observations): `rank_game_lfo.py`
  - Training code (Online Imitation Learning from Demonstrations): `rank_game_lfd.py`
  - Ranking Loss (PAL/RAL): `reward_agent/ranking_losses.py`
  - Policy Agent: `policy_agent/sac.py`
  - Environments: `envs/`
  - Configurations: `configs/`
  - Expert Data: `expert_data/` (needs to be downloaded; see the link above)
- All experiments should be run from the repository root folder.
- Config files in `configs/` specify the hyperparameters for each environment.
- To reproduce the results in our paper, keep the values in the yml files consistent with the hyperparameters given in the paper.
The following command will reproduce the results in our paper without changing the config:
python rank_game_lfo.py --config=configs/<env_name>.yml --seed=<seed> --exp_name=<experiment name> --obj=<rank-pal-auto/rank-ral-auto> --regularization=<regularization weight> --expert_episodes=<number of expert episodes>
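For example, a single run might look like the sketch below; the config filename, experiment name, and argument values are illustrative assumptions, not settings prescribed by the paper or the default configs:

```bash
# Illustrative invocation only; substitute the config, seed, objective
# (rank-pal-auto or rank-ral-auto), and data settings you actually want.
python rank_game_lfo.py \
    --config=configs/hopper.yml \
    --seed=0 \
    --exp_name=hopper_rank_pal_seed0 \
    --obj=rank-pal-auto \
    --regularization=0.5 \
    --expert_episodes=1
```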
Note that, as indicated in the paper, training the reward function until the cross-validation loss saturates also works well but takes longer. Hence, our default configs train the reward function for a number of iterations that scales linearly with the preference dataset size.