
Beyond-Demonstration

CSE-598 Perception in Robotics project, ASU

Implementation of the T-REX and D-REX IRL algorithms, which learn a reward function capturing the expert's intention from demonstrations. The learned reward can then be used to train a policy that performs better than the demonstrations. Based on the stable_baselines3 and imitation packages.
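
As a rough illustration of the second step, the sketch below wraps an environment so that RL training (e.g. with stable_baselines3) optimizes the learned reward instead of the ground-truth one. The `LearnedRewardWrapper` name and the assumption that `reward_net` maps an observation tensor to a scalar reward are illustrative, not the repository's exact interface.

```python
# Minimal sketch (not the repository's exact interface): wrap an environment so
# that RL training optimizes the learned reward instead of the ground-truth one.
# Assumes the classic gym step API (obs, reward, done, info).
import gym
import torch


class LearnedRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_net):
        super().__init__(env)
        self.reward_net = reward_net

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # drop the true reward
        with torch.no_grad():
            reward = self.reward_net(
                torch.as_tensor(obs, dtype=torch.float32)).item()
        return obs, reward, done, info


# Example usage with stable_baselines3:
# from stable_baselines3 import SAC
# env = LearnedRewardWrapper(gym.make("Hopper-v3"), reward_net)
# SAC("MlpPolicy", env).learn(total_timesteps=1_000_000)
```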

Given a dataset of ranked sub-optimal demonstrations, a state-dependent reward function can be recovered by training a neural network to prefer better-ranked demonstrations over worse ones. The Bradley-Terry and Luce-Shepard models of choice are used to train such reward models from preferences. Ranked trajectories can be generated by injecting different levels of noise into a behavioral cloning (BC) policy trained on the demonstrations.
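
A minimal PyTorch sketch of that preference objective is shown below: the predicted returns of a worse and a better trajectory are treated as logits, and the Bradley-Terry likelihood of the better one being preferred is maximized. The `reward_net` here is a generic per-state reward model; the function is illustrative and omits the discounting, noise probability, and clipping mentioned in the next section.

```python
# Minimal sketch of the Bradley-Terry / Luce-Shepard preference loss.
# `reward_net` maps a (T, obs_dim) tensor of states to per-state rewards;
# traj_better is assumed to be ranked above traj_worse.
import torch
import torch.nn.functional as F


def preference_loss(reward_net, traj_worse, traj_better):
    return_worse = reward_net(traj_worse).sum()    # predicted return of worse rollout
    return_better = reward_net(traj_better).sum()  # predicted return of better rollout
    logits = torch.stack([return_worse, return_better]).unsqueeze(0)
    # Label 1 = "the second (better) trajectory is preferred".
    return F.cross_entropy(logits, torch.tensor([1]))
```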

Example of 3 ranked trajectories generated by noise injection: a high-noise rollout (least preferred), a mid-noise rollout (more preferred), and a noise-free rollout (most preferred).
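
A rough sketch of the noise-injection procedure is shown below, assuming `bc_policy` follows the stable_baselines3 `predict` convention and the environment uses the classic gym step API; both names are illustrative. With probability `noise`, the BC action is replaced by a random one, so higher noise levels yield lower-ranked trajectories.

```python
# Minimal sketch: with probability `noise`, replace the BC policy's action with a
# random one, so higher noise yields worse (lower-ranked) trajectories.
import numpy as np


def rollout_with_noise(env, bc_policy, noise, horizon=1000):
    obs, states = env.reset(), []
    for _ in range(horizon):
        if np.random.rand() < noise:
            action = env.action_space.sample()               # random (noisy) action
        else:
            action, _ = bc_policy.predict(obs, deterministic=True)
        states.append(obs)
        obs, _, done, _ = env.step(action)
        if done:
            obs = env.reset()                                # fixed-horizon rollouts
    return np.array(states)


# Trajectories ranked by noise level: more noise => less preferred.
# ranked = [rollout_with_noise(env, bc_policy, eps) for eps in (0.5, 0.2, 0.0)]
```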

Changes to Baseline

We implement the IRL algorithm using the tools available in the imitation library. Notable changes from the paper's implementation are:

  • Luce preference with discount_factor, noise_prob, clipped reward differences (ideas from DRLHP): Comes with imitation library
  • Mixed sampling: New preference dataset generated every epoch
  • Fixed horizon rollouts for ranked_trajectories: Horizon length is 1000 steps
  • Input normalization in reward network (similar to batch normalization): Comes with imitation library
  • Single reward function: No ensemble
  • Reward scaling with tanh, optimized with AdamW: Scaled reward improves stability (see the sketch after this list)
  • Entropy regularized actor critic policy for BC: Comes with imitation library
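
The tanh scaling and AdamW choice from the list above might look like the sketch below; the network sizes and learning rate are illustrative assumptions, not the repository's exact configuration.

```python
# Minimal sketch of a tanh-scaled reward network trained with AdamW.
import torch
import torch.nn as nn


class ScaledRewardNet(nn.Module):
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        # tanh keeps predicted rewards in [-1, 1], which stabilizes training.
        return torch.tanh(self.net(obs)).squeeze(-1)


# reward_net = ScaledRewardNet(obs_dim=11)  # e.g. Hopper observation size
# optimizer = torch.optim.AdamW(reward_net.parameters(), lr=3e-4)
```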

Other possible improvements

  • Use custom reward architectures (RNN, attention)
  • A better preference loss (aLRP)

Results

Better-than-demonstrator performance was observed for the HalfCheetah-v3 environment. For Hopper-v3, the learned policy performed on par with the demonstrations.

Rollout comparison: demonstration vs. learned policy.

Ground-truth vs. predicted reward correlation (unscaled) for Hopper and HalfCheetah.
