
Latent-variable Advantage-weighted Policy Optimization for Offline Reinforcement Learning

This is a PyTorch implementation of the paper Latent-variable Advantage-weighted Policy Optimization for Offline Reinforcement Learning (LAPO) on the D4RL dataset.

[Figure: LAPO framework]
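As the method's name suggests, LAPO combines a latent-variable (conditional VAE) model of dataset actions with advantage weighting, so the learned latent action space concentrates on high-value behavior. The snippet below is a minimal, self-contained PyTorch sketch of such an objective, not this repository's code; the module names, shapes, and clipping constants are illustrative assumptions (the kl_beta default mirrors the --kl_beta flag used in the scripts below).

import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Conditional VAE over dataset actions: encoder q(z|s,a), decoder pi(a|s,z)."""
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))     # outputs (mu, log_std)
        self.dec = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, s, a):
        mu, log_std = self.enc(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        log_std = log_std.clamp(-4, 2)
        std = log_std.exp()
        z = mu + std * torch.randn_like(std)        # reparameterization trick
        a_hat = self.dec(torch.cat([s, z], dim=-1))
        # KL( N(mu, std) || N(0, I) ), summed over latent dimensions
        kl = 0.5 * (mu.pow(2) + std.pow(2) - 2.0 * log_std - 1.0).sum(-1)
        return a_hat, kl

def lapo_style_loss(model, s, a, advantage, kl_beta=0.3):
    """Advantage-weighted reconstruction plus KL regularization (illustrative)."""
    a_hat, kl = model(s, a)
    # AWR-style weight: high-advantage dataset actions dominate the update.
    w = torch.exp(advantage).clamp(max=100.0).detach()
    recon = F.mse_loss(a_hat, a, reduction='none').sum(-1)
    return (w * recon + kl_beta * kl).mean()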

Requirements

Scripts for the D4RL dataset

Maze2d: maze2d-umaze/medium/large-v1

$ python main_d4rl.py --env_name maze2d-umaze-v1 --kl_beta 0.3 --plot

Antmaze: antmaze-umaze/medium/large-diverse-v1

$ python main_d4rl.py --env_name antmaze-umaze-diverse-v1 --doubleq_min 0.7 --plot

Mujoco locomotion: hopper/walker2d/halfcheetah-random/medium/expert-v2

$ python main_d4rl.py --env_name hopper-random-v2

Kitchen: kitchen-complete/partial/mixed-v0

$ python main_d4rl.py --env_name kitchen-complete-v0
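The flags above are specific to this repository's main_d4rl.py; their exact semantics are best checked in its argument parser. As a hedged illustration only, --doubleq_min reads like the soft clipped double-Q weight popularized by BCQ, blending the min and max of twin critic estimates. A hypothetical sketch, assuming that reading:

import torch

def blended_q(q1, q2, doubleq_min=0.7):
    # Weight the pessimistic min of the twin critics against the optimistic
    # max; doubleq_min = 1.0 recovers standard clipped double Q-learning,
    # while smaller values are less pessimistic.
    return doubleq_min * torch.min(q1, q2) + (1.0 - doubleq_min) * torch.max(q1, q2)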

Expected results

You should get the following results using --seed 123 (red), 456 (green), and 789 (blue):

[Figure: expected learning curves for the three seeds]
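For example, to reproduce a single curve, add the --seed flag mentioned above to any of the commands listed earlier:

$ python main_d4rl.py --env_name maze2d-umaze-v1 --kl_beta 0.3 --seed 123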

Citing

If you find this code useful, please cite our paper:

@article{chen2022latent,
  title={Latent-Variable Advantage-Weighted Policy Optimization for Offline RL},
  author={Chen, Xi and Ghadirzadeh, Ali and Yu, Tianhe and Gao, Yuan and Wang, Jianhao and Li, Wenzhe and Liang, Bin and Finn, Chelsea and Zhang, Chongjie},
  journal={arXiv preprint arXiv:2203.08949},
  year={2022}
}

