
Latent-variable Advantage-weighted Policy Optimization for Offline Reinforcement Learning

This is a PyTorch implementation of the paper Latent-variable Advantage-weighted Policy Optimization for Offline Reinforcement Learning (LAPO) on the D4RL dataset.

[Figure: LAPO framework]
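As the method's name suggests, LAPO combines a latent-variable (conditional VAE) model of dataset actions with advantage weighting, so the learned latent action space concentrates on high-value behavior. The snippet below is a minimal, self-contained PyTorch sketch of such an objective, not this repository's code; the module names, shapes, and clipping constants are illustrative assumptions (the kl_beta default mirrors the --kl_beta flag used in the scripts below).

import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Conditional VAE over dataset actions: encoder q(z|s,a), decoder pi(a|s,z)."""
    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))     # outputs (mu, log_std)
        self.dec = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, s, a):
        mu, log_std = self.enc(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        log_std = log_std.clamp(-4, 2)
        std = log_std.exp()
        z = mu + std * torch.randn_like(std)        # reparameterization trick
        a_hat = self.dec(torch.cat([s, z], dim=-1))
        # KL( N(mu, std) || N(0, I) ), summed over latent dimensions
        kl = 0.5 * (mu.pow(2) + std.pow(2) - 2.0 * log_std - 1.0).sum(-1)
        return a_hat, kl

def lapo_style_loss(model, s, a, advantage, kl_beta=0.3):
    """Advantage-weighted reconstruction plus KL regularization (illustrative)."""
    a_hat, kl = model(s, a)
    # AWR-style weight: high-advantage dataset actions dominate the update.
    w = torch.exp(advantage).clamp(max=100.0).detach()
    recon = F.mse_loss(a_hat, a, reduction='none').sum(-1)
    return (w * recon + kl_beta * kl).mean()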

Requirements

Scripts for the D4RL dataset

Maze2d: maze2d-umaze/medium/large-v1

$ python main_d4rl.py --env_name maze2d-umaze-v1 --kl_beta 0.3 --plot

Antmaze: antmaze-umaze/medium/large-diverse-v1

$ python main_d4rl.py --env_name antmaze-umaze-diverse-v1 --doubleq_min 0.7 --plot

Mujoco locomotion: hopper/walker2d/halfcheetah-random/medium/expert-v2

$ python main_d4rl.py --env_name hopper-random-v2

Kitchen: kitchen-complete/partial/mixed-v0

$ python main_d4rl.py --env_name kitchen-complete-v0
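The flags above are specific to this repository's main_d4rl.py; their exact semantics are best checked in its argument parser. As a hedged illustration only, --doubleq_min reads like the soft clipped double-Q weight popularized by BCQ, blending the min and max of twin critic estimates. A hypothetical sketch, assuming that reading:

import torch

def blended_q(q1, q2, doubleq_min=0.7):
    # Weight the pessimistic min of the twin critics against the optimistic
    # max; doubleq_min = 1.0 recovers standard clipped double Q-learning,
    # while smaller values are less pessimistic.
    return doubleq_min * torch.min(q1, q2) + (1.0 - doubleq_min) * torch.max(q1, q2)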

Expected results

You should get the following results using --seed 123 (red), 456 (green), and 789 (blue):

[Figure: expected learning curves for the three seeds]
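For example, to reproduce a single curve, add the --seed flag mentioned above to any of the commands listed earlier:

$ python main_d4rl.py --env_name maze2d-umaze-v1 --kl_beta 0.3 --seed 123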

Citing

If you find this code useful, please cite our paper:

@article{chen2022latent,
  title={Latent-Variable Advantage-Weighted Policy Optimization for Offline RL},
  author={Chen, Xi and Ghadirzadeh, Ali and Yu, Tianhe and Gao, Yuan and Wang, Jianhao and Li, Wenzhe and Liang, Bin and Finn, Chelsea and Zhang, Chongjie},
  journal={arXiv preprint arXiv:2203.08949},
  year={2022}
}

