Meta-Constrained Policy Optimization (Meta-CPO) for Safe and Fast Adaptation in Nonstationary Domains

This repository adapts the CPO algorithm into a meta-learning framework. The key modification leverages Differentiable Convex Programming to relax the gradient computation between the meta- and locally adapted parameters, and CPO is embedded into the model-free meta-learning scheme introduced by MAML. The algorithm is evaluated in Safety Gymnasium, which offers an intuitive experimental platform for demonstrating its effectiveness on autonomous-driving-style tasks. For the theory behind the safety guarantees, please refer to our paper, Constrained Meta-RL with DCO.
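
As a minimal, self-contained illustration of this idea (a sketch under assumed names, dimensions, and values, not code from this repository), a linearized CPO-style subproblem can be expressed as a differentiable convex layer with cvxpylayers, so the adapted policy parameters remain differentiable with respect to the quantities that define the inner update, which is what a MAML-style outer loop needs:

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 8          # update-direction dimension (illustrative)
delta = 0.01   # trust-region size (assumed)
d_max = 0.0    # cost-constraint budget (assumed)

step = cp.Variable(n)
g = cp.Parameter(n)            # gradient of the surrogate reward objective
b = cp.Parameter(n)            # gradient of the surrogate cost
c = cp.Parameter(1)            # current constraint violation
F_sqrt = cp.Parameter((n, n))  # square root of the Fisher/KL metric

problem = cp.Problem(
    cp.Minimize(-g @ step),
    [
        cp.sum_squares(F_sqrt @ step) <= delta,  # quadratic KL trust region
        c + b @ step <= d_max,                   # linearized safety constraint
    ],
)
layer = CvxpyLayer(problem, parameters=[g, b, c, F_sqrt], variables=[step])

# Dummy inputs; in Meta-CPO these would be policy-gradient estimates computed
# on trajectories sampled from the current task.
g_t = torch.randn(n, dtype=torch.float64, requires_grad=True)
b_t = torch.randn(n, dtype=torch.float64, requires_grad=True)
c_t = torch.tensor([-0.05], dtype=torch.float64, requires_grad=True)
F_t = torch.eye(n, dtype=torch.float64, requires_grad=True)

(step_t,) = layer(g_t, b_t, c_t, F_t)
step_t.sum().backward()  # gradients flow back through the convex solver

Because the solution of the convex program is differentiable in g, b, c, and F_sqrt, an outer (meta) optimizer can backpropagate through the inner adaptation step.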

Citing Meta-CPO

If you find Meta-CPO useful and informative, please cite it in your publications.

@inproceedings{cho2024constrained,
  title={Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming},
  author={Cho, Minjae and Sun, Chuangchuang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={19},
  pages={20975--20983},
  year={2024}
}

Pre-requisites

Usage

To create a conda environment, run the following commands:

conda create --name myenv python=3.10
conda activate myenv

Then, install the required packages using pip:

pip install -r requirements.txt

Testing domains for the Button and Circle tasks are already implemented in the safety_gymnasium folder (under safety_gymnasium/tasks/safe_navigation/) to evaluate adaptive performance. To evaluate or replicate in different environmental settings, implement your own custom environments and refer to the Custom Environment section below. To conduct experiments, choose a desired agent and set the environment to either Safety[Agent]Stcircle or Safety[Agent]Stbutton by specifying env_name in utils/apr_parse.py, then execute the following command with appropriate hyperparameter settings:

python3 main.py
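
For a quick sanity check that a task is available before launching a full run, something like the following should work. This is an assumption rather than part of the repository's workflow, and the exact registered environment ID (agent name, level, and version suffix) must be taken from the vendored safety_gymnasium registration:

import safety_gymnasium

# Hypothetical ID following the Safety[Agent]Stcircle naming scheme above;
# replace it with the ID actually registered in the vendored safety_gymnasium.
env = safety_gymnasium.make("SafetyPointStcircle0-v0")
obs, info = env.reset(seed=0)
obs, reward, cost, terminated, truncated, info = env.step(env.action_space.sample())
print(obs.shape, reward, cost)
env.close()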

Custom Environment

To create your own custom environments, refer to the Safety Gymnasium documentation, our implementation in safety_gymnasium/tasks/safe_navigation/, and Table 1 of our paper. In our implementation, task_level_0 is a fixed environment used to evaluate training performance, task_level_1 generates environments with stochastic environmental parameters, and task_level_2 generates the meta-testing environment. Other minor changes may be required to adapt to the safety_gymnasium package.
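
As a rough illustration of how such levels can be organized (a sketch only; the class names, module paths, and helper methods below are assumptions to be checked against the safety_gymnasium version vendored in this repository), a stochastic level-1 task can inherit a fixed level-0 layout and resample its environmental parameters on construction:

import random

# Assumed import paths following the Safety Gymnasium task layout; adjust to
# the actual files under safety_gymnasium/tasks/safe_navigation/.
from safety_gymnasium.assets.geoms import Hazards
from safety_gymnasium.tasks.safe_navigation.circle.circle_level0 import CircleLevel0


class StCircleLevel1(CircleLevel0):
    """Circle task whose hazard layout is resampled per environment instance."""

    def __init__(self, config):
        super().__init__(config=config)
        # Stochastic environmental parameters (illustrative ranges); a fixed
        # level-0 task would hard-code these, while a level-2 task would draw
        # them from a held-out range for meta-testing.
        num_hazards = random.randint(2, 6)
        hazard_size = random.uniform(0.2, 0.4)
        self._add_geoms(Hazards(num=num_hazards, size=hazard_size))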

Simulation

PB_CPOMeta.mp4
PB_TRPOMeta.mp4
PB_CPO.mp4

Code Reference
