Multiplicative Value Function for Safe RL

We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic. The safety critic predicts the probability of constraint violation and discounts the reward critic, which estimates only constraint-free returns. By splitting responsibilities between the two critics, we simplify the learning task, leading to increased sample efficiency.
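The core idea fits in a few lines. The sketch below is illustrative only, not the authors' implementation; reward_critic and safety_critic are hypothetical callables standing in for the two learned critics:

def multiplicative_value(reward_critic, safety_critic, obs):
    v_reward = reward_critic(obs)  # expected return, assuming no constraint violation
    p_safe = safety_critic(obs)    # probability in [0, 1] of staying constraint-free
    return p_safe * v_reward       # reward value discounted by the risk of violation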

A Multiplicative Value Function for Safe and Efficient Reinforcement Learning
Nick Bührer, Zhejun Zhang, Alexander Liniger, Fisher Yu and Luc Van Gool.

IROS 2023
Project Website with Videos
arXiv Paper

@inproceedings{buehrer2023saferl,
  title     = {A Multiplicative Value Function for Safe and Efficient Reinforcement Learning},
  author    = {B{\"u}hrer, Nick and Zhang, Zhejun and Liniger, Alexander and Yu, Fisher and Van Gool, Luc},
  booktitle = {International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2023}
}

Installation

Create the conda environment by running

conda env create -f conda_env.yaml

Note that our implementation is built upon stable-baselines3 1.2.0 (as pinned in conda_env.yaml) and might not work with newer versions.
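To confirm that the pinned version is active in the new environment, a standard version check works (no repo-specific assumptions):

python -c "import stable_baselines3; print(stable_baselines3.__version__)"  # expect 1.2.0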

Running experiments

All experiments can be launched from main.py. For experiment configuration, we use Hydra. Currently, the following environments are supported:

  • Lunar Lander Safe
  • Car Racing Safe
  • Point Robot Navigation

To run PPO Mult V1 in Lunar Lander Safe, execute:

python main.py +lunar_lander=ppo_mult_v1

To run the Lagrangian baseline PPO Lagrange in Car Racing Safe, execute:

python main.py +car_racing=ppo_lagrange

All experiment configs can be found under the experiments folder. For the Lunar Lander example above, the config is located at experiments/lunar_lander/ppo_mult_v1.yaml.
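Hydra also supports overriding individual config values directly from the command line. The key name below (seed) is illustrative and assumes such a key exists in the corresponding config file:

python main.py +lunar_lander=ppo_mult_v1 seed=1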
