This repository contains the code to replicate the experiments of the paper "ROSARL: Reward-Only Safe Reinforcement Learning". The paper introduces a framework for safe RL in which an agent learns safe policies solely from scalar rewards, using any suitable RL algorithm. This is achieved by replacing the rewards at unsafe terminal states with the minmax penalty, the strict upper-bound reward under which the optimal policy minimises the probability of reaching unsafe states.

ROSARL is compatible with any RL algorithm: simply estimate the minmax penalty during learning and replace the environment's rewards at unsafe states with it. See learning_minmax_penalty.py for a simple method that estimates the minmax penalty during learning from the value function already being learned by the RL algorithm.
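For intuition, here is a minimal sketch of that idea. The class and function names below are illustrative, not the repository's API (the actual estimator lives in learning_minmax_penalty.py): track the range of the critic's value estimates during training and substitute a penalty below that range for the reward at unsafe terminal states.

```python
# Hypothetical sketch; names are illustrative, not the repository's API.

class MinmaxPenaltyEstimator:
    """Estimate a penalty from the running range of observed value estimates."""

    def __init__(self):
        self.v_min = float("inf")
        self.v_max = float("-inf")

    def update(self, value):
        """Update the running value range with a critic estimate V(s)."""
        self.v_min = min(self.v_min, float(value))
        self.v_max = max(self.v_max, float(value))

    def penalty(self):
        """One simple choice: one value-range below the smallest value seen."""
        if self.v_min > self.v_max:  # no values observed yet
            return 0.0
        return self.v_min - (self.v_max - self.v_min)


def shape_reward(reward, done, unsafe, estimator):
    """Replace the reward at unsafe terminal states with the penalty estimate."""
    if done and unsafe:
        return estimator.penalty()
    return reward
```

Because the penalty estimate is updated as the value function improves, the substituted reward alone discourages visits to unsafe states; no separate cost signal or constraint threshold is needed.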
These experiments use the Safety Gym code (modified to include environments that terminate when unsafe states are reached) and the Safety Starter Agents code (modified to include TRPO-Minmax, which is TRPO modified to use the learned minmax penalty).
First install OpenAI's mujoco-py (and the MuJoCo binaries it requires). Then install the required packages:
pip install -r requirements.txt
cd safety_ai_gym/safety-gym
pip install -e .
cd ../safety-starter-agents
pip install -e .
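To sanity-check the installation, importing safety_gym should register the modified environments. A quick check, using the environment ID from the example script below:

```python
import gym
import safety_gym  # importing registers the Safexp-* environments

# Terminating variant of PointGoal1 from the modified Safety Gym.
env = gym.make('Safexp-PointGoal1-TerminalUnsafe-v0')
print(env.observation_space, env.action_space)
```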
Example Script: To run TRPO-Minmax on the Safexp-PointGoal1-TerminalUnsafe-v0
environment from Safety Gym, using neural networks of size (64,64):
```python
from safe_rl import trpo_minmax
import gym, safety_gym

trpo_minmax(
    env_fn=lambda: gym.make('Safexp-PointGoal1-TerminalUnsafe-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)
```
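The baseline algorithms listed under ALGO further down are exposed through the same interface in Safety Starter Agents, so, assuming the unmodified call signature, a comparison run on the same environment looks like:

```python
from safe_rl import trpo_lagrangian
import gym, safety_gym

# Baseline comparison: TRPO-Lagrangian on the same terminating environment,
# with the same network sizes as the TRPO-Minmax example above.
trpo_lagrangian(
    env_fn=lambda: gym.make('Safexp-PointGoal1-TerminalUnsafe-v0'),
    ac_kwargs=dict(hidden_sizes=(64, 64)),
)
```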
Reproduce Experiments from Paper: To reproduce an experiment from the paper, run the following (a concrete example invocation is given after the argument descriptions below):
cd /path/to/safety-starter-agents/scripts
python experiment.py --algo ALGO --task TASK --robot ROBOT --unsafe_terminate UNSAFE_TERMINATE --seed SEED --exp_name EXP_NAME --cpu CPU
where:

- ALGO is in ['ppo', 'ppo_lagrangian', 'trpo', 'trpo_lagrangian', 'cpo'].
- TASK is in ['goal1', 'goal2', 'button1', 'button2', 'push1', 'push2'].
- ROBOT is in ['point', 'car', 'doggo'].
- UNSAFE_TERMINATE is in [0, 1, 2]. 0 uses the original safety-gym environments, which do not terminate when unsafe states are reached; 2 uses the modified safety-gym environments, which terminate when unsafe states are reached.
- SEED is an integer. In the paper experiments we used seeds of 0, 10, and 20, but results may not reproduce perfectly deterministically across machines.
- CPU is an integer for how many CPUs to parallelize across.
- EXP_NAME is an optional argument for the name of the folder where results will be saved. The save folder will be placed in /path/to/safety-starter-agents/data.
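For example, a single run of standard TRPO on the point robot's goal1 task with terminating unsafe states, seed 0, 4 CPUs, and an arbitrary experiment name would be:
python experiment.py --algo trpo --task goal1 --robot point --unsafe_terminate 2 --seed 0 --exp_name trpo_point_goal1 --cpu 4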
Plot Results: Plot results with:
cd /path/to/safety-starter-agents/scripts
python plot.py data/path/to/experiment
Watch Trained Policies: Test policies with:
cd /path/to/safety-starter-agents/scripts
python enjoy.py data/path/to/experiment
Example rollouts (one success and one failure case) for TRPO, TRPO Lagrangian, CPO, and TRPO Minmax (Ours). (Video table not reproduced here.)
@article{NangueTasse2023,
author = {Nangue Tasse, Geraud and Love, Tamlin and Nemecek, Mark and James, Steven and Rosman, Benjamin},
title = {{ROSARL: Reward-Only Safe Reinforcement Learning}},
year = {2023}
}