This repository contains the code used for the following publications:
- Probabilistic Guarantees for Safe Deep Reinforcement Learning (FORMATS 2020)
- Verifying Reinforcement Learning up to Infinity (IJCAI 2021)
- Verified Probabilistic Policies for Deep Reinforcement Learning (NFM 2022)
These instructions will help with setting up the project
Create a virtual environment with conda:
conda env create -f environment.yml
conda activate safedrl
This will take care of installing all the dependencies needed by python
In addition, download PRISM from the following link: https://github.com/phate09/prism
Ensure you have Gradle installed (https://gradle.org/install/)
Code has been tested using Gradle 5.0 and OpenJDK 11
It seems that according to this post gradle changed standard for defining configurations so gradle will not work for version >=7.0
Before running any code, in a new terminal go to the PRISM project folder and run
gradle run
This will enable the communication channel between PRISM and the rest of the repository
Run the train_pendulum.py
inside agents/dqn
to train the agent on the inverted
pendulum problem and record the location of the saved agent
Run the domain_analysis_sym.py
inside runnables/symbolic/dqn
changing paths to
point to the saved network
download and unzip experiment_collection_final.zip in the 'save' directory
run tensorboard --logdir=./save/experiment_collection_final
(results for the output range analysis experiments are in experiment_collection_ora_final.zip)
run either:
training/tune_train_PPO_bouncing_ball.py
training/tune_train_PPO_car.py
training/tune_train_PPO_cartpole.py
download and unzip pretrained_agents.zip in the 'save' directory
run verification/run_tune_experiments.py
(to monitor the progress of the algorithm
run tensorboard --logdir=./save/experiment_collection_final
)
The results in tensorboard can be filtered using regular expressions (eg. " bouncing_ball.* template: 0") on the search bar on the left:
The name of the experiment contains the name of the problem (bouncing_ball, cartpole, stopping car), the amount of adversarial noise ("eps", only for stopping_car), the time steps length for the dynamics of the system ("tau", only for cartpole) and the choice of restriction in order of complexity (0 being box, 1 being the chosen template, and 2 being octagon).
The table in the paper is filled by using some of the metrics reported in tensorboard:
- max_t: Avg timesteps
- seen: Avg polyhedra
- time_since_restore: Avg clock time (s)