Verified Safe Reinforcement Learning for Neural Network Dynamic Models (NeurIPS 2024)

1. Run `generate_grid.py` to generate the grid for verification.
2. Train a vanilla controller: in `moving_obs/train.py`, set Line 170 to False (`use_reachability = False`) and comment out Line 149 (`ppo_agent.load`).
3. Train with bounds: set Line 170 to True (`use_reachability = True`) and load the checkpoint from the vanilla controller via `ppo_agent.load`.
4. The controller for each k-step reachability safety level is stored in the `outputs` folder. If a controller is not fully verified for a given k, its filename ends with the suffix `_not_verified.pth`.
5. Stop training if you observe a significant decrease in reward or once you reach the target verification step. Then run `check_collide.py` to verify the safety of the desired input region.
6. Based on the results from `check_collide.py` (modify `target_steps` as needed), split the input region and continue from Step 3 for each input region cluster, loading from a selected checkpoint.
7. Stop when all input regions are verified safe for their corresponding controllers.
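Because unverified controllers are marked by the `_not_verified.pth` suffix, it can help to list which checkpoints in the outputs folder passed verification before picking one to load. The following is a minimal sketch, not part of this repository: the function name `split_checkpoints` is hypothetical, and it assumes that checkpoints sit directly in the outputs folder and that verified checkpoints also use the `.pth` extension.

```python
from pathlib import Path

# Suffix described in this README for controllers that are not fully verified.
NOT_VERIFIED_SUFFIX = "_not_verified.pth"


def split_checkpoints(output_dir="outputs"):
    """Return (verified, not_verified) lists of checkpoint paths, sorted by name.

    Hypothetical helper: assumes every checkpoint is a .pth file located
    directly inside output_dir.
    """
    verified, not_verified = [], []
    for path in sorted(Path(output_dir).glob("*.pth")):
        if path.name.endswith(NOT_VERIFIED_SUFFIX):
            not_verified.append(path)
        else:
            verified.append(path)
    return verified, not_verified
```

Calling `split_checkpoints("outputs")` returns two lists of `Path` objects; the first holds candidates for `ppo_agent.load`, the second holds controllers that still need further training or a smaller input region.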

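The README does not specify how the input region is split. As one illustrative possibility, the sketch below bisects an axis-aligned box along its widest dimension. The function name `bisect_region` and the representation of a region as lower and upper bound lists are assumptions for illustration; the clustering actually used with `check_collide.py` may differ.

```python
def bisect_region(lower, upper):
    """Split the axis-aligned box [lower, upper] in half along its widest dimension.

    Hypothetical helper: lower and upper are equal-length sequences of floats.
    Returns two (lower, upper) pairs covering the original box.
    """
    widths = [u - l for l, u in zip(lower, upper)]
    axis = widths.index(max(widths))  # first widest dimension
    mid = (lower[axis] + upper[axis]) / 2.0

    left_upper = list(upper)
    left_upper[axis] = mid
    right_lower = list(lower)
    right_lower[axis] = mid
    return (list(lower), left_upper), (right_lower, list(upper))
```

Each returned sub-box can then be treated as its own input region when continuing training from a selected checkpoint.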