This is the official implementation of B2R, a method for offline safe reinforcement learning that fixes a core "symmetry fallacy" in Decision Transformer-style safe-RL models. B2R is evaluated on the DSRL benchmark.

The idea: stop treating safety and reward symmetrically. Safety is a hard boundary, while reward is a flexible target. B2R implements this insight via Boundary-to-Region supervision, which realigns every cost-to-go signal to the true safety budget, providing dense, region-wide supervision without changing the model's architecture.
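The realignment at the heart of Boundary-to-Region supervision can be sketched roughly as follows. This is a minimal reading of the description above, not the repository's actual code; the function name `relabel_cost_to_go` and its signature are assumptions.

```python
import numpy as np

def relabel_cost_to_go(costs, cost_limit):
    """Illustrative Boundary-to-Region relabeling (hypothetical helper):
    instead of conditioning each trajectory on its own realized cost-to-go,
    realign the initial cost-to-go token to the shared safety budget
    `cost_limit`, so every safe trajectory supervises the same boundary
    signal while per-step decrements stay intact."""
    costs = np.asarray(costs, dtype=np.float64)
    # Standard suffix sum: C_t = sum of costs from step t to the end.
    ctg = np.cumsum(costs[::-1])[::-1]
    # Shift the whole sequence so that C_0 equals the budget.
    return ctg + (cost_limit - ctg[0])
```

For example, a trajectory with per-step costs `[1, 2, 3]` and a budget of 20 yields the realigned sequence `[20, 19, 17]`: the conditioning starts at the budget and decreases by the incurred cost at each step.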
- Fixes a Core Flaw: Solves the brittle deployment and sparse signal problems caused by the "symmetry fallacy."
- Zero Architectural Change: Drastically improves safety without modifying the Transformer model or its objective function.
- SOTA Safety & Performance:
  - Satisfies the safety constraint in 35/38 tasks.
  - Achieves the highest reward in 20/38 tasks.
  - Consistently safer than CDT while delivering competitive rewards.
- Robust: Degrades gracefully even when safe data is scarce (down to 5-20%).
- Flexible: A single model can be trained on multiple safety budgets simultaneously, enabling it to satisfy different constraints at deployment.
In short: B2R adds robust safety guarantees to the simple and powerful Decision Transformer framework by fundamentally changing how cost signals are supervised.
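The multi-budget flexibility noted above can be sketched as per-trajectory budget sampling during training. The budget set and function names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
BUDGETS = np.array([10.0, 20.0, 40.0])  # illustrative budget set, not from the paper

def batch_conditioning(cost_batch, budgets=BUDGETS):
    """Hypothetical multi-budget conditioning: each trajectory in the batch
    is supervised against a budget sampled from `budgets`, so a single
    model learns to satisfy any of them at deployment."""
    cond = []
    for costs in cost_batch:
        ctg = np.cumsum(costs[::-1])[::-1]  # suffix cost-to-go
        b = rng.choice(budgets)             # per-trajectory budget draw
        cond.append(ctg + (b - ctg[0]))     # realign C_0 to the sampled budget
    return cond
```

At deployment, the same model would then be conditioned on whichever budget the target environment requires.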
## Installation

```bash
# Clone the repository
git clone https://github.com/Huikangsu/B2R.git
cd B2R

# Install the package
pip install -e .
```

## Data Preparation

Before training, you must download and process the DSRL datasets. We provide a one-click script that automatically downloads the HDF5 datasets from HuggingFace and converts them into the required pickle format.
This script will:

1. Download missing `.hdf5` files to `data/hdf5_dataset/`
2. Process and save `.pkl` files to `data/dsrl_dataset/`

```bash
python make_dsrl_data.py
```
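Under the assumption that the HDF5 files are flat key-to-array stores, the conversion step resembles the following hypothetical helper (this is not `make_dsrl_data.py` itself, and the field layout is an assumption):

```python
import pickle

import h5py  # assumption: DSRL datasets are flat HDF5 files


def hdf5_to_pickle(hdf5_path, pkl_path):
    """Hypothetical sketch of the HDF5 -> pickle conversion: read every
    top-level dataset into memory, then dump the resulting dict."""
    with h5py.File(hdf5_path, "r") as f:
        data = {key: f[key][()] for key in f.keys()}
    with open(pkl_path, "wb") as out:
        pickle.dump(data, out)
```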
## Training

```bash
python main/B2R_main.py --env CarButton1 --cost_limit 20 --seed 0
```

## References

- DSRL – Benchmark for offline safe RL
- Decision Transformer – Transformer-based RL via sequence modeling
## Citation

```bibtex
@inproceedings{b2r2025,
  title={Boundary-to-Region Supervision for Offline Safe Reinforcement Learning},
  author={Huikang Su and Dengyun Peng and Zifeng Zhuang and YuHan Liu and Qiguang Chen and Donglin Wang and Qinghe Liu},
  booktitle={NeurIPS},
  year={2025}
}
```

## License

This project is licensed under the MIT License.
