This is the official implementation of B2R, a method for offline safe reinforcement learning that fixes a core "symmetry fallacy" in Decision Transformer-style safe-RL models. B2R is evaluated on the DSRL benchmark.

The idea: stop treating safety and reward symmetrically. Safety is a hard boundary, while reward is a flexible target. B2R implements this insight via Boundary-to-Region supervision, which realigns every cost-to-go signal to the true safety budget, providing dense, region-wide supervision without changing the model's architecture.
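The realignment at the heart of Boundary-to-Region supervision can be sketched roughly as follows. This is a minimal reading of the description above, not the repository's actual code; the function name `relabel_cost_to_go` and its signature are assumptions.

```python
import numpy as np

def relabel_cost_to_go(costs, cost_limit):
    """Illustrative Boundary-to-Region relabeling (hypothetical helper):
    instead of conditioning each trajectory on its own realized cost-to-go,
    realign the initial cost-to-go token to the shared safety budget
    `cost_limit`, so every safe trajectory supervises the same boundary
    signal while per-step decrements stay intact."""
    costs = np.asarray(costs, dtype=np.float64)
    # Standard suffix sum: C_t = sum of costs from step t to the end.
    ctg = np.cumsum(costs[::-1])[::-1]
    # Shift the whole sequence so that C_0 equals the budget.
    return ctg + (cost_limit - ctg[0])
```

For example, a trajectory with per-step costs `[1, 2, 3]` and a budget of 20 yields the realigned sequence `[20, 19, 17]`: the conditioning starts at the budget and decreases by the incurred cost at each step.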
- Fixes a Core Flaw: Solves the brittle deployment and sparse signal problems caused by the "symmetry fallacy."
- Zero Architectural Change: Drastically improves safety without modifying the Transformer model or its objective function.
- SOTA Safety & Performance:
  - Satisfies the safety constraint in 35/38 tasks.
  - Achieves the highest reward in 20/38 tasks.
  - Consistently safer than CDT while delivering competitive rewards.
- Robust: Degrades gracefully even when safe data is scarce (down to 5-20%).
- Flexible: A single model can be trained on multiple safety budgets simultaneously, enabling it to satisfy different constraints at deployment.
In short: B2R adds robust safety guarantees to the simple and powerful Decision Transformer framework by fundamentally changing how cost signals are supervised.
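The multi-budget flexibility noted above can be sketched as per-trajectory budget sampling during training. The budget set and function names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
BUDGETS = np.array([10.0, 20.0, 40.0])  # illustrative budget set, not from the paper

def batch_conditioning(cost_batch, budgets=BUDGETS):
    """Hypothetical multi-budget conditioning: each trajectory in the batch
    is supervised against a budget sampled from `budgets`, so a single
    model learns to satisfy any of them at deployment."""
    cond = []
    for costs in cost_batch:
        ctg = np.cumsum(costs[::-1])[::-1]  # suffix cost-to-go
        b = rng.choice(budgets)             # per-trajectory budget draw
        cond.append(ctg + (b - ctg[0]))     # realign C_0 to the sampled budget
    return cond
```

At deployment, the same model would then be conditioned on whichever budget the target environment requires.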
## Installation

```bash
# Clone the repository
git clone https://github.com/Huikangsu/B2R.git
cd B2R

# Install the package
pip install -e .
```

## Data Preparation

Before training, you must download and process the DSRL datasets. We provide a one-click script that automatically downloads the HDF5 datasets from HuggingFace and converts them into the required pickle format.
This script will:

1. Download missing `.hdf5` files to `data/hdf5_dataset/`
2. Process and save `.pkl` files to `data/dsrl_dataset/`

```bash
python make_dsrl_data.py
```
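Under the assumption that the HDF5 files are flat key-to-array stores, the conversion step resembles the following hypothetical helper (this is not `make_dsrl_data.py` itself, and the field layout is an assumption):

```python
import pickle

import h5py  # assumption: DSRL datasets are flat HDF5 files


def hdf5_to_pickle(hdf5_path, pkl_path):
    """Hypothetical sketch of the HDF5 -> pickle conversion: read every
    top-level dataset into memory, then dump the resulting dict."""
    with h5py.File(hdf5_path, "r") as f:
        data = {key: f[key][()] for key in f.keys()}
    with open(pkl_path, "wb") as out:
        pickle.dump(data, out)
```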
## Training

```bash
python main/B2R_main.py --env CarButton1 --cost_limit 20 --seed 0
```

## References

- DSRL – Benchmark for offline safe RL
- Decision Transformer – Transformer-based RL via sequence modeling
## Citation

```bibtex
@inproceedings{b2r2025,
  title={Boundary-to-Region Supervision for Offline Safe Reinforcement Learning},
  author={Huikang Su and Dengyun Peng and Zifeng Zhuang and YuHan Liu and Qiguang Chen and Donglin Wang and Qinghe Liu},
  booktitle={NeurIPS},
  year={2025}
}
```

## License

This project is licensed under the MIT License.
