Boundary-to-Region Supervision for Offline Safe Reinforcement Learning (NeurIPS 2025)

This is the official implementation of B2R, a method for offline safe RL that fixes a core "symmetry fallacy" in Decision Transformer-style models: treating the safety (cost) signal the same way as the reward signal. B2R is evaluated on the DSRL benchmark.

The Idea: Stop treating safety and reward symmetrically. Safety is a hard boundary, while reward is a flexible target. B2R implements this insight via Boundary-to-Region supervision, which realigns all cost-to-go signals to the true safety budget. This provides dense, region-wide supervision without changing the model's architecture.
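To make the realignment idea concrete, here is a minimal sketch of one plausible reading of it, not the repository's actual code. It assumes that "realigning" means shifting a safe trajectory's cost-to-go sequence so that it starts at the safety budget and decreases as cost is incurred. The function names `cost_to_go` and `realigned_cost_to_go` are hypothetical, and how B2R handles trajectories that exceed the budget is not covered here.

```python
from itertools import accumulate


def cost_to_go(costs):
    """Conventional cost-to-go: total cost from step t to the end of the trajectory."""
    total = sum(costs)
    # Cost incurred strictly before each step: [0, c0, c0 + c1, ...]
    spent_before = list(accumulate(costs, initial=0))[:-1]
    return [total - s for s in spent_before]


def realigned_cost_to_go(costs, budget):
    """Assumed realignment: the signal starts at `budget` instead of the trajectory's own total cost.

    This is an illustrative interpretation of the README text, not the official method.
    """
    if sum(costs) > budget:
        # The treatment of over-budget trajectories is not described in the README.
        raise ValueError("trajectory cost exceeds the budget")
    spent_before = list(accumulate(costs, initial=0))[:-1]
    return [budget - s for s in spent_before]


costs = [0, 1, 0, 2]
print(cost_to_go(costs))                # [3, 3, 2, 2]
print(realigned_cost_to_go(costs, 20))  # [20, 20, 19, 19]
```

Under this reading, every safe trajectory is conditioned on the same starting value (the budget), regardless of how much cost it actually incurred, which is one way to obtain supervision across the whole safe region rather than at a single cost level.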

Figure 0

Highlights

  • Fixes a Core Flaw: Solves the brittle-deployment and sparse-signal problems caused by the "symmetry fallacy."
  • Zero Architectural Change: Drastically improves safety without modifying the Transformer model or its objective function.
  • SOTA Safety & Performance:
    • Satisfies safety in 35/38 tasks.
    • Achieves the highest reward in 20/38 tasks.
    • Consistently safer than CDT while delivering competitive rewards.
  • Robust: Degrades gracefully under safe data scarcity (down to 5-20%).
  • Flexible: A single model can be trained on multiple safety budgets simultaneously, enabling it to satisfy different constraints at deployment.

In short: B2R adds robust safety guarantees to the simple and powerful Decision Transformer framework by fundamentally changing how cost signals are supervised.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/Huikangsu/B2R.git
cd B2R

# Install the package
pip install -e .

Data Preparation

Before training, you must download and process the DSRL datasets. We provide a one-click script that automatically downloads the HDF5 datasets from HuggingFace and converts them into the required pickle format.

This script will:
1. Download missing .hdf5 files to data/hdf5_dataset/
2. Process and save .pkl files to data/dsrl_dataset/

python make_dsrl_data.py
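After the script finishes, it can be useful to confirm that the pickle files were actually written. The following is a small sketch for that purpose. The helper names `list_processed_datasets` and `describe_pickle` are hypothetical, and nothing is assumed about the internal layout of the pickled objects: the code only reports their type, length, and (for dictionaries) keys. Only unpickle files you trust, since `pickle.load` can execute arbitrary code.

```python
import pickle
from pathlib import Path


def list_processed_datasets(root="data/dsrl_dataset"):
    """Return the .pkl files found directly under `root`, sorted by name."""
    return sorted(Path(root).glob("*.pkl"))


def describe_pickle(path):
    """Load one pickle file and summarize it without assuming its structure."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        summary["keys"] = sorted(str(k) for k in obj.keys())
    return summary
```

For example, running `for p in list_processed_datasets(): print(p.name, describe_pickle(p))` from the repository root prints one summary line per processed dataset.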

Training

python main/B2R_main.py --env CarButton1 --cost_limit 20 --seed 0
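To repeat the run above over several seeds, a short wrapper along these lines may help. It is a sketch, not part of the repository: it uses only the three flags shown in the command above (`--env`, `--cost_limit`, `--seed`), and the helper names `build_commands` and `run_sweep` are hypothetical. It assumes it is launched from the repository root so that `main/B2R_main.py` resolves.

```python
import subprocess
import sys


def build_commands(env, cost_limit, seeds):
    """Build one training command per seed, using only the flags shown in the README."""
    return [
        [
            sys.executable, "main/B2R_main.py",
            "--env", env,
            "--cost_limit", str(cost_limit),
            "--seed", str(seed),
        ]
        for seed in seeds
    ]


def run_sweep(env, cost_limit, seeds):
    """Run the training commands one after another; stop at the first failure."""
    for cmd in build_commands(env, cost_limit, seeds):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_sweep("CarButton1", 20, seeds=[0, 1, 2])` runs the three jobs sequentially. Building the argument lists separately from running them makes it easy to inspect the commands first.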

🙏 Acknowledgements

📄 Citation

@inproceedings{b2r2025,
  title={Boundary-to-Region Supervision for Offline Safe Reinforcement Learning},
  author={Huikang Su and Dengyun Peng and Zifeng Zhuang and YuHan Liu and Qiguang Chen and Donglin Wang and Qinghe Liu},
  booktitle={NeurIPS},
  year={2025}
}

🛠 License

This project is licensed under the MIT License.
