# SABER: Scaling-Aware Best-of-N Estimation of Risk

A Python package for predicting large-scale adversarial risk in Large Language Models under Best-of-N sampling.

Paper: https://arxiv.org/pdf/2601.22636
Standard LLM safety evaluations use single-shot (ASR@1) metrics, but real attackers can exploit parallel sampling to repeatedly probe models. SABER provides a principled statistical framework to:
- Predict ASR@N at large budgets from small measurements
- Estimate how many attempts are needed to reach a target success rate
- Quantify uncertainty in adversarial risk predictions
## How It Works

Attack success rates scale according to a power law governed by the Beta distribution of per-query vulnerabilities. If per-query success probabilities follow a Beta(α, β) distribution, then for large N:

ASR@N ≈ 1 - Γ(α+β)/Γ(β) · N^(-α)

The shape parameter α controls how quickly risk amplifies with additional attempts.
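To make the scaling concrete, here is a minimal, self-contained sketch of this formula using only the standard library. The Beta parameters below are illustrative placeholders, not values fitted by SABER or reported in the paper. It compares the exact value ASR@N = 1 - B(α, β+N)/B(α, β), which follows from averaging 1 - (1-p)^N over p ~ Beta(α, β), against the power-law approximation, and inverts the approximation to get a rough budget for a target ASR:

```python
from math import lgamma, exp

def betaln(a, b):
    # log of the Beta function B(a, b), computed via log-Gamma for stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def asr_exact(N, a, b):
    # Exact ASR@N = 1 - E[(1-p)^N] for p ~ Beta(a, b),
    # which reduces to 1 - B(a, b+N) / B(a, b).
    return 1.0 - exp(betaln(a, b + N) - betaln(a, b))

def asr_powerlaw(N, a, b):
    # Large-N approximation: ASR@N ≈ 1 - Γ(a+b)/Γ(b) · N^(-a)
    return 1.0 - exp(lgamma(a + b) - lgamma(b)) * N ** (-a)

# Illustrative parameters (NOT fitted values from the paper)
alpha, beta = 0.3, 50.0

for N in (100, 1_000, 10_000):
    print(f"N={N:6d}  exact={asr_exact(N, alpha, beta):.4f}  "
          f"power-law={asr_powerlaw(N, alpha, beta):.4f}")

# Inverting the power law gives a rough budget for a target ASR:
# N ≈ (C / (1 - target))^(1/α), where C = Γ(α+β)/Γ(β)
target = 0.95
C = exp(lgamma(alpha + beta) - lgamma(beta))
budget = (C / (1.0 - target)) ** (1.0 / alpha)
print(f"~{budget:.0f} attempts for {target:.0%} ASR")
```

Note how the approximation converges to the exact Beta-binomial value as N grows; in practice SABER's fitted α and β would replace the illustrative values above.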
## Installation

```bash
pip install saber-risk
```

Or from source:

```bash
git clone https://github.com/microsoft/saber
cd saber
pip install -e .
```

## Quick Start

```python
import numpy as np
from saber import SABER

# Your jailbreak evaluation data:
# k[i] = number of successful jailbreaks for query i
# n[i] = number of attempts for query i
k = np.array([3, 5, 0, 2, 8, 1, 4, 0, 6, 2])
n = 100  # 100 attempts per query

# Fit and predict
model = SABER()
model.fit(k, n)

# Predict ASR at N=1000 attempts
result = model.predict(N=1000)
print(f"ASR@1000 = {result.asr:.2%}")

# With a confidence interval
result = model.predict(N=1000, confidence=0.95)
print(f"ASR@1000 = {result.asr:.2%} [{result.ci_lower:.2%}, {result.ci_upper:.2%}]")
```

## Workflow

```python
from saber import SABER

# 1. Collect jailbreak data
#    Run n attempts per query, count successes k
k = [...]  # successes per query
n = 100    # trials per query (or an array for heterogeneous budgets)

# 2. Fit the model
model = SABER()
model.fit(k, n)

# 3. Predict ASR at the target budget
asr_1000 = model.predict(N=1000).asr

# Budget estimation: attempts needed to reach a target success rate
result = model.budget_for_asr(target=0.95)
print(f"Need {result.budget:.0f} attempts for 95% ASR")

# Fluent API
asr = SABER().fit(k, n).predict(1000).asr
```

## Documentation

Full documentation is available in the docs/ directory. To build:

```bash
cd docs
pip install -r requirements.txt
make html
```

- Quick Start - Getting started guide
- API Reference - Complete API documentation
- Advanced Usage - Model selection, scaling curves, low-level API
## Citation

If you use SABER in your research, please cite:

```bibtex
@misc{feng2026statisticalestimationadversarialrisk,
  title={Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling},
  author={Mingqian Feng and Xiaodong Liu and Weiwei Yang and Chenliang Xu and Christopher White and Jianfeng Gao},
  year={2026},
  eprint={2601.22636},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.22636},
}
```

## Contact

For any questions regarding the package or paper, feel free to reach out to:
- Mingqian Feng - mfeng7@ur.rochester.edu
- Xiaodong Liu - xiaodl@microsoft.com
- Weiwei Yang - weiwei.yang@microsoft.com
## License

MIT License - see LICENSE for details.
