calibench

One-line uncertainty audit for ML models.


Quickstart

pip install calibench

import numpy as np
from calibench import audit

y_true = np.array([1.0, 2.5, 3.1, 4.8])
y_pred = np.array([1.1, 2.3, 3.4, 4.6])
y_std = np.array([0.5, 1.0, 0.8, 1.2])

report = audit(y_true, y_pred, y_std)
print(report)

calibench audit
===============
Verdict: well-calibrated (temperature = 1.03)

                      Before       After
ECE                   0.0320      0.0110
Miscalibration area   0.0180      0.0060
Sharpness             2.4100      2.5500
Coverage @ 90%        0.8900      0.9100
Spearman r            0.7100      0.7300

What it does

Pass predictions, uncertainties, and ground truth. Get back a structured report that tells you whether your uncertainty estimates are trustworthy, where they fail, and how to fix them.

audit() computes five calibration metrics before and after automatic recalibration via temperature scaling, then returns a Report containing a verdict, the metric values, and ready-made plots.

report.plot()

report.to_dict()

report.to_markdown()

The Report includes a fitted TemperatureScaler you can apply to new predictions:

calibrated_std = report.calibrator.transform(new_std)
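Once uncertainties are calibrated, a common next step is to turn them into prediction intervals. The sketch below is not part of calibench; it assumes Gaussian predictive distributions, and `gaussian_interval`, `new_pred`, and the example numbers are hypothetical stand-ins for your own predictions and for the output of `report.calibrator.transform(new_std)`.

```python
from statistics import NormalDist

import numpy as np


def gaussian_interval(y_pred, y_std, confidence=0.9):
    """Central prediction interval, assuming a Gaussian N(y_pred, y_std**2)."""
    # Two-sided z-score: about 1.645 for a 90% central interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    y_pred = np.asarray(y_pred, dtype=float)
    half_width = z * np.asarray(y_std, dtype=float)
    return y_pred - half_width, y_pred + half_width


# Hypothetical values standing in for new predictions and their calibrated stds.
new_pred = np.array([2.0, 3.5])
calibrated_std = np.array([0.6, 1.1])

lower, upper = gaussian_interval(new_pred, calibrated_std, confidence=0.9)
print(np.column_stack([lower, upper]))
```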

Metrics

| Function | What it measures |
| --- | --- |
| `miscalibration_area` | Area between the reliability curve and the ideal diagonal |
| `expected_calibration_error` | Binned ECE across confidence levels |
| `sharpness` | Average width of prediction intervals |
| `coverage_at_confidence` | Fraction of true values falling within a given confidence interval |
| `spearman_correlation` | Rank correlation between predicted uncertainty and absolute error |
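To make these definitions concrete, here is a minimal NumPy sketch of three of the metrics. It is an illustration under stated assumptions, not calibench's implementation: it assumes Gaussian predictive distributions and central intervals, the function names ending in `_sketch` are hypothetical, and the rank correlation shown does not average tied ranks.

```python
import math

import numpy as np


def interval_levels(y_true, y_pred, y_std):
    """Smallest central-interval confidence level that contains each true value."""
    z = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    z = z / np.asarray(y_std, dtype=float)
    # For a standard Gaussian, P(|Z| <= z) = erf(z / sqrt(2)).
    return np.array([math.erf(v / math.sqrt(2.0)) for v in z])


def coverage_sketch(y_true, y_pred, y_std, confidence=0.9):
    """Fraction of true values inside the central `confidence` interval."""
    return float(np.mean(interval_levels(y_true, y_pred, y_std) <= confidence))


def miscalibration_area_sketch(y_true, y_pred, y_std, n_levels=101):
    """Area between observed and expected coverage (trapezoid rule)."""
    expected = np.linspace(0.0, 1.0, n_levels)
    levels = interval_levels(y_true, y_pred, y_std)
    observed = np.array([np.mean(levels <= p) for p in expected])
    gap = np.abs(observed - expected)
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(expected)))


def spearman_sketch(y_true, y_pred, y_std):
    """Pearson correlation of ranks; ties are not averaged in this sketch."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    ranks_std = np.argsort(np.argsort(y_std))
    ranks_err = np.argsort(np.argsort(abs_err))
    return float(np.corrcoef(ranks_std, ranks_err)[0, 1])


# Synthetic, well-calibrated data: errors are drawn with exactly the stated std.
rng = np.random.default_rng(0)
y_std = rng.uniform(0.5, 2.0, size=20_000)
y_pred = np.zeros_like(y_std)
y_true = rng.normal(y_pred, y_std)

cov90 = coverage_sketch(y_true, y_pred, y_std, confidence=0.9)
area = miscalibration_area_sketch(y_true, y_pred, y_std)
rho = spearman_sketch(y_true, y_pred, y_std)
print(cov90, area, rho)
```

On data like this, coverage at 90% should land close to 0.9 and the miscalibration area close to zero; an overconfident model would show lower coverage and a larger area.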

Calibrators

IsotonicCalibrator fits an isotonic regression to map raw predicted standard deviations to calibrated ones. It is flexible and nonparametric.

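from calibench import IsotonicCalibrator

# predicted_std: the model's predicted standard deviations
# residuals: prediction errors on held-out data (assumed here to be y_true - y_pred)
cal = IsotonicCalibrator()
calibrated_std = cal.fit_transform(predicted_std, residuals)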
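For intuition about what an isotonic mapping of standard deviations can look like, here is a small self-contained sketch. It is one possible approach, not necessarily what IsotonicCalibrator does internally: it fits a non-decreasing function from predicted std to squared residual with the pool-adjacent-violators algorithm, then takes the square root. The names `pava`, `fit_isotonic_std`, and `apply_isotonic_std` are hypothetical.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def fit_isotonic_std(predicted_std, residuals):
    """Return (sorted predicted stds, monotone calibrated stds) for interpolation."""
    predicted_std = np.asarray(predicted_std, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(predicted_std)
    # Monotone estimate of E[residual**2 | predicted_std], then back to a std.
    fitted_var = pava(residuals[order] ** 2)
    return predicted_std[order], np.sqrt(fitted_var)


def apply_isotonic_std(knots, fitted_std, new_std):
    """Map new predicted stds through the fitted curve (clipped at the ends)."""
    return np.interp(new_std, knots, fitted_std)


# Synthetic example: the model reports the square of the true std.
rng = np.random.default_rng(0)
true_std = rng.uniform(0.5, 2.0, size=5_000)
residuals = rng.normal(0.0, true_std)
knots, fitted_std = fit_isotonic_std(true_std ** 2, residuals)

grid = np.linspace(0.3, 4.0, 50)
calibrated_grid = apply_isotonic_std(knots, fitted_std, grid)
```

Because the fit only assumes monotonicity, it can correct distortions that a single scale factor cannot, at the cost of needing more calibration data.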

TemperatureScaler learns a single scalar T that rescales all uncertainties. It is simple and robust when the model is uniformly miscalibrated.

from calibench import TemperatureScaler

scaler = TemperatureScaler()
calibrated_std = scaler.fit_transform(predicted_std, residuals)
print(f"Learned temperature = {scaler.temperature:.3f}")
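As a worked illustration of single-scalar rescaling, the sketch below fits T by minimising the Gaussian negative log-likelihood of the residuals under standard deviations T * predicted_std. Setting the derivative with respect to T to zero gives the closed form T = sqrt(mean((residual / predicted_std)^2)). This is an assumption about one reasonable fitting criterion, not a statement of how TemperatureScaler is implemented, and `fit_temperature` is a hypothetical name.

```python
import numpy as np


def fit_temperature(predicted_std, residuals):
    """Closed-form Gaussian-NLL optimum for a single scale factor T."""
    z = np.asarray(residuals, dtype=float) / np.asarray(predicted_std, dtype=float)
    return float(np.sqrt(np.mean(z ** 2)))


# Synthetic example: the model reports half of the true spread (overconfident).
rng = np.random.default_rng(0)
true_std = rng.uniform(0.5, 2.0, size=5_000)
residuals = rng.normal(0.0, true_std)
overconfident_std = true_std / 2.0

T = fit_temperature(overconfident_std, residuals)
calibrated_std = T * overconfident_std
print(f"Fitted temperature = {T:.3f}")
```

A fitted T above 1 indicates overconfident uncertainties (they need widening), and a T below 1 indicates underconfident ones.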

Visualisations

reliability_diagram plots expected vs observed coverage at multiple quantile levels.

calibration_curve bins the data and plots the mean predicted standard deviation against the mean absolute residual in each bin.

uncertainty_vs_error_scatter produces a scatter plot of predicted uncertainty against absolute error, with a y=x reference line.

All plotting functions return a matplotlib Axes object and accept an optional ax parameter to draw on an existing figure.
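The optional-ax convention makes it easy to place several diagnostics side by side. The sketch below shows the pattern with a hypothetical stand-in function, `reliability_plot_sketch`; it does not call calibench's plotting functions, whose exact signatures are not shown here, and the coverage curves are made-up illustrative data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np


def reliability_plot_sketch(expected, observed, ax=None):
    """Hypothetical stand-in: draw on `ax` if given, else create a new Axes."""
    if ax is None:
        _, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="ideal")
    ax.plot(expected, observed, marker="o", label="model")
    ax.set_xlabel("Expected coverage")
    ax.set_ylabel("Observed coverage")
    ax.legend()
    return ax


# Made-up coverage curves, purely to have something to draw.
expected = np.linspace(0.1, 0.9, 9)
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 4))
returned = reliability_plot_sketch(expected, expected ** 1.2, ax=left)
reliability_plot_sketch(expected, expected ** 0.8, ax=right)
left.set_title("Before")
right.set_title("After")
fig.tight_layout()
```

Returning the Axes lets callers keep customising titles, limits, or annotations after the plotting call.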
