v7.7.16

github-actions released this 30 May 21:54 · 305 commits to main since this release

styxx 7.7.16 — abstain_on_confab: the closed-loop detect-and-abstain primitive

Cumulative release: ships the single-pass confab gates (7.7.14) and the retrieval arm / two-signal audit_claim (7.7.15) alongside the new detect-and-abstain loop (7.7.16).

pip install -U styxx

New in 7.7.16 — styxx.abstain_on_confab + AbstainDecision

from styxx import single_pass_confab, calibrate_single_pass, abstain_on_confab

# confab_entropies, correct_entropies, first_token_logits and model_answer are supplied by the caller
cal = calibrate_single_pass(confab_entropies, correct_entropies)   # per-model threshold
score = single_pass_confab(first_token_logits, entropy_threshold=cal.entropy_threshold)
decision = abstain_on_confab(model_answer, score)   # raises ValueError if score is uncalibrated
decision.answer       # the answer, OR "I'm not sure." if the confab gate fired
decision.abstained    # bool

This is the deployable, framework-free form of the closed-loop honesty primitive: it gates a candidate answer through a calibrated confab detector and returns an honest abstention when the gate fires.

The detector is load-bearing, and the API enforces it. A pre-registered white-box experiment (FINDING_honesty_knob_2026_05_30, SURVIVED, n=32/24 powered) showed that the underlying mechanistic abstention intervention has no intrinsic selectivity: applied ungated, it dissolves correct answers as readily as confabulations (raw selectivity −0.08; a blanket lobotomy). Only the calibrated detector (gate AUC 0.924) makes abstention targeted: the gated loop catches and abstains on 0.75 of confabs while sparing 0.875 of correct answers. abstain_on_confab therefore refuses an uncalibrated score with a ValueError; you must call calibrate_single_pass first. Detection is not optional diagnosis; it is the prerequisite for safe intervention.
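The contract described above (refuse an uncalibrated score, abstain when the gate fires, otherwise pass the answer through) can be sketched in a few lines. This is a toy stand-in, not the styxx implementation: ToyScore, ToyDecision and toy_abstain_on_confab are hypothetical names, and the rule "fire when entropy exceeds the threshold" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ABSTAIN_TEXT = "I'm not sure."


@dataclass(frozen=True)
class ToyScore:
    """Hypothetical stand-in for a confab score (not the styxx type)."""
    entropy: float
    threshold: Optional[float]  # None means the score was never calibrated


@dataclass(frozen=True)
class ToyDecision:
    """Hypothetical stand-in mirroring the two fields shown in the release notes."""
    answer: str
    abstained: bool


def toy_abstain_on_confab(model_answer: str, score: ToyScore) -> ToyDecision:
    # Refuse to gate on an uncalibrated score: without a threshold the
    # abstention would not be targeted.
    if score.threshold is None:
        raise ValueError("uncalibrated score: calibrate a threshold first")
    # Assumption: higher entropy means "more likely confabulated".
    if score.entropy > score.threshold:
        return ToyDecision(answer=ABSTAIN_TEXT, abstained=True)
    return ToyDecision(answer=model_answer, abstained=False)
```

Raising on a missing threshold, rather than falling back to a default, keeps the failure loud: a caller who skips calibration gets an exception instead of a silently untargeted gate.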

Scope: abstention, not correction. Repair-to-truth is a closed negative (depth-steering is correctness-inert), so this is a fail-safe, not a fix.

Also included

  • 7.7.15 retrieval_check + audit_claim(verify_retrieval=True): a two-signal gate (model-internal confab detector for unstable errors + external-grounding arm for stable factual misconceptions).
  • 7.7.14 single_pass_confab / span_confab / calibrate_single_pass: confab detection from ONE forward pass (~10× cheaper than N=10 resampling), white-box first-token + closed-model span variants.
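For intuition about a single-pass, first-token signal with a per-model threshold, here is a minimal sketch. It assumes the signal is the Shannon entropy of the softmax over first-token logits and that the threshold is chosen to best separate labelled confab and correct examples. Both are assumptions, and first_token_entropy and toy_calibrate are hypothetical helpers, not the styxx functions.

```python
import math


def first_token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max to avoid overflow
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def toy_calibrate(confab_entropies, correct_entropies):
    """Pick the entropy threshold that maximizes balanced accuracy.

    Candidate thresholds are midpoints between adjacent distinct observed
    values. Assumes confabulations tend to have higher entropy.
    """
    if not confab_entropies or not correct_entropies:
        raise ValueError("need at least one example of each class")
    values = sorted(set(confab_entropies) | set(correct_entropies))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct entropy values")

    def balanced_accuracy(t):
        caught = sum(e > t for e in confab_entropies) / len(confab_entropies)
        spared = sum(e <= t for e in correct_entropies) / len(correct_entropies)
        return (caught + spared) / 2

    return max(candidates, key=balanced_accuracy)
```

A uniform distribution over four tokens gives entropy log(4), while a sharply peaked one gives a value near zero, which is the contrast a threshold of this kind relies on.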

Full Changelog: v7.7.13...v7.7.16