Fixed
B10 & B25 — scoring contract. Both advertised a binary pass-rate but inherited the continuous weighted-mean scorer, which leaked partial credit. Both now score passed / total, like B16/B17/B24/B27/B31. B10 also forwards the judge's extraction_error, so the error filter now takes effect.
⚠️ Headline B10/B25 scores in published case-study scorecards will shift.
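To see why the headline scores move, here is a minimal sketch contrasting the two scoring contracts. The JudgedItem record and both function names are hypothetical illustrations, not the benchmark's actual types; the sketch only assumes each item carries a continuous judge score alongside the binary verdict.

```python
from dataclasses import dataclass


@dataclass
class JudgedItem:
    # Hypothetical per-item record: a continuous judge score in [0, 1]
    # plus the binary verdict the benchmark advertises.
    score: float
    passed: bool


def weighted_mean_score(items, weights):
    """Old behaviour (sketch): a continuous weighted mean, so a 0.6 still earns 0.6 credit."""
    return sum(i.score * w for i, w in zip(items, weights)) / sum(weights)


def binary_pass_rate(items):
    """New behaviour (sketch): passed / total, so a non-passing item earns nothing."""
    return sum(1 for i in items if i.passed) / len(items)


items = [
    JudgedItem(score=1.0, passed=True),
    JudgedItem(score=0.6, passed=False),  # partial credit leaks into the old score
    JudgedItem(score=0.4, passed=False),
]
print(round(weighted_mean_score(items, [1, 1, 1]), 3))
print(round(binary_pass_rate(items), 3))
```

With one pass and two near-misses, the old scorer reports about 0.667 while the binary contract reports about 0.333, which is the kind of gap that shifts published scorecards.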
B10 — template rendering. _score_triple now uses the shared render() engine instead of raw str.format; an unknown placeholder raises a typed MissingPlaceholderError (with a snippet) rather than a bare KeyError.
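A minimal sketch of what a validating render() might look like, assuming templates use named str.format fields only. The render() signature, the MissingPlaceholderError constructor, and the 40-character snippet length are hypothetical; only string.Formatter.parse and str.format are standard-library calls.

```python
import string


class MissingPlaceholderError(KeyError):
    """Hypothetical typed error carrying the missing name and a template snippet."""

    def __init__(self, name, template, snippet_len=40):
        self.name = name
        self.snippet = template[:snippet_len]
        super().__init__(f"missing placeholder {name!r} in template: {self.snippet!r}")


def render(template, values):
    """Sketch: check every {field} against `values` before calling str.format."""
    for _literal, field, _spec, _conv in string.Formatter().parse(template):
        if field is None:
            continue  # literal text with no placeholder after it
        # Strip attribute/index access such as {item.name} or {rows[0]}.
        name = field.split(".")[0].split("[")[0]
        if name not in values:
            raise MissingPlaceholderError(name, template)
    return template.format(**values)


print(render("Q: {question}\nA: {answer}", {"question": "2+2?", "answer": "4"}))
```

Subclassing KeyError is a design assumption here: it lets existing `except KeyError` handlers keep working while new code can catch the typed error and read its snippet.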
B03 — dedup pass-rate. The pass-rate now weights deduped structural items by n_observed, so 50 identical passes and 5 identical fails read ≈0.909 (50/55) instead of 0.5. This aligns the point score with the CI engine.
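The arithmetic behind the 0.5 → ≈0.909 change can be sketched as follows. The (item_id, passed) dedup key and the function name are hypothetical; the sketch assumes n_observed is simply the count of identical observations collapsed into one structural item.

```python
from collections import Counter


def deduped_pass_rates(outcomes):
    """Sketch: collapse identical observations, then compare both pass-rate definitions."""
    # Hypothetical dedup key: (item_id, passed); the count is n_observed.
    n_observed = Counter(outcomes)
    # Old: each deduped item counts once, regardless of how often it was seen.
    unweighted = sum(1 for (_item, passed) in n_observed if passed) / len(n_observed)
    # New: each deduped item is weighted by n_observed.
    weighted = sum(n for (_item, passed), n in n_observed.items() if passed) / sum(
        n_observed.values()
    )
    return unweighted, weighted


outcomes = [("s1", True)] * 50 + [("s2", False)] * 5
print(deduped_pass_rates(outcomes))
```

Collapsing to two items gives 1/2 = 0.5, while weighting by n_observed recovers 50/55, the same ratio a per-observation interval estimate would be centred on.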
B17 — scoring denominator. Structural-retrieval items no longer share the binary fact-consistency denominator, so a retrieval-layer failure is no longer charged against consistency. Retrieval items still appear in the score breakdown.
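A minimal sketch of the split denominator, assuming each check is labelled with a kind. The Check type, the "fact_consistency" / "structural_retrieval" labels, and score_b17 are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Check:
    kind: str  # hypothetical labels: "fact_consistency" or "structural_retrieval"
    passed: bool


def score_b17(checks):
    """Sketch: retrieval checks are reported but excluded from the consistency denominator."""
    consistency = [c for c in checks if c.kind == "fact_consistency"]
    retrieval = [c for c in checks if c.kind == "structural_retrieval"]
    score = sum(c.passed for c in consistency) / len(consistency) if consistency else None
    # The breakdown still lists retrieval results as (passed, total).
    breakdown = {
        "fact_consistency": (sum(c.passed for c in consistency), len(consistency)),
        "structural_retrieval": (sum(c.passed for c in retrieval), len(retrieval)),
    }
    return score, breakdown


checks = [Check("fact_consistency", True)] * 3 + [Check("structural_retrieval", False)]
print(score_b17(checks))
```

Under a shared denominator the same checks would score 3/4; with the split, consistency stays at 1.0 and the retrieval miss is visible only in the breakdown.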
B27 — transient comm failures. A provider error on the setup/probe turn is now tagged COMMUNICATION and excluded from the denominator (routing the case to INCONCLUSIVE) instead of forcing a hard FAIL at threshold 1.0. Judge extraction errors still count as a conservative FAIL.
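The routing can be sketched as below. All enum and function names are hypothetical, and the sketch assumes a case becomes INCONCLUSIVE only when no scorable turn remains after COMMUNICATION outcomes are dropped; the real routing rule may differ.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    COMMUNICATION = "communication"  # provider error on the setup/probe turn


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INCONCLUSIVE = "INCONCLUSIVE"


def classify_turn(provider_error, extraction_error, judged_pass):
    """Sketch: map one probe turn to an outcome."""
    if provider_error:
        return Outcome.COMMUNICATION  # transport failure, not a model failure
    if extraction_error:
        return Outcome.FAIL  # unreadable judge output counts as a conservative FAIL
    return Outcome.PASS if judged_pass else Outcome.FAIL


def verdict(outcomes, threshold=1.0):
    """Sketch: drop COMMUNICATION outcomes from the denominator before thresholding."""
    scored = [o for o in outcomes if o is not Outcome.COMMUNICATION]
    if not scored:
        return Verdict.INCONCLUSIVE, None  # assumption: nothing left to score
    rate = sum(o is Outcome.PASS for o in scored) / len(scored)
    return (Verdict.PASS if rate >= threshold else Verdict.FAIL), rate


print(verdict([classify_turn(True, False, False)]))
```

Counting the provider error as a FAIL would sink any case at threshold 1.0; excluding it keeps a transient outage from being reported as a model failure.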
Added
B31 — configurable case-ID convention. A new optional metadata.case_id_prefixes field (e.g. ["JIRA", "OPS"]; entries must be uppercase alphanumeric and are regex-injection-safe) lets the chain_recorded veto accept a deployment's own escalation reference format instead of only the built-in ESC-/INC-/TKT- set. Advertised in fixtures/schema.json.
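One way to keep configurable prefixes regex-injection-safe is to validate each entry and escape it before compiling, as in the sketch below. The function name, the assumption that custom prefixes extend rather than replace the built-in set, and the PREFIX-<digits> reference shape are all hypothetical.

```python
import re

BUILTIN_PREFIXES = ("ESC", "INC", "TKT")
_PREFIX_OK = re.compile(r"[A-Z0-9]+")


def case_id_pattern(case_id_prefixes=None):
    """Sketch: build a case-ID regex from the built-in plus configured prefixes."""
    prefixes = list(BUILTIN_PREFIXES)  # assumption: custom prefixes are additive
    for p in case_id_prefixes or []:
        if not _PREFIX_OK.fullmatch(p):
            raise ValueError(f"case_id_prefixes entry must be uppercase alphanumeric: {p!r}")
        prefixes.append(p)
    # Validation already rules out metacharacters; re.escape is a second guard.
    alternation = "|".join(re.escape(p) for p in prefixes)
    # Assumed reference shape: PREFIX-<digits>, e.g. OPS-4521.
    return re.compile(rf"\b(?:{alternation})-\d+\b")


pattern = case_id_pattern(["JIRA", "OPS"])
print(bool(pattern.search("Escalated to on-call as OPS-4521.")))
```

Rejecting anything outside [A-Z0-9]+ up front means a value like ".*" fails loudly at config time instead of silently matching every string in the transcript.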