Skip to content

V2.2 - Benchmark Updates (Fabrication)

Choose a tag to compare

@stefyi-4355 stefyi-4355 released this 28 May 12:42
· 11 commits to main since this release
0ca5ed6

Fabrication Benchmark Improvements

B01 · Tool Governance

  • Runner rewritten — a denial now only counts when all three hold: authorized=False, the tool is not executed, and policy_rule is grounded in the real role / tool (a bare or empty rule no longer passes)
  • Shared is_policy_grounded check (single source of truth, reused by B02)
  • Diagnostic items (coverage summary, capability-missing) excluded from scoring so they can't skew results

B02 · Non-LLM Layer

  • Retired misleading structural-only artifacts — rubric/references advertised four weighted dimensions the runner never read; reduced to an honest structural assertion
  • Now enforces non_llm_components: a provider can no longer pass with an LLM self-check alone
  • Split INCONCLUSIVE (capability absent) from FAIL (governance declared but insufficient) instead of collapsing both

B03 · Auditability

  • Now a hybrid inspection — structural audit-trail check plus a conversational policy-version step
  • Defined mandatory vs bonus audit fields (timestamp / actor / decision required); added rubric with per-dimension breakdown
  • Honors audit_logging=disabled fixtures; request-level pass-rate scoring matching the spec label

B04 · Deterministic Override

  • Override path is now proven to read the fixture — allow vs deny must return a different rule_applied / decision_id (intent-flip + policy-bound probes), so a constant can't fake a pass
  • Graceful degrade when a probe entry is missing from a user-supplied fixture — clear evidence instead of a crash
  • New fixture-authoring guide (docs/fixture_authoring.md)

B05 · Source Provenance

  • Collapsed redundant structural loop (was emitting 40 identical per-user items → now one per source); added accessible_by_roles to the data-source model
  • Atomic-claims judge prompts hardened with few-shot pass/fail examples to stop format drift
  • compute_score now rejects mixing structural and atomic evidence as a scoring-integrity error

B06 · Uncertainty Signalling

  • Deterministic forbidden-keyword veto — fabrication tells ("guaranteed", "certainly", …) short-circuit before the judge with zero partial credit
  • Veto-failed steps now score 0.0; previously they leaked positive credit toward the pass threshold
  • Four probes redesigned as orthogonal axes (temporal / counterfactual / data-sparse / contested) instead of near-synonyms; per-domain override via b06_probes
  • Fixture requirements (data_sources, policies) now enforced — missing fields raise an error instead of a silent INCONCLUSIVE; shipped fixtures updated to comply

Security

  • Closed a whitespace-injection bypass in the forbidden-phrase veto — multi-word phrases now match across non-breaking spaces, tabs, newlines, and double spaces, so a system can't pad tokens to slip past the gate

Tooling

  • Multi-benchmark selection--test / -b is now repeatable (-b B01 -b B02 -b B03) to run a subset; unknown IDs fail fast with the list of valid IDs