V2.1 - Benchmark Updates (Opacity)

@stefyi-4355 stefyi-4355 released this 27 May 07:04
· 12 commits to main since this release
f8565b4

Opacity Benchmark Improvements

B24 · Risk Scoring

  • Rewrote runner with richer rubric and reference cases
  • Hotfixed an edge-case scoring regression (the fix is included in a later commit)

B25 · Regulatory Readiness

  • Added dedicated classifier.py for audit trail field detection
  • Improved rubric coverage; runner now handles more structural variants

B26 · Rate Limiting

  • Major runner rewrite: each tool is now tested on declaration, enforcement, communication, and documentation as separate dimensions
  • Added failure-bucket taxonomy (pass_typed / transient_failure / unexpected_error) for cleaner signal
  • Added a structural rapid-fire probe (opt-in via soak_probes=True)
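
As a rough illustration of how the three buckets can separate signal from noise, here is a minimal sketch. It is not the runner's code: the bucket names come from the notes above, but bucket_for, its parameters, and the choice of which exceptions count as transient are assumptions made for this example.

```python
from enum import Enum
from typing import Optional


class Bucket(str, Enum):
    # Bucket names as listed in the release notes.
    PASS_TYPED = "pass_typed"
    TRANSIENT_FAILURE = "transient_failure"
    UNEXPECTED_ERROR = "unexpected_error"


# Assumption: timeouts and connection problems are treated as transient.
TRANSIENT_EXCEPTIONS = (TimeoutError, ConnectionError)


def bucket_for(error: Optional[BaseException], typed_rate_limit: bool) -> Bucket:
    """Hypothetical helper: map one probe outcome to a failure bucket."""
    # The tool answered with a structured, typed rate-limit response.
    if error is None and typed_rate_limit:
        return Bucket.PASS_TYPED
    # Infrastructure noise that says nothing about the tool's rate limiting.
    if isinstance(error, TRANSIENT_EXCEPTIONS):
        return Bucket.TRANSIENT_FAILURE
    # Everything else, including an untyped response, is unexpected.
    return Bucket.UNEXPECTED_ERROR
```

Keeping transient failures in their own bucket means a flaky network does not count against the tool under test.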

B27 · Session Integrity

  • Improved secret-leak detection with a multi-pattern structural gate that runs before the judge
  • Now catches full-secret, prefix, and hash-fragment disclosure shapes
  • match_kind surfaced in evidence details
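
The three disclosure shapes can be pictured with a small structural check like the one below. This is a sketch under stated assumptions, not the benchmark's gate: the function name, the match_kind label strings, the 8-character fragment length, and the use of SHA-256 are all hypothetical.

```python
import hashlib
from typing import Optional

# Assumption: fragments shorter than this are too noisy to count as a leak.
MIN_FRAGMENT = 8


def detect_match_kind(output: str, secret: str) -> Optional[str]:
    """Hypothetical structural check; returns a match_kind label or None."""
    # Shape 1: the whole secret appears verbatim.
    if secret in output:
        return "full_secret"
    # Shape 2: only the leading characters of the secret appear.
    if len(secret) > MIN_FRAGMENT and secret[:MIN_FRAGMENT] in output:
        return "prefix"
    # Shape 3: a leading fragment of the secret's hex digest appears.
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    if digest[:MIN_FRAGMENT] in output.lower():
        return "hash_fragment"
    return None
```

Checks run from most to least specific, so a response containing the full secret is reported as full_secret rather than prefix.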

B29 · Prompt Sensitivity

  • Analytic judge now covers all three phrasing categories (tool access, destructive domain, privilege escalation)
  • Fixed a false-positive veto: adverbs like "actually" no longer short-circuit the judge
  • Provider errors now typed correctly; per-group reversal signals visible in evidence

B31 · Escalation Correctness

  • Fixed an incorrect fixture field mapping (the runner was silently falling back to a generic prompt)
  • Added runtime enforcement of escalation_triggers / expected_escalation_channels; empty fields now raise RuleLoadError instead of passing silently
  • Expanded rubric; fixture examples updated across all domains
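
The enforcement described above amounts to failing loudly at load time. A minimal sketch follows, assuming fixtures are plain dicts; the RuleLoadError class here is a stand-in defined for the example, and validate_fixture is a hypothetical name.

```python
class RuleLoadError(ValueError):
    """Stand-in for the benchmark's RuleLoadError (defined here for the sketch)."""


# Field names taken from the release notes.
REQUIRED_FIELDS = ("escalation_triggers", "expected_escalation_channels")


def validate_fixture(fixture: dict) -> dict:
    """Hypothetical check: reject fixtures whose escalation fields are empty."""
    for field in REQUIRED_FIELDS:
        # A missing key, None, and an empty list are all treated as empty.
        if not fixture.get(field):
            raise RuleLoadError(f"fixture field {field!r} is missing or empty")
    return fixture
```

Raising at load time turns a misconfigured fixture into a visible error instead of a run that passes without testing anything.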

B32 · Off-Topic Detection

  • Full runner rewrite: now scores 4 dimensions (detection, scope enforcement, on-topic allowance, communication)
  • Added on_topic_prompts.yaml keyed by domain (≥5 prompts per domain); falls back to tool descriptions
  • Sampling is now deterministic via b32_seed; silent randomisation removed
  • Non-applicable fixtures now emit INCONCLUSIVE and are excluded from the OPACITY aggregate
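
Two of these changes, seeded sampling and excluding INCONCLUSIVE results from the aggregate, can be sketched as follows. This is illustrative only: sample_prompts and aggregate are hypothetical names, None stands in for an INCONCLUSIVE result, and the plain mean is an assumed aggregation rule.

```python
import random
from typing import List, Optional, Sequence


def sample_prompts(prompts: Sequence[str], k: int, seed: int) -> List[str]:
    """Hypothetical sampler: the same seed always yields the same prompts."""
    # A private Random instance avoids touching the global RNG state.
    rng = random.Random(seed)
    return rng.sample(list(prompts), min(k, len(prompts)))


def aggregate(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Mean over applicable scores; None marks an INCONCLUSIVE result."""
    applicable = [s for s in scores if s is not None]
    if not applicable:
        # Nothing applicable: the aggregate itself is inconclusive.
        return None
    return sum(applicable) / len(applicable)
```

With a fixed seed, two runs over the same prompt file score the same prompts, so score changes reflect the system under test rather than the draw.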