V2.1 - Benchmark Updates (Opacity)

stefyi-4355 released this 27 May 07:04

f8565b4

Opacity Benchmark Improvements

B24 · Risk Scoring

Rewrote runner with richer rubric and reference cases
Hotfixed an edge-case scoring regression (the fix is included in a later commit)

B25 · Regulatory Readiness

Added dedicated classifier.py for audit trail field detection
Improved rubric coverage; runner now handles more structural variants

B26 · Rate Limiting

Major runner rewrite: each tool is now tested on declaration, enforcement, communication, and documentation as separate dimensions
Added failure-bucket taxonomy (pass_typed / transient_failure / unexpected_error) for cleaner signal
Added a structural rapid-fire probe (opt-in via soak_probes=True)
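
As a rough illustration of the failure-bucket taxonomy above, the sketch below maps the outcome of one probe call to a bucket. Only the three bucket names come from these notes. The `FailureBucket` enum, the `RateLimitExceeded` exception, the `bucket_for` helper, and the reading of `pass_typed` as "the tool answered with a typed rate-limit error" are all assumptions made for this example, not the runner's actual code.

```python
from enum import Enum


class FailureBucket(str, Enum):
    # Bucket names are taken from the release notes.
    PASS_TYPED = "pass_typed"
    TRANSIENT_FAILURE = "transient_failure"
    UNEXPECTED_ERROR = "unexpected_error"


class RateLimitExceeded(Exception):
    """Hypothetical typed error a tool raises when it enforces its limit."""


def bucket_for(exc):
    """Map the exception raised by one probe call to a failure bucket."""
    if isinstance(exc, RateLimitExceeded):
        # Assumed meaning: the limit was enforced and reported in a typed way.
        return FailureBucket.PASS_TYPED
    if isinstance(exc, (TimeoutError, ConnectionError)):
        # Network-level noise that says nothing about rate limiting itself.
        return FailureBucket.TRANSIENT_FAILURE
    return FailureBucket.UNEXPECTED_ERROR
```

Keeping transient failures in their own bucket means a flaky connection is not counted as evidence for or against rate limiting.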

B27 · Session Integrity

Improved secret-leak detection with a multi-pattern structural pre-judge gate
The gate now catches full-secret, prefix, and hash-fragment disclosure shapes
match_kind is now surfaced in evidence details
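
A minimal sketch of what a structural pre-judge gate for the three disclosure shapes could look like. The function name `find_leak`, the length thresholds, the choice of SHA-256, and the `match_kind` string values are assumptions for illustration; the notes name only the three shapes and the `match_kind` field.

```python
import hashlib


def find_leak(secret, response, min_prefix=8, min_hash_fragment=12):
    """Return a match_kind string, or None when nothing is disclosed."""
    # Shape 1: the whole secret appears verbatim.
    if secret in response:
        return "full_secret"
    # Shape 2: only a leading slice of the secret appears.
    if len(secret) > min_prefix and secret[:min_prefix] in response:
        return "prefix"
    # Shape 3: the start of the secret's SHA-256 hex digest appears.
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    if digest[:min_hash_fragment] in response.lower():
        return "hash_fragment"
    return None
```

Because the checks are plain string matches, they can run before any model-based judge and short-circuit the obvious cases cheaply. This sketch only looks for the leading fragment of the digest; a fuller gate would presumably scan for fragments at any offset.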

B29 · Prompt Sensitivity

Analytic judge now covers all three phrasing categories (tool access, destructive domain, privilege escalation)
Fixed a false-positive veto: adverbs like "actually" no longer short-circuit the judge
Provider errors are now typed correctly; per-group reversal signals are visible in evidence

B31 · Escalation Correctness

Fixed an incorrect fixture field mapping that was silently falling back to the generic prompt
Added runtime enforcement of escalation_triggers / expected_escalation_channels: empty fields now raise RuleLoadError instead of passing silently
Expanded rubric; fixture examples updated across all domains
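
The runtime enforcement described above might look roughly like the following. `RuleLoadError` and the two field names come from these notes. The `validate_rule` helper, the `ValueError` base class, and the dict-shaped rule are assumptions made for this sketch.

```python
class RuleLoadError(ValueError):
    """Raised when a fixture rule is missing required escalation data."""


# Field names as given in the release notes.
REQUIRED_FIELDS = ("escalation_triggers", "expected_escalation_channels")


def validate_rule(rule):
    """Return the rule unchanged, or raise RuleLoadError naming empty fields."""
    # A field counts as empty when it is absent, None, or an empty list/string.
    empty = [name for name in REQUIRED_FIELDS if not rule.get(name)]
    if empty:
        raise RuleLoadError("empty required field(s): " + ", ".join(empty))
    return rule
```

Failing at load time makes a misconfigured fixture visible immediately, rather than letting it produce a passing score that tested nothing.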

B32 · Off-Topic Detection

Full runner rewrite: now scores four dimensions (detection, scope enforcement, on-topic allowance, communication)
Added on_topic_prompts.yaml keyed by domain (≥5 prompts per domain); falls back to tool descriptions
Sampling is now deterministic via b32_seed; silent randomisation has been removed
Non-applicable fixtures now emit INCONCLUSIVE and are excluded from the OPACITY aggregate
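
The last two points can be sketched as follows, under stated assumptions. `b32_seed` and the `INCONCLUSIVE` status come from these notes. The helper names, the result dict shape, and the use of a plain mean for the aggregate are hypothetical.

```python
import random


def sample_prompts(prompts, k, b32_seed):
    """Pick k prompts reproducibly: the same seed yields the same sample."""
    rng = random.Random(b32_seed)  # local RNG, global random state untouched
    return rng.sample(prompts, min(k, len(prompts)))


def opacity_aggregate(results):
    """Average numeric scores, skipping fixtures marked INCONCLUSIVE."""
    scored = [r["score"] for r in results if r["status"] != "INCONCLUSIVE"]
    # With nothing left to score, report no aggregate rather than zero.
    return sum(scored) / len(scored) if scored else None
```

Using a dedicated `random.Random` instance keeps the sample stable across runs with the same seed, and excluding inconclusive fixtures keeps non-applicable cases from dragging the aggregate toward zero.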
