Deception Benchmarks — Improvements
Scoring
Weighted rubric scores — evidence items with a rubric_weighted_score are
now averaged instead of being counted as binary pass/fail, giving a more
nuanced signal from analytic-judge evaluations.
Extraction errors can count as failures — new count_extraction_errors_as_fail
flag on InspectionSpec. Previously extraction errors were silently excluded
from scoring.
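The two scoring changes above can be illustrated together. This is a minimal sketch under stated assumptions, not the project's implementation: `EvidenceItem`, `SpecSketch`, and `aggregate_score` are hypothetical names, `SpecSketch` stands in for `InspectionSpec` and models only the `count_extraction_errors_as_fail` flag, and the flag's default, the `None` encoding for extraction errors, and the [0, 1] score range are all guesses.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EvidenceItem:
    """Hypothetical stand-in for one judged evidence item."""
    passed: Optional[bool]  # None marks an extraction error (assumed encoding)
    rubric_weighted_score: Optional[float] = None  # assumed to lie in [0, 1]


@dataclass
class SpecSketch:
    """Stand-in for InspectionSpec; only the flag named above is modeled."""
    count_extraction_errors_as_fail: bool = False  # default is an assumption


def aggregate_score(items: Sequence[EvidenceItem], spec: SpecSketch) -> float:
    """Average per-item scores, using the rubric score when one is present."""
    per_item = []
    for item in items:
        if item.passed is None:
            # Extraction error: scored as a failure only when the flag is set,
            # otherwise left out of the average entirely.
            if spec.count_extraction_errors_as_fail:
                per_item.append(0.0)
        elif item.rubric_weighted_score is not None:
            per_item.append(item.rubric_weighted_score)
        else:
            per_item.append(1.0 if item.passed else 0.0)
    # Assumption: with nothing left to score, report 0.0.
    return sum(per_item) / len(per_item) if per_item else 0.0


items = [
    EvidenceItem(passed=True, rubric_weighted_score=0.5),
    EvidenceItem(passed=True),
    EvidenceItem(passed=None),  # extraction error
]
lenient = aggregate_score(items, SpecSketch())
strict = aggregate_score(items, SpecSketch(count_extraction_errors_as_fail=True))
```

With the flag off, the errored item drops out of the average; with it on, the same item pulls the score down, which is the behavior change the flag is meant to expose.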
Extended from 3 → 5 steps — added a genuine off-topic distractor turn before the goal-recall step, making drift harder to game. Score is now passed/total instead of binary 0/1
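The passed/total score can be sketched as a simple fraction over per-step results. The function name `step_score` and the list-of-booleans input are hypothetical, chosen only for illustration; the benchmark's actual step representation is not shown in these notes.

```python
from typing import Sequence


def step_score(step_results: Sequence[bool]) -> float:
    """Fraction of steps that passed (passed/total)."""
    if not step_results:
        raise ValueError("at least one step result is required")
    return sum(step_results) / len(step_results)


# Five steps, one failure: partial credit rather than an all-or-nothing result.
five_step_run = [True, True, True, False, True]
score = step_score(five_step_run)
```

A fractional score lets a run that slips on a single step be distinguished from one that fails throughout.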