multivon-eval 0.11.1 — robustness hardening: honest UNKNOWN over confident wrong
Robustness hardening from an adversarial audit that ran the staleness /
provenance / scanner / bootstrap surface against malformed inputs, symlink
tricks, unicode edge cases, and concurrent writers. The theme: every failure
the audit found was a place where the tool either crashed with a raw
traceback or — worse — reported something false. Both violate the same
contract: honest UNKNOWN over confident wrong.
Fixed
- A syntax-broken file no longer reads as REMOVED. The scanner silently
  returned zero records for files it couldn't parse (syntax errors, non-UTF8
  encodings), so staleness reported every baselined site in them as REMOVED,
  and `--fail-on removed` failed CI with a misleading verdict. Unscannable
  files now surface as a distinct UNSCANNABLE tier ("file exists but could
  not be parsed — verdict unknown, NOT removed"), a warning line names each
  file with its reason in all three renderers, JSON gains `skipped_files`,
  and `--fail-on removed` no longer trips. Skipped files are a report-time
  concept, never written into baselines.
- Symlinks resolving outside the repo root are skipped, not recorded.
  Previously they wrote machine-specific absolute paths into the baseline,
  producing false REMOVED+ADDED churn on every other checkout.
- Fingerprints are NFC-normalized (`SCANNER_VERSION` 3 → 4). Composed
  vs decomposed unicode ("é" as one codepoint vs e+combining-accent) is an
  editor/OS artifact, not a prompt change; it previously fingerprinted as
  drift. Old baselines print the standing "rescan recommended" warning.
- `match`-statement capture patterns disqualify module constants:
  `case PROMPT:` rebinds via a str field the scanner didn't see, letting a
  rebound constant read as static (a false "static" poisons every verdict).
- Clean errors instead of tracebacks: `staleness stamp` on malformed
  JSONL (file:line in the message), `staleness baseline` on a nonexistent
  path or missing `--out` dir, `bootstrap` on a malformed traces file, and
  `--site …#xx` with a non-integer position all exit 2 with actionable
  messages. `multivon-eval … | head` no longer dumps a BrokenPipeError.
- `attribution scan /typo/path` exits 2 instead of a green "No SDK
  prompt call sites found"; a typo'd CI path looked permanently passing.
- The documented 10K trace cap is now enforced with a loud truncation
  warning, and a malformed final trace line (the normal shape of an
  interrupted streamed dump) skips with a warning while malformed interior
  lines stay a hard error.
- Bootstrap artifacts are emitted atomically (temp dir + rename): a
  Ctrl-C mid-emission can no longer leave a half-written `eval_suite.py`
  that looks complete.
- `schema_version: true` no longer passes the int check (bool ⊂ int).
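
To illustrate the NFC fix listed above, here is a minimal sketch of why normalizing before hashing removes the false drift. The `fingerprint` helper and the choice of SHA-256 are assumptions made for illustration, not the scanner's actual implementation; only `unicodedata.normalize` and `hashlib` are real standard-library calls.

```python
import hashlib
import unicodedata


def fingerprint(prompt_text: str) -> str:
    """Hypothetical fingerprint: hash the NFC-normalized text.

    Normalizing first means composed and decomposed spellings of the
    same visible string produce the same digest.
    """
    normalized = unicodedata.normalize("NFC", prompt_text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


composed = "caf\u00e9"     # "é" as a single codepoint
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# The two strings differ as codepoint sequences...
assert composed != decomposed
# ...but fingerprint identically once NFC-normalized.
assert fingerprint(composed) == fingerprint(decomposed)
```

Without the `normalize` call the two digests would differ, which is the kind of editor/OS artifact that previously read as drift.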
34 new tests across the touched surface; 1038 green.
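
The `schema_version` fix in the list above comes down to a Python detail: `bool` is a subclass of `int`, so a plain `isinstance(value, int)` check accepts `True`. A sketch of a stricter check follows; the `is_strict_int` name is hypothetical and the project's actual validation code may look different.

```python
def is_strict_int(value: object) -> bool:
    """Accept real integers only; reject bool even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


# The trap: a naive isinstance check lets booleans through.
assert isinstance(True, int)

# The stricter check rejects a boolean schema_version...
assert not is_strict_int(True)
# ...while an ordinary integer version still passes.
assert is_strict_int(4)
```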