Release multivon-eval 0.11.1 — robustness hardening: honest UNKNOWN over confident wrong · multivon-ai/multivon-eval

Robustness hardening from an adversarial audit that ran the staleness /
provenance / scanner / bootstrap surface against malformed inputs, symlink
tricks, unicode edge cases, and concurrent writers. The theme: every failure
the audit found was a place where the tool either crashed with a raw
traceback or — worse — reported something false. Both violate the same
contract: honest UNKNOWN over confident wrong.

Fixed

A syntax-broken file no longer reads as REMOVED. The scanner silently
returned zero records for files it couldn't parse (syntax errors, non-UTF8
encodings), so staleness reported every baselined site in them as REMOVED —
and --fail-on removed failed CI with a misleading verdict. Unscannable
files now surface as a distinct UNSCANNABLE tier ("file exists but could
not be parsed — verdict unknown, NOT removed"), a warning line names each
file with its reason in all three renderers, JSON gains skipped_files,
and --fail-on removed no longer trips. Skipped files are a report-time
concept — never written into baselines.
Symlinks resolving outside the repo root are skipped, not recorded —
previously they wrote machine-specific absolute paths into the baseline,
producing false REMOVED+ADDED churn on every other checkout.
Fingerprints are NFC-normalized (SCANNER_VERSION 3 → 4) — composed
vs decomposed unicode ("é" as one codepoint vs e+combining-accent) is an
editor/OS artifact, not a prompt change; it previously fingerprinted as
drift. Old baselines print the standing "rescan recommended" warning.
match-statement capture patterns disqualify module constants —
case PROMPT: rebinds via a str field the scanner didn't see, letting a
rebound constant read as static (a false "static" poisons every verdict).
Clean errors instead of tracebacks: staleness stamp on malformed
JSONL (file:line in the message), staleness baseline on a nonexistent
path or missing --out dir, bootstrap on a malformed traces file, and
--site …#xx with a non-integer position — all exit 2 with actionable
messages. multivon-eval … | head no longer dumps a BrokenPipeError.
attribution scan /typo/path exits 2 instead of a green "No SDK
prompt call sites found" — a typo'd CI path looked permanently passing.
The documented 10K trace cap is now enforced with a loud truncation
warning, and a malformed final trace line (the normal shape of an
interrupted streamed dump) skips with a warning while malformed interior
lines stay a hard error.
Bootstrap artifacts are emitted atomically (temp dir + rename) — a
Ctrl-C mid-emission can no longer leave a half-written eval_suite.py
that looks complete. schema_version: true no longer passes the int
check (bool ⊂ int).

34 new tests across the touched surface; 1038 green.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

multivon-eval 0.11.1 — robustness hardening: honest UNKNOWN over confident wrong

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Fixed

Uh oh!