multivon-eval 0.15.0 — view --dir report browser
`multivon-eval view --dir` (#15): a local report browser for a folder of runs. A deliberation panel rejected the "studio" framing and chose to extend the existing `view` command instead. The browser is read-only and stateless and runs on the same stdlib `http.server` harness that `view` already uses, so it adds zero new dependencies and works fully offline.
Added
- INDEX (`view --dir runs/`): a sortable table of every eval-report JSON in a directory, with columns for suite, model, when, n, pass rate with a Wilson CI bar, error/flaky badges, and cost. A positive structural validator decides what counts as a report (it requires the real `{summary.pass_rate, cases[]}` shape); junk JSON collapses into one "k files skipped" line rather than being parsed as an empty report. An error rate >= 10% is flagged. `--recursive` is opt-in (off by default).
- OPEN (`/r/<idx>`): the existing `EvalReport.to_html()` served verbatim with a breadcrumb; no renderer fork.
- DIFF (`/diff?a=&b=`): wraps `report_a.compare(report_b)` and shows pass-rate / avg-score deltas, McNemar p with a significance label, and four buckets (Regressed, open and colored; Fixed; Still failing; Unchanged). Regressed rows stack both runs' judge reasons (looked up by `case_input`) as prose, so you see exactly why a verdict flipped.
- Single-file `view <report.json>` is unchanged; `view <dir>` and `view --dir <dir>` both enter directory mode.
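
The checks and statistics named above can be sketched in a few lines of stdlib Python. This is an illustrative sketch, not the library's code: the function names `looks_like_report`, `wilson_interval`, and `mcnemar_exact_p` are hypothetical, and the release notes do not say which confidence level the Wilson bar uses or which McNemar variant `compare()` computes. The sketch assumes z = 1.96 (roughly 95%) and the exact binomial form of McNemar's test.

```python
import math
from typing import Any


def looks_like_report(doc: Any) -> bool:
    """Positive structural check (hypothetical helper): accept only
    documents with the {summary.pass_rate, cases[]} shape."""
    if not isinstance(doc, dict):
        return False
    summary = doc.get("summary")
    cases = doc.get("cases")
    if not isinstance(summary, dict) or not isinstance(cases, list):
        return False
    rate = summary.get("pass_rate")
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(rate, (int, float)) and not isinstance(rate, bool)


def wilson_interval(passed: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z = 1.96 is an assumed ~95% level."""
    if n <= 0:
        return (0.0, 1.0)  # no data: the interval is uninformative
    p = passed / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, centre - half), min(1.0, centre + half))


def mcnemar_exact_p(regressed: int, fixed: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts
    (cases that flipped pass->fail and fail->pass between runs)."""
    n = regressed + fixed
    if n == 0:
        return 1.0  # no verdict flipped, so there is no evidence of a difference
    k = min(regressed, fixed)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant pairs enter McNemar's test, which is why the "Still failing" and "Unchanged" buckets do not affect the p-value in this sketch. The Wilson interval is a common choice for pass rates because it stays inside [0, 1] and behaves sensibly at small n.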