multivon-eval 0.15.0 — view --dir report browser

@siddharthsrivastava siddharthsrivastava released this 15 Jun 20:38
· 9 commits to main since this release

multivon-eval view --dir (#15): a local report browser for a folder of runs. It was designed by a deliberation panel that rejected the "studio" framing and chose to extend the existing view command instead. The browser is read-only and stateless, and it runs on the same stdlib http.server harness that view already uses, so it adds zero new dependencies and works fully offline.

Added

  • INDEX (view --dir runs/) — a sortable table of every eval-report JSON in a directory: suite, model, when, n, pass rate with a Wilson CI bar, error/flaky badges, cost. A positive structural validator decides what's a report (requires the real {summary.pass_rate, cases[]} shape) — junk JSON collapses into one "k files skipped" line rather than being parsed as an empty report. Reports with an error rate >= 10% are flagged. --recursive is opt-in (off by default).
  • OPEN (/r/<idx>) — the existing EvalReport.to_html() served verbatim with a breadcrumb; no renderer fork.
  • DIFF (/diff?a=&b=) — wraps report_a.compare(report_b): pass-rate / avg-score deltas, McNemar p with a significance label, and four buckets (Regressed, shown open and colored; Fixed; Still failing; Unchanged). Regressed rows stack both runs' judge reasons (looked up by case_input) as prose, so you see exactly why a verdict flipped.
  • Single-file view <report.json> is unchanged; view <dir> and view --dir <dir> both enter directory mode.
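
For readers unfamiliar with the Wilson CI shown next to each pass rate in the index, here is a minimal sketch of the standard Wilson score interval. The function name and the 95% default (z = 1.96) are assumptions made for illustration; the release notes do not say which confidence level or implementation multivon-eval uses.

```python
import math


def wilson_interval(passed: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate of `passed` out of `n` cases.

    Hypothetical helper, not multivon-eval's API. z=1.96 is roughly 95%.
    """
    if n == 0:
        # No cases: the data says nothing, so return the widest interval.
        return (0.0, 1.0)
    p = passed / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return (max(0.0, center - half), min(1.0, center + half))


if __name__ == "__main__":
    low, high = wilson_interval(8, 10)
    print(f"8/10 passed: {low:.3f} to {high:.3f}")
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and remains informative for small n or pass rates near 0% or 100%, which is likely why it suits a compact bar in a table.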
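
The "positive structural validator" idea from the INDEX entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped code: looks_like_report and scan_reports are hypothetical names, and the only shape checked is the {summary.pass_rate, cases[]} requirement quoted above.

```python
import json
from pathlib import Path


def looks_like_report(obj) -> bool:
    """Accept only objects with a numeric summary.pass_rate and a cases list.

    Hypothetical validator; the real one may check more fields.
    """
    if not isinstance(obj, dict):
        return False
    summary = obj.get("summary")
    if not isinstance(summary, dict):
        return False
    rate = summary.get("pass_rate")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        return False
    return isinstance(obj.get("cases"), list)


def scan_reports(directory, recursive: bool = False):
    """Return (reports, skipped_count) for the JSON files in a directory."""
    pattern = "**/*.json" if recursive else "*.json"
    reports, skipped = [], 0
    for path in sorted(Path(directory).glob(pattern)):
        try:
            obj = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or malformed JSON counts as skipped, not as a report.
            skipped += 1
            continue
        if looks_like_report(obj):
            reports.append((path, obj))
        else:
            skipped += 1
    return reports, skipped
```

Validating positively (a file must prove it is a report) rather than negatively (parse whatever loads) is what lets unrelated JSON collapse into a single skipped count instead of rendering as an empty row.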
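
To make the DIFF buckets and the McNemar p-value concrete, here is a small sketch. It does not call report_a.compare(report_b); it assumes each run is reduced to a mapping from case_input to a pass/fail boolean, assumes "Unchanged" means passing in both runs, and uses the exact (binomial) form of McNemar's test, which may differ from the variant multivon-eval computes. All function names are hypothetical.

```python
from math import comb


def bucket_cases(a: dict[str, bool], b: dict[str, bool]) -> dict[str, list[str]]:
    """Sort cases present in both runs into the four diff buckets."""
    buckets = {"regressed": [], "fixed": [], "still_failing": [], "unchanged": []}
    for case_input in sorted(a.keys() & b.keys()):
        before, after = a[case_input], b[case_input]
        if before and not after:
            buckets["regressed"].append(case_input)
        elif not before and after:
            buckets["fixed"].append(case_input)
        elif not before and not after:
            buckets["still_failing"].append(case_input)
        else:
            buckets["unchanged"].append(case_input)
    return buckets


def mcnemar_exact_p(regressed: int, fixed: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = regressed + fixed
    if n == 0:
        # No verdict flipped, so there is no evidence of a difference.
        return 1.0
    k = min(regressed, fixed)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    run_a = {"q1": True, "q2": False, "q3": True, "q4": False}
    run_b = {"q1": False, "q2": True, "q3": True, "q4": False}
    buckets = bucket_cases(run_a, run_b)
    p = mcnemar_exact_p(len(buckets["regressed"]), len(buckets["fixed"]))
    print(buckets, p)
```

Only the discordant pairs (Regressed and Fixed) enter the test; cases whose verdict did not change carry no information about which run is better.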