v1.1.0 — Targeted eval runs and report validation

@lukefwalton lukefwalton released this 13 Jun 09:31
5fbac7c

This release applies lessons from my production answer engine:

New:

  • Stable gold query ids (q01 through q10) for targeted runs
  • npm run eval -- --ids, --from-report, --list, and --help
  • JSON eval reports under artifacts/eval/ for rerunning failures
  • forbidAnswerPatterns gold guard (e.g. no raw URLs in answer prose on boundary queries)
  • Strict eval report validation before reruns (JSON syntax, result shape, count consistency)
  • Reject aborted --fail-fast reports for --from-report
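The report checks listed above (JSON syntax, result shape, count consistency, and rejecting aborted --fail-fast runs) could look roughly like the sketch below. It is an illustration only, not the project's code: the field names `aborted`, `total`, `failed`, `results`, `id`, and `passed`, and the function `failedIdsFromReport`, are assumptions, and the real report schema under artifacts/eval/ may differ.

```typescript
// Hypothetical result entry; the real report schema may use other fields.
interface EvalResult {
  id: string; // gold query id, e.g. "q03"
  passed: boolean;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns the ids of failed queries, or throws if the report is unsafe to rerun from.
function failedIdsFromReport(rawJson: string): string[] {
  // 1. JSON syntax
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawJson);
  } catch {
    throw new Error("report is not valid JSON");
  }

  // 2. Result shape
  if (!isRecord(parsed)) {
    throw new Error("report must be a JSON object");
  }
  const rawResults = parsed.results;
  const total = parsed.total;
  const failed = parsed.failed;
  if (!Array.isArray(rawResults)) {
    throw new Error("report has no results array");
  }
  if (typeof total !== "number" || typeof failed !== "number") {
    throw new Error("report is missing numeric total/failed counts");
  }

  // 3. Aborted runs are incomplete, so their failure list cannot be trusted
  // (assumes the runner records an `aborted` flag when --fail-fast stops early).
  if (parsed.aborted === true) {
    throw new Error("report came from an aborted --fail-fast run");
  }

  const results: EvalResult[] = [];
  for (const r of rawResults) {
    if (!isRecord(r) || typeof r.id !== "string" || typeof r.passed !== "boolean") {
      throw new Error("malformed result entry");
    }
    results.push({ id: r.id, passed: r.passed });
  }

  // 4. Count consistency between the summary and the entries
  const failedIds = results.filter((r) => !r.passed).map((r) => r.id);
  if (total !== results.length || failed !== failedIds.length) {
    throw new Error("report counts do not match results");
  }
  return failedIds;
}
```

Validating before rerunning means a truncated or hand-edited report fails loudly instead of silently rerunning the wrong subset of queries.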

Docs:

  • Eval workflow documented in eval/README.md
  • README: retrieved-vs-cited UI lesson, repo positioning, related writing, Zenodo DOI
  • Contribution guidance: “Code the invariant. Document the scaling pattern. Comment the footgun.”

No changes to the no-leak boundary or citation-grounding contract.