v1.1.0 — Targeted eval runs and report validation

lukefwalton released this 13 Jun 09:31


5fbac7c

This release applies lessons from my production answer engine:

New:

Stable gold query ids (q01–q10) for targeted runs
New flags for npm run eval (passed after --): --ids, --from-report, --list, and --help
JSON eval reports under artifacts/eval/ for rerunning failures
forbidAnswerPatterns gold guard (e.g. no raw URLs in answer prose on boundary queries)
Strict eval report validation before reruns (JSON syntax, result shape, count consistency)
--from-report rejects reports from runs aborted by --fail-fast
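
As a rough illustration of the validation steps listed above (JSON syntax, result shape, count consistency, and rejecting aborted runs), here is a minimal TypeScript sketch. It is not the repo's actual code: the report fields (total, passed, failed, aborted, results) and the function name failedIdsFromReport are assumptions made for this example, and the real schema under artifacts/eval/ may differ.

```typescript
// Hypothetical report shape, for illustration only; the real schema may differ.
interface EvalResult {
  id: string; // gold query id such as "q03"
  passed: boolean;
}

type Json = Record<string, unknown>;

function isRecord(v: unknown): v is Json {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isEvalResult(v: unknown): v is EvalResult {
  return isRecord(v) && typeof v["id"] === "string" && typeof v["passed"] === "boolean";
}

// Returns the ids of failed queries, or throws if the report cannot be trusted.
function failedIdsFromReport(raw: string): string[] {
  // 1. JSON syntax
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("report is not valid JSON");
  }

  // 2. Result shape (field names here are assumed, not taken from the repo)
  if (!isRecord(parsed)) throw new Error("report must be a JSON object");
  const total = parsed["total"];
  const passed = parsed["passed"];
  const failed = parsed["failed"];
  const aborted = parsed["aborted"];
  const results = parsed["results"];
  if (
    typeof total !== "number" || typeof passed !== "number" ||
    typeof failed !== "number" || typeof aborted !== "boolean" ||
    !Array.isArray(results) || !results.every(isEvalResult)
  ) {
    throw new Error("report has an unexpected shape");
  }
  const typed: EvalResult[] = results;

  // 3. An aborted run is incomplete, so its failure list cannot drive a rerun
  if (aborted) throw new Error("report comes from an aborted run");

  // 4. Count consistency between the summary and the per-query results
  const passedSeen = typed.filter((r) => r.passed).length;
  if (typed.length !== total || passedSeen !== passed || passed + failed !== total) {
    throw new Error("report counts are inconsistent");
  }

  return typed.filter((r) => !r.passed).map((r) => r.id);
}
```

Validating before rerunning means a truncated or hand-edited report fails loudly instead of silently rerunning the wrong subset of queries. The returned ids correspond to the set a --from-report rerun would target.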

Docs:

Eval workflow documented in eval/README.md
README: retrieved-vs-cited UI lesson, repo positioning, related writing, Zenodo DOI
Contribution guidance: “Code the invariant. Document the scaling pattern. Comment the footgun.”

No changes to the no-leak boundary or citation-grounding contract.
