v1.1.0 — Targeted eval runs and report validation
Applying learnings from my production answer engine:
New:
- Stable gold query ids (
q01–q10) for targeted runs npm run eval -- --ids,--from-report,--list, and--help- JSON eval reports under
artifacts/eval/for rerunning failures forbidAnswerPatternsgold guard (e.g. no raw URLs in answer prose on boundary queries)- Strict eval report validation before reruns (JSON syntax, result shape, count consistency)
- Reject aborted
--fail-fastreports for--from-report
Docs:
- Eval workflow documented in eval/README.md
- README: retrieved-vs-cited UI lesson, repo positioning, related writing, Zenodo DOI
- Contribution guidance: “Code the invariant. Document the scaling pattern. Comment the footgun.”
No changes to the no-leak boundary or citation-grounding contract.