User Story
As Maya, in order to get better extraction quality without changing the underlying model, I want an extraction variant whose system prompt has been automatically optimized against the evaluation suite.
Preconditions
Acceptance Criteria
Success Metrics
- Optimized variant beats baseline on at least one metric (recall, precision, or sensitivity)
- Optimization is reproducible via a single CLI command
Notes
- Class topic: prompt optimization, Assignment 10
- Use Opus as the backward model if it fits the budget; Sonnet otherwise
- Document hyperparameters for reproducibility
- Follow-on stories (not in scope here):
shaping/sonnet-optimized-v1, filling/sonnet-optimized-v1 — same harness, different eval kind
Definition of Done
User Story
As Maya, in order to get better extraction quality without changing the underlying model, I want an extraction variant whose system prompt has been automatically optimized against the evaluation suite.
Preconditions
Acceptance Criteria
src/services/prompt-optimization/— thin wrapper around TextGrad (or equivalent) that drives our existingEvaluationKindharness as its training signalsrc/services/extraction/prompts/optimized-v1.txt(or similar)extraction/sonnet-optimized-v1that loads the optimized prompt at constructionsonnetvssonnet-optimized-v1on all fixturescatalog/experiments/pdf-field-extraction/sonnet-optimized-v1.mdincluding: optimization setup (epochs, batch size, forward/backward models), before/after prompt snippets, and metric deltascatalog/experiments/_roadmap.mdupdated with shipped status and one-line findingSuccess Metrics
Notes
shaping/sonnet-optimized-v1,filling/sonnet-optimized-v1— same harness, different eval kindDefinition of Done