-
Notifications
You must be signed in to change notification settings - Fork 0
Examples
Sasha Lopashev edited this page Jun 27, 2026
·
1 revision
Examples should be concrete enough to benchmark. Each example needs a baseline, an optimized plan, acceptance criteria, and an evidence bundle.
Baseline:
- Retrieve 20 chunks.
- Send all chunks to a frontier model.
- Generate answer.
- Return answer without structured validation.
Migaki:
- Represent workflow as mIR.
- Deduplicate overlapping or exact duplicate chunks.
- Preserve stable instructions as a cacheable prefix.
- Estimate token and cost deltas.
- Route lightweight ranking to a cheaper model or mock backend.
- Use a stronger model for final synthesis.
- Validate answer against cited chunks.
- Retry only synthesis if validation fails.
- Export evidence bundle.
Measures:
- baseline tokens,
- optimized tokens,
- baseline cost,
- optimized cost,
- baseline latency,
- optimized latency,
- validator pass rate,
- plan diff,
- trace and replay artifact.
Baseline:
- Load repository context.
- Load changed files.
- Load style guide.
- Ask one frontier model for review.
- Ask the same model to produce final comments.
Migaki:
- Mark style guide as fixed and cacheable.
- Mark changed files as non-droppable.
- Drop unrelated historical conversation.
- Run static checks deterministically.
- Route simple lint explanation to a cheaper model or mock backend.
- Use a stronger model for architectural review.
- Validate final comments against changed lines.
- Export evidence bundle.
Measures:
- comment acceptance rate,
- false-positive rate,
- validator pass rate,
- context diff,
- cost delta,
- latency delta.
Baseline:
- Ask a model for JSON.
- Parse the response.
- Retry the entire prompt when parsing fails.
Migaki:
- Represent schema requirement in mIR.
- Lower to provider-native structured output when available.
- Use post-validation fallback when native support is unavailable.
- Retry only the invalid node.
- Emit evidence showing the provider capability path.
Measures:
- schema validity,
- retry count,
- cost per valid result,
- backend downgrade warnings.