-
Notifications
You must be signed in to change notification settings - Fork 0
Whitepaper Notes
The current whitepaper is strongest when it argues that AI execution is no longer primitive. The ecosystem already has agent frameworks, gateways, durable workflows, tool protocols, observability, evals, and provider-specific execution features. The problem is that these systems optimize locally and rarely share a portable execution object.
Migaki's strongest thesis is:
The unit of optimization should not be a single prompt. The unit of optimization should be the execution graph.
- Migaki is an execution optimizer for probabilistic model/tool graphs.
- mIR should represent intent separately from provider-specific execution.
- Context should be a first-class execution artifact.
- Provider neutrality must still be capability-aware.
- Every pass should emit plan diffs, warnings, and evidence.
- Optimization should target validated task outcomes, not identical answers.
- Routing is useful only when benchmarked against simple baselines and measured under task-specific constraints.
- v0 should be boring, falsifiable, and hard to dismiss.
The whitepaper lists many execution concerns: routing, retrieval, context assembly, caching, retries, fallback, validation, approvals, replay, audit, retention, and observability. That breadth is plausible for the long-term IR, but v0 needs a narrower contract:
- accept a logical plan,
- apply a small set of deterministic or low-risk passes,
- lower to one or two backends,
- execute or simulate execution,
- report evidence.
The phrase is directionally right, but it can become vague. Each demo should name its acceptance criteria before optimization:
- schema validity,
- source-grounding score,
- validator pass rate,
- regression threshold,
- human acceptance rate,
- task-specific success metric.
mIR should be expressive enough to optimize execution, but not a universal representation of all agent behavior. Planning, dialogue policy, product workflow, and domain logic should remain outside mIR unless they directly affect execution planning.
Evidence bundles are central to credibility, but full traces can contain sensitive prompts, documents, tool outputs, and user data. Evidence design needs redaction, retention classes, replay modes, and privacy-aware export from the beginning.
Provider APIs change quickly. Capability registries should be versioned and testable. Backend behavior should declare assumptions in evidence bundles so a plan is not silently judged against stale capabilities.
Prefer:
- execution optimizer,
- probabilistic model/tool graph,
- portable logical plan,
- provider-specific lowering,
- execution evidence bundle,
- validated quality threshold.
Avoid:
- AI compiler,
- same answer, lower cost,
- provider-neutral execution across all models,
- replaces agent frameworks,
- minimum confidence,
- optimization certificate.
Migaki will be credible if it can show a workflow where execution changes are visible, constrained, and measured:
- fewer tokens,
- lower estimated or actual cost,
- lower latency where possible,
- unchanged validator pass rate within a declared threshold,
- no hidden prompt mutation,
- a replayable evidence artifact.