Whitepaper Notes

Sasha Lopashev edited this page Jun 27, 2026 · 1 revision

The current whitepaper is strongest when it argues that AI execution is no longer primitive. The ecosystem already has agent frameworks, gateways, durable workflows, tool protocols, observability, evals, and provider-specific execution features. The problem is that each of these systems optimizes within its own layer, and they rarely share a portable execution object.

Migaki's strongest thesis is:

The unit of optimization should not be a single prompt. The unit of optimization should be the execution graph.

Claims Worth Keeping

  • Migaki is an execution optimizer for probabilistic model/tool graphs.
  • mIR should represent intent separately from provider-specific execution.
  • Context should be a first-class execution artifact.
  • Provider neutrality must still be capability-aware.
  • Every pass should emit plan diffs, warnings, and evidence.
  • Optimization should target validated task outcomes, not identical answers.
  • Routing is useful only when benchmarked against simple baselines and measured under task-specific constraints.
  • v0 should be boring, falsifiable, and hard to dismiss.

Places to Be More Precise

"Execution Optimizer" Needs a Narrow Contract

The whitepaper lists many execution concerns: routing, retrieval, context assembly, caching, retries, fallback, validation, approvals, replay, audit, retention, and observability. That breadth is plausible for the long-term IR, but v0 needs a narrower contract:

  • accept a logical plan,
  • apply a small set of deterministic or low-risk passes,
  • lower to one or two backends,
  • execute or simulate execution,
  • report evidence.
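
The five steps above can be sketched as a small pipeline. This is only an illustration of the shape of the contract, not Migaki's API: `Step`, `Plan`, `Evidence`, `dedupe_context`, `lower`, and `run_v0` are hypothetical names, and the "backend" is simulated as a list of call descriptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Every name in this sketch is hypothetical; the whitepaper does not define this API.

@dataclass(frozen=True)
class Step:
    name: str
    kind: str                      # e.g. "retrieve", "model", "validate"
    context: tuple[str, ...] = ()  # identifiers of context items fed to the step

@dataclass(frozen=True)
class Plan:
    steps: tuple[Step, ...]

@dataclass
class Evidence:
    diffs: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
    lowered: list[dict] = field(default_factory=list)

Pass = Callable[[Plan, Evidence], Plan]

def dedupe_context(plan: Plan, evidence: Evidence) -> Plan:
    """Deterministic pass: drop repeated context items within each step."""
    new_steps = []
    for step in plan.steps:
        unique = tuple(dict.fromkeys(step.context))  # keeps first-seen order
        if unique != step.context:
            removed = len(step.context) - len(unique)
            evidence.diffs.append(
                f"{step.name}: removed {removed} duplicate context item(s)"
            )
        new_steps.append(Step(step.name, step.kind, unique))
    return Plan(tuple(new_steps))

def lower(plan: Plan, backend: str) -> list[dict]:
    """Lower the logical plan to backend-specific call descriptions (simulated)."""
    return [
        {"backend": backend, "op": s.kind, "name": s.name, "context": list(s.context)}
        for s in plan.steps
    ]

def run_v0(plan: Plan, passes: list[Pass], backend: str) -> tuple[Plan, Evidence]:
    """Accept a plan, apply passes, lower to one backend, and report evidence."""
    evidence = Evidence()
    for optimization_pass in passes:
        plan = optimization_pass(plan, evidence)
    evidence.lowered = lower(plan, backend)
    return plan, evidence
```

Each pass receives the evidence object and records its own diff, so the changes made to a plan can be read from the output without inspecting the passes themselves.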

"Validated Behavioral Equivalence" Needs Concrete Metrics

The phrase points in the right direction, but it stays vague until it is tied to measurements. Each demo should name its acceptance criteria before any optimization runs:

  • schema validity,
  • source-grounding score,
  • validator pass rate,
  • regression threshold,
  • human acceptance rate,
  • task-specific success metric.
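
One way to make "name the criteria first" concrete is to declare them as data and evaluate measured results against them. The sketch below assumes every metric is "higher is better" and uses made-up thresholds; `Criterion`, `evaluate`, `accepted`, and the metric keys are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names and thresholds, for illustration only.

@dataclass(frozen=True)
class Criterion:
    metric: str
    minimum: float  # lowest acceptable value; assumes higher is better

def evaluate(criteria: list[Criterion], measured: dict[str, float]) -> dict[str, bool]:
    """Return pass/fail per criterion; a missing metric counts as a failure."""
    return {
        c.metric: measured.get(c.metric, float("-inf")) >= c.minimum
        for c in criteria
    }

def accepted(criteria: list[Criterion], measured: dict[str, float]) -> bool:
    """A run is accepted only if every declared criterion passes."""
    return all(evaluate(criteria, measured).values())

# Declared before any optimization pass runs.
CRITERIA = [
    Criterion("schema_validity", 1.0),
    Criterion("validator_pass_rate", 0.95),
    Criterion("source_grounding_score", 0.80),
]
```

Treating a missing metric as a failure is a deliberate choice here: an optimized run that stops reporting a metric should not pass by omission.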

mIR Should Not Become a Dumping Ground

mIR should be expressive enough to optimize execution, but not a universal representation of all agent behavior. Planning, dialogue policy, product workflow, and domain logic should remain outside mIR unless they directly affect execution planning.

Evidence Retention Can Conflict With Privacy

Evidence bundles are central to credibility, but full traces can contain sensitive prompts, documents, tool outputs, and user data. Evidence design needs redaction, retention classes, replay modes, and privacy-aware export from the beginning.
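
A minimal sketch of what retention classes could look like at export time. The three classes, the `TraceRecord` shape, and the email-only redaction rule are assumptions chosen to keep the example short; real redaction would need far broader coverage than a single regular expression.

```python
import hashlib
import re
from dataclasses import dataclass
from enum import Enum

# Hypothetical retention classes; the whitepaper does not enumerate them.
class Retention(Enum):
    FULL = "full"                    # keep the trace text as-is
    REDACTED = "redacted"            # keep the text with sensitive spans masked
    METADATA_ONLY = "metadata_only"  # keep only a hash and a length

# Deliberately simple pattern: masks email-like strings only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

@dataclass(frozen=True)
class TraceRecord:
    step: str
    text: str

def export_record(record: TraceRecord, retention: Retention) -> dict:
    """Export one trace record according to its retention class."""
    if retention is Retention.FULL:
        return {"step": record.step, "text": record.text}
    if retention is Retention.REDACTED:
        return {"step": record.step, "text": EMAIL.sub("[REDACTED_EMAIL]", record.text)}
    # METADATA_ONLY: the hash still lets a replay confirm it saw the same input.
    digest = hashlib.sha256(record.text.encode("utf-8")).hexdigest()
    return {"step": record.step, "sha256": digest, "chars": len(record.text)}
```

The metadata-only class keeps a content hash so that a later replay can check it received identical input without the bundle ever storing the input itself.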

Provider Capability Drift Is a First-Class Problem

Provider APIs change quickly. Capability registries should be versioned and testable. Each backend should record the capability assumptions it relied on in the evidence bundle, so that a plan is not silently judged against stale capabilities.
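
As a sketch of what "declare assumptions" might mean in practice, the check below compares the capability snapshot a plan was built against with the current registry entry and returns warnings suitable for an evidence bundle. `CapabilitySnapshot`, `check_assumptions`, and any provider or feature names passed to them are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical registry entry; field names are assumptions.
@dataclass(frozen=True)
class CapabilitySnapshot:
    provider: str
    version: str              # registry version, e.g. a date stamp
    features: frozenset[str]  # capability names the provider is believed to support

def check_assumptions(
    assumed: CapabilitySnapshot, current: CapabilitySnapshot
) -> list[str]:
    """Return warnings when a plan's assumed capabilities differ from the registry."""
    if assumed.provider != current.provider:
        raise ValueError("snapshots describe different providers")
    warnings = []
    if assumed.version != current.version:
        warnings.append(
            f"plan assumed registry {assumed.version}, current is {current.version}"
        )
    # Sorted so the warning order is stable across runs.
    for feature in sorted(assumed.features - current.features):
        warnings.append(f"assumed capability no longer listed: {feature}")
    return warnings
```

Because the snapshots are plain data, the same check can run in a test suite against recorded registry versions, which is one way to keep the registry testable.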

Language Guardrails

Prefer:

  • execution optimizer,
  • probabilistic model/tool graph,
  • portable logical plan,
  • provider-specific lowering,
  • execution evidence bundle,
  • validated quality threshold.

Avoid:

  • AI compiler,
  • same answer, lower cost,
  • provider-neutral execution across all models,
  • replaces agent frameworks,
  • minimum confidence,
  • optimization certificate.

Working Bet

Migaki will be credible if it can show a workflow where execution changes are visible, constrained, and measured:

  • fewer tokens,
  • lower estimated or actual cost,
  • lower latency where possible,
  • unchanged validator pass rate within a declared threshold,
  • no hidden prompt mutation,
  • a replayable evidence artifact.
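
The "no hidden prompt mutation" item can be checked mechanically if each step's prompt is hashed before and after optimization and every change must appear in the plan diff. The sketch below assumes that setup; `prompt_hash`, `hidden_mutations`, and the step-name keys are hypothetical.

```python
import hashlib

def prompt_hash(prompt: str) -> str:
    """Hash a prompt so evidence can compare prompts without storing them."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def hidden_mutations(
    before: dict[str, str], after: dict[str, str], declared: set[str]
) -> list[str]:
    """Return steps whose prompt hash changed without a declared diff entry.

    `before` and `after` map step names to prompt hashes; `declared` is the
    set of step names the plan diff says were modified, added, or removed.
    """
    changed = {
        step
        for step in before.keys() | after.keys()
        if before.get(step) != after.get(step)
    }
    return sorted(changed - declared)
```

An empty result means every prompt change was declared in the diff; it says nothing about whether the declared changes were acceptable, which remains the job of the validators.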
