
Immutable Flows, Fork-to-Recover, Auto-Running Create



bw19 released this on 29 Jun at 02:53

v1.43.0 consolidates the workflow engine's recovery and lifecycle model around a single primitive. The orchestration engine (dwarf, bumped to v0.8.4) now treats a terminal flow as immutable: it is never re-run in place. The in-place re-run verbs (Restart, RestartFrom, Recover) and the breakpoint verbs (BreakBefore, ResumeBreak) are gone. In their place, Fork clones a flow's prefix up to any recorded step into a new, self-contained running flow and re-executes from there, leaving the original untouched. Flow creation is likewise simplified: Create folds create-and-run into one transaction, so there is no Start endpoint and a flow is already running when Create returns. FlowOptions loses the StartAt delayed-start field and gains ThreadKey for explicit thread membership. Alongside the lifecycle work, the LLM service gains a tool-call metric, graph annotations are removed, flow diagrams render correctly on current mermaid releases, and the database layer is hardened for both SQLite and SQL Server. The upgrade skill performs the migration mechanically.

Highlights

  • Terminal flows are immutable; recovery is Fork. A completed/failed/cancelled flow is frozen — the only operations on it are read and removal. Fork(stepKey, stateOverrides) clones the prefix up to a chosen step (including one inside a subgraph) into a new running flow and re-executes from there, never touching the original.
  • The in-place re-run and breakpoint verbs are removed. Restart, RestartFrom, Recover, BreakBefore, and ResumeBreak — and their foreman endpoints — are gone, replaced by Fork for recovery and flow.Interrupt/Resume for authored pauses.
  • Create auto-runs; there is no Start. Create-and-run is one transaction, so Create, Continue, and Fork all return a flow that is already running. A deferred start is authored in the workflow itself — an entry task that calls flow.Interrupt, released by Resume.
  • FlowOptions cleanup. StartAt is removed; ThreadKey is added as the explicit-policy way to join a thread. Derived operations take no options — Continue inherits the thread's policy, Fork inherits the origin's.
  • Subgraph-aware lifecycle and cleanup. Lifecycle operations address the root flow key and reject a subgraph-child key with 400; Delete/Purge remove a flow's whole subtree; Query.IncludeSubgraphs and FlowSummary.Subgraph make subgraph runs discoverable.
  • LLM tool-call metric. microbus_llm_tool_calls counts each resolved tool invocation, labeled by tool_url, tool_type (function/web/workflow), and outcome (ok/error).
  • Database and reliability hardening. A denormalized root_flow_id pointer; READ_COMMITTED_SNAPSHOT and a fan-in deadlock fix for SQL Server; atomic deletion of DeleteOnCompletion flows; and host-call panic isolation.

Recovery and Lifecycle

Immutable Flows and Fork

A terminal flow is immutable: the engine never re-runs it in place. To recover from a failure or explore an alternative, Fork(stepKey, stateOverrides) clones the flow's prefix up to the chosen step into a new, self-contained running flow and re-executes from that step with stateOverrides applied to it; the original is never modified.

```go
// Re-run from a chosen step (its key comes from History) with an edit that lets it succeed.
newFlowKey, err := client.Fork(ctx, stepKey, map[string]any{"amount": 0})
```

The fork point may be any recorded step, including one inside a subgraph — the clone re-runs from there and bubbles back up to the root. The fork inherits the origin flow's scheduling and baggage and forces notify-on-stop off. Because the fork is an ordinary new flow rather than a mutation of the original, a partially-failed fan-out is recovered by forking one failed branch at a time, and the original failed flows remain as an audit trail until Purged. This single primitive replaces the Recover/Restart/RestartFrom re-run verbs and the BreakBefore/ResumeBreak breakpoints introduced in earlier releases; the operator loop is now List → read Error → fix → Fork, documented in Reliability and Recovery.
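The fork semantics can be modeled in a few lines of Go. This is a hypothetical in-memory sketch, not the dwarf engine or the foreman client. `Flow`, `Step`, and `forkFlow` are illustrative names, and each step is assumed to store the input state it ran with. The sketch shows the two properties described above. First, the new flow receives a copy of the steps recorded before the fork point and is scheduled to re-execute that step with the overrides merged into its state. Second, the origin flow is only read, never written.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a hypothetical recorded step: its key and the input state it ran with.
type Step struct {
	Key   string
	State map[string]any
}

// Flow is a hypothetical, simplified flow record.
type Flow struct {
	Key       string
	Status    string // "running", "completed", "failed", or "cancelled"
	Steps     []Step
	NextKey   string         // step a running flow will execute next
	NextState map[string]any // input state for NextKey
}

// cloneState makes a shallow copy so the fork never aliases the origin's maps.
func cloneState(s map[string]any) map[string]any {
	out := make(map[string]any, len(s))
	for k, v := range s {
		out[k] = v
	}
	return out
}

// forkFlow copies the steps recorded before stepKey into a new running flow and
// schedules stepKey to re-execute with overrides applied. origin is never modified.
func forkFlow(origin *Flow, stepKey, newKey string, overrides map[string]any) (*Flow, error) {
	for i, s := range origin.Steps {
		if s.Key != stepKey {
			continue
		}
		prefix := make([]Step, i)
		for j := 0; j < i; j++ {
			prefix[j] = Step{Key: origin.Steps[j].Key, State: cloneState(origin.Steps[j].State)}
		}
		state := cloneState(s.State)
		for k, v := range overrides {
			state[k] = v
		}
		return &Flow{Key: newKey, Status: "running", Steps: prefix, NextKey: stepKey, NextState: state}, nil
	}
	return nil, errors.New("step not found: " + stepKey)
}

func main() {
	origin := &Flow{
		Key:    "flow-1",
		Status: "failed",
		Steps: []Step{
			{Key: "validate", State: map[string]any{"amount": 500}},
			{Key: "charge", State: map[string]any{"amount": 500}},
		},
	}
	fork, err := forkFlow(origin, "charge", "flow-2", map[string]any{"amount": 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(fork.Status, len(fork.Steps), fork.NextKey, fork.NextState["amount"])
	fmt.Println(origin.Status, origin.Steps[1].State["amount"])
}
```

Subgraph fork points and the bubble-up to the root are not modeled here. Recovering a failed fan-out would call `forkFlow` once per failed branch, which leaves each original intact as the audit trail.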

Create Auto-Runs

The engine folds create-and-run into one transaction, so the foreman exposes no Start endpoint and Create, Continue, and Fork all return a flow that is already running. Run remains Create + Await. A deferred start is no longer a flag — author it in the workflow with an entry task that calls flow.Interrupt, then release it with Resume when ready. The transient created status is no longer observable.
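The deferred-start pattern is easiest to see in a hypothetical in-memory model. The real flow.Interrupt and Resume signatures are not shown in these notes. Here a task returning `true` stands in for calling flow.Interrupt, and `create`, `resume`, `gate`, and `work` are illustrative names. `create` runs the flow in the same call, mirroring the removal of Start. The entry task pauses the flow, and `resume` releases it.

```go
package main

import (
	"errors"
	"fmt"
)

// task is a hypothetical workflow task; returning true models a call to
// flow.Interrupt, which pauses the flow after this task.
type task func(f *flow) (interrupt bool)

// flow is a hypothetical in-memory stand-in for an engine flow.
type flow struct {
	status string
	next   int // index of the next task to run
	tasks  []task
	log    []string
}

// create builds a flow and runs it in the same call, mirroring how Create
// returns a flow that is already running (there is no separate Start).
func create(tasks ...task) *flow {
	f := &flow{status: "running", tasks: tasks}
	f.run()
	return f
}

// run executes tasks until one interrupts or all have completed.
func (f *flow) run() {
	for f.next < len(f.tasks) {
		t := f.tasks[f.next]
		f.next++
		if t(f) {
			f.status = "interrupted"
			return
		}
	}
	f.status = "completed"
}

// resume releases an interrupted flow and continues with the following task.
func (f *flow) resume() error {
	if f.status != "interrupted" {
		return errors.New("flow is not interrupted")
	}
	f.status = "running"
	f.run()
	return nil
}

// gate is the entry task that holds back the real work until resume is called.
func gate(f *flow) bool {
	f.log = append(f.log, "gate")
	return true
}

// work stands in for the workflow's real first step.
func work(f *flow) bool {
	f.log = append(f.log, "work")
	return false
}

func main() {
	f := create(gate, work)
	fmt.Println(f.status, f.log) // interrupted [gate]
	if err := f.resume(); err != nil {
		panic(err)
	}
	fmt.Println(f.status, f.log) // completed [gate work]
}
```

In this model, resuming continues with the task after the gate. The sketch does not model how the real engine re-enters the interrupting task on Resume.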

FlowOptions: StartAt Out, ThreadKey In

FlowOptions.StartAt (delayed start) is removed — the deferred-start use case is served by flow.Interrupt. FlowOptions.ThreadKey is added: it places the new flow into an existing thread (pass any flow key already in the thread) while letting you set scheduling explicitly. It is the explicit-policy counterpart to Continue, which joins a thread by inheriting its policy wholesale. Policy is authored once at genesis (Create/Run); Continue and Fork take no options and inherit their source's policy.
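The difference between joining a thread with ThreadKey and joining it with Continue can be sketched with a hypothetical in-memory engine. `FlowOptions` here is a reduced stand-in for the real struct. `Priority` is an assumed placeholder for whatever scheduling policy the real options carry. `create` and `cont` are illustrative names. As a simplifying assumption, `cont` copies the policy of the flow it continues from.

```go
package main

import (
	"errors"
	"fmt"
)

// FlowOptions is a hypothetical, reduced options struct.
type FlowOptions struct {
	ThreadKey string // any flow key already in the thread to join
	Priority  int    // placeholder for the real scheduling policy
}

type flowRec struct {
	thread   string // key of the flow that started the thread
	priority int
}

type engine struct {
	flows map[string]flowRec
	seq   int
}

func newEngine() *engine { return &engine{flows: map[string]flowRec{}} }

func (e *engine) nextKey() string {
	e.seq++
	return fmt.Sprintf("flow-%d", e.seq)
}

// create sets policy explicitly. Without ThreadKey it starts a new thread;
// with ThreadKey it joins that flow's thread but keeps its own policy.
func (e *engine) create(opts FlowOptions) (string, error) {
	thread := ""
	if opts.ThreadKey != "" {
		src, ok := e.flows[opts.ThreadKey]
		if !ok {
			return "", errors.New("unknown thread key: " + opts.ThreadKey)
		}
		thread = src.thread
	}
	key := e.nextKey()
	if thread == "" {
		thread = key // genesis: this flow starts a new thread
	}
	e.flows[key] = flowRec{thread: thread, priority: opts.Priority}
	return key, nil
}

// cont adds a turn to fromKey's thread and inherits its policy; it takes no options.
func (e *engine) cont(fromKey string) (string, error) {
	src, ok := e.flows[fromKey]
	if !ok {
		return "", errors.New("unknown flow: " + fromKey)
	}
	key := e.nextKey()
	e.flows[key] = flowRec{thread: src.thread, priority: src.priority}
	return key, nil
}

func main() {
	e := newEngine()
	first, _ := e.create(FlowOptions{Priority: 5})
	second, _ := e.cont(first)
	third, _ := e.create(FlowOptions{ThreadKey: second, Priority: 1})
	for _, k := range []string{first, second, third} {
		fmt.Println(k, e.flows[k].thread, e.flows[k].priority)
	}
}
```

All three flows end up in the same thread. The Continue turn inherits priority 5, while the ThreadKey turn carries its explicitly chosen priority 1.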

Subgraph-Aware Lifecycle and Cleanup

Lifecycle mutations (Resume, Cancel, Delete, Continue) now act on the whole flow tree and must be addressed by the root flow key; a subgraph-child key is rejected with 400. Introspection (Snapshot, History, Step) and Fork still accept any key. Delete removes a flow and its whole subgraph subtree. Purge selects matching root flows and deletes each one's whole subtree, so it rejects a query that sets IncludeSubgraphs; it is capped at 1000 root flows per call. Query.IncludeSubgraphs opts subgraph children into List results, and FlowSummary.Subgraph marks whether each listed flow ran as a subgraph child. Together they find every run of a graph that executed as a subgraph.
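A hypothetical in-memory sketch of these rules follows. `flowRow`, `httpError`, `store`, `requireRoot`, `deleteTree`, and `purge` are illustrative names, not the engine's code. Each row carries a `RootID` pointer in the spirit of the denormalized root_flow_id column. A lifecycle call on a non-root key fails with 400. Deleting a root removes every row whose `RootID` points at it. Purge touches at most 1000 roots per call.

```go
package main

import "fmt"

// flowRow is a hypothetical flow record carrying a denormalized root pointer.
type flowRow struct {
	ID, ParentID, RootID string
}

// httpError is a hypothetical error type carrying an HTTP status code.
type httpError struct {
	Code int
	Msg  string
}

func (e *httpError) Error() string { return fmt.Sprintf("%d %s", e.Code, e.Msg) }

type store struct{ rows map[string]flowRow }

// requireRoot rejects keys that do not name a root flow.
func (s *store) requireRoot(key string) error {
	r, ok := s.rows[key]
	if !ok {
		return &httpError{Code: 404, Msg: "flow not found"}
	}
	if r.RootID != r.ID {
		return &httpError{Code: 400, Msg: "lifecycle operations must address the root flow key"}
	}
	return nil
}

// deleteTree removes a root flow and every flow whose RootID points at it.
// In a database this corresponds to a filter on an indexed root_flow_id column.
func (s *store) deleteTree(key string) (int, error) {
	if err := s.requireRoot(key); err != nil {
		return 0, err
	}
	n := 0
	for id, r := range s.rows {
		if r.RootID == key {
			delete(s.rows, id) // deleting during range is safe in Go
			n++
		}
	}
	return n, nil
}

const maxPurgeRoots = 1000

// purge deletes the subtrees of up to maxPurgeRoots matching root flows.
func (s *store) purge(match func(flowRow) bool) int {
	var roots []string
	for id, r := range s.rows {
		if r.RootID == id && match(r) && len(roots) < maxPurgeRoots {
			roots = append(roots, id)
		}
	}
	for _, id := range roots {
		s.deleteTree(id)
	}
	return len(roots)
}

func main() {
	s := &store{rows: map[string]flowRow{
		"root":  {ID: "root", RootID: "root"},
		"child": {ID: "child", ParentID: "root", RootID: "root"},
		"grand": {ID: "grand", ParentID: "child", RootID: "root"},
		"other": {ID: "other", RootID: "other"},
	}}
	fmt.Println(s.requireRoot("child"))
	n, err := s.deleteTree("root")
	fmt.Println(n, err, len(s.rows))
}
```

Because every descendant carries the root's ID directly, the grandchild is removed without walking through its parent.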

LLM Tool-Call Metric

The LLM service emits a new ToolCalls counter (OTel name microbus_llm_tool_calls, queried in Prometheus as microbus_llm_tool_calls_total), labeled by tool_url, tool_type (function/web/workflow), and outcome (ok/error). It records one increment per resolved tool invocation and is emitted at the two points a tool actually resolves — direct bus tools in the live Chat loop and workflow tools' subgraph branch — so the synchronous loop and the ChatLoop workflow are both covered without double counting. The Grafana LLM Overview dashboard gains tool-call panels alongside the existing token metrics.
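A stdlib-only sketch follows. A map stands in for the OTel counter, and `toolCallLabels`, `toolCallCounter`, `invokeTool`, and the example URLs are hypothetical. It shows why recording at the point a call resolves, rather than when it is dispatched, yields exactly one increment per invocation. It also shows how the outcome label follows from the returned error.

```go
package main

import (
	"errors"
	"fmt"
)

var errToolFailed = errors.New("tool failed")

// toolCallLabels mirrors the three labels on microbus_llm_tool_calls.
type toolCallLabels struct {
	ToolURL, ToolType, Outcome string
}

// toolCallCounter is an in-memory stand-in for the OTel counter.
type toolCallCounter map[toolCallLabels]int

// invokeTool runs one tool call and records exactly one increment when it resolves.
func invokeTool(c toolCallCounter, url, toolType string, call func() (string, error)) (string, error) {
	out, err := call()
	outcome := "ok"
	if err != nil {
		outcome = "error"
	}
	c[toolCallLabels{ToolURL: url, ToolType: toolType, Outcome: outcome}]++
	return out, err
}

func main() {
	c := toolCallCounter{}
	invokeTool(c, "https://calc.example/add", "function", func() (string, error) { return "5", nil })
	invokeTool(c, "https://calc.example/add", "function", func() (string, error) { return "", errToolFailed })
	for labels, n := range c {
		fmt.Println(labels, n)
	}
}
```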

Database and Reliability Hardening

  • Denormalized root_flow_id. Every flow row carries a direct pointer to its tree root, so subtree membership (used by Delete/Purge cascades and subgraph queries) is a single indexed lookup rather than a recursive walk.
  • SQL Server concurrency. READ_COMMITTED_SNAPSHOT is enabled to cut lock contention, and the fan-in successor_id write now targets by primary key to avoid a deadlock under parallel branches.
  • Atomic DeleteOnCompletion. A disposable flow is now deleted inside the same transaction that marks it completed, so there is no observable window where a reader sees a transient completed outcome instead of the intended uniform 404.
  • Host-call panic isolation. A panic in a host callback (LoadGraph/ExecuteTask/FlowStopped) is recovered and surfaced as a step error rather than taking down the worker.
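Host-call panic isolation relies on Go's standard defer/recover pattern, sketched below. `safeHostCall` and `errGraphNotFound` are hypothetical names, and the engine's actual wrapper may differ. A panic inside the callback becomes an ordinary error that can be recorded on the step, while normal errors and nil pass through unchanged.

```go
package main

import (
	"errors"
	"fmt"
)

var errGraphNotFound = errors.New("graph not found")

// safeHostCall invokes a host callback and converts a panic into an ordinary
// error, so the calling worker keeps running and the failure can be recorded
// as a step error.
func safeHostCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("host callback %s panicked: %v", name, r)
		}
	}()
	return fn()
}

func main() {
	err := safeHostCall("ExecuteTask", func() error {
		var m map[string]int
		m["x"] = 1 // panics: assignment to entry in nil map
		return nil
	})
	fmt.Println(err)
	fmt.Println(safeHostCall("LoadGraph", func() error { return errGraphNotFound }))
}
```

`recover` only catches panics on the goroutine that deferred it. A callback that panics on a goroutine it spawned itself would still crash the process.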

Diagram Rendering and Deprecated Annotations

  • Flow diagrams render correctly on current mermaid. The flow and graph renderers no longer emit the per-subgraph direction TB keyword. mermaid 11.16.0 began honoring it, which detached edges that crossed subgraph boundaries in the execution-history and graph diagrams (visible in agentstudio); dropping it restores correct edge routing without pinning a mermaid version. A fork live-update race in agentstudio's flow detail view is also fixed.
  • Graph annotations are removed. Graph.Annotate/Annotation and the renderer's WithAnnotationColor are gone. Read-only graph inspectors (Nodes, Transitions, Reducers, URLOf, IsFanIn, …) are unchanged.
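The shape of the mermaid fix can be illustrated with a hypothetical minimal renderer. It is not the project's renderer, and `node`, `subgraph`, `edge`, and `renderMermaid` are illustrative names. Each subgraph block is emitted without a `direction` line, so it takes the chart-level `TD` direction. Edges that cross subgraph boundaries are listed after the blocks.

```go
package main

import (
	"fmt"
	"strings"
)

type node struct{ ID, Label string }

type subgraph struct {
	ID    string
	Nodes []node
}

type edge struct{ From, To string }

// renderMermaid emits a flowchart whose subgraphs carry no per-subgraph
// "direction" line; each subgraph inherits the chart-level TD direction.
func renderMermaid(subs []subgraph, edges []edge) string {
	var b strings.Builder
	b.WriteString("flowchart TD\n")
	for _, sg := range subs {
		fmt.Fprintf(&b, "  subgraph %s\n", sg.ID)
		for _, n := range sg.Nodes {
			fmt.Fprintf(&b, "    %s[\"%s\"]\n", n.ID, n.Label)
		}
		b.WriteString("  end\n")
	}
	for _, e := range edges {
		fmt.Fprintf(&b, "  %s --> %s\n", e.From, e.To)
	}
	return b.String()
}

func main() {
	fmt.Print(renderMermaid(
		[]subgraph{{ID: "charge", Nodes: []node{{"auth", "Authorize"}, {"capture", "Capture"}}}},
		[]edge{{"start", "auth"}, {"auth", "capture"}, {"capture", "done"}},
	))
}
```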

Metric Naming

Counter instrument names no longer carry a _total suffix (microbus_llm_tokens, dwarf_flows_started, dwarf_steps_executed, …). _total is a Prometheus naming convention that the OTLP→Prometheus path re-appends automatically, so PromQL queries and existing dashboards are unaffected: microbus_llm_tokens is still queried as microbus_llm_tokens_total. Only consumers reading the native OTLP instrument name see the change.
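The suffix rule can be stated as a small function. `prometheusName` is a hypothetical helper that covers only the counter-suffix behavior described above. Real OTLP→Prometheus translators also add unit suffixes and sanitize characters, and this sketch leaves those steps out.

```go
package main

import (
	"fmt"
	"strings"
)

// prometheusName applies only the counter-suffix rule: a counter's OTLP
// instrument name gains "_total" on the Prometheus side unless it already
// ends with it. Unit suffixes and character sanitization are not modeled.
func prometheusName(otlpName string, isCounter bool) string {
	if isCounter && !strings.HasSuffix(otlpName, "_total") {
		return otlpName + "_total"
	}
	return otlpName
}

func main() {
	for _, name := range []string{"microbus_llm_tokens", "microbus_llm_tool_calls", "dwarf_flows_started", "dwarf_steps_executed"} {
		fmt.Printf("%s -> %s\n", name, prometheusName(name, true))
	}
}
```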

Breaking Changes

The upgrade skill handles each of these mechanically.

  • Restart, RestartFrom, and Recover are removed — both the flow/engine operations and the foreman endpoints. Recover a failed flow by Forking from the failed step (its key comes from History).
  • BreakBefore and ResumeBreak are removed. Author a pause into the workflow with flow.Interrupt and release it with Resume; use Fork to explore an alternative path from a recorded step.
  • The foreman Start endpoint is removed. Create runs the flow immediately. Drop any client.Start(ctx, flowKey) call; a deferred start is an entry task that calls flow.Interrupt.
  • FlowOptions.StartAt is removed. Replace a delayed-start flow with an entry task that calls flow.Interrupt, released by Resume.
  • Continue no longer takes a *FlowOptions. It inherits the thread's policy. To add a turn with explicit policy, use Create with FlowOptions.ThreadKey.
  • Graph.Annotate/Annotation and GraphRenderer.WithAnnotationColor are removed. Remove any annotation calls.
  • Lifecycle operations reject a subgraph-child flow key with 400. Address Resume/Cancel/Delete/Continue by the root flow key.

Migration

From inside a Microbus project, ask Claude Code to upgrade Microbus:

Get the latest version of Microbus.

The upgrade bumps go.mod to v1.43.0 (which requires dwarf v0.8.4), refreshes .claude/rules/ and .claude/skills/, and runs the versioned upgrade-v1-43-0 routine, which flags Restart/RestartFrom/Recover/BreakBefore/ResumeBreak and client.Start call sites for migration to Fork / flow.Interrupt, drops FlowOptions.StartAt and the *FlowOptions argument to Continue, and removes Graph.Annotate/Annotation calls. The orchestrator then regenerates every microservice with genservice, runs go mod tidy && go vet ./... && go test ./..., and you review the diff.

Documentation