v1.43.0 consolidates the workflow engine's recovery and lifecycle model around a single primitive. The orchestration engine (dwarf, bumped to v0.8.4) now treats a terminal flow as immutable — it is never re-run in place. The in-place re-run verbs (Restart, RestartFrom, Recover) and the breakpoint verbs (BreakBefore, ResumeBreak) are gone; in their place, Fork clones a flow's prefix up to any recorded step into a new, self-contained running flow and re-executes from there, leaving the original untouched. Flow creation is likewise simplified: Create folds create-and-run into one transaction, so there is no Start endpoint and a flow is running the moment Create returns. FlowOptions loses the StartAt delayed-start field and gains ThreadKey for explicit thread membership. Alongside the lifecycle work, the LLM service gains a tool-call metric, graph annotations are removed, the flow diagrams render correctly on current mermaid, and the database layer is hardened across SQLite and SQL Server. The upgrade skill performs the migration mechanically.
Highlights
- Terminal flows are immutable; recovery is Fork. A completed/failed/cancelled flow is frozen: the only operations on it are read and removal. Fork(stepKey, stateOverrides) clones the prefix up to a chosen step (including one inside a subgraph) into a new running flow and re-executes from there, never touching the original.
- The in-place re-run and breakpoint verbs are removed. Restart, RestartFrom, Recover, BreakBefore, and ResumeBreak, along with their foreman endpoints, are gone, replaced by Fork for recovery and flow.Interrupt/Resume for authored pauses.
- Create auto-runs; there is no Start. Create-and-run is one transaction, so Create, Continue, and Fork all return a flow that is already running. A deferred start is authored in the workflow itself: an entry task that calls flow.Interrupt, released by Resume.
- FlowOptions cleanup. StartAt is removed; ThreadKey is added as the explicit-policy way to join a thread. Derived operations take no options: Continue inherits the thread's policy, Fork inherits the origin's.
- Subgraph-aware lifecycle and cleanup. Lifecycle operations address the root flow key and reject a subgraph-child key with 400; Delete/Purge remove a flow's whole subtree; Query.IncludeSubgraphs and FlowSummary.Subgraph make subgraph runs discoverable.
- LLM tool-call metric. microbus_llm_tool_calls counts each resolved tool invocation, labeled by tool_url, tool_type (function/web/workflow), and outcome (ok/error).
- Database and reliability hardening. A denormalized root_flow_id pointer, READ_COMMITTED_SNAPSHOT and a fan-in deadlock fix for SQL Server, atomic deletion of DeleteOnCompletion flows, and host-call panic isolation.
Recovery and Lifecycle
Immutable Flows and Fork
A terminal flow is immutable: the engine never re-runs it in place. To recover from a failure or explore an alternative, Fork(stepKey, stateOverrides) clones the flow's prefix up to the chosen step into a new, self-contained running flow and re-executes from that step with stateOverrides applied to it; the original is never modified.
```go
// Re-run from a chosen step (its key comes from History) with an edit that lets it succeed.
newFlowKey, err := client.Fork(ctx, stepKey, map[string]any{"amount": 0})
```

The fork point may be any recorded step, including one inside a subgraph; the clone re-runs from there and bubbles back up to the root. The fork inherits the origin flow's scheduling and baggage and forces notify-on-stop off. Because the fork is an ordinary new flow rather than a mutation of the original, a partially-failed fan-out is recovered by forking one failed branch at a time, and the original failed flows remain as an audit trail until Purged. This single primitive replaces the Recover/Restart/RestartFrom re-run verbs and the BreakBefore/ResumeBreak breakpoints introduced in earlier releases; the operator loop is now List → read Error → fix → Fork, documented in Reliability and Recovery.
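The clone-and-re-execute behavior can be pictured with a small in-memory model. This is only a sketch: the Step and Flow types and the fork function are hypothetical stand-ins, not the foreman client or the dwarf engine. It assumes the prefix is every step recorded before the chosen one, and that the overrides are merged into a copy of the chosen step's state.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a hypothetical recorded step: a key plus the state it ran with.
type Step struct {
	Key   string
	State map[string]any
}

// Flow is a hypothetical, simplified flow record (not the engine's type).
type Flow struct {
	Key    string
	Status string // "running", "completed", "failed" or "cancelled"
	Steps  []Step
}

// fork copies the steps recorded before stepKey into a new running flow and
// queues stepKey again with overrides merged into a copy of its state.
// The origin flow is only read, never written.
func fork(origin Flow, newKey, stepKey string, overrides map[string]any) (Flow, error) {
	for i, s := range origin.Steps {
		if s.Key != stepKey {
			continue
		}
		clone := Flow{Key: newKey, Status: "running"}
		for _, p := range origin.Steps[:i] {
			clone.Steps = append(clone.Steps, Step{Key: p.Key, State: copyState(p.State, nil)})
		}
		clone.Steps = append(clone.Steps, Step{Key: s.Key, State: copyState(s.State, overrides)})
		return clone, nil
	}
	return Flow{}, errors.New("step not found: " + stepKey)
}

// copyState returns a shallow copy of state with overrides applied on top.
func copyState(state, overrides map[string]any) map[string]any {
	out := make(map[string]any, len(state)+len(overrides))
	for k, v := range state {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

func main() {
	origin := Flow{Key: "f1", Status: "failed", Steps: []Step{
		{Key: "validate", State: map[string]any{"amount": 50}},
		{Key: "charge", State: map[string]any{"amount": 50}},
	}}
	clone, err := fork(origin, "f2", "charge", map[string]any{"amount": 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(clone.Status, len(clone.Steps), clone.Steps[1].State["amount"]) // running 2 0
	fmt.Println(origin.Status, origin.Steps[1].State["amount"])                 // failed 50
}
```

Because the model copies state maps instead of sharing them, the override lands only in the clone. That is the property that lets the failed original remain as an audit trail.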
Create Auto-Runs
The engine folds create-and-run into one transaction, so the foreman exposes no Start endpoint and Create, Continue, and Fork all return a flow that is already running. Run remains Create + Await. A deferred start is no longer a flag — author it in the workflow with an entry task that calls flow.Interrupt, then release it with Resume when ready. The transient created status is no longer observable.
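The deferred-start pattern described above can be sketched with a toy engine. Everything here is hypothetical (the Flow, Task and Engine types, the "interrupted" status name, and the method shapes); it is not the Microbus flow API. Unlike the real engine, the model runs tasks synchronously inside Create, so the entry task has already parked the flow by the time Create returns.

```go
package main

import "fmt"

// Flow is a hypothetical stand-in for a flow's runtime state.
type Flow struct {
	Status string
	Log    []string
}

// Interrupt parks the flow until Resume is called (assumed semantics).
func (f *Flow) Interrupt() { f.Status = "interrupted" }

// Task is a hypothetical task signature.
type Task func(f *Flow)

// Engine is a toy single-flow engine used only for illustration.
type Engine struct {
	flow  *Flow
	tasks []Task
	next  int
}

// Create models an auto-running create: there is no separate start call,
// the flow begins executing as part of creation.
func Create(tasks ...Task) *Engine {
	e := &Engine{flow: &Flow{Status: "running"}, tasks: tasks}
	e.run()
	return e
}

// run executes tasks in order until one interrupts the flow or none remain.
func (e *Engine) run() {
	for e.next < len(e.tasks) && e.flow.Status == "running" {
		t := e.tasks[e.next]
		e.next++
		t(e.flow)
	}
	if e.next == len(e.tasks) && e.flow.Status == "running" {
		e.flow.Status = "completed"
	}
}

// Resume releases an interrupted flow and continues with the next task.
func (e *Engine) Resume() {
	if e.flow.Status == "interrupted" {
		e.flow.Status = "running"
		e.run()
	}
}

func main() {
	gate := func(f *Flow) { f.Log = append(f.Log, "gate"); f.Interrupt() } // entry task
	work := func(f *Flow) { f.Log = append(f.Log, "work") }

	e := Create(gate, work)
	fmt.Println(e.flow.Status, e.flow.Log) // interrupted [gate]

	e.Resume() // the "start" happens when the caller is ready
	fmt.Println(e.flow.Status, e.flow.Log) // completed [gate work]
}
```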
FlowOptions: StartAt Out, ThreadKey In
FlowOptions.StartAt (delayed start) is removed — the deferred-start use case is served by flow.Interrupt. FlowOptions.ThreadKey is added: it places the new flow into an existing thread (pass any flow key already in the thread) while letting you set scheduling explicitly. It is the explicit-policy counterpart to Continue, which joins a thread by inheriting its policy wholesale. Policy is authored once at genesis (Create/Run); Continue and Fork take no options and inherit their source's policy.
Subgraph-Aware Lifecycle and Cleanup
Lifecycle mutations (Resume, Cancel, Delete, Continue) now act on the whole flow tree and must be addressed by the root flow key — a subgraph-child key is rejected with 400. Introspection (Snapshot, History, Step) and Fork still accept any key. Delete removes a flow and its whole subgraph subtree; Purge selects matching root flows and deletes each one's whole subtree (so IncludeSubgraphs is rejected), capped at 1000 root flows per call. Query.IncludeSubgraphs opts subgraph children into List results, and FlowSummary.Subgraph marks which kind each flow is — together they find every run of a graph that executed as a subgraph.
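The root-key rule and the subtree cascade can be illustrated with an in-memory table. This is a sketch under stated assumptions: FlowRow, Store and StatusError are hypothetical types, RootID only mirrors the idea of a root_flow_id pointer, and the 404 for an unknown key is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"net/http"
)

// FlowRow is a hypothetical flow record. A root flow points at itself.
type FlowRow struct {
	ID     string
	RootID string
}

// StatusError is a hypothetical error carrying an HTTP status code.
type StatusError struct {
	Code int
	Msg  string
}

func (e *StatusError) Error() string { return fmt.Sprintf("%d: %s", e.Code, e.Msg) }

// Store is an in-memory stand-in for the flows table.
type Store struct {
	rows map[string]FlowRow
}

// requireRoot returns an error unless key names an existing root flow.
func (s *Store) requireRoot(key string) error {
	row, ok := s.rows[key]
	if !ok {
		return &StatusError{http.StatusNotFound, "flow not found"}
	}
	if row.RootID != row.ID {
		return &StatusError{http.StatusBadRequest, "address the root flow key"}
	}
	return nil
}

// Delete removes the root flow and every row whose RootID points at it,
// and reports how many rows were removed.
func (s *Store) Delete(key string) (int, error) {
	if err := s.requireRoot(key); err != nil {
		return 0, err
	}
	removed := 0
	for id, row := range s.rows {
		if row.RootID == key {
			delete(s.rows, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed, nil
}

func main() {
	s := &Store{rows: map[string]FlowRow{
		"r":     {ID: "r", RootID: "r"},
		"r/a":   {ID: "r/a", RootID: "r"},
		"r/a/b": {ID: "r/a/b", RootID: "r"},
		"q":     {ID: "q", RootID: "q"},
	}}
	_, err := s.Delete("r/a")
	fmt.Println(err) // 400: address the root flow key

	n, err := s.Delete("r")
	fmt.Println(n, err, len(s.rows)) // 3 <nil> 1
}
```

Matching on a stored root pointer keeps the cascade to a single pass over candidate rows, with no need to walk parent links level by level.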
LLM Tool-Call Metric
The LLM service emits a new ToolCalls counter (OTel name microbus_llm_tool_calls, queried in Prometheus as microbus_llm_tool_calls_total), labeled by tool_url, tool_type (function/web/workflow), and outcome (ok/error). It records one increment per resolved tool invocation and is emitted at the two points a tool actually resolves — direct bus tools in the live Chat loop and workflow tools' subgraph branch — so the synchronous loop and the ChatLoop workflow are both covered without double counting. The Grafana LLM Overview dashboard gains tool-call panels alongside the existing token metrics.
Database and Reliability Hardening
- Denormalized root_flow_id. Every flow row carries a direct pointer to its tree root, so subtree membership (used by Delete/Purge cascades and subgraph queries) is a single indexed lookup rather than a recursive walk.
- SQL Server concurrency. READ_COMMITTED_SNAPSHOT is enabled to cut lock contention, and the fan-in successor_id write now targets by primary key to avoid a deadlock under parallel branches.
- Atomic DeleteOnCompletion. A disposable flow is now deleted inside the same transaction that marks it completed, so there is no observable window where a reader sees a transient completed outcome instead of the intended uniform 404.
- Host-call panic isolation. A panic in a host callback (LoadGraph/ExecuteTask/FlowStopped) is recovered and surfaced as a step error rather than taking down the worker.
Diagram Rendering and Deprecated Annotations
- Flow diagrams render correctly on current mermaid. The flow and graph renderers no longer emit the per-subgraph direction TB keyword. Mermaid 11.16.0 began honoring it, which detached edges that crossed subgraph boundaries in the execution-history and graph diagrams (visible in agentstudio); dropping it restores correct edge routing without pinning a mermaid version. A fork live-update race in agentstudio's flow detail view is also fixed.
- Graph annotations are removed. Graph.Annotate/Annotation and the renderer's WithAnnotationColor are gone. Read-only graph inspectors (Nodes, Transitions, Reducers, URLOf, IsFanIn, …) are unchanged.
Metric Naming
Counter instrument names no longer carry a _total suffix (microbus_llm_tokens, dwarf_flows_started, dwarf_steps_executed, …). _total is a Prometheus naming convention that the OTLP→Prometheus path re-appends automatically, so PromQL queries and existing dashboards are unaffected — microbus_llm_tokens is still queried as microbus_llm_tokens_total. Only consumers reading the native OTLP instrument name see the change.
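The renaming is invisible to PromQL because the suffix step is idempotent, as the sketch below shows. It is a simplified approximation of the OTLP-to-Prometheus counter naming rule and ignores unit suffixes and character sanitization; promCounterName is a hypothetical helper, not part of any exporter.

```go
package main

import (
	"fmt"
	"strings"
)

// promCounterName approximates how a counter's instrument name appears in
// Prometheus: _total is appended unless the name already ends with it.
func promCounterName(instrument string) string {
	if strings.HasSuffix(instrument, "_total") {
		return instrument
	}
	return instrument + "_total"
}

func main() {
	// The old and new instrument names map to the same Prometheus series name.
	fmt.Println(promCounterName("microbus_llm_tokens_total")) // microbus_llm_tokens_total
	fmt.Println(promCounterName("microbus_llm_tokens"))       // microbus_llm_tokens_total
	fmt.Println(promCounterName("microbus_llm_tool_calls"))   // microbus_llm_tool_calls_total
}
```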
Breaking Changes
The upgrade skill handles each of these mechanically.
- Restart, RestartFrom, and Recover are removed, both the flow/engine operations and the foreman endpoints. Recover a failed flow by Forking from the failed step (its key comes from History).
- BreakBefore and ResumeBreak are removed. Author a pause into the workflow with flow.Interrupt and release it with Resume; use Fork to explore an alternative path from a recorded step.
- The foreman Start endpoint is removed. Create runs the flow immediately. Drop any client.Start(ctx, flowKey) call; a deferred start is an entry task that calls flow.Interrupt.
- FlowOptions.StartAt is removed. Replace a delayed-start flow with an entry task that calls flow.Interrupt, released by Resume.
- Continue no longer takes a *FlowOptions. It inherits the thread's policy. To add a turn with explicit policy, use Create with FlowOptions.ThreadKey.
- Graph.Annotate/Annotation and GraphRenderer.WithAnnotationColor are removed. Remove any annotation calls.
- Lifecycle operations reject a subgraph-child flow key with 400. Address Resume/Cancel/Delete/Continue by the root flow key.
Migration
From inside a Microbus project, ask Claude Code to upgrade Microbus:
Get the latest version of Microbus.
The upgrade bumps go.mod to v1.43.0 (which requires dwarf v0.8.4), refreshes .claude/rules/ and .claude/skills/, and runs the versioned upgrade-v1-43-0 routine, which flags Restart/RestartFrom/Recover/BreakBefore/ResumeBreak and client.Start call sites for migration to Fork / flow.Interrupt, drops FlowOptions.StartAt and the *FlowOptions argument to Continue, and removes Graph.Annotate/Annotation calls. The orchestrator then regenerates every microservice with genservice, runs go mod tidy && go vet ./... && go test ./..., and you review the diff.
Documentation
- Updated: Reliability and Recovery (the List → Fork operator loop replaces the removed re-run verbs).
- Updated: Building Agentic Workflows (Create auto-runs, Fork for recovery, and the removal of breakpoints).
- Updated: Priority and Fairness (FlowOptions loses StartAt and gains ThreadKey).
- Updated: foreman package reference (Fork, auto-running Create, subgraph-aware lifecycle and cleanup).
- Updated: workflow package reference (the FlowOptions shape).
- Updated: llm package reference and LLM Integration (the tool-call metric and auto-running Create).