---
title: "LangGraph Budget Control for Durable Execution, Retries, and Fan-Out"
date: 2026-03-22
author: Cycles Team
tags: [langgraph, budgets, engineering, durable-execution, best-practices]
blog: true
sidebar: false
---

# LangGraph Budget Control for Durable Execution, Retries, and Fan-Out

A team builds an insurance claim processor in LangGraph. The graph has six nodes — classify, extract, validate, enrich, review, decide — with checkpointing enabled so runs can pause and resume. It works well in development.

One-shot agents have a simple cost model: one pass through the workflow, one bill.

Durable graph agents — whether built on LangGraph, Temporal, or Restate — break this model. Runs checkpoint, pause, resume, retry, and branch. The cost of a single logical run is not "one pass." It is the sum of every attempt, across every checkpoint, across every branch.

Three properties of durable execution change how budget exposure works:

**Checkpoints create replay surfaces.** When a graph resumes from a checkpoint, it can re-execute nodes that already consumed tokens and triggered side effects. If the budget system does not know which nodes already ran, it cannot prevent double-charging.

LangGraph supports retry policies at multiple levels: individual tool calls, node execution, and the graph as a whole.

A graph with 3 retries per node and 3 retries per graph can produce up to 9 executions of a single node. Add an SDK-level retry on transient HTTP errors (another 3×), and you are looking at 27 executions of a node you expected to run once. At $0.45 per node execution, a $0.45 step becomes a $12.15 step.

These three retry layers operate at different levels of the stack. SDK retries replay a single HTTP call — transparent to the node, cost = one LLM call per attempt. Node retries re-execute the node function, which may contain multiple LLM calls and tool invocations — cost = the full node body per attempt. Graph-level retries resume from a checkpoint and re-enter the node from persisted state, replaying everything above. Each layer compounds the cost of the layers below it.
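To make the compounding concrete, here is a minimal sketch (the function name is hypothetical, the numbers match the example above) of how layered retries multiply worst-case cost:

```python
def worst_case_executions(*retry_layers: int) -> int:
    """Worst-case executions of one node when each retry layer
    compounds the layers below it."""
    total = 1
    for attempts in retry_layers:
        total *= attempts
    return total

# 3 SDK retries x 3 node retries x 3 graph retries
attempts = worst_case_executions(3, 3, 3)
print(attempts)                   # 27 executions of a node expected to run once
print(round(0.45 * attempts, 2))  # a $0.45 step becomes a $12.15 step
```

The multiplication is the point: no single layer looks dangerous in isolation, but each one multiplies everything beneath it.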

This is the same geometric multiplication pattern behind the [retry storm failure that cost $1,800 in 12 minutes](/blog/ai-agent-failures-budget-controls-prevent). Durable execution does not prevent retry storms — it makes them more likely, because the framework is designed to keep trying.

### 3. Fan-out branches racing for shared budget
This failure mode is unique to durable execution. One-shot agents don't survive long enough for parallel branches to race each other for a shared budget.

The [reserve-commit lifecycle](/blog/ai-agent-budget-control-enforce-hard-spend-limits) already solves the core problem of pre-execution budget enforcement. For durable graph execution, the same pattern applies — but scoped to the graph's structure:

**Run-level budget.** A hard limit for the entire graph execution, including all retries and fan-outs. No combination of retries, replays, or parallel branches can exceed it.

**Node-level reservation.** Before each node executes, reserve the estimated cost from the run budget. The reservation is atomic — if the budget is insufficient, the node does not start. The run receives a clear budget-exhausted signal instead of silently proceeding.
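A minimal in-memory sketch of these two levels together (illustrative only: the class and method names are hypothetical, and real enforcement needs atomicity across processes, not a single-threaded object):

```python
class RunBudget:
    """Run-level hard limit with node-level reservations."""

    def __init__(self, limit: float):
        self.limit = limit
        self.committed = 0.0  # actual spend, settled
        self.reserved = 0.0   # estimated spend, outstanding

    def reserve(self, estimate: float) -> bool:
        # Check-and-reserve before the node starts: if the estimate
        # does not fit under the run limit, the node never executes.
        if self.committed + self.reserved + estimate > self.limit:
            return False  # clear budget-exhausted signal
        self.reserved += estimate
        return True

    def commit(self, estimate: float, actual: float) -> None:
        # Settle the reservation at actual cost after the node finishes.
        self.reserved -= estimate
        self.committed += actual

    def release(self, estimate: float) -> None:
        # On error, return the reservation so budget is not leaked.
        self.reserved -= estimate


budget = RunBudget(limit=180.0)
assert budget.reserve(0.45)          # node starts
budget.commit(0.45, actual=0.38)     # settled at actual cost
assert not budget.reserve(1_000.0)   # over the run limit: node never starts
```

No combination of calls can push `committed + reserved` past `limit`, which is exactly the outer-bound property the run-level budget needs.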


## What This Looks Like in Practice

Cycles — a runtime authority for autonomous agents — integrates with LangGraph through a [LangChain callback handler](/how-to/integrating-cycles-with-langchain) on the model. The handler fires on every LLM call inside every node: it creates a reservation on `on_llm_start`, commits actual cost on `on_llm_end`, and releases on `on_llm_error`. The reservation boundary sits at the model call, not at the graph edge — so a node that makes three LLM calls gets three reservations.

```python
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI
from runcycles import CyclesClient, CyclesConfig, Subject
from budget_handler import CyclesBudgetHandler  # see integration guide

client = CyclesClient(...)  # full CyclesConfig setup in the integration guide

handler = CyclesBudgetHandler(
    client=client,
    subject=Subject(
        # ... Subject fields elided; see the integration guide
    ),
)

# The handler attaches to the model, not the graph.
# Every LLM call inside any node gets a pre-execution budget check.
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

def classify(state: dict) -> dict:
    # ← on_llm_start fires here: reservation created
    result = llm.invoke(state["messages"])
    # ← on_llm_end fires here: actual cost committed
    return {"messages": [result]}

graph = StateGraph(dict)
graph.add_node("classify", classify)
graph.add_node("extract", extract)  # extract and enrich: defined like classify
graph.add_node("enrich", enrich)
graph.add_edge(START, "classify")
graph.add_edge("classify", "extract")
graph.add_edge("extract", "enrich")
graph.add_edge("enrich", END)

app = graph.compile(checkpointer=MemorySaver())
```

Every LLM call the graph makes — across nodes, retries, and fan-out branches — gets a pre-execution budget check. The handler tracks reservations by LangChain's `run_id`, so concurrent calls within parallel branches are handled correctly.
When LangGraph resumes from a checkpoint and re-enters a node, the handler treats it like any other LLM call: the reservation fires again. Idempotency keys on commits (run ID + node ID + attempt number) prevent double-charging: a retried node creates a new reservation, while a replayed-from-checkpoint node that already committed is recognized as settled.
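The idempotency logic can be sketched as a ledger keyed on (run ID, node ID, attempt number) — the class and method names here are hypothetical, the real keys are managed inside the handler:

```python
class CommitLedger:
    """Dedupes commits by (run_id, node_id, attempt)."""

    def __init__(self):
        self.settled: set[tuple[str, str, int]] = set()

    def should_charge(self, run_id: str, node_id: str, attempt: int) -> bool:
        key = (run_id, node_id, attempt)
        if key in self.settled:
            return False  # replayed from checkpoint: already settled
        self.settled.add(key)
        return True       # first commit for this attempt


ledger = CommitLedger()
assert ledger.should_charge("run-1", "classify", 1)      # normal execution
assert not ledger.should_charge("run-1", "classify", 1)  # checkpoint replay: no double-charge
assert ledger.should_charge("run-1", "classify", 2)      # genuine retry: new attempt, new charge
```

The attempt number is what separates the two replay cases: a checkpoint replay reuses an attempt that already committed, while a retry is a fresh attempt that legitimately charges again.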

For fan-out, each parallel review node gets its own model instance with a budget-scoped handler:

```python
# Each review node gets its own budget-scoped handler
def make_review_node(model):
    def review(state: dict) -> dict:
        return {"messages": [model.invoke(state["messages"])]}
    return review

for branch in ["liability", "medical", "property", "general"]:
    branch_handler = CyclesBudgetHandler(
        client=client,
        subject=Subject(
            # ... other Subject fields elided; see the integration guide
            agent=f"review-{branch}",
        ),
    )
    branch_llm = ChatOpenAI(model="gpt-4o", callbacks=[branch_handler])
    graph.add_node(f"review_{branch}", make_review_node(branch_llm))
```

Each parallel node's LLM calls are budget-bounded independently. The scoped `Subject` per branch means Cycles tracks spend separately — no shared-pool race condition.
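One way to carve branch sub-budgets so their sum can never exceed the parent run budget — a sketch under stated assumptions, not Cycles' actual allocation API; the function name and the hold-back fraction are hypothetical:

```python
def carve_sub_budgets(
    run_budget: float,
    branches: list[str],
    reserve_pct: float = 0.2,
) -> dict[str, float]:
    """Split a run budget evenly across parallel branches, holding
    back a reserve for the nodes that run after the fan-in."""
    pool = run_budget * (1 - reserve_pct)
    per_branch = pool / len(branches)
    return {branch: per_branch for branch in branches}


allocations = carve_sub_budgets(180.0, ["liability", "medical", "property", "general"])
print(allocations["liability"])            # 36.0
assert sum(allocations.values()) <= 180.0  # branches cannot collectively exceed the run limit
```

Because each branch checks against its own allocation rather than the shared pool, a runaway branch exhausts only its slice; the other branches and the post-fan-in nodes keep their budget.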

For the full callback handler implementation and runnable examples, see [Integrating Cycles with LangChain](/how-to/integrating-cycles-with-langchain).

The difference is not subtle. It is the difference between a cost surprise and a hard stop.

| Failure scenario | Naive budget tracking | Reserve-commit lifecycle |
| --- | --- | --- |
| Process crash mid-node | Reservation leaked, budget permanently reduced | Uncommitted reservation auto-released on retry |
| Overnight batch of 500 graph runs | No per-run limit, total cost unknown until morning | Each run bounded, batch total = sum of run budgets |
The insurance claim processor from the opening scenario would have stopped at $180 — enforcement before the action, not observation after. The retry replays would have been idempotent — committed nodes would not re-charge. The fan-out branches would have received sub-budgets. The run-level hard limit would have prevented any single execution from exceeding its allocation.

## Next Steps
