Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .vitepress/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -197,6 +197,7 @@ export default defineConfig({
text: 'Integrations',
collapsed: false,
items: [
{ text: 'Ecosystem', link: '/how-to/ecosystem', badge: { text: 'New', type: 'tip' } },
{ text: 'Overview', link: '/how-to/integrations-overview' },
{ text: 'MCP (Claude, Cursor, Windsurf)', link: '/how-to/integrating-cycles-with-mcp' },
{ text: 'OpenAI', link: '/how-to/integrating-cycles-with-openai' },
Expand Down Expand Up @@ -264,6 +265,13 @@ export default defineConfig({
{ text: 'Security Hardening', link: '/how-to/security-hardening' },
{ text: 'Changelog', link: '/changelog' },
]
},
{
text: 'Community',
collapsed: true,
items: [
{ text: 'Built with Cycles Badges', link: '/community/badges' },
]
}
],
},
Expand Down
271 changes: 271 additions & 0 deletions blog/agent-budget-patterns-visual-guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,271 @@
---
title: "AI Agent Budget Patterns: A Practical Guide"
date: 2026-03-19
author: Cycles Team
tags: [patterns, budgets, architecture, guide]
description: "A practical reference for structuring AI agent budgets — covering tenant isolation, workflow caps, run-level limits, graceful degradation, and more."
blog: true
sidebar: false
---

# AI Agent Budget Patterns: A Practical Guide

Every team running AI agents in production eventually faces the same question: how should we structure our budgets? Too coarse and a single runaway agent burns through the allocation. Too granular and the overhead of managing hundreds of micro-budgets becomes its own problem. This guide covers the six patterns we see most often, with concrete examples and trade-offs for each.

<!-- more -->

These patterns aren't mutually exclusive — most production systems combine two or three. The [common budget patterns](/how-to/common-budget-patterns) page in our docs covers the Cycles-specific implementation details; this post focuses on the architectural thinking behind each approach.

## Pattern 1: Tenant isolation budgets

**When to use:** Multi-tenant platforms where each customer or team gets their own AI agent access and you need hard spend isolation between them.

The simplest and most common starting point. Each tenant gets an independent budget that cannot be exceeded, regardless of what other tenants are doing.

```python
# Tenant isolation: each tenant has a completely independent budget
tenant_budget = cycles.create_budget(
scope=f"tenant:{tenant_id}",
limit_dollars=500.00,
period="monthly",
on_exhausted="deny"
)

# Every agent call for this tenant checks against their budget
async def run_agent_for_tenant(tenant_id, task):
budget = cycles.get_budget(scope=f"tenant:{tenant_id}")
result = await budget.execute(
agent.run(task),
estimated_cost=estimate_task_cost(task)
)
return result
```

**Trade-offs:**
- Provides complete blast-radius isolation — one tenant's runaway agent cannot affect others
- Simple to reason about and explain to customers
- Can lead to underutilization: if Tenant A uses 10% of their budget and Tenant B hits 100%, there's no sharing
- Requires careful initial sizing — set too low and legitimate workloads get blocked

This pattern maps directly to how [tenant, workflow, and run budgets](/how-to/how-to-model-tenant-workflow-and-run-budgets-in-cycles) work in Cycles.

## Pattern 2: Workflow-level caps

**When to use:** When different agent workflows have different cost profiles and risk levels, and you want to cap each independently.

A code review agent and a deep research agent have very different cost characteristics. Workflow-level caps let you set appropriate limits for each.

```python
# Different workflows get different budgets
workflow_budgets = {
"code-review": {"limit": 2.00, "per": "run"},
"deep-research": {"limit": 25.00, "per": "run"},
"summarization": {"limit": 5.00, "per": "run"},
"chat": {"limit": 1.00, "per": "session"},
}

async def run_workflow(workflow_type, input_data):
config = workflow_budgets[workflow_type]
budget = cycles.create_budget(
scope=f"workflow:{workflow_type}:{run_id}",
limit_dollars=config["limit"],
period=config["per"]
)
return await budget.execute(agent.run(input_data))
```

**Trade-offs:**
- Right-sized limits for each use case reduce both waste and false denials
- Makes cost profiles explicit and auditable
- Requires understanding the cost distribution of each workflow upfront
- New workflows need budget configuration before deployment

## Pattern 3: Per-run budgets with graceful degradation

**When to use:** When you want agents to produce _some_ result even when they hit budget limits, rather than failing entirely.

This is the pattern that separates production-grade agent systems from prototypes. Instead of a hard stop at budget exhaustion, the agent downgrades its approach.

```python
async def research_with_degradation(query, budget_dollars=10.00):
budget = cycles.create_budget(
scope=f"run:{run_id}",
limit_dollars=budget_dollars
)

# Phase 1: Use the best model
remaining = budget.remaining()
if remaining > 5.00:
result = await budget.execute(
agent.run(query, model="claude-opus-4-20250514")
)
# Phase 2: Fall back to a cheaper model
elif remaining > 1.00:
result = await budget.execute(
agent.run(query, model="claude-sonnet-4-20250514")
)
# Phase 3: Return cached/partial results
else:
result = get_cached_or_partial_result(query)
result.metadata["degraded"] = True

return result
```

**Trade-offs:**
- Users get a result instead of an error, improving perceived reliability
- Requires designing multiple quality tiers for each workflow
- The "degraded" signal needs to propagate to the user — silent degradation erodes trust
- More complex to test: you need to validate each fallback tier

We cover degradation strategies in detail in [How to Think About Degradation Paths](/how-to/how-to-think-about-degradation-paths-in-cycles-deny-downgrade-disable-or-defer).

## Pattern 4: Shared pool with priority tiers

**When to use:** When you want to maximize utilization of a fixed budget across multiple agents or users, with guarantees for high-priority work.

Instead of giving each consumer a fixed allocation, you share a pool but enforce priority ordering when the pool runs low.

```python
# Shared pool with priority tiers
pool = cycles.create_budget(
scope="org:engineering",
limit_dollars=5000.00,
period="monthly"
)

# Priority tiers determine who gets denied first
PRIORITY_THRESHOLDS = {
"critical": 0.0, # Only denied at $0 remaining
"high": 0.10, # Denied below 10% remaining
"normal": 0.25, # Denied below 25% remaining
"low": 0.50, # Denied below 50% remaining
"bulk": 0.70, # Denied below 70% remaining (off-peak only)
}

async def execute_with_priority(task, priority="normal"):
remaining_fraction = pool.remaining() / pool.limit
threshold = PRIORITY_THRESHOLDS[priority]

if remaining_fraction <= threshold:
raise BudgetExhaustedError(
f"Pool at {remaining_fraction:.0%}, "
f"threshold for '{priority}' is {threshold:.0%}"
)

return await pool.execute(task)
```

**Trade-offs:**
- Higher overall utilization — no budget sits idle while another is exhausted
- Critical work is protected even under heavy load
- Harder to predict per-team or per-user costs for billing purposes
- Requires agreement on what constitutes "critical" vs. "low" priority
- Risk of low-priority work getting permanently starved in busy periods

## Pattern 5: Shadow mode rollout

**When to use:** When you're introducing budget controls to an existing system and need to validate limits before enforcing them.

This is less a budget _structure_ and more a deployment pattern, but it's essential for any team that isn't starting from scratch. Shadow mode tracks what _would_ have been denied without actually denying anything.

```python
# Shadow mode: log but don't enforce
budget = cycles.create_budget(
scope=f"tenant:{tenant_id}",
limit_dollars=100.00,
period="daily",
mode="shadow" # Track but don't enforce
)

# In shadow mode, execute() always succeeds but logs violations
result = await budget.execute(agent.run(task))

# After a validation period, check the shadow logs
shadow_report = cycles.get_shadow_report(
scope=f"tenant:{tenant_id}",
period="last_7_days"
)
# Output: "23 calls would have been denied. Peak overage: $47.30."
# Now you can tune the limit before switching to enforce mode.
```

**Trade-offs:**
- Zero risk of breaking production workflows during rollout
- Generates real data for sizing budgets accurately
- Adds latency (the budget check still happens, just without enforcement)
- Teams sometimes stay in shadow mode too long, delaying the value of enforcement

Our [shadow mode rollout guide](/how-to/shadow-mode-in-cycles-how-to-roll-out-budget-enforcement-without-breaking-production) walks through the full process, including how to analyze shadow logs and choose enforcement cutover criteria.

## Pattern 6: Hybrid model (tokens + dollars)

**When to use:** When you need to track both the raw resource consumption (tokens) and the monetary cost (dollars), because they don't always move in lockstep.

Token counts and dollar costs diverge when you use multiple models, when pricing changes, or when non-LLM tools (web search, code execution) are part of the agent's toolkit.

```python
# Hybrid budget: track both dimensions
budget = cycles.create_budget(
scope=f"run:{run_id}",
limits={
"tokens": 500_000, # Hard cap on token consumption
"dollars": 15.00, # Hard cap on dollar spend
},
on_exhausted="deny_and_notify"
)

async def execute_hybrid(task):
# Both limits are checked atomically
result = await budget.execute(
agent.run(task),
estimated={
"tokens": estimate_tokens(task),
"dollars": estimate_cost(task),
}
)
return result

# Useful for cases where a cheap model uses many tokens
# or an expensive model uses few
```

**Trade-offs:**
- Catches scenarios that a single-dimension budget misses (e.g., a cheap model looping uses few dollars but millions of tokens)
- Useful for capacity planning beyond just cost
- More complex to configure and explain to users
- Requires accurate estimation for both dimensions

## Combining patterns

Most production systems layer two or three of these patterns. A common combination:

1. **Tenant isolation** (Pattern 1) as the outer boundary
2. **Workflow caps** (Pattern 2) within each tenant
3. **Graceful degradation** (Pattern 3) within each workflow run
4. **Shadow mode** (Pattern 5) for rollout

This gives you hard isolation between customers, right-sized limits per use case, user-friendly behavior at the limits, and a safe path to enforcement.

```
Tenant Budget ($500/mo)
├── Code Review Workflow ($2/run)
│ └── Per-run with degradation
├── Research Workflow ($25/run)
│ └── Per-run with degradation
└── Chat Workflow ($1/session)
└── Hard deny at limit
```

The [budget allocation and management guide](/how-to/budget-allocation-and-management-in-cycles) covers how to implement these hierarchies in Cycles, and the [cost estimation cheat sheet](/how-to/cost-estimation-cheat-sheet) helps with initial sizing for each tier.

## Choosing your starting point

If you're unsure where to begin:

- **Single-tenant, single-agent:** Start with Pattern 3 (per-run with degradation)
- **Multi-tenant SaaS:** Start with Pattern 1 (tenant isolation) + Pattern 5 (shadow mode)
- **Internal platform with multiple teams:** Start with Pattern 4 (shared pool with priority)
- **Migrating from no controls:** Start with Pattern 5 (shadow mode) to gather data first

The most important step isn't picking the perfect pattern — it's having _any_ budget boundary in the execution path. You can always refine the structure later. You can't un-spend money that an uncontrolled agent already burned.
Loading
Loading