Merged
2 changes: 1 addition & 1 deletion .vitepress/config.ts
@@ -168,7 +168,7 @@ export default defineConfig({
{ text: 'Cycles vs Guardrails AI', link: '/concepts/cycles-vs-guardrails-ai' },
{ text: 'Cycles vs Provider Caps', link: '/concepts/cycles-vs-provider-spending-caps' },
{ text: 'Cycles vs Token Counters', link: '/concepts/cycles-vs-custom-token-counters' },
{ text: 'Coding Agents Need Budget Authority', link: '/concepts/coding-agents-need-runtime-budget-authority' },
{ text: 'Coding Agents Need Runtime Authority', link: '/concepts/coding-agents-need-runtime-budget-authority' },
{ text: 'Why Agents Do Not Replace Cycles', link: '/concepts/why-coding-agents-do-not-replace-cycles' },
{ text: 'Glossary', link: '/glossary' },
]
4 changes: 2 additions & 2 deletions blog/ai-agent-budget-control-enforce-hard-spend-limits.md
@@ -127,8 +127,8 @@ For a detailed shadow mode guide, see [Shadow Mode: How to Roll Out Budget Enfor

- **[What is Cycles?](/quickstart/what-is-cycles)** — the runtime that implements the reserve-commit enforcement pattern
- **[End-to-End Tutorial](/quickstart/end-to-end-tutorial)** — walk through the full reserve → execute → commit lifecycle hands-on
- **[From Observability to Enforcement](/concepts/from-observability-to-enforcement-how-teams-evolve-from-dashboards-to-budget-authority)** — the maturity curve from dashboards and alerts to runtime budget authority
- **[From Observability to Enforcement](/concepts/from-observability-to-enforcement-how-teams-evolve-from-dashboards-to-budget-authority)** — the maturity curve from dashboards and alerts to runtime authority
- **[AI Agent Budget Patterns: A Practical Guide](/blog/agent-budget-patterns-visual-guide)** — six common patterns with code examples and trade-offs
- **[Multi-Tenant AI Cost Control](/blog/multi-tenant-ai-cost-control-per-tenant-budgets-quotas-isolation)** — per-tenant budgets, quotas, and isolation for SaaS platforms
- **[Vibe Coding a Budget Wrapper vs. Owning a Budget Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority)** — why the gap between a prototype and production enforcement is larger than it looks
- **[Vibe Coding a Budget Wrapper vs. Owning a Runtime Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority)** — why the gap between a prototype and production enforcement is larger than it looks
- **[Getting Started](/quickstart/getting-started-with-the-python-client)** — integrate with the [Python](/quickstart/getting-started-with-the-python-client), [TypeScript](/quickstart/getting-started-with-the-typescript-client), or [MCP Server](/quickstart/getting-started-with-the-mcp-server) client
16 changes: 8 additions & 8 deletions blog/ai-agent-cost-management-guide.md
@@ -24,7 +24,7 @@ This guide presents a maturity model for AI agent cost management. Five tiers, f
| 1 | Monitoring | Dashboards and cost visibility | No | Hours |
| 2 | Alerting | Automated notifications on thresholds | No | Minutes |
| 3 | Soft Limits | Rate limiting, provider caps, counters | Partially | Seconds (but leaky) |
| 4 | Hard Enforcement | Pre-execution budget authority | Yes | Milliseconds (before execution) |
| 4 | Hard Enforcement | Pre-execution runtime authority | Yes | Milliseconds (before execution) |

Each tier builds on the one below it. You don't skip tiers — you add capabilities. A team at Tier 4 still uses dashboards (Tier 1) and alerts (Tier 2). The difference is that dashboards are no longer the _last_ line of defense.

@@ -142,19 +142,19 @@ Alerts are essential. They are not sufficient. Every dollar spent between "alert

## Tier 4: Hard Enforcement

**What it looks like:** A dedicated budget authority service sits in the execution path of every LLM call. Before an agent calls a model, it requests authorization from the budget service. The service atomically reserves the estimated cost. If the budget is exhausted, the call is denied before it executes. The agent receives a clear signal and can degrade gracefully.
**What it looks like:** A dedicated runtime authority service sits in the execution path of every LLM call. Before an agent calls a model, it requests authorization from the budget service. The service atomically reserves the estimated cost. If the budget is exhausted, the call is denied before it executes. The agent receives a clear signal and can degrade gracefully.

This is the tier where prevention replaces response. There is no gap between detection and action because the check happens _before_ the spend.

**How it works:**

1. Agent estimates the cost of the next LLM call
2. Agent requests a reservation from the budget authority
3. Budget authority atomically checks the balance and decrements it
2. Agent requests a reservation from the runtime authority
3. Runtime authority atomically checks the balance and decrements it
4. If approved: the call proceeds, and actual cost is reconciled afterward
5. If denied: the agent receives a budget-exhausted signal and follows its degradation path

The atomic check-and-decrement is critical. It's what prevents the TOCTOU race condition from Tier 3. No matter how many concurrent agents check simultaneously, the budget authority serializes the reservations. If the budget has $5 left and two agents each request $4, one succeeds and one is denied. Always.
The atomic check-and-decrement is critical. It's what prevents the TOCTOU race condition from Tier 3. No matter how many concurrent agents check simultaneously, the runtime authority serializes the reservations. If the budget has $5 left and two agents each request $4, one succeeds and one is denied. Always.
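
The $5 example above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the Cycles API: `BudgetLedger`, `reserve`, and `commit` are hypothetical names, amounts are in cents, and a single process-local lock stands in for whatever serialization a real service uses.

```python
import threading


class BudgetLedger:
    """Hypothetical in-memory ledger (illustrative only, not the Cycles API)."""

    def __init__(self, balance_cents: int) -> None:
        self._balance = balance_cents
        self._lock = threading.Lock()

    def reserve(self, estimate_cents: int) -> bool:
        # Check and decrement under one lock, so no other caller can
        # read the balance between the check and the write.
        with self._lock:
            if estimate_cents > self._balance:
                return False
            self._balance -= estimate_cents
            return True

    def commit(self, estimate_cents: int, actual_cents: int) -> None:
        # Reconcile after the call: refund the unused part of the
        # reservation, or charge the overage if the estimate was low.
        with self._lock:
            self._balance += estimate_cents - actual_cents

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


def reserve_concurrently(ledger: BudgetLedger, amounts: list) -> list:
    """Issue one reservation per thread and collect the decisions."""
    results = [False] * len(amounts)

    def worker(index: int, amount: int) -> None:
        results[index] = ledger.reserve(amount)

    threads = [
        threading.Thread(target=worker, args=(i, amount))
        for i, amount in enumerate(amounts)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


ledger = BudgetLedger(500)                          # $5.00 left
outcomes = reserve_concurrently(ledger, [400, 400])  # two agents, $4.00 each
print(sorted(outcomes))                             # [False, True]
```

Which agent wins depends on thread scheduling, but exactly one reservation succeeds and the balance never goes negative. Calling `ledger.commit(400, 250)` afterward would return the unused $1.50 to the budget.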

**What you gain:**

@@ -169,7 +169,7 @@ The atomic check-and-decrement is critical. It's what prevents the TOCTOU race c

**What Cycles provides at this tier:**

[Cycles](/) is built specifically for Tier 4. It's an open-source budget authority system that enforces hard spend limits before execution. The core API is a reserve-execute-commit loop that works across any model provider and any agent framework.
[Cycles](/) is built specifically for Tier 4. It's an open-source runtime authority system that enforces hard spend limits before execution. The core API is a reserve-execute-commit loop that works across any model provider and any agent framework.

Budgets can be scoped at any level — per tenant, per workflow, per run, or any combination. When a budget is exhausted, the denial includes enough context for the agent to make an intelligent decision: fall back to a cheaper model, return a partial result, or stop and explain why.
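
One way to picture scoped budgets is a reservation that must fit inside every scope it touches, with the denial naming the scope that ran out. The sketch below is an assumption-heavy illustration: `ScopedBudgets`, `Denial`, and the `tenant:acme` / `run:42` scope keys are invented for this example and are not taken from the Cycles protocol.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Denial:
    """Context handed back to the agent when a reservation is refused."""
    scope: str
    remaining_cents: int
    requested_cents: int


class ScopedBudgets:
    """Hypothetical multi-scope ledger (illustrative only)."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def reserve(self, scopes: List[str], amount_cents: int) -> Optional[Denial]:
        # All-or-nothing: verify every scope first, then decrement all
        # of them, while holding a single lock.
        with self._lock:
            for scope in scopes:
                remaining = self._balances[scope]
                if remaining < amount_cents:
                    return Denial(scope, remaining, amount_cents)
            for scope in scopes:
                self._balances[scope] -= amount_cents
            return None  # None means the reservation was approved

    def remaining(self, scope: str) -> int:
        with self._lock:
            return self._balances[scope]


budgets = ScopedBudgets({"tenant:acme": 1000, "run:42": 300})
scopes = ["tenant:acme", "run:42"]

first = budgets.reserve(scopes, 250)   # fits in both scopes
second = budgets.reserve(scopes, 100)  # run:42 has only 50 cents left
print(first, second)
```

Because the second request is refused before anything is decremented, the tenant-level balance is left untouched, and the agent can read `second.remaining_cents` to decide whether a cheaper call would still fit.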

@@ -200,9 +200,9 @@ The best-run teams we see operate at all tiers simultaneously:

- **Tier 1 (Monitoring):** Dashboards showing real-time and historical spend by tenant, workflow, and model. Used for capacity planning, cost optimization, and trend analysis.
- **Tier 2 (Alerting):** Alerts on anomalies that enforcement alone doesn't catch — unusual patterns, new cost trends, budget utilization approaching limits. These are informational alerts for humans, not enforcement mechanisms.
- **Tier 4 (Hard Enforcement):** Cycles budget authority in the execution path. Every call is authorized before execution. Budgets are scoped per-tenant and per-run.
- **Tier 4 (Hard Enforcement):** Cycles runtime authority in the execution path. Every call is authorized before execution. Budgets are scoped per-tenant and per-run.

Notice Tier 3 is absent. That's intentional. Once you have Tier 4, rate limits and application counters are redundant for cost control. (For more on why building your own enforcement layer is deceptively complex, see [Vibe Coding a Budget Wrapper vs. Owning a Budget Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority).) You might still have rate limits for other reasons (protecting downstream services, fairness), but they're no longer your cost control mechanism.
Notice Tier 3 is absent. That's intentional. Once you have Tier 4, rate limits and application counters are redundant for cost control. (For more on why building your own enforcement layer is deceptively complex, see [Vibe Coding a Budget Wrapper vs. Owning a Runtime Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority).) You might still have rate limits for other reasons (protecting downstream services, fairness), but they're no longer your cost control mechanism.

The monitoring and alerting layers serve a different purpose once enforcement is in place. They shift from "detect overspend" to "understand cost patterns and optimize." An alert that says "Tenant X is using 80% of their monthly budget on day 15" isn't an emergency — enforcement prevents overspend. But it's a signal that you should review their budget allocation or their agent efficiency.

10 changes: 5 additions & 5 deletions blog/claude-code-cursor-windsurf-budget-limits-mcp.md
@@ -146,7 +146,7 @@ For the six MCP integration patterns (simple reserve/commit, preflight, graceful

This is where MCP-based enforcement becomes more useful than a simple kill switch.

When the session budget is getting low, the budget authority does not just deny — it returns `ALLOW_WITH_CAPS` with constraints the agent can use to self-regulate:
When the session budget is getting low, the runtime authority does not just deny — it returns `ALLOW_WITH_CAPS` with constraints the agent can use to self-regulate:

- **`maxTokens: 500`** — the agent generates shorter responses, completing the thought in fewer words
- **`maxStepsRemaining: 3`** — the agent knows to wrap up in three more steps, so it prioritizes finishing the current task over starting new ones
@@ -181,16 +181,16 @@ Prompt-based tracking breaks in practice because:
- **Agents hallucinate token counts.** An agent told to "track your token usage" will estimate, round, lose count, or simply stop tracking after the context grows long enough.
- **Instructions degrade under long contexts.** A system prompt instruction to "stop after $10" competes with every other instruction in the context. In a 100k-token conversation, budget tracking is easily forgotten.
- **No atomicity.** If two concurrent sessions share a budget, prompt-based tracking cannot prevent both from spending simultaneously. Each agent sees its own estimate of what is left.
- **No enforcement.** The agent can choose to ignore its own tracking. A budget authority cannot be ignored — it returns ALLOW or DENY, and the tool call does not happen.
- **No enforcement.** The agent can choose to ignore its own tracking. A runtime authority cannot be ignored — it returns ALLOW or DENY, and the tool call does not happen.

The MCP-based approach is structurally different. The budget authority is an external process with its own state, atomicity guarantees, and enforcement semantics. The agent calls `cycles_reserve` and gets back a decision. It cannot negotiate, hallucinate, or reason its way around that decision. The money is either available or it is not.
The MCP-based approach is structurally different. The runtime authority is an external process with its own state, atomicity guarantees, and enforcement semantics. The agent calls `cycles_reserve` and gets back a decision. It cannot negotiate, hallucinate, or reason its way around that decision. The money is either available or it is not.
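
To make the decision semantics concrete, here is a rough sketch of how an external authority might map remaining budget to ALLOW, ALLOW_WITH_CAPS, or DENY. Only the three decision names and the two cap values (`maxTokens: 500`, `maxStepsRemaining: 3`) come from this post. The `decide` function, the `low_water_cents` threshold, and the data classes are assumptions made for illustration and do not describe how `cycles_reserve` is implemented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "ALLOW"
    ALLOW_WITH_CAPS = "ALLOW_WITH_CAPS"
    DENY = "DENY"


@dataclass(frozen=True)
class Caps:
    max_tokens: int
    max_steps_remaining: int


@dataclass(frozen=True)
class ReserveResult:
    decision: Decision
    caps: Optional[Caps] = None


def decide(
    remaining_cents: int,
    estimate_cents: int,
    low_water_cents: int = 100,  # hypothetical threshold for "getting low"
) -> ReserveResult:
    # Not enough budget: refuse before the tool call happens.
    if estimate_cents > remaining_cents:
        return ReserveResult(Decision.DENY)
    # Enough budget, but little would be left: allow with constraints
    # so the agent can wrap up instead of being cut off.
    if remaining_cents - estimate_cents < low_water_cents:
        caps = Caps(max_tokens=500, max_steps_remaining=3)
        return ReserveResult(Decision.ALLOW_WITH_CAPS, caps)
    return ReserveResult(Decision.ALLOW)


print(decide(remaining_cents=1000, estimate_cents=50).decision.value)
print(decide(remaining_cents=120, estimate_cents=50).decision.value)
print(decide(remaining_cents=30, estimate_cents=50).decision.value)
```

The point of the sketch is that the decision is a pure function of state the agent does not own: the agent supplies an estimate and receives a result, and nothing in its prompt or context can change the outcome.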

For the extended argument about why the gap between a wrapper and an authority is larger than it looks, see [Vibe Coding a Budget Wrapper vs. Owning a Budget Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority).
For the extended argument about why the gap between a wrapper and an authority is larger than it looks, see [Vibe Coding a Budget Wrapper vs. Owning a Runtime Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority).

## Next Steps

- **[Getting Started with the MCP Server](/quickstart/getting-started-with-the-mcp-server)** — per-host configuration for Claude Desktop, Claude Code, Cursor, and Windsurf
- **[Integrating Cycles with MCP](/how-to/integrating-cycles-with-mcp)** — advanced patterns: preflight decisions, graceful degradation, long-running operations, fire-and-forget events
- **[Caps and the Three-Way Decision Model](/protocol/caps-and-the-three-way-decision-model-in-cycles)** — protocol reference for ALLOW, ALLOW_WITH_CAPS, and DENY
- **[End-to-End Tutorial](/quickstart/end-to-end-tutorial)** — walk through the complete reserve-commit lifecycle hands-on
- **[Vibe Coding a Budget Wrapper vs. Owning a Budget Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority)** — why external budget authority beats self-policing
- **[Vibe Coding a Budget Wrapper vs. Owning a Runtime Authority](/blog/vibe-coding-budget-wrapper-vs-budget-authority)** — why external runtime authority beats self-policing
12 changes: 6 additions & 6 deletions blog/cycles-vs-llm-proxies-and-observability-tools.md
@@ -26,7 +26,7 @@ By Monday morning, the agent has made 4,700 calls and consumed $2,800. The team

Every tool in the stack worked exactly as designed. None of them prevented the overspend.

The missing layer was not routing or visibility. It was **budget authority** — a pre-execution decision about whether the next action should proceed given the remaining budget.
The missing layer was not routing or visibility. It was **runtime authority** — a pre-execution decision about whether the next action should proceed given the remaining budget.

## Three layers, three questions

@@ -95,7 +95,7 @@ The gap appears when you need to **enforce** a budget, not just **track** spend.

### Using both together

The proxy and the budget authority serve different purposes. They compose naturally.
The proxy and the runtime authority serve different purposes. They compose naturally.

```
Agent
@@ -179,7 +179,7 @@ By the time an alert fires and a human responds, the system has already spent. T

### Using both together

Observability and budget authority form a feedback loop.
Observability and runtime authority form a feedback loop.

**Observability informs budgets.** Trace data shows you what runs actually cost — the distribution of per-run spend, which models drive the most cost, which workflows are bursty. This is how you set accurate budget limits instead of guessing.

@@ -200,7 +200,7 @@ Each layer in a production LLM stack answers a different question.
```
Agent
├─ Cycles (budget authority) → Should this action proceed?
├─ Cycles (runtime authority) → Should this action proceed?
├─ LLM Proxy (routing layer) → Which model handles this call?
@@ -224,7 +224,7 @@ The question is not "which one should I use?"

It is "which layer is missing?"

For most teams running autonomous agents, the missing layer is budget authority.
For most teams running autonomous agents, the missing layer is runtime authority.

## Next steps

@@ -234,6 +234,6 @@ For most teams running autonomous agents, the missing layer is budget authority.
- [The True Cost of Uncontrolled AI Agents](/blog/true-cost-of-uncontrolled-agents) — real-world costs of running agents without budget limits
- [5 Real-World AI Agent Failures That Budget Controls Would Have Prevented](/blog/ai-agent-failures-budget-controls-prevent) — concrete failure scenarios with dollar math
- [AI Agent Cost Management: The Complete Guide](/blog/ai-agent-cost-management-guide) — the five-tier maturity model from no controls to hard enforcement
- [You Can Vibe Code a Budget Wrapper](/blog/vibe-coding-budget-wrapper-vs-budget-authority) — why building a prototype is easy but owning a budget authority is not
- [You Can Vibe Code a Budget Wrapper](/blog/vibe-coding-budget-wrapper-vs-budget-authority) — why building a prototype is easy but owning a runtime authority is not
- [End-to-End Tutorial](/quickstart/end-to-end-tutorial) — set up Cycles with a working agent in under 30 minutes
- [Shadow Mode Rollout](/how-to/shadow-mode-in-cycles-how-to-roll-out-budget-enforcement-without-breaking-production) — evaluate budget enforcement on real traffic without blocking anything
2 changes: 1 addition & 1 deletion blog/how-much-do-ai-agents-cost.md
@@ -194,7 +194,7 @@ Ten users triggering multi-agent workflows simultaneously means 160 concurrent L

Knowing your costs is the first step. Controlling them is the next.

Agent costs are a function of call patterns, not just token prices. A 10% change in model pricing matters far less than a runaway loop that makes 500 calls instead of 50. We wrote about [why monitoring alone isn't sufficient](/blog/true-cost-of-uncontrolled-agents#the-observability-gap) and how [pre-execution budget authority](/blog/true-cost-of-uncontrolled-agents#budget-authority-as-infrastructure) closes the gap.
Agent costs are a function of call patterns, not just token prices. A 10% change in model pricing matters far less than a runaway loop that makes 500 calls instead of 50. We wrote about [why monitoring alone isn't sufficient](/blog/true-cost-of-uncontrolled-agents#the-observability-gap) and how [pre-execution runtime authority](/blog/true-cost-of-uncontrolled-agents#runtime-authority-as-infrastructure) closes the gap.

[Cycles](/) provides this layer. Every LLM call checks against a budget before executing. When the budget is exhausted, the call is denied and the agent degrades gracefully.

6 changes: 3 additions & 3 deletions blog/introducing-cycles-blog.md
@@ -3,14 +3,14 @@ title: Introducing the Cycles Blog
date: 2026-03-14
author: Cycles Team
tags: [announcement]
description: We're launching our blog to share engineering insights, product updates, and best practices for budget authority in autonomous systems.
description: We're launching our blog to share engineering insights, product updates, and best practices for runtime authority in autonomous systems.
blog: true
sidebar: false
---

# Introducing the Cycles Blog

We're excited to launch the Cycles blog — a space for engineering deep-dives, product updates, and practical guidance on budget authority for autonomous agents.
We're excited to launch the Cycles blog — a space for engineering deep-dives, product updates, and practical guidance on runtime authority for autonomous agents.

<!-- more -->

@@ -23,7 +23,7 @@ We're excited to launch the Cycles blog — a space for engineering deep-dives,

## Why a Blog?

Our docs cover the _how_. The blog covers the _why_ — the thinking behind design decisions, the incidents that shaped the protocol, and the patterns we see across teams adopting budget authority.
Our docs cover the _how_. The blog covers the _why_ — the thinking behind design decisions, the incidents that shaped the protocol, and the patterns we see across teams adopting runtime authority.

## Next Steps

@@ -46,7 +46,7 @@ Every major AI provider offers some form of spending limit — OpenAI monthly ca

**Reactive, not preventive.** Most provider caps operate on billing cycles. They tell you what happened; they do not block the next model call in real time. By the time the cap triggers, the damage is done — and it affects everyone.

The structural problem is clear: provider caps protect the provider's exposure to you, not your exposure to individual customers. For multi-tenant AI platforms, the enforcement boundary must exist **per customer, inside your runtime**. This is the problem [Cycles](/) was built to solve — budget authority as infrastructure, enforced before execution, scoped to each tenant.
The structural problem is clear: provider caps protect the provider's exposure to you, not your exposure to individual customers. For multi-tenant AI platforms, the enforcement boundary must exist **per customer, inside your runtime**. This is the problem [Cycles](/) was built to solve — runtime authority as infrastructure, enforced before execution, scoped to each tenant.

## What Per-Tenant Budget Enforcement Looks Like
