Where should the swarm cost-ceiling live — agent runtime or external gateway? #24

OpsToInnovator · 2026-06-19T02:11:32Z

Looking through the Cost Recorder & Caps work in newcore and the MAX_COST_USD swarm-level ceiling: clean implementation, and the right primitive to have early. I wanted to surface an architectural question while it's still cheap to change course, because I think Velocity is sitting on a fork in the road that other agentic platforms (Cursor, Aider, OpenDevin) have all hit and answered differently.

The question: should budget enforcement live inside the agent runtime (where it can reason about whole-task plans, pre-allocate worst-case spend per sub-task, and abort planning if the task can't fit in budget), or in front of the model layer (where it sees every provider call regardless of which agent or sub-agent made it, and enforces hard caps that even buggy/runaway agents can't bypass)?

Both are valid, and they have different failure modes:

Inside the runtime — best when the agent knows the full plan up front. Worst when an agent enters a retry loop, fans out to sub-agents that don't share state, or when a user runs multiple swarms against the same provider quota in parallel.
In front of the model layer — best at containing damage from anything the runtime didn't anticipate (retries, fallbacks, sub-agent fan-out, user-spawned parallel swarms). Worst at task-level reasoning, because by the time the request hits the proxy, the planning context is gone.

The honest answer in production is probably both, with a clean handoff. Velocity sits beautifully in the first camp. The second camp is where I've been building — a small AGPL proxy called Bulwark that enforces per-key daily/monthly USD caps and a "Bedtime Mode" that trips when today's spend hits 2× the rolling baseline during sleeping hours. Different layer of the stack, same family of problem.
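For concreteness, the Bedtime Mode rule described above (trip when today's spend reaches 2× the rolling baseline during sleeping hours) could be sketched roughly as follows. This is not Bulwark's actual code: the function name, the 23:00 to 07:00 sleep window, and passing the baseline in as a precomputed number are all assumptions made for illustration.

```python
from datetime import datetime, time


def bedtime_tripped(
    spend_today_usd: float,
    baseline_usd: float,
    now: datetime,
    sleep_start: time = time(23, 0),  # assumed window start
    sleep_end: time = time(7, 0),     # assumed window end
    multiplier: float = 2.0,          # "2x the rolling baseline" from the post
) -> bool:
    """Return True when spend is at or above multiplier x baseline during sleep hours."""
    t = now.time()
    if sleep_start <= sleep_end:
        asleep = sleep_start <= t < sleep_end
    else:
        # The window wraps past midnight, e.g. 23:00 to 07:00.
        asleep = t >= sleep_start or t < sleep_end
    # A zero baseline would trip on any spend, so it is treated as "no data yet".
    return bool(asleep and baseline_usd > 0 and spend_today_usd >= multiplier * baseline_usd)
```

With a 5 USD baseline, 10 USD of spend at 02:00 trips the rule, while the same spend at 14:00 does not.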

Three specific things I'd be curious about:

Distributed cost state. The current MAX_COST_USD ceiling appears to be per-swarm. What's the plan when a user runs three Velocity instances against the same OpenAI account — do the swarms share a cost view, or is each ceiling independent? (We hit this on Bulwark and ended up needing Durable-Object-style single-writer semantics for budget state; Workers KV's eventual consistency wasn't enough.)
Reservation vs. post-hoc accounting. Do agents reserve worst-case spend before dispatching a sub-task, or do they accumulate cost as calls return? Both work: reservation bounds a "thundering herd of sub-agents" at the cost of an extra budget check on the hot path, while post-hoc accounting keeps the hot path fast by accepting some overshoot. We picked the second on Bulwark and documented the tradeoff openly in DESIGN.md; happy to compare notes if you've landed somewhere different.
Handoff surface. If a tool like Bulwark sits in front of Velocity, what's the cleanest way for the runtime to communicate "this sub-task should cost about X" to the proxy, and for the proxy to push back "you have $Y remaining in this window"? An x-budget-hint request header + an x-budget-remaining response header would be a minimal start, but I'm curious whether you'd want richer two-way negotiation or prefer to keep the layers ignorant of each other.
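To make the three questions above concrete, here is a minimal in-process sketch of the two accounting styles and the header handoff. Everything in it is hypothetical: `BudgetLedger`, its method names, and the decimal-USD header value format are illustrative assumptions, not Velocity's or Bulwark's API. The single lock stands in for single-writer budget state; a real multi-instance deployment would need that state in one shared place.

```python
import threading


class BudgetLedger:
    """Hypothetical budget state guarded by one lock (one writer at a time)."""

    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0     # settled cost of completed calls
        self.reserved_usd = 0.0  # worst-case holds for in-flight calls
        self._lock = threading.Lock()

    def remaining(self) -> float:
        with self._lock:
            return self.cap_usd - self.spent_usd - self.reserved_usd

    # Reservation style: hold worst-case spend before dispatch.
    def reserve(self, worst_case_usd: float) -> bool:
        with self._lock:
            if self.spent_usd + self.reserved_usd + worst_case_usd > self.cap_usd:
                return False  # refuse rather than risk overshoot
            self.reserved_usd += worst_case_usd
            return True

    def settle(self, worst_case_usd: float, actual_usd: float) -> None:
        with self._lock:
            self.reserved_usd -= worst_case_usd  # release the hold
            self.spent_usd += actual_usd         # book the real cost

    # Post-hoc style: admit while under the cap, book cost when the call returns.
    def admit(self) -> bool:
        with self._lock:
            return self.spent_usd < self.cap_usd

    def record(self, actual_usd: float) -> None:
        with self._lock:
            self.spent_usd += actual_usd


def request_headers(estimate_usd: float) -> dict:
    # Runtime -> proxy: "this sub-task should cost about X" (value format assumed).
    return {"x-budget-hint": f"{estimate_usd:.4f}"}


def response_headers(ledger: BudgetLedger) -> dict:
    # Proxy -> runtime: remaining budget in this window, floored at zero.
    return {"x-budget-remaining": f"{max(ledger.remaining(), 0.0):.4f}"}
```

With a 1.00 USD cap and three sub-agents that each cost 0.50 USD, the reservation path refuses the third dispatch, while the post-hoc path admits all three before any cost is recorded and ends at 1.50 USD spent. That 0.50 USD is the overshoot being traded for a cheaper hot path.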

Not pitching anything — genuinely interested in where you've landed on the proxy-vs-runtime split, because I think it'll affect what's worth building in the gateway layer vs leaving to the runtime. Happy to write up Bulwark's design tradeoffs in more depth if it's useful as a reference point.

Replies: 0 comments