Where should the swarm cost-ceiling live — agent runtime or external gateway? #24
OpsToInnovator started this conversation in Ideas
Replies: 0 comments
Looking through the Cost Recorder & Caps work in `newcore` and the `MAX_COST_USD` swarm-level ceiling: clean implementation, and the right primitive to have early. I wanted to surface an architectural question while it's still cheap to change course, because I think Velocity is sitting on a fork in the road that other agentic platforms (Cursor, Aider, Open-Devin) have all hit and answered differently.
The question: should budget enforcement live inside the agent runtime (where it can reason about whole-task plans, pre-allocate worst-case spend per sub-task, and abort planning if the task can't fit in budget), or in front of the model layer (where it sees every provider call regardless of which agent or sub-agent made it, and enforces hard caps that even buggy/runaway agents can't bypass)?
Both are valid, and they have different failure modes:
The honest answer in production is probably both, with a clean handoff. Velocity sits beautifully in the first camp. The second camp is where I've been building — a small AGPL proxy called Bulwark that enforces per-key daily/monthly USD caps and a "Bedtime Mode" that trips when today's spend hits 2× the rolling baseline during sleeping hours. Different layer of the stack, same family of problem.
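To make the "Bedtime Mode" rule concrete, here is a minimal Python sketch of the trip condition as described above (today's spend reaching 2× the rolling baseline during sleeping hours). It is illustrative only and not Bulwark's actual code: the function names, the 23:00 to 07:00 window, and the choice to never trip without a baseline are all assumptions, and how the rolling baseline is computed is left to the caller.

```python
from datetime import time


def in_sleep_window(now: time, start: time = time(23, 0), end: time = time(7, 0)) -> bool:
    """Return True if `now` falls inside the sleeping-hours window.

    The 23:00-07:00 default is an assumption for illustration.
    The window may wrap past midnight, so both cases are handled.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def bedtime_tripped(
    spend_today_usd: float,
    baseline_usd: float,
    now: time,
    multiplier: float = 2.0,
) -> bool:
    """Trip when today's spend reaches `multiplier` x the rolling baseline
    while the clock is inside the sleeping-hours window."""
    if baseline_usd <= 0:
        # Assumption: with no baseline yet, there is nothing to compare against.
        return False
    return in_sleep_window(now) and spend_today_usd >= multiplier * baseline_usd
```

With a $5.00 baseline, `bedtime_tripped(10.0, 5.0, time(2, 0))` trips, while the same spend at 14:00 does not, because daytime spikes are left to the ordinary daily/monthly caps.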
Three specific things I'd be curious about:
Distributed cost state. The current `MAX_COST_USD` ceiling appears to be per-swarm. What's the plan when a user runs three Velocity instances against the same OpenAI account: do the swarms share a cost view, or is each ceiling independent? (We hit this on Bulwark and ended up needing Durable-Object-style single-writer semantics for budget state; Workers KV's eventual consistency wasn't enough.)
Reservation vs. post-hoc accounting. Do agents reserve worst-case spend before dispatching a sub-task, or do they accumulate cost as calls return? Both work, but the second bounds a "thundering herd of sub-agents" pattern only approximately, accepting some overshoot in exchange for lower hot-path latency. We picked the second on Bulwark and documented the tradeoff openly in DESIGN.md; happy to compare notes if you've landed somewhere different.
Handoff surface. If a tool like Bulwark sits in front of Velocity, what's the cleanest way for the runtime to communicate "this sub-task should cost about X" to the proxy, and for the proxy to push back "you have $Y remaining in this window"? An `x-budget-hint` request header + an `x-budget-remaining` response header would be a minimal start, but I'm curious whether you'd want richer two-way negotiation or prefer to keep the layers ignorant of each other.
Not pitching anything, just genuinely interested in where you've landed on the proxy-vs-runtime split, because I think it'll affect what's worth building in the gateway layer vs. leaving to the runtime. Happy to write up Bulwark's design tradeoffs in more depth if it's useful as a reference point.
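For reference, a minimal Python sketch of the reservation vs. post-hoc distinction and the two-header handoff from the questions above. It is illustrative only and not taken from Velocity or Bulwark: `BudgetLedger`, its methods, and the header value format (decimal USD strings) are assumptions, and the in-process lock merely stands in for single-writer budget state.

```python
import threading


class BudgetLedger:
    """Budget state for one window, with a lock standing in for a single writer.

    All names here are hypothetical.
    """

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0      # settled cost of finished calls
        self.reserved_usd = 0.0   # worst-case holds for in-flight calls
        self._lock = threading.Lock()

    def remaining(self) -> float:
        with self._lock:
            return self.cap_usd - self.spent_usd - self.reserved_usd

    # Reservation: hold worst-case spend before dispatching a sub-task.
    def reserve(self, worst_case_usd: float) -> bool:
        with self._lock:
            if self.spent_usd + self.reserved_usd + worst_case_usd > self.cap_usd:
                return False  # would not fit, so refuse before any money is spent
            self.reserved_usd += worst_case_usd
            return True

    def settle(self, worst_case_usd: float, actual_usd: float) -> None:
        """Release the hold and book what the call actually cost."""
        with self._lock:
            self.reserved_usd -= worst_case_usd
            self.spent_usd += actual_usd

    # Post-hoc: admit while settled spend is under the cap, record on return.
    def allow(self) -> bool:
        with self._lock:
            return self.spent_usd < self.cap_usd

    def record(self, actual_usd: float) -> None:
        with self._lock:
            self.spent_usd += actual_usd


def hint_headers(estimate_usd: float) -> dict[str, str]:
    """Runtime -> proxy: 'this sub-task should cost about X'."""
    return {"x-budget-hint": f"{estimate_usd:.4f}"}


def remaining_headers(ledger: BudgetLedger) -> dict[str, str]:
    """Proxy -> runtime: 'you have $Y remaining in this window' (floored at 0)."""
    return {"x-budget-remaining": f"{max(ledger.remaining(), 0.0):.4f}"}
```

With a $1.00 cap and five sub-agents each estimated at $0.40, `reserve` admits two and refuses three up front. The post-hoc path admits all five, because nothing has settled when they are dispatched, and only blocks once the recorded spend has already overshot the cap. That is the overshoot-for-latency trade in miniature.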