RFC: A real-time budget decision plane for AI agent runs #1
iamapsrajput started this conversation in Ideas
AI agents don't consume tokens the way chat does. An agent runs a loop: observe, think, act, repeat, and each iteration resends the accumulated context. By step 20 of a run with file reads, a single call can exceed 50K input tokens. Reported cases from the past year include a developer hitting $4,200 in API fees over one weekend of autonomous refactoring. The mechanism is structural: unbounded loops resending growing context.
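To make the growth concrete, here is a rough back-of-the-envelope sketch. The starting context (3,000 tokens) and the per-step growth (2,500 tokens) are illustrative assumptions chosen so that step 20 lands near the 50K figure above; they are not measurements from any real run.

```python
def run_input_tokens(steps, base_tokens, tokens_added_per_step):
    """Return (input tokens in the final call, total input tokens over the run).

    Assumes every step resends the whole accumulated context, and that the
    context grows by a fixed amount per step (a simplification).
    """
    per_call = [base_tokens + i * tokens_added_per_step for i in range(steps)]
    return per_call[-1], sum(per_call)


# Illustrative numbers only (assumptions, not data from the RFC).
last_call, run_total = run_input_tokens(
    steps=20, base_tokens=3_000, tokens_added_per_step=2_500
)
print(last_call)  # 50500 input tokens in the step-20 call alone
print(run_total)  # 535000 input tokens billed across the whole run
```

Under these assumptions the final call is about 50K tokens, but the run as a whole pays for more than ten times that, because total input grows roughly quadratically with the number of steps.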
Existing gateway budgets attach to keys, users, or teams over daily or monthly windows. The damage unit for agents is the run: one autonomous session that needs a ceiling in dollars, not a monthly quota it can exhaust in an hour. And when today's budget checks do fire, the request dies with an opaque error the agent can't adapt to.
This RFC proposes a real-time budget decision plane: a run-scoped budget authority that atomically reserves estimated spend before provider calls (reserve → commit → refund, so parallel tool calls can't race past a ceiling), reconciles against actual usage after calls, fails closed on unknown prices, and exposes machine-readable budget state so agents can downshift to cheaper, capability-valid models before they exhaust the run.
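The reserve → commit → refund cycle described above might look roughly like the following in-process sketch. Everything here is hypothetical and for illustration only: the `RunBudget` class, the `PRICES` table and its placeholder prices, the micro-dollar units, and the `reserve_with_downshift` helper are assumptions, not the RFC's actual interface.

```python
import threading


class BudgetDenied(Exception):
    """Raised when a reservation cannot be granted."""


# Hypothetical prices in micro-dollars per token: (input, output).
PRICES = {"large-model": (3, 15), "small-model": (1, 2)}


class RunBudget:
    """Run-scoped ceiling with atomic reserve/commit (illustrative sketch)."""

    def __init__(self, ceiling_micro_usd):
        self.ceiling = ceiling_micro_usd
        self.committed = 0  # reconciled actual spend
        self.reserved = 0   # outstanding estimates for in-flight calls
        self._holds = {}
        self._next_id = 0
        self._lock = threading.Lock()

    def reserve(self, model, input_tokens, max_output_tokens):
        price = PRICES.get(model)
        if price is None:
            # Fail closed: an unpriced model is never allowed through.
            raise BudgetDenied(f"unknown price for {model!r}")
        estimate = input_tokens * price[0] + max_output_tokens * price[1]
        with self._lock:  # check and hold in one step, so parallel calls can't race
            if self.committed + self.reserved + estimate > self.ceiling:
                raise BudgetDenied("reservation would exceed the run ceiling")
            self._next_id += 1
            self._holds[self._next_id] = estimate
            self.reserved += estimate
            return self._next_id

    def commit(self, hold_id, model, input_tokens, output_tokens):
        price = PRICES[model]
        actual = input_tokens * price[0] + output_tokens * price[1]
        with self._lock:
            estimate = self._holds.pop(hold_id)
            self.reserved -= estimate
            self.committed += actual
            return estimate - actual  # refunded amount (negative on overrun)

    def state(self):
        """Machine-readable budget state an agent could inspect."""
        with self._lock:
            return {
                "ceiling_micro_usd": self.ceiling,
                "committed_micro_usd": self.committed,
                "reserved_micro_usd": self.reserved,
                "remaining_micro_usd": self.ceiling - self.committed - self.reserved,
            }


def reserve_with_downshift(budget, candidates, input_tokens, max_output_tokens):
    """Try models in preference order; candidates are assumed capability-valid."""
    for model in candidates:
        try:
            return model, budget.reserve(model, input_tokens, max_output_tokens)
        except BudgetDenied:
            continue
    raise BudgetDenied("no candidate model fits the remaining budget")


budget = RunBudget(ceiling_micro_usd=1_000_000)           # a $1.00 run ceiling
hold = budget.reserve("large-model", 50_000, 4_000)       # holds 210000
refund = budget.commit(hold, "large-model", 50_000, 1_000)  # refunds 45000
```

The lock makes the ceiling check and the hold a single step, which is what keeps parallel tool calls from each passing the check before either is counted. A real deployment would presumably need a shared store rather than process memory.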
Full document: RFC
Delivery is planned first as a LiteLLM pre-call hook (it composes with existing deployments and replaces nothing), then as a standalone sidecar. Licensed under Apache 2.0.
Where I most want pushback
Reply to any of these by number:
actuals_only (no estimation trust required) the necessary on-ramp?
effective_max_output_tokens basis) acceptable, or does it over-reserve so aggressively for parallel workloads that soft-gate margins are preferable?
Background: this design grew out of the budget layer of an LLM router I've built and maintained since 2025. I'll keep RFC.md updated as feedback lands; a changelog at the bottom of the document tracks what changed between versions.