blog: Reserving Authority When You Can't Pause #651
Open
amavashev wants to merge 3 commits into
New pillar post on voice / realtime agent budgets. Net-new surface
for the corpus — first post in the voice-agents pillar.
The post identifies the structural constraint: reserve-commit assumes
the agent can wait synchronously for ALLOW. Voice and realtime agents
can't — a 100ms sync gate on each audio frame would push the
conversation past the ~700ms natural-feel threshold. The fix is not to
abandon the gate; it is to position it where the latency budget can
absorb it.
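The latency argument above can be checked with a few lines of arithmetic. This is a minimal sketch: the function name, the 650 ms and 400 ms baseline turn latencies, and the idea of counting gates per turn are illustrative assumptions, not figures from the post. Only the 100 ms gate and the ~700 ms threshold come from the text above.

```python
def fits_latency_budget(
    base_ms: float,
    gate_ms: float,
    gates_on_path: int,
    budget_ms: float = 700.0,
) -> bool:
    """Return True if a turn still lands inside the budget once sync gates are added.

    base_ms is the turn latency without any gate (an assumed input);
    gates_on_path is how many synchronous gates sit on the hot path of one turn.
    """
    return base_ms + gate_ms * gates_on_path <= budget_ms


# Assumed baseline of 650 ms: one 100 ms gate on the audio path breaks the budget.
print(fits_latency_budget(base_ms=650, gate_ms=100, gates_on_path=1))  # False
# Same gate on a slower, tool-call path with more headroom (assumed 400 ms base).
print(fits_latency_budget(base_ms=400, gate_ms=100, gates_on_path=1))  # True
```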
Four patterns covered:
1. Predictive reservation, true-up later (reserve N minutes upfront,
commit actual at call end)
2. Tier-aware gating (sync gate on the slow-path tool calls,
predictive reservation on the fast-path audio)
3. Time-bounded floor authority (per-second auto-replenish for very
high-throughput deployments)
4. Speculative commit with deny window (slow-path only, since audio is
unrecoverable)
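
As an illustration of pattern 1, here is a minimal sketch of predictive reservation with a true-up at call end. The `Budget` and `CallReservation` classes, the integer-cent accounting, and the per-minute rate are hypothetical names and assumptions for this example; they are not the post's implementation or any provider's API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical tenant budget, tracked in integer cents."""

    remaining_cents: int

    def reserve(self, cents: int) -> bool:
        # Deny the reservation outright if the budget cannot cover it.
        if cents > self.remaining_cents:
            return False
        self.remaining_cents -= cents
        return True


class CallReservation:
    """Reserve N minutes before audio starts, commit the actual cost at call end."""

    def __init__(self, budget: Budget, rate_cents_per_min: int, reserved_minutes: int) -> None:
        self.budget = budget
        self.rate = rate_cents_per_min
        self.reserved_cents = rate_cents_per_min * reserved_minutes

    def start(self) -> bool:
        # The only synchronous gate: it runs once, before the first audio frame.
        return self.budget.reserve(self.reserved_cents)

    def settle(self, actual_seconds: int) -> int:
        # True-up: bill actual usage (rounded up to a whole cent), then return
        # the unused part of the reservation or charge the overage.
        actual_cents = (actual_seconds * self.rate + 59) // 60
        self.budget.remaining_cents += self.reserved_cents - actual_cents
        return actual_cents


budget = Budget(remaining_cents=1000)
call = CallReservation(budget, rate_cents_per_min=10, reserved_minutes=5)
if call.start():
    charged = call.settle(actual_seconds=120)
    print(charged, budget.remaining_cents)  # 20 980
```

The gate cost is paid once per call rather than once per frame, which is why the pattern fits the fast audio path.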
Stack matrix for OpenAI Realtime / Vapi / Retell AI / ElevenLabs shows
where the gate can sit in each. Voice-specific failure modes (talking-
to-itself loops, stuck conversations, premium-tier escalation runaway,
hold-music wall-clock blind spots) get their own table. Mirrors the
PocketOS two-layer fix: per-call provider caps + agent-side runtime
authority.
Internal cross-links to tracking-tokens-in-a-streaming-llm-response
(closest sibling), estimate-drift (calibration),
ai-agent-action-control (parent tier model),
retry-storms-and-idempotency, when-budget-runs-out, multi-tenant cost
control, plus the just-shipped trilogy (memory-writes, merge,
computer-use).
External citations: OpenAI Realtime delivery scale page, callsphere
pricing analysis, implicit acknowledgment of ElevenLabs/Vapi/Retell AI
pricing models.
Reviews: internal cycles 1-3 (scorecard 9.3/10), glossary linker added
7 contextual links. Cycle 1 fact-check caught and fixed:
- OpenAI Realtime latency attribution softened (the OpenAI page is
bot-blocked from WebFetch, so "OpenAI targets X ms" became
"production deployments typically land at X ms with industry
guidance converging on ~700ms").
- Retell AI pricing range widened from $0.07-$0.15 to $0.07-$0.31
(the original ceiling was understated).
- Opener cost figure corrected ($390 in 17 minutes was implausible
given the post's own pricing table; lowered to $90).
- Retell AI capitalization standardized ("Retell AI" everywhere in
brand-list contexts, not bare "Retell").
Three new pillar tags introduced: voice-agents, realtime, latency.
…-pause-to-reserve
Apply/skip tally: 9 applied, 2 pushed back.
Applied:
- `response.function_call` → `response.function_call_arguments.*`: the OpenAI Realtime API uses function-call output items and the function_call_arguments streaming events; my original event name was not a real Realtime server event. Fixed in both the prose and the stack-by-stack table.
- 80-150 ms relay hop: removed the specific band attribution. The OpenAI page does not state it. Generic phrasing: "a forwarding hop sized to fit inside the conversation's latency budget."
- ElevenLabs row: clarified the $0.08-$0.24/min framing. Hosting is $0.08/min flat or $0.16/min burst; the $0.24 ceiling derives once LLM and telephony layer on at cost.
- Vapi row: labeled the $0.115-$0.42/min range as an estimate (it's derived from $0.05/min orchestration plus a BYOK provider stack at cost; the actual all-in depends on provider choices).
- 17-minute "$1.50-$8.00 model spend alone": tightened to "against the per-minute stack rates above" since the rates in the table mix all-in / provider / orchestration models.
- Provider-layer caps: softened from "OpenAI, Vapi, Retell AI, and ElevenLabs all expose per-call or per-session limits" to "to whatever degree each provider exposes them — typically through per-session budget headers, dashboard caps, or programmatic limits." Pricing pages don't uniformly establish hard caps.
- "Most production voice teams use this only..." for speculative commit: softened to "This pattern is usually safer on the slow-path tool layer."
- Description trimmed 162 → 152 chars: changed "—" to ":", "sit synchronously in the path" to "sync on the hot path."
- `reserve-commit` glossary link: pointed to /protocol/how-reserve-commit-works-in-cycles instead of /glossary#reservation (reserve-commit is a lifecycle term, not the reservation entry).
Skipped, with reason:
- Body cross-link count (11) above 5-8 pillar target: three of the eleven are the trilogy references in a single closing sentence that names the sibling extension series (memory-writes, merge, computer-use). They are coherent as a triple, not redundant.
- 2026-05-20 publish date: intentional sequence after the trilogy (5/16, 5/18, 5/19, 5/20).
Codex verified upstream: ElevenLabs/Vapi/Retell AI pricing pages, OpenAI Realtime API event surface (function_call_arguments.delta / .done are the actual streaming events), and the cycles-docs main-branch internal targets. Sibling links to memory-writes, merge, and computer-use treated as just-merged via PR #648-#650.
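To make the event-name fix concrete, here is a minimal sketch of tier-aware gating keyed on the corrected event type. Only the `response.function_call_arguments.done` and `.delta` event names come from this PR; the `route_event` function, the returned labels, the `allow_tool_call` callback, and any event fields other than `type` are hypothetical and assumed for illustration.

```python
from typing import Callable

# Slow-path events: a completed tool call, where a synchronous gate is affordable.
SLOW_PATH_EVENTS = {"response.function_call_arguments.done"}


def route_event(event: dict, allow_tool_call: Callable[[dict], bool]) -> str:
    """Classify one server event.

    Returns "forward" for fast-path traffic (no sync gate; covered by the
    upfront reservation), or "allow" / "deny" for gated slow-path tool calls.
    """
    if event.get("type") in SLOW_PATH_EVENTS:
        # The synchronous ALLOW/DENY decision happens only here.
        return "allow" if allow_tool_call(event) else "deny"
    return "forward"


# Hypothetical policy: deny every tool call, forward everything else untouched.
deny_all = lambda event: False
print(route_event({"type": "response.function_call_arguments.delta"}, deny_all))  # forward
print(route_event({"type": "response.function_call_arguments.done"}, deny_all))   # deny
```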
This was referenced May 15, 2026
…o 2026-06-06
Moved from 2026-05-20 to 2026-06-06 to match the weekly publishing cadence for the action-authority extension arc. Sequence: memory 5/16, merge 5/23, computer-use 5/30, voice 6/06.
Summary
First post in a new pillar: voice / realtime agent governance. Net-new surface for the corpus.
The post identifies the structural constraint: reserve-commit assumes the agent can wait synchronously for ALLOW, but voice agents can't — a 100ms sync gate on each audio frame would push the conversation past the ~700ms natural-feel threshold. The fix is not to abandon the gate; it is to position it where the latency budget can absorb it.
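Four patterns covered:
1. Predictive reservation, true-up later
2. Tier-aware gating
3. Time-bounded floor authority
4. Speculative commit with deny window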
Voice-specific failure modes (talking-to-itself loops, stuck conversations, premium-tier escalation runaway, hold-music wall-clock blind spots) get their own table. Mirrors the PocketOS two-layer fix: per-call provider caps + agent-side runtime authority.
Stack matrix for OpenAI Realtime / Vapi / Retell AI / ElevenLabs shows where the gate can sit in each.
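For readers of this PR, the time-bounded floor authority pattern (per-second auto-replenish) can be approximated with a token bucket. This is a sketch under stated assumptions: the token-bucket formulation, the `FloorAuthority` class, its parameters, and the cent units are illustrative choices, not the post's implementation. Time is passed in explicitly so the behavior is deterministic.

```python
class FloorAuthority:
    """Spending authority that refills per second, capped at a fixed window."""

    def __init__(self, cents_per_second: float, window_seconds: float, now: float) -> None:
        self.rate = cents_per_second
        self.cap = cents_per_second * window_seconds
        self.available = self.cap  # start with a full window of authority
        self.last = now

    def _replenish(self, now: float) -> None:
        # Add authority for the elapsed time, never exceeding the cap.
        elapsed = max(0.0, now - self.last)
        self.available = min(self.cap, self.available + elapsed * self.rate)
        self.last = now

    def try_spend(self, cents: float, now: float) -> bool:
        # Local check only: no network round trip on the hot path.
        self._replenish(now)
        if cents > self.available:
            return False
        self.available -= cents
        return True


floor = FloorAuthority(cents_per_second=1.0, window_seconds=5.0, now=0.0)
print(floor.try_spend(5.0, now=0.0))  # True: the full window is available
print(floor.try_spend(1.0, now=0.5))  # False: only 0.5 cents has refilled
print(floor.try_spend(1.0, now=1.5))  # True: 1.5 cents available by now
```

Because the cap bounds how much authority can accumulate, a runaway session can spend at most one window's worth before the refill rate becomes the limit.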
Author: Albert Mavashev
Date: 2026-05-20
Word count: ~3,100 body
Reviews
Codex verified upstream:
- OpenAI Realtime API event surface (`response.function_call_arguments.delta` / `.done` are the actual events; my original `response.function_call` was wrong; fixed)
Cycle 1 fact-check caught and fixed:
Per-dimension scores
Overall: 9.3 / 10
Test plan
Dependencies and order
This post links to three sibling posts that are on PR branches awaiting merge: `/blog/agent-memory-writes-are-actions-too` (PR #648), `/blog/when-coding-agents-press-merge` (PR #649), `/blog/computer-use-agents-have-no-tool-boundary` (PR #650). Merge order: #648 → #649 → #650 → this PR so the trilogy + this voice post all land with working cross-links.
Tags
Three new pillar tags introduced for the voice surface: `voice-agents`, `realtime`, `latency`. Future voice posts should align on these. Other tags (`budgets`, `runtime-authority`, `agents`, `engineering`, `RISK_POINTS`) match corpus convention.