fix(voice/#602): migrate OpenAIRealtimeAgentAdapter to GA Realtime wire protocol #604
BDD contract for migrating OpenAIRealtimeAgentAdapter from the retired beta Realtime wire protocol to GA. 15 scenarios mapping all 7 ACs of #602. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Red step (TDD): asserts the GA wire contract -- no OpenAI-Beta header, GA session.update shape (session.type + session.audio.{input,output} object formats), recv decodes response.output_audio.delta. Fails against the current beta adapter, reproducing the beta_api_shape_disabled root cause. Closes the AC6 test-coverage gap that let the regression ship.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
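The regression guard described in the red-step commit reduces to a few structural checks on what the adapter puts on the wire. Below is a minimal sketch. It assumes the test has already captured the handshake headers and the first session.update frame, for example from a mocked socket. The helper name ga_wire_contract_violations is hypothetical and is not the repository's test code. It covers only the header and session.update layers and omits the recv-side check.

```python
from typing import Any


def ga_wire_contract_violations(
    headers: dict[str, str], session_update: dict[str, Any]
) -> list[str]:
    """Hypothetical helper: list the ways a captured handshake breaks the GA contract."""
    problems: list[str] = []
    # Layer 1: the GA handshake must not carry the beta opt-in header.
    if any(name.lower() == "openai-beta" for name in headers):
        problems.append("OpenAI-Beta header sent")
    # Layer 2: GA session.update needs session.type and object-valued audio formats.
    if session_update.get("type") != "session.update":
        problems.append("frame is not a session.update")
    session = session_update.get("session", {})
    if session.get("type") != "realtime":
        problems.append("session.type is not 'realtime'")
    audio = session.get("audio", {})
    for side in ("input", "output"):
        fmt = audio.get(side, {}).get("format")
        if not isinstance(fmt, dict):
            problems.append(f"session.audio.{side}.format is not an object")
    return problems
```

Against a beta-shaped capture (an OpenAI-Beta header and flat string formats such as input_audio_format: "pcm16"), the helper reports a violation for the header, for session.type, and for both audio formats. Against the GA shape it returns an empty list. That matches the RED-then-GREEN behaviour the commit describes.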
…re protocol OpenAI retired the beta Realtime wire protocol at GA (2026-05-12); the adapter spoke it end-to-end, so every run was server-closed with beta_api_shape_disabled. Migrate all three coupled layers: drop the OpenAI-Beta header (auth-only handshake); rebuild session.update to the GA shape (session.type=realtime, audio nested under session.audio.input/output with object formats, voice/transcription/turn_detection relocated); branch recv_audio on the GA response.output_audio.* event names, accepting legacy beta names defensively with a one-time warning (live gpt-realtime* may still emit them). Update module+class docstrings and the existing beta-shaped unit tests to GA, plus a defensive-legacy test. Covers AC1-AC5, AC7. Python-only; the TS adapter delegates the wire shape to the @openai/agents SDK. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
interrupt() is unchanged by the GA migration, but AC5 and the feature scenario require explicit coverage that it emits response.cancel on the socket. Closes the last AC coverage gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address /review feedback (no must-fix; all polish): consolidate test_..._connects_and_sends_pcm16 into a focused send/commit/response round-trip that asserts the AC5 input_audio_buffer.commit + response.create sequence (closing a declared-scenario coverage gap), with the GA session-shape assertions now owned solely by the regression guard (removes ~50 lines of duplication); prove the one-time legacy warning fires exactly once across two legacy events; hygiene (module-level import logging, canonical call_args.kwargs accessor, getMessage()); and consolidate the recv_audio defensive-alias comments into one block noting they should be removed once GA names are confirmed stable at a live endpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
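The send round-trip and the interrupt() coverage from the two commits above can be illustrated with a socket that only records the frames sent to it. This is a sketch under assumptions. FakeSocket, RealtimeSender and demo are hypothetical names, and the adapter's real methods may be shaped differently. The sketch also assumes the audio is sent as a base64 input_audio_buffer.append event before the commit.

```python
import asyncio
import base64
import json
from typing import Any


class FakeSocket:
    """Hypothetical stand-in for a WebSocket that records every JSON frame sent."""

    def __init__(self) -> None:
        self.sent: list[dict[str, Any]] = []

    async def send(self, frame: str) -> None:
        self.sent.append(json.loads(frame))


class RealtimeSender:
    """Hypothetical minimal sender; not the adapter's actual class."""

    def __init__(self, ws: FakeSocket) -> None:
        self._ws = ws

    async def _emit(self, event: dict[str, Any]) -> None:
        await self._ws.send(json.dumps(event))

    async def send_audio(self, pcm16: bytes) -> None:
        # Append the PCM16 audio, close the input buffer, then request a response.
        await self._emit({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm16).decode("ascii"),
        })
        await self._emit({"type": "input_audio_buffer.commit"})
        await self._emit({"type": "response.create"})

    async def interrupt(self) -> None:
        # Barge-in: cancel the in-flight response.
        await self._emit({"type": "response.cancel"})


async def demo() -> list[str]:
    ws = FakeSocket()
    sender = RealtimeSender(ws)
    await sender.send_audio(b"\x00\x00\x01\x00")
    await sender.interrupt()
    return [event["type"] for event in ws.sent]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

asyncio.run(demo()) returns the event types in send order. A test can therefore assert that input_audio_buffer.commit precedes response.create and that interrupt() emits response.cancel.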
The OpenAI Realtime transport shipped (real WebSocket), but its e2e tests still carried a dead PendingTransportError stub-probe and a 'transport not yet shipped' docstring -- actively misleading on the exact adapter this PR fixes, and part of why the coverage story was murky. Remove the dead probe + unused requires_transport_ready param and rewrite the docstrings to describe the live, key-gated GA-handshake check. Still auto-marked integration (deselected in PR CI via -m 'not integration', runs nightly with keys); the role=USER routing unit test had no drift and is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Spec-to-test fidelity (no adapter behavior change): split the AC4 transcript test into agent-side + user-side (matching the two declared scenarios); add the AC7 docstring-drift guard that was previously unbound (asserts no realtime=v1, GA event names present, auth-only header); add a tools/instructions top-level placement test; fix two scenario tags from @integration to @unit (their tests are mocked, no-live-key unit tests); and correct an over-spec I introduced: the session.update scenario referenced tool_choice, which the minimal adapter never sets, and now references tools+instructions to match the implementation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
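The AC7 docstring-drift guard amounts to substring checks over the adapter's documentation. Below is a minimal sketch. SAMPLE_DOC, the two constants and docstring_drift_problems are hypothetical names. A real guard would read the adapter class's and module's __doc__ instead. The check for the literal "Authorization: Bearer" is an assumption about how the docs phrase the auth-only handshake.

```python
# Hypothetical docstring for illustration; a real guard would inspect __doc__.
SAMPLE_DOC = """Speaks the GA Realtime wire protocol.

The handshake is auth-only (Authorization: Bearer <key>). Audio arrives as
response.output_audio.delta events.
"""

FORBIDDEN_BETA_MARKERS = ("realtime=v1",)
REQUIRED_GA_EVENT_NAMES = ("response.output_audio.delta",)


def docstring_drift_problems(doc: str) -> list[str]:
    """Return every way the given docstring has drifted back toward the beta protocol."""
    problems: list[str] = []
    for marker in FORBIDDEN_BETA_MARKERS:
        if marker in doc:
            problems.append(f"beta marker present: {marker}")
    for name in REQUIRED_GA_EVENT_NAMES:
        if name not in doc:
            problems.append(f"GA event name missing: {name}")
    # Assumed phrasing for the auth-only handshake.
    if "Authorization: Bearer" not in doc:
        problems.append("auth-only handshake not documented")
    return problems
```

The guard forbids only the realtime=v1 marker and not the words "OpenAI-Beta". A GA docstring may legitimately say that the header is no longer sent.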
Automated low-risk assessment
This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.
This PR requires a manual review before merging.
✅ prove-it re-run on HEAD
Adapter source is unchanged since the prior clean marker (
Test requirement before merge (per this PR's own caveat, recorded here so it isn't lost): The unit/integration layer is fully proven (60 passed, falsification-tested regression guard
Until that live run is captured, this PR is unit-green but not e2e-verified.
#604 reality Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t-gen (#610)
* docs(voice/#606): expand STT/TTS doc comments and relax audio-to-text judge criteria
  Adds deliberate-choice rationale comments to OPENAI_STT_MODEL and OPENAI_TTS_MODEL in both JS (voice-models.ts) and Python (voice_models.py), noting that no gpt-5-family transcription/TTS models exist on the public API as of 2026-06. Also documents the Python-only OPENAI_BOT_STT_MODEL gap in the TS file. Relaxes the multimodal-audio-to-text judge criteria from overly specific assertions (exact voice gender, exact repeat phrasing) to behavioural checks (processed audio, coherent response, non-text format acknowledgement). Updates the stale skip comment to reflect the model swap in PR #607.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(voice/#606): add AC4/AC5 doc comments: STT lock rationale + TTS callable-swap pattern
  - openai-realtime.ts: explain why `input.transcription.model` is locked to OPENAI_STT_MODEL and not exposed as a constructor option (the Realtime API only accepts transcription-class models; callers who need a different model subclass the adapter)
  - openai-tts.ts: document that the TTS model is not a parameter by design; the pattern is to swap the whole TTSCallable rather than parameterise this one; link to OPENAI_TTS_MODEL for the current-gen rationale
  Closes #606 (AC4 + AC5)
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs(examples/voice/#606): correct stale comment: model swap + unskip are in #607, not this branch
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
What & why
scenario.OpenAIRealtimeAgentAdapter (Python) spoke OpenAI's retired beta Realtime wire protocol, so every realtime run was server-closed at the handshake with 4000/beta_api_shape_disabled. OpenAI removed the beta interface globally at GA (2026-05-12). This PR migrates the adapter to the GA wire protocol. Fixes #602.
The fix — three coupled layers (Python-only)
The beta coupling was three layers deep; they move together (there is no working intermediate state — a header-only fix just surfaces the next-layer rejection):
1. Drop the OpenAI-Beta: realtime=v1 header; the handshake is auth-only (Authorization: Bearer).
2. session.update uses the GA shape: session.type="realtime"; audio nested under session.audio.{input,output}; formats as objects ({"type":"audio/pcm","rate":24000}); voice/transcription/turn_detection relocated under session.audio.
3. recv_audio branches on the GA response.output_audio.* event names, defensively accepting the legacy response.audio.* names with a one-time warning (live gpt-realtime* models have been reported still emitting the beta names despite the GA docs).
Module + class docstrings updated to GA.
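The three coupled layers can be sketched as pure functions that a socket wrapper would call. This is a minimal sketch, assuming the payload layout described in the list above. The names build_handshake_headers, build_session_update and AudioDeltaDecoder are hypothetical and are not the adapter's actual API. Voice, model and tool values in any call are caller-supplied placeholders. Only the audio-delta event pair is shown; the adapter also handles other response.output_audio.* events.

```python
import base64
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


# Layer 1: auth-only handshake; no "OpenAI-Beta: realtime=v1" header.
def build_handshake_headers(api_key: str) -> dict[str, str]:
    return {"Authorization": f"Bearer {api_key}"}


# Layer 2: GA session.update. Audio config is nested under session.audio.{input,output},
# formats are objects, and voice/transcription/turn_detection live under audio.
def build_session_update(
    *,
    instructions: str,
    voice: str,
    transcription_model: str,
    tools: Optional[list[dict[str, Any]]] = None,
    turn_detection: Optional[dict[str, Any]] = None,
    rate: int = 24000,
) -> dict[str, Any]:
    audio_input: dict[str, Any] = {
        "format": {"type": "audio/pcm", "rate": rate},
        "transcription": {"model": transcription_model},
    }
    if turn_detection is not None:
        audio_input["turn_detection"] = turn_detection
    session: dict[str, Any] = {
        "type": "realtime",
        "instructions": instructions,  # top level of session, not under audio
        "audio": {
            "input": audio_input,
            "output": {"format": {"type": "audio/pcm", "rate": rate}, "voice": voice},
        },
    }
    if tools:
        session["tools"] = tools  # also top level, not under audio
    return {"type": "session.update", "session": session}


# Layer 3: decode audio deltas, preferring GA names and tolerating beta names.
GA_AUDIO_DELTA = "response.output_audio.delta"
LEGACY_AUDIO_DELTA = "response.audio.delta"


class AudioDeltaDecoder:
    def __init__(self) -> None:
        self.legacy_warned = False

    def decode(self, event: dict[str, Any]) -> Optional[bytes]:
        event_type = event.get("type")
        if event_type == LEGACY_AUDIO_DELTA:
            # Accept the beta name, but warn only the first time it is seen.
            if not self.legacy_warned:
                logger.warning(
                    "Received legacy beta event %s; expected %s", event_type, GA_AUDIO_DELTA
                )
                self.legacy_warned = True
        elif event_type != GA_AUDIO_DELTA:
            return None  # not an audio delta; other handlers deal with it
        return base64.b64decode(event["delta"])
```

Keeping the three layers as separate, socket-free functions lets a unit test assert each layer with no live key. The one-time warning is a plain boolean flag on the decoder instance, so a second legacy event decodes silently.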
Out of scope (confirmed in the issue's Investigation): ephemeral client-secret minting; the TypeScript adapter; Realtime/transcription model renames.
Tests
- test_openai_realtime_adapter_uses_ga_wire_protocol_not_beta: verified RED against the beta code and GREEN post-fix; runs in CI with no live key. Closes the coverage gap that let the regression ship (the previous unit test asserted the beta shape and stayed green while prod was 100% broken).
- interrupt() → response.cancel; send → input_audio_buffer.commit → response.create round-trip; GA transcript observability (last_agent_transcript / last_user_transcript).
- The *_e2e.py tests carried a dead PendingTransportError probe and a "transport not yet shipped" docstring, which was misleading on the very adapter this PR fixes. They are now honest live, key-gated GA-handshake checks, still integration-marked, so they are deselected in PR CI and run nightly.
- specs/openai-realtime-ga-migration.feature: 15 scenarios mapping all 7 ACs, now bound 1:1 to tests. Split the AC4 transcript test into agent-side + user-side; added the previously unbound AC7 docstring-drift guard and a tools/instructions top-level placement test; corrected two @integration → @unit tags (their tests are mocked, no-key) and a tool_choice over-spec. Every @unit scenario has a test; @e2e/@integration scenarios are live-gated.
- python/tests/voice/test_adapters.py: 60 passed (17 realtime); pyright: 0 errors.
Review
Multi-agent /review (principles, hygiene, test, security): zero must-fix, security clean. Should-fix items addressed in this PR (test dedup, AC5 commit-sub-path coverage, hygiene conventions); minor nice-to-haves deferred with rationale.
Scope verified (post-review diligence)
- @openai/agents moved its Realtime transport to the GA interface at 0.1.0 (2025-08-28); the repo pins ^0.3.3 (and the demo ^0.3.9), both well past the cutover, and its event handler already uses the GA response.output_audio.* names. No TS-side beta-header fix is needed.
- gpt-realtime-mini and gpt-4o-transcribe are both current OpenAI models, so the live handshake won't fail on model-not-found.
✅ Caveat cleared: live GA handshake verified (2026-06-04)
The original 4000/beta_api_shape_disabled close is gone. A live, key-gated @e2e GA realtime run was executed against the real OpenAI Realtime API (model gpt-realtime-mini) and repeated 4 times, each time negotiating cleanly and closing normally, never with 4000. Wire-level evidence (websockets DEBUG):
No 4000, no beta_api_shape_disabled, no "type":"error" frame anywhere in the captured stream.
test_demo_openai_realtime_agent_e2e_success → PASSED.
Lower-confidence GA details (Investigation §6), now confirmed live:
- session.audio.output.format carrying rate ({"type":"audio/pcm","rate":24000}) is accepted: the server returned session.updated with no error.
- No output_modalities issue surfaced; the current session shape negotiated successfully.
Scope note: only the AGENT (direct-to-model) role opens a live socket and was verified live. The USER-role e2e (test_openai_realtime_user_e2e) calls sys.exit(0) at a pre-existing phase-2 skip-guard (introduced in #355, predates this PR) before any socket opens. It shows as FAILED in the raw run but is unrelated to this handshake and to the GA migration.
🤖 Generated with Claude Code