fix(voice/#602): migrate OpenAIRealtimeAgentAdapter to GA Realtime wire protocol#604

Merged: drewdrewthis merged 7 commits into main from issue602/realtime-adapter-retired-beta-header on Jun 4, 2026

Conversation

@drewdrewthis
Collaborator

@drewdrewthis drewdrewthis commented Jun 2, 2026

What & why

scenario.OpenAIRealtimeAgentAdapter (Python) spoke OpenAI's retired beta Realtime wire protocol, so every realtime run was server-closed at the handshake with 4000 / beta_api_shape_disabled. OpenAI removed the beta interface globally at GA (2026-05-12). This migrates the adapter to the GA wire protocol.

Fixes #602.

The fix — three coupled layers (Python-only)

The beta coupling was three layers deep; they move together (there is no working intermediate state — a header-only fix just surfaces the next-layer rejection):

  1. Handshake — drop the OpenAI-Beta: realtime=v1 header (auth-only Authorization: Bearer).
  2. session.update — GA shape: session.type="realtime"; audio nested under session.audio.{input,output}; formats as objects ({"type":"audio/pcm","rate":24000}); voice / transcription / turn_detection relocated under session.audio.
  3. recv_audio — branch on the GA response.output_audio.* event names, defensively accepting the legacy response.audio.* names with a one-time warning (live gpt-realtime* models have been reported still emitting the beta names despite the GA docs).
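The first two layers can be sketched as plain payload builders. This is a minimal illustration, not the adapter's actual code: the function names are hypothetical, and the exact sub-keys under session.audio (voice under output, transcription and turn_detection under input, and the server_vad value) are assumptions based on the description above.

```python
from typing import Any, Dict

SAMPLE_RATE_HZ = 24000  # rate quoted in the format object above


def build_handshake_headers(api_key: str) -> Dict[str, str]:
    """Layer 1: auth-only handshake headers, with no OpenAI-Beta entry."""
    return {"Authorization": f"Bearer {api_key}"}


def build_session_update(
    voice: str, transcription_model: str, instructions: str = ""
) -> Dict[str, Any]:
    """Layer 2: a GA-shaped session.update message (key placement assumed)."""
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            # instructions stay at the top level of the session object
            "instructions": instructions,
            "audio": {
                "input": {
                    # formats are objects, not bare strings
                    "format": {"type": "audio/pcm", "rate": SAMPLE_RATE_HZ},
                    "transcription": {"model": transcription_model},
                    "turn_detection": {"type": "server_vad"},  # assumed value
                },
                "output": {
                    "format": {"type": "audio/pcm", "rate": SAMPLE_RATE_HZ},
                    "voice": voice,
                },
            },
        },
    }
```

Passing the result through json.dumps gives the text frame to send right after the socket opens; a regression guard can assert on the same dictionaries without a live key.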

Module + class docstrings updated to GA.

Out of scope (confirmed in the issue's Investigation): ephemeral client-secret minting; the TypeScript adapter; Realtime/transcription model renames.

Tests

  • New regression guard test_openai_realtime_adapter_uses_ga_wire_protocol_not_beta — verified RED against the beta code and GREEN post-fix; runs in CI with no live key. Closes the coverage gap that let the regression ship (the previous unit test asserted the beta shape and stayed green while prod was 100% broken).
  • Defensive-legacy test (decodes the legacy name + asserts the one-time warning fires exactly once across repeats).
  • interrupt() → response.cancel; send → input_audio_buffer.commit → response.create round-trip; GA transcript observability (last_agent_transcript / last_user_transcript).
  • Removed stale stub-era drift from the realtime e2e tests (*_e2e.py carried a dead PendingTransportError probe + a "transport not yet shipped" docstring — misleading on the very adapter this PR fixes; they're now honest live, key-gated GA-handshake checks, still integration-marked → deselected in PR CI, run nightly).
  • specs/openai-realtime-ga-migration.feature (15 scenarios mapping all 7 ACs), now bound 1:1 to tests: split the AC4 transcript test into agent-side + user-side; added the previously-unbound AC7 docstring-drift guard and a tools/instructions top-level placement test; corrected two @integration → @unit tags (their tests are mocked, no-key) and a tool_choice over-spec. Every @unit scenario has a test; @e2e/@integration are live-gated.
  • python/tests/voice/test_adapters.py: 60 passed (17 realtime); pyright: 0 errors.
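The defensive-legacy behaviour that the second test pins down could look roughly like the sketch below. The class name, logger name, and the assumption that the audio payload is base64 in a "delta" field are illustrative and not taken from the adapter source.

```python
import base64
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("realtime_sketch")  # hypothetical logger name

GA_AUDIO_DELTA = "response.output_audio.delta"
LEGACY_AUDIO_DELTA = "response.audio.delta"


class AudioEventDecoder:
    """Decodes audio deltas, accepting the legacy name with a one-time warning."""

    def __init__(self) -> None:
        self._warned_legacy = False

    def decode(self, event: Dict[str, Any]) -> Optional[bytes]:
        event_type = event.get("type")
        if event_type == LEGACY_AUDIO_DELTA:
            # Warn on the first legacy event only, then stay quiet.
            if not self._warned_legacy:
                logger.warning(
                    "received legacy event name %s; treating it as %s",
                    LEGACY_AUDIO_DELTA,
                    GA_AUDIO_DELTA,
                )
                self._warned_legacy = True
        elif event_type != GA_AUDIO_DELTA:
            return None  # not an audio delta; the caller handles other events
        # Assumption: the audio chunk is base64-encoded in the "delta" field.
        return base64.b64decode(event["delta"])
```

Keeping the warned flag on the instance means each decoder warns once; a module-level flag would instead warn once per process.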

Review

Multi-agent /review (principles, hygiene, test, security): zero must-fix, security clean. Should-fix items addressed in this PR (test dedup, AC5 commit-sub-path coverage, hygiene conventions); minor nice-to-haves deferred with rationale.

Scope verified (post-review diligence)

  • TS adapter is genuinely unaffected: @openai/agents moved its Realtime transport to the GA interface at 0.1.0 (2025-08-28); the repo pins ^0.3.3 (and demo ^0.3.9), both well past the cutover, and its event handler already uses the GA response.output_audio.* names. No TS-side beta-header fix needed.
  • Model names are valid GA: gpt-realtime-mini and gpt-4o-transcribe are both current OpenAI models, so the live handshake won't fail on model-not-found.

✅ Caveat cleared — live GA handshake verified (2026-06-04)

The original 4000 / beta_api_shape_disabled close is gone. A live, key-gated @e2e GA realtime run was executed against the real OpenAI Realtime API (model gpt-realtime-mini) and reproduced on repeat runs, each time negotiating cleanly and closing normally, never with 4000. Wire-level evidence (websockets DEBUG):

> GET /v1/realtime?model=gpt-realtime-mini HTTP/1.1
> Authorization: Bearer sk-proj-…        # auth-only — no OpenAI-Beta header
< HTTP/1.1 101 Switching Protocols
= connection is OPEN                       # handshake ACCEPTED (beta era: server-closed here)
> TEXT {"type":"session.update", …}       # GA session shape
< TEXT {"type":"session.created", …}
< TEXT {"type":"session.updated", …}       # GA session.update accepted, no error event
… input_audio_buffer.append → .commit → response.create …
< TEXT {"type":"response.output_audio.delta", …}   # GA event name (legacy-compat arm never fired)
> CLOSE 1000 (OK)  /  < CLOSE 1000 (OK)    # normal client-initiated close — NOT 4000
result.success = True

No 4000, no beta_api_shape_disabled, no "type":"error" frame anywhere in the captured stream. test_demo_openai_realtime_agent_e2e_success → PASSED.
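The three checks stated above (close code, beta_api_shape_disabled, error frames) can be applied mechanically to a captured stream. The helper below is a hypothetical sketch, assuming the capture is available as a list of JSON text frames plus the final close code; it is not part of the PR.

```python
import json
from typing import Iterable, List

BETA_REJECT_CLOSE_CODE = 4000  # close code reported for the beta-shape rejection
NORMAL_CLOSE_CODE = 1000  # normal WebSocket closure


def find_handshake_problems(text_frames: Iterable[str], close_code: int) -> List[str]:
    """Return a list of problems found in a captured run (empty means clean)."""
    problems: List[str] = []
    if close_code == BETA_REJECT_CLOSE_CODE:
        problems.append("server closed with 4000")
    elif close_code != NORMAL_CLOSE_CODE:
        problems.append(f"unexpected close code {close_code}")
    for raw in text_frames:
        if "beta_api_shape_disabled" in raw:
            problems.append("beta_api_shape_disabled present in a frame")
        # Assumes every text frame is a JSON object with a "type" field.
        if json.loads(raw).get("type") == "error":
            problems.append("error event in stream")
    return problems
```

An empty result for a run that ended with close code 1000 corresponds to the clean outcome described above.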

Lower-confidence GA details (Investigation §6) — now confirmed live:

  • session.audio.output.format carrying rate ({"type":"audio/pcm","rate":24000}) is accepted — server returned session.updated with no error.
  • No output_modalities issue surfaced; the current session shape negotiated successfully.

Scope note: only the AGENT (direct-to-model) role opens a live socket and was verified live. The USER-role e2e (test_openai_realtime_user_e2e) sys.exit(0)s at a pre-existing phase-2 skip-guard (introduced in #355, predates this PR) before any socket opens — it shows as FAILED in the raw run but is unrelated to this handshake and to the GA migration.

🤖 Generated with Claude Code

drewdrewthis and others added 5 commits June 2, 2026 16:26
BDD contract for migrating OpenAIRealtimeAgentAdapter from the retired beta Realtime wire protocol to GA. 15 scenarios mapping all 7 ACs of #602.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Red step (TDD): asserts the GA wire contract -- no OpenAI-Beta header, GA session.update shape (session.type + session.audio.{input,output} object formats), recv decodes response.output_audio.delta. Fails against the current beta adapter, reproducing the beta_api_shape_disabled root cause. Closes the AC6 test-coverage gap that let the regression ship.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re protocol

OpenAI retired the beta Realtime wire protocol at GA (2026-05-12); the adapter spoke it end-to-end, so every run was server-closed with beta_api_shape_disabled. Migrate all three coupled layers: drop the OpenAI-Beta header (auth-only handshake); rebuild session.update to the GA shape (session.type=realtime, audio nested under session.audio.input/output with object formats, voice/transcription/turn_detection relocated); branch recv_audio on the GA response.output_audio.* event names, accepting legacy beta names defensively with a one-time warning (live gpt-realtime* may still emit them). Update module+class docstrings and the existing beta-shaped unit tests to GA, plus a defensive-legacy test. Covers AC1-AC5, AC7. Python-only; the TS adapter delegates the wire shape to the @openai/agents SDK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
interrupt() is unchanged by the GA migration, but AC5 and the feature scenario require explicit coverage that it emits response.cancel on the socket. Closes the last AC coverage gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address /review feedback (no must-fix; all polish): consolidate test_..._connects_and_sends_pcm16 into a focused send/commit/response round-trip that asserts the AC5 input_audio_buffer.commit + response.create sequence (closing a declared-scenario coverage gap), with the GA session-shape assertions now owned solely by the regression guard (removes ~50 lines of duplication); prove the one-time legacy warning fires exactly once across two legacy events; hygiene (module-level import logging, canonical call_args.kwargs accessor, getMessage()); and consolidate the recv_audio defensive-alias comments into one block noting they should be removed once GA names are confirmed stable at a live endpoint.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@drewdrewthis drewdrewthis added the ai-reviewed (/review was run on this PR; multi-agent: principles, hygiene, test, security), bug (Something isn't working), and in-ai-review (Workflow: in-ai-review) labels Jun 2, 2026
@drewdrewthis drewdrewthis marked this pull request as ready for review June 2, 2026 17:15
drewdrewthis and others added 2 commits June 2, 2026 17:30
The OpenAI Realtime transport shipped (real WebSocket), but its e2e tests still carried a dead PendingTransportError stub-probe and a 'transport not yet shipped' docstring -- actively misleading on the exact adapter this PR fixes, and part of why the coverage story was murky. Remove the dead probe + unused requires_transport_ready param and rewrite the docstrings to describe the live, key-gated GA-handshake check. Still auto-marked integration (deselected in PR CI via -m 'not integration', runs nightly with keys); the role=USER routing unit test had no drift and is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Spec-to-test fidelity (no adapter behavior change): split the AC4 transcript test into agent-side + user-side (matching the two declared scenarios); add the AC7 docstring-drift guard that was previously unbound (asserts no realtime=v1, GA event names present, auth-only header); add a tools/instructions top-level placement test; fix two scenario tags from @integration to @unit (their tests are mocked, no-live-key unit tests); and correct an over-spec I introduced -- the session.update scenario referenced tool_choice, which the minimal adapter never sets, now reads tools+instructions to match the implementation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
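The AC7 docstring-drift guard described in the last commit could be approximated with a small string check. This is a hedged sketch only: the function name is hypothetical, and treating "Authorization" as the marker for the auth-only header is an assumption about what the real test asserts.

```python
from typing import List


def docstring_drift_problems(doc: str) -> List[str]:
    """List ways a docstring still describes the retired beta protocol."""
    problems: List[str] = []
    if "realtime=v1" in doc:
        problems.append("mentions the retired beta header value realtime=v1")
    if "response.output_audio." not in doc:
        problems.append("GA event names are missing")
    if "Authorization" not in doc:  # assumed marker for the auth-only header
        problems.append("auth-only handshake is not documented")
    return problems
```

A test would pass the adapter module's and class's __doc__ strings through this helper and assert that the returned list is empty.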
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The PR changes the OpenAIRealtimeAgentAdapter’s behavior in its integration with the OpenAI Realtime API (handshake headers, session.update payload shape, and event/event-name handling), which is a third-party integration and therefore excluded from the low-risk category. Although it includes tests and doc updates, the changes alter the external API protocol the code speaks and thus do not meet the policy for automatic low-risk merging.

This PR requires a manual review before merging.

@drewdrewthis
Collaborator Author

prove-it re-run on HEAD 3dfab94 — 7/7 ACs PASS.

Adapter source is unchanged since the prior clean marker (81b7a4c); the intervening commits were test/spec only (AC4 transcript test split, AC7 docstring-drift guard, two @integration → @unit tag fixes, tool_choice over-spec fix, tools/instructions placement test); coverage improved, behavior identical. test_adapters.py 60 passed, pyright 0, all PR checks green on 3dfab94. Residual (unchanged): AC1 live @e2e handshake needs OPENAI_API_KEY.

@drewdrewthis drewdrewthis self-assigned this Jun 4, 2026
@drewdrewthis
Collaborator Author

Test requirement before merge (per this PR's own caveat — recording it here so it isn't lost):

The unit/integration layer is fully proven (60 passed, falsification-tested regression guard test_openai_realtime_adapter_uses_ga_wire_protocol_not_beta runs no-key in CI). But the original 4000 / beta_api_shape_disabled close cannot be reproduced offline, so the only proof the regression is actually gone is one live or recorded GA realtime run with OPENAI_API_KEY set.

Until that live run is captured, this PR is unit-green but not e2e-verified.

@drewdrewthis drewdrewthis merged commit 3765f3c into main Jun 4, 2026
19 checks passed
@drewdrewthis drewdrewthis deleted the issue602/realtime-adapter-retired-beta-header branch June 4, 2026 15:21
drewdrewthis added a commit that referenced this pull request Jun 4, 2026
#604 reality

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
drewdrewthis added a commit that referenced this pull request Jun 5, 2026
…t-gen (#610)

* docs(voice/#606): expand STT/TTS doc comments and relax audio-to-text judge criteria

Adds deliberate-choice rationale comments to OPENAI_STT_MODEL and
OPENAI_TTS_MODEL in both JS (voice-models.ts) and Python (voice_models.py),
noting no gpt-5-family transcription/TTS models exist on the public API as
of 2026-06. Also documents the Python-only OPENAI_BOT_STT_MODEL gap in the
TS file. Relaxes the multimodal-audio-to-text judge criteria from
overly-specific assertions (exact voice gender, exact repeat phrasing) to
behavioural checks (processed audio, coherent response, non-text format
acknowledgement). Updates the stale skip comment to reflect the model swap
in PR #607.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(voice/#606): add AC4/AC5 doc comments — STT lock rationale + TTS callable-swap pattern

- openai-realtime.ts: explain why `input.transcription.model` is locked to
  OPENAI_STT_MODEL and not exposed as a constructor option (Realtime API
  only accepts transcription-class models; callers who need a different model
  subclass the adapter)
- openai-tts.ts: document that the TTS model is not a parameter by design —
  the pattern is to swap the whole TTSCallable rather than parameterise this
  one; link to OPENAI_TTS_MODEL for the current-gen rationale

Closes #606 (AC4 + AC5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples/voice/#606): correct stale comment — model swap + unskip are in #607, not this branch

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Realtime adapter sends retired OpenAI-Beta: realtime=v1 header → all realtime runs rejected (beta_api_shape_disabled)

2 participants