fix(voice/#602): migrate OpenAIRealtimeAgentAdapter to GA Realtime wire protocol by drewdrewthis · Pull Request #604 · langwatch/scenario

drewdrewthis · 2026-06-02T17:08:13Z

What & why

scenario.OpenAIRealtimeAgentAdapter (Python) spoke OpenAI's retired beta Realtime wire protocol, so every realtime run was server-closed at the handshake with 4000 / beta_api_shape_disabled. OpenAI removed the beta interface globally at GA (2026-05-12). This migrates the adapter to the GA wire protocol.

Fixes #602.

The fix — three coupled layers (Python-only)

The beta coupling was three layers deep; they move together (there is no working intermediate state — a header-only fix just surfaces the next-layer rejection):

Handshake — drop the OpenAI-Beta: realtime=v1 header (auth-only Authorization: Bearer).
session.update — GA shape: session.type="realtime"; audio nested under session.audio.{input,output}; formats as objects ({"type":"audio/pcm","rate":24000}); voice / transcription / turn_detection relocated under session.audio.
recv_audio — branch on the GA response.output_audio.* event names, defensively accepting the legacy response.audio.* names with a one-time warning (live gpt-realtime* models have been reported still emitting the beta names despite the GA docs).
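
The first two layers can be sketched as below. This is a minimal illustration, not the adapter's actual code: the function names are hypothetical, and the exact placement of voice, transcription, and turn_detection inside session.audio.input / session.audio.output is an assumption based on the description above.

```python
import json
from typing import Any, Dict

SAMPLE_RATE_HZ = 24000  # rate quoted in the format objects above


def build_handshake_headers(api_key: str) -> Dict[str, str]:
    """Auth-only handshake headers: no OpenAI-Beta entry at all."""
    return {"Authorization": f"Bearer {api_key}"}


def build_session_update(
    voice: str, transcription_model: str, instructions: str = ""
) -> Dict[str, Any]:
    """Build a GA-shaped session.update event (hypothetical helper)."""
    pcm_format = {"type": "audio/pcm", "rate": SAMPLE_RATE_HZ}
    return {
        "type": "session.update",
        "session": {
            "type": "realtime",
            "instructions": instructions,  # stays top-level under session
            "audio": {
                "input": {
                    "format": dict(pcm_format),
                    # Assumption: transcription and turn_detection sit on the input side.
                    "transcription": {"model": transcription_model},
                    "turn_detection": None,  # illustrative value only
                },
                "output": {
                    "format": dict(pcm_format),
                    # Assumption: voice sits on the output side.
                    "voice": voice,
                },
            },
        },
    }


if __name__ == "__main__":
    event = build_session_update(voice="alloy", transcription_model="gpt-4o-transcribe")
    print(json.dumps(event, indent=2))
```

The voice name is a placeholder; the transcription model is the one named later in this description.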

Module + class docstrings updated to GA.

Out of scope (confirmed in the issue's Investigation): ephemeral client-secret minting; the TypeScript adapter; Realtime/transcription model renames.

Tests

New regression guard test_openai_realtime_adapter_uses_ga_wire_protocol_not_beta — verified RED against the beta code and GREEN post-fix; runs in CI with no live key. Closes the coverage gap that let the regression ship (the previous unit test asserted the beta shape and stayed green while prod was 100% broken).
Defensive-legacy test (decodes the legacy name + asserts the one-time warning fires exactly once across repeats).
interrupt() → response.cancel; send → input_audio_buffer.commit → response.create round-trip; GA transcript observability (last_agent_transcript / last_user_transcript).
Removed stale stub-era drift from the realtime e2e tests (*_e2e.py carried a dead PendingTransportError probe + a "transport not yet shipped" docstring — misleading on the very adapter this PR fixes; they're now honest live, key-gated GA-handshake checks, still integration-marked → deselected in PR CI, run nightly).
specs/openai-realtime-ga-migration.feature — 15 scenarios mapping all 7 ACs, now bound 1:1 to tests: split the AC4 transcript test into agent-side + user-side; added the previously-unbound AC7 docstring-drift guard and a tools/instructions top-level placement test; corrected two @integration→@unit tags (their tests are mocked, no-key) and a tool_choice over-spec. Every @unit scenario has a test; @e2e/@integration are live-gated.
python/tests/voice/test_adapters.py: 60 passed (17 realtime); pyright: 0 errors.
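
For readers unfamiliar with the defensive-legacy behaviour those tests pin down, here is a rough sketch of it. The class and logger names are hypothetical and the real recv_audio may be structured differently; only the two event names and the warn-once rule come from this PR.

```python
import base64
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("realtime_sketch")  # hypothetical logger name

GA_AUDIO_DELTA = "response.output_audio.delta"
LEGACY_AUDIO_DELTA = "response.audio.delta"


class AudioEventDecoder:
    """Decodes audio deltas, accepting the legacy name with a one-time warning."""

    def __init__(self) -> None:
        self._warned_legacy = False

    def decode(self, event: Dict[str, Any]) -> Optional[bytes]:
        event_type = event.get("type")
        if event_type == LEGACY_AUDIO_DELTA:
            # Warn only on the first legacy event seen by this decoder.
            if not self._warned_legacy:
                logger.warning(
                    "received legacy event name %s; expected %s",
                    LEGACY_AUDIO_DELTA,
                    GA_AUDIO_DELTA,
                )
                self._warned_legacy = True
        elif event_type != GA_AUDIO_DELTA:
            return None  # not an audio delta; the caller handles other events
        return base64.b64decode(event["delta"])
```

A test in the spirit of the defensive-legacy one would feed two legacy events through a single decoder and assert that both decode while only one warning record is emitted.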

Review

Multi-agent /review (principles, hygiene, test, security): zero must-fix, security clean. Should-fix items addressed in this PR (test dedup, AC5 commit-sub-path coverage, hygiene conventions); minor nice-to-haves deferred with rationale.

Scope verified (post-review diligence)

TS adapter is genuinely unaffected — @openai/agents moved its Realtime transport to the GA interface at 0.1.0 (2025-08-28); the repo pins ^0.3.3 (and demo ^0.3.9), both well past the cutover, and its event handler already uses the GA response.output_audio.* names. No TS-side beta-header fix needed.
Model names are valid GA — gpt-realtime-mini and gpt-4o-transcribe are both current OpenAI models, so the live handshake won't fail on model-not-found.

✅ Caveat cleared — live GA handshake verified (2026-06-04)

The original 4000 / beta_api_shape_disabled close is gone. A live, key-gated @e2e GA realtime run was executed against the real OpenAI Realtime API (model gpt-realtime-mini) and reproduced 4×, each time negotiating cleanly and closing normally — never 4000. Wire-level evidence (websockets DEBUG):

> GET /v1/realtime?model=gpt-realtime-mini HTTP/1.1
> Authorization: Bearer sk-proj-…        # auth-only — no OpenAI-Beta header
< HTTP/1.1 101 Switching Protocols
= connection is OPEN                       # handshake ACCEPTED (beta era: server-closed here)
> TEXT {"type":"session.update", …}       # GA session shape
< TEXT {"type":"session.created", …}
< TEXT {"type":"session.updated", …}       # GA session.update accepted, no error event
… input_audio_buffer.append → .commit → response.create …
< TEXT {"type":"response.output_audio.delta", …}   # GA event name (legacy-compat arm never fired)
> CLOSE 1000 (OK)  /  < CLOSE 1000 (OK)    # normal client-initiated close — NOT 4000
result.success = True

No 4000, no beta_api_shape_disabled, no "type":"error" frame anywhere in the captured stream. test_demo_openai_realtime_agent_e2e_success → PASSED.
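
The three checks in that sentence could be automated over a captured stream roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the capture is available as a list of raw JSON text frames plus the final close code, which is not necessarily how the run above was inspected.

```python
import json
from typing import List


def find_protocol_failures(frames: List[str], close_code: int) -> List[str]:
    """Return human-readable problems found in a captured realtime stream."""
    problems: List[str] = []
    if close_code == 4000:
        problems.append("socket closed with code 4000")
    for index, raw in enumerate(frames):
        if "beta_api_shape_disabled" in raw:
            problems.append(f"frame {index} mentions beta_api_shape_disabled")
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # non-JSON frames are ignored by this sketch
        if isinstance(event, dict) and event.get("type") == "error":
            problems.append(f"frame {index} is an error event")
    return problems
```

An empty result for a capture that ended with close code 1000 corresponds to the outcome reported above.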

Lower-confidence GA details (Investigation §6) — now confirmed live:

session.audio.output.format carrying rate ({"type":"audio/pcm","rate":24000}) is accepted — server returned session.updated with no error.
No output_modalities issue surfaced; the current session shape negotiated successfully.

Scope note: only the AGENT (direct-to-model) role opens a live socket and was verified live. The USER-role e2e (test_openai_realtime_user_e2e) sys.exit(0)s at a pre-existing phase-2 skip-guard (introduced in #355, predates this PR) before any socket opens — it shows as FAILED in the raw run but is unrelated to this handshake and to the GA migration.

🤖 Generated with Claude Code

BDD contract for migrating OpenAIRealtimeAgentAdapter from the retired beta Realtime wire protocol to GA. 15 scenarios mapping all 7 ACs of #602. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Red step (TDD): asserts the GA wire contract -- no OpenAI-Beta header, GA session.update shape (session.type + session.audio.{input,output} object formats), recv decodes response.output_audio.delta. Fails against the current beta adapter, reproducing the beta_api_shape_disabled root cause. Closes the AC6 test-coverage gap that let the regression ship. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…re protocol OpenAI retired the beta Realtime wire protocol at GA (2026-05-12); the adapter spoke it end-to-end, so every run was server-closed with beta_api_shape_disabled. Migrate all three coupled layers: drop the OpenAI-Beta header (auth-only handshake); rebuild session.update to the GA shape (session.type=realtime, audio nested under session.audio.input/output with object formats, voice/transcription/turn_detection relocated); branch recv_audio on the GA response.output_audio.* event names, accepting legacy beta names defensively with a one-time warning (live gpt-realtime* may still emit them). Update module+class docstrings and the existing beta-shaped unit tests to GA, plus a defensive-legacy test. Covers AC1-AC5, AC7. Python-only; the TS adapter delegates the wire shape to the @openai/agents SDK. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

interrupt() is unchanged by the GA migration, but AC5 and the feature scenario require explicit coverage that it emits response.cancel on the socket. Closes the last AC coverage gap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address /review feedback (no must-fix; all polish): consolidate test_..._connects_and_sends_pcm16 into a focused send/commit/response round-trip that asserts the AC5 input_audio_buffer.commit + response.create sequence (closing a declared-scenario coverage gap), with the GA session-shape assertions now owned solely by the regression guard (removes ~50 lines of duplication); prove the one-time legacy warning fires exactly once across two legacy events; hygiene (module-level import logging, canonical call_args.kwargs accessor, getMessage()); and consolidate the recv_audio defensive-alias comments into one block noting they should be removed once GA names are confirmed stable at a live endpoint. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
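
A round-trip assertion of this kind might look like the sketch below. FakeSocket, send_turn, and interrupt are hypothetical stand-ins rather than the adapter's or the test suite's real names; the event sequence is the one named in this PR.

```python
import asyncio
import base64
import json
from typing import Any, Dict, List


class FakeSocket:
    """Records every frame sent, standing in for a WebSocket connection."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    async def send(self, raw: str) -> None:
        self.sent.append(json.loads(raw))


async def send_turn(ws: FakeSocket, pcm16: bytes) -> None:
    """Append audio, commit the buffer, then request a response."""
    audio = base64.b64encode(pcm16).decode("ascii")
    await ws.send(json.dumps({"type": "input_audio_buffer.append", "audio": audio}))
    await ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
    await ws.send(json.dumps({"type": "response.create"}))


async def interrupt(ws: FakeSocket) -> None:
    """Cancel the in-flight response."""
    await ws.send(json.dumps({"type": "response.cancel"}))


def sent_types(ws: FakeSocket) -> List[str]:
    return [event["type"] for event in ws.sent]


if __name__ == "__main__":
    socket = FakeSocket()
    asyncio.run(send_turn(socket, b"\x00\x00"))
    asyncio.run(interrupt(socket))
    print(sent_types(socket))
```

Asserting on the ordered list of event types keeps the test focused on the wire sequence and leaves session-shape checks to the regression guard.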

The OpenAI Realtime transport shipped (real WebSocket), but its e2e tests still carried a dead PendingTransportError stub-probe and a 'transport not yet shipped' docstring -- actively misleading on the exact adapter this PR fixes, and part of why the coverage story was murky. Remove the dead probe + unused requires_transport_ready param and rewrite the docstrings to describe the live, key-gated GA-handshake check. Still auto-marked integration (deselected in PR CI via -m 'not integration', runs nightly with keys); the role=USER routing unit test had no drift and is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Spec-to-test fidelity (no adapter behavior change): split the AC4 transcript test into agent-side + user-side (matching the two declared scenarios); add the AC7 docstring-drift guard that was previously unbound (asserts no realtime=v1, GA event names present, auth-only header); add a tools/instructions top-level placement test; fix two scenario tags from @integration to @unit (their tests are mocked, no-live-key unit tests); and correct an over-spec I introduced -- the session.update scenario referenced tool_choice, which the minimal adapter never sets; it now reads tools+instructions to match the implementation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T17:47:53Z

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The PR changes the OpenAIRealtimeAgentAdapter’s behavior in its integration with the OpenAI Realtime API (handshake headers, session.update payload shape, and event/event-name handling), which is a third-party integration and therefore excluded from the low-risk category. Although it includes tests and doc updates, the changes alter the external API protocol the code speaks and thus do not meet the policy for automatic low-risk merging.

This PR requires a manual review before merging.

drewdrewthis · 2026-06-02T20:27:59Z

✅ prove-it re-run on HEAD 3dfab94 — 7/7 ACs PASS.

Adapter source is unchanged since the prior clean marker (81b7a4c); the intervening commits were test/spec only (AC4 transcript test split, AC7 docstring-drift guard, two @integration→@unit tag fixes, tool_choice over-spec fix, tools/instructions placement test) — coverage improved, behavior identical. test_adapters.py 60 passed, pyright 0, all PR checks green on 3dfab94. Residual (unchanged): AC1 live @e2e handshake needs OPENAI_API_KEY.

drewdrewthis · 2026-06-04T10:17:38Z

Test requirement before merge (per this PR's own caveat — recording it here so it isn't lost):

The unit/integration layer is fully proven (60 passed, falsification-tested regression guard test_openai_realtime_adapter_uses_ga_wire_protocol_not_beta runs no-key in CI). But the original 4000 / beta_api_shape_disabled close cannot be reproduced offline, so the only proof the regression is actually gone is one live or recorded GA realtime run with OPENAI_API_KEY set:

Run the @e2e realtime handshake scenario against a live gpt-realtime* model and confirm the session opens (no 4000 close).
While doing so, confirm two lower-confidence GA details flagged in #602 (Investigation §6): (1) whether session.audio.output.format must carry rate, and (2) the output_modalities field shape.

Until that live run is captured, this PR is unit-green but not e2e-verified.


…t-gen (#610)

* docs(voice/#606): expand STT/TTS doc comments and relax audio-to-text judge criteria

Adds deliberate-choice rationale comments to OPENAI_STT_MODEL and OPENAI_TTS_MODEL in both JS (voice-models.ts) and Python (voice_models.py), noting no gpt-5-family transcription/TTS models exist on the public API as of 2026-06. Also documents the Python-only OPENAI_BOT_STT_MODEL gap in the TS file. Relaxes the multimodal-audio-to-text judge criteria from overly-specific assertions (exact voice gender, exact repeat phrasing) to behavioural checks (processed audio, coherent response, non-text format acknowledgement). Updates the stale skip comment to reflect the model swap in PR #607.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(voice/#606): add AC4/AC5 doc comments — STT lock rationale + TTS callable-swap pattern

- openai-realtime.ts: explain why `input.transcription.model` is locked to OPENAI_STT_MODEL and not exposed as a constructor option (Realtime API only accepts transcription-class models; callers who need a different model subclass the adapter)
- openai-tts.ts: document that the TTS model is not a parameter by design — the pattern is to swap the whole TTSCallable rather than parameterise this one; link to OPENAI_TTS_MODEL for the current-gen rationale

Closes #606 (AC4 + AC5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples/voice/#606): correct stale comment — model swap + unskip are in #607, not this branch

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

drewdrewthis and others added 5 commits June 2, 2026 16:26

test(voice/#602): add GA realtime migration feature spec

c24ce38


drewdrewthis added the ai-reviewed, bug, and in-ai-review labels Jun 2, 2026

drewdrewthis marked this pull request as ready for review June 2, 2026 17:15

drewdrewthis and others added 2 commits June 2, 2026 17:30

drewdrewthis self-assigned this Jun 4, 2026

drewdrewthis requested review from 0xdeafcafe, Aryansharma28, rogeriochaves and sergioestebance June 4, 2026 13:59

drewdrewthis mentioned this pull request Jun 4, 2026

Realtime adapter sends retired OpenAI-Beta: realtime=v1 header → all realtime runs rejected (beta_api_shape_disabled) #602

Closed

7 tasks

0xdeafcafe approved these changes Jun 4, 2026


drewdrewthis merged commit 3765f3c into main Jun 4, 2026
19 checks passed

drewdrewthis deleted the issue602/realtime-adapter-retired-beta-header branch June 4, 2026 15:21

rogeriochaves mentioned this pull request Jun 4, 2026

chore(main): release python 0.7.28 #542

Open

drewdrewthis mentioned this pull request Jun 4, 2026

fix(voice): main python-ci red — stale feature-file contract counts (108→127) after #561 #609

Closed

drewdrewthis added a commit that referenced this pull request Jun 4, 2026

fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality

8199d0e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

drewdrewthis mentioned this pull request Jun 4, 2026

Voice STT/TTS defaults still use gpt-4o-* models — decide modernization path #606

Closed

drewdrewthis added a commit that referenced this pull request Jun 4, 2026

fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality

6ea8b8d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

drewdrewthis mentioned this pull request Jun 5, 2026

feat: realtime scenario tests #154

Closed

drewdrewthis mentioned this pull request Jun 5, 2026

refactor(voice): deprecate + remove the legacy agents/realtime RealtimeAgentAdapter (superseded by GA OpenAIRealtimeAgentAdapter) #615

Open

11 tasks

drewdrewthis merged 7 commits into main from issue602/realtime-adapter-retired-beta-header
