fix(examples/voice): swap deleted gpt-4o-audio-preview → gpt-audio-mini by drewdrewthis · Pull Request #607 · langwatch/scenario

drewdrewthis · 2026-06-04T15:54:15Z

Problem

The voice/audio examples pinned gpt-4o-audio-preview, which OpenAI has deleted — 404 model_not_found since 2026-05-19. Any user running the canonical voice-to-voice example (or audio-to-text) hits an immediate 404. The SDK core is unaffected (OpenAIRealtimeAgentAdapter uses gpt-realtime-mini).

Fix

Swap to gpt-audio-mini — OpenAI's current cost-efficient GA audio-chat model — matching the Python twin, which already migrated (python/scenario/config/voice_models.py:44 OPENAI_AUDIO_CHAT_MODEL, python/examples/test_audio_to_text.py:157). Closes a py↔ts example-parity gap left by #561.

Files changed (4 lines):

examples/vitest/tests/helpers/openai-voice-agent.ts — model literal + 2 doc-comments
examples/vitest/tests/multimodal-audio-to-text.test.ts — model literal

Verification (live, against prod LangWatch)

gpt-audio-mini accepts the identical chat.completions shape (modalities:["text","audio"], audio:{voice,format}) and returns audio — confirmed via direct /v1/chat/completions call.
multimodal-voice-to-voice-conversation.test.ts: ✅ success: true, real 2-turn conversation, judge passed, traces landed in prod (project_bZspxwkhCD4POvqmIgOr2).
multimodal-audio-to-audio.test.ts: ✅ passes.
multimodal-audio-to-text.test.ts: no longer 404s, but result.success=false on brittle judge criteria ("guesses it's a male voice", "says what format the input was"); gpt-audio-mini doesn't reliably volunteer all three criteria. This is a pre-existing brittle-criteria sensitivity, not a regression from this fix; tracked in #606 ("Voice STT/TTS defaults still use gpt-4o-* models — decide modernization path"). Reverting would restore the 404 (strictly worse), so this PR keeps the swap.
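
For readers who want to see the request shape named in the first verification bullet, here is a minimal sketch. The helper name `buildAudioChatRequest`, the `AudioChatRequest` type, the `AUDIO_CHAT_MODEL` constant, the default voice `"alloy"`, and the list of formats are illustrative assumptions, not code from this repository. Only the model name and the `modalities` / `audio` fields come from the PR text.

```typescript
// Hypothetical sketch: type and helper names are illustrative, not repository code.
type AudioFormat = "wav" | "mp3" | "flac" | "opus" | "pcm16"; // assumed set of formats

interface AudioChatRequest {
  model: string;
  modalities: ["text", "audio"];
  audio: { voice: string; format: AudioFormat };
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

// Model name taken from this PR; a single constant keeps a future swap to one line.
const AUDIO_CHAT_MODEL = "gpt-audio-mini";

function buildAudioChatRequest(
  userText: string,
  voice = "alloy", // assumed default voice
  format: AudioFormat = "wav",
): AudioChatRequest {
  return {
    model: AUDIO_CHAT_MODEL,
    modalities: ["text", "audio"],
    audio: { voice, format },
    messages: [{ role: "user", content: userText }],
  };
}

const request = buildAudioChatRequest("Say hello in one short sentence.");
console.log(JSON.stringify(request, null, 2));

// With the official `openai` package, a body like this would be passed to
// `client.chat.completions.create(request)`. That call is left out so the
// sketch runs without network access or an API key.
```

Keeping the model name in one constant mirrors what the PR describes for the Python side (`OPENAI_AUDIO_CHAT_MODEL`), though the TypeScript examples in this PR use a model literal.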

Draft

Left as draft — model-default modernization (the remaining gpt-4o-* STT/TTS) is tracked separately in #606. Ready for your review/merge call.

🤖 Generated with Claude Code

drewdrewthis · 2026-06-04T16:29:47Z

[grinder] READY for human review

CI: green — zero failing, zero pending (all 17 checks SUCCESS or SKIPPED-by-design)
Review threads: zero (confirmed via GraphQL reviewThreads — 0 nodes)
Draft: lifted (gh pr ready ✓)
Links: closes #486 (unskip voice tests — same dead-model root cause); related #606 (STT/TTS defaults modernization, tracked separately)

Verified by:

statusCheckRollup (17 checks, 2026-06-04T15:54–15:55Z):
- preflight → SUCCESS
- javascript-complete → SUCCESS
- python-complete → SUCCESS
- docs-complete → SUCCESS
- CodeQL (javascript-typescript) → SUCCESS
- CodeQL (python) → SUCCESS
- Validate PR Title → SUCCESS
- evaluate (auto-approve workflow) → SUCCESS
- action-semantic-pull-request → SUCCESS
- All changes checks → SUCCESS; ci-checks/test/build/firefighting/dismiss-firefighting-approval → SKIPPED (by-design path)
reviewThreads(first:50) → {"nodes":[]} (zero unresolved, zero outdated)
Live voice verification (this session, pre-push): multimodal-voice-to-voice-conversation.test.ts ✅ success:true; multimodal-audio-to-audio.test.ts ✅; gpt-audio-mini accepts identical chat.completions shape, audio returned, traces in prod (project_bZspxwkhCD4POvqmIgOr2)
Published @langwatch/scenario@0.4.12 verified byte-identical to the verified build (419873 bytes; audio fix present)

ACs (from verification plan):

AC11 (no shipped example pins deleted model) ✅ — gpt-4o-audio-preview swapped → gpt-audio-mini; fix is 4 line changes across 2 files
#486 ("ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip") AC1+AC2 (skip markers removed; CI runs them): this PR is the prerequisite fix; removing the skip markers is the follow-on step, tracked in #486.
Note: multimodal-audio-to-text still reports result.success=false on brittle judge criteria (pre-existing sensitivity, not a regression from this fix; tracked in #606, "Voice STT/TTS defaults still use gpt-4o-* models — decide modernization path").

Do NOT merge — that's your call.

github-actions

Approved by automation: PR qualifies as low-risk-change under the documented policy.


drewdrewthis · 2026-06-04T17:54:39Z

✅ Review + prove-it: READY

Review: diff is exactly the model swap (gpt-4o-audio-preview → gpt-audio-mini) in 2 example files + 3 doc-comment lines. No scope creep. Verified live that gpt-4o-audio-preview is deleted (GET /v1/models/gpt-4o-audio-preview → model_not_found) and gpt-audio-mini supports the exact modalities:["text","audio"] + audio:{voice,format:"wav"} shape the code uses (direct API call returned a WAV transcript).
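
The deletion check described above (GET /v1/models/gpt-4o-audio-preview → model_not_found) could be scripted roughly as follows. This is a sketch under assumptions: the function names `classifyModelLookup` and `checkModel` are hypothetical, and the error body shape `{ error: { code } }` is assumed rather than taken from this PR.

```typescript
// Hypothetical sketch; assumes error responses look like { error: { code: string } }.
interface ModelLookupBody {
  error?: { code?: string | null };
}

type ModelStatus = "available" | "removed" | "unknown";

// Pure classification step, kept separate so it can be tested without network access.
function classifyModelLookup(httpStatus: number, body: ModelLookupBody): ModelStatus {
  if (httpStatus === 200) return "available";
  if (httpStatus === 404 && body.error?.code === "model_not_found") return "removed";
  return "unknown";
}

// Network step: looks the model up by id. Needs a real API key, so it is not called here.
async function checkModel(modelId: string, apiKey: string): Promise<ModelStatus> {
  const res = await fetch(
    `https://api.openai.com/v1/models/${encodeURIComponent(modelId)}`,
    { headers: { Authorization: `Bearer ${apiKey}` } },
  );
  // Fall back to an empty body if the response is not valid JSON.
  const body = (await res.json().catch(() => ({}))) as ModelLookupBody;
  return classifyModelLookup(res.status, body);
}

console.log(classifyModelLookup(404, { error: { code: "model_not_found" } })); // removed
console.log(classifyModelLookup(200, {})); // available
```

A check like this could run before the live examples so a removed model fails with a clear message instead of a 404 in the middle of a test.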

Prove-it (live, not CI — CI skips these):

cd javascript/examples/vitest && env -u CI npx vitest run tests/multimodal-audio-to-text.test.ts
→ Test Files 1 passed (1) | judge verdict: SUCCESS, all 3 criteria met

The gpt-audio-mini agent produced audio, transcript extracted, gpt-5 judge passed clean (the #606 brittle-judge issue did not bite this run).
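
The prove-it command unsets `CI` with `env -u CI` because these examples are skipped when that variable is set. The repository's actual skip mechanism is not shown in this thread; the sketch below assumes a simple environment check, and the helper name `shouldSkipLiveTests` is hypothetical.

```typescript
// Hypothetical helper; the repository's real skip logic may differ.
type Env = Record<string, string | undefined>;

// Treats any non-empty CI value other than "false" or "0" as "running in CI".
function shouldSkipLiveTests(env: Env): boolean {
  const ci = env.CI;
  return ci !== undefined && ci !== "" && ci !== "false" && ci !== "0";
}

// In a Vitest file this could gate a live test, for example:
//   it.skipIf(shouldSkipLiveTests(process.env))("audio to text", async () => { /* ... */ });

console.log(shouldSkipLiveTests({ CI: "true" })); // true
console.log(shouldSkipLiveTests({})); // false
```

With a gate like this, running under `env -u CI` leaves `CI` undefined, so the live test executes locally while CI continues to skip it.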

Caveat (non-blocking): 4 stale // Skipped in CI: depends on ... gpt-4o-audio-preview comments remain in the example files (one in this file) — comments only, no dead model in any executable line. Worth a one-line follow-up.


github-actions

Approved by automation: PR qualifies as low-risk-change under the documented policy.

…t-gen (#610)

* docs(voice/#606): expand STT/TTS doc comments and relax audio-to-text judge criteria

Adds deliberate-choice rationale comments to OPENAI_STT_MODEL and OPENAI_TTS_MODEL in both JS (voice-models.ts) and Python (voice_models.py), noting no gpt-5-family transcription/TTS models exist on the public API as of 2026-06. Also documents the Python-only OPENAI_BOT_STT_MODEL gap in the TS file.

Relaxes the multimodal-audio-to-text judge criteria from overly-specific assertions (exact voice gender, exact repeat phrasing) to behavioural checks (processed audio, coherent response, non-text format acknowledgement). Updates the stale skip comment to reflect the model swap in PR #607.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(voice/#606): update feature-file contract counts to match post-#561/#604 reality

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(voice/#606): add AC4/AC5 doc comments — STT lock rationale + TTS callable-swap pattern

- openai-realtime.ts: explain why `input.transcription.model` is locked to OPENAI_STT_MODEL and not exposed as a constructor option (Realtime API only accepts transcription-class models; callers who need a different model subclass the adapter)
- openai-tts.ts: document that the TTS model is not a parameter by design — the pattern is to swap the whole TTSCallable rather than parameterise this one; link to OPENAI_TTS_MODEL for the current-gen rationale

Closes #606 (AC4 + AC5)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples/voice/#606): correct stale comment — model swap + unskip are in #607, not this branch

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

The voice-to-voice example helper and the audio-to-text example pinned `gpt-4o-audio-preview`, which OpenAI has removed (404 model_not_found since 2026-05-19). Any user running the canonical voice example hit an immediate 404.

Switch to `gpt-audio-mini` — OpenAI's current cost-efficient GA audio-chat model — matching the Python twin, which already migrated (python/scenario/config/voice_models.py:44 OPENAI_AUDIO_CHAT_MODEL, python/examples/test_audio_to_text.py:157).

Verified live: gpt-audio-mini accepts the identical chat.completions shape (modalities:["text","audio"], audio:{voice,format}) and returns audio. Re-ran the voice-to-voice e2e against prod LangWatch — success: true, real 2-turn conversation, traces landed (project_bZspxwkhCD4POvqmIgOr2).

SDK core was unaffected (OpenAIRealtimeAgentAdapter uses gpt-realtime-mini). This closes a py↔ts example-parity gap left by #561.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


github-actions · 2026-06-05T10:01:01Z

Automated low-risk assessment

This PR was evaluated against the repository's Low-Risk Pull Requests procedure and does not qualify as low risk.

The PR updates example/test code to swap the OpenAI model name from gpt-4o-audio-preview to gpt-audio-mini and adjusts related comments. Although the changes are small and confined to examples/tests, they alter which external model the code calls (a change to an integration with a third‑party system), which is explicitly excluded from low‑risk automatic merge under the policy. Therefore this PR does not qualify for the low-risk-change label.

This PR requires a manual review before merging.


…e, keep+migrate supported audio examples

Cohesive retirement of the legacy gpt-4o-audio-preview voice/audio example surface, folding in the model swap from #607 (cherry-picked) and superseding the unskip plan in #486.

Genuinely-dead (retired):
- Tombstone docs/docs/pages/examples/multimodal/voice-to-voice.mdx and testing-voice-agents.mdx -> pointer to /voice/getting-started (URLs still 200; the langwatch vocs fork has no redirect layer, so a tombstone is how we avoid 404s on previously-public URLs).
- Delete the now-unused LegacyVoiceDeprecation.mdx snippet (no importers left).
- (test_voice_to_voice_conversation.py already deleted in an earlier #486 commit.)

Supported (kept + migrated to gpt-audio-mini):
- audio-to-text.mdx / audio-to-audio.mdx kept and updated: prereq prose now names gpt-audio-mini; LegacyVoiceDeprecation banner removed (these document the CURRENT supported single-call pattern, not a legacy one).
- Python test_audio_to_text.py / test_audio_to_audio.py: skip COMMENTS rewritten to the real reason (live E2E -- real OpenAI gpt-audio-mini + LangWatch backend, cost, non-deterministic audio); skipif(CI) markers retained by design. No model literal change (they route through the helper's gpt-audio-mini default).
- _generated example partials regenerated to match the migrated test sources. overview.mdx voice-agents link repointed to /voice/getting-started.

Why the audio tests stay CI-skipped: they are live end-to-end tests; #486's "unskip to restore CI coverage" premise was never achievable (cost + non-determinism). The right end-state is migrated-and-intentionally-skipped.

Docs build: pnpm build exits 0, no broken-link/missing-import errors; tombstone routes render the /voice/getting-started pointer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis · 2026-06-05T12:23:28Z

Superseded by #612. The two model-swap commits from this PR (gpt-4o-audio-preview → gpt-audio-mini) were cherry-picked into #612, which retires the legacy voice/audio example surface cohesively (tombstones the dead example docs, keeps+migrates the supported audio examples, and reconciles #486). Closing as folded-in.

drewdrewthis marked this pull request as ready for review June 4, 2026 16:29

drewdrewthis added the grinding Grinder is actively managing this PR label Jun 4, 2026

drewdrewthis added pr-ready and removed grinding Grinder is actively managing this PR labels Jun 4, 2026

drewdrewthis mentioned this pull request Jun 4, 2026

ci: migrate voice-agent example tests off deleted gpt-4o-audio-preview, then unskip #486

Closed

github-actions Bot added the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label Jun 4, 2026

github-actions Bot previously approved these changes Jun 4, 2026

This was referenced Jun 4, 2026

docs(voice/#606): document STT/TTS model choices as deliberate current-gen #610

Merged

chore(examples/voice/#486): retire legacy gpt-4o-audio-preview surface, migrate supported audio examples to gpt-audio-mini #612

Open

drewdrewthis added a commit that referenced this pull request Jun 4, 2026

docs(examples/voice/#606): correct stale comment — model swap + unski…

5762cdd

…p are in #607, not this branch Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis added a commit that referenced this pull request Jun 4, 2026

docs(examples/voice/#607): refresh stale gpt-4o-audio-preview comment…

3593191

… refs after model swap Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis dismissed github-actions[bot]’s stale review via 3593191 June 4, 2026 18:03

github-actions Bot added low-risk-change PR qualifies as low-risk per policy and can be merged without manual review and removed low-risk-change PR qualifies as low-risk per policy and can be merged without manual review labels Jun 4, 2026

github-actions Bot previously approved these changes Jun 4, 2026

drewdrewthis and others added 2 commits June 5, 2026 11:53

docs(examples/voice/#607): refresh stale gpt-4o-audio-preview comment…

37c0343

… refs after model swap Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis dismissed github-actions[bot]’s stale review via 37c0343 June 5, 2026 09:56

drewdrewthis force-pushed the fix/voice-example-dead-audio-model branch from 3593191 to 37c0343 Compare June 5, 2026 09:56

github-actions Bot removed the low-risk-change PR qualifies as low-risk per policy and can be merged without manual review label Jun 5, 2026

drewdrewthis self-assigned this Jun 5, 2026

drewdrewthis added a commit that referenced this pull request Jun 5, 2026

docs(examples/voice/#607): refresh stale gpt-4o-audio-preview comment…

4643a8a

… refs after model swap Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

drewdrewthis closed this Jun 5, 2026

drewdrewthis wants to merge 2 commits into main from fix/voice-example-dead-audio-model
