Skip to content

fix(realtime): handle typed/Mapping g711 formats in audio length calc#3196

Draft
adityasingh2400 wants to merge 1 commit intoopenai:mainfrom
adityasingh2400:fix/realtime-audio-length-typed-formats
Draft

fix(realtime): handle typed/Mapping g711 formats in audio length calc#3196
adityasingh2400 wants to merge 1 commit intoopenai:mainfrom
adityasingh2400:fix/realtime-audio-length-typed-formats

Conversation

@adityasingh2400
Copy link
Copy Markdown
Contributor

Summary

calculate_audio_length_ms only special-cased format strings starting with g711, so g711 audio passed as

  • a typed AudioPCMU / AudioPCMA pydantic model,
  • a Mapping like {\"type\": \"audio/pcmu\"}, or
  • the API-style \"audio/pcmu\" / \"audio/pcma\" string

silently fell back to PCM-24kHz math. That yields a ~6x wrong duration (10ms of g711 reads as ~1.67ms) and miscalculates truncation offsets on interrupt for SIP/Twilio sessions whose callers wired RealtimePlaybackTracker.set_audio_format / ModelAudioTracker.set_audio_format with the typed shape that RealtimeAudioFormat already advertises.

RealtimeAudioFormat is declared as str | Mapping[str, Any] | OpenAIRealtimeAudioFormats, but the length helper only honored the first form. Internal callers (OpenAIRealtimeWebSocketModel._extract_audio_format) always normalize to one of pcm16/g711_ulaw/g711_alaw, so the bug only surfaces for user-facing entry points (custom playback trackers, library users mirroring the public type).

This PR normalizes the format to a lower-cased type identifier up front and treats audio/pcmu / audio/pcma as g711 alongside the existing g711_* prefix. PCM behavior and the existing None fallback are unchanged.

In the spirit of recently merged #3141 (audio deltas before format negotiation) and #3140 — keep the realtime audio path defensive against the input shapes the public types already permit.

Test

tests/realtime/test_playback_tracker.py::test_audio_length_calculation_handles_typed_and_mapping_g711_formats exercises typed AudioPCM/AudioPCMU/AudioPCMA, Mapping {type: audio/pcm*}, and string audio/pcmu / audio/pcma inputs and asserts the g711 forms resolve to the 8 kHz sample rate while PCM stays on 24 kHz. The test fails on main (g711 forms read as ~1.67ms instead of 10ms) and passes with this fix. Full tests/realtime/ suite (233 tests) passes.

calculate_audio_length_ms only special-cased strings starting with 'g711',
so g711 audio passed as a typed AudioPCMU/AudioPCMA pydantic model, a
Mapping like {'type': 'audio/pcmu'}, or the API-style 'audio/pcmu' /
'audio/pcma' string silently fell back to PCM-24kHz math. That yielded a
~6x wrong duration and miscalculated truncation offsets on interrupt for
SIP/Twilio sessions whose users wired set_audio_format with the typed
shape that RealtimeAudioFormat already advertises.

Normalize the format to a lower-cased type string up front and treat
'audio/pcmu' / 'audio/pcma' as g711 alongside the existing 'g711_*'
prefix. Internal callers always pass normalized strings already, so
behavior there is unchanged.
@seratch
Copy link
Copy Markdown
Member

seratch commented May 8, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 878201205b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +29 to +30
type_value = getattr(format, "type", None)
return type_value.lower() if isinstance(type_value, str) else None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify typed G.711 models even when type is omitted

AudioPCMU/AudioPCMA are accepted typed formats, but their generated type field is optional and defaults to None. Passing AudioPCMU() or AudioPCMA() still falls through to PCM math here because only the attribute value is inspected, so typed G.711 models are not reliably fixed unless callers redundantly set type.

Useful? React with 👍 / 👎.

@seratch seratch marked this pull request as draft May 8, 2026 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working feature:realtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants