fix(realtime): handle typed/Mapping g711 formats in audio length calc #3196
adityasingh2400 wants to merge 1 commit into openai:main
Conversation
calculate_audio_length_ms only special-cased strings starting with 'g711',
so g711 audio passed as a typed AudioPCMU/AudioPCMA pydantic model, a
Mapping like {'type': 'audio/pcmu'}, or the API-style 'audio/pcmu' /
'audio/pcma' string silently fell back to PCM-24kHz math. That yielded a
~6x wrong duration and miscalculated truncation offsets on interrupt for
SIP/Twilio sessions whose users wired set_audio_format with the typed
shape that RealtimeAudioFormat already advertises.
Normalize the format to a lower-cased type string up front and treat
'audio/pcmu' / 'audio/pcma' as g711 alongside the existing 'g711_*'
prefix. Internal callers always pass normalized strings already, so
behavior there is unchanged.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 878201205b
type_value = getattr(format, "type", None)
return type_value.lower() if isinstance(type_value, str) else None
Classify typed G.711 models even when type is omitted
AudioPCMU/AudioPCMA are accepted typed formats, but their generated type field is optional and defaults to None. Passing AudioPCMU() or AudioPCMA() still falls through to PCM math here because only the attribute value is inspected, so typed G.711 models are not reliably fixed unless callers redundantly set type.
Summary
calculate_audio_length_ms only special-cased format strings starting with g711, so g711 audio passed as an AudioPCMU/AudioPCMA pydantic model, a Mapping like {"type": "audio/pcmu"}, or an "audio/pcmu"/"audio/pcma" string silently fell back to PCM-24kHz math. That yields a ~6x wrong duration (10 ms of g711 reads as ~1.67 ms) and miscalculates truncation offsets on interrupt for SIP/Twilio sessions whose callers wired RealtimePlaybackTracker.set_audio_format / ModelAudioTracker.set_audio_format with the typed shape that RealtimeAudioFormat already advertises.
RealtimeAudioFormat is declared as str | Mapping[str, Any] | OpenAIRealtimeAudioFormats, but the length helper only honored the first form. Internal callers (OpenAIRealtimeWebSocketModel._extract_audio_format) always normalize to one of pcm16/g711_ulaw/g711_alaw, so the bug only surfaces for user-facing entry points (custom playback trackers, library users mirroring the public type).
This PR normalizes the format to a lower-cased type identifier up front and treats audio/pcmu/audio/pcma as g711 alongside the existing g711_* prefix. PCM behavior and the existing None fallback are unchanged. In the spirit of the recently merged #3141 (audio deltas before format negotiation) and #3140: keep the realtime audio path defensive against the input shapes the public types already permit.
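The ~6x figure follows directly from the two formats' byte rates: G.711 is 8 kHz mono at 1 byte per sample (8,000 bytes/s), while the pcm16 default here is 24 kHz at 2 bytes per sample (48,000 bytes/s). A minimal sketch of that arithmetic (the helper name `audio_length_ms` and its signature are illustrative, not the library's actual function):

```python
def audio_length_ms(audio_bytes: int, *, g711: bool) -> float:
    """Convert a payload size in bytes to a duration in milliseconds."""
    if g711:
        # G.711: 8 kHz * 1 byte/sample = 8,000 bytes per second.
        return audio_bytes / 8000 * 1000
    # pcm16 default: 24 kHz * 2 bytes/sample = 48,000 bytes per second.
    return audio_bytes / (24000 * 2) * 1000


# 80 bytes is exactly 10 ms of G.711 audio, but reads as ~1.67 ms
# under the PCM default: a factor-of-6 error in the computed duration.
print(audio_length_ms(80, g711=True))   # 10.0
print(audio_length_ms(80, g711=False))  # ~1.667
```

That factor of 6 is what skews the truncation offsets on interrupt: the tracker believes far less audio has played than actually has.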
Test
tests/realtime/test_playback_tracker.py::test_audio_length_calculation_handles_typed_and_mapping_g711_formats exercises typed AudioPCM/AudioPCMU/AudioPCMA, Mapping {type: audio/pcm*}, and string audio/pcmu/audio/pcma inputs and asserts the g711 forms resolve to the 8 kHz sample rate while PCM stays on 24 kHz. The test fails on main (g711 forms read as ~1.67 ms instead of 10 ms) and passes with this fix. The full tests/realtime/ suite (233 tests) passes.