Description
Please read this first
- Have you read the docs? Yes, Agents SDK docs
- Have you searched for related issues? Yes, no similar issues found regarding these voices
Describe the bug
When using the OpenAI Realtime API with Twilio Media Streams, three specific voices (fable, onyx, and nova) do not produce intelligible speech audio. After a significant delay, they return audio that sounds like distorted noise rather than human speech. All other voices (alloy, echo, shimmer, ash, ballad, coral, sage, verse) work perfectly.
Debug information
- Agents SDK version: v0.3.3
- OpenAI SDK version: >=1.55.1
- Python version: Python 3.13
- Audio format: g711_ulaw (both input and output)
- Integration: Twilio Media Streams
- Sample rate: 8 kHz
Repro steps
Use the Twilio example with the following configuration:
session = await runner.run(
    model_config={
        "api_key": api_key,
        "initial_model_settings": {
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": "fable",  # or "onyx" or "nova"
            "turn_detection": {
                "type": "semantic_vad",
                "interrupt_response": True,
                "create_response": True,
            },
        },
    }
)
Note: Changing the voice to shimmer, coral, alloy, or any other voice (except fable, onyx, nova) resolves the issue immediately.
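To check each voice in turn, the configuration above can be generated per voice. This is a minimal sketch that reuses the settings from the repro. `build_model_config` and the two voice tuples are hypothetical helpers for illustration, not part of the Agents SDK, and the voice lists only reflect what was observed in this report:

```python
# Voice lists as observed in this issue (not an authoritative API list).
FAILING_VOICES = ("fable", "onyx", "nova")
WORKING_VOICES = ("alloy", "echo", "shimmer", "ash", "ballad", "coral", "sage", "verse")


def build_model_config(api_key: str, voice: str) -> dict:
    """Return the repro's model_config with only the voice swapped (hypothetical helper)."""
    return {
        "api_key": api_key,
        "initial_model_settings": {
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            "voice": voice,
            "turn_detection": {
                "type": "semantic_vad",
                "interrupt_response": True,
                "create_response": True,
            },
        },
    }


# One config per voice, so that the voice is the only variable between runs.
configs = {v: build_model_config("YOUR_API_KEY", v) for v in FAILING_VOICES + WORKING_VOICES}
```

Each entry in `configs` can then be passed as `model_config` to `runner.run(...)` exactly as in the snippet above.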
Expected behavior
All voices should produce clear, intelligible speech audio when using g711_ulaw format with Twilio Media Streams, similar to how alloy, echo, shimmer, ash, ballad, coral, sage, and verse work correctly.
Hypothesis: This might be related to how these specific voices are encoded/compressed in G.711 µ-law format (8 kHz, 8-bit, logarithmic compression). The acoustic characteristics of fable, onyx, and nova may not be compatible with this codec, or there could be a bug in the processing pipeline for these voices when g711_ulaw is requested.
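One way to test this hypothesis is to decode the raw g711_ulaw bytes returned by the API into 16-bit PCM and listen to them outside of Twilio. The sketch below is a plain-Python implementation of the standard G.711 µ-law expansion and compression (Python 3.13 no longer ships `audioop`). The function names are my own, and how the raw audio bytes are captured from the session is left out because it depends on the integration:

```python
import struct
import wave

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that can be encoded after adding the bias


def ulaw_to_linear(u: int) -> int:
    """Expand one mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1                  # find the segment (position of the top set bit)
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_bytes_to_wav(payload: bytes, path: str, sample_rate: int = 8000) -> None:
    """Write mu-law bytes as a mono 16-bit WAV file for offline listening."""
    pcm = struct.pack(f"<{len(payload)}h", *(ulaw_to_linear(b) for b in payload))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

If the WAV file produced from a fable, onyx, or nova response is already noise, the problem is upstream of Twilio. If it sounds fine, the transport or framing on the Twilio side would be the more likely place to look.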