# Gemini Live API Native Audio: Premature `turnComplete` Causes Mid-Sentence Audio Truncation

## Description
When using the Gemini Live API with native audio output, the model frequently stops speaking mid-sentence.
The server sends a `turnComplete` message while the model is still generating audio, with no `interrupted` flag set.
This is not caused by client-side echo or VAD. It appears to be a server-side issue where the model prematurely terminates its own turn.
This has been reported across multiple repos and confirmed by ~40 developers over the past ~8 months (see Related Issues below). Despite multiple fix attempts by Google engineers, the problem persists or regresses.
## Environment

- Model: `gemini-2.5-flash-native-audio-preview-12-2025`
- SDK: `google-genai` 1.64.0 via `google-adk` 1.25.1
- API: Google AI Developer API (not Vertex AI)
- Platform: FastAPI WebSocket server (Python 3.12) + iOS client (SwiftUI)
- OS: macOS (server), iOS 18 (client)
## Steps to Reproduce

1. Establish a Live API session with native audio output (Python), configured with:
   - `response_modalities=["AUDIO"]`
   - `speech_config` with a prebuilt voice (e.g. `Aoede`)
   - `realtime_input_config.automatic_activity_detection.disabled=True`
2. Send a user message that requires a multi-sentence response (e.g. "Describe what you see in detail").
3. Observe the server stream.
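The steps above can be sketched as a minimal repro script. This is a sketch only, not our production code: field names follow the `google-genai` v1.x Python SDK and should be checked against your installed version, and it assumes `GEMINI_API_KEY` is set in the environment.

```python
# Minimal repro sketch (google-genai v1.x field names assumed).
import asyncio
import os

from google import genai
from google.genai import types

MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"

config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Aoede")
        )
    ),
    # Disable automatic VAD; the client sends manual activity signals.
    realtime_input_config=types.RealtimeInputConfig(
        automatic_activity_detection=types.AutomaticActivityDetection(disabled=True)
    ),
)

async def main() -> None:
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Describe what you see in detail.")],
            ),
            turn_complete=True,
        )
        audio_bytes = 0
        async for msg in session.receive():
            if msg.data:
                audio_bytes += len(msg.data)
            sc = msg.server_content
            if sc and sc.turn_complete:
                # Bug: turn_complete arrives while the spoken answer is
                # still mid-sentence, with interrupted unset.
                print(f"turn_complete (interrupted={sc.interrupted}), "
                      f"{audio_bytes} audio bytes received")
                break

if __name__ == "__main__":
    asyncio.run(main())
```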
## Expected Behavior

The model completes its entire response before sending `turnComplete`.
## Actual Behavior

- The model begins generating audio normally.
- After 1–3 sentences (sometimes mid-word), a `turnComplete` message arrives without `interrupted: true`.
- The remaining audio is never delivered.
- This happens intermittently (sometimes the model completes, sometimes it truncates).
- Frequency increases over the course of a session.
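This pattern is easy to confirm from session logs: look for a turn that ends with `turnComplete`, no `interrupted` signal, and a transcript that stops short of a sentence boundary. A minimal detector sketch follows; the event-dict shape is a hypothetical simplification of the wire messages, for illustration only.

```python
# Flags turns that end with turnComplete but no interrupted signal and
# whose transcript does not end at a sentence boundary.
# Event dicts are a simplified stand-in for the real serverContent messages.

def is_premature_turn_end(events: list[dict]) -> bool:
    transcript = ""
    for ev in events:
        transcript += ev.get("outputTranscription", "")
        if ev.get("turnComplete"):
            if ev.get("interrupted"):
                return False  # a real barge-in, not this bug
            # Heuristic: a healthy turn ends at sentence-final punctuation
            # (including CJK, since Chinese/Japanese are affected too).
            return not transcript.rstrip().endswith(
                (".", "!", "?", "。", "！", "？")
            )
    return False  # stream ended without turnComplete


# Transcript stops mid-sentence with no interrupted flag -> flagged.
truncated = [
    {"outputTranscription": "Turn left at the next "},
    {"turnComplete": True},
]
healthy = [
    {"outputTranscription": "Turn left at the next corner."},
    {"turnComplete": True},
]
print(is_premature_turn_end(truncated))  # True
print(is_premature_turn_end(healthy))    # False
```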
## Key Evidence: This Is Server-Side, Not Echo / VAD
We have implemented every possible client-side mitigation and the problem persists:
- Hardware AEC (iOS Voice Processing IO)
- Client-side echo gating (send silence frames during model speech)
- SileroVAD confirmation (no speech being sent during model output)
- `NO_INTERRUPTION` activity handling (model should not be interruptible)
- Disabled automatic activity detection (manual activity signals)
Despite all five layers of protection, the model still truncates its own output. `turnComplete` arrives with no `interrupted` flag, confirming the server decided to end the turn on its own.
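For reference, the echo-gating mitigation mentioned above is the standard client-side defense: while model audio is playing, the client substitutes silence for outgoing mic frames so playback echo can never reach server-side VAD. A minimal sketch, with illustrative frame sizing (16-bit mono PCM assumed; this is not our actual SightLine client code):

```python
# Client-side echo gate: send an equal-length silence frame instead of
# mic audio whenever the model is speaking.

SAMPLE_RATE = 16_000  # Live API input sample rate
FRAME_MS = 20         # one 20 ms frame per send (illustrative)
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes/sample -> 640

def gate_frame(mic_frame: bytes, model_speaking: bool) -> bytes:
    """Pass mic audio through, or replace it with silence of equal length."""
    if model_speaking:
        return b"\x00" * len(mic_frame)
    return mic_frame

frame = b"\x01" * FRAME_BYTES
print(len(gate_frame(frame, model_speaking=True)))       # 640 (all zeros)
print(gate_frame(frame, model_speaking=False) == frame)  # True
```

Even with this gate in place (on top of hardware AEC and Silero VAD), truncation still occurs, which is why we rule out echo as the cause.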
## Google's own documentation acknowledges "self-interruption"
From the Gemini Live API Get Started page:
> Note: Use headphones... To prevent the model from interrupting itself, use headphones.
However:
- Our iOS app already has hardware-level AEC.
- We already implement echo gating.
- The problem persists because the root cause is server-side premature turn termination, not echo.
- Requiring headphones is not an acceptable solution for an accessibility app serving visually impaired users.
## Aggravating Factors (observed across community + our tests)

- Tool calls / function calling (related: js-genai#707, web-console#139, #1894): truncation increases after a tool result returns.
- Growing context length (related: js-genai#707): longer conversations get worse.
- Non-English languages (related: js-genai#707): Chinese/Japanese are significantly worse.
- `enable_affective_dialog` (related: js-genai#707): correlated with premature `turnComplete`.
- `context_window_compression` (related: web-console#117): enabling it makes truncation worse.

Our application (SightLine, an AI assistant for visually impaired users) hits all five factors simultaneously (tool usage, accumulating context, Chinese support, affective dialog, and compression), so the bug becomes production-blocking.
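For completeness, the compression setting from the last factor is enabled roughly as follows (a config sketch; field names and the token thresholds are assumptions against the `google-genai` v1.x Python SDK, not our exact values):

```python
from google.genai import types

# Context window compression for a Live session (google-genai v1.x
# field names assumed). With this enabled, truncation worsens for us.
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    context_window_compression=types.ContextWindowCompressionConfig(
        # Once the context exceeds trigger_tokens, compress it down to a
        # sliding window of roughly target_tokens.
        trigger_tokens=25600,
        sliding_window=types.SlidingWindow(target_tokens=12800),
    ),
)
```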
## Impact
SightLine relies on native audio for real-time voice interaction. Truncation makes the product unusable for target users:
- Safety-critical info gets cut off (navigation directions, obstacle warnings).
This is not cosmetic. For accessibility applications, reliable audio output is a hard requirement.
## No Alternative Models Available (as of now)

- `gemini-2.5-flash-native-audio-preview-12-2025` (current): has this bug.
- `gemini-2.5-flash-native-audio-preview-09-2025` (deprecated 2026-03-19): worse / raspy audio (related: js-genai#1209).
- Vertex AI `gemini-live-2.5-flash-native-audio` (GA): same underlying behavior.
- `gemini-2.0-flash-live-001` (decommissioned 2025-12-09): unavailable.
- Gemini 3.x: no Live API support.

There is currently no Gemini Live audio model without this bug.
## Related Issues / Community Reports

Core & related issues reported across repos:

- js-genai#707 ("Gemini Live API responses cut off prematurely with turnComplete despite incomplete content") — premature `turnComplete`; OPEN, P2, open ~8 months, ~40 confirmations
- google-gemini-live-api-web-console#117 — model stops midway; OPEN
- #872 ("Live API Audio Quality bad") — audio quality degradation; CLOSED, but the problem persists
- js-genai#1209 ("The voices in the 'gemini-2.5-flash-native-audio-preview-12-2025' are raspy and insufferable") — raspy voices in the 12-2025 model; OPEN, P2
- google-gemini-live-api-web-console#139 — model self-interrupts / talks over itself; OPEN
- #1894 ("Gemini 2.5 Flash Native Audio Preview 12-2025 - Model Hallucinates Before NON_BLOCKING Tool Results Return") — post-tool-call hallucination; CLOSED by bot
- #1275 ("Gemini Native Audio (Tier 1) – Response Cutoff & Unclear RPD Limits") — response truncation; OPEN, P3
Forum reports also mention stuttering, delays, extremely short audio playback, etc.
Developer sentiment (from googleapis/js-genai#707 and others) indicates teams are switching to OpenAI Realtime due to this unresolved issue.
## Request
- Please acknowledge this as a server-side model issue, not a client-side echo problem.
- Prioritize a fix (P2 for ~8 months is too long for a production-blocking bug).
- Provide a timeline or an interim workaround beyond "use headphones".
- Consider accessibility use cases: visually impaired users cannot be told to "just wear headphones".
**Contest note:** I'm currently participating in the Gemini Live Agent Challenge (Devpost): https://geminiliveagentchallenge.devpost.com/?linkId=54514909

**Contact:**
LiuWei
sunflowers0607@outlook.com
https://github.com/SunflowersLwtech