Bug title
TTS playback skips most of the model's response in Open WebUI when using google_gemini.py v1.6.4
Describe the bug
I'm using the google_gemini.py pipeline (version v1.6.4) inside Open WebUI. Whenever a response is generated while auto read aloud is enabled, or while I'm in call mode, only the very last sentence of the response is spoken aloud - sometimes just a single word or emoji. The rest of the response is skipped entirely.
This happens specifically in two scenarios:
- When auto-play TTS is enabled (automatic voice playback after each response)
- When using call mode (live voice chat)
It does not happen when manually pressing the "Listen to message" button after the full response has already appeared. In that case, the entire message is spoken correctly.
This strongly suggests a problem in how streaming output is handled in TTS-related workflows - especially when playback is triggered before the entire model response is finalized.
Additionally, I suspect this may also be influenced by the brief visual rendering of the model's internal Thought section. This section appears briefly, then quickly collapses again, which might confuse the TTS system or disrupt how the message is composed internally. It could be a combination of premature streaming and dynamic UI state changes.
There are no visible errors, but Open WebUI often continues to "work" in the background indefinitely, requiring a manual refresh of the page.
Steps to reproduce
- Install the google_gemini.py pipeline (v1.6.4) from the Open WebUI function community
- Use a Gemini model such as gemini-2.5-flash
- Enable TTS auto-play or use call mode
- Enter any text prompt
- Observe: only the last sentence is spoken aloud
- Optional: manually press the TTS playback button afterward - in this case the entire message is read correctly
Environment
- OpenWebUI version: v0.6.32
- Pipeline: google_gemini.py v1.6.4
- Model: gemini-2.5-flash
- TTS usage: using auto-play or call mode
- Browser: Google Chrome (latest)
- System: Ubuntu 22.04.5 LTS
- Setup: Docker
Additional context
We believe the problem is linked to how early streaming chunks are handled. If TTS playback is triggered before all chunks are received, only the final chunk appears to be passed to the TTS engine and spoken aloud.
It might also be relevant that OpenWebUI briefly renders a collapsible Thought section before the final message, which could interfere with the rendering or parsing of the full message used for TTS.
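To make the suspected failure mode concrete, here is a minimal, purely illustrative Python sketch (not the actual Open WebUI or google_gemini.py code; all function names are hypothetical). It contrasts a buggy pattern, where each streamed chunk replaces the pending TTS text so only the last chunk survives, with the expected pattern of buffering the full response before speaking:

```python
def stream_chunks():
    """Simulate a model response arriving as streamed chunks."""
    yield "Hello there. "
    yield "Here is a long explanation of the topic. "
    yield "Bye!"

def collect_for_tts_buggy(chunks):
    """Suspected buggy pattern: each incoming chunk overwrites the
    pending TTS text, so only the final chunk is ever spoken."""
    pending = ""
    for chunk in chunks:
        pending = chunk  # replaces instead of appending
    return pending

def collect_for_tts_buffered(chunks):
    """Expected pattern: accumulate all chunks and hand the complete
    message to TTS only after the stream has finished."""
    buffer = []
    for chunk in chunks:
        buffer.append(chunk)
    return "".join(buffer)

print(collect_for_tts_buggy(stream_chunks()))     # only the last chunk
print(collect_for_tts_buffered(stream_chunks()))  # the full response
```

This matches the observed symptom: the manual "Listen to message" button works because by then the full, finalized message text is available, while auto-play and call mode act on the stream before it completes.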
We'd love to see better TTS handling or a different Thought section mechanism.
Happy to help test a fix or contribute further!
Thanks again for your incredible work and the beautifully structured Gemini pipeline - it's very appreciated!