Deepgram Flux dropping short and soft utterances in non-English languages (German, Italian) #1614
Replies: 5 comments 4 replies
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
We also noticed (for Also the docs previously mentioned that the default value for
-
Across multiple voice deployments using Deepgram Flux, we are seeing a consistent pattern in non-English languages where caller speech is not transcribed at all — i.e. Flux emits no transcript rather than emitting a wrong one. From the application's point of view this looks like silence: the AI agent stays mute for 15–20 seconds waiting for a turn that never gets signaled, then either prompts again or the caller gives up.
We've ruled out microphone quality as the primary cause — issues reproduce on multiple mic setups, including ones with audibly clean input.
Observed behaviors:
- Single-word language selections at the start of a call (e.g. the caller saying "Deutsch" in response to a German/English autodetect greeting) are frequently not transcribed at all. This is easier to reproduce when the word is said softly.
- Short backchannel/confirmation answers in Italian, such as "sì" or "l'ho trovato" ("I've found it"), are commonly missed.
- Quiet speech is dropped entirely. The same phrase said at normal volume is transcribed; said quietly, no transcript is emitted (we'd expect a low-confidence guess rather than total silence).
- Long silences indicating missed turn endpointing. In Italian sessions, testers report frequent 15–20s gaps where they spoke but the agent didn't respond. Comparing the recordings with the transcripts confirms that Flux did not transcribe the audio at all. We suspect utterance detection and/or turn-end detection is materially weaker for these languages than for English.
- Interruption misbehaviour. In at least one Italian session, the caller attempted to interrupt the agent, and the interrupting speech was not detected.
- Confusable single-word triggers. When the single-word selection ("Deutsch") is transcribed, it's often misrecognized as a similar-sounding word ("Dutch", "Torage", "Voight").
Request IDs for affected sessions:
019e455a-acf0-77e2-8072-e3952f2b863c
019e3b6b-c294-77b1-9d27-d7f106993721
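For anyone trying to reproduce this, the "recordings vs transcripts" comparison above can be roughly automated. The following is a minimal sketch, not part of any Deepgram SDK: it assumes the recording is available as 16-bit little-endian mono PCM frames and that the application has logged transcript time spans in milliseconds. The names `frame_dbfs`, `untranscribed_speech`, and the threshold values are hypothetical, and a plain energy threshold is only a crude stand-in for real voice activity detection.

```python
import math
import struct
from typing import Iterable, List, Tuple


def frame_dbfs(frame: bytes) -> float:
    """RMS level of one 16-bit little-endian mono PCM frame, in dBFS."""
    n = len(frame) // 2
    if n == 0:
        return float("-inf")
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def untranscribed_speech(
    frames: Iterable[bytes],
    frame_ms: int,
    transcript_spans: List[Tuple[int, int]],
    threshold_dbfs: float = -45.0,  # assumed level; tune per deployment
    min_ms: int = 200,              # ignore very short bursts (clicks, noise)
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) runs of energetic audio overlapping no transcript span."""
    gaps: List[Tuple[int, int]] = []
    run_start = None
    # The trailing empty frame acts as a sentinel that closes a run still open at the end.
    for i, frame in enumerate(list(frames) + [b""]):
        t = i * frame_ms
        loud = frame_dbfs(frame) >= threshold_dbfs
        if loud and run_start is None:
            run_start = t
        elif not loud and run_start is not None:
            start, end = run_start, t
            run_start = None
            covered = any(s < end and e > start for s, e in transcript_spans)
            if end - start >= min_ms and not covered:
                gaps.append((start, end))
    return gaps
```

Logging `frame_dbfs` for the same phrase spoken at normal and at quiet volume would also put a number on how quiet is "quiet" in the reports above, which may make the request IDs easier to triage.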