Streaming with short lived connections is not working #390
Which Deepgram product are you using?
Deepgram API

Details
Hi, I own a voice dictation application that supports multiple recognition services. I am attempting to add Deepgram as one of the recognition options that my users can select. I am getting data from the user's microphone and sending it to the service. Because this is a dictation scenario, each utterance from the user may be short, and there may be long periods of time between utterances. I also need to control endpointing manually. My application is written in Go, so I am using your Go package.

When the user begins speaking, my application does the following:
When the user finishes speaking, my application does the following:
For short (~2 seconds) utterances, requesting the transcript hangs for several seconds and eventually returns the error:
I am certain that I am sending data before requesting the transcript, because of my logging and successful behavior with other recognizers. Here are my transcription options:

If you are making a request to the Deepgram API, what is the full Deepgram URL you are making a request to?
wss://api.deepgram.com/v1/listen?alternatives=5&encoding=linear16&endpointing=false&filler_words=true&language=en-US&model=general&sample_rate=16000&tier=nova

If you are making a request to the Deepgram API and have a request ID, please paste it below:
No response

If possible, please attach your code or paste it into the text box.
No response

If possible, please attach an example audio file to reproduce the issue.
No response
Replies: 1 comment 1 reply
With an utterance that short, the Deepgram server would be waiting to receive more audio before it has enough context to transcribe (how much is enough depends on several factors, including duration thresholds and endpointing). If you want to tell Deepgram that you won't be sending any more audio, send a WebSocket text frame containing {"type": "CloseStream"}. This will cause Deepgram to transcribe any audio that's been received (even though we might not have received the "optimal" amount of context yet), send back the transcript, and then close the WebSocket. See the docs for more (although that's basically all there is to it!)