Multiple connections versus persistent connection for conversational AI #421

davehorton · 2023-11-07T13:15:12Z


I am using Deepgram for a conversational AI use case. It's working well, but I have a question on best practices. Since the conversational dialog flow is very much prompt-response-repeat, I connect to Deepgram via WebSocket for each prompt, collect a transcript, disconnect, process the transcript, say something to the user, and then repeat the whole cycle.

That works, but I pay a small price for reconnecting each time: establishing the socket connection (TLS handshake, etc.) takes time, and at times the delay has caused some speech audio not to reach Deepgram. Also, since each turn of the conversation is a completely new session on Deepgram, I am wondering if I am forgoing some accuracy as the speaker gets further into the conversation. What I mean to ask here is whether there is any "learning" that Deepgram does as more audio is processed such that transcripts would be more accurate in a single long audio session vs provided over multiple unrelated (to Deepgram) sessions?

That is my main question. While the answer could lead me to adjust my implementation and connect just once for the entire conversation, there is a blocker: because things like keywords can only be provided in the URL when connecting, I would not be able to change keywords during the conversation, which would render this approach useless. Have you considered augmenting the API so that clients could send JSON text frames during the connection to manipulate things like keywords, or even language?
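To make the blocker concrete, here is a minimal sketch of how a per-turn connection URL might be built when the keywords change between turns. The endpoint `wss://api.deepgram.com/v1/listen` and the `keywords` and `language` query parameter names are assumptions taken from Deepgram's public streaming documentation, not from this thread, and `listen_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed streaming endpoint; check the current Deepgram docs before relying on it.
BASE_URL = "wss://api.deepgram.com/v1/listen"


def listen_url(keywords, language="en-US"):
    """Build a connection URL carrying this turn's keywords.

    Each keyword becomes its own repeated `keywords` query parameter
    (assumed parameter name); values are percent-encoded by urlencode.
    """
    params = [("language", language)]
    params += [("keywords", keyword) for keyword in keywords]
    return f"{BASE_URL}?{urlencode(params)}"


# Turn 1 and turn 2 need different keywords, so each needs its own URL
# and therefore its own connection.
turn_1_url = listen_url(["checking", "savings"])
turn_2_url = listen_url(["transfer"])
```

Because the keywords live in the URL, changing them means opening a new connection, which is exactly the cost described above.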


DamienDeepgram · 2023-11-08T20:48:08Z


You can send Deepgram a KeepAlive message every few seconds and stop sending audio packets when they are not needed.

While you are sending KeepAlive messages but no audio, we will not charge, since we are not processing any audio.

This keeps the WebSocket connection open, and once you send audio again we will transcribe it.

See: https://developers.deepgram.com/reference/streaming#stream-keepalive
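A rough sketch of that pattern in Python, for illustration only. It assumes the KeepAlive message is the JSON text frame `{"type": "KeepAlive"}` as described in the linked docs. The 5-second interval is only a guess at "every few seconds", and `keepalive_loop` and its `send` parameter are hypothetical names rather than part of any Deepgram SDK.

```python
import asyncio
import json

# Assumed interval; the thread only says "every few seconds".
KEEPALIVE_INTERVAL_S = 5.0


def keepalive_frame() -> str:
    """Return the KeepAlive message as a JSON text frame (assumed format)."""
    return json.dumps({"type": "KeepAlive"})


async def keepalive_loop(send, stop: asyncio.Event,
                         interval: float = KEEPALIVE_INTERVAL_S) -> None:
    """Send a KeepAlive frame every `interval` seconds until `stop` is set.

    `send` is any async callable that writes one text frame to the open
    WebSocket, so this loop does not depend on a specific client library.
    """
    while not stop.is_set():
        await send(keepalive_frame())
        try:
            # Sleep for one interval, but wake up early if asked to stop.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

In a real client, you would start this loop as a task while the bot is speaking (no caller audio to send), then set `stop` and resume streaming audio on the same socket when the next turn begins.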

0 replies

davehorton · 2023-11-08T21:01:02Z

Author

Thanks, appreciate that info. Wonder if you could respond to my question though:

what I mean to ask here is whether there is any "learning" that Deepgram does as more audio is processed such that transcripts would be more accurate in a single long audio session vs provided over multiple unrelated (to Deepgram) sessions?

1 reply

jkroll-deepgram Nov 9, 2023
Collaborator

Hi @davehorton - The short answer is no, you're not getting additional cumulative accuracy by using a single uninterrupted stream. Deepgram doesn't use all the previous audio in a stream to improve transcription as the stream progresses.

However, it does use some context to improve accuracy. If you send very short snippets of audio (say, less than 5 seconds), there can be context missing, which can lower accuracy. But if a complete turn is only a few seconds, then in some ways that's the full context available for the utterance. I'd say that a stream should last at least the length of one full turn, but it can be at your discretion whether you want to keep reconnecting for each new turn, or use a KeepAlive mechanism.

Answer selected by davehorton
