Elevenlabs TTS websocket connection design #306

fjprobos · 2024-05-21T01:02:01Z

Hi,

I was able to make the minimal_assistant.py implementation work. Once I sorted out all the difficulties, it runs pretty well! Kudos for that 😃.

I have a question about the WebSocket connections used in the ElevenLabs TTS module. In my environment, a new WebSocket connection is created every time the agent responds to the user, and that connection is closed every time the agent stops talking.

Questions:

- Is this a design decision? If so, could you please explain the rationale behind it?
- Is there a specific reason for not maintaining a persistent WebSocket connection throughout the session?

I believe closing and reopening the WebSocket repeatedly introduces unnecessary overhead. Maintaining one or a few stable connections throughout the session might be more efficient.

Looking forward to your insights on this.

Thank you!

keepingitneil · 2024-05-23T21:22:49Z

This was a constraint of ElevenLabs. Additional text can't be sent on the same websocket after an EOS and the EOS signal is used to flush.
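To make the constraint concrete, here is a minimal sketch of the frames sent over one connection for one utterance. It assumes the ElevenLabs stream-input JSON shapes as I understand them (a first frame carrying a single space, one `{"text": ...}` frame per chunk, and `{"text": ""}` as the EOS marker). The helper `utterance_messages` is hypothetical and not part of the plugin; check the message format against the current ElevenLabs docs.

```python
import json


def utterance_messages(chunks: list[str]) -> list[str]:
    """Build the JSON frames sent over one websocket for one utterance.

    Assumed message shapes (verify against the ElevenLabs docs):
    - the first frame opens the stream with a single space,
    - each text chunk is sent as {"text": ...},
    - {"text": ""} is the end-of-stream (EOS) marker, which also flushes.
    """
    frames = [json.dumps({"text": " "})]  # stream initialization
    for chunk in chunks:
        frames.append(json.dumps({"text": chunk}))
    frames.append(json.dumps({"text": ""}))  # EOS: flush and end the stream
    # No further text may follow EOS on this socket, so the next
    # utterance has to open a new connection.
    return frames


for frame in utterance_messages(["Hello ", "world "]):
    print(frame)
```

Because EOS doubles as the only flush signal in this model, every agent turn ends its socket.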

Looking at their docs now, it looks like they have since introduced a "flush" flag in their protocol which we can look into using.
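A flush flag would allow one socket per session instead of one per utterance. The sketch below assumes a frame shaped like `{"text": ..., "flush": true}` forces generation of buffered text without ending the stream, so EOS is sent only once when the session ends. The exact field name and semantics are an assumption to confirm in the ElevenLabs protocol docs, and `session_messages` is a hypothetical helper.

```python
import json


def session_messages(utterances: list[list[str]]) -> list[str]:
    """Frames for several utterances sharing ONE websocket.

    Assumption: {"text": ..., "flush": true} flushes buffered text but keeps
    the stream open; {"text": ""} (EOS) is sent once, at session end.
    """
    frames = [json.dumps({"text": " "})]  # open the stream once
    for chunks in utterances:
        for i, chunk in enumerate(chunks):
            msg = {"text": chunk}
            if i == len(chunks) - 1:
                msg["flush"] = True  # end of this agent turn: flush, keep socket open
            frames.append(json.dumps(msg))
    frames.append(json.dumps({"text": ""}))  # single EOS when the session ends
    return frames


for frame in session_messages([["Hi ", "there "], ["Bye "]]):
    print(frame)
```

The trade-off is that the socket stays open between turns, which avoids a handshake per turn but means the connection has to be kept alive (or reopened) if the server times it out while idle.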

That said, this strategy typically adds no latency for the end user, because the next websocket connection will already be established long before speech generation is needed.
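The "connect ahead of time" idea can be sketched with asyncio: start the next handshake in the background, then take the already-open connection when synthesis begins and immediately start a replacement. `PrewarmedConnector` and `fake_connect` are hypothetical names; `fake_connect` only simulates handshake latency and stands in for a real websocket connect.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class PrewarmedConnector:
    """Keep the next connection's handshake running in the background."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        self._connect = connect  # hypothetical handshake coroutine function
        self._next: Optional[asyncio.Future] = None

    def prewarm(self) -> None:
        # Start a handshake unless one is already in flight or finished.
        if self._next is None:
            self._next = asyncio.ensure_future(self._connect())

    async def acquire(self) -> Any:
        self.prewarm()  # no-op if a handshake was already started
        task, self._next = self._next, None
        conn = await task  # returns immediately if the handshake already completed
        self.prewarm()  # begin the replacement for the following utterance
        return conn


async def demo() -> tuple[str, float]:
    opened = 0

    async def fake_connect() -> str:
        nonlocal opened
        opened += 1
        n = opened
        await asyncio.sleep(0.05)  # simulated handshake latency
        return f"conn-{n}"

    connector = PrewarmedConnector(fake_connect)
    connector.prewarm()  # e.g. at session start
    await asyncio.sleep(0.1)  # user still speaking; handshake finishes meanwhile
    loop = asyncio.get_running_loop()
    start = loop.time()
    conn = await connector.acquire()  # speech generation is needed now
    return conn, loop.time() - start


conn, waited = asyncio.run(demo())
print(conn, f"waited {waited:.3f}s")
```

The benefit only holds if `prewarm()` runs early enough; if the connection is opened at the moment synthesis starts, the full handshake lands on the critical path.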

fjprobos · 2024-05-25T11:34:49Z

Thanks for the clarification! Regarding the latter, I saw something different when debugging, though. The connection is established when the synthesis task starts (in the line I referenced in my first message), so, contrary to what you suggest, no connection is already open at that point.


theomonnom · 2024-08-01T20:44:39Z

Hey, the 11labs websocket connection is initialized as soon as text is pushed, and it is then closed on flush (due to 11labs API limitations).
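The lifecycle described here (connect on the first pushed text, close on flush) can be sketched as follows. `LazyUtteranceStream` and `FakeConn` are hypothetical stand-ins rather than the plugin's classes, and the empty-string EOS frame is an assumption about the protocol.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class FakeConn:
    """Stand-in for a websocket; records what would be sent."""

    def __init__(self, log: list) -> None:
        self.log = log

    async def send(self, text: str) -> None:
        self.log.append(("send", text))

    async def close(self) -> None:
        self.log.append(("close",))


class LazyUtteranceStream:
    """Hypothetical per-utterance stream: connect on first text, close on flush."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        self._connect = connect
        self._conn: Optional[Any] = None

    async def push_text(self, text: str) -> None:
        if self._conn is None:
            self._conn = await self._connect()  # first pushed text triggers the handshake
        await self._conn.send(text)

    async def flush(self) -> None:
        if self._conn is None:
            return  # nothing was pushed, so there is no socket to close
        await self._conn.send("")  # EOS marker (assumed empty text frame)
        await self._conn.close()  # no more text is accepted after EOS
        self._conn = None


async def demo() -> list:
    log: list = []

    async def connect() -> FakeConn:
        log.append(("connect",))
        return FakeConn(log)

    stream = LazyUtteranceStream(connect)
    for utterance in (["Hello ", "there "], ["Goodbye "]):
        for chunk in utterance:
            await stream.push_text(chunk)
        await stream.flush()
    return log


log = asyncio.run(demo())
print(log)
```

Two utterances produce two connect/close pairs, which matches the behaviour reported at the top of the thread.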

parshvadaftari pushed a commit to parshvadaftari/agents that referenced this issue Jun 23, 2024

Update docs/telephony.mdx (livekit#306)

a5474ed

Co-authored-by: sweep-ai[bot] <128439645+sweep-ai[bot]@users.noreply.github.com>

theomonnom closed this as completed Aug 1, 2024
