Elevenlabs TTS websocket connection design #306
Comments
This was a constraint of ElevenLabs. Additional text can't be sent on the same websocket after an EOS, and the EOS signal is used to flush. Looking at their docs now, it looks like they have since introduced a "flush" flag in their protocol, which we can look into using. That said, this strategy typically adds no latency for the end user, because the next websocket connection will have been established long before speech generation is needed.
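A rough sketch of the pre-connect strategy described in this comment, for illustration only. `FakeTTSSocket`, `connect`, and `PreconnectingSynth` are hypothetical names, not the plugin's code or the ElevenLabs API, and the handshake is simulated with a short sleep. The idea is that a replacement connection starts warming up as soon as the current one is claimed, so the socket that dies on EOS never has to be reused.

```python
import asyncio
import contextlib


class FakeTTSSocket:
    """Hypothetical stand-in for a TTS websocket (not the ElevenLabs client)."""

    def __init__(self) -> None:
        self.sent: list[str] = []
        self.closed = False

    async def send_text(self, text: str) -> None:
        if self.closed:
            raise RuntimeError("no text is accepted after EOS")
        self.sent.append(text)

    async def send_eos(self) -> None:
        # EOS flushes the remaining audio and ends this socket's useful life.
        self.closed = True


async def connect() -> FakeTTSSocket:
    await asyncio.sleep(0.01)  # simulated handshake latency
    return FakeTTSSocket()


class PreconnectingSynth:
    """Keeps one connection warming up in the background at all times."""

    def __init__(self) -> None:
        self.connect_count = 0
        self._next = self._start_connect()

    def _start_connect(self) -> "asyncio.Task[FakeTTSSocket]":
        self.connect_count += 1
        return asyncio.create_task(connect())

    async def synthesize(self, text: str) -> FakeTTSSocket:
        sock = await self._next              # usually already connected
        self._next = self._start_connect()   # warm up the replacement now
        await sock.send_text(text)
        await sock.send_eos()
        return sock

    async def aclose(self) -> None:
        # Drop the spare connection that was never used.
        self._next.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await self._next


async def demo() -> tuple[int, list[list[str]]]:
    synth = PreconnectingSynth()
    first = await synth.synthesize("Hello there.")
    second = await synth.synthesize("How can I help?")
    await synth.aclose()
    return synth.connect_count, [first.sent, second.sent]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch each utterance still uses its own socket, but the handshake cost is paid while the previous utterance is being handled rather than when the next one is requested.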
Thanks for the clarification! Regarding the latter, I saw something
different when debugging, though. The connection is established when
the synthesis task is started (in the line I referenced in my first
message), so no active connection is available at that point, contrary
to what you suggest.
On Thu, May 23, 2024 at 6:23 PM, Neil Dwyer <
***@***.***> wrote:
This was a constraint of ElevenLabs. Additional text can't be sent on the
same websocket after an EOS and the EOS signal is used to flush.
Looking at their docs now, it looks like they have since introduced a
"flush" flag in their protocol which we can look into using.
With that being said, typically there is no additional latency introduced
to the end-user with this strategy because the next websocket connection
will have been connected long before speech generation is needed.
Hey, the 11labs websocket connection is initialized as soon as text is pushed. The connection is then closed on flush (due to 11labs API limitations).
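The lifecycle described in this comment (connect on first pushed text, close on flush) might be sketched as follows. This is an assumption-laden illustration: `FakeSocket` and `LazyTTSStream` are hypothetical names and do not mirror the plugin's actual classes.

```python
import asyncio
from typing import Optional


class FakeSocket:
    """Hypothetical in-memory socket; records what would be sent."""

    def __init__(self) -> None:
        self.sent: list[str] = []
        self.closed = False

    async def send(self, text: str) -> None:
        self.sent.append(text)

    async def close(self) -> None:
        self.closed = True


class LazyTTSStream:
    """Opens a socket on the first push_text and closes it on flush."""

    def __init__(self) -> None:
        self._sock: Optional[FakeSocket] = None
        self.history: list[FakeSocket] = []  # every socket ever opened

    async def push_text(self, text: str) -> None:
        if self._sock is None:
            # First text since the last flush: open a fresh connection.
            self._sock = FakeSocket()
            self.history.append(self._sock)
        await self._sock.send(text)

    async def flush(self) -> None:
        if self._sock is None:
            return  # nothing was pushed, so there is nothing to close
        await self._sock.close()
        self._sock = None


async def demo() -> list[FakeSocket]:
    stream = LazyTTSStream()
    await stream.push_text("Hello ")
    await stream.push_text("there.")
    await stream.flush()
    await stream.push_text("Second reply.")
    await stream.flush()
    return stream.history


if __name__ == "__main__":
    for sock in asyncio.run(demo()):
        print(sock.sent, sock.closed)
```

Two agent replies produce two sockets here, which matches the behavior reported in the original question.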
Hi,
I was able to make the minimal_assistant.py implementation work. Once I sorted out all the difficulties, it runs pretty well! Kudos for that 😃.
I have a question regarding the WebSocket connections used in the ElevenLabs TTS module. In my environment, I noticed that a new WebSocket is created every time the agent responds to the user, and that it is closed every time the agent stops talking.
Questions:
1. Is this a design decision? If so, could you please explain the rationale behind it?
2. Is there a specific reason for not maintaining a persistent WebSocket connection throughout the session?
I believe closing and reopening the WebSocket repeatedly introduces unnecessary overhead. Maintaining one or a few stable connections throughout the session might be more efficient.
Looking forward to your insights on this.
Thank you!
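For illustration, a persistent connection of the kind proposed above might look like the sketch below, assuming the provider accepts a per-message "flush" flag so that a turn can end without closing the socket. The message shape (`text` plus `flush`) and the names `RecordingSocket` and `PersistentStream` are assumptions for this example, not a confirmed API.

```python
import json
from typing import Callable, Optional


class RecordingSocket:
    """Hypothetical socket that records outgoing frames instead of sending."""

    def __init__(self) -> None:
        self.frames: list[str] = []

    def send(self, frame: str) -> None:
        self.frames.append(frame)


class PersistentStream:
    """Reuses one socket for the whole session; flushes at the end of a turn."""

    def __init__(self, connect: Callable[[], RecordingSocket]) -> None:
        self._connect = connect
        self._sock: Optional[RecordingSocket] = None
        self.handshakes = 0

    def _socket(self) -> RecordingSocket:
        if self._sock is None:
            self._sock = self._connect()
            self.handshakes += 1
        return self._sock

    def say(self, sentences: list[str]) -> None:
        sock = self._socket()
        for i, sentence in enumerate(sentences):
            msg: dict[str, object] = {"text": sentence + " "}
            if i == len(sentences) - 1:
                # Assumed flag: force generation of buffered text, keep socket open.
                msg["flush"] = True
            sock.send(json.dumps(msg))


def demo() -> tuple[int, list[dict]]:
    sock = RecordingSocket()
    stream = PersistentStream(lambda: sock)
    stream.say(["Hello there.", "How can I help?"])  # first agent turn
    stream.say(["Sure, one moment."])                # second agent turn
    return stream.handshakes, [json.loads(f) for f in sock.frames]


if __name__ == "__main__":
    print(demo())
```

Here two turns share a single handshake, and only the last message of each turn carries the flush flag.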