ElevenLabs STT stream ignores sample rate configuration #5573

@ameyakhare

Bug Description

The ElevenLabs STT implementation passes the sample rate/audio format in the wrong query parameter when opening the wss connection. The correct parameter is audio_format, but the value is being sent as encoding, a parameter that does not exist in the API. API Reference here.

Expected Behavior

The sample rate/encoding should be passed in the audio_format parameter when opening the wss connection, so that the server decodes the audio at the configured rate.

Reproduction Steps

from livekit.plugins import elevenlabs
elevenlabs.STT(model="scribe_v2_realtime", sample_rate=8000)

pcm_8000 is passed to the non-existent encoding parameter, so the server falls back to its default of pcm_16000. The 8 kHz audio is then presumably interpreted as 16 kHz, resulting in incorrect transcription timestamps, silence length detection, and min silence/speech threshold detection for VAD-based commits.
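
To illustrate the size of the error, here is a rough sketch of the arithmetic, assuming 16-bit mono PCM and that the server really does fall back to 16000 Hz as described above. The helper name perceived_seconds is hypothetical and not part of the plugin or the ElevenLabs API.

```python
# Hypothetical illustration of how a sample rate mismatch skews durations.
ACTUAL_RATE = 8000     # rate the client actually sends (from the repro above)
ASSUMED_RATE = 16000   # rate the server falls back to, per this report
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono PCM


def perceived_seconds(num_bytes: int, rate: int) -> float:
    """Duration a decoder computes for num_bytes of PCM at the given rate."""
    return num_bytes / (BYTES_PER_SAMPLE * rate)


# One real second of 8 kHz audio is 16000 bytes.
one_second_of_audio = ACTUAL_RATE * BYTES_PER_SAMPLE

real = perceived_seconds(one_second_of_audio, ACTUAL_RATE)
seen = perceived_seconds(one_second_of_audio, ASSUMED_RATE)
print(real, seen)  # 1.0 0.5
```

Under these assumptions every duration the server measures is half the real one, which would explain why timestamps and silence thresholds are off for VAD-based commits.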

Operating System

macOS Tahoe

Models Used

ElevenLabs Scribe V2 Realtime

Package Versions

livekit-plugins-elevenlabs==1.5.6

Session/Room/Call IDs

No response

Proposed Solution

--- a/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
+++ b/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
@@ -365,7 +365,6 @@ class SpeechStream(stt.SpeechStream):
                                 "message_type": "input_audio_chunk",
                                 "audio_base_64": "",
                                 "commit": False,
-                                "sample_rate": self._opts.sample_rate,
                             }
                         )
                     )
@@ -403,7 +402,6 @@ class SpeechStream(stt.SpeechStream):
                                 "message_type": "input_audio_chunk",
                                 "audio_base_64": audio_b64,
                                 "commit": False,
-                                "sample_rate": self._opts.sample_rate,
                             }
                         )
                     )
@@ -484,7 +482,7 @@ class SpeechStream(stt.SpeechStream):
         commit_strategy = "manual" if self._opts.server_vad is None else "vad"
         params = [
             f"model_id={self._opts.model_id}",
-            f"encoding=pcm_{self._opts.sample_rate}",
+            f"audio_format=pcm_{self._opts.sample_rate}",
             f"commit_strategy={commit_strategy}",
         ]
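
As a quick sanity check of the parameter rename, the sketch below rebuilds the same three query parameters outside the plugin and confirms that audio_format carries the configured rate. The function build_query is hypothetical and only mirrors the params list in the diff; it is not the plugin's actual code path.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode


def build_query(model_id: str, sample_rate: int, server_vad: Optional[dict]) -> str:
    """Hypothetical stand-in for the params list built in SpeechStream."""
    commit_strategy = "manual" if server_vad is None else "vad"
    return urlencode(
        {
            "model_id": model_id,
            # The fix: send the format as audio_format, not encoding.
            "audio_format": f"pcm_{sample_rate}",
            "commit_strategy": commit_strategy,
        }
    )


query = build_query("scribe_v2_realtime", 8000, server_vad=None)
parsed = parse_qs(query)
print(query)
```

With sample_rate=8000 the query string contains audio_format=pcm_8000 and no encoding key, which is what the Expected Behavior section asks for.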

Additional Context

No response

Screenshots and Recordings

No response

Labels: bug
