ElevenLabs Scribe V2 Realtime STT produces no transcriptions via LiveKit Inference #4255

@Punchy25

Bug Description

ElevenLabs Scribe V2 Realtime STT via LiveKit Inference produces stt_audio_duration=0.0 (no transcriptions at all), while Deepgram Nova-3 works perfectly with the exact same setup.

I've tested extensively with different initialization approaches and language formats. Audio is definitely being received and processed (confirmed via gain boost logs showing -27 to -30 dB RMS), but ElevenLabs STT never produces any transcription output.
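For reference, an RMS level like the one quoted above can be computed from raw PCM samples roughly as follows. This is a minimal sketch, not the actual gain-boost logger: the function name `rms_dbfs` is hypothetical, and it assumes signed 16-bit samples measured against a full-scale value of 32768.

```python
import math
from typing import Sequence

# Assumption: signed 16-bit PCM, so full scale is 32768.
FULL_SCALE = 32768.0


def rms_dbfs(samples: Sequence[int]) -> float:
    """Return the RMS level of int16 PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        # Digital silence has no finite level in dB.
        return float("-inf")
    # 10*log10 of the power ratio equals 20*log10 of the amplitude ratio.
    return 10.0 * math.log10(mean_square / (FULL_SCALE ** 2))
```

Under these assumptions, a level of -27 to -30 dB corresponds to audible speech-range amplitude rather than silence, which is why the audio path itself looks healthy.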

The documentation at https://docs.livekit.io/agents/models/stt/inference/elevenlabs/ shows this should work with the inference.STT() approach.

Expected Behavior

ElevenLabs Scribe V2 Realtime should produce transcriptions similar to Deepgram Nova-3, with non-zero stt_audio_duration in usage metrics and user_input_transcribed events firing when speech is detected.

Reproduction Steps

1. Create an AgentSession with ElevenLabs STT via LiveKit Inference
2. Connect a participant and speak into the microphone
3. Observe that no transcriptions are produced (stt_audio_duration=0.0)
4. Switch to Deepgram Nova-3 with the same setup
5. Observe that transcriptions work perfectly

**Working Code (Deepgram):**

from livekit.agents import AgentSession, inference

STT_MODEL = "deepgram/nova-3"
STT_LANGUAGE = "en"

# `vad` is a VAD instance created elsewhere (not shown here).
session = AgentSession(
    stt=inference.STT(model=STT_MODEL, language=STT_LANGUAGE),
    llm="openai/gpt-4o-mini",
    tts="cartesia/sonic-3:f786b574-daa5-4673-aa0c-cbe3e8534c02",
    vad=vad,
)

**Failing Code (ElevenLabs), three variations tested:**

Option 1, string shorthand:

stt="elevenlabs/scribe_v2_realtime:en"

Option 2, inference.STT with "en":

stt=inference.STT(model="elevenlabs/scribe_v2_realtime", language="en")

Option 3, inference.STT with "en-US":

stt=inference.STT(model="elevenlabs/scribe_v2_realtime", language="en-US")

All three ElevenLabs variations produce stt_audio_duration=0.0.
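Options 1 and 2 are intended to request the same model and language. The sketch below shows how the string shorthand would map onto the explicit form, assuming the shorthand follows a `provider/model:language` layout as Option 1 suggests. `SttDescriptor` and `parse_stt_descriptor` are hypothetical helpers for illustration only and are not part of livekit-agents.

```python
from typing import NamedTuple, Optional


class SttDescriptor(NamedTuple):
    provider: str
    model: str
    language: Optional[str]


def parse_stt_descriptor(descriptor: str) -> SttDescriptor:
    """Split an assumed 'provider/model[:language]' string into its parts."""
    model_part, colon, language = descriptor.partition(":")
    provider, slash, model = model_part.partition("/")
    if not slash or not provider or not model:
        raise ValueError(f"expected 'provider/model[:language]', got {descriptor!r}")
    # Treat a missing or empty language suffix as "not specified".
    return SttDescriptor(provider, model, language if colon and language else None)


print(parse_stt_descriptor("elevenlabs/scribe_v2_realtime:en"))
```

If the shorthand is parsed this way, Option 1 and Option 2 differ only in syntax, so the failure is unlikely to come from how the language is passed.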

Operating System

Linux (Railway deployment)

Models Used

- STT: elevenlabs/scribe_v2_realtime (FAILING) / deepgram/nova-3 (WORKING)
- LLM: openai/gpt-4o-mini
- TTS: cartesia/sonic-3

Package Versions

livekit-agents==1.1.2
livekit-plugins-silero==1.1.2
livekit-plugins-turn-detector==1.1.2
Python 3.11

Session/Room/Call IDs

Room ID: voice-test-7e493f46
Participant: web-tester-7e493f46

Proposed Solution

Unclear - may be an issue with how LiveKit Inference routes requests to ElevenLabs, or a configuration issue on the ElevenLabs integration side.

Additional Context

Key observations from logs:

WORKING (Deepgram):
- stt_audio_duration=19.55
- Multiple user_input_transcribed events

FAILING (ElevenLabs):
- stt_audio_duration=0.0
- Zero user_input_transcribed events
- Audio IS being received (GAIN_BOOST logs show -27 to -30 dB RMS)

Full usage summary from the ElevenLabs test:

UsageSummary(llm_prompt_tokens=178, llm_completion_tokens=9, tts_characters_count=32, tts_audio_duration=1.95, stt_audio_duration=0.0)

Note: LLM and TTS work fine; only STT fails with ElevenLabs.
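The failure signature above (LLM and TTS usage recorded, zero STT audio) can be checked mechanically. The sketch below is an illustration under assumptions: `UsageSnapshot` is a hypothetical stand-in that mirrors only the fields printed in the log line, not the real livekit-agents class, and the helper name is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSnapshot:
    """Hypothetical stand-in mirroring the fields shown in the UsageSummary log."""
    llm_prompt_tokens: int
    llm_completion_tokens: int
    tts_characters_count: int
    tts_audio_duration: float
    stt_audio_duration: float


def stt_silent_while_pipeline_active(usage: UsageSnapshot) -> bool:
    """True when LLM or TTS recorded usage but STT reported no audio at all."""
    pipeline_active = usage.llm_prompt_tokens > 0 or usage.tts_characters_count > 0
    return pipeline_active and usage.stt_audio_duration == 0.0


# Values copied from the ElevenLabs usage summary in this report.
elevenlabs_run = UsageSnapshot(
    llm_prompt_tokens=178,
    llm_completion_tokens=9,
    tts_characters_count=32,
    tts_audio_duration=1.95,
    stt_audio_duration=0.0,
)
print(stt_silent_while_pipeline_active(elevenlabs_run))
```

A check like this could run at session shutdown to flag sessions where the STT stage never reported any audio.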

Screenshots and Recordings

No response
