ElevenLabs Scribe V2 Realtime STT produces no transcriptions via LiveKit Inference #4255

### Bug Description

ElevenLabs Scribe V2 Realtime STT via LiveKit Inference produces `stt_audio_duration=0.0` (no transcriptions at all), while Deepgram Nova-3 works perfectly with the exact same setup.

I've tested extensively with different initialization approaches and language formats. Audio is definitely being received and processed (confirmed via gain boost logs showing -27 to -30 dB RMS), but ElevenLabs STT never produces any transcription output.
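For anyone trying to reproduce the audio-level check, here is a minimal sketch of how an RMS level in dBFS could be computed from a raw frame. The report does not show how the gain boost logs derive their numbers, so the 16-bit little-endian mono PCM format and the helper names `rms_dbfs` and `sine_pcm` are assumptions for illustration only.

```python
import math
import struct


def rms_dbfs(pcm: bytes) -> float:
    """Return the RMS level of 16-bit little-endian mono PCM in dBFS.

    Assumption: frames are int16 samples; full scale is 32768.
    Empty or all-zero input returns -inf (digital silence).
    """
    count = len(pcm) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / 32768.0)


def sine_pcm(amplitude: float, freq_hz: float = 440.0,
             sample_rate: int = 16000, seconds: float = 1.0) -> bytes:
    """Generate a test tone (hypothetical helper, only used to exercise rms_dbfs)."""
    n = int(sample_rate * seconds)
    peak = amplitude * 32767
    samples = [int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
               for i in range(n)]
    return struct.pack(f"<{n}h", *samples)


# A tone at 5% of full scale lands at about -29 dBFS,
# inside the -27 to -30 dB range quoted in this report.
print(round(rms_dbfs(sine_pcm(0.05)), 1))
```

A reading in this range indicates quiet but clearly non-silent input, which is consistent with the claim that audio reaches the agent before the STT stage.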

The documentation at https://docs.livekit.io/agents/models/stt/inference/elevenlabs/ shows this should work with the `inference.STT()` approach.

### Expected Behavior

ElevenLabs Scribe V2 Realtime should produce transcriptions similar to Deepgram Nova-3, with non-zero `stt_audio_duration` in usage metrics and `user_input_transcribed` events firing when speech is detected.

### Reproduction Steps

1. Create an `AgentSession` with ElevenLabs STT via LiveKit Inference.
2. Connect a participant and speak into the microphone.
3. Observe that no transcriptions are produced (`stt_audio_duration=0.0`).
4. Switch to Deepgram Nova-3 with the same setup.
5. Observe that transcriptions work perfectly.

**Working Code (Deepgram):**

```python
from livekit.agents import AgentSession, inference

STT_MODEL = "deepgram/nova-3"
STT_LANGUAGE = "en"

session = AgentSession(
    stt=inference.STT(model=STT_MODEL, language=STT_LANGUAGE),
    llm="openai/gpt-4o-mini",
    tts="cartesia/sonic-3:f786b574-daa5-4673-aa0c-cbe3e8534c02",
    vad=vad,  # VAD instance created elsewhere (not shown in this report)
)
```

**Failing Code (ElevenLabs), 3 variations tested:**

```python
# Each line below replaces the `stt=` argument in the AgentSession call above.

# Option 1: string shorthand
stt="elevenlabs/scribe_v2_realtime:en"

# Option 2: inference.STT with "en"
stt=inference.STT(model="elevenlabs/scribe_v2_realtime", language="en")

# Option 3: inference.STT with "en-US"
stt=inference.STT(model="elevenlabs/scribe_v2_realtime", language="en-US")
```

All three ElevenLabs variations produce `stt_audio_duration=0.0`.

### Operating System

Linux (Railway deployment)

### Models Used

- STT: `elevenlabs/scribe_v2_realtime` (FAILING) / `deepgram/nova-3` (WORKING)
- LLM: `openai/gpt-4o-mini`
- TTS: `cartesia/sonic-3`

### Package Versions

```bash
livekit-agents==1.1.2
livekit-plugins-silero==1.1.2
livekit-plugins-turn-detector==1.1.2
Python 3.11
```

### Session/Room/Call IDs

Room ID: voice-test-7e493f46 
Participant: web-tester-7e493f46

### Proposed Solution

Unclear. This may be an issue with how LiveKit Inference routes requests to ElevenLabs, or a configuration issue on the ElevenLabs integration side.

### Additional Context

Key observations from logs:

**Working (Deepgram):**

- `stt_audio_duration=19.55`
- Multiple `user_input_transcribed` events

**Failing (ElevenLabs):**

- `stt_audio_duration=0.0`
- Zero `user_input_transcribed` events
- Audio IS being received (`GAIN_BOOST` logs show -27 to -30 dB RMS)

Full usage summary from the ElevenLabs test:

`UsageSummary(llm_prompt_tokens=178, llm_completion_tokens=9, tts_characters_count=32, tts_audio_duration=1.95, stt_audio_duration=0.0)`

Note: LLM and TTS work fine. Only STT fails with ElevenLabs.
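To make the comparison between runs easy to repeat, the logged summary line can be checked mechanically. The sketch below parses the printed `UsageSummary(...)` text with a regular expression. It is not part of the LiveKit API: `parse_usage_summary` and `stt_produced_audio` are hypothetical helper names, and the code only assumes the `key=number` layout seen in the log line of this report.

```python
import re

# Log line copied from this report (ElevenLabs run).
ELEVENLABS_LOG = (
    "UsageSummary(llm_prompt_tokens=178, llm_completion_tokens=9, "
    "tts_characters_count=32, tts_audio_duration=1.95, stt_audio_duration=0.0)"
)


def parse_usage_summary(text: str) -> dict[str, float]:
    """Extract numeric fields from a printed UsageSummary(...) line."""
    match = re.search(r"UsageSummary\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no UsageSummary(...) found in text")
    fields = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)", match.group(1))
    return {name: float(value) for name, value in fields}


def stt_produced_audio(summary: dict[str, float]) -> bool:
    """True if the run recorded any STT audio duration at all."""
    return summary.get("stt_audio_duration", 0.0) > 0.0


summary = parse_usage_summary(ELEVENLABS_LOG)
if not stt_produced_audio(summary):
    # TTS duration is non-zero in the same run, so the session itself was live.
    print("STT recorded no audio; tts_audio_duration =",
          summary["tts_audio_duration"])
```

Running the same check against the Deepgram run's summary (which reported `stt_audio_duration=19.55`) would return `True`, which narrows the difference to the STT stage.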

### Screenshots and Recordings

_No response_
