ElevenLabs server_vad with turn_detection="stt" does not end user turns #5849

### Bug Description

When using `livekit-plugins-elevenlabs` with ElevenLabs Scribe v2 Realtime and `server_vad`, `AgentSession(turn_detection="stt")` receives partial and final transcripts, but the user turn is not committed promptly because the plugin does not emit an `END_OF_SPEECH` event after a non-empty committed transcript.

In a real call, the ElevenLabs STT stream produced multiple partial/final transcripts for short user utterances, for example:

```text
Can you hear me?
Hello?
Are you receiving audio?
```

However, the agent did not respond. The transcript only appeared as a `user_turn` when the participant disconnected, and LiveKit then logged:

```text
skipping user input, speech scheduling is paused
```

This looks like a plugin/turn-detection integration issue rather than an STT recognition issue: Scribe did hear the user, but the turn stayed open until shutdown.

This is related to #4087 but not the same. That issue describes Scribe v2 turns committing very late with local/Silero VAD, whereas this one is specifically about the new ElevenLabs `server_vad` path with `AgentSession(turn_detection="stt")`.

### Expected Behavior

With ElevenLabs `server_vad` enabled and `AgentSession(turn_detection="stt")`, the server-side VAD endpoint should cause LiveKit Agents to commit the user turn promptly, so the LLM/agent can answer as soon as ElevenLabs decides the utterance ended.

### Actual Behavior

The ElevenLabs plugin emits partial/interim and committed/final transcript events, but the user turn is not committed promptly. The agent keeps waiting and only sees the accumulated user transcript during shutdown/disconnect.

### Likely Cause

From `livekit.plugins.elevenlabs.stt` in `livekit-plugins-elevenlabs==1.5.12`:

- `_connect_ws()` uses `commit_strategy = "vad"` when `server_vad` is configured.
- `_process_stream_event()` maps non-empty `committed_transcript` / `committed_transcript_with_timestamps` to `SpeechEventType.FINAL_TRANSCRIPT` and keeps `_speaking = True`.
- `SpeechEventType.END_OF_SPEECH` is only emitted for an empty committed transcript.

With `AgentSession(turn_detection="stt")` and no local VAD, `audio_recognition.py` appears to rely on an STT `END_OF_SPEECH` event to mark `_user_turn_committed=True`. If ElevenLabs does not send an empty committed transcript after a server-VAD commit, LiveKit never gets the end-of-speech signal and the turn remains open.
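
To make the suspected failure mode concrete, here is a minimal, self-contained toy model of the behavior described above. It does not import or reproduce the plugin's code: `EventType`, `map_committed_transcript`, and `ToyTurnTracker` are hypothetical stand-ins, and the assumption that the session commits a turn only on `END_OF_SPEECH` comes from my reading of `audio_recognition.py`, which may be incomplete.

```python
from enum import Enum, auto


class EventType(Enum):
    # Stand-ins for the SpeechEventType members named above.
    FINAL_TRANSCRIPT = auto()
    END_OF_SPEECH = auto()


def map_committed_transcript(text: str) -> list[EventType]:
    """Toy model of the mapping described above (not the plugin's code)."""
    if text:
        # Non-empty commit: final transcript only, no end-of-speech.
        return [EventType.FINAL_TRANSCRIPT]
    # Empty commit: the only path that yields END_OF_SPEECH.
    return [EventType.END_OF_SPEECH]


class ToyTurnTracker:
    """Hypothetical consumer that commits a turn only on END_OF_SPEECH."""

    def __init__(self) -> None:
        self.user_turn_committed = False
        self.transcript: list[str] = []

    def on_commit(self, text: str) -> None:
        for event in map_committed_transcript(text):
            if event is EventType.FINAL_TRANSCRIPT:
                self.transcript.append(text)
            elif event is EventType.END_OF_SPEECH:
                self.user_turn_committed = True


tracker = ToyTurnTracker()
for utterance in ["Can you hear me?", "Hello?", "Are you receiving audio?"]:
    tracker.on_commit(utterance)

print(tracker.transcript)           # all three utterances were recognized
print(tracker.user_turn_committed)  # but the turn was never committed
```

In this model the transcripts accumulate while the turn stays open, which matches what I saw in the call: the turn would only close if an empty committed transcript arrived.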

### Reproduction Steps

1. Create an agent using ElevenLabs Scribe v2 Realtime STT with `server_vad` configured.
2. Configure the session to use STT turn detection, without local VAD:

```python
from livekit.agents import AgentSession
from livekit.plugins import elevenlabs

stt = elevenlabs.STT(
    model="scribe_v2_realtime",
    server_vad=elevenlabs.stt.VADOptions(
        vad_threshold=0.4,
        vad_silence_threshold_secs=0.5,
        min_speech_duration_ms=100,
        min_silence_duration_ms=100,
    ),
)

session = AgentSession(
    stt=stt,
    turn_detection="stt",
    # no local VAD
)
```

3. Join a voice room/call and speak a few short utterances, e.g. “hello”, “can you hear me?”.
4. Observe that ElevenLabs produces partial and final transcripts, but the agent does not answer until disconnect/shutdown, if at all.

### Package Versions

```text
livekit-agents==1.5.12
livekit-plugins-elevenlabs==1.5.12
Python 3.12
```

### Proposed Direction

For the `server_vad` / `commit_strategy="vad"` path, the ElevenLabs plugin may need to translate the server-VAD committed transcript into an end-of-speech signal that LiveKit Agents can use for `turn_detection="stt"`.

For example, after emitting `FINAL_TRANSCRIPT` for a non-empty server-VAD `committed_transcript`, the plugin could also emit `END_OF_SPEECH` (or otherwise mark the STT turn as committed) when the committed transcript represents an endpointed utterance.

The important part is that `turn_detection="stt"` should not depend on a local VAD fallback to make ElevenLabs `server_vad` usable for low-latency agent responses.
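
As a rough illustration of the direction above, the event mapping could look something like the following. This is a sketch under stated assumptions, not a patch: `EventType` and `events_for_commit` are hypothetical names that do not exist in the plugin, and the real change would need to go through the plugin's own event and speaking-state handling.

```python
from enum import Enum, auto


class EventType(Enum):
    # Stand-ins for the SpeechEventType members discussed in this issue.
    FINAL_TRANSCRIPT = auto()
    END_OF_SPEECH = auto()


def events_for_commit(text: str, commit_strategy: str) -> list[EventType]:
    """Hypothetical mapping from one committed transcript to emitted events."""
    if not text:
        # Current behavior as described above: an empty commit ends speech.
        return [EventType.END_OF_SPEECH]
    events = [EventType.FINAL_TRANSCRIPT]
    if commit_strategy == "vad":
        # Proposed: a server-VAD commit is already an endpointed utterance,
        # so follow the final transcript with an end-of-speech signal.
        events.append(EventType.END_OF_SPEECH)
    return events


print(events_for_commit("Hello?", "vad"))
print(events_for_commit("Hello?", "manual"))
```

Gating on the commit strategy keeps other commit strategies unchanged, so only the `server_vad` path would gain the extra `END_OF_SPEECH`.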
