Bug Description
AgentSession._on_error applies a max_unrecoverable_errors tolerance counter to llm_error and tts_error, but stt_error has no such counter. It falls through and immediately closes the session on the first non-recoverable STT error.
Root Cause
In livekit-agents/livekit/agents/voice/agent_session.py, the _on_error method:
```python
def _on_error(self, error: llm.LLMError | stt.STTError | tts.TTSError | ...) -> None:
    if self._closing_task or error.recoverable:
        return

    if error.type == "llm_error":
        self._llm_error_counts += 1
        if self._llm_error_counts <= self.conn_options.max_unrecoverable_errors:
            return  # ← tolerance applied
    elif error.type == "tts_error":
        self._tts_error_counts += 1
        if self._tts_error_counts <= self.conn_options.max_unrecoverable_errors:
            return  # ← tolerance applied

    # stt_error: no branch → falls through → session closes immediately ❌
    self._closing_task = asyncio.create_task(
        self._aclose_impl(error=error, reason=CloseReason.ERROR)
    )
```
_llm_error_counts and _tts_error_counts are initialised and reset in multiple places (__init__, _aclose_impl, _update_agent_state), but there is no equivalent _stt_error_counts.
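To make the fall-through concrete, here is a minimal, self-contained model of the branching in _on_error. It is not the real AgentSession: FakeError and CurrentErrorPolicy are hypothetical stand-ins, a `closed` flag replaces `_closing_task`, and max_unrecoverable_errors is assumed to be 3.

```python
from dataclasses import dataclass


@dataclass
class FakeError:
    # Hypothetical stand-in for llm.LLMError / stt.STTError / tts.TTSError
    type: str  # e.g. "llm_error", "stt_error", "tts_error"
    recoverable: bool = False


class CurrentErrorPolicy:
    """Simplified model of the current _on_error branching (not livekit code)."""

    def __init__(self, max_unrecoverable_errors: int = 3) -> None:
        self.max_unrecoverable_errors = max_unrecoverable_errors
        self.llm_error_counts = 0
        self.tts_error_counts = 0
        self.closed = False  # stands in for self._closing_task being set

    def on_error(self, error: FakeError) -> None:
        if self.closed or error.recoverable:
            return
        if error.type == "llm_error":
            self.llm_error_counts += 1
            if self.llm_error_counts <= self.max_unrecoverable_errors:
                return
        elif error.type == "tts_error":
            self.tts_error_counts += 1
            if self.tts_error_counts <= self.max_unrecoverable_errors:
                return
        # stt_error matches no branch, so the first one reaches this line
        self.closed = True


if __name__ == "__main__":
    llm_policy = CurrentErrorPolicy()
    for _ in range(3):
        llm_policy.on_error(FakeError("llm_error"))
    print(llm_policy.closed)  # False: three LLM errors are tolerated

    stt_policy = CurrentErrorPolicy()
    stt_policy.on_error(FakeError("stt_error"))
    print(stt_policy.closed)  # True: the first STT error closes the session
```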
Real-World Impact
When an STT provider returns transient errors (e.g. HTTP 403/429 from Groq after exhausting per-utterance retries in the RecognizeStream._main_task loop), the session closes immediately on the very first non-recoverable STT error. There is no opportunity for application-level error handlers (registered via session.on("error", ...)) to perform graceful recovery such as transferring to a human agent.
The per-utterance retry loop in RecognizeStream._main_task already handles transient network blips. The session-level tolerance is meant to handle the case where retries are truly exhausted — but currently STT gets zero tolerance at the session level while LLM and TTS each get max_unrecoverable_errors (default 3).
Expected Behaviour
stt_error should receive the same tolerance as llm_error and tts_error: allow up to max_unrecoverable_errors non-recoverable STT errors before closing the session, giving application error handlers a chance to act (e.g. fall back to another STT, transfer to human, speak an acknowledgement).
Proposed Fix
Add _stt_error_counts counter mirroring the existing LLM/TTS pattern:
```python
# In __init__ and at each reset site:
self._stt_error_counts = 0

# In _on_error, next to the existing llm_error / tts_error branches:
elif error.type == "stt_error":
    self._stt_error_counts += 1
    if self._stt_error_counts <= self.conn_options.max_unrecoverable_errors:
        return
```
And reset _stt_error_counts alongside the other counters in _aclose_impl and _update_agent_state (when state transitions to "speaking").
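A self-contained sketch of the expected behaviour, which could serve as the basis of a regression test. It is a hypothetical model, not a patch: it keeps the three counters in one dict keyed by error type instead of separate _llm_error_counts / _stt_error_counts / _tts_error_counts attributes, and reset_counts / on_agent_state are invented names standing in for the reset sites described above. max_unrecoverable_errors is assumed to be 3.

```python
from dataclasses import dataclass


@dataclass
class FakeError:
    # Hypothetical stand-in for llm.LLMError / stt.STTError / tts.TTSError
    type: str  # e.g. "llm_error", "stt_error", "tts_error"
    recoverable: bool = False


class ProposedErrorPolicy:
    """Model of _on_error with an STT tolerance counter (hypothetical, not livekit code)."""

    TOLERATED_TYPES = ("llm_error", "stt_error", "tts_error")

    def __init__(self, max_unrecoverable_errors: int = 3) -> None:
        self.max_unrecoverable_errors = max_unrecoverable_errors
        self.error_counts = {t: 0 for t in self.TOLERATED_TYPES}
        self.closed = False  # stands in for self._closing_task being set

    def reset_counts(self) -> None:
        # Mirrors resetting every counter together at each reset site
        for t in self.error_counts:
            self.error_counts[t] = 0

    def on_agent_state(self, state: str) -> None:
        # Mirrors the reset in _update_agent_state when the agent starts speaking
        if state == "speaking":
            self.reset_counts()

    def on_error(self, error: FakeError) -> None:
        if self.closed or error.recoverable:
            return
        if error.type in self.error_counts:
            self.error_counts[error.type] += 1
            if self.error_counts[error.type] <= self.max_unrecoverable_errors:
                return  # tolerated: application error handlers can still act
        # Unknown error types, or too many of a known type, close the session
        self.closed = True


if __name__ == "__main__":
    policy = ProposedErrorPolicy()
    for _ in range(3):
        policy.on_error(FakeError("stt_error"))
    print(policy.closed)  # False: three STT errors are tolerated

    policy.on_agent_state("speaking")  # agent spoke again, counters reset
    for _ in range(3):
        policy.on_error(FakeError("stt_error"))
    print(policy.closed)  # False

    policy.on_error(FakeError("stt_error"))
    print(policy.closed)  # True: fourth STT error since the last reset
```

Keeping the counters in a dict means a new error type only needs an entry in TOLERATED_TYPES; the three-attribute version in the proposal is equally valid and stays closer to the existing code.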
Environment
- livekit-agents (latest main)
- STT plugin: livekit-plugins-groq (wraps livekit-plugins-openai STT)
- Reproduced with any STT provider that returns HTTP 4xx errors after retries are exhausted