Interrupted speech dropped from conversation history when audio paused before first_frame_fut resolves #5038

@MonkeyLeeT

Bug Description

We're observing cases where an agent response is silently dropped from conversation history when resume_false_interruption=True. The response appears to be generated successfully (LLM and TTS both complete with cancelled=false), and LiveKit's own tracing captures an agent_turn span, but no conversation_item_added event is emitted and the agent never enters speaking state.

We've traced what we believe is the root cause through the framework code, but we may be misunderstanding part of the flow — happy to provide more details or logs if helpful.

What We Observed

Using livekit-agents 1.4.4, AssemblyAI STT with turn_detection="stt", Silero VAD, resume_false_interruption=True, false_interruption_timeout=2.0:

  • T+0.0s: Agent generation starts
  • T+0.25s: Agent → thinking (authorization gate begins)
  • T+1.7s: LLM completes (ttft=1.14s, cancelled=false)
  • T+1.9s: TTS completes (audio_duration=6.06s, chars=182, ttfb=0.27s, cancelled=false)
  • T+6.3s: Agent → listening (no speaking state in between)
  • LiveKit traces an 8s agent_turn span with interrupted=true
  • No conversation_item_added event is ever emitted
  • The response is permanently lost from conversation history

The user reported hearing part of the response, but it was never committed.

Our Theory of the Root Cause

We believe the sequence is:

  1. Authorization gate passes (user stops speaking), perform_audio_forwarding starts
  2. _audio_forwarding_task calls audio_output.resume() and begins capturing TTS frames
  3. A brief user sound triggers VAD → _interrupt_by_audio_activity → since resume_false_interruption=True, this calls audio_output.pause() instead of speech_handle.interrupt()
  4. playback_started event never fires (audio is paused) → first_frame_fut stays unresolved
  5. _on_first_frame callback never runs → agent never transitions to speaking
  6. A final transcript from the sound triggers on_final_transcript → _cancel_speech_pause → speech_handle.interrupt() (real interrupt)
  7. In the interrupted cleanup path of _pipeline_reply_task_impl:
forwarded_text = text_out.text if text_out else ""     # ← has the generated text
if speech_handle.interrupted:
    ...
    if audio_output is not None:
        audio_output.clear_buffer()
        playback_ev = await audio_output.wait_for_playout()
        if (
            audio_out is not None
            and audio_out.first_frame_fut.done()           # False — never resolved
            and not audio_out.first_frame_fut.cancelled()
        ):
            if playback_ev.synchronized_transcript is not None:
                forwarded_text = playback_ev.synchronized_transcript
        else:
            forwarded_text = ""                            # ← overwrites generated text

if forwarded_text:                                         # empty — skipped
    msg = chat_ctx.add_message(...)
    self._session._conversation_item_added(msg)            # ← never called

The else: forwarded_text = "" at line 2158 overwrites the generated text that was already assigned at line 2140. The message is never committed.

This same pattern appears in three places (lines 1850, 2158, 2643).
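
To make the branch behavior easier to check in isolation, here is a minimal standalone sketch of the logic quoted above. It is not the library's code: resolve_forwarded_text and the sample strings are hypothetical names we made up for illustration, and a plain asyncio future stands in for first_frame_fut.

```python
import asyncio
from typing import Optional


def resolve_forwarded_text(
    generated_text: str,
    first_frame_fut: "asyncio.Future[None]",
    synchronized_transcript: Optional[str],
) -> str:
    """Hypothetical stand-in for the interrupted branch quoted above."""
    forwarded_text = generated_text
    if first_frame_fut.done() and not first_frame_fut.cancelled():
        # Playback started: prefer the transcript synchronized to the audio.
        if synchronized_transcript is not None:
            forwarded_text = synchronized_transcript
    else:
        # Playback never started: the generated text is discarded here.
        forwarded_text = ""
    return forwarded_text


async def main() -> tuple:
    loop = asyncio.get_running_loop()
    generated = "Sure, I can help with that."

    # Case A: audio was paused before the first frame, so the future is unresolved.
    never_started = loop.create_future()
    dropped = resolve_forwarded_text(generated, never_started, None)
    never_started.cancel()  # tidy up the pending future

    # Case B: playback started, so the future resolved normally.
    started = loop.create_future()
    started.set_result(None)
    kept = resolve_forwarded_text(generated, started, "Sure, I can")

    return dropped, kept


dropped, kept = asyncio.run(main())
print(repr(dropped), repr(kept))
```

In case A the function returns an empty string even though text was generated, which matches the missing conversation_item_added we observed.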

Impact

  • Agent response is silently dropped with no error or warning logged
  • Conversation history becomes inconsistent — the user heard a response that doesn't exist in chat_ctx

Possible Fix

We think removing the else: forwarded_text = "" branch would fix this — when first_frame_fut didn't resolve, fall through with text_out.text. The message would still be committed with interrupted=True so callers know it wasn't fully played.

But we're not sure if there's a deeper reason the else branch exists (e.g. preventing double-commits or handling edge cases we haven't considered). Would love your input on the right approach.
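
For discussion, the change we have in mind can be sketched as a standalone function. This is only an illustration under our assumptions: commit_after_interrupt and CommittedMessage are hypothetical names, not part of livekit-agents, and the sketch says nothing about whether the else branch guards against cases we haven't considered.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommittedMessage:
    """Hypothetical stand-in for the message added to chat_ctx."""
    text: str
    interrupted: bool


def commit_after_interrupt(
    generated_text: str,
    first_frame_fut: "asyncio.Future[None]",
    synchronized_transcript: Optional[str],
) -> Optional[CommittedMessage]:
    forwarded_text = generated_text
    if first_frame_fut.done() and not first_frame_fut.cancelled():
        if synchronized_transcript is not None:
            forwarded_text = synchronized_transcript
    # No else branch: if playback never started, keep the generated text.
    if not forwarded_text:
        return None
    # Mark the message as interrupted so callers know it was not fully played.
    return CommittedMessage(text=forwarded_text, interrupted=True)


async def demo() -> Optional[CommittedMessage]:
    # The future is left unresolved to mimic audio paused before playback started.
    paused_before_playback = asyncio.get_running_loop().create_future()
    try:
        return commit_after_interrupt(
            "Sure, I can help with that.", paused_before_playback, None
        )
    finally:
        paused_before_playback.cancel()


message = asyncio.run(demo())
print(message)
```

With the future unresolved, the sketch still produces a message carrying the generated text and interrupted=True, rather than dropping it.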

Workaround

We're disabling resume_false_interruption on our end.
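
Concretely, the workaround is a one-flag change to the options listed under Environment. The sketch below keeps them in a plain dict; session_options is a hypothetical name, and we assume the values are passed as keyword arguments when constructing the session.

```python
# Options from our setup, with only resume_false_interruption changed.
# Assumption: these are forwarded as keyword arguments to the agent session.
session_options = {
    "turn_detection": "stt",
    "resume_false_interruption": False,  # was True when the drop occurred
    "false_interruption_timeout": 2.0,   # unchanged from our original setup
}

print(session_options)
```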

Environment

  • livekit-agents 1.4.4
  • AssemblyAI STT with turn_detection="stt"
  • Silero VAD
  • resume_false_interruption=True, false_interruption_timeout=2.0
