Fix playback flush and speech interruption races #1518

Merged: toubatbrian merged 10 commits into main from brian/fix-audio-interruption-playback-races on May 21, 2026
@toubatbrian toubatbrian commented May 15, 2026

Summary

Fixes an STT-deafness wedge that occurred when a handoff cascade fired three
_updateActivity calls in <10 ms (e.g. TaskGroup resolving while the next task
starts to greet). In that window, the caller's speech triggered
interruptByAudioActivity() during the thinking → speaking transition, and a
defensive wasAgentSpeaking guard skipped clearing isAgentSpeaking. That left
AudioRecognition holding every subsequent STT event in transcriptBuffer, so
the agent appeared dead for ~90 s until the false-interrupt timer eventually
unwedged it.
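
The wedge can be reproduced with a small, simplified model. The sketch below is not the code in agent_activity.ts or audio_recognition.ts: WedgeSketch, simulate, and the method names are hypothetical, and it assumes the hold flag is raised before agentState reaches 'speaking'. Only isAgentSpeaking, transcriptBuffer, and the idea of a guard on the speaking state come from this PR.

```typescript
// Simplified, hypothetical model of the STT-hold wedge (not the real classes).
type AgentState = 'listening' | 'thinking' | 'speaking';

class WedgeSketch {
  agentState: AgentState = 'thinking';
  isAgentSpeaking = false; // STT-hold flag
  transcriptBuffer: string[] = []; // events held while the flag is set
  delivered: string[] = []; // events that reached the turn logic

  // Assumption: the hold flag is raised while agentState is still 'thinking'.
  startSpeechBeforeStateChange(): void {
    this.isAgentSpeaking = true;
  }

  onSttEvent(text: string): void {
    if (this.isAgentSpeaking) {
      this.transcriptBuffer.push(text);
    } else {
      this.delivered.push(text);
    }
  }

  // Old behavior: the cleanup only runs if the state already says 'speaking'.
  interruptGuarded(): void {
    if (this.agentState === 'speaking') {
      this.isAgentSpeaking = false;
      this.agentState = 'listening';
    }
  }

  // New behavior: the cleanup runs whenever the pause path runs.
  interruptUnconditional(): void {
    this.isAgentSpeaking = false;
    this.agentState = 'listening';
  }
}

// Interrupt during the thinking -> speaking window, then receive one STT event.
function simulate(useFix: boolean): WedgeSketch {
  const sketch = new WedgeSketch();
  sketch.startSpeechBeforeStateChange();
  if (useFix) {
    sketch.interruptUnconditional();
  } else {
    sketch.interruptGuarded();
  }
  sketch.onSttEvent('hello?');
  return sketch;
}
```

In this model, simulate(false) leaves the event stuck in transcriptBuffer because the guard never fires, while simulate(true) delivers it.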

The fix aligns the JS implementation with the Python upstream
(livekit-agents/livekit/agents/voice/agent_activity.py) for the affected paths
and adds two narrow JS-only defenses.

agent_activity.ts — align with Python

  • interruptByAudioActivity() now always calls onEndOfAgentSpeech() and
    _updateAgentState('listening') whenever the pause path runs, instead of
    guarding on agentState === 'speaking'. This was the root cause of the wedge:
    during the thinking → speaking transition the guard was false and the
    STT-hold flag was never cleared. Python's
    _interrupt_by_audio_activity does this unconditionally.
  • onPipelineReplyDone() now also calls onEndOfAgentSpeech() and
    restoreInterruptionByAudioActivity() (matches Python's
    _on_pipeline_reply_done) as a safety net at the end of every reply task.
  • Tool-reply continuation now early-returns and cancels executeToolsTask when
    the speech handle is interrupted, matching Python's
    if speech_handle.interrupted: await cancel_and_wait(exe_task); return.
  • The 'thinking' transition is gated on !speechHandle.interrupted, matching
    Python's if not speech_handle.interrupted and len(tool_output.output) > 0.
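
The last two bullets can be sketched together as one control-flow fragment. This is an illustration under stated assumptions, not the code in agent_activity.ts: SpeechHandleLike, ToolTaskSketch, continueAfterTools, and the states array are hypothetical stand-ins, and only the two conditions mirror the Python snippets quoted above.

```typescript
// Hypothetical stand-ins for the speech handle and the tool-execution task.
interface SpeechHandleLike {
  interrupted: boolean;
}

class ToolTaskSketch {
  cancelled = false;

  // Stand-in for Python's cancel_and_wait(exe_task).
  async cancelAndWait(): Promise<void> {
    this.cancelled = true;
  }
}

// Runs after the reply that requested tools has finished playing out.
// `states` records every state transition this function requests.
async function continueAfterTools(
  speechHandle: SpeechHandleLike,
  exeTask: ToolTaskSketch,
  toolOutputs: string[],
  states: string[],
): Promise<void> {
  // Early return: an interrupted speech must not start a tool reply.
  if (speechHandle.interrupted) {
    await exeTask.cancelAndWait();
    return;
  }

  // The interrupted check is redundant after the early return in this sketch;
  // it is kept explicit to mirror the quoted Python condition.
  if (!speechHandle.interrupted && toolOutputs.length > 0) {
    states.push('thinking');
  }
}
```

With an interrupted handle the task is cancelled and no 'thinking' transition is recorded; with a live handle and at least one tool output, 'thinking' is recorded and the task is left alone.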

audio_recognition.ts — JS async-only fixes

  • onEndOfAgentSpeech() clears isAgentSpeaking = false before awaiting
    the interruption sentinel write. Python is unaffected because it uses
    send_nowait; in JS the awaitable interruptionStreamChannel.write can stall
    and let new STT events get buffered against an agent that has already
    finished speaking.
  • onEndOfOverlapSpeech() now uses a snapshot of ignoreUserTranscriptUntil
    captured before the in-function mutation. The previous condition compared the
    already-mutated value and so was unreachable, suppressing the overlap-end
    notification for the rest
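
The ordering issue in the first bullet can be shown with a minimal sketch. It assumes a write that never settles as a stand-in for a stalled interruptionStreamChannel.write; RecognitionSketch, stalledWrite, and both method names are hypothetical and do not reflect the real AudioRecognition class.

```typescript
// Hypothetical model: why the flag must be cleared before the awaited write.
class RecognitionSketch {
  isAgentSpeaking = true; // agent speech is in progress
  transcriptBuffer: string[] = [];
  delivered: string[] = [];
  writeSentinel: () => Promise<void>;

  constructor(writeSentinel: () => Promise<void>) {
    this.writeSentinel = writeSentinel;
  }

  onSttEvent(text: string): void {
    if (this.isAgentSpeaking) {
      this.transcriptBuffer.push(text);
    } else {
      this.delivered.push(text);
    }
  }

  // Old ordering: the flag stays set for as long as the write is pending.
  async endOfAgentSpeechAwaitFirst(): Promise<void> {
    await this.writeSentinel();
    this.isAgentSpeaking = false;
  }

  // New ordering: the flag is cleared synchronously, then the write is awaited.
  async endOfAgentSpeechClearFirst(): Promise<void> {
    this.isAgentSpeaking = false;
    await this.writeSentinel();
  }
}

// Stand-in for a stalled channel write: this promise never settles.
function stalledWrite(): Promise<void> {
  return new Promise<void>(() => {});
}
```

An async function runs synchronously up to its first await, so with stalledWrite the old ordering keeps buffering STT events, while the new ordering delivers them even though the write is still pending.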

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 84c7298

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 32 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Ensure agent speech end state is synchronized across VAD interruption, confirmed interruption, and pipeline completion paths so STT does not remain stuck buffering user speech as overlap. Add focused regressions for repeated VAD interruption, paused-speech interruption recovery, and pipeline completion cleanup.

@toubatbrian toubatbrian merged commit d05fe63 into main May 21, 2026
9 checks passed
@toubatbrian toubatbrian deleted the brian/fix-audio-interruption-playback-races branch May 21, 2026 05:05
@github-actions github-actions Bot mentioned this pull request May 21, 2026