Fix playback flush and speech interruption races #1518
Merged
🦋 Changeset detected. Latest commit: 84c7298. The changes in this PR will be included in the next version bump. This PR includes changesets to release 32 packages.
Ensure agent speech end state is synchronized across VAD interruption, confirmed interruption, and pipeline completion paths so STT does not remain stuck buffering user speech as overlap. Add focused regressions for repeated VAD interruption, paused-speech interruption recovery, and pipeline completion cleanup.
theomonnom approved these changes on May 21, 2026
Summary
Fixes an STT-deafness wedge that occurred when a handoff cascade fired three `_updateActivity` calls in <10 ms (e.g. `TaskGroup` resolving while the next task starts to greet). In that window, the caller's speech triggered `interruptByAudioActivity()` during the `thinking → speaking` transition, and a defensive `wasAgentSpeaking` guard skipped clearing `isAgentSpeaking`, leaving `AudioRecognition` permanently holding every subsequent STT event in `transcriptBuffer`. The agent appeared dead for ~90 s until the false-interrupt timer eventually unwedged it.
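
The wedge described above can be illustrated with a small model. This is only a sketch under stated assumptions: `RecognitionModel`, `interruptModel`, and `simulateInterrupt` are hypothetical names invented for illustration, and the real `AudioRecognition` and `AgentActivity` classes are more involved. The one behavior it tries to capture is that a cleanup guarded on `agentState === 'speaking'` is skipped when the interrupt lands while the state is still `'thinking'`.

```typescript
// Hypothetical, simplified model of the wedge. Not the real livekit classes.
type AgentState = 'listening' | 'thinking' | 'speaking';

class RecognitionModel {
  isAgentSpeaking = false;
  transcriptBuffer: string[] = []; // STT events held while the agent "speaks"
  delivered: string[] = []; // STT events passed through to the agent

  onStartOfAgentSpeech(): void {
    this.isAgentSpeaking = true;
  }

  onEndOfAgentSpeech(): void {
    this.isAgentSpeaking = false;
  }

  onSttEvent(text: string): void {
    if (this.isAgentSpeaking) {
      this.transcriptBuffer.push(text);
    } else {
      this.delivered.push(text);
    }
  }
}

// guarded = true models the old behavior, guarded = false the fixed one.
function interruptModel(
  rec: RecognitionModel,
  agentState: AgentState,
  guarded: boolean,
): AgentState {
  if (guarded && agentState !== 'speaking') {
    return agentState; // cleanup skipped, the hold flag stays set
  }
  rec.onEndOfAgentSpeech();
  return 'listening';
}

function simulateInterrupt(guarded: boolean): { held: number; delivered: number } {
  const rec = new RecognitionModel();
  // Assumption: the hold flag is set before the state reaches 'speaking'.
  rec.onStartOfAgentSpeech();
  const state: AgentState = 'thinking';
  interruptModel(rec, state, guarded);
  rec.onSttEvent('hello?');
  rec.onSttEvent('are you there?');
  return { held: rec.transcriptBuffer.length, delivered: rec.delivered.length };
}
```

In this model, `simulateInterrupt(true)` leaves both later STT events in the buffer, while `simulateInterrupt(false)` delivers them, which is the difference the unconditional cleanup is meant to make.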
The fix aligns the JS implementation with the Python upstream (`livekit-agents/livekit/agents/voice/agent_activity.py`) for the affected paths and adds two narrow JS-only defenses.
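
One of the JS-only defenses, covered in the `audio_recognition.ts` notes of this description, is about ordering: clear the flag before awaiting a write that may stall. The sketch below is a rough illustration of why the order matters in JS. `StallingChannel`, `OrderingModel`, and `heldWhileWriteStalls` are hypothetical names, not part of the codebase, and the channel is a stand-in for an awaitable write that has not yet been consumed.

```typescript
// Hypothetical stand-in for an awaitable channel write that stalls
// until a consumer drains it.
class StallingChannel {
  release: (() => void) | null = null;

  write(_msg: string): Promise<void> {
    return new Promise<void>((resolve) => {
      this.release = resolve;
    });
  }

  drain(): void {
    if (this.release) this.release();
  }
}

class OrderingModel {
  isAgentSpeaking = true;
  held: string[] = [];
  channel: StallingChannel;
  clearFirst: boolean;

  constructor(channel: StallingChannel, clearFirst: boolean) {
    this.channel = channel;
    this.clearFirst = clearFirst;
  }

  async onEndOfAgentSpeech(): Promise<void> {
    if (this.clearFirst) {
      this.isAgentSpeaking = false; // fixed order: clear before awaiting
    }
    await this.channel.write('agent-speech-ended');
    this.isAgentSpeaking = false; // old order: clear only after the write resolves
  }

  onSttEvent(text: string): void {
    if (this.isAgentSpeaking) this.held.push(text);
  }
}

// Counts STT events that get held while the write is still pending.
function heldWhileWriteStalls(clearFirst: boolean): number {
  const channel = new StallingChannel();
  const rec = new OrderingModel(channel, clearFirst);
  // The async body runs synchronously up to its first await.
  const pending = rec.onEndOfAgentSpeech();
  rec.onSttEvent('user speech arriving during the stalled write');
  channel.drain();
  void pending;
  return rec.held.length;
}
```

With the flag cleared only after the await, an event arriving during the stall is held; with the flag cleared first, it is not. A non-blocking send, as the description says Python uses, has no such window.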

`agent_activity.ts`: align with Python
- `interruptByAudioActivity()` now always calls `onEndOfAgentSpeech()` and `_updateAgentState('listening')` whenever the pause path runs, instead of guarding on `agentState === 'speaking'`. This was the root cause of the wedge: during the `thinking → speaking` transition the guard was false and the STT-hold flag was never cleared. Python's `_interrupt_by_audio_activity` does this unconditionally.
- `onPipelineReplyDone()` now also calls `onEndOfAgentSpeech()` and `restoreInterruptionByAudioActivity()` (matches Python's `_on_pipeline_reply_done`) as a safety net at the end of every reply task.
- `executeToolsTask` is cancelled and awaited when the speech handle is interrupted, matching Python's `if speech_handle.interrupted: await cancel_and_wait(exe_task); return`.
- The `'thinking'` transition is gated on `!speechHandle.interrupted`, matching Python's `if not speech_handle.interrupted and len(tool_output.output) > 0`.

`audio_recognition.ts`: JS async-only fixes
- `onEndOfAgentSpeech()` clears `isAgentSpeaking = false` before awaiting the interruption sentinel write. Python is unaffected because it uses `send_nowait`; in JS the awaitable `interruptionStreamChannel.write` can stall and let new STT events get buffered against an agent that has already finished speaking.
- `onEndOfOverlapSpeech()` now uses a snapshot of `ignoreUserTranscriptUntil` captured before the in-function mutation. The previous condition compared the
already-mutated value and so was unreachable, suppressing the overlap-end
notification for the rest