fix(voice): pause output when user starts speaking during thinking by longcw · Pull Request #5535 · livekit/agents

longcw · 2026-04-23T09:10:36Z

Summary

Fixes #5509. When the user starts a new turn while the agent is in the thinking state (LLM generating, TTS audio not yet flowing), the stale reply could still reach the speaking state and play over the new user turn. This change pauses the pausable audio output on VAD SOS (start of speech) before any TTS frame hits the transport, so the stale reply neither plays nor promotes to speaking.

Changes

livekit-agents/livekit/agents/voice/agent_activity.py

New on_start_of_speech path: when the agent is not speaking, pauses the output if it supports pause. Uses timeout=0 so a brief false-positive VAD resumes immediately on VAD EOS; _interrupt_by_audio_activity upgrades to the real timeout if VAD confirms active speech.
_update_paused_speech preserves the agent_state captured at the first pause across repeat calls for the same handle.
_PausedSpeechInfo carries handle + agent_state + timeout so resume restores the exact state (e.g. thinking, not speaking) that was active when the pause began.
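
The state-preservation behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in agent_activity.py: `ActivitySketch`, `PausedSpeechInfo`, `on_confirmed_speech`, and the string handle are hypothetical stand-ins, and the real methods take different arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PausedSpeechInfo:
    """Hypothetical stand-in for the PR's _PausedSpeechInfo."""
    handle: str        # identifies the speech that was paused
    agent_state: str   # state captured when the pause first began
    timeout: float     # resume timeout in seconds


class ActivitySketch:
    """Toy model of the pause bookkeeping (all names are assumptions)."""

    def __init__(self, can_pause: bool = True) -> None:
        self.can_pause = can_pause
        self.agent_state = "thinking"
        self.output_paused = False
        self.paused: Optional[PausedSpeechInfo] = None

    def update_paused_speech(self, handle: str, timeout: float) -> None:
        if self.paused is not None and self.paused.handle == handle:
            # Repeat call for the same handle: keep the agent_state from
            # the first pause and only update the timeout.
            self.paused.timeout = timeout
            return
        self.paused = PausedSpeechInfo(handle, self.agent_state, timeout)
        self.output_paused = True

    def on_start_of_speech(self, handle: str) -> None:
        # New path: pause before any audio flows, and yield to an
        # existing pause if there is one.
        if self.agent_state != "speaking" and self.can_pause and self.paused is None:
            self.update_paused_speech(handle, timeout=0.0)

    def on_confirmed_speech(self, handle: str, real_timeout: float) -> None:
        # Stand-in for the upgrade to the real timeout once VAD confirms speech.
        if self.can_pause:
            self.update_paused_speech(handle, real_timeout)

    def resume(self) -> Optional[str]:
        # Restore the exact state captured at the first pause.
        if self.paused is None:
            return None
        restored = self.paused.agent_state
        self.agent_state = restored
        self.paused = None
        self.output_paused = False
        return restored
```

In this sketch, a pause that begins in thinking resumes into thinking even if the timeout was upgraded in between.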

Test helpers (tests/fake_*.py)

FakeAudioOutput(can_pause=True) now supports pause.
FakeUserSpeech with empty transcript fires VAD SOS/EOS only (no STT events) — simulates sub-min_duration noise.

Known limitation

With timeout=0 on the new pause, if the user's speech is shorter than min_interruption_duration but STT still produces a final transcript that arrives after VAD EOS, the paused agent speech briefly resumes (timer fires on EOS) and then gets interrupted when the STT transcript lands — so the user hears a short snippet of the stale reply before it cuts off.
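
The timing of this race can be illustrated with a small calculation. This is only a sketch under simplifying assumptions: `audible_blip` is a hypothetical helper, the resume is modeled as firing exactly `resume_timeout` seconds after VAD EOS, and the interrupt is modeled as landing exactly when the final transcript arrives.

```python
def audible_blip(vad_eos_at: float, stt_final_at: float, resume_timeout: float) -> float:
    """Seconds of stale audio heard between the resume and the interrupt.

    All times are in seconds on a shared clock. This is an illustrative
    model, not part of the library.
    """
    resume_at = vad_eos_at + resume_timeout
    if stt_final_at <= resume_at:
        # The transcript arrives before the resume fires, so nothing plays.
        return 0.0
    return stt_final_at - resume_at


# timeout=0: resume fires on EOS, transcript lands half a second later.
print(audible_blip(vad_eos_at=0.25, stt_final_at=0.75, resume_timeout=0.0))
# A longer (hypothetical) timeout would let the transcript win the race.
print(audible_blip(vad_eos_at=0.25, stt_final_at=0.75, resume_timeout=2.0))
```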

…peaking

…5509) When the user starts a new turn while the agent is in thinking (LLM generating, TTS audio not yet flowing), pause the pausable audio output on VAD SOS so no stale TTS frames reach the transport and the stale reply never promotes to speaking.

- New on_start_of_speech path pauses the output when agent_state is not "speaking" and the output supports pause. Uses timeout=0 so a brief false-positive VAD resumes immediately on VAD EOS.
- _update_paused_speech preserves the agent_state captured at first pause across repeat calls (Path A may re-call for the same handle with a real timeout; Path B yields to an existing pause via a call-site guard).
- _pause_enabled() centralizes the resume_false_interruption / false_interruption_timeout / can_pause gate.

Test helpers:
- FakeAudioOutput gains optional can_pause with a virtual playout clock (_started_at / _paused_at / _total_paused) so flush completion is deferred on pause and rescheduled on resume; played_duration is accurate for clear_buffer's playback_position.
- FakeUserSpeech with an empty transcript now fires VAD SOS/EOS only (no STT events), simulating sub-min_duration noise.
- create_session gains can_pause_audio passthrough.

Tests:
- test_interrupt_before_speaking_with_pausable_audio: #5509 regression. No speaking transition, playback_finished interrupted with playback_position=0, stale reply dropped from chat_ctx.
- test_false_interruption_before_speaking_resumes: brief VAD-only noise during thinking pauses then resumes on VAD EOS; speaking fires at ~3.8s (postponed) and playback completes without interruption.
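
The virtual playout clock mentioned for FakeAudioOutput could look something like the sketch below. Only the three field names (_started_at, _paused_at, _total_paused) come from the commit message; the `PlayoutClock` class, its methods, and the injectable `now` callable are assumptions made for illustration, and the real test helper is structured differently.

```python
from typing import Callable, Optional


class PlayoutClock:
    """Tracks how much audio has actually played, excluding paused time."""

    def __init__(self, now: Callable[[], float]) -> None:
        self._now = now  # injectable time source, in seconds
        self._started_at: Optional[float] = None
        self._paused_at: Optional[float] = None
        self._total_paused = 0.0

    def start(self) -> None:
        if self._started_at is None:
            self._started_at = self._now()

    def pause(self) -> None:
        # Ignore a pause before playback started or while already paused.
        if self._started_at is not None and self._paused_at is None:
            self._paused_at = self._now()

    def resume(self) -> None:
        if self._paused_at is not None:
            self._total_paused += self._now() - self._paused_at
            self._paused_at = None

    def played_duration(self) -> float:
        if self._started_at is None:
            return 0.0
        # While paused, time stops advancing at the pause point.
        end = self._paused_at if self._paused_at is not None else self._now()
        return end - self._started_at - self._total_paused
```

Injecting the time source keeps a test deterministic: the test advances a fake clock instead of sleeping.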

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

miguelmoralai · 2026-04-23T09:46:33Z

Nice approach — pausing sidesteps the idle risk entirely, no backstop needed.

One thought on the known limitation (short utterance → brief audio blip before cut-off): the resume on VAD EOS could be gated on whether the current STT interim has non-empty content. Empty interim → blip was noise, safe to resume. Non-empty interim → real speech, skip resume and let the interrupt path handle it. That's the heuristic we landed on for our own workaround and it eliminated that race cleanly.
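
For concreteness, the gating heuristic suggested in this comment might be sketched as below. `should_resume_on_vad_eos` is a hypothetical helper, not an API of livekit-agents, and it assumes the latest interim transcript is available as a plain string at VAD EOS.

```python
def should_resume_on_vad_eos(latest_interim: str) -> bool:
    """Return True when the paused reply may resume after VAD EOS.

    Empty (or whitespace-only) interim text is treated as noise; any
    other text is treated as real speech, so the resume is skipped and
    the interrupt path is left to handle it.
    """
    return latest_interim.strip() == ""
```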

chenghao-mou

lgtm. It worked well when I spoke during its thinking state.

toubatbrian · 2026-04-24T01:41:03Z

This is an automated Claude Code Routine created by @toubatbrian. It is currently in an experimental stage. The automation will start porting this PR into agents-js automatically.

Tracking: this core-runtime fix (voice/agent_activity.py — pause output when user starts speaking during thinking) qualifies for porting. A corresponding PR will be opened in livekit/agents-js shortly.

Generated by Claude Code

longcw · 2026-04-24T01:42:55Z

@miguelmoralai

the resume on VAD EOS could be gated on whether the current STT interim has non-empty content.

Thanks for the advice. Yes, there is already interruption logic on the interim transcript: if a non-empty interim transcript is received, the resume timeout is updated to the default value (e.g. 2 seconds).
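
A rough model of that behavior, for readers following the thread: the `ResumeTimerSketch` class and its method names are hypothetical, and measuring the deadline from VAD EOS is an assumption rather than something confirmed by the PR.

```python
from typing import Optional


class ResumeTimerSketch:
    """Toy model: a non-empty interim upgrades the resume timeout."""

    def __init__(self, default_timeout: float = 2.0) -> None:
        self.default_timeout = default_timeout
        self.timeout = 0.0  # the new pause path starts with timeout=0
        self.eos_at: Optional[float] = None

    def on_vad_eos(self, now: float) -> None:
        self.eos_at = now

    def on_interim_transcript(self, text: str) -> None:
        # Only a non-empty interim counts as evidence of real speech.
        if text.strip():
            self.timeout = self.default_timeout

    def resume_deadline(self) -> Optional[float]:
        if self.eos_at is None:
            return None
        return self.eos_at + self.timeout
```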

longcw added 3 commits April 23, 2026 09:53

wip

ea7dc8b

Merge remote-tracking branch 'origin/main' into longc/pause-on-user-s…

8041675

…peaking

chenghao-mou requested a review from a team April 23, 2026 09:10

devin-ai-integration Bot reviewed Apr 23, 2026

chenghao-mou approved these changes Apr 23, 2026

longcw merged commit 63fc7fc into main Apr 24, 2026
26 checks passed

longcw deleted the longc/pause-on-user-speaking branch April 24, 2026 01:40

toubatbrian mentioned this pull request Apr 24, 2026

docs(voice): scaffold resume_false_interruption port (tracks livekit/agents#5535) livekit/agents-js#1310

Closed

longcw merged 3 commits into main from longc/pause-on-user-speaking

4 participants
