fix(voice): wait_for_playout resolves on interrupt instead of deadlocking#5700

Open
MacFall7 wants to merge 1 commit into livekit:main from MacFall7:fix/5359-wait-for-playout-interruption


@MacFall7

PR Description: fix(voice): wait_for_playout resolves on interrupt instead of deadlocking

Closes #5359.

Summary

SpeechHandle.wait_for_playout() and RunContext.wait_for_playout() awaited only _done_fut (or _wait_for_generation). When interrupt() fired, _interrupt_fut resolved but the wait did not, so callers blocked until the 5s INTERRUPTION_TIMEOUT hard-killed the surrounding tasks. In a tool-preamble flow, that meant scheduling paused, the worker drained, and subsequent user input was dropped.
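To make the bug shape concrete, here is a minimal asyncio sketch of the pre-fix behaviour. `ToyHandle` and `wait_for_playout_old` are hypothetical stand-ins, not the livekit-agents classes; only the two futures named above are modelled, and a 0.1s `asyncio.wait_for` stands in for the 5s INTERRUPTION_TIMEOUT.

```python
import asyncio


class ToyHandle:
    """Hypothetical stand-in for SpeechHandle; models only the two futures."""

    def __init__(self) -> None:
        loop = asyncio.get_running_loop()
        self._done_fut: asyncio.Future[None] = loop.create_future()
        self._interrupt_fut: asyncio.Future[None] = loop.create_future()

    def interrupt(self) -> None:
        # Resolves only the interrupt future; playout is not marked done.
        if not self._interrupt_fut.done():
            self._interrupt_fut.set_result(None)

    async def wait_for_playout_old(self) -> None:
        # Pre-fix shape: waits on the completion future alone.
        await asyncio.shield(self._done_fut)


async def demo() -> bool:
    handle = ToyHandle()
    handle.interrupt()
    try:
        # Stand-in for INTERRUPTION_TIMEOUT, shortened so the demo is fast.
        await asyncio.wait_for(handle.wait_for_playout_old(), timeout=0.1)
    except asyncio.TimeoutError:
        return True  # blocked until the timeout despite the interrupt
    return False


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the interrupt resolves a different future from the one being awaited, the wait only ends when the outer timeout fires.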

This PR races the existing wait against _interrupt_fut in both paths, using the same primitive SpeechHandle.wait_if_not_interrupted() already uses. The wait now returns as soon as either future resolves. Callers that need to distinguish completion from interruption read speech_handle.interrupted on return.
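A sketch of the race described above, assuming a handle that exposes `_done_fut` and `_interrupt_fut` as plain asyncio futures. The class is a hypothetical stand-in; the real SpeechHandle code may differ in detail. Both futures are wrapped in `asyncio.shield` so that cancelling the losing waiter never cancels the handle's own state.

```python
import asyncio


class ToyHandle:
    """Hypothetical stand-in for SpeechHandle; not the livekit-agents class."""

    def __init__(self) -> None:
        loop = asyncio.get_running_loop()
        self._done_fut: asyncio.Future[None] = loop.create_future()
        self._interrupt_fut: asyncio.Future[None] = loop.create_future()

    @property
    def interrupted(self) -> bool:
        return self._interrupt_fut.done()

    def interrupt(self) -> None:
        if not self._interrupt_fut.done():
            self._interrupt_fut.set_result(None)

    def mark_done(self) -> None:
        if not self._done_fut.done():
            self._done_fut.set_result(None)

    async def wait_for_playout(self) -> None:
        # Race completion against interruption; whichever resolves first wins.
        # Shielding means cancelling the loser only discards the wrapper.
        waiters = {asyncio.shield(self._done_fut), asyncio.shield(self._interrupt_fut)}
        _, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
        for waiter in pending:
            waiter.cancel()


async def demo() -> tuple[bool, bool]:
    interrupted = ToyHandle()
    interrupted.interrupt()
    await asyncio.wait_for(interrupted.wait_for_playout(), timeout=1.0)

    completed = ToyHandle()
    completed.mark_done()
    await asyncio.wait_for(completed.wait_for_playout(), timeout=1.0)

    # Callers tell the two outcomes apart by reading .interrupted on return.
    return interrupted.interrupted, completed.interrupted
```

Shielding is a choice made in this sketch: it keeps `_done_fut` pending after an interrupt, so a later playout completion can still resolve it.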

Deviation from #5359's proposed solution (read this first)

The issue author proposed raising InterruptedError when the interrupt wins the race. I started there, and the test suite caught a real problem: the SDK already has internal callers (notably agent_activity.py:2079) that await speech_handle.interrupt() and rely on wait_for_playout returning so that cleanup can proceed. Raising broke seven unrelated tests, all of them covering correct interrupt, handoff, and drain handling.

I switched to a return-early semantic for three reasons:

  1. Contract consistency. The existing primitive SpeechHandle.wait_if_not_interrupted() (from which this fix lifts the race pattern) returns rather than raising. Both wait_for_playout paths now match it.
  2. Backward compatibility. Every existing caller of wait_for_playout keeps working unchanged on the non-interrupt path. The interrupt path was previously a 5s deadlock; it is now an immediate return. No public signature changes.
  3. Opt-in raising is a two-line check. A tool author who wants raise-on-interrupt writes:
    await ctx.wait_for_playout()
    if ctx.speech_handle.interrupted:
        raise InterruptedError("...")
    That keeps the framework default conservative and lets each call site decide its own error semantics.

If the maintainers prefer raising as the default, I am happy to follow that direction as a separate PR with the necessary callsite adjustments inside the SDK. I think return-early is the right call here, but the choice is reasonable either way and I do not want to assume.

Changes

  • livekit-agents/livekit/agents/voice/speech_handle.py: wait_for_playout() now races _done_fut against _interrupt_fut with asyncio.wait(FIRST_COMPLETED). It cancels the losing future and returns either way; self.interrupted is true if the interrupt won.
  • livekit-agents/livekit/agents/voice/events.py: RunContext.wait_for_playout() applies the same race against _interrupt_fut.
  • tests/test_speech_handle_interruption.py: three deterministic unit tests, ~0.35s total runtime. Two cover the regression: SpeechHandle.wait_for_playout and RunContext.wait_for_playout both resolve under their 1.0s timeout when _cancel() fires. The third covers the no-interrupt case to confirm the happy path is unchanged.
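The two regression tests and the happy-path test could take roughly the following shape. This is a hypothetical sketch in plain asyncio, not the PR's actual test file: `wait_for_playout` here is a free function over two futures, and `loop.call_later` plays the role of `_cancel()` firing mid-wait.

```python
import asyncio


async def wait_for_playout(done: asyncio.Future, interrupt: asyncio.Future) -> None:
    """Hypothetical free-function version of the patched wait."""
    waiters = {asyncio.shield(done), asyncio.shield(interrupt)}
    _, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    for waiter in pending:
        waiter.cancel()


async def test_resolves_on_interrupt() -> None:
    loop = asyncio.get_running_loop()
    done, interrupt = loop.create_future(), loop.create_future()
    # Fire the interrupt shortly after the wait starts, mimicking _cancel().
    loop.call_later(0.05, interrupt.set_result, None)
    # Pre-fix code would exceed this 1.0s budget and raise TimeoutError.
    await asyncio.wait_for(wait_for_playout(done, interrupt), timeout=1.0)
    assert interrupt.done() and not done.done()


async def test_happy_path_unchanged() -> None:
    loop = asyncio.get_running_loop()
    done, interrupt = loop.create_future(), loop.create_future()
    # Playout completes normally; no interrupt is ever fired.
    loop.call_later(0.05, done.set_result, None)
    await asyncio.wait_for(wait_for_playout(done, interrupt), timeout=1.0)
    assert done.done() and not interrupt.done()


async def main() -> bool:
    await test_resolves_on_interrupt()
    await test_happy_path_unchanged()
    return True


if __name__ == "__main__":
    print(asyncio.run(main()))
```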

The diff is intentionally minimal. No public signatures changed. INTERRUPTION_TIMEOUT is unchanged: it remains the right backstop for cases where the race itself misbehaves.

Test plan

  • New unit tests pass on this branch and fail on baseline main with TimeoutError (confirmed via a stash/pop cycle). The two regression tests are the bug-shape proof.
  • ruff check and ruff format clean.
  • Full pytest suite green on this branch except for two pre-existing failures (test_events_and_metrics, test_aclose_handles_precancelled_tasks_gracefully) that also fail on the stashed baseline and are unrelated to this change. Happy to triage either as a separate issue if it would be useful.

Notes for reviewers

  • The primitive being lifted (wait_if_not_interrupted) is in the same file at line 201. I cited it in the implementation rather than refactoring into a shared helper, to keep the diff scoped. Happy to refactor if you prefer a shared private helper.
  • The __await__ path on SpeechHandle (which makes await speech_handle work) routes through wait_for_playout, so this fix incidentally makes await speech_handle interruptible as well. That matches expectations, and it is what agent_activity.py:2079 was already implicitly relying on.
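A sketch of how `__await__` can delegate to `wait_for_playout`, so that `await handle` inherits the race. The class is a hypothetical stand-in, and the assumption that awaiting the handle returns the handle itself is made for illustration only.

```python
import asyncio
from collections.abc import Generator
from typing import Any


class ToyHandle:
    """Hypothetical stand-in; the real SpeechHandle internals may differ."""

    def __init__(self) -> None:
        loop = asyncio.get_running_loop()
        self._done_fut: asyncio.Future[None] = loop.create_future()
        self._interrupt_fut: asyncio.Future[None] = loop.create_future()

    @property
    def interrupted(self) -> bool:
        return self._interrupt_fut.done()

    def interrupt(self) -> None:
        if not self._interrupt_fut.done():
            self._interrupt_fut.set_result(None)

    async def wait_for_playout(self) -> None:
        waiters = {asyncio.shield(self._done_fut), asyncio.shield(self._interrupt_fut)}
        _, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
        for waiter in pending:
            waiter.cancel()

    def __await__(self) -> Generator[Any, None, "ToyHandle"]:
        # `await handle` routes through wait_for_playout, so it inherits the race.
        async def _wait() -> "ToyHandle":
            await self.wait_for_playout()
            return self

        return _wait().__await__()


async def demo() -> bool:
    handle = ToyHandle()
    asyncio.get_running_loop().call_later(0.05, handle.interrupt)
    # Without the race, this await would hit the 1.0s timeout.
    result = await asyncio.wait_for(handle, timeout=1.0)
    return result is handle and handle.interrupted
```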

`SpeechHandle.wait_for_playout` and `RunContext.wait_for_playout`
awaited only on the playout-completion future, so callers blocked
until the 5s `INTERRUPTION_TIMEOUT` fired when an interrupt arrived
mid-playout (livekit#5359).

Race the playout future against `_interrupt_fut` using the same
`asyncio.wait(FIRST_COMPLETED)` primitive already used by
`SpeechHandle.wait_if_not_interrupted`. The wait now returns promptly
on interrupt; callers inspect `speech_handle.interrupted` to decide
how to proceed. Existing internal `await speech_handle.interrupt()`
flows that wait for cleanup keep working unchanged.

Closes livekit#5359

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CLAassistant

CLAassistant commented May 10, 2026

CLA assistant check
All committers have signed the CLA.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


Review comment on this diff hunk:

    - await asyncio.shield(self._done_fut)
    + await self._race_against_interrupt(self._done_fut)
Contributor


We cannot resolve the done_fut on interrupt, because the audio output may not have stopped when the interrupt is set, especially in the avatar use case. There is already a timeout in _cancel that resolves the done future after a timeout and cancels the tasks arbitrarily.
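A toy illustration of the reviewer's concern, using an invented audio sink (`fake_playout`, hypothetical) that keeps playing for 0.2s after the interrupt is set. The race returns immediately while the playout task is still running; the done future only resolves once the sink actually finishes.

```python
import asyncio


async def fake_playout(done: asyncio.Future, seconds: float) -> None:
    """Hypothetical audio sink that keeps playing briefly after an interrupt."""
    await asyncio.sleep(seconds)  # e.g. an avatar still flushing buffered frames
    if not done.done():
        done.set_result(None)


async def demo() -> tuple[bool, bool]:
    loop = asyncio.get_running_loop()
    done, interrupt = loop.create_future(), loop.create_future()
    playout = asyncio.create_task(fake_playout(done, 0.2))

    interrupt.set_result(None)  # interrupt is set, but audio has not stopped
    waiters = {asyncio.shield(done), asyncio.shield(interrupt)}
    _, pending = await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
    for waiter in pending:
        waiter.cancel()

    still_playing = not playout.done()  # early return: sink is still active
    await playout
    return still_playing, done.done()
```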

@MacFall7
Author

MacFall7 commented May 11, 2026 via email


Development

Successfully merging this pull request may close these issues.

SpeechHandle.wait_for_playout() ignores interruption, causing 5s deadlock during tool preamble

3 participants