docs(voice): scaffold resume_false_interruption port (tracks livekit/agents#5535) #1310
cc @claude rebased on top of the resume false interruption branch. Adjust your change accordingly.
// default to null as None, which maps to the default provider tool choice value
private toolChoice: ToolChoice | null = null;
// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 158 line
🟡 Ref comment format deviates from CLAUDE.md template (line vs lines)
The // Ref: comment on line 205 uses - 158 line (singular) instead of - 158 lines (plural). CLAUDE.md specifies the template as // Ref: python <relative-file-path> - <line-range> lines and both examples in the doc use lines (plural). All other six Ref comments in this PR correctly use lines.
- // Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 158 line
+ // Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 158 lines
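The template in the review comment can be checked mechanically. The following is a minimal sketch of such a check, not an existing lint rule in the repo; the function name `isValidRefComment` and the accepted line-range shape (a single number or `N-M`) are assumptions made for illustration.

```typescript
// Hypothetical checker for the CLAUDE.md template:
//   // Ref: python <relative-file-path> - <line-range> lines
// Assumption: <line-range> is either a single number or "N-M".
const REF_COMMENT = /^\/\/ Ref: python (\S+) - (\d+(?:-\d+)?) lines$/;

function isValidRefComment(comment: string): boolean {
  // Trim so indentation in the source file does not affect the result.
  return REF_COMMENT.test(comment.trim());
}

const flagged =
  '// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 158 line';
const suggested =
  '// Ref: python livekit-agents/livekit/agents/voice/agent_activity.py - 158 lines';

console.log(isValidRefComment(flagged)); // false (singular "line")
console.log(isValidRefComment(suggested)); // true
```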
fc76cb3 to e77c9d0
Summary
Automated Claude Code port of Python PR livekit/agents#5535 — "fix(voice): pause output when user starts speaking during thinking".
cc @toubatbrian @livekit/agent-devs
What the Python PR does (context for reviewers)
Fixes livekit/agents#5509. When a user starts a new turn while the agent is still `thinking` (LLM generating, TTS audio not yet flowing), the stale reply could still reach `speaking` and play over the new user turn.
Key Python changes in agent_activity.py:
- `_PausedSpeechInfo` dataclass: carries `handle + agent_state + timeout`. The captured `agent_state` is restored on resume (instead of always `speaking`), so a pause that began during `thinking` resumes to `thinking`.
- `on_start_of_speech` pause path: when `agent_state != "speaking"` and pause is enabled, pause the output with `timeout=0`. A brief false-positive VAD then resumes immediately on VAD EOS; `_interrupt_by_audio_activity` upgrades the timeout to the real `false_interruption_timeout` once VAD confirms active speech.
- `_update_paused_speech(handle, timeout)` and `_pause_enabled()`.
- `_paused_speech: SpeechHandle | None` → `_PausedSpeechInfo | None`: all call sites updated (`_start_false_interruption_timer`, `_cancel_speech_pause`, `on_end_of_speech`, `on_interim_transcript`, `on_final_transcript`).
- `FakeAudioOutput(can_pause=True)` supports pause; `FakeUserSpeech` with empty `transcript` fires VAD SOS/EOS only (no STT events) to simulate sub-`min_duration` noise. Two new regression tests: `test_interrupt_before_speaking_with_pausable_audio` and `test_false_interruption_before_speaking_resumes`.
What this JS PR ports
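As a rough picture of the scaffolding described in this section, the interface and field could look like the sketch below. It is not the code from the PR: the `SpeechHandle` stub, the subset of `AgentState` members, the `number | null` timeout type, and the `hasPausedSpeech()` helper are all assumptions made so the snippet compiles on its own.

```typescript
// Illustrative sketch only; the real definitions live in agent_activity.ts and ./events.js.

// Assumed subset of agent states, enough to show the shape.
type AgentState = 'listening' | 'thinking' | 'speaking';

// Hypothetical stand-in for the real SpeechHandle class.
interface SpeechHandle {
  id: string;
}

// Mirrors the Python _PausedSpeechInfo dataclass: handle + agent_state + timeout.
interface PausedSpeechInfo {
  handle: SpeechHandle;
  // State captured at pause time, to be restored on resume.
  agentState: AgentState;
  // Assumed type; the Python side uses 0 and false_interruption_timeout.
  timeout: number | null;
}

class AgentActivityScaffold {
  // Declared so a later PR can reuse the field instead of introducing it.
  private _pausedSpeech: PausedSpeechInfo | null = null;

  // Hypothetical helper, present only so the field is read somewhere in this sketch.
  hasPausedSpeech(): boolean {
    return this._pausedSpeech !== null;
  }
}
```

Nothing in the sketch assigns `_pausedSpeech`, which matches the structural-only scope: the field exists, but no code path pauses or resumes yet.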
Because the base feature isn't in TS yet, this PR is structural only — no behavioral change:
- `PausedSpeechInfo` interface added in `agent_activity.ts` with full doc and a Python ref comment pointing to the Python dataclass.
- `_pausedSpeech: PausedSpeechInfo | null = null` field added on `AgentActivity` so future wiring is a field-reuse rather than an API introduction.
- `AgentState` type import from `./events.js` (matches the Python `AgentState` import added in the fix).
- `onStartOfSpeech` → Python lines 1665–1683 (the new pause path, this PR's core change)
- `interruptByAudioActivity` → Python lines 1615–1645 (pauseEnabled / updatePausedSpeech / AgentFalseInterruptionEvent / state transition)
- `onEndOfSpeech` → Python lines 1707–1708 (timer start using `pausedSpeech.timeout`)
- `onFinalTranscript` → Python lines 1800–1811 (resume-timer + cancel-speech-pause task)
- … (`patch`) describing the scaffolding.
Implementation nuances: why no full-feature port
The JS repo already defines the option types (`turn_config/interruption.ts`), but greps show they are never read in `agents/src/voice/**`. The implementation is missing:
- No `_pausedSpeech`, `_falseInterruptionTimer`, or `_cancelSpeechPauseTask` state on `AgentActivity`.
- No `_startFalseInterruptionTimer`, `_cancelSpeechPause`, `_pauseEnabled`, or `_updatePausedSpeech` methods.
- `interruptByAudioActivity` hard-interrupts via `this._currentSpeech.interrupt()` instead of pausing.
- No `AgentFalseInterruptionEvent` emitter (not defined in `events.ts`).
- `FakeAudioOutput` (JS testing mock) doesn't currently support `canPause=true`.
Context: `livekit/agents-js#843` ("OTEL logging integration & System-wise Traces") explicitly lists `resume_agent_activity` / `pause_agent_activity` as "Pause/resume not in TS" under Python-specific features.
Porting all of the above alongside PR #5535's fix in a single automated pass would materially expand the change surface (multi-hundred-line addition across `agent_activity.ts`, `events.ts`, `agent_session.ts`, fake I/O, tests) and was judged out-of-scope for an automated routine. Instead this PR:
- … `TODO(port-resume-false-interruption)` or `livekit/agents#5535` to find every site.
Follow-up (suggested scope for the full port)
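To make the scope of a full port more concrete, the pause flow summarized in the Python section can be sketched in TypeScript roughly as follows. This is a simplified model under stated assumptions, not the planned implementation: `PauseFlowSketch`, the string handle, `timeoutMs`, and the `log` array are hypothetical, and real timers, audio output, and event emission are left out.

```typescript
// Simplified model of the pause/resume flow; all names here are hypothetical.
type AgentState = 'listening' | 'thinking' | 'speaking';

interface PausedSpeechInfo {
  handle: string; // stand-in for a SpeechHandle
  agentState: AgentState; // state captured when the pause began
  timeoutMs: number; // 0 = resume as soon as VAD reports end of speech
}

class PauseFlowSketch {
  agentState: AgentState = 'listening';
  paused: PausedSpeechInfo | null = null;
  log: string[] = [];
  private falseInterruptionTimeoutMs: number;
  private canPause: boolean;

  constructor(falseInterruptionTimeoutMs: number, canPause: boolean) {
    this.falseInterruptionTimeoutMs = falseInterruptionTimeoutMs;
    this.canPause = canPause;
  }

  private pauseEnabled(): boolean {
    return this.canPause;
  }

  // VAD start of speech: if the agent is not yet speaking, pause with timeout 0.
  onStartOfSpeech(handle: string): void {
    if (this.agentState !== 'speaking' && this.pauseEnabled()) {
      this.paused = { handle, agentState: this.agentState, timeoutMs: 0 };
      this.log.push('pause');
    }
  }

  // VAD confirmed active speech: upgrade to the real false-interruption timeout.
  interruptByAudioActivity(): void {
    if (this.paused !== null) {
      this.paused = { ...this.paused, timeoutMs: this.falseInterruptionTimeoutMs };
      this.log.push('upgrade-timeout');
    }
  }

  // VAD end of speech: report the delay a resume timer would be started with.
  onEndOfSpeech(): number | null {
    return this.paused === null ? null : this.paused.timeoutMs;
  }

  // Resume restores the captured state instead of always going to 'speaking'.
  resume(): void {
    if (this.paused === null) return;
    this.agentState = this.paused.agentState;
    this.paused = null;
    this.log.push('resume');
  }
}
```

In this model, a pause that begins during `thinking` returns to `thinking` on resume, and a short VAD blip that never reaches `interruptByAudioActivity` keeps the timeout at 0.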
A future PR should:
- Add `AgentFalseInterruptionEvent` in `events.ts` and wire it through `AgentSession`.
- Add to `AgentActivity`: timer handles, `pauseEnabled()`, `updatePausedSpeech()`, `startFalseInterruptionTimer()`, `cancelSpeechPause()`.
- Update `interruptByAudioActivity` for the pause path when `pauseEnabled()` is true.
- Port the `onStartOfSpeech` pause path (the fix introduced by #5535).
- Extend `FakeAudioOutput` with `canPause`/pause bookkeeping mirroring `tests/fake_io.py`.
- Port the two regression tests (`test_interrupt_before_speaking_with_pausable_audio`, `test_false_interruption_before_speaking_resumes`).
Verification
- `pnpm --filter @livekit/agents build`: passes.
- `pnpm --filter @livekit/agents lint`: no new errors (pre-existing warnings only).
- `pnpm format:check`: clean.
- `pnpm --filter @livekit/agents exec vitest run src/voice/agent_activity.test.ts`: 8/8 passed.
- `pnpm api:check`: pre-existing api-extractor failure on main (`export * as ___` not supported), unrelated to this change.
Provenance
Generated by Claude Code