Describe the bug
When preemptiveGeneration: true is enabled and a user produces an end-of-utterance (EOU) during a function tool's execution window, the framework starts a new generateReply (via the preemptive path) before the tool result has been added to the chat context. The new generation runs to completion against stale chat context, the LLM commonly hallucinates the tool's outcome, and its TTS plays in full alongside the legitimate post-tool reply. The user hears two similar messages back-to-back about the same outcome, or the tool ends up being called a second time.
Where it happens (agents/src/voice/agent_activity.ts on main):
// onPreemptiveGeneration, line 1291
if (
  !preemptiveOpts.enabled ||
  this.schedulingPaused ||
  (this._currentSpeech !== undefined && !this._currentSpeech.interrupted) ||
  !(this.llm instanceof LLM)
) {
  return;
}
The early-return guard checks _currentSpeech, but _currentSpeech is cleared in the main loop between LLM-stream-end and tool execution:
// mainTask, ~line 1396
this._currentSpeech = speechHandle;
speechHandle._authorizeGeneration();
await speechHandle.waitIfNotInterrupted([speechHandle._waitForGeneration()]);
this._currentSpeech = undefined; // ← cleared before pipelineReplyTask runs the tool
So while a function tool is executing in a separate task, there is no _currentSpeech, and a user EOU triggers a fresh preemptive generation that has no awareness of the in-flight tool.
The Python framework has a partial guard via _new_turns_blocked, set in agent_session.py:1259 during update_agent, which covers the handoff sub-case. That field doesn't exist in agents-js. But even Python is unprotected for plain (non-handoff) tools, where _new_turns_blocked is never set, so this is a general gap, not just a JS port omission. A sketch of one possible guard extension follows.
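One possible shape of a fix, offered for discussion (the _inflightToolCount field and the executeToolCalls name below are hypothetical, not existing agents-js API): have the tool-execution path keep a count of in-flight tools, and make onPreemptiveGeneration treat a pending tool result the same way it treats live speech:

// Hypothetical sketch, not framework code. In the task that runs the tool,
// bracket the tool body with a counter so the activity knows a result is pending:
this._inflightToolCount++;
try {
  await executeToolCalls(/* ... */); // stand-in for the real tool-execution call
} finally {
  this._inflightToolCount--;
}

// In onPreemptiveGeneration, extend the existing early return:
if (
  !preemptiveOpts.enabled ||
  this.schedulingPaused ||
  this._inflightToolCount > 0 || // new: don't preempt while a tool result is pending
  (this._currentSpeech !== undefined && !this._currentSpeech.interrupted) ||
  !(this.llm instanceof LLM)
) {
  return;
}

A simple boolean would also work for a single tool call; a counter covers parallel tool execution.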
Relevant log output
Reconstructed from a single production call (room redacted, customer PII stripped). The user said "Yes. Thank" 0.84 s after the bookAppointment tool started its API call:
04:15:49.748Z user EOU → speech_0f238ed2-954 created
04:15:50.970Z LLM finished (74 tokens, includes a bookAppointment tool call)
04:15:51.450Z TTS finished (39-char stall message); _currentSpeech is now undefined
04:15:52.514Z bookAppointment tool body starts API call (~3 s)
04:15:53.158Z user starts speaking ("Yes. Thank")
04:15:53.358Z user EOU final → "Speech created" (source=generate_reply, userInitiated=true); speech_3aa64aa6-79c is created, and chatCtx has NO tool result in it
04:15:55.515Z bookAppointment returns (handoff result)
04:15:56.157Z LLM for speech_3aa64aa6-79c completes (62 tokens, 228-char hallucinated confirmation)
04:15:57.247Z TTS plays in full (cancelled: false, ~9.7 s of audio): message #1
04:16:05.108Z Post-handoff agent's say() fires (323 chars): message #2
04:16:06.486Z user interrupts ("All right."): speech_84d5f0b8-13a cancelled
Describe your environment
- @livekit/agents: 1.2.8
- turnHandling.preemptiveGeneration.enabled: true
- Node.js: 22.x
- OS: linux x64
Minimal reproducible example
import { llm, voice } from '@livekit/agents';
import { z } from 'zod';

const slowTool = llm.tool({
  description: 'Simulate a slow API call',
  parameters: z.object({}),
  execute: async () => {
    await new Promise((r) => setTimeout(r, 3000)); // 3-second tool body
    return { ok: true, message: 'real tool result that the LLM cannot guess' };
  },
});

const session = new voice.AgentSession({
  // ...stt, llm, tts, vad...
  voiceOptions: { preemptiveGeneration: true },
});
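For completeness, the elided wiring would look roughly like this (a sketch assuming the standard voice.Agent and session.start APIs; provider setup omitted):

const agent = new voice.Agent({
  instructions: 'When the user asks you to book, call slowTool and report its result.',
  tools: { slowTool },
});

// inside the job entrypoint, where ctx.room is available:
await session.start({ agent, room: ctx.room });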
Repro:
- Start the agent and get it to call slowTool.
- Have the user speak any short utterance (e.g. "okay") between the time the tool body starts and when it returns, i.e. while the agent is "waiting on the tool."
- Observe two replies in sequence.
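Until the guard accounts for in-flight tools, the only config-level mitigation appears to be disabling the preemptive path entirely (giving up its latency win), since the first check in the guard above is !preemptiveOpts.enabled:

const session = new voice.AgentSession({
  // ...stt, llm, tts, vad...
  voiceOptions: { preemptiveGeneration: false }, // workaround: skip onPreemptiveGeneration entirely
});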
Additional information
No response