Description
Bug Description
When the LLM returns parallel tool calls in a single generation where:
- One tool returns an Agent (triggering a handoff), and
- Another tool returns a string result (reply_required=True)
Both tools execute and return their results correctly, but the handoff never completes. The old agent stays active, `on_enter()` is never called on the new agent, and the conversation gets stuck in a loop.
This is the same class of bug as #4691, but on the pipeline path (STT → LLM → TTS) rather than the Gemini Realtime path.
Observed Trace Data
Cycle 1 — agent_turn (16:20:24.163 → 16:20:25.031)
| Span | Time | Key attributes |
|---|---|---|
| llm_node | 24.164–24.827 | model=gemini-2.5-flash, output_tokens=33, response.text="", response.function_calls=2 calls: validate_and_save_field(isPlanTerminated=false) + switch_assistant(get-info-assistant-product-coverage) |
| function_tool | 24.820–25.031 | name=validate_and_save_field, output="Field isPlanTerminated saved successfully with value False", is_error=false |
| function_tool | 24.822–24.824 | name=switch_assistant, output="", is_error=false |
| tts_node | 24.184–24.834 | input_text="" (nothing to speak) |
Cycle 2 — follow-up agent_turn (16:20:25.033 → 16:20:25.470)
| Span | Time | Key attributes |
|---|---|---|
| llm_node | 25.033–25.446 | model=gemini-2.5-flash, output_tokens=0, response.text="", response.function_calls=[] |
| tts_node | 25.055–25.470 | input_text="" (nothing to speak) |
→ Agent is silent. on_enter() never fires for the new agent. User speaks again → Cycle 1 repeats identically.
Full trace spans available on request.
Root Cause
A brief root cause analysis suggests the problem is in how parallel tool calls are handled in agent_activity.
In agent_activity.py around line 2232-2269, after parallel tool execution:
- The SDK detects `agent_task` from the handoff tool and calls `self._session.update_agent(new_agent_task)` (lines 2238-2239) — this creates an async `_update_activity_task`
- But `fnc_executed_ev._reply_required` is `True` (set by the other tool's string return value), so a `_pipeline_reply_task` is also created (lines 2243-2269) that sends tool results back to the LLM on the old agent
- The follow-up LLM call completes before the handoff task can drain the old agent — the LLM returns an empty response (0 output tokens) since the old agent has nothing meaningful to say
- The old agent goes back to "listening". When the user speaks again, the old agent starts a new turn, making the same tool calls (handoff + save), creating the same race — an infinite loop
- `on_enter()` is never called on the new agent
The key issue is at line 2243: when `_handoff_required` is True, the follow-up `_pipeline_reply_task` should not be created, since the old agent is about to be replaced.
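The timing from the trace can be modeled with a minimal, self-contained asyncio sketch. This is not livekit code: the task names and sleep durations are hypothetical, chosen so the follow-up reply on the old agent finishes before the handoff drains, matching the span timestamps above. (In the real SDK the handoff then never completes at all; this sketch only illustrates the ordering inversion.)

```python
import asyncio

events: list[str] = []

async def update_agent_task() -> None:
    # Models _update_activity_task: wait for the old agent to drain,
    # then swap agents (hypothetical 20 ms drain time).
    await asyncio.sleep(0.02)
    events.append("old_agent.on_exit")
    events.append("new_agent.on_enter")

async def pipeline_reply_task() -> None:
    # Models _pipeline_reply_task: a follow-up LLM call on the OLD agent
    # (hypothetical 10 ms LLM latency, so it wins the race).
    await asyncio.sleep(0.01)
    events.append("old_agent.follow_up_llm_call")

async def main() -> None:
    # Both tasks are created from the same parallel tool execution.
    await asyncio.gather(update_agent_task(), pipeline_reply_task())

asyncio.run(main())
print(events)
# → ['old_agent.follow_up_llm_call', 'old_agent.on_exit', 'new_agent.on_enter']
```

The follow-up LLM call on the old agent lands first, which is exactly the empty Cycle 2 generation observed in the trace.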
```python
# agent_activity.py lines 2237-2269 (simplified)
draining = self.scheduling_paused
if fnc_executed_ev._handoff_required and new_agent_task and not ignore_task_switch:
    self._session.update_agent(new_agent_task)  # async task
    draining = True

tool_messages = new_calls + new_fnc_outputs
if fnc_executed_ev._reply_required:  # ← True from the non-handoff tool's string result
    # This creates a follow-up LLM call on the OLD agent,
    # which races with and prevents the handoff from completing
    chat_ctx.items.extend(tool_messages)
    tool_response_task = self._create_speech_task(
        self._pipeline_reply_task(
            speech_handle=speech_handle,
            chat_ctx=chat_ctx,
            tools=tools,
            model_settings=ModelSettings(
                tool_choice="none" if draining or ... else "auto",
            ),
            ...
        ),
        ...
    )
```

Expected Behavior
When a tool returns an Agent (handoff), the handoff should complete regardless of what other parallel tools returned. `on_exit()` should be called on the old agent and `on_enter()` should be called on the new agent.
Reproduction Steps
The bug triggers when the LLM returns parallel tool calls where one returns an Agent and another returns a string. This happens naturally in production when models batch related operations (e.g., "save this field AND switch to the next assistant"). It can be hard to reproduce deterministically since it depends on LLM behavior.
```python
from livekit.agents import Agent, RunContext, function_tool, AgentSession


class PrimaryAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful assistant. "
                "When the user says 'switch', you MUST call BOTH save_data and "
                "switch_to_secondary at the same time in the same response."
            ),
        )

    async def on_enter(self):
        print("PrimaryAgent on_enter called")
        await self.session.generate_reply()

    @function_tool()
    async def save_data(self, context: RunContext, value: str) -> str:
        """Save a piece of data. Always call this before switching agents."""
        print(f"save_data called with: {value}")
        return f"Saved: {value}"  # <-- returns string, sets reply_required=True

    @function_tool()
    async def switch_to_secondary(self, context: RunContext) -> Agent:
        """Switch to the secondary agent to continue the conversation."""
        print("switch_to_secondary called")
        return SecondaryAgent()  # <-- returns Agent, triggers handoff


class SecondaryAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a secondary assistant. Say hello.")

    async def on_enter(self):
        # THIS NEVER FIRES when save_data runs in the same generation
        print("SecondaryAgent on_enter called")
        await self.session.generate_reply()
```
To reproduce:
* Start the agent with `uv run python main.py console`
* Say "switch" — the LLM should call both `save_data` and `switch_to_secondary` in the same generation
* Observe that "SecondaryAgent on_enter called" never prints
* The agent goes silent; subsequent messages keep triggering the same two tool calls in a loop
**Key condition:** The bug only triggers when the LLM batches the handoff tool with another tool that returns a non-None value in the **same generation**. If `switch_to_secondary` is the only tool called (or all other tools return None), the handoff works correctly.

Operating System
Linux
Models Used
Deepgram Nova-3, Gemini 2.5 Flash (via vertex), Elevenlabs TTS
Package Versions
livekit-agents: 1.4.3

Session/Room/Call IDs
roomName: "conv-bd37f2b5-74e9-4543-8100-8b6b83eb4156"
session ID: "RM_9eLXjPWZbyfi"
Proposed Solution
When `_handoff_required` is True, skip the follow-up `_pipeline_reply_task`. The old agent is being replaced — there's no point sending tool results back to its LLM:

```python
if fnc_executed_ev._reply_required and not fnc_executed_ev._handoff_required:
    # Only create the follow-up reply task if we're NOT handing off
    ...
```

Alternatively, add the tool messages to the chat context (so they're preserved for the new agent) but don't trigger a new LLM generation on the old agent.
Additional Context
No response
Screenshots and Recordings
No response