
Agent handoff fails when tool returning Agent runs in parallel with reply-producing tool (pipeline path) #5150

@onur-yildirim-infinitusai

Description

Bug Description

When the LLM returns parallel tool calls in a single generation where:

  1. One tool returns an Agent (triggering a handoff), and
  2. Another tool returns a string result (reply_required=True)

Both tools return their results correctly, but the handoff never completes. The old agent stays active, on_enter() is never called on the new agent, and the conversation gets stuck in a loop.

This is the same class of bug as #4691, but on the pipeline path (STT → LLM → TTS) rather than the Gemini Realtime path.

Observed Trace Data

Cycle 1 — agent_turn (16:20:24.163 → 16:20:25.031)

| Span | Time | Key attributes |
| --- | --- | --- |
| llm_node | 24.164–24.827 | model=gemini-2.5-flash, output_tokens=33, response.text="", response.function_calls=2 calls: validate_and_save_field(isPlanTerminated=false) + switch_assistant(get-info-assistant-product-coverage) |
| function_tool | 24.820–25.031 | name=validate_and_save_field, output="Field isPlanTerminated saved successfully with value False", is_error=false |
| function_tool | 24.822–24.824 | name=switch_assistant, output="", is_error=false |
| tts_node | 24.184–24.834 | input_text="" (nothing to speak) |

Cycle 2 — follow-up agent_turn (16:20:25.033 → 16:20:25.470)

| Span | Time | Key attributes |
| --- | --- | --- |
| llm_node | 25.033–25.446 | model=gemini-2.5-flash, output_tokens=0, response.text="", response.function_calls=[] |
| tts_node | 25.055–25.470 | input_text="" (nothing to speak) |

→ Agent is silent. on_enter() never fires for the new agent. User speaks again → Cycle 1 repeats identically.

Full trace spans available on request.

Root Cause

A brief root-cause analysis points to how parallel tool calls are handled in `agent_activity.py` (around lines 2232–2269). After parallel tool execution:

  1. The SDK detects agent_task from the handoff tool and calls self._session.update_agent(new_agent_task) (line 2238-2239) — this creates an async _update_activity_task
  2. But fnc_executed_ev._reply_required is True (set by the other tool's string return value), so a _pipeline_reply_task is also created (line 2243-2269) that sends tool results back to the LLM on the old agent
  3. The follow-up LLM call completes before the handoff task can drain the old agent — the LLM returns an empty response (0 output tokens) since the old agent has nothing meaningful to say
  4. The old agent goes back to "listening". When the user speaks again, the old agent starts a new turn, making the same tool calls (handoff + save), creating the same race — an infinite loop
  5. on_enter() is never called on the new agent

The key issue is at line 2243: when _handoff_required is True, the follow-up _pipeline_reply_task should not be created, since the old agent is about to be replaced.

```python
# agent_activity.py lines 2237-2269 (simplified)
draining = self.scheduling_paused
if fnc_executed_ev._handoff_required and new_agent_task and not ignore_task_switch:
    self._session.update_agent(new_agent_task)  # async task
    draining = True

tool_messages = new_calls + new_fnc_outputs
if fnc_executed_ev._reply_required:  # ← True from the non-handoff tool's string result
    # This creates a follow-up LLM call on the OLD agent,
    # which races with and prevents the handoff from completing
    chat_ctx.items.extend(tool_messages)
    tool_response_task = self._create_speech_task(
        self._pipeline_reply_task(
            speech_handle=speech_handle,
            chat_ctx=chat_ctx,
            tools=tools,
            model_settings=ModelSettings(
                tool_choice="none" if draining or ... else "auto",
            ),
            ...
        ),
        ...
    )
```
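The scheduling order can be sketched with plain asyncio (task names here are hypothetical; this only models the event-loop ordering, not the SDK internals). Because the handoff task must yield at least once before it can drain the old activity and swap agents, a reply task created in the same pass gets to run first:

```python
import asyncio

events = []

async def update_agent_task():
    # Stand-in for the async _update_activity_task created by update_agent():
    # it has to yield (await draining) before it can complete the swap.
    await asyncio.sleep(0)  # yields control back to the event loop
    events.append("handoff_complete")

async def pipeline_reply_task():
    # Stand-in for the follow-up LLM call scheduled on the OLD agent.
    events.append("followup_llm_call_on_old_agent")

async def main():
    # Both tasks are created back-to-back, as in agent_activity.py.
    handoff = asyncio.create_task(update_agent_task())
    reply = asyncio.create_task(pipeline_reply_task())
    await asyncio.gather(handoff, reply)

asyncio.run(main())
print(events)  # the reply on the old agent wins the race
```

The follow-up reply always completes before the handoff in this model, matching the observed trace where Cycle 2's empty LLM call fires while the old agent is still active.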

Expected Behavior

When a tool returns an Agent (handoff), the handoff should complete regardless of what other parallel tools returned. on_exit() should be called on the old agent and on_enter() should be called on the new agent.

Reproduction Steps

The bug triggers when the LLM returns parallel tool calls where one returns an Agent and another returns a string. This happens naturally in production when models batch related operations (e.g., "save this field AND switch to the next assistant"). It can be hard to reproduce deterministically since it depends on LLM behavior.


```python
from livekit.agents import Agent, RunContext, function_tool


class PrimaryAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful assistant. "
                "When the user says 'switch', you MUST call BOTH save_data and "
                "switch_to_secondary at the same time in the same response."
            ),
        )

    async def on_enter(self):
        print("PrimaryAgent on_enter called")
        await self.session.generate_reply()

    @function_tool()
    async def save_data(self, context: RunContext, value: str) -> str:
        """Save a piece of data. Always call this before switching agents."""
        print(f"save_data called with: {value}")
        return f"Saved: {value}"  # <-- returns string, sets reply_required=True

    @function_tool()
    async def switch_to_secondary(self, context: RunContext) -> Agent:
        """Switch to the secondary agent to continue the conversation."""
        print("switch_to_secondary called")
        return SecondaryAgent()  # <-- returns Agent, triggers handoff


class SecondaryAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a secondary assistant. Say hello.")

    async def on_enter(self):
        # THIS NEVER FIRES when save_data runs in the same generation
        print("SecondaryAgent on_enter called")
        await self.session.generate_reply()
```

To reproduce:

* Start the agent with `uv run python main.py console`
* Say "switch" — the LLM should call both `save_data` and `switch_to_secondary` in the same generation
* Observe that `SecondaryAgent on_enter called` never prints
* The agent goes silent; subsequent messages keep triggering the same two tool calls in a loop

**Key condition:** The bug only triggers when the LLM batches the handoff tool with another tool that returns a non-None value in the **same generation**. If switch_to_secondary is the only tool called (or all other tools return None), the handoff works correctly.

Operating System

Linux

Models Used

Deepgram Nova-3 (STT), Gemini 2.5 Flash (via Vertex AI), ElevenLabs TTS

Package Versions

livekit-agents: 1.4.3

Session/Room/Call IDs

roomName: "conv-bd37f2b5-74e9-4543-8100-8b6b83eb4156"
session ID: "RM_9eLXjPWZbyfi"

Proposed Solution

When _handoff_required is True, skip the follow-up _pipeline_reply_task. The old agent is being replaced — there's no point sending tool results back to its LLM:

```python
if fnc_executed_ev._reply_required and not fnc_executed_ev._handoff_required:
    # Only create follow-up reply task if we're NOT handing off
    ...
```

Alternatively, add the tool messages to chat context (so they're preserved for the new agent) but don't trigger a new LLM generation on the old agent.
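As a self-contained illustration of the proposed guard (the dataclass and field names below are stand-ins for the SDK's private event flags, not the real API):

```python
from dataclasses import dataclass


@dataclass
class FncExecutedEvent:
    # Stand-ins for the SDK's private _reply_required / _handoff_required flags.
    reply_required: bool
    handoff_required: bool


def should_create_reply_task(ev: FncExecutedEvent) -> bool:
    """A follow-up reply on the current agent only makes sense when that
    agent is not about to be replaced by a handoff."""
    return ev.reply_required and not ev.handoff_required


# The failing scenario: parallel tool calls set BOTH flags in one event.
# With this guard, the handoff wins and no reply task is scheduled on the
# old agent; without it, the reply task races the handoff and the agent
# loops as described above.
```

Either way, the tool outputs should still be appended to the chat context so the new agent sees them after the handoff.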

Additional Context

No response

Screenshots and Recordings

No response

Labels: bug