
Agent handoff fails when tool returning Agent runs in parallel with reply-producing tool (pipeline path) #5150

@onur-yildirim-infinitusai

Description

Bug Description

When the LLM returns parallel tool calls in a single generation where:

  1. One tool returns an Agent (triggering a handoff), and
  2. Another tool returns a string result (reply_required=True)

Both tools return their results correctly, but the handoff never completes. The old agent stays active, on_enter() is never called on the new agent, and the conversation gets stuck in a loop.

This is the same class of bug as #4691, but on the pipeline path (STT → LLM → TTS) rather than the Gemini Realtime path.

Observed Trace Data

Cycle 1 — agent_turn (16:20:24.163 → 16:20:25.031)

| Span | Time | Key attributes |
| --- | --- | --- |
| llm_node | 24.164–24.827 | model=gemini-2.5-flash, output_tokens=33, response.text="", response.function_calls=2 calls: validate_and_save_field(isPlanTerminated=false) + switch_assistant(get-info-assistant-product-coverage) |
| function_tool | 24.820–25.031 | name=validate_and_save_field, output="Field isPlanTerminated saved successfully with value False", is_error=false |
| function_tool | 24.822–24.824 | name=switch_assistant, output="", is_error=false |
| tts_node | 24.184–24.834 | input_text="" (nothing to speak) |

Cycle 2 — follow-up agent_turn (16:20:25.033 → 16:20:25.470)

| Span | Time | Key attributes |
| --- | --- | --- |
| llm_node | 25.033–25.446 | model=gemini-2.5-flash, output_tokens=0, response.text="", response.function_calls=[] |
| tts_node | 25.055–25.470 | input_text="" (nothing to speak) |

→ Agent is silent. on_enter() never fires for the new agent. User speaks again → Cycle 1 repeats identically.

Full trace spans available on request.

Root Cause

A brief root-cause analysis points to how parallel tool calls are handled in `agent_activity.py` (around lines 2232–2269). After parallel tool execution:

  1. The SDK detects agent_task from the handoff tool and calls self._session.update_agent(new_agent_task) (line 2238-2239) — this creates an async _update_activity_task
  2. But fnc_executed_ev._reply_required is True (set by the other tool's string return value), so a _pipeline_reply_task is also created (line 2243-2269) that sends tool results back to the LLM on the old agent
  3. The follow-up LLM call completes before the handoff task can drain the old agent — the LLM returns an empty response (0 output tokens) since the old agent has nothing meaningful to say
  4. The old agent goes back to "listening". When the user speaks again, the old agent starts a new turn, making the same tool calls (handoff + save), creating the same race — an infinite loop
  5. on_enter() is never called on the new agent

The key issue is at line 2243: when _handoff_required is True, the follow-up _pipeline_reply_task should not be created, since the old agent is about to be replaced.

```python
# agent_activity.py lines 2237-2269 (simplified)
draining = self.scheduling_paused
if fnc_executed_ev._handoff_required and new_agent_task and not ignore_task_switch:
    self._session.update_agent(new_agent_task)  # async task
    draining = True

tool_messages = new_calls + new_fnc_outputs
if fnc_executed_ev._reply_required:  # ← True from the non-handoff tool's string result
    # This creates a follow-up LLM call on the OLD agent,
    # which races with and prevents the handoff from completing
    chat_ctx.items.extend(tool_messages)
    tool_response_task = self._create_speech_task(
        self._pipeline_reply_task(
            speech_handle=speech_handle,
            chat_ctx=chat_ctx,
            tools=tools,
            model_settings=ModelSettings(
                tool_choice="none" if draining or ... else "auto",
            ),
            ...
        ),
        ...
    )
```
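The scheduling order can be sketched with plain asyncio (task names here are hypothetical; this only models the event-loop ordering, not the SDK internals). Because the handoff task must yield at least once before it can drain the old activity and swap agents, a reply task created in the same pass gets to run first:

```python
import asyncio

events = []

async def update_agent_task():
    # Stand-in for the async _update_activity_task created by update_agent():
    # it has to yield (await draining) before it can complete the swap.
    await asyncio.sleep(0)  # yields control back to the event loop
    events.append("handoff_complete")

async def pipeline_reply_task():
    # Stand-in for the follow-up LLM call scheduled on the OLD agent.
    events.append("followup_llm_call_on_old_agent")

async def main():
    # Both tasks are created back-to-back, as in agent_activity.py.
    handoff = asyncio.create_task(update_agent_task())
    reply = asyncio.create_task(pipeline_reply_task())
    await asyncio.gather(handoff, reply)

asyncio.run(main())
print(events)  # the reply on the old agent wins the race
```

The follow-up reply always completes before the handoff in this model, matching the observed trace where Cycle 2's empty LLM call fires while the old agent is still active.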

Expected Behavior

When a tool returns an Agent (handoff), the handoff should complete regardless of what other parallel tools returned. on_exit() should be called on the old agent and on_enter() should be called on the new agent.

Reproduction Steps

The bug triggers when the LLM returns parallel tool calls where one returns an Agent and another returns a string. This happens naturally in production when models batch related operations (e.g., "save this field AND switch to the next assistant"). It can be hard to reproduce deterministically since it depends on LLM behavior.


```python
from livekit.agents import Agent, RunContext, function_tool


class PrimaryAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are a helpful assistant. "
                "When the user says 'switch', you MUST call BOTH save_data and "
                "switch_to_secondary at the same time in the same response."
            ),
        )

    async def on_enter(self):
        print("PrimaryAgent on_enter called")
        await self.session.generate_reply()

    @function_tool()
    async def save_data(self, context: RunContext, value: str) -> str:
        """Save a piece of data. Always call this before switching agents."""
        print(f"save_data called with: {value}")
        return f"Saved: {value}"  # <-- returns string, sets reply_required=True

    @function_tool()
    async def switch_to_secondary(self, context: RunContext) -> Agent:
        """Switch to the secondary agent to continue the conversation."""
        print("switch_to_secondary called")
        return SecondaryAgent()  # <-- returns Agent, triggers handoff


class SecondaryAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a secondary assistant. Say hello.")

    async def on_enter(self):
        # THIS NEVER FIRES when save_data runs in the same generation
        print("SecondaryAgent on_enter called")
        await self.session.generate_reply()
```

To reproduce:

* Start the agent with `uv run python main.py console`
* Say "switch" — the LLM should call both `save_data` and `switch_to_secondary` in the same generation
* Observe that `SecondaryAgent on_enter called` never prints
* The agent goes silent; subsequent messages keep triggering the same two tool calls in a loop

**Key condition:** The bug only triggers when the LLM batches the handoff tool with another tool that returns a non-None value in the **same generation**. If switch_to_secondary is the only tool called (or all other tools return None), the handoff works correctly.

Operating System

Linux

Models Used

Deepgram Nova-3 (STT), Gemini 2.5 Flash (via Vertex AI), ElevenLabs TTS

Package Versions

livekit-agents: 1.4.3

Session/Room/Call IDs

roomName: "conv-bd37f2b5-74e9-4543-8100-8b6b83eb4156"
session ID: "RM_9eLXjPWZbyfi"

Proposed Solution

When _handoff_required is True, skip the follow-up _pipeline_reply_task. The old agent is being replaced — there's no point sending tool results back to its LLM:

```python
if fnc_executed_ev._reply_required and not fnc_executed_ev._handoff_required:
    # Only create follow-up reply task if we're NOT handing off
    ...
```

Alternatively, add the tool messages to chat context (so they're preserved for the new agent) but don't trigger a new LLM generation on the old agent.
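As a self-contained illustration of the proposed guard (the dataclass and field names below are stand-ins for the SDK's private event flags, not the real API):

```python
from dataclasses import dataclass


@dataclass
class FncExecutedEvent:
    # Stand-ins for the SDK's private _reply_required / _handoff_required flags.
    reply_required: bool
    handoff_required: bool


def should_create_reply_task(ev: FncExecutedEvent) -> bool:
    """A follow-up reply on the current agent only makes sense when that
    agent is not about to be replaced by a handoff."""
    return ev.reply_required and not ev.handoff_required


# The failing scenario: parallel tool calls set BOTH flags in one event.
# With this guard, the handoff wins and no reply task is scheduled on the
# old agent; without it, the reply task races the handoff and the agent
# loops as described above.
```

Either way, the tool outputs should still be appended to the chat context so the new agent sees them after the handoff.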

Additional Context

No response

Screenshots and Recordings

No response

Labels: bug