
Sub-agent with sequential LRO tools fails to resume #5349

@tony-lenski

Description


🔴 Required Information

Describe the Bug:

When a sub-agent has multiple LRO tools that are called sequentially, the first HITL pause/resume works correctly, but the second resumption fails: the runner early-exits with no LLM content. Two independent bugs combine to cause this.

Steps to Reproduce:

  1. Create a resumable app with an orchestrator LlmAgent that delegates to a sub-agent
  2. The sub-agent has two LongRunningFunctionTools (e.g., a picker and a confirmation step)
  3. User triggers the first LRO tool → agent pauses → user responds → agent resumes correctly
  4. Agent calls the second LRO tool → agent pauses → user responds → agent fails to resume

See Minimal Reproduction Code below for a copy-paste-runnable script.

Expected Behavior:

Both resumptions should work identically: the sub-agent receives the tool response, the LLM is called, and it generates text or further tool calls.

Observed Behavior:

The first resumption works. The second resumption produces no LLM-generated content; the agent silently stops.

Environment Details:

  • ADK Library Version: 1.30.0 (bug present in 1.27.0–1.30.0, worked in 1.26.0)
  • Desktop OS: macOS
  • Python Version: 3.13+

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used: gemini-2.0-flash (not model-dependent; the bug is in the runner/agent framework)

🟡 Optional Information

Regression:

Yes: worked in ADK 1.26.0, broken since 1.27.0 (the _resolve_invocation_id rewrite).

In 1.26.0, if no invocation_id was passed, the runner always created a new invocation, so there were no stale flags to trip over. In 1.27.0, _resolve_invocation_id (runners.py:356-383) auto-infers invocation_id from the FunctionResponse by searching session.events for the matching FunctionCall. This forces the resumed-invocation path, which replays stale flags from the previous pause.

Additional Context:

Root cause

The first resumption works because the orchestrator's initial run exits via the should_pause check (line 494), which does not set end_of_agent. The problems appear on the second resumption, when the orchestrator enters the sub-agent resume block (lines 474-483) for the first time.

Bug 1: should_pause_invocation ignores existing responses

should_pause_invocation (invocation_context.py) returns True if any FunctionCall ID is in long_running_tool_ids, without checking whether a FunctionResponse already exists. On the second resumption, the already-answered first LRO still triggers the pause guard in base_llm_flow.py:838-851, and the sub-agent's LLM flow exits immediately.

Bug 2: Premature end_of_agent on the orchestrator

After the first resumption, the orchestrator enters the sub-agent resume block (lines 474-483) and unconditionally sets end_of_agent=True, even when the sub-agent only paused for the second LRO rather than truly completing. On the second resumption, populate_invocation_agent_states replays this stale flag and the runner early-exits at runners.py:597.
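Condensed into a short simulation (illustrative only; the states dict stands in for the persisted per-agent state that populate_invocation_agent_states replays):

```python
# Illustrative timeline of the stale-flag replay described above.
states = {}  # stand-in for the persisted per-agent state


def set_agent_state(name, end_of_agent):
    states[name] = {"end_of_agent": end_of_agent}


# First resumption: the sub-agent pauses for LRO #2, but the resume
# block still runs to its tail and marks the orchestrator as finished.
set_agent_state("orchestrator", end_of_agent=True)

# Second resumption: the replayed flag makes the runner early-exit
# before the LLM is ever called.
would_early_exit = states["orchestrator"]["end_of_agent"]
assert would_early_exit  # matches the observed "no LLM content" symptom
```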

Trigger conditions

All three are required:

  1. ResumabilityConfig(is_resumable=True)
  2. Root LlmAgent delegates to a sub-agent that has LRO tools
  3. The sub-agent calls LRO tools sequentially (two or more pause/resume cycles in the same invocation)

Suggested fix

Bug 1 is addressed by PR #5072 (has_unresolved_long_running_tool_calls), open since 2026-03-30. The fix below is only for Bug 2, which #5072 does not cover.

Bug 2: The orchestrator should only set end_of_agent=True when the sub-agent has truly completed, not when it paused for another LRO:

Diff for llm_agent.py (sub-agent resume block, ~line 474)
   if agent_state is not None and (
       agent_to_transfer := self._get_subagent_to_resume(ctx)
   ):
+    sub_agent_paused = False
     async with Aclosing(agent_to_transfer.run_async(ctx)) as agen:
       async for event in agen:
+        # Requires the Bug 1 fix: should_pause_invocation must be response-aware
+        if ctx.should_pause_invocation(event):
+          sub_agent_paused = True
         yield event

-    ctx.set_agent_state(self.name, end_of_agent=True)
-    yield self._create_agent_state_event(ctx)
+    if not sub_agent_paused:
+      ctx.set_agent_state(self.name, end_of_agent=True)
+      yield self._create_agent_state_event(ctx)
     return

Backward-compatible: non-LRO sub-agent completions still set the flag exactly as before. The check is skipped entirely when is_resumable is False or when the root agent calls LRO tools directly (no sub-agent delegation), so existing setups are unaffected.

Verified: with both #5072 (Bug 1) and this fix (Bug 2) applied to ADK 1.30.0, all sequential resumptions work correctly.

Minimal Reproduction Code:

import asyncio
from google.adk.agents import LlmAgent
from google.adk.apps import App, ResumabilityConfig
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import LongRunningFunctionTool
from google.genai import types


async def select_item(tool_context) -> dict:
    """Step 1: user picks an item (blocks for user input)."""
    return {"status": "pending"}


async def confirm_choice(tool_context) -> dict:
    """Step 2: user confirms the choice (blocks for user input)."""
    return {"status": "pending"}


sub_agent = LlmAgent(
    name="picker",
    model="gemini-2.0-flash",
    instruction=(
        "First call select_item to let the user pick. "
        "After they respond, call confirm_choice to confirm. "
        "After both are done, summarize what was picked and confirmed."
    ),
    tools=[
        LongRunningFunctionTool(func=select_item),
        LongRunningFunctionTool(func=confirm_choice),
    ],
)
root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.0-flash",
    instruction="Delegate to the picker agent.",
    sub_agents=[sub_agent],
)
app = App(
    name="repro",
    root_agent=root_agent,
    resumability_config=ResumabilityConfig(is_resumable=True),
)
session_service = InMemorySessionService()
runner = Runner(app=app, session_service=session_service)


async def main():
    session = await session_service.create_session(app_name="repro", user_id="u")

    # Step 1: agent delegates → picker calls select_item → pauses for user input
    step1_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user", parts=[types.Part(text="Pick something")]
        ),
    ):
        step1_events.append(event)

    fc1_id = next(
        fc.id
        for e in step1_events
        for fc in e.get_function_calls()
        if fc.name == "select_item" and e.long_running_tool_ids
    )

    # Step 2: user responds → picker resumes → calls confirm_choice → pauses again
    step2_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[
                types.Part(
                    function_response=types.FunctionResponse(
                        id=fc1_id,
                        name="select_item",
                        response={"result": "option_a"},
                    )
                )
            ],
        ),
    ):
        step2_events.append(event)

    fc2_id = next(
        fc.id
        for e in step2_events
        for fc in e.get_function_calls()
        if fc.name == "confirm_choice" and e.long_running_tool_ids
    )

    # Step 3: user confirms → BUG: no LLM content, agent silently stops
    step3_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[
                types.Part(
                    function_response=types.FunctionResponse(
                        id=fc2_id,
                        name="confirm_choice",
                        response={"confirmed": True},
                    )
                )
            ],
        ),
    ):
        step3_events.append(event)

    has_llm_content = any(
        e.get_function_calls()
        or (
            e.content
            and e.content.parts
            and any(p.text for p in e.content.parts)
        )
        for e in step3_events
        if e.author != "user"
    )
    assert has_llm_content, (
        f"BUG: Step 3 produced {len(step3_events)} events but no LLM content"
    )


asyncio.run(main())

How often has this issue occurred?:

  • Always (100%): deterministic with sequential LRO tools in a sub-agent

Related Issues:

Labels: core (This issue is related to the core interface and implementation)