🔴 Required Information
Describe the Bug:
When a sub-agent has multiple long-running operation (LRO) tools that are called sequentially, the first human-in-the-loop (HITL) pause/resume works correctly, but the second resumption fails: the runner early-exits with no LLM content. Two independent bugs combine to cause this.
Steps to Reproduce:
- Create a resumable app with an orchestrator LlmAgent that delegates to a sub-agent
- The sub-agent has two LongRunningFunctionTools (e.g., a picker and a confirmation step)
- User triggers the first LRO tool → agent pauses → user responds → agent resumes correctly
- Agent calls the second LRO tool → agent pauses → user responds → agent fails to resume
See Minimal Reproduction Code below for a copy-paste-runnable script.
Expected Behavior:
Both resumptions should work identically: the sub-agent receives the tool response, the LLM is called, and it generates text or further tool calls.
Observed Behavior:
The first resumption works. The second resumption produces no LLM-generated content; the agent silently stops.
Environment Details:
- ADK Library Version: 1.30.0 (bug present in 1.27.0 through 1.30.0, worked in 1.26.0)
- Desktop OS: macOS
- Python Version: 3.13+
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.0-flash (not model-dependent; the bug is in the runner/agent framework)
🟡 Optional Information
Regression:
Yes. This worked in ADK 1.26.0 and has been broken since 1.27.0 (the _resolve_invocation_id rewrite).
In 1.26.0, if no invocation_id was passed, the runner always created a new invocation, so there were no stale flags to trip over. In 1.27.0, _resolve_invocation_id (runners.py:356-383) auto-infers invocation_id from the FunctionResponse by searching session.events for the matching FunctionCall. This forces the resumed-invocation path, which replays stale flags from the previous pause.
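The lookup described above can be pictured with a small standalone sketch. It is not the ADK implementation: SessionEventStub and infer_invocation_id are hypothetical names, and the newest-first search order is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionEventStub:
    """Hypothetical stand-in for a session event; not an ADK class."""
    invocation_id: str
    function_call_ids: list[str] = field(default_factory=list)


def infer_invocation_id(
    events: list[SessionEventStub], response_id: str
) -> Optional[str]:
    """Return the invocation that issued the call matching response_id."""
    # Assumption: walk newest-first so the most recent matching call wins.
    for event in reversed(events):
        if response_id in event.function_call_ids:
            return event.invocation_id
    # No match: a caller would fall back to starting a fresh invocation.
    return None


events = [
    SessionEventStub("inv-1", ["call-select"]),
    SessionEventStub("inv-1", ["call-confirm"]),
]
resumed = infer_invocation_id(events, "call-confirm")
fresh = infer_invocation_id(events, "unknown-id")
```

In this model, any FunctionResponse whose id matches an earlier call is routed back into the old invocation, which is why state recorded during the previous pause becomes relevant again.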
Additional Context:
Root cause
The first resumption works because the orchestrator's initial run exits via should_pause (line 494), which does not set end_of_agent. The problems surface on the second resumption, because the first resumption is the first time the orchestrator goes through the sub-agent resume block (lines 474-483).
Bug 1: should_pause_invocation ignores existing responses
should_pause_invocation (invocation_context.py) returns True if any FunctionCall ID is in long_running_tool_ids, without checking whether a FunctionResponse already exists. On the second resumption, the already-answered first LRO still triggers the pause guard in base_llm_flow.py:838-851, and the sub-agent's LLM flow exits immediately.
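A minimal sketch of the difference between the two checks, assuming a simplified event model. FlowEventStub and both functions are hypothetical and only mirror the behavior described above; they are not the code in invocation_context.py or in PR #5072.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEventStub:
    """Hypothetical stand-in for an event; field names are illustrative."""
    function_call_ids: set[str] = field(default_factory=set)
    function_response_ids: set[str] = field(default_factory=set)
    long_running_tool_ids: set[str] = field(default_factory=set)


def naive_should_pause(event: FlowEventStub) -> bool:
    """Pause whenever the event carries a long-running call (Bug 1 as described)."""
    return bool(event.function_call_ids & event.long_running_tool_ids)


def response_aware_should_pause(
    event: FlowEventStub, history: list[FlowEventStub]
) -> bool:
    """Pause only for long-running calls with no response in the history."""
    answered: set[str] = set()
    for past in history:
        answered |= past.function_response_ids
    pending = (event.function_call_ids & event.long_running_tool_ids) - answered
    return bool(pending)


first_call = FlowEventStub(
    function_call_ids={"call-select"},
    long_running_tool_ids={"call-select"},
)
user_reply = FlowEventStub(function_response_ids={"call-select"})
history = [first_call, user_reply]

stale = naive_should_pause(first_call)
fixed = response_aware_should_pause(first_call, history)
still_pending = response_aware_should_pause(first_call, [first_call])
```

The naive check keeps reporting a pause for the first call even after the user has answered it, while the response-aware variant pauses only while the call is still unanswered.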
Bug 2: Premature end_of_agent on the orchestrator
After the first resumption, the orchestrator enters the sub-agent resume block (lines 474-483) and unconditionally sets end_of_agent=True, even when the sub-agent only paused for the second LRO (not truly completed). On the second resumption, populate_invocation_agent_states replays this stale flag and the runner early-exits at runners.py:597.
Trigger conditions
All three are required:
- ResumabilityConfig(is_resumable=True)
- Root LlmAgent delegates to a sub-agent that has LRO tools
- The sub-agent calls LRO tools sequentially (two or more pause/resume cycles in the same invocation)
Suggested fix
Bug 1 is addressed by PR #5072 (has_unresolved_long_running_tool_calls), open since 2026-03-30. The fix below is only for Bug 2, which #5072 does not cover.
Bug 2: The orchestrator should set end_of_agent=True only when the sub-agent has truly completed, not when it paused for another LRO:
Diff for llm_agent.py (sub-agent resume block, ~line 474)
  if agent_state is not None and (
      agent_to_transfer := self._get_subagent_to_resume(ctx)
  ):
+   sub_agent_paused = False
    async with Aclosing(agent_to_transfer.run_async(ctx)) as agen:
      async for event in agen:
+       # Requires Bug 1 fix: should_pause_invocation must be response-aware
+       if ctx.should_pause_invocation(event):
+         sub_agent_paused = True
        yield event
-   ctx.set_agent_state(self.name, end_of_agent=True)
-   yield self._create_agent_state_event(ctx)
+   if not sub_agent_paused:
+     ctx.set_agent_state(self.name, end_of_agent=True)
+     yield self._create_agent_state_event(ctx)
    return
Backward-compatible: non-LRO sub-agent completions still set the flag exactly as before. The check is skipped entirely when is_resumable is False or when the root agent calls LRO tools directly (no sub-agent delegation), so existing setups are unaffected.
Verified: with both #5072 (Bug 1) and this fix (Bug 2) applied to ADK 1.30.0, all sequential resumptions work correctly.
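The effect of the guard can be illustrated with a toy model of the resume block. Everything here is hypothetical (ToySubAgent, resume_block, and the "pause" label standing in for an event that pauses the invocation), and it uses plain generators instead of the async generators in llm_agent.py.

```python
from typing import Iterator


class ToySubAgent:
    """Hypothetical sub-agent that emits a scripted list of event labels."""

    def __init__(self, script: list[str]):
        self.script = script

    def run(self) -> Iterator[str]:
        yield from self.script


def resume_block(sub_agent: ToySubAgent, guarded: bool) -> dict:
    """Forward the sub-agent's events, then decide whether to set end_of_agent."""
    state = {"end_of_agent": False, "events": []}
    sub_agent_paused = False
    for label in sub_agent.run():
        # In this toy, "pause" marks an event that pauses the invocation.
        if label == "pause":
            sub_agent_paused = True
        state["events"].append(label)
    # Unguarded: always mark the orchestrator as finished (current behavior).
    # Guarded: skip the flag when the sub-agent only paused (proposed behavior).
    if not (guarded and sub_agent_paused):
        state["end_of_agent"] = True
    return state


paused_script = ["tool_response", "pause"]
done_script = ["tool_response", "final_text"]

unguarded = resume_block(ToySubAgent(paused_script), guarded=False)
guarded = resume_block(ToySubAgent(paused_script), guarded=True)
completed = resume_block(ToySubAgent(done_script), guarded=True)
```

Only the unguarded run leaves a finished flag behind after a pause; a sub-agent that actually completes still ends with end_of_agent set in the guarded variant.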
Minimal Reproduction Code:
import asyncio

from google.adk.agents import LlmAgent
from google.adk.apps import App, ResumabilityConfig
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import LongRunningFunctionTool
from google.genai import types


async def select_item(tool_context) -> dict:
    """Step 1: user picks an item (blocks for user input)."""
    return {"status": "pending"}


async def confirm_choice(tool_context) -> dict:
    """Step 2: user confirms the choice (blocks for user input)."""
    return {"status": "pending"}


sub_agent = LlmAgent(
    name="picker",
    model="gemini-2.0-flash",
    instruction=(
        "First call select_item to let the user pick. "
        "After they respond, call confirm_choice to confirm. "
        "After both are done, summarize what was picked and confirmed."
    ),
    tools=[
        LongRunningFunctionTool(func=select_item),
        LongRunningFunctionTool(func=confirm_choice),
    ],
)

root_agent = LlmAgent(
    name="orchestrator",
    model="gemini-2.0-flash",
    instruction="Delegate to the picker agent.",
    sub_agents=[sub_agent],
)

app = App(
    name="repro",
    root_agent=root_agent,
    resumability_config=ResumabilityConfig(is_resumable=True),
)

session_service = InMemorySessionService()
runner = Runner(app=app, session_service=session_service)


async def main():
    session = await session_service.create_session(app_name="repro", user_id="u")

    # Step 1: agent delegates -> picker calls select_item -> pauses for user input
    step1_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user", parts=[types.Part(text="Pick something")]
        ),
    ):
        step1_events.append(event)

    fc1_id = next(
        fc.id
        for e in step1_events
        for fc in e.get_function_calls()
        if fc.name == "select_item" and e.long_running_tool_ids
    )

    # Step 2: user responds -> picker resumes -> calls confirm_choice -> pauses again
    step2_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[
                types.Part(
                    function_response=types.FunctionResponse(
                        id=fc1_id,
                        name="select_item",
                        response={"result": "option_a"},
                    )
                )
            ],
        ),
    ):
        step2_events.append(event)

    fc2_id = next(
        fc.id
        for e in step2_events
        for fc in e.get_function_calls()
        if fc.name == "confirm_choice" and e.long_running_tool_ids
    )

    # Step 3: user confirms -> BUG: no LLM content, agent silently stops
    step3_events = []
    async for event in runner.run_async(
        user_id="u",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[
                types.Part(
                    function_response=types.FunctionResponse(
                        id=fc2_id,
                        name="confirm_choice",
                        response={"confirmed": True},
                    )
                )
            ],
        ),
    ):
        step3_events.append(event)

    has_llm_content = any(
        e.get_function_calls()
        or (
            e.content
            and e.content.parts
            and any(p.text for p in e.content.parts)
        )
        for e in step3_events
        if e.author != "user"
    )
    assert has_llm_content, (
        f"BUG: Step 3 produced {len(step3_events)} events but no LLM content"
    )


asyncio.run(main())
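The two next(...) lookups in the script raise StopIteration when the model does not call the expected tool, which can hide the real failure. A possible helper, sketched here as an assumption rather than as part of the report, returns None instead and also checks that the call id is actually listed in long_running_tool_ids. The name find_long_running_call_id and the fake events are hypothetical; only get_function_calls() and long_running_tool_ids come from the script above.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_long_running_call_id(events: Iterable, tool_name: str) -> Optional[str]:
    """Return the id of the first long-running call to tool_name, or None."""
    for event in events:
        long_running_ids = event.long_running_tool_ids or set()
        for call in event.get_function_calls():
            if call.name == tool_name and call.id in long_running_ids:
                return call.id
    return None


def _fake_event(calls, long_running_ids):
    """Build a stand-in event for an offline check; real events come from the runner."""
    return SimpleNamespace(
        get_function_calls=lambda: calls,
        long_running_tool_ids=long_running_ids,
    )


fake_events = [
    _fake_event([], None),
    _fake_event([SimpleNamespace(name="select_item", id="fc-1")], {"fc-1"}),
]
found = find_long_running_call_id(fake_events, "select_item")
missing = find_long_running_call_id(fake_events, "confirm_choice")
```

In the reproduction script, fc1_id and fc2_id could then be obtained with this helper, followed by an explicit assertion that the result is not None.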
How often has this issue occurred?:
- Always (100%): deterministic with sequential LRO tools in a sub-agent
Related Issues:
- has_unresolved_long_running_tool_calls(). Verified that "fix(flows): resume long-running tools after matching responses" (#5072) alone is not sufficient; Bug 2 must also be fixed (a different code path at ~line 474, the sub-agent resume block, not the early return at line 496 that #5072 modifies).
- FunctionResponse routing bypasses custom agent orchestration.