Summary
AgentEngineSandboxCodeExecutor catches the wrong exception class when it tries to recover from an externally deleted sandbox, so the session crashes instead of transparently creating a replacement sandbox. Confirmed on ADK v1.27.2; the affected code path is unchanged through v1.31.1.
Affected file: google/adk/code_executors/agent_engine_sandbox_code_executor.py:103,119 (recovery path around sandboxes.get()).
Relation to other issues: This is a separate, additional bug from the field-name mismatches tracked in #3690. The two can be fixed independently.
Current behavior
When a cached sandbox is externally deleted (TTL expiry, quota recycle, manual cleanup, maintenance), the wrapper tries to detect this via sandboxes.get() and fall back to creating a new sandbox:
# google/adk/code_executors/agent_engine_sandbox_code_executor.py
from google.api_core import exceptions
...
try:
    sandbox = self._get_api_client().agent_engines.sandboxes.get(name=sandbox_name)
    if sandbox is None or sandbox.state != "STATE_RUNNING":
        create_new_sandbox = True
except exceptions.NotFound:
    create_new_sandbox = True
The except clause catches google.api_core.exceptions.NotFound. But the sandboxes.get() call through the Vertex SDK raises google.genai.errors.ClientError (wrapping HTTP 404). The two class hierarchies are disjoint:
google.api_core.exceptions.NotFound → api_core.ClientError → ...
google.genai.errors.ClientError → genai.APIError → Exception
Effect: the genai 404 propagates past the except clause, and the session becomes unrecoverable. The next execute_code() call crashes; the session must be manually reset.
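The mismatch can be illustrated without either library installed. The sketch below uses hypothetical stand-in classes (ApiCoreNotFound, GenaiClientError, and so on) that only mirror the shape of the two hierarchies described above; they are not the real google.api_core or google.genai classes, and get_deleted_sandbox() is an assumed placeholder for the sandboxes.get() call.

```python
# Stand-ins that mirror the two hierarchies; NOT the real library classes.
class ApiCoreClientError(Exception):
    """Plays the role of google.api_core.exceptions.ClientError."""


class ApiCoreNotFound(ApiCoreClientError):
    """Plays the role of google.api_core.exceptions.NotFound."""


class GenaiAPIError(Exception):
    """Plays the role of google.genai.errors.APIError."""

    def __init__(self, code, message):
        super().__init__(f"{code} {message}")
        self.code = code


class GenaiClientError(GenaiAPIError):
    """Plays the role of google.genai.errors.ClientError."""


def get_deleted_sandbox():
    # Hypothetical placeholder for sandboxes.get() on a sandbox that is gone.
    raise GenaiClientError(404, "NOT_FOUND")


def current_recovery_path():
    """Mirrors the existing try/except shape from the executor."""
    create_new_sandbox = False
    try:
        get_deleted_sandbox()
    except ApiCoreNotFound:
        # Never reached: GenaiClientError is not a subclass of ApiCoreNotFound.
        create_new_sandbox = True
    return create_new_sandbox


def escaped_exception_name():
    """Return the name of whatever exception escapes the recovery path."""
    try:
        current_recovery_path()
    except Exception as exc:
        return type(exc).__name__
    return None
```

Because Python matches except clauses by subclass relationship only, the shared name ClientError and the shared 404 status make no difference: the exception escapes the handler untouched.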
Reproducer
# 1. Dispatch one execution; session state now contains sandbox_name.
# 2. Manually delete the sandbox (gcloud or Vertex UI) OR wait for TTL expiry.
# 3. Dispatch another execution in the same session.
# Observed: google.genai.errors.ClientError propagates; session is dead.
# Expected: sandbox silently recreated; execution proceeds normally.
Proposed fix
Add a parallel except clause that catches the genai exception class and checks for 404:
from google.api_core import exceptions as api_core_exc
from google.genai import errors as genai_errors
...
try:
    sandbox = self._get_api_client().agent_engines.sandboxes.get(name=sandbox_name)
    if sandbox is None or sandbox.state != "STATE_RUNNING":
        create_new_sandbox = True
except api_core_exc.NotFound:
    create_new_sandbox = True
except genai_errors.ClientError as exc:
    status = getattr(exc, "code", None) or getattr(exc, "status_code", None)
    if status == 404 or "NOT_FOUND" in str(exc):
        create_new_sandbox = True
    else:
        raise
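The status check in the new clause could be covered by a small regression test that does not need a live Agent Engine. The following is a minimal sketch under stated assumptions: FakeClientError is a hypothetical stand-in for google.genai.errors.ClientError, and is_not_found(), should_create_new_sandbox(), and raising() are illustrative helper names that do not exist in ADK.

```python
from types import SimpleNamespace


class FakeClientError(Exception):
    """Hypothetical stand-in for google.genai.errors.ClientError."""

    def __init__(self, code, message=""):
        super().__init__(f"{code} {message}".strip())
        self.code = code


def is_not_found(exc):
    """Same check as the proposed clause: numeric status first, then text."""
    status = getattr(exc, "code", None) or getattr(exc, "status_code", None)
    return status == 404 or "NOT_FOUND" in str(exc)


def should_create_new_sandbox(get_sandbox):
    """Return True when the cached sandbox is gone or not running.

    `get_sandbox` is a hypothetical zero-argument callable standing in for
    sandboxes.get(name=sandbox_name).
    """
    try:
        sandbox = get_sandbox()
    except FakeClientError as exc:
        if is_not_found(exc):
            return True
        raise  # Other client errors (403, 429, ...) must still surface.
    return sandbox is None or sandbox.state != "STATE_RUNNING"


def raising(code, message=""):
    """Build a fake getter that fails with the given status code."""
    def _get():
        raise FakeClientError(code, message)
    return _get


# Usage: a deleted sandbox triggers recreation, a healthy one does not.
deleted = should_create_new_sandbox(raising(404, "NOT_FOUND"))
healthy = should_create_new_sandbox(lambda: SimpleNamespace(state="STATE_RUNNING"))
```

One design point worth weighing: the string fallback also matches any non-404 error whose message happens to contain NOT_FOUND, so a stricter variant might rely on the numeric code alone when it is present.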
Reference workaround
utils/sandbox_executor_patched.py#L110-L123 — PatchedAgentEngineSandboxCodeExecutor subclass in continuous production use since April 2026 with no observed regressions.
Environment
- ADK: v1.27.2 (reproduces on v1.28.x-v1.31.1; affected paths unchanged)
- google-genai: 1.x
- google-cloud-aiplatform / vertexai: 1.x
- Python: 3.13
- Agent Engine resource: any projects/.../reasoningEngines/... parent
Impact
Any long-running session whose sandbox is deleted externally becomes unrecoverable. In production multi-agent workflows, sandbox deletions happen routinely via TTL expiry; this bug turns a transient, recoverable condition into a permanent session failure.