Environment
- google-cloud-aiplatform (vertexai SDK)
- vertexai/agent_engines/templates/adk.py
- Agent Runtime with identity_type=AGENT_IDENTITY
- ADK >= 1.17.0
Description
Problem
The ADK template's streaming_agent_run_with_events and async_stream_query methods call _force_flush_otel() in their finally blocks on every request. This triggers a synchronous BatchSpanProcessor.force_flush() that blocks the response stream until all queued spans are exported to telemetry.googleapis.com.
In Agent Identity environments where the export uses certificate-bound tokens (mTLS via SPIFFE/WIF), the authentication overhead is significantly higher than standard OAuth, and the blocking duration becomes noticeable to end users. Setting GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false eliminates the overhead, confirming the flush is the sole cause.
Root Cause
_force_flush_otel() in adk.py calls tracer_provider.force_flush() via asyncio.to_thread(), which invokes BatchSpanProcessor._export(BatchExportStrategy.EXPORT_ALL) — a blocking synchronous export. The default OTEL_BSP_EXPORT_TIMEOUT is 30,000ms, so the flush can block for up to 30 seconds depending on the mTLS authentication latency.
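The effect of awaiting a blocking flush in a streaming method's finally block can be reproduced without any OTel dependency. The sketch below is illustrative only: blocking_flush, stream_events, and SIMULATED_EXPORT_SECONDS are hypothetical stand-ins (not names from adk.py), and time.sleep() takes the place of the real span export.

```python
import asyncio
import time

# Assumption: stands in for a slow, mTLS-authenticated span export.
SIMULATED_EXPORT_SECONDS = 0.2


def blocking_flush() -> None:
    """Hypothetical stand-in for tracer_provider.force_flush()."""
    time.sleep(SIMULATED_EXPORT_SECONDS)


async def stream_events():
    """Yields events, then awaits the flush before the stream can close."""
    try:
        for i in range(3):
            yield f"event-{i}"
    finally:
        # The thread keeps the event loop free, but this generator still
        # cannot finish until the flush returns.
        await asyncio.to_thread(blocking_flush)


async def measure():
    start = time.perf_counter()
    last_event_at = 0.0
    async for _ in stream_events():
        last_event_at = time.perf_counter() - start
    stream_closed_at = time.perf_counter() - start
    return last_event_at, stream_closed_at


last_event_at, stream_closed_at = asyncio.run(measure())
gap = stream_closed_at - last_event_at
print(f"last event at {last_event_at:.3f}s, stream closed at {stream_closed_at:.3f}s")
```

All events arrive promptly, but the consumer does not see the end of the stream until the simulated export finishes, which matches the gap described in this report.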
The comment on the call site reads:
# Avoid telemetry data loss having to do with CPU throttling on instance turndown
This concern is valid, but per-request flush is the wrong mechanism for it. BatchSpanProcessor already exports asynchronously at a configurable interval (default 5s). Telemetry data loss on shutdown should be handled by a shutdown hook, not by blocking every request.
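A minimal sketch of the shutdown-hook approach, under stated assumptions: FakeTracerProvider and install_shutdown_flush are hypothetical names used for illustration, and the stand-in class only mirrors the force_flush(timeout_millis) shape of the OTel SDK TracerProvider. This is not the template's code or a proposed patch.

```python
import atexit
import threading


class FakeTracerProvider:
    """Hypothetical stand-in with the same force_flush(timeout_millis) shape."""

    def __init__(self) -> None:
        self.flush_calls = []

    def force_flush(self, timeout_millis: int = 30000) -> bool:
        self.flush_calls.append(timeout_millis)
        return True


def install_shutdown_flush(provider, timeout_millis: int = 5000):
    """Registers a flush that runs once at interpreter exit, not per request."""
    lock = threading.Lock()
    done = False

    def flush_once(*_args) -> None:
        nonlocal done
        with lock:
            if done:
                return
            done = True
        provider.force_flush(timeout_millis)

    atexit.register(flush_once)
    return flush_once


provider = FakeTracerProvider()
flush_once = install_shutdown_flush(provider)

# Simulate shutdown arriving twice (for example, a signal followed by atexit).
flush_once()
flush_once()
print(provider.flush_calls)
```

atexit handlers do not run when the process is terminated by an unhandled signal, so a real deployment would likely also wire flush_once to SIGTERM with signal.signal() from the main thread. The 5000 ms timeout is an arbitrary example value.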
Reproduction
- Deploy an ADK agent to Agent Runtime with identity_type=AGENT_IDENTITY
- Ensure telemetry is enabled (GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true, or leave at default for ADK >= 1.17)
- Send a streaming query
- Observe a significant gap between agent execution completion and response delivery
- Set GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false and repeat; the gap disappears
Expected Behavior
The OTel flush should not block the request path. BatchSpanProcessor is designed to export spans asynchronously in background threads. Per-request force_flush() negates this design and introduces user-facing latency proportional to the telemetry export duration.
When running the same agent on Cloud Run with ADK Runner.run_async() directly (bypassing the ADK template wrapper), no such overhead exists — the agent execution time is the same, but the framework-level gap is negligible.
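One way to keep a flush off the request path is to schedule it without awaiting it. The sketch below is a hedged illustration of that idea, not a proposed patch: blocking_flush and stream_events_nonblocking are hypothetical names, and time.sleep() again stands in for the real export.

```python
import asyncio
import time

# Assumption: stands in for a slow span export.
SIMULATED_EXPORT_SECONDS = 0.2


def blocking_flush() -> None:
    """Hypothetical stand-in for tracer_provider.force_flush()."""
    time.sleep(SIMULATED_EXPORT_SECONDS)


async def stream_events_nonblocking():
    """Yields events and schedules the flush without waiting for it."""
    try:
        for i in range(3):
            yield f"event-{i}"
    finally:
        # Fire and forget: the flush runs on the default executor while the
        # stream closes immediately.
        asyncio.get_running_loop().run_in_executor(None, blocking_flush)


async def measure_gap() -> float:
    last_event = time.perf_counter()
    async for _ in stream_events_nonblocking():
        last_event = time.perf_counter()
    return time.perf_counter() - last_event


gap_seconds = asyncio.run(measure_gap())
print(f"gap between last event and stream close: {gap_seconds:.3f}s")
```

The trade-off is that a fire-and-forget flush can still be cut short if the instance is throttled or torn down right after the response, so it complements a shutdown-time flush and does not replace one.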
Code References
Call sites (per-request flush in finally blocks):
- streaming_agent_run_with_events
- async_stream_query
Flush implementation:
- _force_flush_otel
Workaround
Set GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false in the Agent Runtime deployment's env_vars. This disables Cloud Trace export but does not affect BigQuery Agent Analytics Plugin (which uses its own data path).
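The workaround amounts to one entry in the deployment's environment variables. The fragment below only builds that mapping; how it is passed to the deployment call is an assumption noted in the comments and should be checked against the SDK version in use.

```python
# Environment variables for the Agent Runtime deployment.
# The value is the string "false", not a Python boolean.
env_vars = {
    "GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "false",
}

# Assumption: the deployment call accepts this mapping through its env_vars
# argument, for example agent_engines.create(..., env_vars=env_vars).
print(env_vars)
```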