
Per-request _force_flush_otel() in ADK template blocks streaming responses #6807

@yshimojo

Environment

  • google-cloud-aiplatform (vertexai SDK)
  • vertexai/agent_engines/templates/adk.py
  • Agent Runtime with identity_type=AGENT_IDENTITY
  • ADK >= 1.17.0

Description

Problem

The ADK template's streaming_agent_run_with_events and async_stream_query methods call _force_flush_otel() in their finally blocks on every request. This triggers a synchronous BatchSpanProcessor.force_flush() that blocks the response stream until all queued spans are exported to telemetry.googleapis.com.
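
A toy, stdlib-only model of this call-site pattern makes the ordering concrete. fake_force_flush, stream_query, and client are hypothetical names, and the 0.1 s sleep only stands in for the span export. This is a sketch of the pattern, not the template's actual code. The client receives every event, but the stream does not close until the flush in the finally block returns.

```python
import asyncio
import time


def fake_force_flush(log: list) -> None:
    """Hypothetical stand-in for a synchronous span export (blocks its thread)."""
    log.append("flush-start")
    time.sleep(0.1)  # pretend this is the network export
    log.append("flush-end")


async def stream_query(n_events: int, log: list):
    """Toy model of the call-site pattern: yield events, then flush in `finally`."""
    try:
        for i in range(n_events):
            yield f"event-{i}"
    finally:
        # The generator cannot finish until this await returns.
        await asyncio.to_thread(fake_force_flush, log)


async def client() -> list:
    log: list = []
    async for event in stream_query(2, log):
        log.append(f"received {event}")
    log.append("stream-closed")  # only reached after the flush completes
    return log


if __name__ == "__main__":
    print(asyncio.run(client()))
    # ['received event-0', 'received event-1', 'flush-start', 'flush-end', 'stream-closed']
```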

In Agent Identity environments, where the export authenticates with certificate-bound tokens (mTLS via SPIFFE/WIF), the authentication overhead is significantly higher than with standard OAuth, and the blocking becomes noticeable to end users. Setting GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false eliminates the overhead, which identifies the flush as the cause.

Root Cause

_force_flush_otel() in adk.py calls tracer_provider.force_flush() via asyncio.to_thread(), which invokes BatchSpanProcessor._export(BatchExportStrategy.EXPORT_ALL), a blocking synchronous export. asyncio.to_thread() keeps the event loop itself free, but the request's finally block awaits the result, so the stream cannot complete until the export returns. The default OTEL_BSP_EXPORT_TIMEOUT is 30,000 ms, so depending on the mTLS authentication latency, the flush can block for up to 30 seconds.
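
The difference between blocking the event loop and blocking the request can be sketched as follows, assuming a synchronous export that takes about 0.3 s. blocking_export, heartbeat, and request_tail are hypothetical stand-ins, not SDK functions. With asyncio.to_thread() a concurrent heartbeat task keeps ticking, yet the awaiting coroutine still waits for the full export duration. Calling the export directly would additionally stall every other task on the loop.

```python
import asyncio
import time

EXPORT_SECONDS = 0.3  # hypothetical export duration, chosen only for the demo


def blocking_export() -> None:
    """Stand-in for a synchronous export such as a span-processor flush."""
    time.sleep(EXPORT_SECONDS)


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Append a timestamp every 50 ms whenever the event loop is free to run it."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def request_tail(offload: bool) -> tuple:
    """Return (seconds this coroutine waited, heartbeat ticks observed meanwhile)."""
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    start = time.perf_counter()
    if offload:
        # Same shape as the template: the loop stays free, but this coroutine waits.
        await asyncio.to_thread(blocking_export)
    else:
        # Calling the export directly freezes the whole event loop instead.
        blocking_export()
    waited = time.perf_counter() - start
    stop.set()
    await hb
    return waited, len(ticks)


if __name__ == "__main__":
    for offload in (True, False):
        waited, ticks = asyncio.run(request_tail(offload))
        print(f"offload={offload}: waited {waited:.2f}s, heartbeat ticks={ticks}")
```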

The comment on the call site reads:

# Avoid telemetry data loss having to do with CPU throttling on instance turndown

This concern is valid, but per-request flushing is the wrong mechanism for it. BatchSpanProcessor already exports asynchronously on a background thread at a configurable interval (OTEL_BSP_SCHEDULE_DELAY, default 5,000 ms). Telemetry data loss on shutdown should be handled by a shutdown hook, not by blocking every request.
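
The following is a toy model of that division of labor, not the OpenTelemetry SDK implementation. Spans are only enqueued on the request path, a worker thread exports whatever is queued every schedule_delay seconds, and the single blocking flush happens in shutdown(). ToyBatchProcessor and its methods are hypothetical names.

```python
import threading
import time
from typing import Callable, List


class ToyBatchProcessor:
    """Toy model of interval-based background export; not the OTel implementation."""

    def __init__(self, export: Callable[[List[str]], None], schedule_delay: float = 5.0):
        self._export = export
        self._delay = schedule_delay
        self._queue: List[str] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span: str) -> None:
        # Request path: enqueue only, never wait on the network.
        with self._lock:
            self._queue.append(span)

    def _drain(self) -> None:
        # Swap the queue out under the lock, then export outside it.
        with self._lock:
            batch, self._queue = self._queue, []
        if batch:
            self._export(batch)

    def _run(self) -> None:
        # Export whatever is queued every `schedule_delay` seconds until shutdown.
        while not self._stop.wait(self._delay):
            self._drain()

    def shutdown(self) -> None:
        # The one place a blocking flush belongs: process shutdown.
        self._stop.set()
        self._worker.join()
        self._drain()


if __name__ == "__main__":
    exported: List[str] = []
    processor = ToyBatchProcessor(exported.extend, schedule_delay=0.05)
    processor.on_end("span-1")
    time.sleep(0.2)  # background export happens without the caller waiting
    processor.on_end("span-2")
    processor.shutdown()  # flushes the remainder once
    print(exported)
```

How shutdown() would be triggered on Agent Runtime (for example via atexit or a SIGTERM handler) is an assumption; the issue does not say which shutdown signal the runtime delivers.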

Reproduction

  1. Deploy an ADK agent to Agent Runtime with identity_type=AGENT_IDENTITY
  2. Ensure telemetry is enabled (GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true, or leave it at the default for ADK >= 1.17)
  3. Send a streaming query
  4. Observe a significant gap between agent execution completion and response delivery
  5. Set GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false and repeat — the gap disappears
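
The gap in step 4 can be quantified with a small timing helper, sketched here under the assumption that the streaming response is consumed as an async iterator. measure_tail_gap and fake_stream are hypothetical. In practice you would pass the deployed agent's streaming response in place of fake_stream and compare the gap with telemetry enabled and disabled.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Tuple


async def measure_tail_gap(stream: AsyncIterator[Any]) -> Tuple[int, float]:
    """Return (events received, seconds between the last event and the end of the stream)."""
    count = 0
    last_event_at = time.perf_counter()
    async for _ in stream:
        count += 1
        last_event_at = time.perf_counter()
    return count, time.perf_counter() - last_event_at


async def fake_stream(tail_seconds: float):
    """Hypothetical stream: three events, then `tail_seconds` of post-stream work."""
    for i in range(3):
        yield {"event": i}
    await asyncio.sleep(tail_seconds)  # simulates work done before the stream closes


if __name__ == "__main__":
    for tail in (0.0, 0.2):
        count, gap = asyncio.run(measure_tail_gap(fake_stream(tail)))
        print(f"{count} events, {gap:.2f}s after the last event")
```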

Expected Behavior

The OTel flush should not block the request path. BatchSpanProcessor is designed to export spans asynchronously on a background thread. A per-request force_flush() negates this design and adds the full telemetry export duration to user-facing latency.

When the same agent runs on Cloud Run and calls ADK Runner.run_async() directly (bypassing the ADK template wrapper), there is no such overhead: agent execution time is the same, but the framework-level gap is negligible.

Code References

Call sites (per-request flush in finally blocks):

  • streaming_agent_run_with_events
  • async_stream_query

Flush implementation:

  • _force_flush_otel

Workaround

Set GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false in the Agent Runtime deployment's env_vars. This disables Cloud Trace export but does not affect the BigQuery Agent Analytics Plugin, which uses its own data path.
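
A small helper for applying the workaround without dropping other variables, assuming the deployment accepts env_vars as a string-to-string mapping. The issue only says to set the flag in env_vars; the exact create/update call depends on the SDK version and is not shown here. with_telemetry_disabled and LOG_LEVEL are hypothetical.

```python
from typing import Dict, Optional

TELEMETRY_FLAG = "GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY"


def with_telemetry_disabled(env_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `env_vars` with the telemetry flag set to "false"."""
    merged = dict(env_vars or {})  # copy so the caller's mapping is left untouched
    merged[TELEMETRY_FLAG] = "false"
    return merged


if __name__ == "__main__":
    # LOG_LEVEL is a hypothetical pre-existing variable, kept as-is.
    print(with_telemetry_disabled({"LOG_LEVEL": "INFO"}))
```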

Labels: api: vertex-ai (Issues related to the googleapis/python-aiplatform API.)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions