Per-request `_force_flush_otel()` in ADK template blocks streaming responses #6807

### Environment

- `google-cloud-aiplatform` (vertexai SDK)
- `vertexai/agent_engines/templates/adk.py`
- Agent Runtime with `identity_type=AGENT_IDENTITY`
- ADK >= 1.17.0

### Description

#### Problem

The ADK template's `streaming_agent_run_with_events` and `async_stream_query` methods call `_force_flush_otel()` in their `finally` blocks on **every request**. This triggers a synchronous `BatchSpanProcessor.force_flush()` that holds the response stream open until all queued spans have been exported to `telemetry.googleapis.com` (or the flush times out).

In Agent Identity environments, where the export uses certificate-bound tokens (mTLS via SPIFFE/WIF), the authentication overhead is significantly higher than with standard OAuth, and the blocking becomes noticeable to end users. Setting `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false` eliminates the delay, which points to the telemetry flush as the cause.

#### Root Cause

`_force_flush_otel()` in `adk.py` calls `tracer_provider.force_flush()` via `asyncio.to_thread()`, which invokes `BatchSpanProcessor._export(BatchExportStrategy.EXPORT_ALL)`, a blocking synchronous export. Running it in a worker thread keeps the event loop free, but because the `finally` block awaits the result, the stream does not complete until the export returns. The default `OTEL_BSP_EXPORT_TIMEOUT` is 30,000 ms, so depending on mTLS authentication latency, each request can be held open for up to 30 seconds.
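The blocking effect can be reproduced without any Google Cloud dependencies. The sketch below is a simplified stand-in for the template's streaming methods, not their actual code: `fake_force_flush` and `stream_events` are hypothetical names, and a `time.sleep()` substitutes for export plus mTLS authentication latency. The flush runs in a worker thread, yet the consumer's `async for` loop still does not finish until it returns.

```python
import asyncio
import time
from typing import AsyncIterator

FLUSH_SECONDS = 0.3  # hypothetical stand-in for export + auth latency


def fake_force_flush() -> bool:
    """Mimics a blocking TracerProvider.force_flush() (hypothetical)."""
    time.sleep(FLUSH_SECONDS)
    return True


async def stream_events() -> AsyncIterator[str]:
    """Simplified shape of a streaming method with a per-request flush."""
    try:
        yield "event-1"
        yield "final-response"
    finally:
        # The worker thread keeps the event loop unblocked, but awaiting it
        # means the generator cannot finish until the flush returns.
        await asyncio.to_thread(fake_force_flush)


async def consume() -> tuple[list[str], float, float]:
    """Returns the events, time of the last event, and time the stream closed."""
    start = time.perf_counter()
    events: list[str] = []
    last_event_at = 0.0
    async for event in stream_events():
        events.append(event)
        last_event_at = time.perf_counter() - start
    stream_closed_at = time.perf_counter() - start
    return events, last_event_at, stream_closed_at


if __name__ == "__main__":
    events, last_event_at, closed_at = asyncio.run(consume())
    print(events, f"last event {last_event_at:.2f}s, closed {closed_at:.2f}s")
```

All events arrive almost immediately; the stream closes only after the simulated flush, mirroring the gap described in the reproduction steps.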

The comment on the call site reads:
```
# Avoid telemetry data loss having to do with CPU throttling on instance turndown
```

This concern is valid, but a per-request flush is the wrong mechanism for it. `BatchSpanProcessor` already exports asynchronously on a background thread at a configurable interval (`OTEL_BSP_SCHEDULE_DELAY`, default 5,000 ms). Telemetry data loss on shutdown should be handled by a shutdown hook, not by blocking every request.
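A shutdown-time flush could look roughly like the sketch below. It is an illustration under assumptions, not the template's code: `install_sigterm_flush` is a hypothetical helper, and it assumes the platform sends `SIGTERM` before stopping an instance. It relies only on the provider exposing `force_flush(timeout_millis=...)`, which the OpenTelemetry SDK's `TracerProvider` does. Python's default `SIGTERM` action ends the process without running `atexit` handlers, so the SDK's own `atexit` shutdown is not enough on its own.

```python
import signal
import sys
from types import FrameType
from typing import Callable, Optional, Protocol


class Flushable(Protocol):
    """Anything with an OTel-style force_flush, e.g. an SDK TracerProvider."""

    def force_flush(self, timeout_millis: int = 30000) -> bool: ...


def install_sigterm_flush(
    provider: Flushable, timeout_millis: int = 5000
) -> Callable[[int, Optional[FrameType]], None]:
    """Hypothetical helper: flush telemetry once at turndown, not per request."""
    previous = signal.getsignal(signal.SIGTERM)

    def _on_sigterm(signum: int, frame: Optional[FrameType]) -> None:
        # Bounded flush so a slow exporter cannot stall shutdown indefinitely.
        provider.force_flush(timeout_millis=timeout_millis)
        if callable(previous):
            # Chain to a handler installed earlier (e.g. by the web server).
            previous(signum, frame)
        elif previous == signal.SIG_IGN:
            # SIGTERM was being ignored; keep ignoring it after flushing.
            return
        else:
            # Default disposition: exit via SystemExit so atexit hooks
            # (including TracerProvider's own shutdown) still run.
            sys.exit(128 + signum)

    signal.signal(signal.SIGTERM, _on_sigterm)
    return _on_sigterm
```

Running exporter I/O inside a signal handler has its own caveats; if the serving framework exposes a shutdown or lifespan hook, that is likely a safer place for the same `force_flush` call.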

#### Reproduction

1. Deploy an ADK agent to Agent Runtime with `identity_type=AGENT_IDENTITY`
2. Ensure telemetry is enabled (`GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=true`, or leave it at the default for ADK >= 1.17)
3. Send a streaming query
4. Observe a significant gap between agent execution completion and response delivery
5. Set `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false` and repeat — the gap disappears
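To put a number on the gap in step 4, one option is to timestamp the last event and the moment the stream closes. The helper below is a hypothetical sketch that works with any async iterator; against a deployed agent it could wrap a streaming call such as `remote_app.async_stream_query(...)`, though the exact call and its arguments depend on the SDK version and are an assumption here.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, AsyncIterator, List


@dataclass
class StreamTiming:
    events: List[Any]
    last_event_s: float  # seconds from start to the last event received
    closed_s: float  # seconds from start until the stream finished

    @property
    def tail_gap_s(self) -> float:
        """Time spent after the final event before the stream closed."""
        return self.closed_s - self.last_event_s


async def time_stream(stream: AsyncIterator[Any]) -> StreamTiming:
    """Consumes a stream and records when the last event and the close occurred."""
    start = time.perf_counter()
    events: List[Any] = []
    last_event_s = 0.0
    async for event in stream:
        events.append(event)
        last_event_s = time.perf_counter() - start
    return StreamTiming(events, last_event_s, time.perf_counter() - start)
```

Running it once with telemetry enabled and once with `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false` should, per the behavior reported here, show `tail_gap_s` dropping to near zero in the second run.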

#### Expected Behavior

The OTel flush should not block the request path. `BatchSpanProcessor` is designed to export spans asynchronously in background threads. Per-request `force_flush()` negates this design and introduces user-facing latency proportional to the telemetry export duration.

By comparison, running the same agent on Cloud Run by calling ADK's `Runner.run_async()` directly (bypassing the ADK template wrapper) shows no such delay: agent execution time is the same, and the gap between execution completion and response delivery is negligible.

#### Code References

**Call sites** (per-request flush in `finally` blocks):
- `streaming_agent_run_with_events`
- `async_stream_query`

**Flush implementation**:
- `_force_flush_otel`

### Workaround

Set `GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY=false` in the Agent Runtime deployment's `env_vars`. This disables Cloud Trace export but does not affect the BigQuery Agent Analytics Plugin, which uses its own data path.
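In a Python deployment script, the workaround amounts to one string-valued entry in `env_vars`. The fragment below is a sketch; `telemetry_workaround_env_vars` is a hypothetical name, and the surrounding deployment call and its other arguments depend on the SDK version, so they are not shown.

```python
# Hypothetical variable name; pass it as env_vars=... to the Agent Runtime
# deployment call (call and other arguments not shown).
telemetry_workaround_env_vars = {
    # Disables Cloud Trace export; per this report, it also removes the
    # per-request flush delay. Environment variable values are strings.
    "GOOGLE_CLOUD_AGENT_ENGINE_ENABLE_TELEMETRY": "false",
}
```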
