Skip to content

Python: read response headers defensively to support stream wrappers without .headers (#6028)#6029

Open
EmilienMottet wants to merge 1 commit into
microsoft:mainfrom
EmilienMottet:fix/openai-asyncstreamwrapper-headers-attr
Open

Python: read response headers defensively to support stream wrappers without .headers (#6028)#6029
EmilienMottet wants to merge 1 commit into
microsoft:mainfrom
EmilienMottet:fix/openai-asyncstreamwrapper-headers-attr

Conversation

@EmilienMottet
Copy link
Copy Markdown

Summary

Fixes #6028.

Since #5910, OpenAIChatClient._inner_get_response() reads .headers on the raw response returned by client.responses.with_raw_response.create/retrieve(...) to surface the x-ms-served-model Azure header. When azure-ai-projects experimental GenAI tracing is enabled (AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true), the instrumentor wraps the raw streaming response in an AsyncStreamWrapper (defined inline at azure/ai/projects/telemetry/_responses_instrumentor.py:2929) that exposes .response / .stream_async_iter but not .headers. Reading raw_create_response.headers then raises:

AttributeError: 'AsyncStreamWrapper' object has no attribute 'headers'

FoundryChatClient rethrows this as a ChatClientException and every streaming call breaks (workflows + free chat) as soon as you upgrade to agent-framework-openai>=1.6.0 with experimental Azure tracing on.

Fix

Read headers defensively at all four call sites in agent_framework_openai/_chat_client.py:

served_model = self._extract_served_model(getattr(raw_response, "headers", None))

_extract_served_model() (line 739) already short-circuits on None, so the served-model surfacing degrades gracefullyupdate.model falls back to the deployment alias instead of crashing the call. No behavior change when .headers is present.

The four call sites covered:

  • _chat_client.py:639raw_stream_response (retrieve-streaming, continuation token)
  • _chat_client.py:680raw_create_response (streaming create)
  • _chat_client.py:709raw_response (non-streaming retrieve)
  • _chat_client.py:731raw_response (non-streaming create)

Test plan

  • New regression test test_streaming_response_without_headers_attribute_does_not_crash simulates a stream wrapper that raises AttributeError on .headers, mirroring the azure-ai-projects instrumentor behavior. Asserts the stream completes and update.model falls back to the deployment alias.
  • Manual repro in M365 Agents Playground against gpt-5.1-dzs and gpt-5.5-dzs Azure deployments confirms the crash disappears with this patch (validated independently of any local workarounds).
  • Existing tests test_served_model_header_* already cover the happy path (headers present) and are not touched by this change.

Notes for reviewers

  • The defensive getattr is preferred over a try/except AttributeError block to avoid swallowing unrelated attribute errors that might surface from response model changes.
  • An alternative would be to make the instrumentor wrapper expose .headers, but that lives in azure-ai-projects (separate repo) and only this side can land quickly to unblock users on the 1.6.0 train.
  • Documented as experimental by Microsoft: "GenAI tracing instrumentation is an experimental preview feature" — so silently dropping the served-model surfacing in this niche is consistent with the broader contract.

@EmilienMottet EmilienMottet force-pushed the fix/openai-asyncstreamwrapper-headers-attr branch from b194ac4 to 758e1a2 Compare May 22, 2026 11:16
@EmilienMottet
Copy link
Copy Markdown
Author

EmilienMottet commented May 22, 2026

@microsoft-github-policy-service agree company=Michelin

….headers` (microsoft#6028)

`OpenAIChatClient._inner_get_response()` reads `.headers` on the raw streaming
response returned by `client.responses.with_raw_response.create(stream=True)`
(and its three sibling call sites - retrieve-streaming, non-streaming create
and background retrieve) to surface the `x-ms-served-model` Azure header,
introduced in microsoft#5910.

When `azure-ai-projects>=2.1.0` experimental GenAI tracing is enabled
(`AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true`), the instrumentor wraps the
raw streaming response in an inline `AsyncStreamWrapper` that exposes
`.response` but not `.headers`. Reading `raw_create_response.headers` then
raises `AttributeError: 'AsyncStreamWrapper' object has no attribute 'headers'`,
which `FoundryChatClient` rethrows as a `ChatClientException` and breaks every
streaming call (workflows and free chat).

Fix: read the header dict via `getattr(raw_response, "headers", None)` at all
four call sites. `_extract_served_model()` already short-circuits on `None`,
so the served-model surfacing degrades gracefully (model stays the deployment
alias) instead of crashing when the response is wrapped by an instrumentor
that does not proxy `.headers`.

Regression test added:
`test_streaming_response_without_headers_attribute_does_not_crash`
simulates a stream wrapper that raises `AttributeError` on `.headers` and
asserts the stream still completes with the deployment alias as `update.model`.

Fixes microsoft#6028
@EmilienMottet EmilienMottet force-pushed the fix/openai-asyncstreamwrapper-headers-attr branch from 758e1a2 to ad7b9ff Compare May 22, 2026 12:25
Copy link
Copy Markdown

@jluocsa jluocsa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. The getattr(raw_response, "headers", None) choice is correct, and your description-rationale for preferring it over try/except AttributeError (so unrelated attribute errors still surface) is the right call. The _StreamWrapperWithoutHeaders mock is faithful to the real azure.ai.projects.telemetry._responses_instrumentor.AsyncStreamWrapper — the explicit assert not hasattr(headerless_stream, "headers") sanity check is a nice belt-and-suspenders touch that will catch regressions if anyone later switches the production code to a try/except Exception: and accidentally swallows unrelated errors.

Two non-blocking thoughts you might consider for a small follow-up — happy if neither lands:

1. Hoist the defensive read so the rationale lives in one place

Right now four call sites repeat the same pattern, with three of them carrying a # See note above on raw_stream_response.headers. reference back to the first site. If a later refactor moves or splits the first comment, the other three orphan. A tiny helper would DRY this up:

def _safe_extract_served_model(self, raw_response: object) -> str | None:
    """Read ``x-ms-served-model`` defensively. Telemetry instrumentors
    (e.g. ``azure-ai-projects`` experimental tracing) wrap streaming
    responses in objects that don't proxy ``.headers``; degrade gracefully.
    """
    return self._extract_served_model(getattr(raw_response, "headers", None))

Each call site then collapses back to one line with no inline comment needed:

served_model = self._safe_extract_served_model(raw_stream_response)

Trade-off: a small extra method on the class vs. single-source-of-truth for the defensive read. If _extract_served_model(headers) is intentionally kept narrow ("headers mapping → str | None", no knowledge of response objects), the current inline approach is also defensible — purely your call.

2. Make the test assertion's intent explicit

assert update.model == "test-model" near the end of test_streaming_response_without_headers_attribute_does_not_crash proves the graceful-degradation path (deployment alias preserved when the served-model header is missing), not just that the stream doesn't crash. Worth one line of comment so a future reader doesn't mistake it for a plain "model name round-trip" assertion:

for update in updates:
    # No header → no override → model stays the deployment alias (graceful degradation, not crash).
    assert update.model == "test-model"

LGTM either way — this unblocks 1.6.0 users on experimental Azure tracing today, the test is a real regression test (not a smoke test), and the design rationale for getattr vs try/except in the PR description is exactly right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: FoundryChatClient crashes with AsyncStreamWrapper has no attribute 'headers' when azure-ai-projects experimental tracing is enabled

4 participants