Confirm this is a feature request for the Python library and not the underlying OpenAI API
- This is a feature request for the Python library
Describe the feature or improvement you're looking for
For Azure OpenAI clients (AzureOpenAI / AsyncAzureOpenAI), please promote the value of the x-ms-served-model response header into Response.model (and into event.response.model for streamed response.* events) when calling the Responses API.
This would bring AzureOpenAI in line with OpenAI's own behavior on every other endpoint.
Background
OpenAI's own (non-Azure) endpoints already return the dated snapshot of the model that served the request, regardless of the alias sent in the request. Empirically (today, via openai 1.x):
Sent model (alias) |
Response.model from OpenAI |
gpt-4o-mini |
gpt-4o-mini-2024-07-18 |
gpt-4o |
gpt-4o-2024-08-06 |
gpt-5-nano |
gpt-5-nano-2025-08-07 |
The same is true on Azure Chat Completions — the body already returns the snapshot — and (as a courtesy) Azure surfaces the deployment alias in x-ms-deployment-name:
[Chat Completions sent=gpt-5-nano]
body.model = 'gpt-5-nano-2025-08-07' ✅ snapshot
x-ms-served-model = None
x-ms-deployment-name = 'gpt-5-nano'
But on the Azure Responses API specifically, the body carries the deployment alias and the snapshot is moved into the x-ms-served-model response header (not exposed by the SDK today):
[Responses sent=gpt-5-nano, non-streaming]
body.model = 'gpt-5-nano' ❌ alias, not snapshot
x-ms-served-model = 'gpt-5-nano-2025-08-07'
x-ms-deployment-name = (not sent on Responses)
[Responses sent=gpt-5-nano, streaming]
body.model (per event)= 'gpt-5-nano' ❌
x-ms-served-model = 'gpt-5-nano-2025-08-07'
So Azure Responses is the outlier among all four code paths.
Additional context
Why this matters
Downstream consumers (telemetry, eval, RAG/agent frameworks, etc.) read Response.model to record which model actually served the request — to attribute traces, costs, regressions, and behavior changes to specific dated snapshots. With the current SDK behavior, Azure Responses users see the deployment alias only, which can mask:
- Auto-rollouts when an alias deployment is bumped from one snapshot to a newer one.
- Spillover routing (
x-ms-is-spilled-over: true) where a different snapshot serves the request.
- Regional differences in which snapshot backs the same alias.
Proposed change (behavioral)
In openai/lib/azure.py, in BaseAzureClient (and async equivalent), add response post-processing such that:
- For
/responses endpoint responses, when the x-ms-served-model response header is present and non-empty, replace Response.model with that value before the SDK returns the parsed object.
- For streaming
responses.create(stream=True, ...) and responses.retrieve(..., stream=True), do the same on every event whose event.response.model is currently set (e.g. response.created, response.in_progress, response.completed, ...) so that the value is consistent across the lifetime of the stream.
- A non-empty / stripped guard on the header keeps the SDK robust to empty values.
We deliberately suggest the behavioral fix (overwrite Response.model) rather than a backward-compatible additive Response.served_model field, because:
- It restores parity with what
Response.model already means on OpenAI's own endpoints and on Azure Chat Completions: the snapshot of the model that actually served the request.
- It avoids forcing every framework / telemetry library to learn an Azure-specific second field; today they read
response.model and that should "just work".
- The Azure-side mapping from alias to snapshot is only ever known on the response, so promoting the header at the SDK boundary is the natural place to do it (callers can still get the alias they sent from their own request options).
Impact / breakage analysis
The only consumers that would observe a behavior change are those who:
- Use
AzureOpenAI, AND
- Call the Responses API specifically, AND
- Read
Response.model expecting the deployment alias.
For those callers, the deployment alias they sent is already known on their side (it's the model they passed in). Azure also still surfaces it in x-ms-deployment-name for Chat Completions — and could trivially start sending the same header on Responses if needed for parity.
Reference implementation in a third-party framework
We've shipped a local fix in microsoft/agent-framework#5910 that does exactly this at the framework level (because we cannot wait on an SDK release). Doing this in openai-python would let us drop that workaround and benefit every Azure customer of the Responses API uniformly.
Confirm this is a feature request for the Python library and not the underlying OpenAI API
Describe the feature or improvement you're looking for
For Azure OpenAI clients (
AzureOpenAI/AsyncAzureOpenAI), please promote the value of thex-ms-served-modelresponse header intoResponse.model(and intoevent.response.modelfor streamedresponse.*events) when calling the Responses API.This would bring
AzureOpenAIin line with OpenAI's own behavior on every other endpoint.Background
OpenAI's own (non-Azure) endpoints already return the dated snapshot of the model that served the request, regardless of the alias sent in the request. Empirically (today, via
openai1.x):model(alias)Response.modelfrom OpenAIgpt-4o-minigpt-4o-mini-2024-07-18gpt-4ogpt-4o-2024-08-06gpt-5-nanogpt-5-nano-2025-08-07The same is true on Azure Chat Completions — the body already returns the snapshot — and (as a courtesy) Azure surfaces the deployment alias in
x-ms-deployment-name:But on the Azure Responses API specifically, the body carries the deployment alias and the snapshot is moved into the
x-ms-served-modelresponse header (not exposed by the SDK today):So Azure Responses is the outlier among all four code paths.
Additional context
Why this matters
Downstream consumers (telemetry, eval, RAG/agent frameworks, etc.) read
Response.modelto record which model actually served the request — to attribute traces, costs, regressions, and behavior changes to specific dated snapshots. With the current SDK behavior, Azure Responses users see the deployment alias only, which can mask:x-ms-is-spilled-over: true) where a different snapshot serves the request.Proposed change (behavioral)
In
openai/lib/azure.py, inBaseAzureClient(and async equivalent), add response post-processing such that:/responsesendpoint responses, when thex-ms-served-modelresponse header is present and non-empty, replaceResponse.modelwith that value before the SDK returns the parsed object.responses.create(stream=True, ...)andresponses.retrieve(..., stream=True), do the same on every event whoseevent.response.modelis currently set (e.g.response.created,response.in_progress,response.completed, ...) so that the value is consistent across the lifetime of the stream.We deliberately suggest the behavioral fix (overwrite
Response.model) rather than a backward-compatible additiveResponse.served_modelfield, because:Response.modelalready means on OpenAI's own endpoints and on Azure Chat Completions: the snapshot of the model that actually served the request.response.modeland that should "just work".Impact / breakage analysis
The only consumers that would observe a behavior change are those who:
AzureOpenAI, ANDResponse.modelexpecting the deployment alias.For those callers, the deployment alias they sent is already known on their side (it's the
modelthey passed in). Azure also still surfaces it inx-ms-deployment-namefor Chat Completions — and could trivially start sending the same header on Responses if needed for parity.Reference implementation in a third-party framework
We've shipped a local fix in microsoft/agent-framework#5910 that does exactly this at the framework level (because we cannot wait on an SDK release). Doing this in
openai-pythonwould let us drop that workaround and benefit every Azure customer of the Responses API uniformly.