Skip to content

AzureOpenAI/Foundry: promote x-ms-served-model header into Response.model for the Responses API (parity with OpenAI/Azure Chat Completions) #3271

@eavanvalkenburg

Description

@eavanvalkenburg

Confirm this is a feature request for the Python library and not the underlying OpenAI API

  • This is a feature request for the Python library

Describe the feature or improvement you're looking for

For Azure OpenAI clients (AzureOpenAI / AsyncAzureOpenAI), please promote the value of the x-ms-served-model response header into Response.model (and into event.response.model for streamed response.* events) when calling the Responses API.

This would bring AzureOpenAI in line with OpenAI's own behavior on every other endpoint.

Background

OpenAI's own (non-Azure) endpoints already return the dated snapshot of the model that served the request, regardless of the alias sent in the request. Empirically (today, via openai 1.x):

Sent model (alias) Response.model from OpenAI
gpt-4o-mini gpt-4o-mini-2024-07-18
gpt-4o gpt-4o-2024-08-06
gpt-5-nano gpt-5-nano-2025-08-07

The same is true on Azure Chat Completions — the body already returns the snapshot — and (as a courtesy) Azure surfaces the deployment alias in x-ms-deployment-name:

[Chat Completions sent=gpt-5-nano]
  body.model            = 'gpt-5-nano-2025-08-07'   ✅ snapshot
  x-ms-served-model     = None
  x-ms-deployment-name  = 'gpt-5-nano'

But on the Azure Responses API specifically, the body carries the deployment alias and the snapshot is moved into the x-ms-served-model response header (not exposed by the SDK today):

[Responses sent=gpt-5-nano, non-streaming]
  body.model            = 'gpt-5-nano'              ❌ alias, not snapshot
  x-ms-served-model     = 'gpt-5-nano-2025-08-07'
  x-ms-deployment-name  = (not sent on Responses)

[Responses sent=gpt-5-nano, streaming]
  body.model (per event)= 'gpt-5-nano'              ❌
  x-ms-served-model     = 'gpt-5-nano-2025-08-07'

So Azure Responses is the outlier among all four code paths.

Additional context

Why this matters

Downstream consumers (telemetry, eval, RAG/agent frameworks, etc.) read Response.model to record which model actually served the request — to attribute traces, costs, regressions, and behavior changes to specific dated snapshots. With the current SDK behavior, Azure Responses users see the deployment alias only, which can mask:

  • Auto-rollouts when an alias deployment is bumped from one snapshot to a newer one.
  • Spillover routing (x-ms-is-spilled-over: true) where a different snapshot serves the request.
  • Regional differences in which snapshot backs the same alias.

Proposed change (behavioral)

In openai/lib/azure.py, in BaseAzureClient (and async equivalent), add response post-processing such that:

  • For /responses endpoint responses, when the x-ms-served-model response header is present and non-empty, replace Response.model with that value before the SDK returns the parsed object.
  • For streaming responses.create(stream=True, ...) and responses.retrieve(..., stream=True), do the same on every event whose event.response.model is currently set (e.g. response.created, response.in_progress, response.completed, ...) so that the value is consistent across the lifetime of the stream.
  • A non-empty / stripped guard on the header keeps the SDK robust to empty values.

We deliberately suggest the behavioral fix (overwrite Response.model) rather than a backward-compatible additive Response.served_model field, because:

  1. It restores parity with what Response.model already means on OpenAI's own endpoints and on Azure Chat Completions: the snapshot of the model that actually served the request.
  2. It avoids forcing every framework / telemetry library to learn an Azure-specific second field; today they read response.model and that should "just work".
  3. The Azure-side mapping from alias to snapshot is only ever known on the response, so promoting the header at the SDK boundary is the natural place to do it (callers can still get the alias they sent from their own request options).

Impact / breakage analysis

The only consumers that would observe a behavior change are those who:

  • Use AzureOpenAI, AND
  • Call the Responses API specifically, AND
  • Read Response.model expecting the deployment alias.

For those callers, the deployment alias they sent is already known on their side (it's the model they passed in). Azure also still surfaces it in x-ms-deployment-name for Chat Completions — and could trivially start sending the same header on Responses if needed for parity.

Reference implementation in a third-party framework

We've shipped a local fix in microsoft/agent-framework#5910 that does exactly this at the framework level (because we cannot wait on an SDK release). Doing this in openai-python would let us drop that workaround and benefit every Azure customer of the Responses API uniformly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions