# Technical Migration Guide: GPT-4o & GPT-4o-mini to GPT-5.1 & GPT-4.1-mini

This notebook provides a **code-focused** guide for migrating from GPT-4o and GPT-4o-mini to their recommended successors on Azure OpenAI.

## Migration Paths

| Source Model | Target Model | Model Type |
|--------------|--------------|------------|
| **GPT-4o** (all versions) | **GPT-5.1** | Reasoning model |
| **GPT-4o-mini** | **GPT-4.1-mini** | Standard model |

> **Important**: GPT-5.1 is a reasoning model (no temperature/top_p). GPT-4.1-mini is NOT a reasoning model and supports standard parameters.

## Related Resources

- **Microsoft Learn**: [Azure OpenAI Reasoning Models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning)
- **Model Retirements**: [Model Retirement Dates](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-retirements)

---

## Migration Strategy

1. **Phase 1: Lift & Shift** - Minimal code changes to get the new models running
2. **Phase 2: Optimization** - Leverage GPT-5.1-specific features (reasoning models only)

---

# Phase 1: Lift & Shift

The goal of Phase 1 is to migrate with **minimal changes** to establish a baseline and validate that the new model works correctly with your existing prompts.

## 1.1 Install Dependencies

Install the required packages using the provided `requirements.txt`:

```bash
pip install -r requirements.txt
```

Per [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning):

> *"You'll need to upgrade your OpenAI client library for access to the latest parameters."*
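Before relying on the newer parameters, it can help to confirm which package versions are actually installed. The following is a minimal sketch, not part of the original notebook: `parse_version`, `meets_minimum`, and `check_package` are hypothetical helper names, and the minimum versions are simply the ones listed in this notebook's requirements.

```python
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.40.0' into (1, 40, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for chunk in version.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '1.40' and '1.40.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> str:
    """Describe whether an installed package satisfies the minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed (need >={minimum})"
    if meets_minimum(installed, minimum):
        return f"{name} {installed}: ok"
    return f"{name} {installed}: upgrade needed (need >={minimum})"


# Minimums taken from the package list in this notebook.
for pkg, minimum in [
    ("openai", "1.40.0"),
    ("azure-identity", "1.15.0"),
    ("python-dotenv", "1.0.0"),
]:
    print(check_package(pkg, minimum))
```

The parser is deliberately simple and avoids third-party dependencies; for anything beyond plain `X.Y.Z` versions, a dedicated version-parsing library would be a safer choice.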

In [1]:
# Install dependencies from requirements.txt
# Run this in your terminal: pip install -r requirements.txt

# Or uncomment to install directly from notebook:
# !pip install -r requirements.txt

print("Required packages:")
print("  - openai>=1.40.0")
print("  - azure-identity>=1.15.0")
print("  - python-dotenv>=1.0.0")
print("\nInstall with: pip install -r requirements.txt")

Required packages:
  - openai>=1.40.0
  - azure-identity>=1.15.0
  - python-dotenv>=1.0.0

Install with: pip install -r requirements.txt


## 1.2 Configuration Changes

### Key Differences: GPT-4o vs GPT-5.1 (and GPT-4o-mini vs GPT-4.1-mini)

| Aspect | GPT-4o / GPT-4o-mini | GPT-5.1 | GPT-4.1-mini | Reference |
|--------|---------------------|---------|--------------|----------|
| **Endpoint** | `/openai/deployments/.../chat/completions?api-version=...` | `/openai/v1/chat/completions` | `/openai/v1/chat/completions` | [MS Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning) |
| **API Version** | Required (e.g., `2024-02-15-preview`) | Not required (v1 API) | Not required (v1 API) | [MS Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/model-inference-to-openai-migration) |
| **Max Tokens** | `max_tokens` | `max_completion_tokens` | `max_completion_tokens` | [MS Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning) |
| **Temperature** | Supported | Not supported | Supported | [MS Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning) |
| **Reasoning** | N/A | `reasoning_effort` | N/A | [OpenAI Docs](https://platform.openai.com/docs/guides/latest-model) |

In [2]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# =============================================================================
# CONFIGURATION - Update these values for your Azure environment
# =============================================================================

AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "https://YOUR-RESOURCE.openai.azure.com")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")

# =============================================================================
# SOURCE MODEL DEPLOYMENTS (for comparison testing)
# =============================================================================
GPT4O_DEPLOYMENT = os.getenv("GPT4O_DEPLOYMENT", "gpt-4o")
GPT4O_MINI_DEPLOYMENT = os.getenv("GPT4O_MINI_DEPLOYMENT", "gpt-4o-mini")

# =============================================================================
# TARGET MODEL DEPLOYMENTS (migration destinations)
# =============================================================================
GPT51_DEPLOYMENT = os.getenv("GPT51_DEPLOYMENT", "gpt-51")           # For GPT-4o migration
GPT41_MINI_DEPLOYMENT = os.getenv("GPT41_MINI_DEPLOYMENT", "gpt-41-mini")  # For GPT-4o-mini migration

print("Configuration loaded:")
print(f"  Endpoint: {AZURE_OPENAI_ENDPOINT}")
print(f"  API Key:  {'***' + AZURE_OPENAI_API_KEY[-4:] if AZURE_OPENAI_API_KEY else 'NOT SET'}")
print(f"\nSource Models:")
print(f"  GPT-4o:      {GPT4O_DEPLOYMENT}")
print(f"  GPT-4o-mini: {GPT4O_MINI_DEPLOYMENT}")
print(f"\nTarget Models:")
print(f"  GPT-5.1:      {GPT51_DEPLOYMENT}")
print(f"  GPT-4.1-mini: {GPT41_MINI_DEPLOYMENT}")

Configuration loaded:
  Endpoint: https://models-llm-migration-swe-eaf.cognitiveservices.azure.com
  API Key:  ***wNdv

Source Models:
  GPT-4o:      gpt-4o
  GPT-4o-mini: gpt-4o-mini

Target Models:
  GPT-5.1:      gpt-5.1
  GPT-4.1-mini: gpt-4.1-mini
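A small sanity check on the loaded settings can catch a forgotten `.env` entry before the first API call fails. This is a sketch under assumptions: `find_config_problems` is a hypothetical helper, and treating the two target deployment variables as required (rather than falling back to the defaults above) is a choice made for illustration.

```python
import os
from typing import Mapping

# Assumption: these are the settings worth checking before migrating.
REQUIRED_SETTINGS = ("AZURE_OPENAI_ENDPOINT", "GPT51_DEPLOYMENT", "GPT41_MINI_DEPLOYMENT")
PLACEHOLDER_MARKER = "YOUR-RESOURCE"


def find_config_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in the environment settings."""
    problems = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif PLACEHOLDER_MARKER in value:
            problems.append(f"{name} still contains the placeholder value")
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    if endpoint and not endpoint.startswith("https://"):
        problems.append("AZURE_OPENAI_ENDPOINT should start with https://")
    return problems


for problem in find_config_problems(os.environ):
    print(f"Config warning: {problem}")
```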


## 1.3 Client Initialization: Before & After

According to [Microsoft Learn - Migration Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/model-inference-to-openai-migration):

> *"Change endpoint URLs from `.services.ai.azure.com/models` to `.openai.azure.com/openai/v1/`"*

> *"The v1 API eliminates the need to frequently update `api-version` parameters."*
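Because the v1 base URL is built by string concatenation, a trailing slash in the configured endpoint can produce a doubled slash in the path. A minimal sketch of a normalizing helper follows; `build_v1_base_url` is a hypothetical name, not part of any SDK.

```python
def build_v1_base_url(endpoint: str) -> str:
    """Return '<endpoint>/openai/v1/' with exactly one slash at each join."""
    cleaned = endpoint.strip().rstrip("/")
    if not cleaned.startswith("https://"):
        raise ValueError(f"Expected an https:// endpoint, got: {endpoint!r}")
    # Avoid doubling the suffix if the value already points at the v1 path.
    if cleaned.endswith("/openai/v1"):
        return cleaned + "/"
    return cleaned + "/openai/v1/"


print(build_v1_base_url("https://YOUR-RESOURCE.openai.azure.com/"))
```

The result can be passed as `base_url` when constructing the client, in place of formatting the endpoint string inline.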

In [3]:
# =============================================================================
# BEFORE: GPT-4o Client Initialization (with Entra ID - recommended)
# =============================================================================

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Use Entra ID authentication (requires: az login)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

# Old approach with AzureOpenAI client and api_version
client_gpt4o = AzureOpenAI(
    azure_ad_token_provider=token_provider,  # Entra ID instead of api_key
    api_version="2024-02-15-preview",
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)

print("GPT-4o client initialized (Entra ID authentication)")

GPT-4o client initialized (Entra ID authentication)


In [4]:
# =============================================================================
# AFTER: GPT-5.1 and GPT-4.1-mini Client Initialization (with Entra ID)
# =============================================================================

from openai import OpenAI

# New approach with OpenAI client and v1 endpoint
# Same client works for both GPT-5.1 and GPT-4.1-mini
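# Using the Entra ID token provider from the previous cell.
# Note: token_provider() returns a single token that eventually expires;
# the alternative in the next section passes the provider itself instead.
client_new = OpenAI(
    api_key=token_provider(),  # One-time token snapshot from Entra ID
    base_url=f"{AZURE_OPENAI_ENDPOINT}/openai/v1/"  # v1 API - no version needed
)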

print("New client initialized (works for both GPT-5.1 and GPT-4.1-mini)")

New client initialized (works for both GPT-5.1 and GPT-4.1-mini)


### Alternative: Microsoft Entra ID Authentication (Recommended for Production)

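Following [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning), pass the token provider itself as `api_key` instead of calling it once. The client can then obtain a fresh token when needed, which matters for long-running processes. This requires an `openai` SDK release that accepts a callable `api_key`.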

In [5]:
# =============================================================================
# ALTERNATIVE: Microsoft Entra ID Authentication (Recommended for Production)
# =============================================================================

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Get token provider for Cognitive Services
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

# Initialize client with Entra ID (works for all models)
client_entra = OpenAI(
    base_url=f"{AZURE_OPENAI_ENDPOINT}/openai/v1/",
    api_key=token_provider  # Token provider instead of API key
)

print("Client initialized with Entra ID (works for GPT-5.1 and GPT-4.1-mini)")

Client initialized with Entra ID (works for GPT-5.1 and GPT-4.1-mini)


## 1.4 API Call Changes: Before & After

### Critical Parameter Changes

From [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning):

> *"Reasoning models will only work with the `max_completion_tokens` parameter when using the Chat Completions API."*

### Two Migration Paths

1. **GPT-4o -> GPT-5.1**: Remove temperature/top_p, add `reasoning_effort="none"`
2. **GPT-4o-mini -> GPT-4.1-mini**: Keep temperature/top_p, just update endpoint and `max_tokens` -> `max_completion_tokens`
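The two paths above can be sketched as a single translation step over an existing request's keyword arguments. This is an illustrative helper, not an SDK feature: `migrate_request_kwargs` and the `target` labels are hypothetical names introduced here.

```python
from typing import Any

# Sampling parameters dropped on the GPT-4o -> GPT-5.1 path in this guide.
REMOVED_FOR_GPT51 = ("temperature", "top_p")
KNOWN_TARGETS = ("gpt-5.1", "gpt-4.1-mini")


def migrate_request_kwargs(
    old_kwargs: dict[str, Any], target: str, deployment: str
) -> dict[str, Any]:
    """Translate GPT-4o-style chat completion kwargs for a target model."""
    if target not in KNOWN_TARGETS:
        raise ValueError(f"Unknown target {target!r}; expected one of {KNOWN_TARGETS}")

    new_kwargs = dict(old_kwargs)  # shallow copy; the caller's dict is untouched
    new_kwargs["model"] = deployment

    # Both paths: max_tokens -> max_completion_tokens
    if "max_tokens" in new_kwargs:
        new_kwargs["max_completion_tokens"] = new_kwargs.pop("max_tokens")

    # GPT-5.1 path only: drop sampling parameters, disable reasoning by default
    if target == "gpt-5.1":
        for name in REMOVED_FOR_GPT51:
            new_kwargs.pop(name, None)
        new_kwargs.setdefault("reasoning_effort", "none")

    return new_kwargs


old = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 4096,
    "temperature": 0.7,
    "top_p": 0.95,
}
print(migrate_request_kwargs(old, "gpt-5.1", "gpt-51"))
print(migrate_request_kwargs(old, "gpt-4.1-mini", "gpt-41-mini"))
```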

In [6]:
# =============================================================================
# BEFORE: GPT-4o and GPT-4o-mini API Calls
# =============================================================================

def call_gpt4o(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """GPT-4o call with original parameters"""
    response = client_gpt4o.chat.completions.create(
        model=GPT4O_DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        max_tokens=4096,
        temperature=0.7,
        top_p=0.95
    )
    return response.choices[0].message.content

def call_gpt4o_mini(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """GPT-4o-mini call with original parameters"""
    response = client_gpt4o.chat.completions.create(
        model=GPT4O_MINI_DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        max_tokens=4096,
        temperature=0.7,
        top_p=0.95
    )
    return response.choices[0].message.content

print("GPT-4o function defined")
print("GPT-4o-mini function defined")

GPT-4o function defined
GPT-4o-mini function defined


In [7]:
# =============================================================================
# AFTER: GPT-5.1 and GPT-4.1-mini API Calls (Phase 1 - Lift & Shift)
# =============================================================================

def call_gpt51_phase1(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """
    GPT-5.1 call with minimal changes for Phase 1 migration (from GPT-4o).
    
    Key changes:
    - max_tokens -> max_completion_tokens
    - Added reasoning_effort="none" to match GPT-4o behavior
    - Removed temperature and top_p (not supported with reasoning models)
    - Kept "system" role (backward compatible)
    """
    response = client_new.chat.completions.create(
        model=GPT51_DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        max_completion_tokens=4096,
        reasoning_effort="none"  # Critical: match GPT-4o behavior
        # No temperature, top_p (not supported in reasoning models)
    )
    return response.choices[0].message.content


def call_gpt41_mini_phase1(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """
    GPT-4.1-mini call for Phase 1 migration (from GPT-4o-mini).
    
    Key changes:
    - max_tokens -> max_completion_tokens
    - temperature and top_p are SUPPORTED (not a reasoning model)
    """
    response = client_new.chat.completions.create(
        model=GPT41_MINI_DEPLOYMENT,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ],
        max_completion_tokens=4096,
        temperature=0.7,  # Supported in GPT-4.1-mini
        top_p=0.95        # Supported in GPT-4.1-mini
    )
    return response.choices[0].message.content

print("GPT-5.1 Phase 1 function defined (reasoning model - no temperature)")
print("GPT-4.1-mini Phase 1 function defined (supports temperature)")

GPT-5.1 Phase 1 function defined (reasoning model - no temperature)
GPT-4.1-mini Phase 1 function defined (supports temperature)


## 1.5 Parameters NOT Supported in GPT-5.1 (Reasoning Models Only)

From [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning):

> *"The following are currently unsupported with reasoning models: `temperature`, `top_p`, `presence_penalty`, `frequency_penalty`, `logprobs`, `top_logprobs`, `logit_bias`, `max_tokens`"*

### This applies ONLY to GPT-5.1 (reasoning model)

**GPT-4.1-mini supports all standard parameters** (temperature, top_p, etc.) because it is NOT a reasoning model.

In [8]:
# =============================================================================
# UNSUPPORTED PARAMETERS - For GPT-5.1 (reasoning model) ONLY
# =============================================================================

UNSUPPORTED_IN_GPT51 = [
    "temperature",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
    "logprobs",
    "top_logprobs",
    "logit_bias",
    "max_tokens",  # Use max_completion_tokens instead
]

print("Parameters NOT supported in GPT-5.1 (reasoning model):")
for param in UNSUPPORTED_IN_GPT51:
    print(f"   - {param}")

print("\nUse 'max_completion_tokens' instead of 'max_tokens'")
print("\n" + "=" * 60)
print("GPT-4.1-mini SUPPORTS all these parameters (not a reasoning model)")
print("   - temperature")
print("   - top_p")
print("   - etc.")

Parameters NOT supported in GPT-5.1 (reasoning model):
   - temperature
   - top_p
   - presence_penalty
   - frequency_penalty
   - logprobs
   - top_logprobs
   - logit_bias
   - max_tokens

Use 'max_completion_tokens' instead of 'max_tokens'

GPT-4.1-mini SUPPORTS all these parameters (not a reasoning model)
   - temperature
   - top_p
   - etc.
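The list quoted from Microsoft Learn can also serve as a pre-flight check that fails fast, before a request reaches the service. The sketch below is self-contained and repeats the list; `find_unsupported_params` and `assert_reasoning_compatible` are hypothetical helper names.

```python
from typing import Any, Mapping

# Parameters listed as unsupported for reasoning models in the quote above.
UNSUPPORTED_FOR_REASONING = frozenset({
    "temperature",
    "top_p",
    "presence_penalty",
    "frequency_penalty",
    "logprobs",
    "top_logprobs",
    "logit_bias",
    "max_tokens",
})


def find_unsupported_params(request_kwargs: Mapping[str, Any]) -> list[str]:
    """Return the unsupported parameter names present in a request, sorted."""
    return sorted(name for name in request_kwargs if name in UNSUPPORTED_FOR_REASONING)


def assert_reasoning_compatible(request_kwargs: Mapping[str, Any]) -> None:
    """Raise ValueError if the request uses parameters a reasoning model rejects."""
    offending = find_unsupported_params(request_kwargs)
    if offending:
        raise ValueError("Unsupported for reasoning models: " + ", ".join(offending))


print(find_unsupported_params({"model": "gpt-51", "temperature": 0.7, "max_tokens": 100}))
```

Apply a check like this only to GPT-5.1 requests; GPT-4.1-mini requests may legitimately include sampling parameters such as `temperature` and `top_p`.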


## 1.6 Test Phase 1 Migration

Following the [OpenAI GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide):

> *"Step 3: Run Evals for a baseline. After model + effort are aligned, run your eval suite. If results look good (often better at med/high), you're ready to ship."*
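The "run evals for a baseline" step can be approximated, at its simplest, by sending the same prompts to the source and target functions and recording comparable measurements. The sketch below is not a real eval suite: `run_baseline` is a hypothetical helper, the word-count metric is only a placeholder for task-specific scoring, and the stand-in functions exist so the example runs offline.

```python
from typing import Callable, Optional

# A model function takes (user_message, system_prompt) and returns text or None.
ModelFn = Callable[[str, str], Optional[str]]


def run_baseline(
    cases: list[tuple[str, str]], source_fn: ModelFn, target_fn: ModelFn
) -> list[dict]:
    """Call both models on identical prompts and record simple measurements."""
    rows = []
    for user_prompt, system_prompt in cases:
        # Treat a missing completion (None) as an empty answer.
        source_answer = source_fn(user_prompt, system_prompt) or ""
        target_answer = target_fn(user_prompt, system_prompt) or ""
        rows.append({
            "prompt": user_prompt,
            "source_words": len(source_answer.split()),
            "target_words": len(target_answer.split()),
            "target_empty": not target_answer.strip(),
        })
    return rows


# Stand-in functions so the sketch runs without Azure credentials.
def fake_source(user_message: str, system_prompt: str) -> str:
    return "Services are small. They deploy independently. They scale separately."


def fake_target(user_message: str, system_prompt: str) -> str:
    return "Small services. Independent deploys."


cases = [(
    "Explain microservices architecture in 3 sentences.",
    "You are a technical architect. Be concise and precise.",
)]
for row in run_baseline(cases, fake_source, fake_target):
    print(row)
```

In this notebook, the stand-ins would be replaced by `call_gpt4o` and `call_gpt51_phase1` (or the mini pair), since they share the same two-argument signature.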

In [9]:
# =============================================================================
# TEST: Compare Source vs Target Models with identical prompts
# =============================================================================

test_prompt = "Explain microservices architecture in 3 sentences."
system_prompt = "You are a technical architect. Be concise and precise."

print("=" * 70)
print("TEST: Comparing Migration Paths")
print("=" * 70)
print(f"\nPrompt: {test_prompt}")
print(f"System: {system_prompt}")

# =============================================================================
# Migration Path 1: GPT-4o -> GPT-5.1
# =============================================================================
print("\n" + "-" * 70)
print("Migration Path 1: GPT-4o -> GPT-5.1")
print("-" * 70)

# Live comparison (requires valid Azure credentials and deployed models)
print("\n--- GPT-4o Response ---")
response_4o = call_gpt4o(test_prompt, system_prompt)
print(response_4o)

print("\n--- GPT-5.1 Response (Phase 1) ---")
response_51 = call_gpt51_phase1(test_prompt, system_prompt)
print(response_51)

TEST: Comparing Migration Paths

Prompt: Explain microservices architecture in 3 sentences.
System: You are a technical architect. Be concise and precise.

----------------------------------------------------------------------
Migration Path 1: GPT-4o -> GPT-5.1
----------------------------------------------------------------------

--- GPT-4o Response ---
Microservices architecture is a design approach where an application is built as a collection of small, independent services, each responsible for a specific business functionality. These services communicate via lightweight protocols, typically HTTP/REST or messaging, and can be developed, deployed, and scaled independently. This architecture promotes modularity, agility, and fault isolation, making systems easier to maintain and evolve.

--- GPT-5.1 Response (Phase 1) ---
Microservices architecture is an approach where an application is built as a suite of small, independently deployable services, each responsible for a specific business capability. These services co

In [10]:
# =============================================================================
# Migration Path 2: GPT-4o-mini -> GPT-4.1-mini
# =============================================================================
print("\n" + "-" * 70)
print("Migration Path 2: GPT-4o-mini -> GPT-4.1-mini")
print("-" * 70)

# Live comparison (requires valid Azure credentials and deployed models)
print("\n--- GPT-4o-mini Response ---")
response_4o_mini = call_gpt4o_mini(test_prompt, system_prompt)
print(response_4o_mini)

print("\n--- GPT-4.1-mini Response (Phase 1) ---")
response_41_mini = call_gpt41_mini_phase1(test_prompt, system_prompt)
print(response_41_mini)


----------------------------------------------------------------------
Migration Path 2: GPT-4o-mini -> GPT-4.1-mini
----------------------------------------------------------------------

--- GPT-4o-mini Response ---
Microservices architecture is an approach to software development where applications are structured as a collection of loosely coupled, independently deployable services. Each service is designed to perform a specific business function and can be developed, deployed, and scaled independently. This architecture enhances flexibility, allows for continuous delivery, and enables teams to use different technologies for different services.

--- GPT-4.1-mini Response (Phase 1) ---
Microservices architecture is a design approach where an application is composed of small, independent services that communicate over well-defined APIs. Each service focuses on a specific business capability and can be developed, deployed, and scaled independently. This architecture enhances flexibility, scalability, and fault isolation comp

## 1.7 Phase 1 Migration Checklist

Based on [Microsoft Learn Migration Checklist](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/model-inference-to-openai-migration):

> *"Use this checklist to ensure a smooth migration:*
> 1. *Install the OpenAI SDK for your programming language*
> 2. *Update authentication code (API key or Microsoft Entra ID)*
> 3. *Change endpoint URLs*
> 4. *Update client initialization code*
> 5. *Always specify the model parameter with your deployment name*
> 6. *Update request method calls*
> 7. *Test all functionality thoroughly"*

In [11]:
# =============================================================================
# PHASE 1 CHECKLIST
# =============================================================================

print("Phase 1 Migration Checklist:")
print("=" * 60)

print("\nFor BOTH migration paths:")
checklist_common = {
    "SDK upgraded to latest version": True,
    "python-dotenv installed": True,
    "Endpoint changed to /openai/v1/": True,
    "Client uses OpenAI instead of AzureOpenAI": True,
    "api_version parameter removed": True,
    "max_tokens renamed to max_completion_tokens": True,
    "Prompts kept IDENTICAL to source model": True,
    "Baseline evaluation completed": False,  # TODO: Run your evals
}

for item, completed in checklist_common.items():
    status = "[x]" if completed else "[ ]"
    print(f"  {status} {item}")

print("\nFor GPT-4o -> GPT-5.1 migration:")
checklist_gpt51 = {
    "reasoning_effort='none' added": True,
    "temperature parameter REMOVED": True,
    "top_p parameter REMOVED": True,
}

for item, completed in checklist_gpt51.items():
    status = "[x]" if completed else "[ ]"
    print(f"  {status} {item}")

print("\nFor GPT-4o-mini -> GPT-4.1-mini migration:")
checklist_mini = {
    "temperature parameter KEPT": True,
    "top_p parameter KEPT": True,
    "No reasoning_effort needed": True,
}

for item, completed in checklist_mini.items():
    status = "[x]" if completed else "[ ]"
    print(f"  {status} {item}")

Phase 1 Migration Checklist:

For BOTH migration paths:
  [x] SDK upgraded to latest version
  [x] python-dotenv installed
  [x] Endpoint changed to /openai/v1/
  [x] Client uses OpenAI instead of AzureOpenAI
  [x] api_version parameter removed
  [x] max_tokens renamed to max_completion_tokens
  [x] Prompts kept IDENTICAL to source model
  [ ] Baseline evaluation completed

For GPT-4o -> GPT-5.1 migration:
  [x] reasoning_effort='none' added
  [x] temperature parameter REMOVED
  [x] top_p parameter REMOVED

For GPT-4o-mini -> GPT-4.1-mini migration:
  [x] temperature parameter KEPT
  [x] top_p parameter KEPT
  [x] No reasoning_effort needed


---

# Phase 2: Optimization (GPT-5.1 Only)

Once Phase 1 is validated, we can leverage GPT-5.1-specific features to improve output quality and control.

> **Note**: Phase 2 optimizations apply primarily to **GPT-5.1** (reasoning model). GPT-4.1-mini uses standard chat patterns and does not require these changes.

From the [OpenAI GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide):

> *"GPT-5.x models are especially well-suited for production agents that prioritize reliability, evaluability, and consistent behavior. They perform strongly across coding, document analysis, finance, and multi-tool agentic scenarios."*

## 2.1 Developer Messages vs System Messages

According to [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning):

> *"Functionally developer messages `"role": "developer"` are the same as system messages."*

> *"When you use a system message with o4-mini, o3, o3-mini, and o1 it will be treated as a developer message. You should not use both a developer message and a system message in the same API request."*

### Important Rule: Never mix `system` and `developer` roles!

In [12]:
# =============================================================================
# PHASE 2: Using developer role (GPT-5.1 only)
# =============================================================================

def call_gpt51_phase2(user_message: str, developer_prompt: str = "You are a helpful assistant."):
    """
    GPT-5.1 call with Phase 2 optimizations.
    
    Key changes from Phase 1:
    - "system" role -> "developer" role (more explicit for reasoning models)
    """
    response = client_new.chat.completions.create(
        model=GPT51_DEPLOYMENT,
        messages=[
            {"role": "developer", "content": developer_prompt},  # Use developer role
            {"role": "user", "content": user_message}
        ],
        max_completion_tokens=4096,
        reasoning_effort="none"
    )
    return response.choices[0].message.content

print("GPT-5.1 Phase 2 function defined (with developer role)")
print("\nNote: GPT-4.1-mini continues to use 'system' role (standard chat model)")

GPT-5.1 Phase 2 function defined (with developer role)

Note: GPT-4.1-mini continues to use 'system' role (standard chat model)


In [13]:
# =============================================================================
# ANTI-PATTERN: Never mix system and developer roles!
# =============================================================================

# This will cause issues - DO NOT DO THIS:
bad_messages_example = [
    {"role": "system", "content": "You are a helpful assistant."},     # BAD
    {"role": "developer", "content": "Always respond in French."},     # CONFLICT!
    {"role": "user", "content": "Hello"}
]

print("ANTI-PATTERN: Never use both 'system' and 'developer' in the same request!")
print("\nBad example:")
for msg in bad_messages_example:
    print(f"   {msg}")

ANTI-PATTERN: Never use both 'system' and 'developer' in the same request!

Bad example:
   {'role': 'system', 'content': 'You are a helpful assistant.'}
   {'role': 'developer', 'content': 'Always respond in French.'}
   {'role': 'user', 'content': 'Hello'}
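The rule above can be enforced in code before a request is sent. The following sketch rejects mixed `system`/`developer` message lists and converts one instruction role to the other; `normalize_instruction_role` is a hypothetical helper name, not an SDK function.

```python
def normalize_instruction_role(
    messages: list[dict], target_role: str = "developer"
) -> list[dict]:
    """Return a copy of messages using only target_role for instructions.

    Raises ValueError if the input mixes 'system' and 'developer' messages.
    """
    if target_role not in ("system", "developer"):
        raise ValueError("target_role must be 'system' or 'developer'")

    roles = {message["role"] for message in messages}
    if "system" in roles and "developer" in roles:
        raise ValueError("Request mixes 'system' and 'developer' messages; use only one.")

    source_role = "system" if target_role == "developer" else "developer"
    return [
        {**message, "role": target_role} if message["role"] == source_role else dict(message)
        for message in messages
    ]


phase1_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
print(normalize_instruction_role(phase1_messages))
```

Returning copies rather than mutating the input keeps the Phase 1 message lists intact, so both variants remain available for side-by-side comparison.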


## 2.2 Prompt Optimization for GPT-5.1

According to the [OpenAI GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide):

> *"GPT-5.x models deliver:*
> - *More deliberate scaffolding: Builds clearer plans and intermediate structure by default*
> - *Generally lower verbosity: More concise and task-focused*
> - *Stronger instruction adherence: Less drift from user intent*
> - *Conservative grounding bias: Tends to favor correctness and explicit reasoning"*

### 2.2.1 Verbosity Control

> *"Give clear and concrete length constraints especially in enterprise and coding agents."*

In [14]:
# =============================================================================
# VERBOSITY CONTROL - Recommended prompt pattern
# =============================================================================

VERBOSITY_CONTROL_PROMPT = """
<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- For simple "yes/no + short explanation" questions: <=2 sentences.
- For complex multi-step or multi-file tasks: 
  - 1 short overview paragraph
  - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
- Avoid long narrative paragraphs; prefer compact bullets and short sections.
- Do not rephrase the user's request unless it changes semantics.
</output_verbosity_spec>
"""

print("Verbosity Control Prompt Template:")
print(VERBOSITY_CONTROL_PROMPT)

Verbosity Control Prompt Template:

<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- For simple "yes/no + short explanation" questions: <=2 sentences.
- For complex multi-step or multi-file tasks: 
  - 1 short overview paragraph
  - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
- Avoid long narrative paragraphs; prefer compact bullets and short sections.
- Do not rephrase the user's request unless it changes semantics.
</output_verbosity_spec>



### 2.2.2 Scope Drift Prevention

From the [OpenAI GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide):

> *"GPT-5.x is stronger at structured code but may produce more code than the minimal UX specs and design systems. To stay within the scope, explicitly forbid extra features and uncontrolled styling."*

In [15]:
# =============================================================================
# SCOPE DRIFT PREVENTION - Recommended prompt pattern
# =============================================================================

SCOPE_CONTROL_PROMPT = """
<design_and_scope_constraints>
- Explore any existing design systems and understand it deeply. 
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no added components, no UX embellishments.
- Style aligned to the design system at hand. 
- Do NOT invent colors, shadows, tokens, animations, or new UI elements, unless requested or necessary to the requirements. 
- If any instruction is ambiguous, choose the simplest valid interpretation.
</design_and_scope_constraints>
"""

print("Scope Control Prompt Template:")
print(SCOPE_CONTROL_PROMPT)

Scope Control Prompt Template:

<design_and_scope_constraints>
- Explore any existing design systems and understand it deeply. 
- Implement EXACTLY and ONLY what the user requests.
- No extra features, no added components, no UX embellishments.
- Style aligned to the design system at hand. 
- Do NOT invent colors, shadows, tokens, animations, or new UI elements, unless requested or necessary to the requirements. 
- If any instruction is ambiguous, choose the simplest valid interpretation.
</design_and_scope_constraints>



### 2.2.3 Ambiguity and Hallucination Prevention

From the [OpenAI GPT-5 Prompting Guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5_prompting_guide):

> *"Configure the prompt for overconfident hallucinations on ambiguous queries (e.g., unclear requirements, missing constraints, or questions that need fresh data but no tools are called)."*

In [16]:
# =============================================================================
# AMBIGUITY HANDLING - Recommended prompt pattern
# =============================================================================

AMBIGUITY_HANDLING_PROMPT = """
<uncertainty_and_ambiguity>
- If the question is ambiguous or underspecified, explicitly call this out and:
  - Ask up to 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with clearly labeled assumptions.
- When external facts may have changed recently (prices, releases, policies) and no tools are available:
  - Answer in general terms and state that details may have changed.
- Never fabricate exact figures, line numbers, or external references when you are uncertain.
- When you are unsure, prefer language like "Based on the provided context..." instead of absolute claims.
</uncertainty_and_ambiguity>
"""

print("Ambiguity Handling Prompt Template:")
print(AMBIGUITY_HANDLING_PROMPT)

Ambiguity Handling Prompt Template:

<uncertainty_and_ambiguity>
- If the question is ambiguous or underspecified, explicitly call this out and:
  - Ask up to 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with clearly labeled assumptions.
- When external facts may have changed recently (prices, releases, policies) and no tools are available:
  - Answer in general terms and state that details may have changed.
- Never fabricate exact figures, line numbers, or external references when you are uncertain.
- When you are unsure, prefer language like "Based on the provided context..." instead of absolute claims.
</uncertainty_and_ambiguity>



## 2.3 Complete Optimized Developer Prompt

In [17]:
# =============================================================================
# COMPLETE OPTIMIZED DEVELOPER PROMPT FOR GPT-5.1
# =============================================================================

def build_optimized_developer_prompt(base_instructions: str) -> str:
    """
    Build an optimized developer prompt for GPT-5.1 by adding
    recommended control patterns from the OpenAI guide.
    """
    return f"""{base_instructions}

<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- For simple questions: <=2 sentences.
- For complex tasks: 1 overview paragraph + <=5 tagged bullets.
- Avoid long narrative paragraphs; prefer concise responses.
</output_verbosity_spec>

<scope_constraints>
- Implement EXACTLY and ONLY what the user requests.
- No extra features or embellishments unless explicitly requested.
- If instructions are ambiguous, choose the simplest valid interpretation.
</scope_constraints>

<uncertainty_handling>
- If uncertain, acknowledge it explicitly.
- Never fabricate exact figures or references.
- Use "Based on the provided context..." when unsure.
</uncertainty_handling>
"""

# Example usage
base_prompt = "You are a senior software architect specializing in cloud solutions."
optimized_prompt = build_optimized_developer_prompt(base_prompt)

print("Optimized Developer Prompt for GPT-5.1:")
print("=" * 60)
print(optimized_prompt)

Optimized Developer Prompt for GPT-5.1:
You are a senior software architect specializing in cloud solutions.

<output_verbosity_spec>
- Default: 3-6 sentences or <=5 bullets for typical answers.
- For simple questions: <=2 sentences.
- For complex tasks: 1 overview paragraph + <=5 tagged bullets.
- Avoid long narrative paragraphs; prefer concise responses.
</output_verbosity_spec>

<scope_constraints>
- Implement EXACTLY and ONLY what the user requests.
- No extra features or embellishments unless explicitly requested.
- If instructions are ambiguous, choose the simplest valid interpretation.
</scope_constraints>

<uncertainty_handling>
- If uncertain, acknowledge it explicitly.
- Never fabricate exact figures or references.
- Use "Based on the provided context..." when unsure.
</uncertainty_handling>
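If different workloads need different combinations of these control blocks, the prompt can be assembled from named rule lists instead of one fixed template. This is a sketch of that design choice, with hypothetical helper names (`wrap_block`, `compose_developer_prompt`, `build_messages`); the rule text is abbreviated from the prompt shown above.

```python
def wrap_block(tag: str, rules: list[str]) -> str:
    """Render rules as a bulleted list wrapped in <tag>...</tag>."""
    lines = [f"<{tag}>"] + [f"- {rule}" for rule in rules] + [f"</{tag}>"]
    return "\n".join(lines)


def compose_developer_prompt(base_instructions: str, blocks: dict[str, list[str]]) -> str:
    """Join the base instructions and each tagged block with blank lines."""
    sections = [base_instructions.strip()]
    sections += [wrap_block(tag, rules) for tag, rules in blocks.items()]
    return "\n\n".join(sections) + "\n"


def build_messages(developer_prompt: str, user_message: str) -> list[dict]:
    """Build a chat message list that uses the developer role only."""
    return [
        {"role": "developer", "content": developer_prompt},
        {"role": "user", "content": user_message},
    ]


prompt = compose_developer_prompt(
    "You are a senior software architect specializing in cloud solutions.",
    {
        "scope_constraints": [
            "Implement EXACTLY and ONLY what the user requests.",
            "If instructions are ambiguous, choose the simplest valid interpretation.",
        ],
        "uncertainty_handling": ["Never fabricate exact figures or references."],
    },
)
print(prompt)
```

The resulting message list has the same shape as the `messages` argument used in `call_gpt51_phase2`.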



## 2.4 Using the Responses API (Advanced - GPT-5.1 Only)

For more advanced control, GPT-5.1 supports the Responses API with additional parameters.

From [Microsoft Learn](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning):

> *"When using the latest reasoning models with the Responses API you can use the reasoning summary parameter to receive summaries of the model's chain of thought reasoning."*

In [18]:
# =============================================================================
# RESPONSES API - Advanced usage with verbosity control (GPT-5.1 only)
# =============================================================================

def call_gpt51_responses_api(
    user_message: str,
    reasoning_effort: str = "none",
    verbosity: str = "low"
):
    """
    GPT-5.1 call using the Responses API for advanced control.
    
    Parameters:
    - reasoning_effort: none, minimal, low, medium, high, xhigh (supported levels vary by model; xhigh is GPT-5.2 only)
    - verbosity: low, medium, high (new GPT-5 parameter)
    """
    response = client_new.responses.create(
        model=GPT51_DEPLOYMENT,
        input=user_message,
        reasoning={
            "effort": reasoning_effort,
            "summary": "auto"  # auto, concise, or detailed
        },
        text={
            "verbosity": verbosity  # New GPT-5 parameter
        }
    )
    return response

print("Responses API function defined (GPT-5.1 only)")
print("\nAvailable reasoning_effort levels:")
print("   - none: No reasoning (fastest, similar to GPT-4o)")
print("   - minimal: Minimal reasoning")
print("   - low: Light reasoning")
print("   - medium: Moderate reasoning (default for GPT-5)")
print("   - high: Deep reasoning")
print("   - xhigh: Maximum reasoning")

Responses API function defined (GPT-5.1 only)

Available reasoning_effort levels:
   - none: No reasoning (fastest, similar to GPT-4o)
   - minimal: Minimal reasoning
   - low: Light reasoning
   - medium: Moderate reasoning (default for GPT-5)
   - high: Deep reasoning
   - xhigh: Maximum reasoning
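
The function above returns the raw response object. The sketch below shows one way the reasoning summary and final text might be read from it, assuming the response exposes an `output` list whose reasoning items carry a `summary` list of parts with a `text` field, plus an `output_text` convenience attribute. `extract_reasoning_summaries` is a hypothetical helper, and a stand-in object replaces the real response so the example runs offline.

```python
from types import SimpleNamespace
from typing import List


def extract_reasoning_summaries(response) -> List[str]:
    """Collect summary texts from reasoning items in a Responses API result."""
    summaries = []
    for item in getattr(response, "output", None) or []:
        if getattr(item, "type", None) != "reasoning":
            continue  # Skip message items and anything else that is not reasoning.
        for part in getattr(item, "summary", None) or []:
            text = getattr(part, "text", None)
            if text:
                summaries.append(text)
    return summaries


# Stand-in shaped like a response (assumption), so no network call is needed.
fake_response = SimpleNamespace(
    output=[
        SimpleNamespace(
            type="reasoning",
            summary=[SimpleNamespace(type="summary_text", text="Compared two options.")],
        ),
        SimpleNamespace(type="message", content=[]),
    ],
    output_text="Option A is simpler.",
)

print(extract_reasoning_summaries(fake_response))
print(fake_response.output_text)
```

The summary list may be empty, for example when little or no reasoning was performed, so callers should not assume at least one entry.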


## 2.5 Adjusting Reasoning Effort by Use Case

From the [OpenAI Guide](https://platform.openai.com/docs/guides/latest-model):

> *"The `reasoning effort` parameter controls how many reasoning tokens the model generates before producing a response. Earlier reasoning models like o3 supported only low, medium, and high. With GPT-5.2, the lowest setting is none to provide lower-latency interactions."*

In [19]:
# =============================================================================
# REASONING EFFORT RECOMMENDATIONS BY USE CASE
# =============================================================================

REASONING_RECOMMENDATIONS = {
    "Simple Q&A / Chat": {
        "effort": "none",
        "description": "Fast responses, similar to GPT-4o behavior"
    },
    "Content Generation": {
        "effort": "none",
        "description": "Creative tasks don't need deep reasoning"
    },
    "Code Review / Simple Coding": {
        "effort": "low",
        "description": "Light reasoning for code analysis"
    },
    "Complex Analysis / Reports": {
        "effort": "medium",
        "description": "Balanced reasoning for analytical tasks"
    },
    "Algorithm Design / Architecture": {
        "effort": "high",
        "description": "Deep reasoning for complex technical decisions"
    },
    "Math / Scientific Problems": {
        "effort": "high",
        "description": "Maximum reasoning for precise calculations"
    },
    "Critical Business Decisions": {
        "effort": "xhigh",
        "description": "Maximum quality, cost secondary (GPT-5.2 only)"
    }
}

print("Reasoning Effort Recommendations by Use Case:")
print("=" * 70)
for use_case, config in REASONING_RECOMMENDATIONS.items():
    print(f"\n{use_case}:")
    print(f"   Effort: {config['effort']}")
    print(f"   {config['description']}")

Reasoning Effort Recommendations by Use Case:

Simple Q&A / Chat:
   Effort: none
   Fast responses, similar to GPT-4o behavior

Content Generation:
   Effort: none
   Creative tasks don't need deep reasoning

Code Review / Simple Coding:
   Effort: low
   Light reasoning for code analysis

Complex Analysis / Reports:
   Effort: medium
   Balanced reasoning for analytical tasks

Algorithm Design / Architecture:
   Effort: high
   Deep reasoning for complex technical decisions

Math / Scientific Problems:
   Effort: high
   Maximum reasoning for precise calculations

Critical Business Decisions:
   Effort: xhigh
   Maximum quality, cost secondary (GPT-5.2 only)
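
The recommendations above can be turned into a lookup that feeds the request. This is a sketch under stated assumptions: `RECOMMENDED_EFFORT` is a trimmed copy of the mapping so the block runs on its own, and `effort_for` and `reasoning_config` are hypothetical helpers. The returned dictionary mirrors the `reasoning` argument used in the Responses API call in section 2.4.

```python
from typing import Dict

# Trimmed copy of the recommendations so this sketch is self-contained.
RECOMMENDED_EFFORT: Dict[str, str] = {
    "Simple Q&A / Chat": "none",
    "Code Review / Simple Coding": "low",
    "Complex Analysis / Reports": "medium",
    "Algorithm Design / Architecture": "high",
}


def effort_for(use_case: str, default: str = "none") -> str:
    """Return the recommended effort, falling back to `default` for unlisted use cases."""
    return RECOMMENDED_EFFORT.get(use_case, default)


def reasoning_config(use_case: str) -> Dict[str, str]:
    """Build a `reasoning` argument in the shape used by the Responses API call."""
    return {"effort": effort_for(use_case), "summary": "auto"}


print(reasoning_config("Complex Analysis / Reports"))
print(reasoning_config("Unlisted task"))
```

Falling back to `"none"` keeps unlisted workloads on the fastest setting, which matches the Phase 1 baseline; a different default may suit workloads that are mostly analytical.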


---

# Summary: Migration Reference Card

## Migration Path 1: GPT-4o -> GPT-5.1

| Aspect | GPT-4o | GPT-5.1 (Phase 1) | GPT-5.1 (Phase 2) |
|--------|--------|-------------------|-------------------|
| **Client** | `AzureOpenAI` | `OpenAI` | `OpenAI` |
| **Endpoint** | `.../chat/completions?api-version=...` | `.../openai/v1/chat/completions` | `.../openai/v1/responses` |
| **Max Tokens** | `max_tokens` | `max_completion_tokens` | `max_output_tokens` |
| **System Prompt** | `"role": "system"` | `"role": "system"` (compat) | `"role": "developer"` |
| **Temperature** | Supported | Not supported | Not supported |
| **Reasoning** | N/A | `reasoning_effort="none"` | Adjust per use case |

## Migration Path 2: GPT-4o-mini -> GPT-4.1-mini

| Aspect | GPT-4o-mini | GPT-4.1-mini |
|--------|-------------|-------------|
| **Client** | `AzureOpenAI` | `OpenAI` |
| **Endpoint** | `.../chat/completions?api-version=...` | `.../openai/v1/chat/completions` |
| **Max Tokens** | `max_tokens` | `max_completion_tokens` |
| **System Prompt** | `"role": "system"` | `"role": "system"` |
| **Temperature** | Supported | Supported |
| **Reasoning** | N/A | N/A (not a reasoning model) |
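
The parameter rows of both tables can be expressed as a small translation step. The following is a sketch, not a definitive implementation: `migrate_chat_kwargs` and the target labels `"gpt-5.1"` and `"gpt-4.1-mini"` are hypothetical names chosen for illustration, and only the Chat Completions parameters listed in the tables are handled.

```python
from typing import Any, Dict

SAMPLING_PARAMS = ("temperature", "top_p")


def migrate_chat_kwargs(kwargs: Dict[str, Any], target: str) -> Dict[str, Any]:
    """Translate GPT-4o-style chat kwargs per the reference tables (hypothetical helper).

    target: "gpt-5.1" (reasoning model) or "gpt-4.1-mini" (standard model).
    """
    if target not in ("gpt-5.1", "gpt-4.1-mini"):
        raise ValueError(f"Unknown target: {target}")

    migrated = dict(kwargs)  # Copy so the caller's dict is left untouched.

    # Both paths rename max_tokens to max_completion_tokens.
    if "max_tokens" in migrated:
        migrated["max_completion_tokens"] = migrated.pop("max_tokens")

    if target == "gpt-5.1":
        # Reasoning model: drop sampling parameters and start with no reasoning.
        for name in SAMPLING_PARAMS:
            migrated.pop(name, None)
        migrated.setdefault("reasoning_effort", "none")

    return migrated


old = {"max_tokens": 500, "temperature": 0.7, "top_p": 0.9}
print(migrate_chat_kwargs(old, "gpt-5.1"))
print(migrate_chat_kwargs(old, "gpt-4.1-mini"))
```

The helper leaves the `model` argument and the client setup alone; those still need to point at the new deployment and endpoint.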

## Official Documentation Links

- **Microsoft Learn**: [Azure OpenAI Reasoning Models](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/reasoning)
- **Model Retirements**: [Model Retirement Dates](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-retirements)

---

## Next Steps

1. **Run Phase 1** with your existing prompts and evaluate results
2. **Compare metrics** (for example output quality, latency, and token usage) between source and target models
3. **For GPT-5.1**: Iterate on Phase 2 optimizations based on evaluation results
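
As one possible starting point for the comparison step, the sketch below times a model-calling function over a fixed prompt set. Everything here is an assumption for illustration: `measure_latency` is a hypothetical helper, latency is only one of the metrics worth tracking, and the two stand-in functions take the place of real calls to the source and target deployments.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(call_model: Callable[[str], str], prompts: List[str]) -> Dict[str, float]:
    """Time each call and report mean and max latency in seconds."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        call_model(prompt)
        timings.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(timings), "max_s": max(timings)}


# Stand-ins; replace with functions that call the source and target deployments.
def fake_source(prompt: str) -> str:
    return prompt.upper()


def fake_target(prompt: str) -> str:
    return prompt.lower()


prompts = ["Summarize this ticket.", "Draft a release note."]
for name, fn in {"source": fake_source, "target": fake_target}.items():
    print(name, measure_latency(fn, prompts))
```

Running the same prompt set against both models keeps the comparison like for like; quality scoring would need a separate evaluation step.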

## Migration Decision Tree

```
What model are you currently using?
+-- GPT-4o (any version)
|   +-- Migrate to GPT-5.1
|       +-- Remove temperature/top_p
|       +-- Add reasoning_effort="none"
|       +-- Consider Phase 2 optimizations
|
+-- GPT-4o-mini
    +-- Migrate to GPT-4.1-mini
        +-- Keep temperature/top_p
        +-- Update client and endpoint; rename max_tokens to max_completion_tokens
```

## Additional Resources

- [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/)
- [Model Retirements](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-retirements)