# Azure OpenAI reasoning models

Azure OpenAI reasoning models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math compared to previous iterations.

Key capabilities of reasoning models:

- Complex Code Generation: Capable of generating algorithms and handling advanced coding tasks to support developers.
- Advanced Problem Solving: Ideal for comprehensive brainstorming sessions and addressing multifaceted challenges.
- Complex Document Comparison: Perfect for analyzing contracts, case files, or legal documents to identify subtle differences.
- Instruction Following and Workflow Management: Particularly effective for managing workflows requiring shorter contexts.




In [None]:
import os
import logging
from openai import OpenAI
from dotenv import load_dotenv

logger = logging.getLogger()
logging.basicConfig(level=logging.INFO)

load_dotenv()

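# Build the v1 base URL; rstrip("/") tolerates an endpoint value with or without a trailing slash.
url = f"{os.getenv('AZURE_OPENAI_ENDPOINT', '').rstrip('/')}/openai/v1/"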
key = os.getenv("AZURE_OPENAI_API_KEY")
model = os.getenv("AZURE_OPENAI_COMPLETION_MODEL")

client = OpenAI(
    api_key=key,
    base_url=url
)

response = client.chat.completions.create(
    model="gpt-5-mini",  # replace with the deployment name of your reasoning model
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000,
)

print(response.model_dump_json(indent=2))

# Reasoning effort
Reasoning models report reasoning_tokens as part of completion_tokens_details in the model response. These are hidden tokens: they aren't returned in the message response content, but the model uses them to help generate a final answer to your request.

reasoning_effort can be set to low, medium, or high for all reasoning models except o1-mini. GPT-5 reasoning models also support a new reasoning_effort setting, minimal. The higher the effort setting, the longer the model spends processing the request, which generally results in a larger number of reasoning_tokens.

In [None]:
response = client.chat.completions.create(
    model="gpt-5-mini", 
    messages=[
        {"role": "developer","content": "You are a helpful assistant."}, 
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000,
    reasoning_effort="medium",  # low, medium, or high; GPT-5 models also accept minimal
)

print(response.model_dump_json(indent=2))
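
To see how many hidden reasoning tokens a request consumed, the usage data on the response can be inspected. The following is a minimal sketch rather than part of the original notebook: the helper name summarize_reasoning_usage is hypothetical, and it assumes the usage object exposes completion_tokens and completion_tokens_details.reasoning_tokens as described above. A stand-in object with made-up numbers is used so the snippet runs without a network call.

```python
from types import SimpleNamespace


def summarize_reasoning_usage(usage):
    """Split completion tokens into hidden reasoning tokens and visible output tokens."""
    details = getattr(usage, "completion_tokens_details", None)
    # reasoning_tokens may be missing or None for non-reasoning models.
    reasoning = getattr(details, "reasoning_tokens", None) or 0
    total = usage.completion_tokens
    return {
        "completion_tokens": total,
        "reasoning_tokens": reasoning,
        "visible_tokens": total - reasoning,
    }


# Stand-in for response.usage (made-up numbers, for illustration only).
example_usage = SimpleNamespace(
    completion_tokens=1200,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=448),
)
print(summarize_reasoning_usage(example_usage))
```

With a real response, the call would be summarize_reasoning_usage(response.usage). Comparing the result across effort settings is one way to observe how reasoning_effort changes hidden token usage.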

# Reasoning summary
When using the latest reasoning models with the Responses API, you can set the reasoning summary parameter to receive summaries of the model's chain-of-thought reasoning.

In [None]:
response = client.responses.create(
    input="Tell me about the curious case of neural text degeneration",
    model="gpt-5", # replace with model deployment name
    reasoning={
        "effort": "medium",
        "summary": "auto"  # auto, concise, or detailed (the GPT-5 series does not support concise)
    },
    text={
        "verbosity": "low" # New with GPT-5 models
    }
)

print(response.model_dump_json(indent=2))
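
The summaries arrive inside the response's output items rather than in the final answer text, so they are easy to miss in the full JSON dump. Below is a minimal sketch for pulling them out. The helper name collect_reasoning_summaries is hypothetical, and the sketch assumes that each output item of type reasoning carries a summary list whose entries have a text field. Stand-in objects with made-up content are used so it runs offline.

```python
from types import SimpleNamespace


def collect_reasoning_summaries(output_items):
    """Return the text of every reasoning summary part found in the output items."""
    summaries = []
    for item in output_items:
        if getattr(item, "type", None) != "reasoning":
            continue  # skip message items and other output types
        for part in getattr(item, "summary", None) or []:
            summaries.append(part.text)
    return summaries


# Stand-in for response.output (made-up content, for illustration only).
example_output = [
    SimpleNamespace(
        type="reasoning",
        summary=[SimpleNamespace(type="summary_text", text="Outlined the main causes.")],
    ),
    SimpleNamespace(type="message", content=[]),
]
for text in collect_reasoning_summaries(example_output):
    print(text)
```

With a real response, the call would be collect_reasoning_summaries(response.output). An empty list means no summary was returned for that request.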