# Structured Outputs

By default, models return plain text. **Structured Outputs** is a feature that constrains a model to generate JSON that conforms to a JSON schema you provide.

Structured Outputs is available in two forms in the OpenAI API:
- **Function Calling**: Demonstrated in the next examples
- **JSON Schema Response Format**: Specify a `text_format` to directly control the structure of the model's output

In this demo, we'll focus on using the **JSON Schema Response Format**.

## Steps:
1. **Define your schema**: Write Pydantic classes to define the object schema that represents the structure of the desired output
2. **Supply your schema to the API call**: Pass the object schema to the model using the `text_format` parameter
3. **Handle edge cases**: The model may refuse to answer or otherwise fail to produce a response that matches the provided JSON schema, so check for refusals and errors before using the result

**Important**: For structured output, call `client.responses.parse` rather than `client.responses.create` or `client.chat.completions.create`
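
Step 3 can be rehearsed without calling the API. The following is a minimal sketch, assuming Pydantic v2; `EventSketch` and `parse_or_none` are hypothetical names used only for illustration, and the JSON strings stand in for model output.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class EventSketch(BaseModel):
    # Hypothetical stand-in for the schema defined later in this notebook
    name: str
    date: str
    participants: List[str]


def parse_or_none(raw_json: str) -> Optional[EventSketch]:
    """Return a validated object, or None if the text does not match the schema."""
    try:
        return EventSketch.model_validate_json(raw_json)
    except ValidationError:
        return None


good = '{"name": "Science Fair", "date": "Friday", "participants": ["Vijay", "Venu"]}'
bad = '{"name": "Science Fair"}'  # missing required fields

print(parse_or_none(good))
print(parse_or_none(bad))  # None
```

Returning `None` keeps the calling code simple; a real pipeline might prefer to log the validation error or retry the request instead.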

## Documentation References:
- [Azure OpenAI Structured Outputs](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/structured-outputs)
- [OpenAI Structured Outputs Guide](https://platform.openai.com/docs/guides/structured-outputs)
- [OpenAI Cookbook: Structured Outputs](https://cookbook.openai.com/examples/structured_outputs_intro)

## Prerequisites

Before running this notebook, ensure you have:

1. **Python Environment**: Make sure Python 3 is installed on your system
2. **Virtual Environment** (recommended):
   ```bash
   python3 -m venv venv
   source venv/bin/activate
   ```
3. **Required Libraries**: Install using requirements.txt:
   ```bash
   pip3 install -r requirements.txt
   ```
4. **Environment Variables**: Create a `.env` file with:
   ```
   AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
   AZURE_OPENAI_MODEL=<your_azure_openai_model>
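   AZURE_OPENAI_VERSION=<your_azure_openai_api_version>  # 2023-05-15 is too old: structured outputs arrived in 2024-08-01-preview, and the Responses API needs a newer version still (see the Azure docs)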
   AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
   ```

## Environment Setup and Library Installation

Let's start by installing and importing all the necessary libraries for working with Azure OpenAI structured outputs.

In [9]:
! pip3 install openai python-dotenv pydantic


[notice] A new release of pip is available: 25.0.1 -> 25.2
[notice] To update, run: pip install --upgrade pip


In [10]:
# Import required modules
from openai import AzureOpenAI, OpenAI     # `AzureOpenAI` is the client class for the Azure OpenAI API
from dotenv import load_dotenv             # Loads environment variables from a .env file
import os                                  # Reads values from environment variables
from pprint import pprint                  # Pretty-prints objects
from pydantic import BaseModel, Field      # Defines the structure of the output we want
from typing import List, Optional          # Type hints for our Pydantic models
import json                                # Serializes and prints JSON data

print("✅ All libraries imported successfully!")

✅ All libraries imported successfully!


## Load Environment Variables and Initialize Azure OpenAI Client

Now let's load our environment variables and set up the Azure OpenAI client.

In [11]:
# Load environment variables from .env file
load_dotenv()

# Extract environment variables and store them explicitly to ensure they're available
AZURE_OPENAI_ENDPOINT = os.environ.get('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_MODEL = os.environ.get('AZURE_OPENAI_MODEL')
AZURE_OPENAI_VERSION = os.environ.get('AZURE_OPENAI_VERSION')  # Make sure this matches your .env file
AZURE_OPENAI_API_KEY = os.environ.get('AZURE_OPENAI_API_KEY')

# Verify that all required environment variables are loaded
required_vars = {
    'AZURE_OPENAI_ENDPOINT': AZURE_OPENAI_ENDPOINT,
    'AZURE_OPENAI_MODEL': AZURE_OPENAI_MODEL,
    'AZURE_OPENAI_VERSION': AZURE_OPENAI_VERSION,
    'AZURE_OPENAI_API_KEY': AZURE_OPENAI_API_KEY
}

missing_vars = [var for var, value in required_vars.items() if not value]
if missing_vars:
    print(f"❌ Missing environment variables: {', '.join(missing_vars)}")
    print("Please check your .env file and ensure all required variables are set.")
else:
    print("✅ All environment variables loaded successfully!")

✅ All environment variables loaded successfully!
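
The check above prints a warning but lets the notebook keep running, so a missing variable only surfaces later as a less obvious client error. As a minimal sketch of failing fast instead, the hypothetical helper `require_env` (not part of any library) raises as soon as a variable is unset or empty.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Example call (commented out so the sketch runs without a configured .env):
# settings = require_env(["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"])
```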


In [12]:
# Initialize the Azure OpenAI client
client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,  # The endpoint URL for the Azure OpenAI service
    api_key=AZURE_OPENAI_API_KEY,          # The API key for the Azure OpenAI service
    api_version=AZURE_OPENAI_VERSION       # The API version to use (must support structured outputs and the Responses API)
)

deployment_name = AZURE_OPENAI_MODEL  # The deployment name of the model to use

print(f"✅ Azure OpenAI client initialized successfully!")
print(f"Using deployment: {deployment_name}")

✅ Azure OpenAI client initialized successfully!
Using deployment: gpt-4.1-mini


## Define Pydantic Models for Basic Structured Output

Let's define a Pydantic model that represents the structure we want for our calendar event extraction. This model will serve as a schema for the structured output.

In [13]:
class CalendarEvent(BaseModel):
    name: str = Field(description="The name of the event")
    date: str = Field(description="The date of the event")
    participants: List[str] = Field(description="List of participants attending the event")

print("✅ CalendarEvent Pydantic model defined successfully!")
print("\nModel schema:")
print(json.dumps(CalendarEvent.model_json_schema(), indent=2))

✅ CalendarEvent Pydantic model defined successfully!

Model schema:
{
  "properties": {
    "name": {
      "description": "The name of the event",
      "title": "Name",
      "type": "string"
    },
    "date": {
      "description": "The date of the event",
      "title": "Date",
      "type": "string"
    },
    "participants": {
      "description": "List of participants attending the event",
      "items": {
        "type": "string"
      },
      "title": "Participants",
      "type": "array"
    }
  },
  "required": [
    "name",
    "date",
    "participants"
  ],
  "title": "CalendarEvent",
  "type": "object"
}


## Implement Basic Structured Output Example

Now let's test our basic structured output with some sample inputs. The model will extract event information and return it in the exact structure we defined.

In [14]:
print("=== Example 1: Basic Structured Output ===")

inputs = [
    "Mike will attend the Chris Rock Concert on 24 Jan 2025",
    "Vijay and Venu are going to a science fair on Friday."
]

for input_text in inputs:
    print(f"\n--- Test ---")
    print(f"Input: {input_text}")
    
    try:
        # Instead of `client.responses.create`,
        # use `client.responses.parse` for structured output
        response = client.responses.parse(
            model=deployment_name,
            temperature=0,
            input=[
                {"role": "developer", "content": "Extract the event information from the provided user input"},
                {"role": "user", "content": input_text},
            ],
            text_format=CalendarEvent  # Pass the Pydantic class to `text_format`
        )

        # If the model refuses to respond, you will get a refusal message
        if (response.output[0].content[0].type == "refusal"):
            print(f"❌ Model refused to respond: {response.output[0].content[0].refusal}")
        else:
            response_json = response.output_parsed
            print(f"\n✅ Extracted Event Information:")
            print(f"   Name: {response_json.name}")
            print(f"   Date: {response_json.date}")
            print(f"   Participants: {', '.join(response_json.participants)}")
            
            print(f"\n📋 Raw JSON Response:")
            print(json.dumps(response_json.model_dump(), indent=2))
    
    except Exception as e:
        print(f"❌ Error getting answer from AI: {e}")
    
    print("-" * 50)

=== Example 1: Basic Structured Output ===

--- Test ---
Input: Mike will attend the Chris Rock Concert on 24 Jan 2025

✅ Extracted Event Information:
   Name: Chris Rock Concert
   Date: 2025-01-24
   Participants: Mike

📋 Raw JSON Response:
{
  "name": "Chris Rock Concert",
  "date": "2025-01-24",
  "participants": [
    "Mike"
  ]
}
--------------------------------------------------

--- Test ---
Input: Vijay and Venu are going to a science fair on Friday.

✅ Extracted Event Information:
   Name: Science Fair
   Date: Friday
   Participants: Vijay, Venu

📋 Raw JSON Response:
{
  "name": "Science Fair",
  "date": "Friday",
  "participants": [
    "Vijay",
    "Venu"
  ]
}
--------------------------------------------------
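
In the output above, the first date came back normalized as `2025-01-24`, while the second stayed as the relative word `Friday`, because the schema only asks for a string. A minimal sketch of telling the two apart before storing a result follows; `is_iso_date` is a hypothetical helper, and the two values are copied from the output above.

```python
from datetime import date


def is_iso_date(value: str) -> bool:
    """True if value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


for extracted in ["2025-01-24", "Friday"]:
    status = "resolved" if is_iso_date(extracted) else "needs follow-up"
    print(f"{extracted}: {status}")
```

A stricter field description (for example, asking for ISO format where the date is stated) could reduce how often the follow-up branch is needed, though a relative date such as "Friday" cannot be resolved without knowing the reference date.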


## Define Enhanced Pydantic Models with Confidence Scoring

With structured output, the model always tries to fill the provided schema, which can lead to hallucinated values when the input is incomplete or unrelated to the schema.

We usually cannot control the quality of the input data, so it helps to have the model also report a confidence score, along with its reasoning and the assumptions it made.
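
A score is only useful if something acts on it. The following is a minimal sketch, assuming Pydantic v2, of checking the score range locally and routing low-confidence results to review. `ConfidenceSketch`, `needs_review`, and the threshold of 80 are illustrative assumptions, not part of this notebook's models. The range is checked locally because whether the service enforces numeric bounds depends on the model and API version, so that is not assumed here.

```python
from pydantic import BaseModel, Field, ValidationError


class ConfidenceSketch(BaseModel):
    # Hypothetical local model: ge/le reject scores outside 0-100
    confidence: float = Field(ge=0, le=100)
    confidence_reason: str


def needs_review(result: ConfidenceSketch, threshold: float = 80.0) -> bool:
    """Flag results whose self-reported confidence falls below the threshold."""
    return result.confidence < threshold


clear = ConfidenceSketch(confidence=100, confidence_reason="All fields stated explicitly")
vague = ConfidenceSketch(confidence=40, confidence_reason="No event mentioned in the input")

print(needs_review(clear))  # False
print(needs_review(vague))  # True

try:
    ConfidenceSketch(confidence=250, confidence_reason="Out of range")
except ValidationError as exc:
    print(f"Rejected out-of-range score ({exc.error_count()} error)")
```

A self-reported score is best treated as a heuristic signal rather than a calibrated probability, so the threshold is something to tune against your own data.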

In [15]:
# Define a confidence model to track the model's certainty
class LLMConfidence(BaseModel):
    confidence: float = Field(description="Confidence level in the prediction. "
                                    "Highest confidence - when all values are clearly mentioned in the input. "
                                    "More the assumptions made by the model, lower the confidence. "
                                    "Value between 0 lowest to 100 highest.")
    confidence_reason: str = Field(description="Reasoning behind the confidence level.")
    assumptions: List[str] = Field(description="List of assumptions made by the model.")

# Enhanced calendar event model with confidence scoring
class CalendarEventWithConfidence(BaseModel):
    name: str = Field(description="The name of the event")
    date: str = Field(description="The date of the event")
    participants: List[str] = Field(description="List of participants attending the event")
    llm_confidence: LLMConfidence = Field(description="Confidence information from the model")

print("✅ Enhanced Pydantic models with confidence scoring defined successfully!")
print("\nCalendarEventWithConfidence schema:")
print(json.dumps(CalendarEventWithConfidence.model_json_schema(), indent=2))

✅ Enhanced Pydantic models with confidence scoring defined successfully!

CalendarEventWithConfidence schema:
{
  "$defs": {
    "LLMConfidence": {
      "properties": {
        "confidence": {
          "description": "Confidence level in the prediction. Highest confidence - when all values are clearly mentioned in the input. More the assumptions made by the model, lower the confidence. Value between 0 lowest to 100 highest.",
          "title": "Confidence",
          "type": "number"
        },
        "confidence_reason": {
          "description": "Reasoning behind the confidence level.",
          "title": "Confidence Reason",
          "type": "string"
        },
        "assumptions": {
          "description": "List of assumptions made by the model.",
          "items": {
            "type": "string"
          },
          "title": "Assumptions",
          "type": "array"
        }
      },
      "required": [
        "confidence",
        "confidence_reason",
        "assumpt

## Implement Structured Output with Confidence Scoring

Now let's test our enhanced model that includes confidence scoring. This helps us understand how certain the model is about its extractions.
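
The cells in this notebook test only `response.output[0].content[0]` for a refusal. If the first output item is not a message, that lookup could fail or miss the refusal. As a hedged sketch of a more defensive check, the hypothetical helper `find_refusal` scans every output item. The `SimpleNamespace` objects are hand-built stand-ins that mimic only the attributes the helper reads; they are not real SDK response objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_refusal(response: Any) -> Optional[str]:
    """Return the first refusal message found in any output item, else None."""
    for item in getattr(response, "output", None) or []:
        # Items without a content list (for example, non-message items) are skipped
        for part in getattr(item, "content", None) or []:
            if getattr(part, "type", None) == "refusal":
                return getattr(part, "refusal", None)
    return None


# Hand-built stand-ins that mimic only the attributes read above
refused = SimpleNamespace(output=[
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="refusal", refusal="I can't help with that."),
    ]),
])
answered = SimpleNamespace(output=[
    SimpleNamespace(type="message", content=[
        SimpleNamespace(type="output_text", text="{}"),
    ]),
])

print(find_refusal(refused))
print(find_refusal(answered))  # None
```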

In [17]:
print("=== Example 2: Structured Output with Confidence Score ===")

# Test with a variety of inputs to see the confidence scoring in action
test_inputs = [
    "Mike will attend the Chris Rock Concert on 24 Jan 2025",
    "Vijay and Venu are going to a science fair on Friday.",
    "The project deadline is next Monday.",
    "Vijay and Venu are going to a science fair",
    "Build Team is planning a team outing first week of August",
    "My name is Agni. How are you"
]

for input_text in test_inputs:
    print(f"\n--- Test ---")
    print(f"Input: {input_text}")
    
    try:
        # Instead of `client.chat.completions.create`,
        # use `client.responses.parse` for structured output
        response = client.responses.parse(
            model=deployment_name,
            temperature=0,
            input=[
                {"role": "system", "content": "Extract the event information from the provided user input"},
                {"role": "user", "content": input_text},
            ],
            text_format=CalendarEventWithConfidence
        )

        # If the model refuses to respond, you will get a refusal message
        if (response.output[0].content[0].type == "refusal"):
            print(f"❌ Model refused to respond: {response.output[0].content[0].refusal}")
        else:
            response_json = response.output_parsed
            
            print(f"\n✅ Extracted Event Information:")
            print(f"   Name: {response_json.name}")
            print(f"   Date: {response_json.date}")
            print(f"   Participants: {', '.join(response_json.participants)}")
            
            print(f"\n🎯 Confidence Information:")
            print(f"   Confidence: {response_json.llm_confidence.confidence}%")
            print(f"   Reason: {response_json.llm_confidence.confidence_reason}")
            print(f"   Assumptions: {', '.join(response_json.llm_confidence.assumptions) if response_json.llm_confidence.assumptions else 'None'}")
    
    except Exception as e:
        print(f"❌ Error getting answer from AI: {e}")
    
    print("-" * 60)

=== Example 2: Structured Output with Confidence Score ===

--- Test ---
Input: Mike will attend the Chris Rock Concert on 24 Jan 2025

✅ Extracted Event Information:
   Name: Chris Rock Concert
   Date: 2025-01-24
   Participants: Mike

🎯 Confidence Information:
   Confidence: 100.0%
   Reason: The event name, date, and participant are explicitly mentioned in the input without ambiguity.
   Assumptions: None
------------------------------------------------------------

--- Test ---
Input: Vijay and Venu are going to a science fair on Friday.

✅ Extracted Event Information:
   Name: Science Fair
   Date: Friday
   Participants: Vijay, Venu

🎯 Confidence Information:
   Confidence: 90.0%
   Reason: The event name 'science fair' and participants 'Vijay' and 'Venu' are clearly mentioned. The date is given as 'Friday' but the exact date is not specified, so the confidence is slightly reduced.
   Assumptions: The event is named 'Science Fair' based on the description., 'Friday' refers to th