# Structured Output in AI Agents

When building AI agents, one of the most critical challenges is ensuring that the agent's responses are not just accurate, but also structured in a predictable, machine-readable format. This notebook introduces structured output using LangChain and Pydantic models, enabling us to transform free-form AI responses into well-defined data structures.

#### Why structured output matters
Imagine asking an AI agent to extract information about a person from text. Without structured output, we might get a paragraph of text. With structured output, we get a clean JSON object with specific fields like `name`, `email`, and `phone`. This makes it easy to:
- Store data in databases
- Pass information between systems
- Validate data quality
- Build reliable AI applications

In [1]:
from pydantic import BaseModel, Field
from pydantic import ValidationError
from typing import Optional
from langchain.agents import create_agent
from langchain_openai import ChatOpenAI
import os

### Pydantic models

Pydantic is a Python library for data validation using type annotations. In the context of AI agents, Pydantic models define the schema (structure) of the output we want from our agent.

Let's start by creating a simple Pydantic model to represent a person's contact information. It uses the basic Python types `str` (string), `int` (integer), and `bool` (boolean). `float` appears in the product example later.

In [2]:
# Define a Pydantic model for contact information
class ContactInfo(BaseModel):
    """Contact information for a person."""
    
    # String field for the person's full name
    name: str = Field(description="The full name of the person")
    
    # String field for email address
    email: str = Field(description="The email address of the person")
    
    # String field for phone number
    phone: str = Field(description="The phone number of the person")
    
    # Integer field for age
    age: int = Field(description="The age of the person in years")
    
    # Boolean field to indicate if the person is a premium member
    is_premium: bool = Field(description="Whether the person has premium membership")

# Example: Create an instance of ContactInfo
contact = ContactInfo(
    name="John Smith",
    email="john.smith@example.com",
    phone="+1-555-0123",
    age=35,
    is_premium=True
)

print(contact)
print("\nAs JSON:")
print(contact.model_dump_json(indent=2))

name='John Smith' email='john.smith@example.com' phone='+1-555-0123' age=35 is_premium=True

As JSON:
{
  "name": "John Smith",
  "email": "john.smith@example.com",
  "phone": "+1-555-0123",
  "age": 35,
  "is_premium": true
}


- `BaseModel` from Pydantic is the base class for all Pydantic models
- We define a `ContactInfo` class that inherits from `BaseModel`
- Each field has a type annotation (`str`, `int`, `bool`) that tells Pydantic what type of data to expect
- The `Field()` descriptions become part of the model's JSON Schema, which is sent to the LLM so it knows what each field should contain
- We create an instance and display it both as a Python object and as JSON
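To see how the `Field()` descriptions reach the model, you can inspect the JSON Schema that Pydantic generates. The sketch below uses a trimmed-down copy of `ContactInfo` with only two fields, for brevity, and Pydantic v2's `model_json_schema()`. It does not show exactly how LangChain forwards this schema to the provider.

```python
import json
from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    """Contact information for a person."""
    name: str = Field(description="The full name of the person")
    age: int = Field(description="The age of the person in years")

# Pydantic v2 builds a JSON Schema from the type hints, the class docstring,
# and the Field() descriptions
schema = ContactInfo.model_json_schema()
print(json.dumps(schema, indent=2))
```

The class docstring becomes the schema's top-level `description`, and each field's description appears under `properties`.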

### Using structured output with LangChain
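LangChain provides multiple strategies for getting structured output from AI models. The most straightforward approach is to pass a Pydantic model to the agent's `response_format` parameter and let LangChain choose the strategy. For models with native structured-output support, such as OpenAI's, it uses `ProviderStrategy`, where the provider itself enforces the JSON schema during generation. For other models it falls back to `ToolStrategy`, which obtains the structured result through tool calling.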

Let's create an AI agent that extracts contact information from natural language text.

In [3]:
# Initialize the language model
# gpt-4o-mini-2024-07-18 supports OpenAI's native structured outputs
llm = ChatOpenAI(
    model="gpt-4o-mini-2024-07-18",
    api_key=os.getenv("OPENAI_API_KEY", "").strip(),
    temperature=0  # Set to 0 for more deterministic outputs
)

# Create an agent with structured output
# The response_format parameter tells the agent to use our ContactInfo schema
agent = create_agent(
    model=llm,
    tools=[],  # No tools needed for this simple example
    response_format=ContactInfo  # This automatically uses ProviderStrategy
)

# Test the agent with a sample text
test_text = """
Extract contact info from: John Smith is 35 years old and can be reached at 
john.smith@example.com or by phone at +1-555-0123. He has a premium membership.
"""

result = agent.invoke({
    "messages": [
        {"role": "user", "content": test_text}
    ]
})

print("Extracted Contact Information:")
print(result["structured_response"])

Extracted Contact Information:
name='John Smith' email='john.smith@example.com' phone='+1-555-0123' age=35 is_premium=True


- We initialize a `ChatOpenAI` model with `temperature=0` for consistent outputs
- We create an agent using `create_agent()` and pass our `ContactInfo` model as the `response_format`
- LangChain automatically detects that the model supports native structured output and uses the `ProviderStrategy`
- When we invoke the agent, the returned state dictionary contains the usual `messages` plus a `structured_response` key holding a validated `ContactInfo` instance instead of plain text
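Conceptually, the provider's reply is JSON text shaped like the schema, and that text must become a typed Python object. The sketch below imitates that final step offline. `raw_reply` is a hypothetical, hand-written stand-in for the model's reply (not a real API response), and `model_validate_json` is Pydantic's JSON parser. LangChain's internal parsing may differ in detail.

```python
from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    """Contact information for a person."""
    name: str = Field(description="The full name of the person")
    email: str = Field(description="The email address of the person")
    phone: str = Field(description="The phone number of the person")
    age: int = Field(description="The age of the person in years")
    is_premium: bool = Field(description="Whether the person has premium membership")

# Hypothetical raw reply: a JSON string shaped like the schema
raw_reply = (
    '{"name": "John Smith", "email": "john.smith@example.com", '
    '"phone": "+1-555-0123", "age": 35, "is_premium": true}'
)

# Parse and validate in one step; raises ValidationError if the JSON does not fit
contact = ContactInfo.model_validate_json(raw_reply)
print(contact.name, contact.age, contact.is_premium)
```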

### Example - Product information extraction
Let's build a more practical example: extracting product details from e-commerce descriptions.

#### Defining the product schema

In [4]:
class ProductInfo(BaseModel):
    """Information about a product."""
    
    # Product name
    name: str = Field(description="The name of the product")
    
    # Product category
    category: str = Field(description="The category of the product")
    
    # Price as a float
    price: float = Field(description="The price of the product in USD")
    
    # Boolean indicating if the product is in stock
    in_stock: bool = Field(description="Whether the product is currently in stock")
    
    # Optional rating (might not always be present)
    rating: Optional[float] = Field(default=None, description="Product rating from 0 to 5")
    
    # Number of reviews
    review_count: int = Field(default=0, description="Number of customer reviews")

# Example: Create a product instance
product = ProductInfo(
    name="Wireless Bluetooth Headphones",
    category="Electronics",
    price=79.99,
    in_stock=True,
    rating=4.5,
    review_count=1250
)

print(product.model_dump_json(indent=2))

{
  "name": "Wireless Bluetooth Headphones",
  "category": "Electronics",
  "price": 79.99,
  "in_stock": true,
  "rating": 4.5,
  "review_count": 1250
}


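- `Optional[float]` allows a field to be either a float or `None`
- `default=None` makes `rating` optional: if it is not provided, it is set to `None`
- `default=0` does the same for `review_count`, which falls back to `0`
- Fields without a default (`name`, `category`, `price`, `in_stock`) are required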
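A quick way to confirm which fields are optional is to build a `ProductInfo` with only the fields that have no default. The sketch below repeats the schema from the cell above so it runs on its own. The product values are made up for illustration, and `model_fields` and `FieldInfo.is_required()` are Pydantic v2 APIs.

```python
from typing import Optional
from pydantic import BaseModel, Field

class ProductInfo(BaseModel):
    """Information about a product."""
    name: str = Field(description="The name of the product")
    category: str = Field(description="The category of the product")
    price: float = Field(description="The price of the product in USD")
    in_stock: bool = Field(description="Whether the product is currently in stock")
    rating: Optional[float] = Field(default=None, description="Product rating from 0 to 5")
    review_count: int = Field(default=0, description="Number of customer reviews")

# Supply only the four fields that have no default
minimal = ProductInfo(name="USB-C Cable", category="Accessories", price=9.99, in_stock=False)
print(minimal.rating, minimal.review_count)  # defaults are filled in

# List the fields Pydantic treats as required
required = [name for name, field in ProductInfo.model_fields.items() if field.is_required()]
print(required)
```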

#### Creating the product extraction agent

In [5]:
# Initialize the model
llm = ChatOpenAI(
    model="gpt-4o-mini-2024-07-18",
    api_key=os.getenv("OPENAI_API_KEY", "").strip(),
    temperature=0
)

# Create agent with ProductInfo schema
product_agent = create_agent(
    model=llm,
    tools=[],
    response_format=ProductInfo
)

# Test with a product description
product_description = """
Wireless Bluetooth Headphones - Premium noise-canceling headphones with 30-hour battery life.
Price: $79.99. Currently in stock. Rated 4.5 stars based on 1,250 customer reviews.
Category: Electronics.
"""

result = product_agent.invoke({
    "messages": [
        {"role": "user", "content": f"Extract product information from: {product_description}"}
    ]
})

product_info = result["structured_response"]

print("Extracted Product Information:")
print(f"Name: {product_info.name}")
print(f"Category: {product_info.category}")
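print(f"Price: ${product_info.price:.2f}")
print(f"In Stock: {'Yes' if product_info.in_stock else 'No'}")
# Compare with None explicitly so a legitimate rating of 0.0 is not shown as N/A
print(f"Rating: {product_info.rating if product_info.rating is not None else 'N/A'}")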
print(f"Review Count: {product_info.review_count}")

Extracted Product Information:
Name: Wireless Bluetooth Headphones
Category: Electronics
Price: $79.99
In Stock: Yes
Rating: 4.5
Review Count: 1250


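- The agent extracts structured data from unstructured text
- All fields are properly typed (strings, floats, booleans, integers)
- We can access individual fields directly as attributes of the result object
- The output's structure and types are guaranteed to match our schema, but the values themselves still depend on the model and can be wrong or invented (see the error-handling example below)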
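Schema conformance covers structure and types, not plausibility. For example, the `rating` description says "from 0 to 5", but nothing enforces that range. If you want Pydantic to enforce it, `Field` accepts numeric bounds such as `ge` and `le`, so an out-of-range value fails validation instead of passing silently. `RatedProduct` is a hypothetical, trimmed-down model used only for illustration. Whether a given provider also respects these bounds during generation varies, so treat the Pydantic check as the safety net.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class RatedProduct(BaseModel):
    """Hypothetical trimmed-down product model with a bounded rating."""
    name: str
    # ge/le make Pydantic reject ratings outside the 0-5 range
    rating: Optional[float] = Field(default=None, ge=0, le=5, description="Product rating from 0 to 5")

print(RatedProduct(name="Headphones", rating=4.5))

error_type = None
try:
    RatedProduct(name="Headphones", rating=7.5)
except ValidationError as e:
    # e.errors() gives machine-readable details for each failing field
    error_type = e.errors()[0]["type"]
    print(e.errors()[0]["loc"], error_type)
```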

### Error handling

#### Understanding validation errors
Pydantic automatically validates data against our schema. If the data doesn't match, it raises a `ValidationError`.

In [6]:
# This will raise a validation error because age should be an integer
try:
    invalid_contact = ContactInfo(
        name="Jane Doe",
        email="jane@example.com",
        phone="555-0199",
        age="thirty-five",  # Wrong type! Should be int, not str
        is_premium=True
    )
except ValidationError as e:
    print("Validation Error:")
    print(e)

Validation Error:
1 validation error for ContactInfo
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='thirty-five', input_type=str]
    For further information visit https://errors.pydantic.dev/2.12/v/int_parsing


- Pydantic checks each field's type
- `"thirty-five"` cannot be parsed as an integer, so Pydantic raises a `ValidationError`
- The error message tells you exactly which field failed and why
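Pydantic's default "lax" mode coerces inputs that unambiguously represent the target type. That is why `"thirty-five"` fails while a numeric string such as `"35"` would be accepted. The sketch below contrasts lax mode with strict mode, enabled through `model_config = ConfigDict(strict=True)` in Pydantic v2. `LaxAge` and `StrictAge` are hypothetical model names for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxAge(BaseModel):
    age: int  # default lax mode: the string "35" is coerced to the integer 35

class StrictAge(BaseModel):
    model_config = ConfigDict(strict=True)  # disable type coercion
    age: int

lax = LaxAge(age="35")
print(lax.age, type(lax.age).__name__)  # 35 int

strict_error = None
try:
    StrictAge(age="35")
except ValidationError as e:
    strict_error = e.errors()[0]["type"]
print(strict_error)
```

Strict mode is useful when a silently coerced value would hide a real problem upstream. Lax mode is more forgiving of models that return numbers as strings.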

#### Handling agent errors
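When working with AI agents, the model can return output that fails validation, or output that passes validation but fills in fields the input never mentioned. The cell below guards against the first case with a try-except block, and its output illustrates the second: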

In [7]:
try:
    # Create agent
    llm = ChatOpenAI(
        model="gpt-4o-mini-2024-07-18",
        api_key=os.getenv("OPENAI_API_KEY", "").strip(),
        temperature=0
    )
    agent = create_agent(
        model=llm,
        tools=[],
        response_format=ContactInfo
    )
    
    # Invoke with ambiguous input
    result = agent.invoke({
        "messages": [
            {"role": "user", "content": "Extract contact info from: John works at Acme Corp"}
        ]
    })
    
    print("Success:", result["structured_response"])
    
except Exception as e:
    print(f"Error occurred: {type(e).__name__}")
    print(f"Details: {e}")
    print("\nTip: Make sure your input contains all required fields!")

Success: name='John' email='' phone='' age=0 is_premium=False


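- Wrap agent invocations in try-except blocks so that API or validation failures don't crash your application
- Provide clear error messages to help debug issues
- A successful call is not proof of correct data: here no exception was raised, yet the model filled fields the input never mentioned with placeholder values (`email=''`, `phone=''`, `age=0`, `is_premium=False`) because the schema marks them all as required
- Ensure the input text contains enough information for all required fields, or consider making some fields `Optional` if they might not always be present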
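To follow the `Optional` suggestion, here is a sketch (an assumption, not a guaranteed fix) of a variant where only `name` is required. Missing details can then come back as `None` and be checked in code, rather than arriving as `''` or `0`. `PartialContactInfo` and `missing_fields` are hypothetical names, and the dictionary passed to `model_validate` stands in for a model response. It is not real output.

```python
from typing import Optional
from pydantic import BaseModel, Field

class PartialContactInfo(BaseModel):
    """Hypothetical variant of ContactInfo where only the name is required."""
    name: str = Field(description="The full name of the person")
    email: Optional[str] = Field(default=None, description="The email address, if mentioned")
    phone: Optional[str] = Field(default=None, description="The phone number, if mentioned")
    age: Optional[int] = Field(default=None, description="The age in years, if mentioned")

def missing_fields(model: BaseModel) -> list[str]:
    """Return the names of fields whose value is None (hypothetical helper)."""
    return [name for name, value in model.model_dump().items() if value is None]

# Stand-in for what a model might return for "John works at Acme Corp"
contact = PartialContactInfo.model_validate({"name": "John"})
print(contact)
print("Missing:", missing_fields(contact))
```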

## Best practices

### 1. Start simple
Begin with single-level structures using basic types before moving to complex nested models.

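```python
from typing import Optional
from pydantic import BaseModel

# Good for beginners: a flat model with basic types
class SimpleUser(BaseModel):
    name: str
    email: str
    age: int

# More complex: nested models and typed collections
class Address(BaseModel):
    street: str
    city: str

class ComplexUser(BaseModel):
    name: str
    addresses: list[Address]
    preferences: Optional[dict[str, str]] = None
```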

### 2. Use descriptive field names and descriptions

```python
from pydantic import BaseModel, Field

# Good: Clear field names and descriptions
class Person(BaseModel):
    full_name: str = Field(description="The person's complete name")
    email_address: str = Field(description="Primary email contact")

# Avoid: Vague names without descriptions
class VaguePerson(BaseModel):
    n: str
    e: str
```

### 3. Set appropriate default values

```python
class UserProfile(BaseModel):
    username: str  # Required field
    is_active: bool = True  # Default to True
    login_count: int = 0  # Default to 0
    bio: Optional[str] = None  # Optional field
```

### 4. Use type hints correctly

```python
from typing import Optional

class Product(BaseModel):
    name: str  # Required string
    price: float  # Required float
    discount: Optional[float] = None  # Optional float
    quantity: int = 1  # Not required; defaults to 1
```

### 5. Test your schemas
Always test your Pydantic models with sample data before using them with AI agents. The example below uses the `Product` model from the previous section.

```python
# Test your schema
test_data = {
    "name": "Test Product",
    "price": 29.99,
    "quantity": 5
}

try:
    product = Product(**test_data)
    print("Schema validation passed!")
    print(product.model_dump_json(indent=2))
except ValidationError as e:
    print("Schema validation failed:")
    print(e)
```

### 6. Keep temperature low for structured output
When creating agents for structured output, use `temperature=0` or a very low value. This makes results more consistent, although it does not make them fully deterministic.

```python
# Recommended for structured output
llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0)

# Avoid high temperature for structured output
# llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0.9)  # Too random!
```