# Direct API Integration: Using Pydantic Schemas with LLM Providers

This notebook demonstrates passing Pydantic models directly in API calls to several LLM providers to receive structured output. Rather than writing your own validate-and-retry loop, you can hand the Pydantic model to a framework or provider API that handles schema enforcement and validation for you.

Key concepts demonstrated:
- Use Pydantic models directly in LLM API calls
- Receive properly structured responses using different frameworks and providers
- Compare approaches across OpenAI, Anthropic, and other providers

---

### Import libraries and set up environment

In [1]:
# Import packages
from pydantic import BaseModel, Field, EmailStr
from typing import List, Literal, Optional
from openai import OpenAI
import instructor
import anthropic
from dotenv import load_dotenv
from datetime import date

### Define Pydantic models for user input and LLM output

In [2]:
# Define the UserInput model for customer support queries
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str
    order_id: Optional[int] = Field(
        None,
        description="5-digit order number (cannot start with 0)",
        ge=10000,
        le=99999
    )
    purchase_date: Optional[date] = None

# Define the CustomerQuery model that inherits from UserInput
class CustomerQuery(UserInput):
    priority: str = Field(
        ..., description="Priority level: low, medium, high"
    )
    category: Literal[
        'refund_request', 'information_request', 'other'
    ] = Field(..., description="Query category")
    is_complaint: bool = Field(
        ..., description="Whether this is a complaint"
    )
    tags: List[str] = Field(..., description="Relevant keyword tags")
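
To see what a provider or framework can derive from a model like this, it can help to look at the JSON Schema that Pydantic generates from it. The sketch below uses a trimmed-down stand-in, `MiniQuery` (a hypothetical name, not part of this notebook), and leaves out `EmailStr` so it runs without the `email-validator` package. It also shows that the `ge`/`le` bounds are enforced when data is validated.

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


# Hypothetical, trimmed-down stand-in for CustomerQuery
class MiniQuery(BaseModel):
    order_id: Optional[int] = Field(
        None, description="5-digit order number", ge=10000, le=99999
    )
    category: Literal[
        'refund_request', 'information_request', 'other'
    ] = Field(..., description="Query category")
    tags: List[str] = Field(..., description="Relevant keyword tags")


# The JSON Schema derived from the model
schema = MiniQuery.model_json_schema()
print(schema["required"])                        # fields without defaults
print(schema["properties"]["category"]["enum"])  # allowed Literal values

# The same constraints are enforced when data is validated
try:
    MiniQuery(order_id=123, category="other", tags=[])
except ValidationError as exc:
    constraint_error = exc.errors()[0]
    print(constraint_error["loc"], constraint_error["type"])
```

Exactly how each provider consumes the schema differs, so treat this as a way to inspect the model rather than a description of any one API's request format.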

### Sample input data and model validation

In [None]:
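# Define input data as a JSON string
# Note: the key "order_number" does not match the model field order_id,
# so it is ignored during validation and order_id stays None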
user_input_json = '''{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time this has happened. I need a replacement ASAP.",
    "order_number": 12345,
    "purchase_date": "2025-12-31"
}'''

In [None]:
# Validate the input data by creating a UserInput instance
user_input = UserInput.model_validate_json(user_input_json)
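
Validation succeeds here even though the JSON uses the key `order_number` rather than `order_id`, because Pydantic ignores unknown keys by default. The following sketch shows that behavior with two hypothetical models (`LenientInput` and `StrictInput` are illustrative names, not part of this notebook) and shows how `extra="forbid"` would surface the mismatch instead.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class LenientInput(BaseModel):
    # Default behaviour: unknown keys are ignored
    name: str
    order_id: Optional[int] = None


class StrictInput(LenientInput):
    # extra="forbid" turns unknown keys into validation errors
    model_config = ConfigDict(extra="forbid")


payload = '{"name": "Joe User", "order_number": 12345}'

lenient = LenientInput.model_validate_json(payload)
print(lenient.order_id)  # the mismatched key never reaches order_id

try:
    StrictInput.model_validate_json(payload)
    strict_rejected = False
except ValidationError as exc:
    strict_rejected = True
    print(exc.errors()[0]["type"])
```

Whether to forbid extra keys is a design choice: it catches typos in field names early, at the cost of rejecting payloads that carry harmless additional data.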

### Anthropic API with Instructor for structured output

In [5]:
prompt = (
    f"Analyze the following customer query {user_input} "
    f"and provide a structured response."
)
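
Interpolating a model into an f-string inserts its `str()` form, which is a `field=value` listing rather than JSON. As a possible alternative, the sketch below serializes the model with `model_dump_json()` before building the prompt. `MiniInput` is a hypothetical stand-in for `UserInput`, and whether JSON input improves model responses is an assumption this notebook does not test.

```python
import json
from datetime import date
from typing import Optional

from pydantic import BaseModel


# Hypothetical stand-in for UserInput (no EmailStr, to avoid extra dependencies)
class MiniInput(BaseModel):
    name: str
    query: str
    purchase_date: Optional[date] = None


mini_input = MiniInput(
    name="Joe User", query="Screen arrived cracked.", purchase_date="2025-12-31"
)

# What an f-string interpolates: the model's str(), in field=value form
as_repr = f"{mini_input}"
print(as_repr)

# Alternative: serialize to JSON before embedding in the prompt
as_json = mini_input.model_dump_json()
print(as_json)

json_prompt = (
    f"Analyze the following customer query {as_json} "
    "and provide a structured response."
)
```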

In [None]:
# Load environment variables
load_dotenv()
# Use Anthropic with Instructor to get structured output
anthropic_client = instructor.from_anthropic(
    anthropic.Anthropic()
)

response = anthropic_client.messages.create(
    model="claude-3-7-sonnet-latest",  
    max_tokens=1024,
    messages=[
        {
            "role": "user", 
            "content": prompt
        }
    ],
    response_model=CustomerQuery  
)

In [8]:
# Inspect the returned structured data
print(type(response))
print(response.model_dump_json(indent=2))

<class '__main__.CustomerQuery'>
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I ordered a new computer monitor and it arrived with the screen cracked. This is the second time this has happened. I need a replacement ASAP.",
  "order_id": null,
  "purchase_date": "2025-12-31",
  "priority": "high",
  "category": "refund_request",
  "is_complaint": true,
  "tags": [
    "damaged product",
    "monitor",
    "replacement",
    "repeated issue"
  ]
}
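
For context, here is a minimal sketch of the kind of validate-and-retry loop you would otherwise write by hand. It is not Instructor's implementation: `Triage`, `get_structured`, and `fake_llm` are hypothetical names, and the provider call is replaced by a stub that returns an invalid reply once and then a valid one.

```python
from typing import Callable, List, Literal

from pydantic import BaseModel, ValidationError


class Triage(BaseModel):  # hypothetical, simplified output model
    priority: Literal['low', 'medium', 'high']
    is_complaint: bool


def get_structured(
    call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Triage:
    """Call the model, validate the reply, and retry with error feedback."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw_reply = call_llm(current_prompt)
        try:
            return Triage.model_validate_json(raw_reply)
        except ValidationError as exc:
            if attempt == max_attempts:
                raise
            # Feed the validation errors back so the model can correct itself
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid:\n{exc}\n"
                "Reply again with valid JSON only."
            )
    raise ValueError("max_attempts must be at least 1")


# Stub standing in for a real provider call: fails once, then succeeds
replies: List[str] = [
    '{"priority": "urgent", "is_complaint": true}',
    '{"priority": "high", "is_complaint": true}',
]
prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    return replies[len(prompts_seen) - 1]


result = get_structured(fake_llm, "Classify this customer query.")
print(result)
print(len(prompts_seen))
```

Passing `response_model=CustomerQuery` as in the cell above moves this bookkeeping into the framework, which is the main convenience this section illustrates.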


### OpenAI's structured output API with Pydantic schema

In [None]:
# Initialize OpenAI client and call with CustomerQuery schema
openai_client = OpenAI()
response = openai_client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    response_format=CustomerQuery
)
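# message.content is the raw JSON string returned by the model;
# message.parsed holds the same data as a validated CustomerQuery instance
response_content = response.choices[0].message.content
print(type(response_content))
print(response_content)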

### Advanced usage examples and API exploration

In [None]:
# Validate the response from the LLM
valid_data = CustomerQuery.model_validate_json(
    response_content
)
print(type(valid_data))
print(valid_data.model_dump_json(indent=2))
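
When you validate raw model output yourself, it is worth handling the failure case as well. The sketch below uses a hypothetical, simplified model (`Classification`) and a hypothetical helper (`describe_problems`) to show how `ValidationError` reports both a disallowed value and truncated JSON.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class Classification(BaseModel):  # hypothetical, simplified output model
    category: Literal['refund_request', 'information_request', 'other']
    tags: List[str]


def describe_problems(raw_json: str) -> List[str]:
    """Return one 'field: message' string per problem (empty list if valid)."""
    try:
        Classification.model_validate_json(raw_json)
    except ValidationError as exc:
        problems = []
        for err in exc.errors():
            # loc is a tuple of path parts; it is empty for whole-document errors
            location = ".".join(str(part) for part in err["loc"]) or "<root>"
            problems.append(f"{location}: {err['msg']}")
        return problems
    return []


good = '{"category": "other", "tags": ["monitor"]}'
bad_value = '{"category": "complaint", "tags": ["monitor"]}'
truncated = '{"category": "other", "tags": ["moni'

for raw in (good, bad_value, truncated):
    print(describe_problems(raw))
```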

In [None]:
# Test the responses API from OpenAI
response = openai_client.responses.parse(
    model="gpt-4o",
    input=[{"role": "user", "content": prompt}],
    text_format=CustomerQuery
)

print(type(response))

In [None]:
# Explore class inheritance structure of the OpenAI response
def print_class_inheritance(llm_response):
    for cls in type(llm_response).mro():
        print(f"{cls.__module__}.{cls.__name__}")

print_class_inheritance(response)

In [None]:
# Display the response type and content 
print(type(response.output_parsed))
print(response.output_parsed.model_dump_json(indent=2))

In [None]:
# Test the Pydantic AI package for agent-based structured responses
from pydantic_ai import Agent
import nest_asyncio
nest_asyncio.apply()

agent = Agent(
    model="google-gla:gemini-2.0-flash",
    output_type=CustomerQuery,
)

response = agent.run_sync(prompt)

In [None]:
# Display the response type and content
print(type(response.output))
print(response.output.model_dump_json(indent=2))

---

## Summary

This notebook showed how to use Pydantic models to get structured, validated output directly from LLMs through several providers and frameworks: Anthropic via Instructor, OpenAI's structured output APIs, and Pydantic AI with Gemini. By defining the expected output schema as a Pydantic model and passing it to the API call, you avoid hand-written parsing and validation code and receive a typed, validated object back. That lets you focus on designing clear data models and prompts, which makes the code easier to maintain.