Structured responses from LLM APIs and tool calling are the two key use cases for Pydantic in the context of LLMs. 

Tool calling (or function calling) is a way of connecting an LLM to external APIs. A tool is a piece of functionality that we give the model access to so that it can generate a response to a prompt.

We could give the model access to tools that:

- Get today's weather for a location (external API)
- Access account details for a given user ID
- Issue refunds for a lost order

or to any other source from which we would like the model to pull additional context when responding to a prompt.

The diagram below illustrates how this process works:

![AI tool use process](function-calling-diagram-steps.png)

When tools are available, the model examines a prompt, and then may determine that in order to follow the instructions in the prompt, it needs to call one of the tools made available to it.

When one of these tools is called (that is, when an API call is made), the data passed to it needs to be in the correct, valid format.
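
To make the point about valid formats concrete, here is a minimal sketch of checking model-produced tool arguments before any API call is made. `WeatherArgs` and `parse_tool_args` are hypothetical names invented for this illustration; they are not part of the notebook.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class WeatherArgs(BaseModel):
    """Hypothetical argument model for a weather tool (illustration only)."""
    location: str = Field(..., description="City to look up")
    unit: Literal["celsius", "fahrenheit"] = "celsius"


def parse_tool_args(raw_json: str) -> Optional[WeatherArgs]:
    """Return validated arguments, or None if the JSON does not fit the model."""
    try:
        return WeatherArgs.model_validate_json(raw_json)
    except ValidationError as exc:
        print(f"Rejected tool arguments: {exc.error_count()} error(s)")
        return None


good = parse_tool_args('{"location": "Warsaw"}')  # unit falls back to its default
bad = parse_tool_args('{"unit": "kelvin"}')       # missing location, unknown unit
```

Only arguments that pass validation would be forwarded to the real tool; anything else can be reported back to the model or logged.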

In [3]:
# Import packages
import json
from datetime import date, datetime
from typing import List, Literal, Optional

import anthropic
import instructor
import nest_asyncio
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, EmailStr, Field, field_validator
from pydantic_ai import Agent

# Load environment variables (such as API keys) from a local .env file
load_dotenv()
# Allow nested event loops so that synchronous agent runs work inside a notebook
nest_asyncio.apply()

Define a Pydantic model with type hints from the typing library and a field_validator method that uses a regular expression to check the format of the 'order_id' field.

In [4]:
# Define your UserInput model
import re

class UserInput(BaseModel):
    name: str = Field(..., description="User's name")
    email: EmailStr = Field(..., description="User's email address")
    query: str = Field(..., description="User's query")
    order_id: Optional[str] = Field(
        None,
        description="Order ID if available (format: ABC-12345)"
    )
    purchase_date: Optional[date] = None

    # Validate order_id format (e.g., ABC-12345)
    @field_validator("order_id")
    @classmethod
    def validate_order_id(cls, order_id):
        # order_id is optional, so None passes through unchanged
        if order_id is None:
            return order_id
        pattern = r"^[A-Z]{3}-\d{5}$"
        if not re.match(pattern, order_id):
            raise ValueError(
                "order_id must be in format ABC-12345 "
                "(3 uppercase letters, dash, 5 digits)"
            )
        return order_id
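
As a quick check of how such a validator behaves, the sketch below runs a few sample values through a reduced model. `OrderRef` and `is_valid` are hypothetical stand-ins for this illustration (the email field is left out so the snippet needs no extra dependency).

```python
import re
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator


class OrderRef(BaseModel):
    """Hypothetical reduced model that keeps only the order_id field."""
    order_id: Optional[str] = None

    @field_validator("order_id")
    @classmethod
    def validate_order_id(cls, order_id):
        if order_id is None:
            return order_id
        if not re.match(r"^[A-Z]{3}-\d{5}$", order_id):
            raise ValueError("order_id must be in format ABC-12345")
        return order_id


def is_valid(order_id) -> bool:
    """Return True if the value is accepted by the model."""
    try:
        OrderRef(order_id=order_id)
        return True
    except ValidationError:
        return False


samples = ["ABC-12345", "abc-12345", "ABC-1234", None]
results = {value: is_valid(value) for value in samples}
```

Lowercase letters and a four-digit suffix are rejected, while an explicit `None` is accepted because the field is optional.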


Define a CustomerQuery Pydantic model that inherits from the UserInput model, i.e. it contains all the fields of the UserInput model plus some new ones.

In [5]:
# Define your CustomerQuery model
class CustomerQuery(UserInput):
    priority: str = Field(
        ..., description="Priority level: low, medium, high"
    )
    category: Literal[
        'refund_request', 'information_request', 'other'
    ] = Field(..., description="Query category")
    is_complaint: bool = Field(
        ..., description="Whether this is a complaint"
    )
    tags: List[str] = Field(..., description="Relevant keyword tags")
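
Note that `category` is constrained by `Literal`, while `priority` is a plain `str`, so its allowed values are only suggested by the description. The sketch below shows, on a hypothetical reduced model called `Triage`, how `Literal` could enforce both; this is one possible variation, not the notebook's own definition.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Triage(BaseModel):
    """Hypothetical reduced model with both fields constrained by Literal."""
    priority: Literal["low", "medium", "high"]
    category: Literal["refund_request", "information_request", "other"]


ok = Triage(priority="medium", category="other")

rejected = False
error_fields = []
try:
    Triage(priority="urgent", category="other")
except ValidationError as exc:
    rejected = True
    # Each error records the location (field name) that failed validation
    error_fields = [err["loc"][0] for err in exc.errors()]
```

With this variation, an LLM response that invents a new priority level fails validation instead of silently passing through.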

The diagram below shows the data validation steps in a project where user input (a question, comment, or feedback) is passed to an LLM that has access to tools.
![Validation flow](diagram_1.png)

In [6]:
# Define a function to validate user input (1st step in the diagram above); no LLM is involved
from pydantic import ValidationError

def validate_user_input(user_json: str):
    """Validate user input from a JSON string and return a UserInput
    instance if valid, or None otherwise."""
    try:
        user_input = UserInput.model_validate_json(user_json)
        print("user input validated...")
        return user_input
    except ValidationError as e:
        # Raised for malformed JSON as well as for invalid field values
        print(f"Validation error: {e}")
        return None
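
When validation fails, the raised `ValidationError` carries structured details that can be more useful than a printed message. The sketch below collects them as (field, error type) pairs; `MiniInput` and `validate_json` are hypothetical names used only for this illustration.

```python
from pydantic import BaseModel, ValidationError


class MiniInput(BaseModel):
    """Hypothetical two-field model standing in for a larger input model."""
    name: str
    query: str


def validate_json(raw: str):
    """Return (model, []) on success or (None, problems) on failure."""
    try:
        return MiniInput.model_validate_json(raw), []
    except ValidationError as exc:
        problems = [
            (".".join(str(part) for part in err["loc"]), err["type"])
            for err in exc.errors()
        ]
        return None, problems


model_ok, no_problems = validate_json('{"name": "Aga", "query": "Where is my order?"}')
model_missing, missing_problems = validate_json('{"name": "Aga"}')
model_broken, broken_problems = validate_json("this is not JSON")
```

A missing field and malformed JSON both surface as `ValidationError`, but with different error types, which makes it possible to respond to each case differently.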

In [7]:
# Define a function to call an LLM using Pydantic AI to create an instance of CustomerQuery. It takes user input in valid JSON format as input. It outputs an instance of CustomerQuery.
def create_customer_query(valid_user_json: str) -> CustomerQuery:
    customer_query_agent = Agent(
        model="google-gla:gemini-2.0-flash",
        output_type=CustomerQuery,
    )
    response = customer_query_agent.run_sync(valid_user_json)
    print("CustomerQuery generated...")
    return response.output

Try out user input validation on sample data

In [8]:
# Define user input JSON data
user_input_json = '''
{
    "name": "Aga Mucha",
    "email": "aga.mucha@interia.pl",
    "query": "When can I expect delivery of the headphones I ordered?",
    "order_id": "ABC-12345",
    "purchase_date": "2025-02-01"
}
'''
# Validate user input and, if it is valid, create a CustomerQuery
user_input = validate_user_input(user_input_json)
if user_input is not None:
    customer_query = create_customer_query(user_input.model_dump_json())
    print(type(customer_query))
    print(customer_query.model_dump_json(indent=2))

user input validated...
CustomerQuery generated...
<class '__main__.CustomerQuery'>
{
  "name": "Aga Mucha",
  "email": "aga.mucha@interia.pl",
  "query": "When can I expect delivery of the headphones I ordered?",
  "order_id": "ABC-12345",
  "purchase_date": "2025-02-01",
  "priority": "medium",
  "category": "information_request",
  "is_complaint": false,
  "tags": [
    "delivery",
    "headphones"
  ]
}


I'll be using a Q&A search tool. It will take two arguments: query and tags (a list of tags). I'll define a Pydantic model representing the expected format of the data.

In [None]:
# Define FAQ Lookup tool input as a Pydantic model
class FAQLookupArgs(BaseModel):
    query: str = Field(..., description="User's query") 
    tags: List[str] = Field(
        ..., description="Relevant keyword tags from the customer query"
    )

I'll use a second tool that checks the status of an order.

In [None]:
# Define Check Order Status tool input as a Pydantic model
import re

class CheckOrderStatusArgs(BaseModel):
    order_id: str = Field(
        ..., description="Customer's order ID (format: ABC-12345)"
    )
    email: EmailStr = Field(..., description="Customer's email address")

    # Validate order_id format (same rule as in UserInput)
    @field_validator("order_id")
    @classmethod
    def validate_order_id(cls, order_id):
        pattern = r"^[A-Z]{3}-\d{5}$"
        if not re.match(pattern, order_id):
            raise ValueError(
                "order_id must be in format ABC-12345 "
                "(3 uppercase letters, dash, 5 digits)"
            )
        return order_id

In [None]:
# Create a sample Q&A list with the question, answer and keywords fields
sample_faq = [
    {
        "question": "What is the return policy?",
        "answer": (
            "You can return most items within 30 days of delivery "
            "for a full refund."
        ),
        "keywords": ["return", "refund", "policy"]
    },
    {
        "question": "How long does shipping take?",
        "answer": (
            "Standard shipping usually takes 5-7 business days. "
            "Expedited options are available at checkout."
        ),
        "keywords": ["shipping", "delivery", "time"]
    },
    {
        "question": "How can I track my order?",
        "answer": (
            "Once your order ships, you'll receive an email with "
            "tracking information."
        ),
        "keywords": ["track", "order", "tracking"]
    },
    {
        "question": "Can I change my shipping address?",
        "answer": (
            "You can change your shipping address before your order "
            "ships by contacting customer support."
        ),
        "keywords": ["change", "shipping", "address"]
    },
    {
        "question": "What payment methods do you accept?",
        "answer": (
            "We accept Visa, MasterCard, American Express, PayPal, "
            "and Apple Pay."
        ),
        "keywords": ["payment", "methods", "accept"]
    }
]

In [None]:
# Create a sample order database with order_id as key and status, purchase_date, estimated_delivery, email as values
sample_orders = {
    "ABC-12345": {
        "status": "Shipped",
        "purchase_date": "2025-02-01",
        "estimated_delivery": "2025-02-08",
        "email": "aga.mucha@interia.pl"
    },
    "DEF-67890": {
        "status": "Processing",
        "purchase_date": "2025-02-10",
        "estimated_delivery": "2025-02-15",
        "email": "john.doe@example.com"
    },
    "GHI-54321": {
        "status": "Delivered",
        "purchase_date": "2025-01-25",
        "estimated_delivery": "2025-01-30",
        "email": "jane.smith@example.com"
    },
    "JKL-09876": {
        "status": "Cancelled",
        "purchase_date": "2025-02-05",
        "estimated_delivery": "2025-02-12",
        "email": "bob.johnson@example.com"
    }
}


In [None]:
# Define a Q&A search tool function.
# It takes FAQLookupArgs as input (a query string and a list of tag strings)
# and returns the best matching FAQ answer as a string.
def lookup_faq_answer(args: FAQLookupArgs) -> str:
    """Look up an FAQ answer by matching tags and words in query
    to FAQ entry keywords."""
    # A simple keyword-matching approach is enough for this example.
    # query_words, tag_set and keywords are sets of lowercase words; the score
    # is the size of their intersections (the number of elements in common).
    query_words = set(word.lower() for word in args.query.split())
    tag_set = set(tag.lower() for tag in args.tags)
    best_match = None
    best_score = 0
    for faq in sample_faq:
        keywords = set(k.lower() for k in faq["keywords"])
        score = len(keywords & tag_set) + len(keywords & query_words)
        if score > best_score:
            best_score = score
            best_match = faq
    if best_match and best_score > 0:
        return best_match["answer"]
    return "Sorry, I couldn't find an FAQ answer for your question."
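
One limitation of splitting the query on whitespace is that punctuation stays attached to words, so "delivery?" does not match the keyword "delivery". The sketch below shows the same intersection-based score with a regex tokenizer instead; `tokenize` and `score` are hypothetical helpers, offered as a possible refinement rather than as part of the original tool.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and keep only runs of letters, digits and apostrophes."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def score(query: str, tags: list, keywords: list) -> int:
    """Count keywords shared with the tags plus keywords shared with the query."""
    keyword_set = {k.lower() for k in keywords}
    tag_set = {t.lower() for t in tags}
    return len(keyword_set & tag_set) + len(keyword_set & tokenize(query))


keywords = ["shipping", "delivery", "time"]
query = "How long is delivery?"

# Whitespace splitting keeps the question mark attached to the last word
plain_split = set(query.lower().split())
regex_tokens = tokenize(query)

total = score(query, tags=["shipping"], keywords=keywords)
```

Here the tag contributes one match and the cleaned query contributes another, whereas plain splitting would have missed the query match.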

In [None]:
# Define your check order status tool.
# It takes CheckOrderStatusArgs as input (order_id string and email string)
# and returns a dictionary with order status information.
def check_order_status(args: CheckOrderStatusArgs) -> dict:
    """Simulate checking the status of a customer's order by
    order_id and email."""
    order = sample_orders.get(args.order_id)
    if not order:
        return {
            "order_id": args.order_id,
            "status": "not found",
            "estimated_delivery": None,
            "note": "order_id not found"
        }
    if args.email.lower() != order.get("email", "").lower():
        return {
            "order_id": args.order_id,
            "status": order["status"],
            "estimated_delivery": order["estimated_delivery"],
            "note": "order_id found but email mismatch"
        }
    return {
        "order_id": args.order_id,
        "status": order["status"],
        "estimated_delivery": order["estimated_delivery"],
        "note": "order_id and email match"
    }
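
Because each tool function takes a Pydantic model as its only argument, a tool request from a model (a tool name plus a JSON string of arguments) can be routed through one small dispatcher. The sketch below assumes that shape of request; `OrderArgs`, `ORDERS`, `order_status`, `TOOL_REGISTRY` and `run_tool` are all hypothetical names for this illustration.

```python
from pydantic import BaseModel, ValidationError


class OrderArgs(BaseModel):
    """Hypothetical argument model with a single field."""
    order_id: str


# Tiny in-memory stand-in for an order database
ORDERS = {"ABC-12345": "Shipped"}


def order_status(args: OrderArgs) -> dict:
    """Look up an order's status, reporting 'not found' for unknown IDs."""
    return {
        "order_id": args.order_id,
        "status": ORDERS.get(args.order_id, "not found"),
    }


# Maps a tool name to its argument model and its implementation
TOOL_REGISTRY = {"order_status": (OrderArgs, order_status)}


def run_tool(name: str, raw_arguments: str) -> dict:
    """Validate the JSON arguments for the named tool, then call it."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    args_model, func = TOOL_REGISTRY[name]
    try:
        args = args_model.model_validate_json(raw_arguments)
    except ValidationError as exc:
        return {"error": "invalid arguments", "error_count": exc.error_count()}
    return {"result": func(args)}


found = run_tool("order_status", '{"order_id": "ABC-12345"}')
invalid = run_tool("order_status", "{}")
unknown = run_tool("issue_refund", "{}")
```

Keeping validation inside the dispatcher means a tool function never runs with arguments that failed its model.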

In this step, I'll define the tools for the OpenAI tool-calling API.
The documentation for the tool-calling API is [here](https://platform.openai.com/docs/guides/function-calling?lang=python).
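
A tool definition generally pairs a name and description with a JSON Schema for the arguments, and Pydantic can generate that schema with `model_json_schema()`. The sketch below wraps it in the nested layout used by Chat Completions-style requests. The exact layout differs between OpenAI APIs, so treat the dictionary shape, the helper `as_tool_definition` and the description text as assumptions to check against the linked documentation.

```python
from typing import List

from pydantic import BaseModel, Field


class FAQLookupArgs(BaseModel):
    """Same fields as the FAQ lookup model defined earlier."""
    query: str = Field(..., description="User's query")
    tags: List[str] = Field(
        ..., description="Relevant keyword tags from the customer query"
    )


def as_tool_definition(name: str, description: str, args_model) -> dict:
    """Hypothetical helper: build a tool definition from a Pydantic model."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            # JSON Schema generated from the model's fields and descriptions
            "parameters": args_model.model_json_schema(),
        },
    }


faq_tool = as_tool_definition(
    "lookup_faq_answer",
    "Look up an FAQ answer by matching tags and query words to keywords.",
    FAQLookupArgs,
)
parameters = faq_tool["function"]["parameters"]
```

Generating the schema from the model keeps the tool definition and the validation rules in one place, so they cannot drift apart.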