# Introduction to Pydantic AI

https://ai.pydantic.dev/

In [1]:
!uv add pydantic-ai

[2K[2mResolved [1m239 packages[0m [2min 2.40s[0m[0m                                       [0m
[2K[37m⠙[0m [2mPreparing packages...[0m (0/52)                                                  
[2K[1A[37m⠙[0m [2mPreparing packages...[0m (0/52)----------------[0m[0m     0 B/7.47 KiB         [1A
[2K[1A[37m⠙[0m [2mPreparing packages...[0m (0/52)----------------------[0m[0m     0 B/7.47 KiB   [1A
[2mopentelemetry-util-http      [0m [32m[2m------------------------------[0m[0m     0 B/7.47 KiB
[2K[2A[37m⠙[0m [2mPreparing packages...[0m (0/52)----------------------[0m[0m     0 B/32.25 KiB  [2A
[2mopentelemetry-util-http      [0m [32m[2m------------------------------[0m[0m     0 B/7.47 KiB
[2K[2A[37m⠙[0m [2mPreparing packages...[0m (0/52)----------------------[0m[0m     0 B/32.25 KiB  [2A
[2mopentelemetry-util-http      [0m [32m[2m------------------------------[0m[0m     0 B/7.47 KiB
[2K[2A[37m⠙[0m [2mPreparing packages...[0

## Setting Up the Data and Search

In [1]:
import docs

github_data = docs.read_github_data()
parsed_data = docs.parse_data(github_data)
chunks = docs.chunk_documents(parsed_data)
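The `docs` helper module is not shown in this notebook. Search results carry a `start` offset, which suggests that `chunk_documents` splits each document into overlapping windows and records where each window begins. The sketch below is a hypothetical illustration of that idea. The function name `sliding_window_chunks` and the `size`/`step` values are assumptions, not the module's actual code.

```python
from typing import Dict, List


def sliding_window_chunks(
    doc: Dict[str, str], size: int = 2000, step: int = 1000
) -> List[Dict]:
    """Hypothetical chunker: split doc['content'] into overlapping windows.

    Each chunk keeps the document's other fields (e.g. filename, title)
    and records its character offset in 'start'.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")

    content = doc["content"]
    metadata = {k: v for k, v in doc.items() if k != "content"}
    chunks = []

    for start in range(0, len(content), step):
        window = content[start:start + size]
        chunks.append({**metadata, "start": start, "content": window})
        # Stop once a window reaches the end of the text.
        if start + size >= len(content):
            break

    return chunks


example_doc = {"filename": "docs/example.md", "title": "Example", "content": "abcdefghij"}
for chunk in sliding_window_chunks(example_doc, size=4, step=3):
    print(chunk["start"], chunk["content"])
```

With `step` smaller than `size`, neighbouring chunks overlap, so a passage cut at one window boundary still appears whole in the next window.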

## Search tool

Create the search index:

In [2]:
from minsearch import Index

index = Index(
    text_fields=["content", "filename", "title", "description"],
)

index.fit(chunks)

<minsearch.minsearch.Index at 0x11bb85fd0>
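`Index` ranks chunks by how well the query's terms match the listed text fields. As a rough intuition, the simplified stand-in below scores documents with TF-IDF over the same kind of fields and returns the top matches. It is not minsearch's actual implementation. The names `tokenize` and `tfidf_search` and the regex tokenizer are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_search(
    query: str, documents: List[Dict], fields: List[str], num_results: int = 5
) -> List[Dict]:
    """Rank documents by a simple TF-IDF score over the given text fields."""
    # Treat all listed fields of a document as one bag of words.
    bags = [
        Counter(tokenize(" ".join(str(doc.get(f, "")) for f in fields)))
        for doc in documents
    ]
    # Document frequency: how many documents contain each term.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(documents)

    scored = []
    for i, bag in enumerate(bags):
        total = sum(bag.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            if term in bag:
                tf = bag[term] / total
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
                score += tf * idf
        scored.append((score, i))

    # Highest score first; drop documents that share no terms with the query.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [documents[i] for score, i in scored[:num_results] if score > 0]


sample_chunks = [
    {"title": "LLM judge", "content": "Use an LLM as a judge to evaluate answers."},
    {"title": "Data drift", "content": "Configure drift thresholds per column."},
]
print(tfidf_search("llm judge", sample_chunks, ["title", "content"], num_results=1))
```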

Define the search function:

In [3]:
from typing import List, TypedDict

class SearchResult(TypedDict):
    """Represents a single search result entry."""
    start: int
    content: str
    title: str
    description: str
    filename: str

def search(query: str) -> List[SearchResult]:
    """
    Search the index for documents matching the given query.

    Args:
        query (str): The search query string.

    Returns:
        List[SearchResult]: Up to five search results. Each result dictionary contains:
            - start (int): The starting position or offset within the source file.
            - content (str): A text excerpt or snippet containing the match.
            - title (str): The title of the source document.
            - description (str): The description of the source document.
            - filename (str): The path or name of the source file.
    """
    return index.search(
        query=query,
        num_results=5,
    )

## File Reading tool

Set up the file index for quick access to complete documents:

In [4]:
file_index = {}

for doc in parsed_data:
    filename = doc['filename']
    file_index[filename] = doc

Create the file reading function:

In [5]:
from typing import Optional

def read_file(filename: str) -> Optional[str]:
    """
    Retrieve the content of a file from the repository.

    Args:
        filename (str): The name or path of the file to read.

    Returns:
        Optional[str]: The file content as a string if the file exists;
        otherwise, returns None.
    """
    if filename in file_index:
        return file_index[filename]['content']
    return None
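To see how `read_file` behaves for known and unknown filenames, here is a self-contained check that builds the same kind of `file_index` from two made-up documents. The filenames, contents, and the names `sample_file_index` and `read_sample_file` are hypothetical stand-ins, chosen so that running this cell does not overwrite the notebook's real `file_index` or `read_file`.

```python
from typing import Dict, Optional

# Hypothetical stand-in for parsed_data: one dict per document.
sample_docs = [
    {"filename": "guide/intro.md", "content": "# Intro"},
    {"filename": "guide/setup.md", "content": "# Setup"},
]

# Same mapping as the loop above, written as a dict comprehension.
sample_file_index: Dict[str, dict] = {doc["filename"]: doc for doc in sample_docs}


def read_sample_file(filename: str) -> Optional[str]:
    """Mirror of read_file that looks up sample_file_index instead."""
    doc = sample_file_index.get(filename)
    return doc["content"] if doc is not None else None


print(read_sample_file("guide/intro.md"))
print(read_sample_file("missing.md"))
```

A missing file yields `None`, so the model receives no explanation of what went wrong. Returning a short message such as "File not found" is one alternative, at the cost of a less precise return type.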

## Agent instructions

In [6]:
instructions = """
You are an assistant that helps improve and generate high-quality documentation for the project.

You have access to the following tools:
- search — Use this to explore topics in depth. Make multiple search calls if needed to gather comprehensive information.
- read_file — Use this when code snippets are missing or when you need to retrieve the full content of a file for context.

Critical Rule

Before generating or finalizing any code example or technical explanation, you must always call `read_file`
to cross-check the correctness of the code.
Do not rely solely on search results or assumptions — always verify by reading the actual file content.

If `read_file` cannot be used or the file content is unavailable, clearly state:
> "Unable to verify with read_file."

When answering a question:
1. Provide file references for all source materials.  
   Use this format:  
   [{filename}](https://github.com/evidentlyai/docs/blob/main/{filename})
2. If the topic is covered in multiple documents, cite all relevant sources.
3. Include code examples whenever they clarify or demonstrate the concept.
4. Be concise, accurate, and helpful — focus on clarity and usability for developers.
5. If documentation is missing or unclear, infer from context and note that explicitly.

Example Citation

See the full implementation in [metrics/api_reference.md](https://github.com/evidentlyai/docs/blob/main/metrics/api_reference.md).
""".strip()
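The citation format in the instructions is a plain template: the repository's `main` branch URL followed by the file path. A small helper (hypothetical, not part of the notebook) makes it easy to generate or check the links the agent is expected to produce, for example when validating its answers.

```python
from urllib.parse import quote

GITHUB_BLOB_BASE = "https://github.com/evidentlyai/docs/blob/main/"


def citation_link(filename: str) -> str:
    """Build a markdown link in the format requested by the instructions."""
    # Strip any leading slash and percent-encode characters such as spaces,
    # keeping '/' so directory separators stay intact.
    path = quote(filename.lstrip("/"), safe="/")
    return f"[{filename}]({GITHUB_BLOB_BASE}{path})"


print(citation_link("metrics/api_reference.md"))
```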

## Creating the PydanticAI Agent

In [7]:
from pydantic_ai import Agent

agent_tools = [search, read_file]

agent = Agent(
    name='docs_agent',
    instructions=instructions,
    tools=agent_tools,
    model='gpt-4o-mini',
)
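Pydantic AI turns each function in `tools` into a tool definition the model can call, based on the function name, its type-annotated parameters, and its docstring, so the docstrings above double as documentation for the model. The stdlib sketch below approximates the kind of information that gets extracted. It is an illustration using `inspect`, not the library's schema generator (which produces a full JSON schema). `describe_tool` and `read_file_stub` are hypothetical names.

```python
import inspect
from typing import Callable, Dict, Optional


def describe_tool(func: Callable) -> Dict:
    """Collect the name, parameter annotations and summary line of a function."""
    signature = inspect.signature(func)
    params = {
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else ""
    return {"name": func.__name__, "parameters": params, "description": summary}


def read_file_stub(filename: str) -> Optional[str]:
    """
    Retrieve the content of a file from the repository.
    """
    return None


print(describe_tool(read_file_stub))
```

In the notebook itself, `describe_tool(search)` or `describe_tool(read_file)` shows the same summary for the real tools.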

### Running Individual Queries

In [9]:
results = await agent.run(
    user_prompt="how do I run llm as a judge evals?",
)

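`new_messages()` returns only the messages produced by this run. Print the kind of each message and of its parts to see how the agent used its tools:

In [10]: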
for message in results.new_messages():
    print(message.kind)

    for part in message.parts:
        print(part.part_kind)

    print()

request
user-prompt

response
tool-call

request
tool-return

response
tool-call

request
tool-return

response
text
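The printed sequence reflects the agent's tool-calling loop: the model answers with a tool call, the framework runs the tool and sends its return value back in a new request, and this repeats until the model replies with plain text. The sketch below reproduces that loop with a scripted fake model so it runs without an API key. The `run_agent_loop` and `fake_model` functions, the message tuples, and the step limit are illustrative assumptions, not Pydantic AI internals.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str, object]  # (kind, part_kind, payload)


def run_agent_loop(
    user_prompt: str,
    model: Callable[[List[Message]], Message],
    tools: Dict[str, Callable[..., object]],
    max_steps: int = 10,
) -> List[Message]:
    """Alternate model responses and tool executions until the model returns text."""
    messages: List[Message] = [("request", "user-prompt", user_prompt)]
    for _ in range(max_steps):
        response = model(messages)
        messages.append(response)
        _, part_kind, payload = response
        if part_kind == "text":
            return messages
        # The model asked for a tool: run it and send the result back.
        tool_name, kwargs = payload
        result = tools[tool_name](**kwargs)
        messages.append(("request", "tool-return", result))
    raise RuntimeError("model did not produce a final answer")


def fake_model(messages: List[Message]) -> Message:
    """Scripted stand-in for an LLM: search, then read a file, then answer."""
    tool_returns = sum(1 for m in messages if m[1] == "tool-return")
    if tool_returns == 0:
        return ("response", "tool-call", ("search", {"query": "llm judge"}))
    if tool_returns == 1:
        return ("response", "tool-call", ("read_file", {"filename": "guide.md"}))
    return ("response", "text", "Final answer.")


fake_tools = {
    "search": lambda query: [{"filename": "guide.md"}],
    "read_file": lambda filename: "# Guide",
}

for kind, part_kind, _ in run_agent_loop("how do I run llm as a judge evals?", fake_model, fake_tools):
    print(kind, part_kind)
```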

The final answer is in `results.output`:

In [11]:
print(results.output)

To run an LLM (Large Language Model) as a judge for evaluations (evals), you can follow these steps based on the tutorial from Evidently AI. The process involves setting up evaluation criteria, generating datasets, and then using the LLM to assess the quality of responses. Here’s a streamlined overview of the steps:

### 1. Prerequisites
- **Python Knowledge**: Basic understanding of Python is necessary.
- **OpenAI API Key**: Ensure you have an API key to access the LLM.

### 2. Installation
First, you'll need to install the Evidently library:
```bash
pip install evidently
```

### 3. Import Required Libraries
In your Python script or notebook, import the necessary modules:
```python
import pandas as pd
import numpy as np
from evidently import Dataset
from evidently import DataDefinition
from evidently import Report
from evidently.descriptors import *
from evidently.llm.templates import BinaryClassificationPromptTemplate
import os

# Set up the OpenAI API key
os.environ["OPENAI_API_KEY

You can also access the complete message history:

In [12]:
results.all_messages()

[ModelRequest(parts=[UserPromptPart(content='how do I run llm as a judge evals?', timestamp=datetime.datetime(2025, 10, 18, 17, 7, 36, 184965, tzinfo=datetime.timezone.utc))], instructions='You are an assistant that helps improve and generate high-quality documentation for the project.\n\nYou have access to the following tools:\n- search — Use this to explore topics in depth. Make multiple search calls if needed to gather comprehensive information.\n- read_file — Use this when code snippets are missing or when you need to retrieve the full content of a file for context.\n\nCritical Rule\n\nBefore generating or finalizing any code example or technical explanation, you must always call `read_file`\nto cross-check the correctness of the code.\nDo not rely solely on search results or assumptions — always verify by reading the actual file content.\n\nIf `read_file` cannot be used or the file content is unavailable, clearly state:\n> "Unable to verify with read_file."\n\nWhen answering a que

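To ask a follow-up question in the same conversation, pass the accumulated history through `message_history`:

In [13]: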
results = await agent.run(
    user_prompt="show me a complete example for llm as a judge reports",
    message_history=results.all_messages()
)

In [15]:
print(results.output)

Here’s a complete example of how to run an LLM as a judge for evaluations, using the Evidently library. This example follows the steps outlined in the tutorial, allowing you to evaluate responses and generate reports. 

### Step-by-Step Example

#### 1. Installation

Make sure to install the Evidently library:
```bash
pip install evidently
```

#### 2. Imports

Import the necessary modules:
```python
import pandas as pd
import numpy as np
from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import LLMEval, ExactMatch
from evidently.llm.templates import BinaryClassificationPromptTemplate
import os

# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
```

#### 3. Create the Evaluation Dataset

Create a toy Q&A dataset:
```python
data = [
    ["How can I reset my password?", 
     "To reset your password, click 'Forgot Password' on the login page.", 
     "To reset my password, click 'Forgot Password' on the login screen.", 
     "correct", ""],

### Usage and Cost Tracking

Check token usage:

In [16]:
results.usage()

RunUsage(input_tokens=21911, cache_read_tokens=17024, output_tokens=956, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, requests=2, tool_calls=1)

Calculate the cost using ToyAIKit's pricing utilities:

In [17]:
from toyaikit.pricing import PricingConfig
pricing = PricingConfig()

usage = results.usage()

pricing.calculate_cost(
    model=agent.model.model_name,
    input_tokens=usage.input_tokens,
    output_tokens=usage.output_tokens
)

CostInfo(input_cost=0.00328665, output_cost=0.0005736, total_cost=0.00386025)
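The numbers above follow from per-token rates. Dividing the reported costs by the token counts gives $0.15 per million input tokens and $0.60 per million output tokens for `gpt-4o-mini`. The sketch below reproduces that arithmetic. The `calculate_cost` call above passes only `input_tokens` and `output_tokens`, so every input token is charged at the full rate, even though 17,024 of them were `cache_read_tokens`. For OpenAI models, cached tokens are counted within the input tokens and billed at a discount. The `cached_per_million` default is an assumption to check against current pricing, and `estimate_cost` is a hypothetical helper, not part of ToyAIKit.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float = 0.15,
    output_per_million: float = 0.60,
    cached_tokens: int = 0,
    cached_per_million: float = 0.075,  # assumption: check current pricing
) -> dict:
    """Estimate request cost in USD from token counts and per-million rates."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    input_cost = (
        uncached * input_per_million + cached_tokens * cached_per_million
    ) / 1_000_000
    output_cost = output_tokens * output_per_million / 1_000_000
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
    }


# Full-rate estimate, matching the CostInfo output above.
print(estimate_cost(21_911, 956))
# Same run, with the cache-read tokens billed at the assumed discounted rate.
print(estimate_cost(21_911, 956, cached_tokens=17_024))
```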

### Interactive Chat Interface

In [8]:
from toyaikit.chat import IPythonChatInterface
from toyaikit.chat.runners import PydanticAIRunner

chat_interface = IPythonChatInterface()
runner = PydanticAIRunner(
    chat_interface=chat_interface,
    agent=agent
)

In [9]:
await runner.run();

You: how do I run llm as a judge evals?


You: show me a complete example for llm as a judge reports


You: what are drift thresholds and how do I configure them?


You: stop


Chat ended.
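Conceptually, the runner repeats a simple loop: read a message, pass it to the agent together with the accumulated history, show the answer, and stop when the user types `stop`. The sketch below is a hypothetical, synchronous stand-in for that loop, not ToyAIKit's implementation. `chat_loop` and `echo_agent` are illustrative names; `ask_agent` is a placeholder for a real agent call that forwards the history (for example via `message_history`), and input is read from a list so the example runs non-interactively.

```python
from typing import Callable, Iterable, List, Tuple

History = List[Tuple[str, str]]  # (role, text)


def chat_loop(
    inputs: Iterable[str],
    ask_agent: Callable[[str, History], str],
    output: Callable[[str], None] = print,
) -> History:
    """Feed user inputs to ask_agent until 'stop', keeping the conversation history."""
    history: History = []
    for user_text in inputs:
        output(f"You: {user_text}")
        if user_text.strip().lower() == "stop":
            output("Chat ended.")
            break
        answer = ask_agent(user_text, history)
        history.append(("user", user_text))
        history.append(("assistant", answer))
        output(f"Agent: {answer}")
    return history


def echo_agent(prompt: str, history: History) -> str:
    """Placeholder agent that reports how many earlier turns it has seen."""
    return f"({len(history) // 2} earlier turns) you asked: {prompt}"


chat_loop(["how do I run llm as a judge evals?", "stop"], echo_agent)
```

Passing the history on every call is what lets a follow-up such as "show me a complete example" build on the previous answer, just like `message_history` in the single-query cells above.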
