1. Why Use Pydantic for LLM and GenAI?
LLM inputs and outputs are mostly unstructured or loosely structured data (e.g., free-text responses, JSON payloads from API calls). Pydantic helps by:

Validating API Requests & Responses – Ensures that input and output conform to a structured format.
Defining Typed Data Models – Helps define clear schemas for prompts, completions, and metadata.
Ensuring Consistency in AI Pipelines – Reduces errors in data exchange between LLMs and downstream applications.
Efficient Parsing & Serialization – Converts between Python objects and JSON efficiently.


In [1]:
from pydantic import BaseModel

In [4]:
class User(BaseModel):
    name: str
    age: int
    gender: str

In [6]:
data = {"name": "Prakash", "age": 30, "gender": "m"}

In [8]:
user = User(**data)

In [9]:
user

User(name='Prakash', age=30, gender='m')

In [10]:
user.name

'Prakash'

In [11]:
print(user)

name='Prakash' age=30 gender='m'
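
The session above only shows valid input. As a minimal sketch of what happens otherwise, the block below uses a hypothetical `Person` model (not part of the original notebook) with the same three fields. It assumes Pydantic's default, non-strict validation mode, in which a numeric string is coerced to `int`, while a non-numeric value or a missing field raises `ValidationError`.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical model with the same fields as above
    name: str
    age: int
    gender: str


# A numeric string is coerced to int in the default (non-strict) mode.
coerced = Person(name="Prakash", age="30", gender="m")
print(coerced.age, type(coerced.age).__name__)

# A non-numeric age and a missing field are reported together in one exception.
try:
    Person(name="Prakash", age="thirty")
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]
    print(failed_fields)
```

All problems are collected into a single exception, so a caller can report every bad field at once instead of fixing them one at a time.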


2. Using Pydantic in LLM and GenAI

# Example 1: Validating LLM Requests
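When sending prompts to an LLM, validate them first so that malformed requests fail before the API call. Here the prompt must be at least 5 characters long and max_tokens must be between 10 and 500.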

In [12]:
from pydantic import BaseModel, Field

In [13]:
class LLMRequest(BaseModel):
    prompt: str = Field(..., min_length=5, description="Input text for the model")
    max_tokens: int = Field(..., ge=10, le=500, description="Limit response length")

In [26]:
# max_tokens is passed as the string "100"; Pydantic coerces it to the int 100
data = {"prompt": "What is LLM?", "max_tokens": "100"}

In [27]:
LLMRequest(**data)

LLMRequest(prompt='What is LLM?', max_tokens=100)
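
The request above passes every constraint. To see the constraints reject input, the sketch below repeats the `LLMRequest` model and adds a hypothetical helper, `validation_problems` (an illustrative name, not a Pydantic function), that returns the names of the fields that failed.

```python
from pydantic import BaseModel, Field, ValidationError


class LLMRequest(BaseModel):
    prompt: str = Field(..., min_length=5, description="Input text for the model")
    max_tokens: int = Field(..., ge=10, le=500, description="Limit response length")


def validation_problems(payload: dict) -> list:
    """Hypothetical helper: names of fields that fail validation (empty if valid)."""
    try:
        LLMRequest(**payload)
    except ValidationError as exc:
        return [err["loc"][0] for err in exc.errors()]
    return []


ok = validation_problems({"prompt": "What is LLM?", "max_tokens": "100"})
too_short = validation_problems({"prompt": "Hi", "max_tokens": 100})  # under min_length=5
too_many = validation_problems({"prompt": "What is LLM?", "max_tokens": 5000})  # over le=500

print(ok, too_short, too_many)
```

Running the check before the API call means a bad request costs nothing, since it never reaches the model.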

# Example 2: Parsing LLM Responses

After receiving a response from an LLM API, validate its structure before passing it to downstream code.

In [28]:
from pydantic import BaseModel

In [30]:
class LLMResponse(BaseModel):
    text: str
    model: str
    token_usage: int

In [33]:
# Simulated response from an LLM API
api_response = {"text": "Hello, how can I help you?", "model": "gpt-4", "token_usage": 32}
parsed_response = LLMResponse(**api_response)
print(parsed_response)

text='Hello, how can I help you?' model='gpt-4' token_usage=32
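
In practice the response often arrives as a raw JSON string rather than a Python dict, and it may be incomplete or not JSON at all. The sketch below assumes Pydantic v2, where `model_validate_json` and `model_dump_json` are available (Pydantic v1 used different method names). `parse_llm_response` is a hypothetical helper written for this illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class LLMResponse(BaseModel):
    text: str
    model: str
    token_usage: int


def parse_llm_response(raw_json: str) -> Optional[LLMResponse]:
    """Hypothetical helper: return a validated response, or None if it is malformed."""
    try:
        return LLMResponse.model_validate_json(raw_json)  # Pydantic v2 API
    except ValidationError as exc:
        print(f"Rejected response: {len(exc.errors())} validation error(s)")
        return None


good = parse_llm_response('{"text": "Hello!", "model": "gpt-4", "token_usage": 32}')
bad = parse_llm_response('{"text": "Hello!", "model": "gpt-4"}')  # token_usage missing
garbage = parse_llm_response("Sorry, I cannot answer that.")      # not JSON at all

print(good)
print(bad, garbage)

# Serialize the validated object back to a JSON string.
print(good.model_dump_json())
```

Returning `None` is only one possible policy. A pipeline could equally retry the request or raise a domain-specific error.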


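3. Using Pydantic with Generative AI Pipelines

Pydantic models can be nested, so a request that carries a list of documents is validated in a single call, including every document inside it.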

In [40]:
from pydantic import BaseModel
from typing import List


In [41]:
class Document(BaseModel):
    title: str
    content: str

In [43]:
class SummarizationRequest(BaseModel):
    documents: List[Document]
    summary_length: int

In [44]:
data = {
    "documents": [
        {"title": "AI", "content": "Artificial Intelligence is transforming industries."},
        {"title": "ML", "content": "Machine learning is a subset of AI focused on algorithms."}
    ],
    "summary_length": 50
}


In [46]:
summary_request = SummarizationRequest(**data)
print(summary_request)

documents=[Document(title='AI', content='Artificial Intelligence is transforming industries.'), Document(title='ML', content='Machine learning is a subset of AI focused on algorithms.')] summary_length=50
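
As a rough sketch of how the validated request might feed the next pipeline step, the block below adds a hypothetical `build_prompt` function. Treating `summary_length` as a word count is an assumption, since the original does not state its unit. The second half shows that a failure inside a nested document is reported with its full path.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Document(BaseModel):
    title: str
    content: str


class SummarizationRequest(BaseModel):
    documents: List[Document]
    summary_length: int


def build_prompt(request: SummarizationRequest) -> str:
    """Hypothetical pipeline step: render a validated request as a prompt."""
    body = "\n".join(f"{doc.title}: {doc.content}" for doc in request.documents)
    # Assumption: summary_length is interpreted as a maximum word count.
    return f"Summarize in at most {request.summary_length} words:\n{body}"


valid = SummarizationRequest(
    documents=[{"title": "AI", "content": "Artificial Intelligence is transforming industries."}],
    summary_length=50,
)
prompt = build_prompt(valid)
print(prompt)

# A nested failure carries its location: field name, list index, sub-field.
try:
    SummarizationRequest(documents=[{"title": "ML"}], summary_length=50)
except ValidationError as exc:
    error_paths = [err["loc"] for err in exc.errors()]
    print(error_paths)
```

Because `build_prompt` accepts only a `SummarizationRequest`, it can rely on every document having a title and content without re-checking them.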
