# Pydantic Basics

In this notebook, you'll explore the fundamentals of Pydantic models for data validation, using a customer support system as the example application. You'll see how to define data models, validate user input, and handle validation errors gracefully.

By the end of this notebook, you'll be able to:
- Create Pydantic models to validate user input data
- Handle validation errors gracefully with `try`/`except`
- Use optional fields and field constraints in your models
- Work with JSON data validation methods

---

In [4]:
# Import libraries needed
from pydantic import BaseModel, ValidationError, EmailStr
import json

##### `BaseModel`: every Pydantic model starts by inheriting from `BaseModel`. This is what makes the class a Pydantic data type.
##### `ValidationError`: the exception Pydantic raises when input data fails validation. We will catch it with a `try`/`except` block wherever validation might fail.
##### `EmailStr`: a special Pydantic type that checks whether a string is a valid email address (it relies on the optional `email-validator` package). Using it is optional; you can define your own regex pattern to match email addresses instead if needed.
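The `EmailStr` note mentions a custom regex as an alternative. Here is a minimal sketch of that route, assuming Pydantic v2, where `Field(pattern=...)` constrains a string field. The model name `RegexUserInput` and the pattern are hypothetical and deliberately simple: the pattern is far looser than the checks `EmailStr` performs, so treat it as an illustration, not a real email validator. As a side effect, it needs no `email-validator` package.

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical, deliberately simple pattern: "text@text.text" with no spaces.
# Much looser than what EmailStr checks; for illustration only.
SIMPLE_EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

class RegexUserInput(BaseModel):
    name: str
    # Field(pattern=...) rejects any string that does not match the regex
    email: str = Field(pattern=SIMPLE_EMAIL_PATTERN)
    query: str

ok = RegexUserInput(name="Joe User", email="joe.user@example.com", query="I forgot my password.")
print(ok.email)  # joe.user@example.com

rejected_types = []
try:
    RegexUserInput(name="Joe User", email="not-an-email", query="I forgot my password.")
except ValidationError as e:
    # Collect the machine-readable error type of each failure
    rejected_types = [err["type"] for err in e.errors()]
print(rejected_types)  # ['string_pattern_mismatch']
```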

## P.S.: Please make sure you install all the libraries in requirements.txt with `pip install -r requirements.txt`

### Define a UserInput Pydantic model and populate it with data

In [6]:
# Create a Pydantic model for validating user input
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str

In [7]:
# Fill in the required fields to create a model instance, like below.
user_input = UserInput(
    name="Joe User", 
    email="joe.user@example.com", 
    query="I forgot my password."
)
print(user_input)

name='Joe User' email='joe.user@example.com' query='I forgot my password.'


### Note: the following cell will produce a validation error. You can correct the error by following along with the video, or just proceed with the rest of the notebook as cells below do not depend on this cell. 

In [None]:
# Attempt to create another model instance with an invalid email. Don't panic: this raises a ValidationError.
# We'll catch this error gracefully in the next section.
user_input = UserInput(
    name="Joe User", 
    email="not-an-email", 
    query="I forgot my password."
)
print(user_input)

ValidationError: 1 validation error for UserInput
email
  value is not a valid email address: An email address must have an @-sign. [type=value_error, input_value='not-an-email', input_type=str]

### Define a function for error handling and try different inputs

In [10]:
# Define a function that validates user input safely, so that a ValidationError
# prints a short, readable summary instead of a long traceback.
def validate_user_input(input_data):
    try:
        # Attempt to create a UserInput model instance from user input data
        user_input = UserInput(**input_data)
        print("✅ Valid user input created:")
        print(user_input.model_dump_json(indent=2))
        return user_input
    except ValidationError as e:
        # Capture and display validation errors in a readable format
        print("❌ Validation error occurred:")
        for error in e.errors():
            print(f"  - {error['loc'][0]}: {error['msg']}")
        return None

##### Ahhh, did you notice the `**input_data` in the `validate_user_input` function above? A Python dictionary is a built-in data structure that stores data in key-value pairs. Our Pydantic model expects named fields, and `input_data` is a dictionary, so `**input_data` unpacks that dictionary into keyword arguments that map neatly onto the model's fields.
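To see that `**input_data` really is just keyword-argument unpacking, here is a small sketch. It assumes Pydantic v2 and uses a hypothetical stand-in model, `Ticket`, with a plain `str` email so it runs without `email-validator`. It also shows `model_validate`, a Pydantic v2 class method that accepts the dictionary directly.

```python
from pydantic import BaseModel

# Hypothetical stand-in for UserInput (email is a plain str here)
class Ticket(BaseModel):
    name: str
    email: str
    query: str

input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I forgot my password.",
}

# ** unpacks the dict into keyword arguments...
unpacked = Ticket(**input_data)
# ...so this is equivalent to writing each keyword out by hand
explicit = Ticket(name="Joe User", email="joe.user@example.com", query="I forgot my password.")
# model_validate() takes the dict itself, no unpacking needed
validated = Ticket.model_validate(input_data)

print(unpacked == explicit == validated)  # True
```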

In [11]:
# Create an instance of UserInput using validate_user_input() function
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password."
}


In [None]:
# Attempt to create an instance of UserInput with the query field missing.
# Note that the error is reported as a short message, not a long traceback.
input_data = {
    "name": "Joe User", 
    "email": "joe.user@example.com"
}

user_input = validate_user_input(input_data)

❌ Validation error occurred:
  - query: Field required
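`validate_user_input` reads `loc` and `msg` from each entry of `e.errors()`. Here is a short sketch of what one entry contains for the missing-`query` case, assuming Pydantic v2 and a hypothetical stand-in `Ticket` model with a plain `str` email:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical stand-in for UserInput (email is a plain str here)
class Ticket(BaseModel):
    name: str
    email: str
    query: str

errors = []
try:
    Ticket(name="Joe User", email="joe.user@example.com")  # query is missing
except ValidationError as e:
    errors = e.errors()  # list of dicts, one per failed field

first = errors[0]
print(first["type"])  # missing
print(first["loc"])   # ('query',)  -> validate_user_input prints loc[0]
print(first["msg"])   # Field required
```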


### Update your UserInput data model with additional fields and experiment with different input data

In [13]:
# Import additional libraries for enhanced validation
from pydantic import Field
from typing import Optional
from datetime import date

# Define a new UserInput model with optional fields
class UserInput(BaseModel):
    name: str
    email: EmailStr
    query: str
    order_id: Optional[int] = Field(
        None,
        description="5-digit order number (cannot start with 0)",
        ge=10000,
        le=99999
    )
    purchase_date: Optional[date] = None

##### Let me explain the code above. We imported three new items: `Field`, `Optional`, and `date` (the standard-library type used for `purchase_date`).
##### `Field` (from `pydantic`): lets you declare a field with a default value, a description, and constraints. For example, `order_id` is an integer field with a default of `None`, a description of your choice, and the constraints `ge` (greater than or equal to) and `le` (less than or equal to). So a customer can provide any integer from 10000 to 99999; anything below or above that range is an invalid entry, and Pydantic raises a `ValidationError`.
##### `Optional` (from `typing`): a really useful one going forward. By default, a Pydantic model requires every field it declares and raises an error if any of them is missing. `Optional[int]` says the field may hold either an `int` or `None`; it is the default value `None` that makes the field non-required. So when a customer doesn't supply `order_id`, Pydantic fills it with `None`. (In Pydantic v2, an `Optional` field without a default is still required.)

#### Analogy: think of a Pydantic model as your knowledge. It can be expanded, but it cannot shrink. Likewise, a Pydantic model needs every required field it declares: you can add new fields to the model when needed, but you cannot skip a required field that is already in the model.
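A quick sketch, assuming Pydantic v2, that separates the two ideas above: `Optional[...]` allows the value `None`, while a default value is what lets a field be omitted. It also exercises the `ge`/`le` bounds used on `order_id`. The model `OptionalDemo` and the helper `error_types` are hypothetical names for illustration.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class OptionalDemo(BaseModel):
    # Optional type but no default: the key must be present (its value may be None)
    needs_key: Optional[int]
    # Optional type with a default: the key may be omitted entirely
    may_omit: Optional[int] = None
    # Constrained like order_id: only 10000..99999 (inclusive) pass
    order_id: Optional[int] = Field(None, ge=10000, le=99999)

def error_types(**kwargs):
    """Return the Pydantic error types raised for kwargs (empty list if valid)."""
    try:
        OptionalDemo(**kwargs)
        return []
    except ValidationError as e:
        return [err["type"] for err in e.errors()]

print(error_types(needs_key=None))                # []
print(error_types())                              # ['missing']
print(error_types(needs_key=1, order_id=9999))    # ['greater_than_equal']
print(error_types(needs_key=1, order_id=100000))  # ['less_than_equal']
print(error_types(needs_key=1, order_id=99999))   # []
```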

In [14]:
# Define a dictionary with required fields only
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I forgot my password."
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I forgot my password.",
  "order_id": null,
  "purchase_date": null
}


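##### Printing `user_input` directly doesn't show much Pydantic-specific formatting: `print()` uses the model's default string representation, which lists each field as `name=value`. Use `model_dump_json()` (as `validate_user_input` does) when you want formatted JSON.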

In [15]:
print(user_input)

name='Joe User' email='joe.user@example.com' query='I forgot my password.' order_id=None purchase_date=None
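`print()` is only one way to view a model. This sketch, assuming Pydantic v2 and a hypothetical `Ticket` model, compares it with `model_dump()` (a dict of Python objects), `model_dump(mode="json")` (a dict of JSON-friendly values), and `model_dump_json()` (a JSON string, compact unless you pass `indent`).

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel

# Hypothetical model with the same optional fields as UserInput
class Ticket(BaseModel):
    name: str
    order_id: Optional[int] = None
    purchase_date: Optional[date] = None

t = Ticket(name="Joe User", purchase_date=date(2025, 12, 31))

as_text = str(t)                           # what print(t) shows
as_dict = t.model_dump()                   # dict holding Python objects
as_json_dict = t.model_dump(mode="json")   # dict holding JSON-friendly values
as_json = t.model_dump_json()              # compact JSON string

print(as_text)       # name='Joe User' order_id=None purchase_date=datetime.date(2025, 12, 31)
print(as_dict)       # {'name': 'Joe User', 'order_id': None, 'purchase_date': datetime.date(2025, 12, 31)}
print(as_json_dict)  # {'name': 'Joe User', 'order_id': None, 'purchase_date': '2025-12-31'}
print(as_json)       # {"name":"Joe User","order_id":null,"purchase_date":"2025-12-31"}
```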


In [16]:
# Define a dictionary with all fields including optional ones
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": f"""I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": date(2025, 12, 31)
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [17]:
# Define a dictionary with all fields plus some extra, undeclared ones
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": f"""I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": date(2025, 12, 31),
    "system_message": "logging status regarding order processing...",
    "iteration": 1 
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


##### Okay, I told you before that a Pydantic class is like your knowledge: it can be expanded. But here, even though you passed the extra fields `system_message` and `iteration`, the model did not keep them, because they are not declared in the `UserInput` model. By default, Pydantic ignores any field not declared in the class. BUT BUT BUT... if you really want to keep extra fields dynamically, there is a way to do it. Keep going; I'll show it later.

##### For now, think of it like this: Pydantic matches the input against the fields its model declares, keeps only those, and ignores the rest. This is one of the features that makes Pydantic shine with LLMs: an LLM often returns far more information than you need, and a Pydantic model filters that output down to just the fields you require.
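In Pydantic v2, what happens to undeclared fields is controlled by the `extra` setting in `model_config`. A minimal sketch comparing the three options, using hypothetical models; `"ignore"` is the default behaviour seen in the cell above.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class IgnoreExtras(BaseModel):
    model_config = ConfigDict(extra="ignore")  # the default, written out explicitly
    name: str

class AllowExtras(BaseModel):
    model_config = ConfigDict(extra="allow")   # keep undeclared fields
    name: str

class ForbidExtras(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject undeclared fields
    name: str

data = {"name": "Joe User", "iteration": 1}

ignored = IgnoreExtras(**data).model_dump()
print(ignored)              # {'name': 'Joe User'}

allowed = AllowExtras(**data)
print(allowed.model_dump())  # {'name': 'Joe User', 'iteration': 1}
print(allowed.model_extra)   # {'iteration': 1}

forbid_error_types = []
try:
    ForbidExtras(**data)
except ValidationError as e:
    forbid_error_types = [err["type"] for err in e.errors()]
print(forbid_error_types)    # ['extra_forbidden']
```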

In [18]:
print(user_input)

name='Joe User' email='joe.user@example.com' query='I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.' order_id=12345 purchase_date=datetime.date(2025, 12, 31)


In [19]:
# Create an instance of UserInput with valid data
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": f"""I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


##### Do you think this is the same `input_data` as before? WRONG... In the cell above, `purchase_date` is given as a string, and in the cell below, `order_id` is also sent as a string in the `input_data` dictionary. Yet Pydantic doesn't complain. Why? Because of "data COERCION": in its default (lax) mode, Pydantic automatically converts the string `"2025-12-31"` into a `date` and the string `"12345"` into an `int`. Check the Pydantic documentation for the full list of supported conversions.
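A small sketch of coercion, assuming Pydantic v2 and a hypothetical `Order` model that reuses the `order_id` and `purchase_date` definitions from `UserInput`. Lax mode converts the strings; passing `strict=True` to `model_validate` turns coercion off, so the same input is rejected.

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

class Order(BaseModel):
    # Same definitions as order_id and purchase_date in UserInput
    order_id: Optional[int] = Field(None, ge=10000, le=99999)
    purchase_date: Optional[date] = None

data = {"order_id": "12345", "purchase_date": "2025-12-31"}

# Default (lax) mode: the strings are coerced into an int and a date
lax = Order.model_validate(data)
print(lax.order_id, repr(lax.purchase_date))  # 12345 datetime.date(2025, 12, 31)

# Strict mode: no coercion, so both string values are rejected
strict_error_fields = []
try:
    Order.model_validate(data, strict=True)
except ValidationError as e:
    strict_error_fields = [err["loc"][0] for err in e.errors()]
print(strict_error_fields)
```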

In [20]:
# Define order_id as a string
input_data = {
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": f"""I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": "12345",
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a laptop carrying case and it turned out to be \n             the wrong size. I need to return it.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [21]:
# Define name field as an integer
input_data = {
    "name": 99999,
    "email": "joe.user@example.com",
    "query": f"""I bought a laptop carrying case and it turned out to be 
             the wrong size. I need to return it.""",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}

# Validate the user input data
user_input = validate_user_input(input_data)

❌ Validation error occurred:
  - name: Input should be a valid string


##### Don't expect data coercion here: Pydantic will not convert the integer `99999` into a string for the `name` field, so this is a validation error. AND Pydantic is RIGHT!
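Coercion in lax mode is not symmetric: a numeric string can become an `int`, but an `int` is not turned into a `str` (Pydantic v1 did perform this conversion; v2 does not). A minimal sketch with a hypothetical `Named` model; if a string is what you want, convert explicitly before validating.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model with a single string field
class Named(BaseModel):
    name: str

error_type = None
try:
    Named(name=99999)  # int for a str field: rejected, not converted
except ValidationError as e:
    error_type = e.errors()[0]["type"]
print(error_type)  # string_type

# Convert explicitly when a string is really what you want
print(Named(name=str(99999)).name)  # 99999
```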

### Try starting with JSON data as input

In [22]:
# Define user input as JSON data
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "I bought a keyboard and mouse and was overcharged.",
    "order_id": 12345,
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON string into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'I bought a keyboard and mouse and was overcharged.', 'order_id': 12345, 'purchase_date': '2025-12-31'}


In [23]:
# Validate the user input data
user_input = validate_user_input(input_data)

✅ Valid user input created:
{
  "name": "Joe User",
  "email": "joe.user@example.com",
  "query": "I bought a keyboard and mouse and was overcharged.",
  "order_id": 12345,
  "purchase_date": "2025-12-31"
}


In [24]:
# Try different JSON input
json_data = '''
{
    "name": "Joe User",
    "email": "joe.user@example.com",
    "query": "My account has been locked for some reason.",
    "order_id": "01234",
    "purchase_date": "2025-12-31"
}
'''

# Parse the JSON into a Python dictionary
input_data = json.loads(json_data)
print("Parsed JSON:", input_data)

Parsed JSON: {'name': 'Joe User', 'email': 'joe.user@example.com', 'query': 'My account has been locked for some reason.', 'order_id': '01234', 'purchase_date': '2025-12-31'}


In [25]:
# Validate the customer support data from JSON with non-standard formats
user_input = validate_user_input(input_data)

❌ Validation error occurred:
  - order_id: Input should be greater than or equal to 10000


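##### The error above occurs because `"01234"` is coerced to the integer `1234`, which falls below the `ge=10000` limit. So far, we have created a Python dictionary (directly or by parsing JSON) and validated it with our custom `validate_user_input` function. Pydantic itself also provides a method, `model_validate_json`, that parses and validates a JSON string in one step. For a dictionary, the equivalent method is `model_validate`.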
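To make the distinction concrete: `model_validate_json` takes a JSON string (or bytes), while `model_validate` takes an already-parsed dict. A sketch, assuming Pydantic v2 and a hypothetical `Order` model without the email field, showing that both routes produce the same model:

```python
import json
from datetime import date
from typing import Optional
from pydantic import BaseModel, Field

# Hypothetical simplified model (no EmailStr, so no email-validator needed)
class Order(BaseModel):
    name: str
    order_id: Optional[int] = Field(None, ge=10000, le=99999)
    purchase_date: Optional[date] = None

json_data = '{"name": "Joe User", "order_id": 12345, "purchase_date": "2025-12-31"}'

from_json = Order.model_validate_json(json_data)         # parse + validate a JSON string
from_dict = Order.model_validate(json.loads(json_data))  # validate an already-parsed dict

print(from_json == from_dict)  # True
```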

### Try the `model_validate_json` method

### Note: the following cell will produce a validation error. You can correct the error by following along with the video. 

In [26]:
# Parse JSON and validate user input data in one step using model_validate_json method
user_input = UserInput.model_validate_json(json_data)
print(user_input.model_dump_json(indent=2))

ValidationError: 1 validation error for UserInput
order_id
  Input should be greater than or equal to 10000 [type=greater_than_equal, input_value='01234', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/greater_than_equal
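`model_validate_json` raises the same `ValidationError`, so the `try`/`except` pattern from `validate_user_input` carries over. A minimal sketch with a hypothetical helper, `validate_json_input`, and a simplified `Order` model (both assumptions, not part of the original notebook). It reports the `"01234"` failure without a traceback; fixing the data itself is left to the video walkthrough.

```python
from typing import Optional
from pydantic import BaseModel, Field, ValidationError

# Hypothetical simplified model with the same order_id constraint as UserInput
class Order(BaseModel):
    name: str
    order_id: Optional[int] = Field(None, ge=10000, le=99999)

def validate_json_input(json_str: str) -> Optional[Order]:
    """Hypothetical JSON counterpart of validate_user_input."""
    try:
        order = Order.model_validate_json(json_str)
        print("✅ Valid input:", order.model_dump_json())
        return order
    except ValidationError as e:
        print("❌ Validation error occurred:")
        for error in e.errors():
            print(f"  - {error['loc'][0]}: {error['msg']}")
        return None

# Leading zero: coerced to 1234, which fails ge=10000
bad = validate_json_input('{"name": "Joe User", "order_id": "01234"}')
good = validate_json_input('{"name": "Joe User", "order_id": 12345}')
```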

---

## Conclusion

You learned how to use Pydantic models to validate user input for a customer support scenario. By defining clear data models and handling validation errors, you can ensure your code only works with well-formed data. This approach helps you build more robust and reliable applications, and sets the stage for more advanced validation and structured output in future lessons.