In [4]:
from typing import Optional, Any, List, Set
from datetime import datetime
from uuid import uuid4
from pydantic import (
    BaseModel, 
    Field, 
    conlist, 
    UUID4,
    field_validator,
    computed_field,
    model_validator,
    field_serializer,
    model_serializer,
    EmailStr,
    ValidationError,
)

# Motivation

Pydantic is a powerful data validation and parsing library. Because Python is dynamically typed, type errors tend to surface only at runtime. Plain dictionaries also carry no type hints for their keys and values, which makes deeply nested objects tedious to manage. Pydantic addresses both problems, provides simple methods for serialization, is fast, and is one of the most widely used data validation libraries for Python. This tutorial walks through the basic features you need to get started using Pydantic efficiently. We will cover:
  - Basic type validation
  - Pydantic Special Field Types (e.g. conlist, UUID4, EmailStr, and Field)
  - Custom Field and Model Validators
  - Handling Nested Models
  - Computed Fields
  - Creating Custom Field and Model Serialization Methods

# Basic Type Validation

To perform basic type validation with Pydantic, we inherit from the `BaseModel` class, which provides everything required to validate the fields declared on your models.

In [5]:
class Employee(BaseModel):
    first_name: str
    last_name: str
    age: Optional[int] = None

In [6]:
emp = Employee(first_name="Rob", last_name="Smith", age='33')
emp1 = Employee(first_name="John", last_name="Doe", age=27)

print(f"{emp.first_name} {emp.last_name} is {emp.age} years old")

Rob Smith is 33 years old


Here we define an Employee class that inherits from BaseModel, and Pydantic validates that the inputs are in fact the correct type when we instantiate it. Notice that '33' does not raise a validation error: although it is a string, it can still be parsed into an int. By default Pydantic runs in 'lax' mode, which does not enforce strict type checking and coerces compatible values where it can. A validation error is raised only when the value cannot be parsed into an int. For example:

In [7]:
emp2 = Employee(first_name="Rob", last_name="Smith", age='have fun')

ValidationError: 1 validation error for Employee
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='have fun', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/int_parsing

In [8]:
emp3 = Employee.model_validate({"first_name": "joe", "last_name": "smith"}) # just to show that age is optional

Notice that we have to pass our properties via keyword arguments, so if we wanted to create an Employee instance from a dictionary we would have to unpack it:
`employee = Employee(**{"first_name": "joe", "last_name": "smith", "age": "23"})`
Sometimes we also want to enforce strict typing on our models. Pydantic's handy `model_validate` class method solves both of these issues: it accepts a dictionary directly and takes an optional `strict=True` argument. Additionally, Pydantic has a ValidationError class for elegantly catching errors and displaying the corresponding error messages.

In [10]:
try:
    emp4 = Employee.model_validate({"first_name": "joe", "last_name": "smith", "age": "23"}, strict=True)
except ValidationError as error:
    for e in error.errors():
        print(f"{e['loc'][0]}: {e['msg']}")


age: Input should be a valid integer
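Strictness can also be set once for a whole model instead of on each call. The following is a minimal sketch, assuming Pydantic v2's `ConfigDict`; the `StrictEmployee` model and the `strict_error_fields` helper are hypothetical names used only for illustration.

```python
from typing import Optional
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictEmployee(BaseModel):  # hypothetical model, not part of the tutorial
    # strict=True turns off lax coercion for every field on this model
    model_config = ConfigDict(strict=True)

    first_name: str
    last_name: str
    age: Optional[int] = None


def strict_error_fields(**kwargs) -> list:
    """Return the names of the fields that fail validation (empty if valid)."""
    try:
        StrictEmployee(**kwargs)
    except ValidationError as error:
        return [e["loc"][0] for e in error.errors()]
    return []


print(strict_error_fields(first_name="joe", last_name="smith", age="23"))  # string age is rejected
print(strict_error_fields(first_name="joe", last_name="smith", age=23))    # int age is accepted
```

With this configuration the keyword-argument constructor is strict as well, so no `strict=True` argument is needed at the call site.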


## Additional Field Types

Pydantic provides special field types that allow for additional validation.
  - The UUID4 type ensures that the field is a valid version 4 UUID.
  - The EmailStr type ensures that the string is a valid email address (no need for regex patterns). It requires the optional email-validator package.
  - The Field function attaches extra constraints and metadata to a field of any type.
    - In this example, we use it on an int to ensure that the value is greater than or equal to 0 and less than or equal to 1,000,000.
    - An alias is very useful when the object we are receiving uses a different name than the one we want for the attribute.
      - This is particularly useful for converting JavaScript-style camelCase keys into snake_case Python attribute names.
  - The conlist function wraps the Python list type to provide extra validation, such as the number of elements we expect.

In [18]:
class Contractor(Employee):
    id: UUID4
    email: EmailStr
    salary: int = Field(..., ge=0, le=1_000_000, alias='compensation')
    salary_range: conlist(int, min_length=1, max_length=2) # type: ignore

In [19]:
try:
    contractor = Contractor(id=uuid4(), 
                            email="rob@gmail.com", 
                            salary=50_000,
                            salary_range=[40_000, 60_000])
except ValidationError as error:
    for e in error.errors():
        print(f"{e['loc'][0]}: {e['msg']}")


first_name: Field required
last_name: Field required
compensation: Field required


It's also important to note that you can extend Pydantic models just like any other class, and the subclass inherits its parent's fields. Here we created a new class called Contractor that inherits from the Employee class. All of the validation from the Employee class takes place in addition to the new Contractor validation, hence the validation errors above: first_name and last_name were never supplied, and salary was passed under its field name instead of its alias, compensation.

In [24]:
try:
    contractor = Contractor(id=uuid4(), 
                            first_name="James",
                            last_name="Lock",
                            email="rob@gmail.com", 
                            compensation=50_000,
                            salary_range=[40_000, 60_000])
except ValidationError as error:
    for e in error.errors():
        print(f"{e['loc'][0]}: {e['msg']}")
                    

In [25]:
contractor

Contractor(first_name='James', last_name='Lock', age=None, id=UUID('41e54002-ff17-405c-a27d-dc02d1aefd6e'), email='rob@gmail.com', salary=50000, salary_range=[40000, 60000])

Notice the behavior of the alias. We pass in the alias as the keyword argument, but the created object exposes the value under the attribute name, salary. Change the compensation keyword argument back to salary and you will see that Pydantic raises a ValidationError reporting that compensation is a required field; therefore, if you use an alias, you must by default pass in the alias when you create an instance of your model.
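The alias behavior can be seen in isolation with a smaller model. This is a minimal sketch under Pydantic v2 defaults; the `Payment` model and its `amountDue` alias are hypothetical and only stand in for a camelCase payload.

```python
from pydantic import BaseModel, Field, ValidationError


class Payment(BaseModel):  # hypothetical model for illustration
    # the incoming payload uses camelCase; the attribute stays snake_case
    amount_due: int = Field(..., ge=0, alias="amountDue")


payment = Payment.model_validate({"amountDue": 120})
print(payment.amount_due)                 # attribute access uses the field name
print(payment.model_dump())               # keys use the field name by default
print(payment.model_dump(by_alias=True))  # keys use the alias

try:
    Payment(amount_due=120)  # field name instead of the alias
except ValidationError as error:
    missing = [e["loc"][0] for e in error.errors()]
    print(missing)  # the alias is reported as the missing field
```

Passing `by_alias=True` when dumping is a convenient way to hand the data back in the naming style it arrived in.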

## Dumping schema and model values

In [26]:
# These functions are fairly self explanatory
print(contractor.model_fields_set)
print(contractor.model_dump()) # return dictionary of your instance
print(contractor.model_dump_json()) # returns json
print(contractor.model_json_schema()) # returns schema of your model

{'salary', 'email', 'id', 'first_name', 'salary_range', 'last_name'}
{'first_name': 'James', 'last_name': 'Lock', 'age': None, 'id': UUID('41e54002-ff17-405c-a27d-dc02d1aefd6e'), 'email': 'rob@gmail.com', 'salary': 50000, 'salary_range': [40000, 60000]}
{"first_name":"James","last_name":"Lock","age":null,"id":"41e54002-ff17-405c-a27d-dc02d1aefd6e","email":"rob@gmail.com","salary":50000,"salary_range":[40000,60000]}
{'properties': {'first_name': {'title': 'First Name', 'type': 'string'}, 'last_name': {'title': 'Last Name', 'type': 'string'}, 'age': {'anyOf': [{'type': 'integer'}, {'type': 'null'}], 'default': None, 'title': 'Age'}, 'id': {'format': 'uuid4', 'title': 'Id', 'type': 'string'}, 'email': {'format': 'email', 'title': 'Email', 'type': 'string'}, 'compensation': {'maximum': 1000000, 'minimum': 0, 'title': 'Compensation', 'type': 'integer'}, 'salary_range': {'items': {'type': 'integer'}, 'maxItems': 2, 'minItems': 1, 'title': 'Salary Range', 'type': 'array'}}, 'required': ['firs
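The dump methods also accept arguments that control which fields are emitted. Below is a minimal sketch of a few of them; `Person` is a hypothetical stand-in for the Employee model so that the block runs on its own.

```python
from typing import Optional
from pydantic import BaseModel


class Person(BaseModel):  # hypothetical stand-in for Employee
    first_name: str
    last_name: str
    age: Optional[int] = None


person = Person(first_name="James", last_name="Lock")

print(person.model_fields_set)                     # only the fields passed explicitly
print(person.model_dump(exclude_unset=True))       # drops 'age', which was never set
print(person.model_dump(include={"first_name"}))   # keep only the listed fields
print(person.model_dump_json(exclude_none=True))   # drop fields whose value is None
```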

# Custom Field Validators

Sometimes you will want to create your own custom validation function. To do this, Pydantic provides a decorator called field_validator. To use it, you pass in the name of the field you would like to validate and the mode in which the validator should run: 'before' runs your function on the raw input ahead of Pydantic's own validation, while 'after' (the default) runs it on the value Pydantic has already validated. The mode is an important parameter to understand. Consider the following example:

In [46]:
class Employee(BaseModel):
    first_name: str
    last_name: str
    age: Optional[int] = None

    @field_validator('first_name', mode='before')
    @classmethod
    def first_name_contains_space(cls, v: Any):
        # mode='before': v is the raw input, ahead of Pydantic's own str validation
        if " " in v:
            raise ValueError("first name cannot contain a space")
        return v
    
    @field_validator('last_name', mode='after')
    @classmethod
    def last_name_contains_space(cls, v: str):
        # mode='after': v has already passed Pydantic's str validation
        if not isinstance(v, str):
            raise ValueError("last_name must be a string")
        if " " in v:
            raise ValueError("last name cannot contain a space")
        return v

In [47]:
try:
    e = Employee(
        first_name="Joey Scram",
        last_name=123,
        age=55
    )
except ValidationError as error:
    for e in error.errors():
        print(f"{e['loc'][0]}: {e['msg']}")

first_name: Value error, first name cannot contain a space
last_name: Input should be a valid string


Notice that the first validator returned the value error we specified in our custom validation function. The second custom validator never ran: because its mode is 'after', Pydantic's default validation for the Python `str` type ran first and failed. If we change the mode for last_name to 'before', our validator sees the raw input 123 first and we get `last_name: Value error, last_name must be a string` instead.
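To see the effect of the mode on last_name directly, here is a minimal sketch with the validator switched to 'before'. `Person` is a hypothetical, trimmed-down version of Employee that keeps only the field in question.

```python
from typing import Any
from pydantic import BaseModel, ValidationError, field_validator


class Person(BaseModel):  # hypothetical, trimmed-down version of Employee
    last_name: str

    @field_validator("last_name", mode="before")
    @classmethod
    def last_name_contains_space(cls, v: Any) -> Any:
        # mode='before' receives the raw input, so it may not be a str yet
        if not isinstance(v, str):
            raise ValueError("last_name must be a string")
        if " " in v:
            raise ValueError("last name cannot contain a space")
        return v


try:
    Person(last_name=123)
except ValidationError as error:
    messages = [f"{e['loc'][0]}: {e['msg']}" for e in error.errors()]
    print(messages)
```

Because the custom check now runs first, its message replaces the built-in "Input should be a valid string" error.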

# Custom Class Validators

If we would like to validate an entire model, we can do so with Pydantic's model_validator decorator, which works much like field_validator. In the following example, we validate that the salary is in fact within our specified salary range, and we ensure the user passes in a first_name and last_name instead of just name.

In [57]:
class Contractor(Employee):
    id: UUID4
    email: EmailStr
    salary: int = Field(..., ge=0, le=1_000_000, alias='compensation')
    salary_range: conlist(int, min_length=1, max_length=2) # type: ignore

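    @model_validator(mode='after')
    def salary_within_range(self):
        # salary_range may hold one or two values, so compare against its first and last items
        if self.salary < self.salary_range[0] or self.salary > self.salary_range[-1]:
            raise ValueError("Salary must be within range")
        return self
    
    @model_validator(mode='before')
    @classmethod
    def reject_combined_name(cls, data: Any):
        # 'before' validators receive the raw input, which is not guaranteed to be a dict
        if isinstance(data, dict) and "name" in data:
            raise ValueError("please provide a first_name and last_name and not 'name'")
        return data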

In [58]:
try:
    contractor = Contractor(id=uuid4(), 
                            # name="James Lock",
                            first_name="James",
                            last_name="Lock",
                            email="rob@gmail.com", 
                            compensation=10_000,
                            salary_range=[40_000, 60_000])
    
except ValidationError as error:
    for e in error.errors():
        print(f"LOC: {e['loc']} TYPE: {e['type']} MSG: {e['msg']}")

LOC: () TYPE: value_error MSG: Value error, Salary must be within range
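The 'before' model validator can be exercised on its own as well. This is a minimal sketch assuming dictionary-like input; `Person` and the `reject_combined_name` method are hypothetical names chosen for illustration.

```python
from typing import Any
from pydantic import BaseModel, ValidationError, model_validator


class Person(BaseModel):  # hypothetical model for illustration
    first_name: str
    last_name: str

    @model_validator(mode="before")
    @classmethod
    def reject_combined_name(cls, data: Any) -> Any:
        # 'before' validators see the raw input, which is usually a dict
        if isinstance(data, dict) and "name" in data:
            raise ValueError("please provide a first_name and last_name and not 'name'")
        return data


try:
    Person(name="James Lock")
except ValidationError as error:
    errors = [(e["loc"], e["type"]) for e in error.errors()]
    print(errors)
```

The error location is an empty tuple because the failure belongs to the model as a whole and not to a single field, and field validation never starts once the 'before' validator raises.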


# Computed Fields

Another useful feature of Pydantic is that we can derive the value of one field from other fields with the computed_field decorator. In the example below, salary_range is no longer supplied by the caller; it is computed from salary and is included when the model is serialized.

In [59]:
class Contractor(Employee):
    id: UUID4
    email: EmailStr
    salary: int = Field(..., ge=0, le=1_000_000)
    
    @computed_field
    @property
    def salary_range(self) -> List[int]:
        return [self.salary - 10_000, self.salary + 10_000]

In [60]:
contractor = Contractor(id=uuid4(), 
                        first_name="James",
                        last_name="Lock",
                        email="rob@gmail.com", 
                        salary=30_000)

In [61]:
contractor.salary_range

[20000, 40000]
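A computed field also appears in the serialized output and is recalculated on each access. The following is a minimal sketch; `Offer` is a hypothetical model that keeps only the salary field so the block is self-contained.

```python
from typing import List
from pydantic import BaseModel, computed_field


class Offer(BaseModel):  # hypothetical model for illustration
    salary: int

    @computed_field
    @property
    def salary_range(self) -> List[int]:
        return [self.salary - 10_000, self.salary + 10_000]


offer = Offer(salary=30_000)
print(offer.model_dump())   # the computed field is included in the dump

offer.salary = 50_000       # the property is re-evaluated on the next access
print(offer.salary_range)
```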

# Nested Models

Pydantic models can use other Pydantic models as field types, and validation occurs throughout the entire nested structure. A plain dictionary passed for a nested field is validated and converted into the corresponding model.

In [62]:
class Team(BaseModel):
    id: int
    name: str
    logo: str

class Game(BaseModel):
    home_team: Team
    away_team: Team
    start_time: datetime


In [63]:
game = Game(home_team={"id": 1, "name": "Real Madrid", "logo": "path/to/logo"},
            away_team={"id": 2, "name": "Manchester United", "logo": "path/to/logo"},
            start_time=datetime.now())

In [74]:
game.model_dump()

{'home_team': {'id': 1, 'name': 'Real Madrid', 'logo': 'path/to/logo'},
 'away_team': {'id': 2, 'name': 'Manchester United', 'logo': 'path/to/logo'},
 'start_time': datetime.datetime(2024, 4, 22, 12, 31, 32, 722643)}
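When a nested value is invalid, the error location records the path down to the offending field. Here is a minimal sketch using trimmed copies of Team and Game (logo and start_time are left out as a simplification).

```python
from pydantic import BaseModel, ValidationError


class Team(BaseModel):  # trimmed copy for illustration
    id: int
    name: str


class Game(BaseModel):  # trimmed copy for illustration
    home_team: Team
    away_team: Team


try:
    Game(
        home_team={"id": "not-a-number", "name": "Real Madrid"},
        away_team={"id": 2},  # 'name' is missing
    )
except ValidationError as error:
    locations = [e["loc"] for e in error.errors()]
    print(locations)
```

Each location is a tuple, so printing only `e['loc'][0]` as in the earlier cells would hide which nested field actually failed.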

# Custom Serialization

Another helpful feature in Pydantic is custom serialization, which lets us transform a model's data into a different shape as it is dumped. In this example we create a Game class that has a home team and an away team, both defined by the Team class. Whenever a Team is serialized, we sort its player_ids in ascending order. We also customize how the Game model is serialized to JSON, reducing each team to its name.

In [68]:
class Team(BaseModel):
    id: int
    name: str
    logo: str
    player_ids: Set[int]

    @field_serializer('player_ids', when_used='always') # perform always
    def sort_players(self, player_ids: Set[int]) -> List[int]:
        return sorted(player_ids)


class Game(BaseModel):
    home_team: Team
    away_team: Team
    start_time: datetime

    @model_serializer(when_used='json') # perform only with json serialization
    def serialize(self):
        return {
            "home_team": self.home_team.name, 
            "away_team": self.away_team.name, 
            "start_time": self.start_time
        }

In [72]:
game = Game(home_team={"id": 1, "name": "Real Madrid", "logo": "path/to/logo", "player_ids": set([2, 9, 4])},
            away_team={"id": 2, "name": "Manchester United", "logo": "path/to/logo", "player_ids": set([7, 3, 1])},
            start_time=datetime.now())

In [73]:
print(game.model_dump())
print(game.model_dump_json())

{'home_team': {'id': 1, 'name': 'Real Madrid', 'logo': 'path/to/logo', 'player_ids': [2, 4, 9]}, 'away_team': {'id': 2, 'name': 'Manchester United', 'logo': 'path/to/logo', 'player_ids': [1, 3, 7]}, 'start_time': datetime.datetime(2024, 4, 24, 19, 32, 12, 21589)}
{"home_team":"Real Madrid","away_team":"Manchester United","start_time":"2024-04-24T19:32:12.021589"}
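The `when_used` argument works the same way on field serializers. This is a minimal sketch assuming Pydantic v2; `Squad` is a hypothetical model whose serializer is limited to JSON mode, so the Python-mode dump keeps the original set.

```python
from typing import List, Set
from pydantic import BaseModel, field_serializer


class Squad(BaseModel):  # hypothetical model for illustration
    name: str
    player_ids: Set[int]

    @field_serializer("player_ids", when_used="json")
    def sort_players(self, player_ids: Set[int]) -> List[int]:
        return sorted(player_ids)


squad = Squad(name="Real Madrid", player_ids={9, 2, 4})
print(squad.model_dump())             # Python mode: the set is left untouched
print(squad.model_dump(mode="json"))  # JSON mode: the serializer runs
print(squad.model_dump_json())        # JSON string: the serializer runs
```

Note that `model_dump(mode="json")` still returns a dictionary; it only restricts the values to JSON-compatible types, which is why the JSON-only serializer applies there too.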


# Conclusion
You are now fully equipped to use Pydantic to validate your data and enjoy explicit type hints that make your Python code more robust to runtime errors and a pleasure to work with. We covered the basics of type validation. We leveraged Pydantic's special field types to get some of the most common kinds of validation out of the box. We added custom field and model validators and worked with nested models and model inheritance. We also created fields that are computed from other fields defined on the model. Finally, we customized the serialization process, so we are ready to validate and transform data from one schema to another.