# Type checking and data validation using Pydantic

**Type checking** verifies that the data types used in a computer program are correct.

**Data validation** is the process of ensuring that data is accurate, complete, and consistent.

**Pydantic** is a Python library that provides a powerful and intuitive way to perform type-checking and data validation. It leverages Python’s type annotations to define and validate data structures, making it easy to ensure that data is consistent and correct.

In [1]:
%pip install pydantic

Note: you may need to restart the kernel to use updated packages.





In [None]:
from pydantic import BaseModel, constr, Field, field_validator

## Type checking
Python’s type annotations hint to static type checkers what type of data is expected for a variable or function parameter; Python itself does not enforce them at runtime. Pydantic takes this one step further by validating data against the annotations at runtime and by allowing us to define custom constraints on the given data structures.  
To use Pydantic for type checking, we create a model class that inherits from `BaseModel` and define the fields we need. The type annotation of each field specifies its expected type.  
Because the checks are declared once on the model, we do not have to write a manual `isinstance` check for every field.

In [4]:
class User(BaseModel):
    name: str
    age: int
    email: str

user_data = {
    "name": "Tester1",
    "age": 18,
    "email": "Tester1@example.com",
}

user = User.model_validate(user_data)

print(user.name)
print(user.age)
print(user.email)

Tester1
18
Tester1@example.com


> If we try to create a new User instance with invalid data, Pydantic will raise a `ValidationError` exception.
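
As a minimal sketch of that failure case, the cell below passes a value for `age` that cannot be parsed as an integer. The variable names `bad_user_data` and `failed_fields` are illustrative and not part of the original notebook.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    age: int
    email: str


# "age" cannot be parsed as an integer, so validation fails.
bad_user_data = {
    "name": "Tester2",
    "age": "not a number",
    "email": "Tester2@example.com",
}

try:
    User.model_validate(bad_user_data)
except ValidationError as exc:
    # exc.errors() returns one dict per failing field; "loc" holds the field path.
    failed_fields = [error["loc"][0] for error in exc.errors()]
    print(f"Validation failed for: {failed_fields}")
```

Catching `ValidationError` and reading `exc.errors()` is one way to report which fields were rejected without stopping the program.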

## Data validation
Pydantic can validate data in a number of ways, including range checking, regular expression matching, uniqueness checking, and custom validation.
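
Custom validation is the most general of these options, so a small sketch may help before the specific cases. The `Account` model and its no-spaces rule are assumptions made up for illustration; only `field_validator` and `ValidationError` come from Pydantic.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Account(BaseModel):  # hypothetical model, for illustration only
    username: str

    @field_validator("username")
    @classmethod
    def username_has_no_spaces(cls, value: str) -> str:
        # A ValueError raised inside a validator is reported as a ValidationError.
        if " " in value:
            raise ValueError("username must not contain spaces")
        return value


ok = Account(username="tester1")

try:
    Account(username="tester 1")
    rejected = False
except ValidationError:
    rejected = True

print(ok.username, rejected)
```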

## Range checking
Range checking validates data against a specified range of values. In Pydantic this can be done with the `constr()` function, which constrains string length, and the `Field()` function, which constrains numeric values.

`constr()` takes the `min_length` and `max_length` keyword arguments to bound the length of a string. `Field()` takes the `ge` (greater than or equal to) and `le` (less than or equal to) keyword arguments to bound an integer value; both bounds are inclusive.

- For string length - `constr()` with the `min_length` and `max_length` keyword arguments
- For integers - `Field()` with the `ge` (greater than or equal to) and `le` (less than or equal to) keyword arguments

> With these checks, out-of-range values are rejected when the data is loaded, so only clean data reaches the rest of the program.

In [None]:
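class User(BaseModel):
    # constr() returns a constrained string type, so it is used as the annotation.
    name: constr(min_length=3, max_length=20)
    # ge and le are inclusive bounds: 18 <= age <= 68.
    age: int = Field(ge=18, le=68)
    email: str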

## Regular expression matching
To use regular expression matching in Pydantic, we can use the `constr()` constrained string type. `constr()` accepts a regular expression through its `pattern` keyword argument, and the field value must match that pattern.  

In [8]:
class CheckEmail(BaseModel):
    # ^ and $ anchor the pattern so that the whole string must match it.
    email: constr(pattern=r'^[a-zA-Z0-9._]+@([\w-]+\.)+[\w-]{2,4}$')

## Uniqueness checking
Pydantic has no built-in uniqueness constraint, but we can implement one with the `field_validator()` decorator. A field validator runs a custom check on an individual field; here it compares each new name with the names that have already been seen. (To validate the entire model instance rather than a single field, Pydantic provides the `model_validator()` decorator.)  
> Uniqueness checking ensures that each name appears only once in our data.

In [None]:
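from typing import ClassVar

class User(BaseModel):
    name: str

    # Names of all users created so far. ClassVar keeps this shared
    # registry from being treated as a model field.
    registered_names: ClassVar[set] = set()

    def __init__(self, **data):
        super().__init__(**data)
        # Only reached when validation succeeded.
        self.registered_names.add(self.name)

    @field_validator("name")
    @classmethod
    def validate_unique_name(cls, value):
        if value in cls.registered_names:
            raise ValueError("Duplicate names are not allowed")
        return value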

def check_for_duplicates(user_data):
    duplicates = []
    for name in user_data:
        try:
            User(name=name)
        except ValueError:
            duplicates.append(name)
    return duplicates

user_data = ["Tester1", "Tester1", "Tester2", "Tester2"]

duplicates = check_for_duplicates(user_data)
if duplicates:
    print("Duplicate names:")
    for name in duplicates:
        print(f"* {name}")
else:
    print("There are no duplicate names.")
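
The approach above keeps state on the class, so names stay registered for the lifetime of the process. As an alternative sketch, uniqueness can also be checked inside a single model that holds the whole list, which avoids shared state. The `Team` model and its `members` field are assumptions made for illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError, field_validator


class Team(BaseModel):  # hypothetical model, for illustration only
    members: List[str]

    @field_validator("members")
    @classmethod
    def members_are_unique(cls, value: List[str]) -> List[str]:
        # Collect every name that occurs more than once in the list.
        duplicates = sorted({name for name in value if value.count(name) > 1})
        if duplicates:
            raise ValueError(f"Duplicate names are not allowed: {duplicates}")
        return value


team = Team(members=["Tester1", "Tester2"])

try:
    Team(members=["Tester1", "Tester1", "Tester2", "Tester2"])
    has_duplicates = False
except ValidationError:
    has_duplicates = True

print(team.members, has_duplicates)
```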
