# LTP#13: pydantic

`pydantic` is a popular data validation and serialization library powered by Python type hints.

What does that mean and what can Pydantic do for us?

## The problem

Many of you might have written code like this before for a simple data structure:

In [None]:
class Sample:
    def __init__(self, date, parameter, size):
        self.date = date
        self.parameter = parameter
        self.size = size

In [None]:
Sample("now", 0.5, 2000)

Here are some issues with this code:
* Repetitive, tedious *boilerplate* code
* What are *good values* for `date`, `parameter` and `size`?
* How do we control *mutability* of the data?
* What if we later want to write that data to a file/send over a network?
* Many things are undefined, e.g. the `repr` shown here

## Partial solutions

Some of you might have identified this problem and moved to potential solutions.

### namedtuple

In [None]:
import collections

Sample = collections.namedtuple("Sample", ("date", "parameter", "size"))

In [None]:
Sample(date="now", parameter=0.5, size=2000)

* Reduces boilerplate
* Has e.g. a nice `repr`
* All other problems are unsolved

### dataclasses

In [None]:
import dataclasses
from datetime import datetime


@dataclasses.dataclass
class Sample:
    date: datetime
    parameter: float
    size: int

In [None]:
Sample(date="now", parameter=0.5, size=2000)

* Reduces boilerplate drastically
* Gives you control over mutability
* What about *data validation* and *serialization*?
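As a minimal sketch of the points above, the following uses only the standard library. The class name `FrozenSample` is made up for this illustration. It shows that `frozen=True` controls mutability, while the type annotations are not enforced at all:

```python
import dataclasses
from datetime import datetime


@dataclasses.dataclass(frozen=True)
class FrozenSample:  # hypothetical name, used only for this sketch
    date: datetime
    parameter: float
    size: int


# No validation happens: the string "now" is stored as-is in a datetime field.
sample = FrozenSample(date="now", parameter=0.5, size=2000)

# frozen=True makes attribute assignment fail.
try:
    sample.size = 1
    was_rejected = False
except dataclasses.FrozenInstanceError:
    was_rejected = True

# dataclasses.asdict gives a plain dict, but no JSON and no parsing back.
as_dict = dataclasses.asdict(sample)
```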

# pydantic to the rescue!

Here is the same example with a `pydantic` base class:

In [None]:
from pydantic import BaseModel

In [None]:
class Sample(BaseModel):
    date: datetime
    parameter: float
    size: int

In [None]:
Sample(date="now", parameter=0.5, size=2000)

In [None]:
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
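The first call above is rejected because `"now"` is not a valid datetime, while the second one is parsed. A small sketch of how one might inspect such a failure programmatically (the class name `BasicSample` is an assumption made to keep the block self-contained):

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class BasicSample(BaseModel):  # hypothetical name, mirrors Sample above
    date: datetime
    parameter: float
    size: int


try:
    BasicSample(date="now", parameter=0.5, size=2000)
    errors = []
except ValidationError as exc:
    # exc.errors() returns one dict per failing field
    errors = exc.errors()

# A parseable timestamp string is converted to a datetime object.
ok = BasicSample(date="2024-04-19 12:00", parameter=0.5, size=2000)
```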

# Validation in pydantic

Validation in `pydantic` is driven by type annotations. It:
* reads the type annotations
* looks up the validation logic it implements for those types
* automatically converts the input to the annotated type where possible

If the last bit scares you, there is a `strict` mode that rejects values of the wrong type instead of converting them.
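To make the difference concrete, here is a minimal sketch that enables strict mode for a whole model via `ConfigDict(strict=True)`. The model names `LaxSize` and `StrictSize` are invented for this example:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxSize(BaseModel):  # hypothetical model, default (lax) mode
    size: int


class StrictSize(BaseModel):  # hypothetical model, strict mode
    model_config = ConfigDict(strict=True)
    size: int


# Lax mode converts the numeric string to an int.
lax = LaxSize(size="2000")

# Strict mode refuses the string instead of converting it.
try:
    StrictSize(size="2000")
    strict_rejected = False
except ValidationError:
    strict_rejected = True
```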

Constraints that go beyond plain type annotations are available as well:

In [None]:
from pydantic import PositiveInt, confloat

In [None]:
class Sample(BaseModel):
    date: datetime
    parameter: confloat(gt=0.0, lt=1.0)
    size: PositiveInt

In [None]:
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
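The same constraints can also be written with `Annotated` and `Field`, which, to my knowledge, is the spelling the pydantic documentation recommends over `confloat`. The sketch below uses the invented names `UnitInterval` and `AnnotatedSample` and also shows that all failing fields are reported at once:

```python
from datetime import datetime

from pydantic import BaseModel, Field, PositiveInt, ValidationError
from typing_extensions import Annotated

# Hypothetical alias: a float strictly between 0 and 1.
UnitInterval = Annotated[float, Field(gt=0.0, lt=1.0)]


class AnnotatedSample(BaseModel):  # hypothetical name for this sketch
    date: datetime
    parameter: UnitInterval
    size: PositiveInt


try:
    AnnotatedSample(date="2024-04-19 12:00", parameter=1.5, size=-1)
    failed_fields = []
except ValidationError as exc:
    # Collect the names of all fields that violated a constraint.
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
```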

## More validation in pydantic

Validation logic can be customized in interesting ways. The following snippet lets you pass the magic string `"now"` as the date, which is resolved to the current timestamp:

In [None]:
from pydantic import field_validator


class Sample(BaseModel):
    date: datetime
    parameter: confloat(gt=0.0, lt=1.0)
    size: PositiveInt

    @field_validator("date", mode="before")
    @classmethod  # field validators are class methods; being explicit helps linters
    def resolve_now(cls, v):
        if v == "now":
            return datetime.now()
        return v

In [None]:
Sample(date="now", parameter=0.5, size=2000)
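Because the validator runs in `"before"` mode, any value other than `"now"` is handed on unchanged to the regular datetime parsing. A reduced sketch with a single field (the model name `TimestampedEvent` is an assumption) exercises both branches:

```python
from datetime import datetime

from pydantic import BaseModel, field_validator


class TimestampedEvent(BaseModel):  # hypothetical reduced model
    date: datetime

    @field_validator("date", mode="before")
    @classmethod
    def resolve_now(cls, v):
        # Runs on the raw input, before pydantic parses it as a datetime.
        if v == "now":
            return datetime.now()
        return v


before = datetime.now()
event_now = TimestampedEvent(date="now")
after = datetime.now()

# Other inputs fall through to the built-in datetime parsing.
event_fixed = TimestampedEvent(date="2024-04-19 12:00")
```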

## Even more validation in pydantic

Sometimes it is better to attach the validation logic to a type instead, which makes it reusable:

In [None]:
from typing_extensions import Annotated
from pydantic.functional_validators import AfterValidator

In [None]:
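import math

def check_squares(x: int) -> int:
    # math.isqrt is exact for integers; x**0.5 yields a complex number for negative x
    assert x >= 0 and math.isqrt(x) ** 2 == x, f"{x} is not a square number"
    return x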

In [None]:
SquareNumber = Annotated[int, AfterValidator(check_squares)]

In [None]:
class MyModel(BaseModel):
    x: SquareNumber

In [None]:
MyModel(x=4)
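For completeness, here is a sketch of what happens when the check fails. It is self-contained, so it defines its own validator and model under the invented names `check_perfect_square`, `PerfectSquare` and `SquareModel`. The assertion message ends up in the validation error:

```python
import math

from pydantic import BaseModel, ValidationError
from pydantic.functional_validators import AfterValidator
from typing_extensions import Annotated


def check_perfect_square(x: int) -> int:  # hypothetical helper for this sketch
    # Runs after pydantic has already validated x as an int.
    assert x >= 0 and math.isqrt(x) ** 2 == x, f"{x} is not a square number"
    return x


PerfectSquare = Annotated[int, AfterValidator(check_perfect_square)]


class SquareModel(BaseModel):  # hypothetical model
    x: PerfectSquare


try:
    SquareModel(x=5)
    message = ""
except ValidationError as exc:
    # The failed assertion is reported as a regular validation error.
    message = exc.errors()[0]["msg"]
```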

## Validating function arguments

What if we are not building models, but our interface consists of functions instead?

In [None]:
from pydantic import validate_call

In [None]:
@validate_call(validate_return=True)
def square_root(x: SquareNumber) -> int:
    return x**0.5

In [None]:
square_root("4")

Note that this contained two implicit conversions:
* `"4"` -> `4`
* `2.0` -> `2`

Judge for yourself and your application (!) whether that is a good or a bad thing.
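If the implicit conversions are not wanted, `validate_call` accepts a `config` argument, so strict mode can be switched on per function. A minimal sketch with an invented function `double`:

```python
from pydantic import ConfigDict, ValidationError, validate_call


@validate_call(config=ConfigDict(strict=True))
def double(x: int) -> int:  # hypothetical function for this sketch
    return 2 * x


# A real int is accepted as usual.
result = double(4)

# In strict mode the string "4" is no longer converted to 4.
try:
    double("4")
    string_rejected = False
except ValidationError:
    string_rejected = True
```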

## Serialization/Deserialization

In [None]:
s = Sample(date="now", parameter=0.5, size=2000)

Given a sample, we can turn it into a dictionary or a JSON string using `pydantic`:

In [None]:
s.model_dump()

In [None]:
s.model_dump_json()

And we can reconstruct an object from those dumps:

In [None]:
Sample(**s.model_dump())

In [None]:
Sample.model_validate_json(s.model_dump_json())

Such functionality is of key importance in the design of file formats and transmission protocols.
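As a rough sketch of the file-format use case, the following writes a model to a JSON file and reads it back. The model name `StoredSample` and the file name `sample.json` are assumptions made for this example:

```python
import tempfile
from datetime import datetime
from pathlib import Path

from pydantic import BaseModel


class StoredSample(BaseModel):  # hypothetical name, mirrors Sample above
    date: datetime
    parameter: float
    size: int


original = StoredSample(date=datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"  # hypothetical file name
    # Serialize to JSON text and store it on disk.
    path.write_text(original.model_dump_json())
    # Read the text back and validate it into a model again.
    restored = StoredSample.model_validate_json(path.read_text())
```

Since the data is validated again on load, a corrupted or hand-edited file fails loudly instead of producing a half-valid object.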

## Summary and References

`pydantic` is a simple, yet very powerful toolbox that
* allows you to write safer code with fewer bugs
* saves you from a lot of tedious work

### References

* https://docs.pydantic.dev/latest/
* https://github.com/pydantic/pydantic