Advanced Python Crash Course 03: Pydantic
=========================================

# Motivation

## Once upon a time, before the age of Pydantic, developers had to make a choice:

Either handle huge enterprise JSON payloads in their raw form,

Or implement their own abstraction and validation classes.

They usually chose the first option.

This means that when making a fix or a change in some shitty old REST controller, you would first need to wrap your head around code like:

In [None]:
if body.get('products', {}).get('BezeqTatYami', {}).get('provisioned', False) \
    and 'BezeqTatYami' in body.get('users', {}).get(user_from_url, {}).get('products', []):
    ...  # 🤮

Or, implement the fluff yourself, which was rather awkward:

In [None]:
class Customer:
    def __init__(self, products: list[str]):
        self.products = products

    @property
    def products(self) -> list:
        return self._products

    @products.setter
    def products(self, products: list[str]):
        if not products:
            raise ValueError("products field required")
        if not isinstance(products, list):
            raise ValueError("products value is not a valid list")
        for product in products:
            if not isinstance(product, str):
                index = products.index(product)
                raise ValueError(f"products {index} str type expected")
        self._products = products

    @products.deleter
    def products(self):
        del self._products

## In December 2016, Python 3.6 was released
And with it, variable annotations (PEP 526), which completed the _Type Annotations_ story that PEP 484 had started in Python 3.5

Suddenly, you could turn this code:

In [None]:
def div(a, b):
    return a / b

Into this:

In [None]:
def div(a: int, b: int) -> float:
    return a / b  # "/" is true division, so the result is always a float

And turn this code:

In [None]:
class Person:
    def __init__(self, name):
        self.name = name

Into this:

In [None]:
class Person:
    name: str

    def __init__(self, name: str):
        self.name = name

Which means that you can finally tell the user and the editor what the (intended) interface is, instead of forcing them to read the constructor implementation and infer it.

## `__annotations__` _(remember last lesson?)_

In [1]:
class Person:
    def __init__(self, name):
        self.name = name

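In [2]:
hasattr(Person, 'name')  # False: `name` only exists on instances

In [3]:
Person.__annotations__  # {} on Python 3.10+: nothing is annotated

In [4]:
class Person:
    name: str

In [5]:
hasattr(Person, 'name')  # still False: an annotation alone creates no attribute

In [6]:
Person.__annotations__  # {'name': <class 'str'>}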

This information is key. It's exactly enough to build everything we'll talk about today.
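To make that concrete, here is a minimal sketch of how a base class could read `__annotations__` to check constructor arguments. `TinyModel` is a made-up name for illustration; this is not how pydantic is implemented, and it only handles plain classes such as `str` or `int` (no `list[str]`, no defaults, no conversion).

```python
class TinyModel:
    """Toy base class: checks keyword arguments against the class annotations."""

    def __init__(self, **data):
        # __annotations__ maps each declared field name to its declared type
        for field, expected_type in type(self).__annotations__.items():
            if field not in data:
                raise ValueError(f"{field}: field required")
            value = data[field]
            if not isinstance(value, expected_type):
                raise TypeError(
                    f"{field}: expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
            setattr(self, field, value)


class Person(TinyModel):
    name: str
    age: int


bob = Person(name="Bob", age=42)
print(bob.name, bob.age)  # Bob 42

try:
    Person(name="Bob", age="forty-two")
except TypeError as error:
    print(error)  # age: expected int, got str
```

The subclass only declares fields; everything else is driven by the annotations, which is the same idea `BaseModel` takes much further.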

# `pydantic.BaseModel`: an executable interface

A reminder of the boilerplate:

In [None]:
class Customer:
    def __init__(self, products: list[str]):
        self.products = products

    @property
    def products(self) -> list:
        return self._products

    @products.setter
    def products(self, products: list[str]):
        if not products:
            raise ValueError("products field required")
        if not isinstance(products, list):
            raise ValueError("products value is not a valid list")
        for product in products:
            if not isinstance(product, str):
                index = products.index(product)
                raise ValueError(f"products {index} str type expected")
        self._products = products

    @products.deleter
    def products(self):
        del self._products

Can now be written simply as:

In [8]:
from pydantic import BaseModel

class Customer(BaseModel):
    products: list[str]

In [10]:
Customer()  # ValidationError: products required

In [12]:
Customer(products='nice')  # ValidationError: not a valid list

In [14]:
Customer(products=[{"what":"?"}])  # ValidationError: str type expected

### Pydantic gives us:

- abstraction
- type safety (or at least validation)
- easy to define
- easy to use (to consume)
- oriented around OpenAPI: (de)serialization is a first-class citizen
- editor support
- Batman's-butler-like behavior (tell it what you need, and it takes care of the rest)

## Fields

### Required vs Optional

#### Required
- You have to provide a value
- You have to provide the declared type

In [None]:
class Person(BaseModel):
    name: str

In [None]:
Person()  # ValidationError: name is required

In [None]:
Person(name=None)  # ValidationError: none is not allowed

In [None]:
person = Person(name='bob'); person.name = 'Obo'    # all good ✔

#### Optional
- Defaults to `None`, so you don't have to provide a value
- You can explicitly provide `None` as a value

In [15]:
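# Three spellings of the same optional field (pydantic 1.x); pick one:
class Person(BaseModel):
    name: str | None          # Optional without a default still defaults to None

class Person(BaseModel):
    name: str = None          # a None default makes the field Optional

class Person(BaseModel):
    name: str | None = None   # the most explicit spelling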

In [None]:
Person()    # all good ✔

In [None]:
Person(name=None)    # all good ✔

In [None]:
person = Person(name='bob'); person.name = 'Obo'    # all good ✔

#### Required Optional
- You have to provide a value
- You can explicitly provide `None` as a value

In [None]:
class Person(BaseModel):
    name: str | None = ...

In [None]:
Person()  # ValidationError: name is required

In [None]:
Person(name=None)

In [None]:
person = Person(name='bob'); person.name = 'Obo'

### Data Conversion

In [16]:
Person(name=42)
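What happens here: pydantic 1.x tries to convert the input to the declared type instead of rejecting it outright, so the integer `42` ends up as the string `'42'`. A small sketch of that behavior, assuming pydantic 1.x semantics. The `try`/`except` import is an assumption about the environment: it only exists so the cell also runs where pydantic 2.x is installed, which ships the 1.x API as `pydantic.v1`.

```python
try:
    from pydantic.v1 import BaseModel, ValidationError  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel, ValidationError  # plain pydantic 1.x


class Person(BaseModel):
    name: str
    age: int


person = Person(name=42, age="7")
print(repr(person.name))  # '42'  (int converted to str)
print(repr(person.age))   # 7     (numeric string converted to int)

try:
    Person(name="Bob", age="seven")  # cannot be converted to int
except ValidationError as error:
    print(error.errors()[0]["loc"])  # ('age',)
```

Conversion is convenient for JSON and query strings, but it also means the value you read back may not be the value you passed in.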

### `pydantic.Field`

The declared type annotations are "upgraded" to `Field` objects:

In [None]:
from pydantic import Field

class Person(BaseModel):
    name: str = Field()

Is the same as:

In [115]:
class Person(BaseModel):
    name: str

#### `default`

These are all equivalent (three spellings of the same field, shown in one class for brevity):

In [116]:
class Person(BaseModel):
    name: str = None
    name: str = Field(None)
    name: str = Field(default=None)

#### `default_factory`
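You might want to delay evaluation until an instance is created (the factory is called once for every instance that doesn't get an explicit value):

In [133]:
import math

def really_long_calculation():
    return math.sqrt(math.inf)  # inf, which is a float

class God(BaseModel):
    age: float = Field(default_factory=really_long_calculation)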

#### OpenAPI

`Field` is one of the main places where pydantic meets OpenAPI. 

The `Field` constructor can take many custom rules to follow, if you need to tighten up your schema. 

In [None]:
def Field(
    default: Any = Undefined,
    *,
    default_factory: NoArgAnyCallable | None = None,
    
    # OpenAPI fluff:
    alias: str = None,
    title: str = None,
    description: str = None,
    
    # Serialization:
    exclude: 'AbstractSetIntStr' | 'MappingIntStrAny' | Any = None,
    include: 'AbstractSetIntStr' | 'MappingIntStrAny' | Any = None,
    const: bool = None,

    # Validation:
    gt: float = None,
    ge: float = None,
    lt: float = None,
    le: float = None,
    multiple_of: float = None,
    allow_inf_nan: bool = None,
    max_digits: int = None,
    decimal_places: int = None,
    min_items: int = None,
    max_items: int = None,
    unique_items: bool = None,  # cool
    min_length: int = None,
    max_length: int = None,
    allow_mutation: bool = True,
    regex: str = None,  # cool
    discriminator: str = None,
    repr: bool = True,
    **extra: Any,
)
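A short sketch of a few of those validation arguments in action. `Product` and its fields are hypothetical names chosen for illustration, and the code assumes the pydantic 1.x API (imported from `pydantic.v1` when pydantic 2.x is installed).

```python
try:
    from pydantic.v1 import BaseModel, Field, ValidationError  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel, Field, ValidationError  # plain pydantic 1.x


class Product(BaseModel):
    # Hypothetical model: every rule lives next to the field it constrains
    sku: str = Field(regex=r"^[A-Z]{3}-\d{4}$", description="Stock keeping unit")
    price: float = Field(gt=0)
    quantity: int = Field(1, ge=1, le=100)


ok = Product(sku="ABC-1234", price=9.99)
print(ok.quantity)  # 1 (the default)

try:
    Product(sku="abc", price=-1, quantity=500)
except ValidationError as error:
    # One entry per failing field, each with a location and a message
    for item in error.errors():
        print(item["loc"], item["msg"])
```

All three violations are reported in a single `ValidationError`, so the caller sees every problem at once instead of fixing them one by one.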

## `BaseModel` Instance Methods

In [17]:
from datetime import datetime

class Person(BaseModel):
    name: str
    date_of_birth: datetime = None
    hungry: bool = True


p = Person(name="Bob", date_of_birth=datetime(1990, 1, 1))
p

### `model.json(...) -> str`

In [18]:
p.json()

__`include` and `exclude`__

In [22]:
p.json(exclude={"name"})

In [23]:
p.json(include={"name"})

__`exclude_defaults`: Everything that has the same value as the default value__

(Whether you set it that way or it just defaulted this way)

In [24]:
Person(name="Bob", 
       date_of_birth=None,  # excluded
       hungry=True          # excluded
      ).json(exclude_defaults=True)

In [25]:
Person(name="Bob", 
       date_of_birth=None,  # excluded
       hungry=False
      ).json(exclude_defaults=True)

In [26]:
Person(name="Bob", 
       date_of_birth=None , # excluded
                            # hungry is excluded
      ).json(exclude_defaults=True)

__`exclude_unset`: The fields you haven't explicitly provided to the constructor__

(regardless of their value)

In [27]:
Person(name="Bob", 
       date_of_birth=None,  # included
       hungry=True          # included
      ).json(exclude_unset=True)

In [28]:
Person(name="Bob", 
       date_of_birth=None,  # included
                            # hungry is excluded
      ).json(exclude_unset=True)

__`exclude_none`: Everything that is, well, `None`__

(regardless of what you've provided or what's the default)

In [29]:
Person(name="Bob", 
       date_of_birth=None,  # excluded
       hungry=True
      ).json(exclude_none=True)

In [30]:
Person(name="Bob", 
                          # date_of_birth excluded because defaults to None               
       hungry=True
      ).json(exclude_none=True)

In [31]:
Person(name="Bob", 
       date_of_birth=None,  # excluded
                            # hungry is included!
      ).json(exclude_none=True)

__`by_alias`:__

In [34]:
class Person(BaseModel):
    age: int = Field(alias="years_on_earth")
    
Person(years_on_earth=42).json()               # '{"age": 42}'
Person(years_on_earth=42).json(by_alias=True)  # '{"years_on_earth": 42}'

__`encoder`: an ad-hoc plan B `(field) -> str` function. Not as good as defining it on the class' `Config`.__

### `model.dict(...) -> dict`

Same arguments as `.json(...)`. Just returns a dict.

Cool thing I learned yesterday: iterating over a model yields `(field, value)` pairs:

In [35]:
class Person(BaseModel):
    name: str
    age: int
    
for field, value in Person(name='Bob', age=2):
    print(f'field: {field} |', f'value: {value}')

field: name | value: Bob
field: age | value: 2


### `model.copy(...) -> BaseModel`

In [None]:
Person(name='Bob', age=2).copy()

In [None]:
Person(name='Bob', age=2).copy(include={"name"})

In [None]:
Person(name='Bob', age=2).copy(exclude={"name"})

## `BaseModel` Class Methods

### `BaseModel.parse_raw('...') -> BaseModel`

In [37]:
Person.parse_raw('{"name": "Bob", "age": 2}')

Implementation pseudo-code:
```python
class BaseModel:
    @classmethod
    def parse_raw(cls, stringified: str):
        # json_loads comes from the model's Config (stdlib json.loads by default)
        data: dict = cls.__config__.json_loads(stringified)
        return cls.parse_obj(data)
```

### `BaseModel.parse_obj(obj: Any) -> BaseModel`

Takes a dict, or anything `dict(...)` can consume, such as another model:

In [38]:
class Person(BaseModel):
    name: str
    can_walk: bool

class Baby(Person):
    can_walk = False

ami = Baby(name='ami')
Person.parse_obj(ami)

Implementation pseudo-code:
```python
class BaseModel:
    @classmethod
    def parse_obj(cls, obj: Any) -> BaseModel:
        return cls(**dict(obj))
```
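Put together, the two pseudo-code snippets say that `parse_raw` is just "decode the string, then `parse_obj`". A quick sketch to check that claim, assuming the pydantic 1.x API (taken from `pydantic.v1` when pydantic 2.x is installed):

```python
import json

try:
    from pydantic.v1 import BaseModel  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel  # plain pydantic 1.x


class Person(BaseModel):
    name: str
    age: int


raw = '{"name": "Bob", "age": 2}'

from_string = Person.parse_raw(raw)            # str -> dict -> model
from_dict = Person.parse_obj(json.loads(raw))  # dict -> model

print(from_string == from_dict)  # True
```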

### `BaseModel.parse_file(path: Path) -> BaseModel` 

### `BaseModel.from_orm(other_obj: Any) -> BaseModel` 

In [39]:
class Person(BaseModel):
    name: str
    can_walk: bool
    
    class Config:
        orm_mode=True
    
class Alien:   # Not BaseModel!
    def __init__(self, name, can_walk):
        self.name = name
        self.can_walk = can_walk
    
    
zorg = Alien(name='zorg', can_walk=True)
Person.from_orm(zorg)

### `BaseModel.schema(...) -> dict`
The JSON Schema of the model (the format OpenAPI uses to describe payloads).

In [40]:
class Person(BaseModel):
    name: str

Person.schema()  # a classmethod: no instance needed

### `BaseModel.schema_json(...) -> str`
The same JSON Schema, stringified.

In [41]:
class Person(BaseModel):
    name: str

Person.schema_json()
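This is also where the `Field` arguments from earlier pay off: they are copied into the generated schema. A minimal sketch, assuming the pydantic 1.x API (imported from `pydantic.v1` when pydantic 2.x is installed); the field rules are made up for illustration.

```python
try:
    from pydantic.v1 import BaseModel, Field  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel, Field  # plain pydantic 1.x


class Person(BaseModel):
    name: str = Field(description="Full name", max_length=50)
    age: int = Field(0, ge=0)


schema = Person.schema()
print(schema["required"])            # ['name']  (age has a default)
print(schema["properties"]["name"])  # includes 'description' and 'maxLength'
print(schema["properties"]["age"])   # includes 'default' and 'minimum'
```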

# Validators

Validators are one way to add custom validation to a field.

In [44]:
# Straight from the docs
from pydantic import validator

class Person(BaseModel):
    name: str

    @validator('name')
    def name_alphanumeric(cls, name):
        assert name.isalnum(), 'must be alphanumeric'
        return name
    
Person(name='Bob!')  # ValidationError: name must be alphanumeric

With the `values` argument, we can access the fields that were declared earlier in the class and have already passed validation:

In [45]:
class Person(BaseModel):
    first_name: str
    last_name: str

    @validator('last_name')
    def last_name_is_different_than_first_name(cls, last_name, values: dict):
        assert last_name != values['first_name'], 'last_name must be different than first_name'
        return last_name


Person(first_name='Bob', last_name='Bob')  # ValidationError

In [None]:
class Person(BaseModel):
    funny_numbers: list[int]

    @validator('funny_numbers', each_item=True)
    def each_number_is_positive(cls, number):
        print(number)
        assert isinstance(number, int) and number > 0, "must be a positive int"
        return number


Person(funny_numbers=[1, 59, -1])
# 1
# 59
# -1
# ValidationError: funny_numbers[2] must be a positive int

In [None]:
class Person(BaseModel):
    funny_numbers: list[int]
    friends: list[str]

    @validator('funny_numbers', 'friends')
    def make_sure_has_friends_even_though_loves_funny_numbers(cls, field):
        assert field, 'all fields must be truthy'
        return field


Person(funny_numbers=[-1, 59, 707], friends=[])  # ValidationError

In [None]:
class Person(BaseModel):
    funny_numbers: list[int]
    friends: list[str]

    @validator('*')
    def its_2_40_am(cls, field):
        assert field, 'all fields must be truthy'
        return field


Person(funny_numbers=[-1, 59, 707], friends=[])  # ValidationError

And weirdly, they're also transformers: whatever the validator returns becomes the field's value. Forget to `return` and the field is set to `None`!

In [167]:
class YellingPerson(BaseModel):
    name: str
    
    @validator('name')
    def uppercase(cls, name):
        return name.upper()
    
YellingPerson(name='Bob')
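The `None` pitfall is easy to reproduce. A small sketch, assuming the pydantic 1.x API (imported from `pydantic.v1` when pydantic 2.x is installed); `ForgetfulPerson` is a made-up counter-example.

```python
try:
    from pydantic.v1 import BaseModel, validator  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel, validator  # plain pydantic 1.x


class YellingPerson(BaseModel):
    name: str

    @validator("name")
    def uppercase(cls, name):
        return name.upper()  # the returned value is what gets stored


class ForgetfulPerson(BaseModel):
    name: str

    @validator("name")
    def uppercase(cls, name):
        name.upper()  # bug: the result is discarded and nothing is returned


print(repr(YellingPerson(name="Bob").name))    # 'BOB'
print(repr(ForgetfulPerson(name="Bob").name))  # None
```

No error is raised in the second case, which is what makes the missing `return` so easy to overlook.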

# Config

## Serialization config options

### `json_dumps: ( object_dict: dict, *, default ) -> str`
Defaults to stdlib `json.dumps`

In [None]:
import json


def yell(object_dict: dict, *, default: callable) -> str:
    loud_dict = {}
    for key, value in object_dict.items():
        loud_dict[key] = str(value).upper() + "!!!"
    return json.dumps(loud_dict, default=default)

    

class Robot(BaseModel):
    name: str
    date_of_birth: datetime
    
    class Config:
        json_dumps = yell


Robot(name='r2d2',
      date_of_birth=datetime(1977, 1, 1)
     ).json()

### `json_loads: ( stringified_object: str ) -> dict`
Defaults to stdlib `json.loads`

In [None]:
def yell(object_dict: dict, *, default: callable) -> str:
    loud_dict = {}
    for key, value in object_dict.items():
        loud_dict[key] = str(value).upper() + "!!!"
    return json.dumps(loud_dict, default=default)


def everyone_calm_down(stringified_object: str) -> dict:
    calm: str = stringified_object.lower().replace("!", "")
    return json.loads(calm)


class Robot(BaseModel):
    name: str
    date_of_birth: datetime

    class Config:
        json_dumps = yell
        json_loads = everyone_calm_down


stringified = Robot(name="r2d2", date_of_birth=datetime(1977, 1, 1)).json()
Robot.parse_raw(stringified)

### `json_encoders: dict[type, callable]`
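Consulted when `Config.json_dumps` meets a value it can't serialize on its own. Natively serializable types such as `str` never reach the encoders.

Types you don't list fall back to pydantic's built-in encoder, which handles more types than the stdlib `json.dumps` does.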

In [None]:
class Robot(BaseModel):
    name: str
    date_of_birth: datetime
    
    class Config:
        json_encoders = {str:  str.upper,              # ✘ Ignored!
                         datetime: datetime.timestamp  # ✔ Used
                        }


Robot(name='r2d2',                        # Not uppercase
      date_of_birth=datetime(1977, 1, 1)  # 220917600.0 (depends on the local timezone)
     ).json()

In [None]:
class JamesBond(BaseModel):    
    class Config:
        json_dumps = lambda object_dict, default=None: "Bond... James Bond"
        

class Robot(BaseModel):
    name: str
    date_of_birth: datetime
    owner: JamesBond
    
    class Config:
        json_encoders = {str:  str.upper,               # ✘ Ignored
                         datetime: datetime.timestamp,  # ✔ Used
                         JamesBond: JamesBond.json      # ✔ Used because models_as_dict=False
                        }


Robot(name='r2d2',                         
      date_of_birth=datetime(1977, 1, 1),
      owner=JamesBond()                    # really stupid example
     ).json(models_as_dict=False)

## Controlling strictness

### `validate_all`
Whether to validate field defaults (default: `False`)

In [None]:
class Person(BaseModel):
    age: int = "Hello"

Person()  # all good ✔

In [None]:
class Person(BaseModel):
    age: int = "Hello"
    
    class Config:
        validate_all = True

Person()  # ValidationError: not a valid integer

### `validate_assignment`
Whether to validate a field again when it is assigned after construction (default: `False`)

In [None]:
class Person(BaseModel):
    age: int

p = Person(age=42)
p.age = "Hello"  # all good ✔

In [None]:
class Person(BaseModel):
    age: int
    
    class Config:
        validate_assignment = True

p = Person(age=42)
p.age = "Hello"  # ValidationError: not a valid integer

### `extra: Extra`
Default `Extra.ignore`

In [87]:
class Person(BaseModel):
    age: int

Person(age=42, what="no idea")  # Person(age=42): `what` is silently dropped

In [88]:
from pydantic import Extra
class Person(BaseModel):
    age: int
    
    class Config:
        extra = Extra.allow

Person(age=42, what="no idea")  # Person(age=42, what='no idea'): `what` is kept

In [None]:
from pydantic import Extra
class Person(BaseModel):
    age: int
    
    class Config:
        extra = Extra.forbid

Person(age=42, what="no idea")  # ValidationError: extra fields not permitted

### `allow_mutation`
Default `True`

In [91]:
class Person(BaseModel):
    age: int

p = Person(age=42)
p.age = 100  # all good ✔

In [None]:
class Person(BaseModel):
    age: int
    
    class Config:
        allow_mutation = False

p = Person(age=42)
p.age = 100  # TypeError: "Person" is immutable and does not support item assignment

## Other useful configurations

### `use_enum_values`
Default: `False`

In [102]:
from enum import Enum

class Food(Enum):
    japo = 'japo'
    sari = 'sari'

class Elad(BaseModel):
    ma_ochlim_haiom: Food


Elad(ma_ochlim_haiom='japo').dict()

In [103]:
from enum import Enum

class Food(Enum):
    japo = 'japo'
    sari = 'sari'

class Elad(BaseModel):
    ma_ochlim_haiom: Food
    
    class Config:
        use_enum_values = True

Elad(ma_ochlim_haiom='japo').dict()

### `orm_mode`
Like we've seen before:

In [106]:
class Person(BaseModel):
    name: str
    can_walk: bool
    
    class Config:
        orm_mode=True
    
class Alien:   # Not BaseModel!
    def __init__(self, name, can_walk):
        self.name = name
        self.can_walk = can_walk
    
    
zorg = Alien(name='zorg', can_walk=True)
Person.from_orm(zorg)

### `arbitrary_types_allowed`
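Lets a field be annotated with any class, not just the types pydantic knows how to validate. Values are then only checked with `isinstance`:

In [None]:
class MyCoolStringComposition:
    def __init__(self, string):
        self._string = string

# The class definition itself fails, before any instance is created:
# RuntimeError: no validator found ..., see `arbitrary_types_allowed` in Config
class Person(BaseModel):
    name: MyCoolStringComposition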

In [114]:
class MyCoolStringComposition:
    def __init__(self, string):
        self._string = string

class Person(BaseModel):
    name: MyCoolStringComposition
    
    class Config:
        arbitrary_types_allowed = True
    

Person(name=MyCoolStringComposition('bob'))    # all good ✔

### `smart_union`
Default: `False`, which means trial and error: the first type in the union that manages to validate the value wins.

In [104]:
class Person(BaseModel):
    personal_id: str | int


Person(personal_id=1234)

In [105]:
class Person(BaseModel):
    personal_id: str | int
    
    class Config:
        smart_union = True


Person(personal_id=1234)
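The difference between the two cells above shows up in the type of the stored value. A sketch of both behaviors side by side, assuming the pydantic 1.x API (imported from `pydantic.v1` when pydantic 2.x is installed). `FirstMatch` and `BestMatch` are illustrative names, and `Union[str, int]` is used as the equivalent spelling of `str | int`.

```python
from typing import Union

try:
    from pydantic.v1 import BaseModel  # pydantic 2.x: bundled 1.x API
except ImportError:
    from pydantic import BaseModel  # plain pydantic 1.x


class FirstMatch(BaseModel):
    # Default behavior: str is tried first, and 1234 converts to '1234'
    personal_id: Union[str, int]


class BestMatch(BaseModel):
    # smart_union: an exact type match is preferred over conversion
    personal_id: Union[str, int]

    class Config:
        smart_union = True


print(repr(FirstMatch(personal_id=1234).personal_id))  # '1234'
print(repr(BestMatch(personal_id=1234).personal_id))   # 1234
```

Without `smart_union`, the order of the types in the union decides the outcome, so `int | str` and `str | int` behave differently.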

# Settings

Pydantic offers a `BaseSettings` class, that interacts with environment variables, `.env` files and file secrets.

### `.env` files

In [169]:
from pydantic import BaseSettings

class Settings(BaseSettings):
    password: str
    
    class Config:
        env_file = '.env'

In [174]:
!echo 'password=1234' > .env

In [175]:
!cat .env

password=1234


In [176]:
Settings()

### Environment variables (take precedence over .env)

In [179]:
%env password=4321

env: password=4321


In [180]:
Settings()

### Support for multiple `.env` files

In [200]:
class Settings(BaseSettings):
    api_key: str
    
    class Config:
        env_file = '.env', '.env.prod'  # .env.prod wins

In [193]:
!echo 'api_key=9999' > .env.prod

In [194]:
!echo 'api_key=8888' > .env

In [195]:
!cat .env.prod

api_key=9999


In [196]:
!cat .env

api_key=8888


In [199]:
Settings()

### `BaseSettings` is a `BaseModel` subclass, so you could do dynamic evaluation, e.g:

In [None]:
import os

class Settings(BaseSettings):
    prod_db_write_creds: str

    class Config:
        env_file = '.env'

    @validator('prod_db_write_creds')
    def get_prod_db_write_creds(cls, prod_db_write_creds):
        if os.getlogin() == 'gilad':
            return prod_db_write_creds  # the real thing
        return 'dev-demo-user.aws.qa'

# Exception Handling

### Pydantic's custom errors are nice wrappers with features

They have methods

They have attributes

They can be inherited from to create very informative errors

In [None]:
from pydantic import PydanticValueError, BaseModel, validator, ValidationError


class EmptyName(PydanticValueError):
    code = 'empty_name'
    msg_template = 'Name cannot be empty. Got "{name}" while building {cls.__name__}'



class Person(BaseModel):
    name: str

    @validator('name')
    def must_have_value(cls, name):
        if not name:
            raise EmptyName(name=name, cls=cls)
        return name


try:
    Person(name='')
except ValidationError as e:
    for error in e.errors():
        for key, value in error.items():
            print(f'{key}: {value}')
    
    print('raw_errors: ', e.raw_errors)
    print('json: ', e.json())