### Objective

In this notebook we would explore the `base model` that is very important from the standpoint of defining the basic constructs of the `Pydantic Data Models`. The `Pydantic` library does the following:
- Data Model: helps define the basic data attributes and their behavior, models are associated with web applications and AI models.
- Advanced features: Type Check, Serialization and de-serialization of data
- Fun fact: its used in FastAPI

In [1]:
from pydantic import BaseModel

In [2]:
class Person(BaseModel):
    first_name: str
    last_name: str
    age: int

In [3]:
p = Person(first_name="Issac", last_name="Newton", age=84)

In [4]:
p

Person(first_name='Issac', last_name='Newton', age=84)

Prior to `Pydantic` we had `Data Classes` and these were very strictly. In `Pydantic` we can still allow the instance to be defined with data types that can at least be converted to the original defined data type.

In [6]:
p = Person(first_name='100', last_name='200', age='30')

In [7]:
p

Person(first_name='100', last_name='200', age=30)

In [14]:
p = Person(first_name='100', last_name='200', age=30.0)

In the new version of `Pydantic`, we cannot:
- convert string type to int type or vice versa
- convert float type with non zero decomal type to int type.

One of the errors that encounter here is called `ValidationError`, which we could try and catch. Also by default the field are considered required of nothing is defined.

In [16]:
try:
    Person(first_name="Johny")
except Exception as ex:
    print(ex)

2 validation errors for Person
last_name
  Field required [type=missing, input_value={'first_name': 'Johny'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing
age
  Field required [type=missing, input_value={'first_name': 'Johny'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing


In [17]:
from pydantic import ValidationError

try:
    Person(first_name="Johny")
except ValidationError as ex:
    print(ex)

2 validation errors for Person
last_name
  Field required [type=missing, input_value={'first_name': 'Johny'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing
age
  Field required [type=missing, input_value={'first_name': 'Johny'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing


In [21]:
from pprint import pprint
try:
    Person(first_name="Johny")
except ValidationError as ex:
    pprint(ex.json())

('[{"type":"missing","loc":["last_name"],"msg":"Field '
 'required","input":{"first_name":"Johny"},"url":"https://errors.pydantic.dev/2.8/v/missing"},{"type":"missing","loc":["age"],"msg":"Field '
 'required","input":{"first_name":"Johny"},"url":"https://errors.pydantic.dev/2.8/v/missing"}]')


We could make one of the  attributes `optional`. There are two ways of doing this.

In [27]:
class Person(BaseModel):
    first_name: str
    last_name: str = ""
    age: int = None

In [26]:
Person(first_name="Issac", last_name="Newton")

Person(first_name='Issac', last_name='Newton', age=None)

In [28]:
Person(first_name="Issac")

Person(first_name='Issac', last_name='', age=None)

In [33]:
from typing import Optional

class Person(BaseModel):
    first_name: str
    last_name: Optional[str]
    age: Optional[int] = None

#### Serialization strategy

**Note:** Convert `Person` pydantic class to a dictionary data type or `JSON` data type.

In [34]:
p = Person(first_name="Issac", last_name="Newton")

In [35]:
p.dict()

{'first_name': 'Issac', 'last_name': 'Newton', 'age': None}

In [36]:
p.json()

'{"first_name":"Issac","last_name":"Newton","age":null}'

In [37]:
p.dict(exclude=['age'])

{'first_name': 'Issac', 'last_name': 'Newton'}

In [42]:
print(p.model_dump_json(include=['first_name', 'last_name'], indent=4))

{
    "first_name": "Issac",
    "last_name": "Newton"
}


Earlier in `pydantic<v2`, we could easily pass indent to the arguments of `json` function now we cannot do that. However, we could use the function `model_dump_json`.

When we pass `indent` in `json` function, we get the following error.

```TypeError: `dumps_kwargs` keyword arguments are no longer supported.```

#### Deserialization strategy

We could deserialize data in `Pydantic`. Here, the package simply tries to map the attributes to the closest data type.

In [44]:
from datetime import date

In [45]:
class Person(BaseModel):
    first_name: str
    last_name: str
    dob: date
    bmi: float

In [51]:
# dictionary with complex data-types
data = {
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":date(1987,1,9),
    "bmi":20.5
}

In [52]:
p = Person.parse_obj(data)

In [53]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 1, 9), bmi=20.5)

In [54]:
data = {
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":date(1987,1,9),
    "bmi":20
}

In [55]:
p = Person.parse_obj(data)

In [56]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 1, 9), bmi=20.0)

In [81]:
data = {
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":"1987-09-01", # ISO date format yyyy-MM-dd 
    "bmi":20
}

In [62]:
p = Person.parse_obj(data)

In [63]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0)

In [66]:
json = '''
{
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":"1987-09-01",
    "bmi":20
}
'''

In [67]:
p = Person.parse_raw(json)

In [68]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0)

In python, we use `snake-casing` but in JSON we use `camel-casing`. So, we use the `Field` module in `pydantic` that has `alias` attribute where we could allow this inter-operability.

In [69]:
from pydantic import Field

In [82]:
class Person(BaseModel):
    first_name: str = Field(alias='firstName')
    last_name: str = Field(alias='lastName')
    dob: date = None
    bmi: float = 0.0

In [83]:
data

{'first_name': 'Issac', 'last_name': 'Newton', 'dob': '1987-09-01', 'bmi': 20}

In [84]:
try:
    Person.parse_obj(data)
except ValidationError as ex:
    print(ex.json())

[{"type":"missing","loc":["firstName"],"msg":"Field required","input":{"first_name":"Issac","last_name":"Newton","dob":"1987-09-01","bmi":20},"url":"https://errors.pydantic.dev/2.8/v/missing"},{"type":"missing","loc":["lastName"],"msg":"Field required","input":{"first_name":"Issac","last_name":"Newton","dob":"1987-09-01","bmi":20},"url":"https://errors.pydantic.dev/2.8/v/missing"}]


In [85]:
data = {
    "firstName":"Issac",
    "lastName":"Newton", 
    "dob":"1987-09-01", # ISO date format yyyy-MM-dd 
    "bmi":20
}

In [86]:
try:
    Person.parse_obj(data)
except ValidationError as ex:
    print(ex.json())

In [87]:
p = Person.parse_obj(data)
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0)

In [88]:
p.json()

'{"first_name":"Issac","last_name":"Newton","dob":"1987-09-01","bmi":20.0}'

Surprisingly enough, when we are converting the `JSON` or dictionary to `Pydantic` object, we need to use the`alias` argument in Field submodule. But when we serialize this object again, we get back the key names in snake-casing format, which is not the `alias` name but the `Field` name.

In [89]:
p.dict()

{'first_name': 'Issac',
 'last_name': 'Newton',
 'dob': datetime.date(1987, 9, 1),
 'bmi': 20.0}

Now, the problem with the above approach is tht we could only use `alias` name to create the `Pydantic` object. But if we want to intercha nbly use both `alias` name and `field name` to  create the object.

In [105]:
from pydantic import ConfigDict

class Person(BaseModel):
    first_name: str = Field(alias='firstName')
    last_name: str = Field(alias='lastName')
    dob: date = None
    bmi: float = 0.0

    model_config = ConfigDict(
        populate_by_name=True,
    )

In [106]:
data = {
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":"1987-09-01", 
    "bmi":20
}

In [107]:
Person.parse_obj(data)

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0)

In [108]:
p = Person.parse_obj(data)

In [109]:
p.json()

'{"first_name":"Issac","last_name":"Newton","dob":"1987-09-01","bmi":20.0}'

In [110]:
p.dict()

{'first_name': 'Issac',
 'last_name': 'Newton',
 'dob': datetime.date(1987, 9, 1),
 'bmi': 20.0}

In [111]:
p.dict(by_alias=True)

{'firstName': 'Issac',
 'lastName': 'Newton',
 'dob': datetime.date(1987, 9, 1),
 'bmi': 20.0}

In [112]:
p.json(by_alias=True)

'{"firstName":"Issac","lastName":"Newton","dob":"1987-09-01","bmi":20.0}'

#### Including extra fields

In [114]:
data_extra = {
    "first_name":"Issac",
    "last_name":"Newton", 
    "dob":"1987-09-01", 
    "bmi":20,
    "extra": "This has extra information."
}    

In [115]:
p = Person.parse_obj(data_extra)

In [116]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0)

The default behaviour is to just ignore the extra field. But there are three possibilities of handling extra fields ->

- Ignore the field
- Add the field
- Raise an error

In [119]:
from pydantic import Extra

In [122]:
class Person(BaseModel, extra='allow'):
    first_name: str = Field(alias='firstName')
    last_name: str = Field(alias='lastName')
    dob: date = None
    bmi: float = 0.0

    model_config = ConfigDict(
        populate_by_name=True,
    )

In [123]:
p = Person.parse_obj(data_extra)

In [124]:
p

Person(first_name='Issac', last_name='Newton', dob=datetime.date(1987, 9, 1), bmi=20.0, extra='This has extra information.')

In [125]:
p.dict()

{'first_name': 'Issac',
 'last_name': 'Newton',
 'dob': datetime.date(1987, 9, 1),
 'bmi': 20.0,
 'extra': 'This has extra information.'}

In [127]:
print(p.json())

{"first_name":"Issac","last_name":"Newton","dob":"1987-09-01","bmi":20.0,"extra":"This has extra information."}


In [130]:
class Person(BaseModel, extra='forbid'):
    first_name: str = Field(alias='firstName')
    last_name: str = Field(alias='lastName')
    dob: date = None
    bmi: float = 0.0

    model_config = ConfigDict(
        populate_by_name=True,
    )

In [132]:
try:
    p = Person.parse_obj(data_extra)
except Exception as ex:
    print(ex)

1 validation error for Person
extra
  Extra inputs are not permitted [type=extra_forbidden, input_value='This has extra information.', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden


What of we would want to snake casing for a huge number of fields a once? we would create a fucntion and then we would basically use that

In [133]:
def snake_to_camel_case(value: str) -> str:
    if not isinstance(value, str):
        raise ValueError("Value must be a string")
    words = value.split("_")
    value = "".join(word.title() for word in words)
    return f"{value[0].lower()}{value[1:]}"

In [136]:
class Person(BaseModel):
    first_name: str = None
    last_name: str
    dob: date = None
    bmi: float = 0.0

    model_config = ConfigDict(
        populate_by_name=True,
        alias_generator=snake_to_camel_case,
    )

In [139]:
p = Person(firstName="Issac", lastName="Newton")

In [140]:
p

Person(first_name='Issac', last_name='Newton', dob=None, bmi=0.0)

#### Field Constraints

In [142]:
from pydantic import conint

class Real_Person(Person):
    age: conint(gt=0, le=150)

In [145]:
Real_Person(first_name="Polpot", last_name="Verde", age=10)

Real_Person(first_name='Polpot', last_name='Verde', dob=None, bmi=0.0, age=10)

In [169]:
from pydantic import constr

class Real_Person(Person):
    first_name: str = None
    last_name: constr(strip_whitespace=True, strict=True, min_length=2, max_length=30, pattern=r'[A-Z][a-z]+')

In [170]:
Person(first_name="100", last_name="    Newton   ", age=10)

Person(first_name='100', last_name='    Newton   ', dob=None, bmi=0.0)

In [171]:
Real_Person(first_name="100", last_name="    Newton    ", age=10)

Real_Person(first_name='100', last_name='Newton', dob=None, bmi=0.0)

In [173]:
try:
    Real_Person(first_name="100", last_name="200", age=10)
except ValidationError as ex:
    print(ex)

1 validation error for Real_Person
last_name
  String should match pattern '[A-Z][a-z]+' [type=string_pattern_mismatch, input_value='200', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/string_pattern_mismatch


#### Custom Validators

In [177]:
from pydantic import field_validator

In [188]:
class Real_Person(Person):
    hash_tag: str

    @field_validator('hash_tag')
    def validate_hash_tag(cls, value):
        if not value.startswith("#"):
            raise ValueError("Hash tag must start with a #")
        return value

In [189]:
Real_Person(hash_tag="#phycist", first_name="Issac", last_name="Newton")

Real_Person(first_name='Issac', last_name='Newton', dob=None, bmi=0.0, hash_tag='#phycist')

In [192]:
class Real_Person(Person):
    hash_tag: constr(min_length=5, strip_whitespace=True)

    @field_validator('hash_tag')
    def validate_hash_tag(cls, value):
        if not value.startswith("#"):
            return f"#{value.lower()}"
        return value.lower()

We can actually set constraints as well as validation rules for the `Pydantic` class.

In [193]:
Real_Person(hash_tag="#phycist", first_name="Issac", last_name="Newton")

Real_Person(first_name='Issac', last_name='Newton', dob=None, bmi=0.0, hash_tag='#phycist')

In [194]:
Real_Person(hash_tag="phycist", first_name="Issac", last_name="Newton")

Real_Person(first_name='Issac', last_name='Newton', dob=None, bmi=0.0, hash_tag='#phycist')

In [195]:
from enum import Enum
from typing import List, Tuple, Union

In [196]:
class PolygonType(Enum):
    trigon = 3
    tetragon = 4
    pentagon = 5
    hexagon = 6

In [197]:
class CustomBaseModel(BaseModel):

    model_config = ConfigDict(
        populate_by_name=True,
        alias_generator=snake_to_camel_case,
    )

When we have multiple variables and validation has to be applied on one variable but it depends on another variable for it's validation then we have to use values.

In [206]:
from pydantic import ValidationInfo

class PolygonModel(CustomBaseModel):
    polygon_type: PolygonType
    vertices: List[Tuple[Union[int,float], Union[int,float]]]

    @field_validator('vertices')
    def validate_vertices(cls, value, values: ValidationInfo):
        polygon_type = values.data['polygon_type']
        if polygon_type:
            num_vertices_required = polygon_type.value
            if len(value) != num_vertices_required:
                raise ValueError(
                    f"For a {polygon_type.name}, we exactly need {polygon_type.value} vertices."
                )
        return value

In [207]:
PolygonModel(polygon_type=PolygonType.trigon, vertices=[(1,1), (2,2), (3,3)])

PolygonModel(polygon_type=<PolygonType.trigon: 3>, vertices=[(1, 1), (2, 2), (3, 3)])

In [209]:
try:
    PolygonModel(polygon_type=PolygonType.trigon, vertices=[(1,1), (2,2), (3,3), (4,4)])
except ValidationError as ex:
    print(ex)

1 validation error for PolygonModel
vertices
  Value error, For a trigon, we exactly need 3 vertices. [type=value_error, input_value=[(1, 1), (2, 2), (3, 3), (4, 4)], input_type=list]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error


In [210]:
try:
    PolygonModel(polygon_type=PolygonType.trigon, vertices=[(1,1), (2,2+2j), (3,3), (4,4)])
except ValidationError as ex:
    print(ex)

2 validation errors for PolygonModel
vertices.1.1.int
  Input should be a valid integer [type=int_type, input_value=(2+2j), input_type=complex]
    For further information visit https://errors.pydantic.dev/2.8/v/int_type
vertices.1.1.float
  Input should be a valid number [type=float_type, input_value=(2+2j), input_type=complex]
    For further information visit https://errors.pydantic.dev/2.8/v/float_type


#### Nested Models

Here we will start with the smaller constructs. And then move to the composite constructs.

In [242]:
# author construct
class Author(CustomBaseModel):
    first_name: constr(min_length=1, max_length=20, strip_whitespace=True)
    last_name: constr(min_length=1, max_length=20, strip_whitespace=True)
    display_name: Optional[str] =  Field(min_length=1, max_length=50, default = None, validate_default=True)

    # always = True forces the validator to run, even if display_name is None, this
    # is how we can set a dynamic default value
    @field_validator("display_name")  
    def validate_display_name(cls, value, values):
        # validator runs, even if previous fields did not validate properly - so 
        # we will need to run our code only if prior fields validated OK.
        if not value and 'first_name' in values.data.keys() and 'last_name' in values.data.keys():
            first_name = values.data.get('first_name')
            last_name = values.data.get('last_name')
            return f"{first_name} {(last_name[0]).upper()}"
        return value

In [243]:
# link construct
from pydantic import AnyHttpUrl

class Link(CustomBaseModel):
    name: constr(min_length=5, max_length=25)
    url: AnyHttpUrl

In [244]:
Author(first_name="Gottfried", last_name="Leibniz")

Author(first_name='Gottfried', last_name='Leibniz', display_name='Gottfried L')

In [245]:
Link(name="Original Book", url="https://archive.org/details/in.ernet.dli.2015.215284")

Link(name='Original Book', url=Url('https://archive.org/details/in.ernet.dli.2015.215284'))

Now we will make the Post, composite field.

In [247]:
from pydantic import conlist

In [256]:
class Post(BaseModel):
    byline: conlist(item_type=Author, min_length=1)
    title: constr(min_length=10, max_length=50, strip_whitespace=True)
    sub_title: constr(min_length=20, max_length=100, strip_whitespace=True) = None
    body: constr(min_length=100)
    links: List[Link] = []

    @validator('title')
    def validate_title(cls, value):
        return value and value.title()

In [264]:
class Post(BaseModel):
    byline: conlist(item_type=Author, min_length=1)
    title: constr(min_length=10, max_length=50, strip_whitespace=True)
    sub_title: constr(min_length=20, max_length=100, strip_whitespace=True) = None
    body: constr(min_length=100)
    links: List[Link] = []

    @validator('title')
    def validate_title(cls, value):
        return value and value.title()

Post(byline=[
        Author(first_name="John", last_name="von Neumann", display_name="Johnny V"),
        Author(first_name="Oskar", last_name="Morgenstern")],
     title="Theory of Games and Economic Behavior",
     sub_title="A non-mathematical overview",
    body="Lorem ipsum sit dolor amet." * 20,
    links=[
        Link(name="Original Book", url="https://archive.org/details/in.ernet.dli.2015.215284"),
        Link(name="Review", url = "https://www.ams.org/journals/bull/1945-51-07/S0002-9904-1945-08391-8/S0002-9904-1945-08391-8.pdf")
    ])

Post(byline=[Author(first_name='John', last_name='von Neumann', display_name='Johnny V'), Author(first_name='Oskar', last_name='Morgenstern', display_name='Oskar M')], title='Theory of Games and Economic Behavior', sub_title='A non-mathematical overview', body='Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.', links=[Link(name='Original Book', url=Url('https://archive.org/details/in.ernet.dli.2015.215284')), Link(name='Review', url=Url('https://www.ams.org/journals/bull/1945-51-07/S0002-9904-1945-08391

In [270]:
p = Post(byline=[
        Author(first_name="John", last_name="von Neumann", display_name="Johnny V"),
        Author(first_name="Oskar", last_name="Morgenstern")],
     title="Theory of Games and Economic Behavior",
     sub_title="A non-mathematical overview",
    body="Lorem ipsum sit dolor amet." * 20,
    links=[
        Link(name="Original Book", url="https://archive.org/details/in.ernet.dli.2015.215284"),
        Link(name="Review", url = "https://www.ams.org/journals/bull/1945-51-07/S0002-9904-1945-08391-8/S0002-9904-1945-08391-8.pdf")
    ])

print(p.model_dump_json(indent=2))

{
  "byline": [
    {
      "first_name": "John",
      "last_name": "von Neumann",
      "display_name": "Johnny V"
    },
    {
      "first_name": "Oskar",
      "last_name": "Morgenstern",
      "display_name": "Oskar M"
    }
  ],
  "title": "Theory of Games and Economic Behavior",
  "sub_title": "A non-mathematical overview",
  "body": "Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.Lorem ipsum sit dolor amet.",
  "links": [
    {
      "name": "Original Book",
      "url": "https://archive.org/details/in.ernet.dli.2015.21

In [None]:
print(Post.schema_json(i