# Pydantic model dump - what's inside?

In this quick example we will create a Pydantic model and dump it to see what's inside.
Before we start, I want to draw one distinction:

* **Decorated methods** that define validation logic for specific fields will simply be called **validators**,
* limits such as `min_length`, `max_length`, `ge`, `le` etc., passed as additional arguments to the `Field` function, will be called **constraints**.

This makes the distinction between the two clearer, which will come in handy when we look at the dumped model
and, later, when we create the library which is the main goal of this article.

So first, let's create a simple Pydantic model with one validator and two constraints on the `age` field.

In [4]:
import pydantic
import json
import logging


class Nested(pydantic.BaseModel):
    name: str
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    def check_age(cls, value):
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value

We can now check if all bells and whistles are in good order by feeding some data to the model and checking if it's valid.
Let's do that for a list of different ages, since we know that anything below $18$ **or** above $80$ is invalid.

In [None]:
logging.basicConfig(level=logging.DEBUG)

# We can see if validators/constraints are working
# by trying to create a model with invalid and valid values.
# For invalid values, we expect a ValidationError to be raised.
# For valid values, we expect the model to be created successfully.
for age in [-1, 0, 17, 18, 80, 81]:
    try:
        Nested(name='John', age=age)
        logging.info(f'John is {age} years old.')
    except pydantic.ValidationError as e:
        logging.error(e)
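
To see which of the two mechanisms rejected a value, we can look at the error types reported by `ValidationError.errors()`. The sketch below is a minimal illustration, assuming Pydantic V2; the `Visitor` model and the `failure_types` helper are hypothetical names that only mirror the `Nested` model above.

```python
import pydantic


class Visitor(pydantic.BaseModel):  # hypothetical model mirroring `Nested`
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    @classmethod
    def check_age(cls, value: int) -> int:
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


def failure_types(age: int) -> list[str]:
    """Return the Pydantic error types raised for a given age (empty if valid)."""
    try:
        Visitor(age=age)
    except pydantic.ValidationError as error:
        return [item['type'] for item in error.errors()]
    return []


# Constraint failures and validator failures are reported under different types.
for age in [-1, 17, 18, 81]:
    print(age, failure_types(age))
```

A constraint violation stops the validation before the decorated validator runs, which is why a negative age is reported only once.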

In [None]:
import datetime

TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)  # All timestamps will be relative to this one


class ModelWithDatetime(pydantic.BaseModel):
    created_at: str

    @pydantic.field_validator('created_at')
    def check_created_at(cls, value):
        iso_formatted_value = datetime.datetime.fromisoformat(value)
        if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):
            raise ValueError('The timestamp is too old.')
        return iso_formatted_value

This class is a simple example of a context-dependent model: it computes a reference timestamp 100 days before the current date
and rejects any passed date that is older than that.

The validator relies on the global context, because:

* the `datetime` module needs to be imported and present in the current `globals()`,
* `datetime.datetime.now()` is called once, when `TIMESTAMP_START` is defined at module level, and is not a part of the model itself,
* the `TIMESTAMP_START` constant is a module-wide constant that is used in the validator.
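
The same dependency can be shown without Pydantic at all. The sketch below uses only the standard library; `is_recent` and `detached` are hypothetical names introduced for illustration. It rebuilds a function on top of an empty global namespace, which imitates an environment where `datetime` was never imported.

```python
import datetime
import types


def is_recent(value: str) -> bool:
    # `datetime` is not stored in the function; it is looked up
    # in the module globals every time the function is called.
    parsed = datetime.datetime.fromisoformat(value)
    return parsed >= datetime.datetime.now() - datetime.timedelta(days=100)


# Names the function body expects to resolve at call time.
print(is_recent.__code__.co_names)

# Same code object, but with an empty global namespace.
detached = types.FunctionType(is_recent.__code__, {}, 'detached')

try:
    detached('1410-07-15T00:00:00')
except NameError as error:
    print(f'NameError: {error}')
```

The code object travels fine, but the names it refers to do not, which is exactly the situation a validator ends up in after being moved to another environment.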

Let's check how this model will behave when the global context changes.

In [None]:
# Here we will employ a trick to remove the datetime module from the globals() dictionary,
# so it is not available to the validator function.
if 'datetime' in globals():
    del globals()['datetime']

# We can check it by trying to access the datetime module
try:
    datetime.datetime.now()
except NameError as e:
    logging.error(f'Failed to access the datetime module: {e}')

# Now we will try to create an instance of the model
try:
    ModelWithDatetime(created_at='1410-07-15T00:00:00')  # This is the date of the Battle of Grunwald
except NameError as e:
    logging.error(f'Failed to create an instance of the model: {e}')

# Now we will try to create an instance of the model AFTER we have restored the datetime module

import datetime

try:
    ModelWithDatetime(created_at='1410-07-15T00:00:00')
except pydantic.ValidationError as e:
    logging.error(e)

Aha! We got a `NameError`, because the `datetime` module is no longer available to the validator function, just as it would not be in a clean Python environment.
After re-importing the missing module, we can see that the model performs the validation as expected, hence the `ValidationError`
is raised for the date that is older than 100 days.

This means that any dependencies used inside the validator functions need to be installed and re-imported in the new environment
in order to work properly. One way to fix this would be to move the import of the `datetime` module into the
body of the validator function, but this is not a good practice, because it makes the code less readable and harder to maintain.

However, we will bite the bullet and try this approach to see if it will work.

In [None]:
# Again, remove datetime
if 'datetime' in globals():
    del globals()['datetime']


class ModelWithDatetimeRedux(pydantic.BaseModel):
    created_at: str

    @pydantic.field_validator('created_at')
    def check_created_at(cls, value):
        """
            What we do is we basically try to "pack" the whole context of the function here
        """
        import datetime
        TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)

        iso_formatted_value = datetime.datetime.fromisoformat(value)
        if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):
            raise ValueError('The timestamp is too old.')
        return value

In [None]:
# Gone with the datetime module again
if 'datetime' in globals():
    del globals()['datetime']

# Now we will try to create an instance of the model: this time there should be no NameError,
# only the expected ValidationError for a date that is older than 100 days

try:
    ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')
except pydantic.ValidationError as e:
    logging.error(e)

Cool, cool, this approach works and may be used to move our Pydantic models from one environment to another,
since the validators are now self-contained and do not depend on any global context. We still need to check a couple of things.

First, how does this work out for nested models, since we can have fields that are Pydantic models themselves?

In [None]:
class Nested(pydantic.BaseModel):
    name: str
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    def check_age(cls, value):
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


class Root(pydantic.BaseModel):
    description: str
    nested: Nested

# Create a Root instance that holds a Nested instance as one of its fields
nested_model = Root(
    description='A model with a nested model',
    nested=Nested(
        name='John',
        age=18
    )
)

# The instance above was created successfully, including the nested model as one of its fields.
# Now log the JSON schema of the nested model.
logging.info(
    json.dumps(  # json.dumps is used here only for indentation
        Nested.model_json_schema(),  # This is the V2 way to dump the schema
        indent=2
    )
)

Looking back at the age loop from the beginning: as expected, validation failed for $-1$, $0$, $17$ and $81$, while $18$ and $80$ passed.
This means that both our validator and constraints are working as expected.

## Serializing the model

Now, let's serialize the model to see what's inside. Pydantic allows us to dump the model's schema to a dictionary, which we can then print out,
using the `model_json_schema` method (in V1 the equivalents were `schema` and `schema_json`). As we can see, the resulting dictionary
contains only information about the **constraints** applied to the fields, but no mention is found of the **validators**.

In [None]:
# Let's see what information is available in the JSON dump of our model
logging.info(
    json.dumps(  # json.dumps is used here only for indentation
        Root.model_json_schema(),  # This is the V2 way to dump the schema
        indent=2
    )
)
logging.info(
    Root.schema_json(indent=2)
)  # This is the V1 way to dump the schema, deprecated in V2
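
The claim about constraints and validators can also be checked programmatically instead of by reading the log. This is a minimal sketch, assuming Pydantic V2; `Member` is a hypothetical stand-in for the `Nested` model.

```python
import json

import pydantic


class Member(pydantic.BaseModel):  # hypothetical stand-in for `Nested`
    name: str
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    @classmethod
    def check_age(cls, value: int) -> int:
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


schema = Member.model_json_schema()
age_schema = schema['properties']['age']

# Constraints are translated to JSON Schema keywords.
print(age_schema.get('minimum'), age_schema.get('maximum'))

# The validator's name does not appear anywhere in the dump.
print('check_age' in json.dumps(schema))
```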


Let's try the lower-level `dict()` method (renamed to `model_dump()` in V2) on the model instance to see if it will give us more information.

In [None]:
model_instance = Nested(name='John', age=18)
logging.info(
    model_instance.model_dump()  # V2 name of the V1 `dict()` method
)

So far no hit on the validators. Let's just use the Python built-in `__dict__` attribute to see if we can find anything useful there.

In [None]:
whole_dict_of_model = Root.__dict__
logging.info(whole_dict_of_model)

### What do we get out of this?

First of all, we can see that the `__dict__` attribute of the model class contains all the fields that we defined in the model,
**together** with the definitions of validator functions linked to named fields. This means that we can try to **programmatically**
create a class inheriting from `BaseModel` and add all the fields and validators to it by accessing the correct private attributes of the model class.
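
As a rough illustration of what "accessing private attributes" can look like, the sketch below lists the field validators registered on a class. It assumes Pydantic V2 and relies on the internal `__pydantic_decorators__` attribute, which is not part of the public API and may change between releases; `Member` is a hypothetical stand-in for `Nested`.

```python
import pydantic


class Member(pydantic.BaseModel):  # hypothetical stand-in for `Nested`
    name: str
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    @classmethod
    def check_age(cls, value: int) -> int:
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


# Internal attribute (assumption: Pydantic V2 layout), used here for inspection only.
decorators = Member.__pydantic_decorators__

for method_name, decorator in decorators.field_validators.items():
    # `info.fields` holds the names of the fields the validator is attached to.
    print(method_name, decorator.info.fields, decorator.info.mode)
```

Treat this as a way to look around rather than something to build on without pinning the Pydantic version.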

But why did none of the JSON dumps of our model contain this information? The answer is simple: Pydantic does not serialize the validators,
because the underlying serializers do not know how to handle them. They are **functions** with specific **closures** that need to be
**reconstructed** in order to be used. Similarly, reading the documentation for JSON schemas in Pydantic, we can see that there
is no straightforward way to serialize models and load them back via the library's API.

Let's check if we can reconstruct our nested model from the serialized form of the `__dict__` attribute.

In [None]:
model_dump = dict(**Root.__dict__)

# Start with the built-in type function to create a new class
try:
    reconstructed_model = type(
        'NestedModel',
        (pydantic.BaseModel,),
        model_dump
    )
except pydantic.PydanticUserError as error:
    logging.error(f'Failed to reconstruct the model: {error}')

# Maybe we can try to filter out the object's attributes from the model's dictionary,
# and then try to reconstruct the model using Pydantic API?

object_dict = object.__dict__  # Attributes of `object`, the base class of all Python classes
filtered_model_dump = {k: model_dump[k] for k in model_dump if k not in object_dict}  # This drops the entries that plain `object` also defines (e.g. `__doc__`)

validators_from_dict = filtered_model_dump.get('__validators__', {})
reconstructed_model = pydantic.create_model(
    'NestedModel',
    __base__=pydantic.BaseModel,
    __validators__=validators_from_dict,
    **{
        annotation: (filtered_model_dump['__annotations__'][annotation], ...)
        for annotation in filtered_model_dump['__annotations__']
    }
)
reconstructed_model_instance = reconstructed_model(
    description='A nested model',
    nested=Nested(
        name='John',
        age=18
    )
)

logging.info(reconstructed_model_instance.dict())

Well, it seems that we have found a way to reconstruct the model. It may not be as straightforward as we would like it to be, but it clearly works.
The validators seem to be taken into account, since the validation of the nested model works as expected (note that the `nested` annotation still points at the original `Nested` class). But what if we start from scratch,
meaning that there is no `Nested` class defined in the current environment?

In [None]:
if 'Nested' in globals():
    del globals()['Nested']
if 'Root' in globals():
    del globals()['Root']

# Now we can try to reconstruct the Root model, having ONLY the filtered dictionary available to us

reconstructed_model = pydantic.create_model(
    'NestedModel',
    __base__=pydantic.BaseModel,
    __validators__=validators_from_dict,
    **{
        annotation: (filtered_model_dump['__annotations__'][annotation], ...)
        for annotation in filtered_model_dump['__annotations__']
    }
)

try:
    reconstructed_model_instance = reconstructed_model(
        description='A nested model',
        nested=Nested(
            name='John',
            age=18
        )
    )
except NameError as e:
    logging.error(f'Failed to create an instance of the model: {e}')

And here is the main pitfall we encounter in this case. The `Nested` class is not defined in the current environment, so we cannot
build the nested field and the reconstruction of `Root` fails with a `NameError`. This means that every nested model needs to be
defined in the current environment in order to reconstruct the `Root` model from the serialized form of the `__dict__` attribute.
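
The root of this pitfall is that annotations store live class objects, not descriptions of them. The sketch below shows this with plain Python classes (hypothetical stand-ins that reuse the names `Nested` and `Root`), with no Pydantic involved.

```python
class Nested:  # hypothetical plain-Python stand-in for the Pydantic model
    pass


class Root:
    description: str
    nested: Nested


# The annotation stores the class object itself, not its name or source.
annotations = dict(Root.__annotations__)
print(annotations['nested'] is Nested)

# Removing the name from the namespace leaves the dictionary usable...
del Nested
print(annotations['nested'].__name__)

# ...but any code that still spells out `Nested(...)` breaks.
try:
    Nested()
except NameError as error:
    print(f'NameError: {error}')
```

So the filtered dictionary only works while the original class objects are alive in the same process; it carries references, not definitions.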

The second pitfall is easy to show: if we take the output (even filtered) of the `__dict__` attribute of the model class,
we will be unable to serialize it to a form suitable for exporting, such as JSON or YAML. This is because the
`__dict__` attribute contains references to functions, which are not serializable.
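
A stripped-down version of the problem, using only the standard library, might look like this. The dictionary is a hypothetical, heavily simplified stand-in for a filtered class `__dict__`; a real one holds many more entries.

```python
import json


def check_zip(value: str) -> str:
    if len(value) != 5:
        raise ValueError('ZIP code must be exactly 5 characters long.')
    return value


# Hypothetical stand-in for a filtered class `__dict__`.
class_like_dump = {'annotation': str, 'validator': check_zip}

try:
    json.dumps(class_like_dump)
except TypeError as error:
    print(f'TypeError: {error}')

# A `default` hook can replace each object with its name,
# but only a label survives; the behaviour is not exported.
labels_only = json.dumps(class_like_dump, default=lambda obj: obj.__qualname__)
print(labels_only)
```

The `default` hook makes the dump succeed, but what lands in the JSON is just a name that the receiving side would still have to resolve on its own.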

Let's quickly define another nested model and try to serialize it to see if we can reconstruct it from the serialized form.

In [None]:
class Address(pydantic.BaseModel):
    street: str
    city: str
    zip: str

    @pydantic.field_validator('zip')
    def check_zip(cls, value):
        if len(value) != 5:
            raise ValueError('ZIP code must be exactly 5 characters long.')
        return value

class WorkInfo(pydantic.BaseModel):
    company: str
    position: str
    salary: float = pydantic.Field(ge=0)

# This will be our new Root model
class Person(pydantic.BaseModel):
    name: str
    age: int = pydantic.Field(ge=0)
    address: Address
    occupation: WorkInfo

It is time to JSON dump this bad boi to see if it can be exported.

In [None]:
filtered_person_dump = {k: Person.__dict__[k] for k in Person.__dict__ if k not in object_dict}

# Dump the model to JSON
try:
    person_as_json = json.dumps(filtered_person_dump, indent=2)
except TypeError as e:
    logging.error(f'Failed to serialize the model: {e}')

Okay, we can always `pickle` our data and go from there.

In [None]:
import pickle

# Let's pickle the model
try:
    pickled_model = pickle.dumps(filtered_person_dump)
except TypeError as e:
    logging.error(f'Failed to pickle the model: {e}')

# Maybe the pickling of the whole model will work?
try:
    pickled_model = pickle.dumps(Person)
    logging.info('Model pickled successfully!')
except TypeError as e:
    logging.error(f'Failed to pickle the model: {e}')

Yes! The answer was so simple: we only needed to use the good old `pickle` module to serialize the model to a bytes object. Let's see what it looks like, so we can try to come up with a way to maintain the model in a serialized form.

In [None]:
logging.info(pickled_model)

Exactly what a random jumble of bytes would look like. Moreover, the **first** pitfall is still in place: we need to have the nested models defined in the current environment in order to reconstruct the model from the serialized form.

In [None]:
for model in [Person, Address, WorkInfo]:
    if model.__name__ in globals():
        del globals()[model.__name__]

# Now we will try to unpickle the model
try:
    unpickled_model = pickle.loads(pickled_model)
    logging.info('Model unpickled successfully!')
except AttributeError as e:
    logging.error(f'Failed to unpickle the model: {e}')
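
The failure above is consistent with how `pickle` treats classes: they are stored by reference (module name plus qualified name), not by value. The sketch below demonstrates this on a standard library class, chosen only so the example does not depend on where it is executed.

```python
import collections
import pickle

# Pickling a class stores where to find it, not what it contains.
payload = pickle.dumps(collections.OrderedDict)
print(len(payload), payload)

# Both the module name and the class name are readable in the payload.
print(b'collections' in payload, b'OrderedDict' in payload)

# Unpickling imports the module and looks the name up again,
# so we get back the very same class object.
restored = pickle.loads(payload)
print(restored is collections.OrderedDict)
```

For a model class this means the pickle is little more than a pointer: the receiving environment still needs the module that defines the class, together with every nested model it refers to.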

## A need

These ramblings show that the available solutions always come with some kind of trade-off. We can either:

* serialize the model to a dictionary and lose the validators,
* serialize the model to a bytes object and lose the ability to easily analyze the model's structure.

In both cases, we need to be **aware** that any nested models need to be defined in the current environment in order to reconstruct the model from the serialized form.
So, if your model depends on another model, you've got two pickles to pass around, and the complexity grows with each nested model.

This is a clear sign that we need a better way to serialize and deserialize Pydantic models, so we can easily export and import them to and from different environments.

Back to the drawing board...