# Pydantic Example

In this notebook, we demonstrate some pydantic examples that are useful for DS and ML projects. For full details on pydantic, read the official docs: [Pydantic](https://docs.pydantic.dev/latest/)

In [None]:
import os
import yaml
from datetime import date, datetime
from pathlib import Path
from typing import Literal

import pandas as pd
from pydantic import (BaseModel, ConfigDict, DirectoryPath, Field, FilePath,
                      HttpUrl, PastDate, PositiveInt, field_validator,
                      validate_call)

## Table of Contents

1. [Simple Pydantic Model](#simple-pydantic-model)
    - [Integer validation](#integer-validation)
    - [Path validation](#path-validation)
    - [Datetime parsing](#datetime-parsing)
    - [**DIY**: Explore other built-in types validation](#diy-explore-other-built-in-types-validation)
2. [Custom Validation](#custom-validation)
    - [Custom types](#custom-types)
    - [Custom validators](#custom-validators)
    - [Custom validate "before" default validation](#custom-validate-before-default-validation)
    - [**DIY**: Explore docs for supported types](#diy-explore-docs-for-supported-types)
3. [Pydantic Configuration](#pydantic-configuration)
    - [Common configurations](#common-configurations)
    - [Counterexample of `extra="forbid"`](#counterexample-of-extraforbid)
    - [Counterexample of `validate_assignment=True`](#counterexample-of-validate_assignmenttrue)
    - [**DIY**: Counterexample of `validate_default=True`](#diy-counterexample-of-validate_defaulttrue)
4. [Nested Data & Serialization](#nested-data--serialization)
    - [Nested Pydantic Model](#nested-pydantic-model)
    - [Create pydantic model](#create-pydantic-model)
    - [Export pydantic model](#export-pydantic-model)
    - [**DIY**: Export pydantic model to JSON file](#diy-export-pydantic-model-to-json-file)
5. [More Advanced Usage](#more-advanced-usage)
    - [Hashable pydantic model](#hashable-pydantic-model)
    - [Validate function input by type hints](#validate-function-input-by-type-hints)

## Simple Pydantic Model

Below is a simple pydantic model; think of it as the config for a project. We will go through the validation of each field one by one.

In [None]:
class ExampleConfig(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=True, validate_default=True)

    data_format: Literal["DB", "SAP"] = "DB"
    input_path: DirectoryPath | FilePath = Path("requirements.txt")
    start_date: date = datetime.now().date()
    window: int = 7

### Integer Validation

Pydantic automatically coerces compatible inputs to `int` (for example `"5"` or `6.0`), but only when the value represents a whole number; `window=6.1` raises a validation error.

In [None]:

print(ExampleConfig(window=6.0))
print(ExampleConfig(window="5"))
print(ExampleConfig(window="6.0"))
# print(ExampleConfig(window=6.1))
# print(ExampleConfig(window="6.1"))
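
The two commented-out calls in the cell above fail validation. A minimal sketch of how such a failure could be caught and inspected, assuming a hypothetical `WindowConfig` model that stands in for `ExampleConfig` with only the `window` field:

```python
from pydantic import BaseModel, ValidationError


class WindowConfig(BaseModel):
    """Hypothetical stand-in for ExampleConfig with only the `window` field."""

    window: int = 7


# Whole-number inputs are coerced to int.
coerced_float = WindowConfig(window=6.0).window
coerced_str = WindowConfig(window="5").window

# Inputs with a fractional part are rejected instead of being truncated.
rejected = []
for bad_value in (6.1, "6.1"):
    try:
        WindowConfig(window=bad_value)
    except ValidationError as exc:
        rejected.append(bad_value)
        print(exc.errors()[0]["msg"])

print(coerced_float, coerced_str, rejected)
```

Catching `ValidationError` like this keeps the notebook running while still showing why each value was refused.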

### Path Validation

Pydantic makes it easy to validate a `FilePath` or `DirectoryPath`. A validation error is raised if the path does not exist or is not of the expected kind.

In [None]:
print(type(ExampleConfig().input_path))
print(ExampleConfig(input_path=r"./requirements.txt"))
print(ExampleConfig(input_path=r"./.venv"))
# print(ExampleConfig(input_path="not_exist.csv"))

### Datetime Parsing

Dates are easy to parse as well, as long as the string is in ISO 8601 `YYYY-MM-DD` format.

In [None]:
print(ExampleConfig(start_date=datetime.now().date()).start_date.day)
print(ExampleConfig(start_date="2022-01-01"))
# print(ExampleConfig(start_date="2022/01/01"))
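
A small sketch of the accepted and rejected date strings, using a hypothetical `DateConfig` model with only the `start_date` field:

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class DateConfig(BaseModel):
    """Hypothetical stand-in for ExampleConfig with only `start_date`."""

    start_date: date


# An ISO-formatted string becomes a datetime.date object.
parsed = DateConfig(start_date="2022-01-01").start_date

# A slash-separated string is not parsed.
try:
    DateConfig(start_date="2022/01/01")
    slash_accepted = True
except ValidationError:
    slash_accepted = False

print(parsed, slash_accepted)
```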

### **DIY**: Explore other built-in types validation

Pydantic also has a lot more built-in types for validation. Try changing `window`, `start_date`, `data_format`, `item` values to pass/fail the validation.

In [None]:
class NewConfig(BaseModel):
    data_format: Literal['DB', 'SAP'] = "DB"
    window: PositiveInt = 7
    start_date: PastDate = date(2023, 11, 27)
    item: int = Field(ge=0, le=255, default=2)

print("NewConfig:", NewConfig())
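
As a starting point for the exercise, the sketch below feeds one invalid value to each field of `NewConfig` and records which field was rejected. The specific invalid values are illustrative choices, not the only ones that fail:

```python
from datetime import date, timedelta
from typing import Literal

from pydantic import BaseModel, Field, PastDate, PositiveInt, ValidationError


class NewConfig(BaseModel):
    data_format: Literal["DB", "SAP"] = "DB"
    window: PositiveInt = 7
    start_date: PastDate = date(2023, 11, 27)
    item: int = Field(ge=0, le=255, default=2)


# One invalid value per field (illustrative choices).
invalid_inputs = {
    "data_format": "CSV",                             # not one of the literals
    "window": 0,                                      # not strictly positive
    "start_date": date.today() + timedelta(days=2),   # in the future
    "item": 256,                                      # above le=255
}

failed_fields = []
for field_name, bad_value in invalid_inputs.items():
    try:
        NewConfig(**{field_name: bad_value})
    except ValidationError as exc:
        # `loc` names the field that failed validation.
        failed_fields.append(exc.errors()[0]["loc"][0])

print(failed_fields)
```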

## Custom Validation

Pydantic allows a lot of customization. If the built-in validation and types don't suit your needs, it is easy to implement custom validation.

### Custom Types

Some data types are not supported out of the box, such as `pd.DataFrame` and `np.ndarray`.

With `arbitrary_types_allowed=True`, pydantic validates such fields with a plain instance check, roughly `isinstance(value, pd.DataFrame)`.

If a type is not listed on one of the following pages of the official docs, you need to set `arbitrary_types_allowed=True`:

- [Standard Library Types](https://docs.pydantic.dev/latest/api/standard_library_types/)
- [Pydantic Types](https://docs.pydantic.dev/latest/api/types/)
- [Network Types](https://docs.pydantic.dev/latest/api/networks/)

In [None]:
class CustomConfig(ExampleConfig, arbitrary_types_allowed=True):
    df: pd.DataFrame

print(CustomConfig(df=pd.DataFrame()))
# print(CustomConfig(df="something"))
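
To illustrate that the check is instance-based, here is a minimal sketch that avoids the pandas dependency. `Frame`, `LabeledFrame` and `FrameConfig` are hypothetical names standing in for `pd.DataFrame` and `CustomConfig`:

```python
from pydantic import BaseModel, ValidationError


class Frame:
    """Hypothetical stand-in for an unsupported type such as pd.DataFrame."""


class LabeledFrame(Frame):
    """Subclass used to show that instances of subclasses are accepted."""


class FrameConfig(BaseModel, arbitrary_types_allowed=True):
    df: Frame


# An instance of a subclass passes the check.
accepted = FrameConfig(df=LabeledFrame())

# Anything that is not a Frame is rejected; no coercion is attempted.
try:
    FrameConfig(df="something")
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

print(type(accepted.df).__name__, error_type)
```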


### Custom Validators

If you want to do more than just validate the type, you can implement custom validators.

Raise `ValueError` (or `AssertionError`) when validation fails; pydantic catches it and raises a `ValidationError`.

In [None]:
class CustomConfig(ExampleConfig, arbitrary_types_allowed=True):
    df: pd.DataFrame

    @field_validator("df")
    @classmethod
    def validate_df(cls, df: pd.DataFrame) -> pd.DataFrame:
        if df.empty:
            raise ValueError("`df` shouldn't be Empty!")
        return df

In [None]:
print(CustomConfig(df=pd.DataFrame({'a': [1, 2, 3]})))
# print(CustomConfig(df=pd.DataFrame()))
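
The message passed to `ValueError` ends up in the resulting `ValidationError`. A minimal sketch of how it could be inspected, using a hypothetical `ColumnsConfig` model where a plain list plays the role of `df`:

```python
from pydantic import BaseModel, ValidationError, field_validator


class ColumnsConfig(BaseModel):
    """Hypothetical model; `columns` plays the role of `df` above."""

    columns: list[str]

    @field_validator("columns")
    @classmethod
    def validate_columns(cls, columns: list[str]) -> list[str]:
        if not columns:
            raise ValueError("`columns` shouldn't be empty!")
        return columns


try:
    ColumnsConfig(columns=[])
    error = None
except ValidationError as exc:
    error = exc.errors()[0]

# The entry records the failing field, the error type and the message.
print(error["loc"], error["type"], error["msg"])
```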

### Custom validate "before" default validation

Custom validators default to `"after"` mode, which means pydantic runs the custom validator after its own built-in validation.

For a field like `output_path`, the user may pass a path that does not exist yet. A `"before"` validator can create the directory first, so that the built-in check then sees a valid directory path.

In [None]:
class CustomConfig(ExampleConfig):
    output_path: DirectoryPath = Path("output")

    @field_validator("output_path", mode="before")
    @classmethod
    def validate_directory(cls, directory_path):
        # Create the directory first so the built-in DirectoryPath check passes.
        os.makedirs(directory_path, exist_ok=True)
        return directory_path

In [None]:
print(CustomConfig())
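
The cell above creates an `output` directory in the working directory. The following sketch shows the same ordering inside a temporary directory, so nothing is left behind; `OutputConfig` and the nested path `reports/2024` are illustrative assumptions:

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel, DirectoryPath, field_validator


class OutputConfig(BaseModel):
    """Hypothetical model with a single directory field."""

    output_path: DirectoryPath

    @field_validator("output_path", mode="before")
    @classmethod
    def make_directory(cls, value):
        # Runs before the built-in DirectoryPath check.
        Path(value).mkdir(parents=True, exist_ok=True)
        return value


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "reports" / "2024"
    existed_before = target.exists()
    config = OutputConfig(output_path=target)
    exists_after = config.output_path.is_dir()

print(existed_before, exists_after)
```

Keep in mind that a validator with side effects like this one touches the filesystem every time the model is built.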

### **DIY**: Explore docs for supported types

- [Standard Library Types](https://docs.pydantic.dev/latest/api/standard_library_types/)
- [Pydantic Types](https://docs.pydantic.dev/latest/api/types/)
- [Network Types](https://docs.pydantic.dev/latest/api/networks/)

## Pydantic Configuration

Pydantic has many configuration options; see the full list [here](https://docs.pydantic.dev/latest/api/config/). These options control the behaviour of pydantic validation.

### Common Configurations

Some commonly used options are summarized here; examples follow in later sections:

- **extra**: `Literal["allow", "forbid", "ignore"]`, default is `"ignore"`; recommended to set `"forbid"` at all times.

- **validate_assignment**: `bool`, default is `False`; recommended to set `True` at all times.

- **validate_default**: `bool`, default is `False`; set `True` if you also want to validate the default value.

- **arbitrary_types_allowed**: `bool`, default is `False`; set `True` when you use unsupported types (like pandas, numpy etc.).

- **frozen**: `bool`, default is `False`; set `True` when you want the pydantic model to be immutable and hashable.


In [None]:
class Config(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=True, validate_default=True)

    input_path: FilePath = Path("requirements.txt")

### Counterexample of `extra="forbid"`

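Without `extra="forbid"` (here `extra="ignore"`, the default), the misspelled keyword `inputs_path` is silently dropped and `input_path` keeps its default: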

In [None]:
class Config(BaseModel):
    model_config = ConfigDict(extra="ignore", validate_assignment=True, validate_default=True)

    input_path: FilePath = Path("requirements.txt")

config = Config(inputs_path="requirements_2.txt")
print(config)
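
For contrast, a minimal sketch of the same typo with `extra="forbid"`. `StrictConfig` is a hypothetical model, and `input_path` is a plain `str` here so the sketch does not depend on any file existing:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictConfig(BaseModel):
    """Hypothetical model that refuses unknown keywords."""

    model_config = ConfigDict(extra="forbid")

    input_path: str = "requirements.txt"


try:
    # `inputs_path` is a typo for `input_path`.
    StrictConfig(inputs_path="requirements_2.txt")
    error = None
except ValidationError as exc:
    error = exc.errors()[0]

# The error points at the unexpected keyword.
print(error["loc"], error["type"])
```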

### Counterexample of `validate_assignment=True`

Without `validate_assignment=True`, a value assigned after the model is created is not validated, so an invalid value slips through:

In [None]:
class Config(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=False, validate_default=True)

    input_path: FilePath = Path("requirements.txt")

config = Config()
config.input_path = [123]
print(config)
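
With `validate_assignment=True`, assignments go through the same validation as construction. A minimal sketch, using a hypothetical `CheckedConfig` model with an integer field:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CheckedConfig(BaseModel):
    """Hypothetical model that validates every assignment."""

    model_config = ConfigDict(validate_assignment=True)

    window: int = 7


config = CheckedConfig()

# A valid assignment is coerced just like a constructor argument.
config.window = "14"
coerced = config.window

# An invalid assignment raises and leaves the previous value in place.
try:
    config.window = [123]
    rejected = False
except ValidationError:
    rejected = True

print(coerced, rejected, config.window)
```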

### **DIY**: Counterexample of `validate_default=True`

Try to come up with a counterexample for `validate_default` yourself!

In [None]:
class Config(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=True, validate_default=True)

    input_path: FilePath = Path("requirements.txt")

config = Config()
print(config)
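
One possible counterexample, as a sketch: a default of the wrong type goes unnoticed unless `validate_default=True` is set. The models `Unchecked`, `Checked` and `Broken` are hypothetical, and the wrong defaults are deliberate:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Unchecked(BaseModel):
    # Deliberately wrong default type; it is not validated.
    window: int = "7"


class Checked(BaseModel):
    model_config = ConfigDict(validate_default=True)

    # The same default is now validated and coerced.
    window: int = "7"


class Broken(BaseModel):
    model_config = ConfigDict(validate_default=True)

    # This default cannot be coerced to int.
    window: int = "seven"


unchecked_value = Unchecked().window
checked_value = Checked().window

try:
    Broken()
    broken_rejected = False
except ValidationError:
    broken_rejected = True

print(repr(unchecked_value), repr(checked_value), broken_rejected)
```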

## Nested Data & Serialization

### Nested Pydantic Model

Pydantic models can be nested. Avoid using dictionaries for nested data or passing around dictionaries with lots of keys; use pydantic models instead.


In [None]:
class BaseConfig(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=True, validate_default=True)

class DataConfig(BaseConfig):
    data_format: Literal["DB", "SAP"] = "DB"
    input_path: DirectoryPath | FilePath = Path("requirements.txt")
    start_date: date = datetime.now().date()
    window: int = 7

class ModelConfig(BaseConfig):
    num_estimators: int = 150
    max_depth: int = 7

class ProjectConfig(BaseConfig):
    data: DataConfig = DataConfig(data_format="SAP")
    model: ModelConfig = ModelConfig()

config = ProjectConfig()

print("project_config:", config)
print("model.max_depth:", config.model.max_depth)

### Create pydantic model

You can create an instance of a pydantic model from a dictionary, YAML, or JSON.

In [None]:
config_dict = {"data": {"data_format": "SAP"}, "model": {"max_depth": 5}}
project_config = ProjectConfig(**config_dict)
print(project_config)

In [None]:
with open("config/project_config.yaml", "r", encoding="utf-8") as file:
    data = yaml.safe_load(file)
config = ProjectConfig(**data)
print(config)
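
JSON input can be validated directly from a string with `model_validate_json`. A minimal sketch, assuming the hypothetical models `ModelParams` and `Project` as a reduced version of the nested config:

```python
from pydantic import BaseModel


class ModelParams(BaseModel):
    num_estimators: int = 150
    max_depth: int = 7


class Project(BaseModel):
    """Hypothetical, reduced version of a nested project config."""

    name: str = "demo"
    model: ModelParams = ModelParams()


raw_json = '{"name": "churn", "model": {"max_depth": 5}}'

# Parse and validate the JSON string in one step.
from_json = Project.model_validate_json(raw_json)

# The dictionary equivalent; nested dicts become nested models.
from_dict = Project.model_validate({"name": "churn", "model": {"max_depth": 5}})

print(from_json)
```

Fields missing from the input, such as `num_estimators` here, fall back to their defaults.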

### Export pydantic model

You can export a pydantic object as a dictionary, YAML, or JSON.

In [None]:
config = ProjectConfig()
print(config.model_dump())
print(config.model_dump_json(indent=2))
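
By default `model_dump()` keeps Python objects such as `date` and `Path` in the resulting dictionary. As a sketch, `mode="json"` converts them to JSON-compatible values, which can help when the dictionary is passed on to another serializer; `DataParams` is a hypothetical model:

```python
from datetime import date
from pathlib import Path

from pydantic import BaseModel


class DataParams(BaseModel):
    """Hypothetical model with non-JSON-native field types."""

    input_path: Path = Path("requirements.txt")
    start_date: date = date(2024, 1, 1)


params = DataParams()

# Python mode: values stay as Path and date objects.
python_dump = params.model_dump()

# JSON mode: values are converted to plain strings.
json_ready_dump = params.model_dump(mode="json")

print(python_dump)
print(json_ready_dump)
```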

### **DIY**: Export pydantic model to JSON file

In [None]:
config = ProjectConfig()
print(config.model_dump_json(indent=2))
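
One possible approach, sketched with a temporary directory so that no file is left behind. The model `ModelParams` and the file name `model_params.json` are illustrative assumptions:

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel


class ModelParams(BaseModel):
    """Hypothetical model to be written to disk."""

    num_estimators: int = 150
    max_depth: int = 7


params = ModelParams(max_depth=5)

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "model_params.json"

    # Write the JSON representation to a file.
    json_path.write_text(params.model_dump_json(indent=2), encoding="utf-8")

    # Read it back and validate it into a new instance.
    restored = ModelParams.model_validate_json(json_path.read_text(encoding="utf-8"))

print(restored == params)
```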

## More Advanced Usage

### Hashable Pydantic Model

You can make a pydantic model hashable, so that instances can be used as dictionary keys. This lets you index a dictionary by multi-level information.

In [None]:
class ModelConfig(BaseConfig, frozen=True):
    num_estimators: int = 150
    max_depth: int = 7

class Model(BaseModel):
    model_config = ConfigDict(extra="forbid", validate_assignment=True, validate_default=True, frozen=True)
    name: str
    config: ModelConfig

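    # frozen=True already makes the model hashable; these overrides make the key explicit.
    def __hash__(self):
        return hash((self.name, self.config))

    def __eq__(self, other):
        if not isinstance(other, Model):
            return NotImplemented
        return (self.name, self.config) == (other.name, other.config)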

In [None]:
running_models: dict[Model, HttpUrl] = {}

model_example = Model(name="example", config=ModelConfig())
running_models[model_example] = "http://www.example.com"
print(running_models[model_example])

model_example_2 = Model(name="example", config=ModelConfig(max_depth=6))
running_models[model_example_2] = "http://www.example2.com"
print(running_models)
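
Setting `frozen=True` is enough on its own to get a hashable model. A minimal sketch without custom `__hash__` or `__eq__`, using a hypothetical `FrozenParams` model:

```python
from pydantic import BaseModel, ValidationError


class FrozenParams(BaseModel, frozen=True):
    """Hypothetical frozen model; no custom __hash__ or __eq__."""

    num_estimators: int = 150
    max_depth: int = 7


first = FrozenParams()
second = FrozenParams()

# Instances with equal field values are equal and hash the same,
# so either one can be used to look up the other's entry.
registry = {first: "run-1"}
found = registry[second]

# Frozen models reject attribute assignment.
try:
    first.max_depth = 3
    mutated = True
except ValidationError:
    mutated = False

print(found, mutated)
```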

### Validate function input by type hints

You can use pydantic's `validate_call` decorator to validate a function's arguments against its type hints. `validate_call` also accepts the configuration options discussed in the previous sections.

In [None]:
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def get_all_running_models(running_models: dict[Model, HttpUrl], dummy_df: pd.DataFrame):
    return running_models.keys()

print(get_all_running_models(running_models, dummy_df=pd.DataFrame()))
print(get_all_running_models(running_models={model_example_2: "http://example2.au"}, dummy_df=pd.DataFrame()))
# print(get_all_running_models(running_models, dummy_df=2))
# print(get_all_running_models(running_models={"model": 2}, dummy_df=pd.DataFrame()))
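
A smaller sketch of the same decorator without custom types, showing that arguments are coerced as well as checked. The function `rolling_mean` is a hypothetical helper, not part of pydantic:

```python
from pydantic import PositiveInt, ValidationError, validate_call


@validate_call
def rolling_mean(values: list[float], window: PositiveInt) -> list[float]:
    """Hypothetical helper: mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Numeric strings are coerced to the annotated types before the call.
means = rolling_mean(["1", 2, 3.0], window="2")

# A window of 0 violates PositiveInt, so the function body never runs.
try:
    rolling_mean([1.0, 2.0], window=0)
    rejected = False
except ValidationError:
    rejected = True

print(means, rejected)
```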