# Typing and Pydantic

Brief

## Python type hints

Python lets you optionally annotate the types of variables, function arguments and return values, e.g.

```python
def add_one(n: int) -> int:
    return n + 1
```

The function works when we use the intended type, an `int`,

```python
add_one(1)
```

```
2
```

... but it also works when we use the "wrong" type, e.g. passing a `float` instead of an `int`,

```python
add_one(1.5)
```

```
2.5
```

This works because the `int` `1` is converted to a `float` before the addition, so `+` behaves as expected.

If we try an incompatible type instead, the operation fails:

```python
add_one('Hello')
```

```
TypeError: can only concatenate str (not "int") to str
```

Summary: type hints do not change any runtime behaviour; they are just hints.
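The hints are not thrown away, though: Python stores them on the function object, where tools such as static type checkers and validation libraries can read them. A minimal sketch using the standard library's `typing.get_type_hints`:

```python
from typing import get_type_hints

def add_one(n: int) -> int:
    return n + 1

# The annotations are kept as plain metadata on the function object
print(add_one.__annotations__)  # {'n': <class 'int'>, 'return': <class 'int'>}
print(get_type_hints(add_one))

# ...but nothing checks them when the function is called
print(add_one(1.5))  # 2.5
```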

**So why might you want to use type hints?**

- ...?

[Here](https://dagster.io/blog/python-type-hinting) are some reasons why.

## Dataclasses

The above example uses type hints for a standalone function. They can also be used in other structures, e.g. in a [dataclass](https://docs.python.org/3/library/dataclasses.html),

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
```

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=5)
my_inventory.total_cost()
```

```
4.75
```

Dataclasses are useful structures that remove much of the [boilerplate code](https://en.wikipedia.org/wiki/Boilerplate_code) needed when defining classes by hand. For example, for `InventoryItem` we did not need to write an `__init__` method, which would have looked like this:

```python
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
```
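Besides `__init__`, `@dataclass` also generates `__repr__` and `__eq__` by default, which would be yet more boilerplate to write by hand. A small sketch redefining the same `InventoryItem` fields so the block runs on its own:

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

a = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=5)
b = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=5)

# The generated __repr__ lists every field
print(a)  # InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=5)

# The generated __eq__ compares field values, not object identity
print(a == b)  # True
print(a is b)  # False
```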

However, we still do not get an error if we pass a value of the wrong type:

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand='Apple')
```

```python
my_inventory.quantity_on_hand
```

```
'Apple'
```
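The declared types are still recorded on the class, and the standard library's `dataclasses.fields` can list them, but the generated `__init__` never compares them with the values it receives. A sketch of a naive check built on that metadata; the `mismatched` list is purely illustrative, and the `isinstance` test only works for plain classes such as `int` or `str`, not for generic annotations such as `list[int]`:

```python
from dataclasses import dataclass, fields

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

item = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand='Apple')

# Each Field records the annotation as written on the class
for f in fields(item):
    print(f.name, f.type)

# Naive check: assumes every annotation is a plain class usable with isinstance
mismatched = [
    f.name
    for f in fields(item)
    if not isinstance(getattr(item, f.name), f.type)
]
print(mismatched)  # ['quantity_on_hand']
```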

## Data Validation 

Data validation means checking that data satisfies some expected criteria.

We could manually validate that the provided `quantity_on_hand` is indeed an `int` by adding an assert statement,

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __post_init__(self):
        # Runs automatically at the end of the generated __init__
        assert isinstance(self.quantity_on_hand, int)

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
```

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand='Apple')
```

```
AssertionError
```

**Can you see any downsides to manually validating this way?**

- ...?

Manual validation like this quickly becomes repetitive and error-prone, especially as the number of fields and rules grows.
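Two concrete pitfalls of hand-written type checks are sketched below: an exact `type(value) is int` test rejects `1.0` even though it represents a whole number, while the more idiomatic `isinstance(value, int)` accepts `True`, because `bool` is a subclass of `int`. Note also that `assert` statements are skipped entirely when Python runs with the `-O` flag. The two function names are hypothetical, chosen for this illustration:

```python
def exact_int_check(value) -> bool:
    # Only objects whose type is exactly int pass
    return type(value) is int

def isinstance_int_check(value) -> bool:
    # Any subclass of int also passes, including bool
    return isinstance(value, int)

print(exact_int_check(5))          # True
print(exact_int_check(1.0))        # False: a whole-number float is rejected
print(isinstance_int_check(True))  # True: bool is a subclass of int
print(exact_int_check(True))       # False
```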

There are [lots of](https://medium.com/@thomas.a.roche/python-data-validation-using-pydantic-34306e88492c) solutions to this. One is the third-party library Pydantic.

## [Pydantic](https://docs.pydantic.dev/latest/)

Pydantic is the most widely used data validation library for Python. First, install it via:

`pip install pydantic`

Then we make our class inherit from Pydantic's `BaseModel`, e.g.

```python
from pydantic import BaseModel

class InventoryItem(BaseModel):
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
```

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand='Apple')
```

```
ValidationError: 1 validation error for InventoryItem
quantity_on_hand
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='Apple', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/int_parsing
```
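Rather than letting the exception propagate, we can catch `pydantic.ValidationError` and call its `errors()` method, which returns a list with one dictionary per problem. Unlike the `assert` approach, Pydantic reports every invalid field at once instead of stopping at the first. A minimal sketch assuming Pydantic v2 (the version the error links above point to); `collect_error_types` is a hypothetical helper written for this example:

```python
from pydantic import BaseModel, ValidationError

class InventoryItem(BaseModel):
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

def collect_error_types(**data) -> list[str]:
    """Hypothetical helper: return the Pydantic error type for each problem."""
    try:
        InventoryItem(**data)
    except ValidationError as exc:
        # errors() gives one dict per problem, with keys such as 'loc', 'msg', 'type'
        return [err['type'] for err in exc.errors()]
    return []

# Both bad fields are reported, not just the first one
print(collect_error_types(name='Apple', unit_price='cheap', quantity_on_hand='Apple'))
# ['float_parsing', 'int_parsing']

print(collect_error_types(name='Apple', unit_price=0.95, quantity_on_hand=5))  # []
```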

But we keep the nice behaviour that whole-number floats still work,

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=1.)
```

In contrast to the `dataclass` implementation above, the `quantity_on_hand` attribute is actually converted, here from the `float` `1.0` to the `int` `1`:

```python
my_inventory.quantity_on_hand
```

```
1
```
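By default Pydantic validates in "lax" mode, converting inputs when the conversion is unambiguous: a numeric string such as `'5'`, or a float with no fractional part, becomes an `int`, whereas `1.5` is rejected rather than silently truncated. If you would rather have no conversion at all, Pydantic v2 supports strict mode, e.g. through `ConfigDict(strict=True)`. A sketch assuming Pydantic v2, with `LaxItem` and `StrictItem` as illustrative model names:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxItem(BaseModel):
    quantity_on_hand: int

class StrictItem(BaseModel):
    # Strict mode: no type conversion for any field of this model
    model_config = ConfigDict(strict=True)
    quantity_on_hand: int

print(LaxItem(quantity_on_hand='5').quantity_on_hand)  # 5
print(LaxItem(quantity_on_hand=1.0).quantity_on_hand)  # 1

# All three of these inputs are rejected
for model, value in [(LaxItem, 1.5), (StrictItem, 1.0), (StrictItem, '5')]:
    try:
        model(quantity_on_hand=value)
        print(model.__name__, repr(value), 'accepted')
    except ValidationError:
        print(model.__name__, repr(value), 'rejected')
```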

`pydantic` has many more data validation options. For example, to insist that `quantity_on_hand` is a positive integer (greater than 0), we can use `PositiveInt`:

```python
from pydantic import BaseModel, PositiveInt

class InventoryItem(BaseModel):
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: PositiveInt
```

```python
my_inventory = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=0)
```

```
ValidationError: 1 validation error for InventoryItem
quantity_on_hand
  Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
    For further information visit https://errors.pydantic.dev/2.5/v/greater_than
```
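`PositiveInt` is a shorthand for an `int` constrained to be greater than 0. The same kind of numeric constraint can be written with `Field`, and rules that no built-in type covers can go in a method decorated with `field_validator`. A sketch assuming Pydantic v2; the `unit_price >= 0` constraint and the non-blank-name rule are illustrative assumptions, not requirements from the text above:

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

class InventoryItem(BaseModel):
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float = Field(ge=0)      # illustrative: zero allowed, negative not
    quantity_on_hand: int = Field(gt=0)  # same constraint as PositiveInt

    @field_validator('name')
    @classmethod
    def name_not_blank(cls, value: str) -> str:
        # Illustrative rule: reject empty or whitespace-only names
        if not value.strip():
            raise ValueError('name must not be blank')
        return value

item = InventoryItem(name='Apple', unit_price=0.95, quantity_on_hand=5)
print(item.quantity_on_hand)  # 5

try:
    InventoryItem(name='  ', unit_price=-1, quantity_on_hand=0)
except ValidationError as exc:
    # One entry per failing field: name, unit_price and quantity_on_hand
    for err in exc.errors():
        print(err['loc'], err['type'], err['msg'])
```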