<img src='../images/cards.png' width='150px' align='right' style="padding: 15px">

# Data Classes

## Goal

In this notebook we shall explore data classes: a convenient way to create classes that contain mostly data (although no such restriction exists!).

## Program

- [Motivation](#Motivation)
- [Type hinting](#Type-hinting)
- [Immutability](#Immutability)
- [NamedTuple vs. Dataclass](#NamedTuple-vs-Dataclass)

## Motivation

Data classes were introduced in Python 3.7 (see [PEP 557](https://www.python.org/dev/peps/pep-0557/)).

Essentially, they are just regular classes geared towards storing state rather than containing a lot of logic.

In that sense, every time you write a class that mostly consists of attributes, you are writing a data class.

What the **dataclasses** module does is make it easier to create data classes: it takes care of a lot of boilerplate for you, such as generating `__init__`, `__repr__` and `__eq__`!


Where you would have done this in the past:

In [None]:
class InventoryItem:
    def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand
        
    def __repr__(self):
        return (f'InventoryItem(name={self.name!r}, unit_price={self.unit_price!r}, '
                f'quantity_on_hand={self.quantity_on_hand!r})')
        
    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) \
            == (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

i = InventoryItem('milk', 2, 0)
i

With the `dataclasses` module, we can now write this instead:

In [None]:
from dataclasses import dataclass

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
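To see what the decorator generated for us, we can exercise the new class directly. A minimal sketch, re-declaring the class from the cell above so the snippet stands on its own:

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

# The generated __init__ accepts positional and keyword arguments,
# and quantity_on_hand falls back to its default of 0.
a = InventoryItem('milk', 2.0, 3)
b = InventoryItem(name='milk', unit_price=2.0, quantity_on_hand=3)
c = InventoryItem('bread', 1.5)

print(a)                   # InventoryItem(name='milk', unit_price=2.0, quantity_on_hand=3)
print(a == b)              # True: the generated __eq__ compares the fields in order
print(a == c)              # False
print(c.quantity_on_hand)  # 0
print(a.total_cost())      # 6.0
```

The `__repr__` and `__eq__` are equivalent to the hand-written ones in the first cell, without us having to keep them in sync with the attribute list.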

If our `__init__` is more complicated (for example, some attributes depend on other attributes), we can add a `__post_init__` method, which the generated `__init__` calls after it has set the fields:
```python
def __post_init__(self):
    self.total_cost = self.unit_price * self.quantity_on_hand
```

In [None]:
from dataclasses import dataclass, field

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    total_cost: float = field(init=False) # Question! What is the effect of setting init to be True?
    quantity_on_hand: int = 0
 

    def __post_init__(self):
        self.total_cost = self.unit_price * self.quantity_on_hand
        
i = InventoryItem('milk', 2, 1)
i
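`field(init=False)` removes `total_cost` from the parameters of the generated `__init__`, but it is still a field of the data class (it shows up in the `repr` and takes part in `==`). A small sketch using `dataclasses.fields()` to inspect this:

```python
from dataclasses import dataclass, field, fields

@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    total_cost: float = field(init=False)  # not an __init__ parameter
    quantity_on_hand: int = 0

    def __post_init__(self):
        # Runs at the end of the generated __init__
        self.total_cost = self.unit_price * self.quantity_on_hand

# Every declared field, together with its init flag
for f in fields(InventoryItem):
    print(f.name, f.init)
# name True
# unit_price True
# total_cost False
# quantity_on_hand True

item = InventoryItem('milk', 2, 1)
print(item.total_cost)  # 2
```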

## Type hinting
Type hints are mandatory when defining the fields of a data class: a class attribute without a type annotation is not a field, so it is ignored by the generated `__init__`, `__repr__` and `__eq__`.

However, the types are **not** enforced when an instance is created:
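A minimal sketch with a hypothetical `Product` class: the annotated names become fields, while the un-annotated `currency` stays an ordinary class attribute shared by all instances.

```python
from dataclasses import dataclass, fields

@dataclass
class Product:
    name: str           # annotated: becomes a field
    price: float = 0.0  # annotated, with a default: becomes a field
    currency = 'EUR'    # NOT annotated: a plain class attribute, not a field

print([f.name for f in fields(Product)])  # ['name', 'price']

p = Product('milk', 2.0)
print(p)           # Product(name='milk', price=2.0)
print(p.currency)  # EUR (read from the class, not set by __init__)
```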

In [None]:
i = InventoryItem('milk', 2, 'cow')
i

In [None]:
i.total_cost

In this case `total_cost` silently becomes the string `'cowcow'` (because `2 * 'cow'` repeats the string), but a wrong type can trigger errors too:

In [None]:
# NBVAL_RAISES_EXCEPTION
InventoryItem('milk', 2, {})
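If you do want runtime validation, one option is to write the checks yourself in `__post_init__`. This is a hand-written sketch, not a feature of the `dataclasses` module, and the class name `CheckedInventoryItem` is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CheckedInventoryItem:
    '''Hypothetical variant of InventoryItem that validates its inputs.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __post_init__(self):
        # Hand-written checks: dataclasses performs none of this for us.
        if not isinstance(self.name, str):
            raise TypeError(f'name must be a str, got {type(self.name).__name__}')
        # Accept ints as well as floats, as in InventoryItem('milk', 2, 1)
        if not isinstance(self.unit_price, (int, float)):
            raise TypeError(f'unit_price must be a number, got {type(self.unit_price).__name__}')
        if not isinstance(self.quantity_on_hand, int):
            raise TypeError(f'quantity_on_hand must be an int, got {type(self.quantity_on_hand).__name__}')

ok = CheckedInventoryItem('milk', 2, 1)  # passes the checks

try:
    CheckedInventoryItem('milk', 2, 'cow')
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # quantity_on_hand must be an int, got str
```

The error is now raised at construction time with a clear message, instead of surfacing later as an odd value or an unrelated exception.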

## Immutability
One nice thing about named tuples is that they are immutable.

Data classes can be made immutable (i.e. once an instance is initialized, none of its fields can be changed) by using `@dataclass(frozen=True)`. Assigning to a field then raises `dataclasses.FrozenInstanceError`.

In [None]:
from dataclasses import dataclass

@dataclass(frozen=True)
class FrozenInventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self):
        return self.unit_price * self.quantity_on_hand

In [None]:
f = FrozenInventoryItem('milk', 2, 1)

In [None]:
f.total_cost()

In [None]:
# NBVAL_RAISES_EXCEPTION
f.unit_price = 12
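Since a frozen instance cannot be modified, the usual way to "change" it is to build a modified copy with `dataclasses.replace()`. A short sketch, re-declaring the class so the snippet stands alone:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FrozenInventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

f = FrozenInventoryItem('milk', 2, 1)

# replace() builds a NEW instance with some fields changed;
# the original object is left untouched.
g = replace(f, unit_price=12)
print(f.unit_price, g.unit_price)  # 2 12

# frozen=True together with the default eq=True also generates __hash__,
# so instances can be used as dict keys or set members.
stock = {f: 'shelf A'}
print(stock[FrozenInventoryItem('milk', 2, 1)])  # shelf A
```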

## NamedTuple vs Dataclass

Given the functionality, it's natural to compare `NamedTuple`s with data classes. The biggest differences between `NamedTuple` and `Dataclass` (apart from availability, which depends on the Python version) are:
- `NamedTuple` is always immutable; `Dataclass` is mutable by default, but can be made immutable.
- A `NamedTuple` is a subclass of `tuple`, whereas a `Dataclass` is a regular class whose instances (by default) store their attributes in a `__dict__`. This leads to some differences in behaviour:

In [None]:
import collections
Point = collections.namedtuple('Point', ['x', 'y'])
p1 = Point(1,2)
p1

In [None]:
p1[0]

In [None]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Pointer:
    x: float
    y: float
p2 = Pointer(1,2)
p2

In [None]:
p2.x
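The difference in storage can be checked directly. A minimal sketch re-creating `Point` and `Pointer` from the cells above:

```python
import collections
from dataclasses import dataclass

Point = collections.namedtuple('Point', ['x', 'y'])

@dataclass(frozen=True)
class Pointer:
    x: float
    y: float

p1 = Point(1, 2)
p2 = Pointer(1, 2)

print(isinstance(p1, tuple))  # True: a named tuple IS a tuple
print(len(p1), p1[1])         # 2 2
print(isinstance(p2, tuple))  # False
print(vars(p2))               # {'x': 1, 'y': 2}: values live in the instance __dict__
```

A named tuple has no per-instance `__dict__` at all, so `vars(p1)` would raise a `TypeError`.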

A `NamedTuple` can be unpacked like any other tuple, whereas a `Dataclass` instance cannot:

In [None]:
x, y = p1
print(x)
print(y)

In [None]:
# NBVAL_RAISES_EXCEPTION
x, y = p2
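If you do need to unpack a data class, `dataclasses.astuple()` (and its sibling `asdict()`) converts an instance into a plain tuple (or dict). Note that it builds a new object on every call, recursing into nested data classes. A sketch:

```python
from dataclasses import dataclass, astuple, asdict

@dataclass(frozen=True)
class Pointer:
    x: float
    y: float

p2 = Pointer(1, 2)

# astuple() copies the field values, in declaration order, into a plain tuple,
# which can then be unpacked like a named tuple.
x, y = astuple(p2)
print(x, y)        # 1 2
print(asdict(p2))  # {'x': 1, 'y': 2}
```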

Due to the underlying implementations, there are some performance considerations to take into account when choosing between these two data structures. This [blog post](https://medium.com/@jacktator/dataclass-vs-namedtuple-vs-object-for-performance-optimization-in-python-691e234253b9) compares the two. In summary, it found that:

- data classes are faster at reading attributes (including nested ones) and executing functions;
- named tuples are faster only at creating the object.
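Numbers like these depend heavily on the Python version and the hardware, so it is worth measuring on your own setup. A rough micro-benchmark sketch using the standard `timeit` module; the iteration count `N` is an arbitrary choice, and every timing includes the overhead of calling a lambda:

```python
import collections
import timeit
from dataclasses import dataclass

Point = collections.namedtuple('Point', ['x', 'y'])

@dataclass
class Pointer:
    x: float
    y: float

p1 = Point(1, 2)
p2 = Pointer(1, 2)

N = 200_000  # arbitrary: large enough to smooth out noise, small enough to run quickly
timings = {
    'namedtuple creation': timeit.timeit(lambda: Point(1, 2), number=N),
    'dataclass creation': timeit.timeit(lambda: Pointer(1, 2), number=N),
    'namedtuple attribute read': timeit.timeit(lambda: p1.x, number=N),
    'dataclass attribute read': timeit.timeit(lambda: p2.x, number=N),
}

# Total seconds for N calls of each operation
for label, seconds in timings.items():
    print(f'{label:28s} {seconds:.4f} s')
```

Comparing the pairs of lines, rather than the absolute values, gives a fair idea of which structure wins on your machine.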

## Summary

Data classes are a powerful tool in your Python toolbox: they let you deal with data easily and quickly, without having to write a lot of boilerplate methods such as `__init__`, `__repr__` and `__eq__`.

You can use the `__post_init__` method to do more complex initialization.

Data classes and named tuples are quite similar in terms of usage. The biggest differences:
- `NamedTuple` is always immutable; `Dataclass` is mutable by default, but can be made immutable.
- A `NamedTuple` is a subclass of `tuple`, whereas a `Dataclass` is a regular class whose instances (by default) store their attributes in a `__dict__`.