Python offers a few ways to *build a simple class that is just a collection of fields*, with little or no extra functionality. This pattern is known as a “data class”, and the `dataclasses` module is one of the packages that support it. This chapter covers three different class builders that you may use as shortcuts to write data classes:
- `collections.namedtuple` (since Python 2.6)
- `typing.NamedTuple` (since Python 3.6)
- `@dataclasses.dataclass`: A class decorator that allows more customization than previous alternatives (since Python 3.7)

## Overview of Data Class Builders

In [26]:
class Coordinate:
    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

moscow = Coordinate(55.76, 37.62)
print(moscow)
location = Coordinate(55.76, 37.62)
moscow == location

<__main__.Coordinate object at 0x71ce26a24950>


False

In [25]:
type(moscow)

__main__.Coordinate

**Writing this `__init__` boilerplate gets old fast**, and the result is still not very useful:
- the `__repr__` inherited from `object` is not very helpful;
- the `__eq__` method inherited from `object` compares object identities, so `==` returns `False` even for two instances holding the same values;
- comparing two coordinates by value requires explicitly comparing each attribute.

In [7]:
(location.lat, location.lon) == (moscow.lat, moscow.lon)

True
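For comparison, here is a sketch of the boilerplate we would have to add by hand to get a readable `__repr__` and a value-based `__eq__`. It only illustrates what the class builders save us from writing; it is not the code they actually generate.

```python
class Coordinate:
    """Hand-written __repr__ and __eq__ for a plain class."""

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def __repr__(self):
        # Show the class name and field values instead of a memory address
        return f'{type(self).__name__}(lat={self.lat!r}, lon={self.lon!r})'

    def __eq__(self, other):
        # Compare by value; let Python try the other operand for unrelated types
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.lat, self.lon) == (other.lat, other.lon)


moscow = Coordinate(55.76, 37.62)
location = Coordinate(55.76, 37.62)
print(moscow)              # Coordinate(lat=55.76, lon=37.62)
print(moscow == location)  # True
```

Defining `__eq__` this way also sets `__hash__` to `None`, so these hand-written instances are unhashable.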

Let's try `collections.namedtuple` instead: it builds a class with a useful `__repr__` and a meaningful `__eq__`.

In [31]:
from collections import namedtuple
Coordinate = namedtuple('Coordinate', 'lat lon')
moscow = Coordinate(55.756, 37.617)
moscow

Coordinate(lat=55.756, lon=37.617)

In [30]:
isinstance(moscow, tuple)

True

In [14]:
moscow == Coordinate(lat=55.756, lon=37.617)

True

`typing.NamedTuple` provides the same functionality, adding a type annotation to each field:

In [15]:
import typing
Coordinate = typing.NamedTuple('Coordinate', [('lat', float), ('lon', float)])

In [17]:
typing.get_type_hints(Coordinate)

{'lat': float, 'lon': float}

In [34]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'

        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'

c = Coordinate(lat=55.756, lon=37.617)
print(c)
isinstance(c, tuple)

55.8°N, 37.6°E


True

In [35]:
str(c)

'55.8°N, 37.6°E'

### Main features of different data class builders

<img src="../images/9.png" style="width: 70%;">

A key difference between these class builders is that `collections.namedtuple` and `typing.NamedTuple` build tuple subclasses, so their instances are immutable. By default, `@dataclass` produces mutable classes, but the decorator accepts a `frozen` keyword argument to make instances immutable.

In [22]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float
    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
    
str(Coordinate(lat=55.756, lon=37.617))


'55.8°N, 37.6°E'
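A quick sketch contrasting the three cases: assigning to a field of a named tuple or of a frozen data class raises an exception, while a default `@dataclass` instance accepts the change. The class names `NTCoordinate`, `FrozenCoordinate`, and `MutableCoordinate` are made up for this illustration.

```python
from collections import namedtuple
from dataclasses import FrozenInstanceError, dataclass

NTCoordinate = namedtuple('NTCoordinate', 'lat lon')  # a tuple subclass

@dataclass(frozen=True)
class FrozenCoordinate:
    lat: float
    lon: float

@dataclass  # frozen=False by default
class MutableCoordinate:
    lat: float
    lon: float

nt = NTCoordinate(55.756, 37.617)
try:
    nt.lat = 0.0  # named tuple fields are read-only
except AttributeError as exc:
    print('namedtuple:', type(exc).__name__)  # namedtuple: AttributeError

fc = FrozenCoordinate(55.756, 37.617)
try:
    fc.lat = 0.0  # the generated __setattr__ rejects the assignment
except FrozenInstanceError as exc:
    print('frozen dataclass:', type(exc).__name__)  # frozen dataclass: FrozenInstanceError

mc = MutableCoordinate(55.756, 37.617)
mc.lat = 0.0  # plain @dataclass instances can be changed
print(mc)  # MutableCoordinate(lat=0.0, lon=37.617)
```

`FrozenInstanceError` is a subclass of `AttributeError`, so code that catches `AttributeError` handles both immutable cases.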

### `collections.namedtuple`
The `collections.namedtuple` function is a factory that builds subclasses of `tuple` enhanced with field names, a class name, and an informative `__repr__`. Its instances keep all tuple behavior, including ordering comparisons such as `__lt__`, and the class adds helpers such as `_fields`, `_make()`, and `_asdict()`.

In [36]:
from collections import namedtuple

# Two parameters are required: a class name and the field names, given as an iterable of strings or a single space-delimited string
City = namedtuple('City', ['name', 'country', 'population', 'coordinates'])
tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
tokyo.population

36.933

In [37]:
City._fields

('name', 'country', 'population', 'coordinates')

In [38]:
Coordinate = namedtuple('Coordinate', 'lat lon')
delhi_data = ('Delhi NCR', 'IN', 21.935, Coordinate(28.613889, 77.208889))
delhi = City._make(delhi_data)
delhi._asdict()

{'name': 'Delhi NCR',
 'country': 'IN',
 'population': 21.935,
 'coordinates': Coordinate(lat=28.613889, lon=77.208889)}

In [39]:
import json
json.dumps(delhi._asdict())

'{"name": "Delhi NCR", "country": "IN", "population": 21.935, "coordinates": [28.613889, 77.208889]}'
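Because named tuples are tuples, they inherit ordering comparisons such as `__lt__`, which compare field by field from left to right. The sketch below reuses the two cities above (keeping only three fields for brevity) to show the default ordering, sorting by a chosen field, and plain tuple unpacking.

```python
from collections import namedtuple

# Field names can also be passed as a single space-delimited string
City = namedtuple('City', 'name country population')

cities = [City('Tokyo', 'JP', 36.933), City('Delhi NCR', 'IN', 21.935)]

# tuple.__lt__ compares the first field first, so this sorts by name
by_name = sorted(cities)
print([c.name for c in by_name])  # ['Delhi NCR', 'Tokyo']

# To sort by another field, pass a key function
by_population = sorted(cities, key=lambda c: c.population, reverse=True)
print([c.name for c in by_population])  # ['Tokyo', 'Delhi NCR']

# Unpacking and indexing work as with any tuple
name, country, population = cities[0]
print(name, cities[0][1])  # Tokyo JP
```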

### `typing.NamedTuple`

In [41]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    # Every instance field of a NamedTuple must be annotated with a type
    lat: float
    lon: float
    reference: str = 'WGS84'
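With the default in place, `reference` can be omitted when building an instance. A short usage sketch; `'NAD83'` is just another datum name used for illustration.

```python
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = 'WGS84'  # field with a default value

c = Coordinate(55.756, 37.617)     # reference falls back to its default
print(c)                           # Coordinate(lat=55.756, lon=37.617, reference='WGS84')
print(Coordinate._field_defaults)  # {'reference': 'WGS84'}

other = Coordinate(0.0, 0.0, 'NAD83')  # the default can still be overridden
print(other.reference)                 # NAD83
```

Fields with defaults must come after fields without them: declaring a non-default field after `reference` makes the class definition raise `TypeError`.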

## Type Hints 101

Type hints (a.k.a. type annotations) are ways to declare the expected type of function arguments, return values, variables, and attributes.
- Type hints are not enforced at all by the Python bytecode compiler and interpreter, so they have *no impact on the runtime behavior* of Python programs.

In [42]:
import typing

class Coordinate(typing.NamedTuple):
    lat: float
    lon: float

trash = Coordinate('Ni!', None)
trash

Coordinate(lat='Ni!', lon=None)

The type hints are intended primarily to support third-party type checkers, like `Mypy` or the type checker built into the PyCharm IDE. These are *static analysis tools*: they check Python source code “at rest,” not running code.

Although type hints have no effect at runtime, Python does read them at import time (when a module is loaded) to build the `__annotations__` dictionary, which `typing.NamedTuple` and `@dataclass` then use to enhance the class.

In [1]:
class DemoPlainClass:
    a: int  # a becomes an entry in __annotations__, but is otherwise discarded: no attribute named a is created in the class.
    b: float = 1.1  # b is saved as an annotation, and also becomes a class attribute with value 1.1
    c = "spam"  # c is just a plain old class attribute, not an annotation

DemoPlainClass.__annotations__

{'a': int, 'b': float}

In [2]:
DemoPlainClass.a

AttributeError: type object 'DemoPlainClass' has no attribute 'a'

In [3]:
print(DemoPlainClass.b)
print(DemoPlainClass.c)

1.1
spam


Note that the `__annotations__` special attribute is created by the interpreter to record the type hints that appear in the source code, even in a plain class.  
*`a` survives only as an annotation. It doesn’t become a class attribute because no value is bound to it. `b` and `c` are stored as class attributes* because they are bound to values.

Now let's create a new instance of `DemoPlainClass`:

In [4]:
x = DemoPlainClass()


In [5]:
x.a

AttributeError: 'DemoPlainClass' object has no attribute 'a'

In [9]:
print(x.b)
print(x.c)

1.1
spam


#### Inspecting a `typing.NamedTuple`

In [10]:
import typing

class DemoNTClass(typing.NamedTuple):
    a: int  # a becomes an annotation and also an instance attribute
    b: float = 1.1  # b is another annotation, and also becomes an instance attribute with default value 1.1.
    c = 'spam'  # c is just a plain old class attribute; no annotation will refer to it

In [11]:
DemoNTClass.__annotations__

{'a': int, 'b': float}

In [14]:
print(DemoNTClass.a)  # a and b are descriptors
print(DemoNTClass.b)
print(DemoNTClass.c)

_tuplegetter(0, 'Alias for field number 0')
_tuplegetter(1, 'Alias for field number 1')
spam


In [15]:
DemoNTClass.__doc__

'DemoNTClass(a, b)'

Inspect an instance of `DemoNTClass`

In [16]:
nt = DemoNTClass(8)
print(nt.a)
print(nt.b)
print(nt.c)

8
1.1
spam


#### Inspecting a class decorated with `dataclass`

In [18]:
from dataclasses import dataclass

@dataclass
class DemoDataClass:
    a: int  # a becomes an annotation and also an instance attribute, set by the generated __init__
    b: float = 1.1  # b is another annotation and also becomes an instance attribute; its default value 1.1 is stored as a class attribute
    c = 'spam'  # c is just a plain old class attribute; no annotation will refer to it

In [19]:
DemoDataClass.__annotations__

{'a': int, 'b': float}

In [20]:
DemoDataClass.__doc__

'DemoDataClass(a: int, b: float = 1.1)'

In [22]:
print(DemoDataClass.b)
print(DemoDataClass.c)
print(DemoDataClass.a)

1.1
spam


AttributeError: type object 'DemoDataClass' has no attribute 'a'

The `a` attribute will only exist in instances of `DemoDataClass`. It will be a public attribute that we can get and set, unless the class is frozen. But `b` and `c` exist as class attributes: `b` holds the default value for the `b` instance attribute, while `c` is just a class attribute that will not be bound to the instances.

In [23]:
dc = DemoDataClass(9)
print(dc.a)
print(dc.b)
print(dc.c)


9
1.1
spam
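To confirm where each value lives, we can look at the instance `__dict__` with `vars()` and list the data class fields with `dataclasses.fields()`. A small sketch that redefines `DemoDataClass` so it runs on its own:

```python
from dataclasses import dataclass, fields

@dataclass
class DemoDataClass:
    a: int
    b: float = 1.1
    c = 'spam'

dc = DemoDataClass(9)

# The generated __init__ stores only the fields in the instance __dict__
print(vars(dc))                      # {'a': 9, 'b': 1.1}
# c is not a field; it is only reachable through the class
print([f.name for f in fields(dc)])  # ['a', 'b']

# Assigning to dc.b changes the instance attribute; the class-level default stays
dc.b = 2.2
print(DemoDataClass.b)               # 1.1
```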


As mentioned, `DemoDataClass` instances are mutable (unless the class is frozen), and no type checking is done at runtime:

In [24]:
dc.a = 10
dc.b = 'oops'

In [25]:
dc.c = 'whatever'

## More about `@dataclass`

The `@dataclass` decorator accepts several keyword-only arguments. In Python 3.7 to 3.9 its signature is `@dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)`; Python 3.10 added `match_args`, `kw_only`, and `slots`.  
- If the `eq` and `frozen` arguments are both `True`, `@dataclass` produces a suitable `__hash__` method, so the instances will be hashable.
- If `eq=True` and `frozen=False` (the defaults), `@dataclass` sets `__hash__` to `None`, signalling that the instances are unhashable and overriding any `__hash__` inherited from a superclass.
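A sketch of both rules; `FrozenPoint` and `MutablePoint` are hypothetical classes made up for this illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # eq=True by default, so __hash__ is generated
class FrozenPoint:
    x: float
    y: float

@dataclass  # eq=True, frozen=False: __hash__ is set to None
class MutablePoint:
    x: float
    y: float

# Equal frozen instances hash alike, so a set keeps only one of them
points = {FrozenPoint(1.0, 2.0), FrozenPoint(1.0, 2.0)}
print(points)                 # {FrozenPoint(x=1.0, y=2.0)}

print(MutablePoint.__hash__)  # None
try:
    hash(MutablePoint(1.0, 2.0))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
print(mutable_is_hashable)    # False
```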

Mutable default values are a common source of bugs. Class attributes are often used as default attribute values for instances, including in data classes, and `@dataclass` uses the default values in the type hints to generate parameters with defaults for `__init__`. To prevent bugs, `@dataclass` rejects a class definition like the one below:

In [2]:
from dataclasses import dataclass

@dataclass
class ClubMember:
    name: str
    guests: list = []

ValueError: mutable default <class 'list'> for field guests is not allowed: use default_factory

We have to use a `default_factory`:

In [3]:
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)

The `default_factory` parameter lets you provide a function, class, or any other callable, which will be invoked with zero arguments to build a default value each time an instance of the data class is created. *This way, each instance of `ClubMember` will have its own list—instead of all instances sharing the same list from the class, which is rarely what we want and is often a bug*.
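A sketch of the difference: with `default_factory=list`, each `ClubMember` gets a fresh list, while a plain class that keeps its default list as a class attribute shares one list among all instances. `SharedGuests` is a hypothetical class written only to show the bug.

```python
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)  # list() is called for each instance

alice = ClubMember('Alice')
bob = ClubMember('Bob')
alice.guests.append('Carol')
print(alice.guests, bob.guests)  # ['Carol'] []

# The bug default_factory avoids: one list object shared through the class
class SharedGuests:
    guests = []  # created once, when the class is defined

    def __init__(self, name):
        self.name = name

x = SharedGuests('X')
y = SharedGuests('Y')
x.guests.append('Z')  # mutates the class-level list
print(y.guests)       # ['Z']
```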

#### Post-init Processing

The `__init__` method generated by `@dataclass` only takes the arguments passed and assigns them (or their default values, if missing) to the instance attributes that are instance fields. But you may need to do more than that to initialize the instance. If that’s the case, you can provide a `__post_init__` method. When that method exists, `@dataclass` adds code to the generated `__init__` to call `__post_init__` as the last step.

In [3]:
from dataclasses import dataclass, field
from typing import ClassVar

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)

@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()  # class attribute
    handle: str = ''  # instance field

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)
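A usage sketch of `HackerClubMember`, assuming Python 3.9+ (for `set[str]`); the member names and handles are made up. The class definitions are repeated so the snippet runs on its own.

```python
from dataclasses import dataclass, field
from typing import ClassVar

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
    athlete: bool = field(default=False, repr=False)

@dataclass
class HackerClubMember(ClubMember):
    all_handles: ClassVar[set[str]] = set()  # ClassVar: not a field
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]  # default handle: first name
        if self.handle in cls.all_handles:
            raise ValueError(f'handle {self.handle!r} already exists.')
        cls.all_handles.add(self.handle)

ada = HackerClubMember('Ada Lovelace', handle='ada')
turing = HackerClubMember('Alan Turing')  # handle derived in __post_init__
print(ada)  # athlete is left out of the repr (repr=False)
print(turing.handle)  # Alan

try:
    HackerClubMember('Alan Kay')  # would also get the handle 'Alan'
    duplicate_rejected = False
except ValueError as exc:
    print(exc)  # handle 'Alan' already exists.
    duplicate_rejected = True
```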
