# Data Class Builders

- `collections.namedtuple`
- `typing.NamedTuple`
- `@dataclasses.dataclass`

## Overview of Data Class Builders

Example 5-1. `class/coordinates.py`

In [None]:
class Coordinate:

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon


In [None]:
moscow = Coordinate(55.76, 37.62)
moscow

The `__repr__` inherited from `object` is not very useful: it shows only the class name and a memory address.

In [None]:
location = Coordinate(55.76, 37.62)
location == moscow

Meaningless `==`; the `__eq__` method inherited from `object` compares object IDs.

In [None]:
(location.lat, location.lon) == (moscow.lat, moscow.lon)

Comparing the two coordinates requires explicit comparison of each attribute.
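
To make the cost of the plain class concrete, here is a minimal sketch of the boilerplate one might write by hand to get a readable `__repr__` and a value-based `__eq__`. The class name `ManualCoordinate` is made up for this illustration; the data class builders shown next generate comparable methods automatically.

```python
class ManualCoordinate:
    """Hypothetical hand-written counterpart of the plain Coordinate class."""

    def __init__(self, lat, lon):
        self.lat = lat
        self.lon = lon

    def __repr__(self):
        # Show the class name and field values instead of a memory address.
        return f"{type(self).__name__}(lat={self.lat!r}, lon={self.lon!r})"

    def __eq__(self, other):
        # Compare by value; defer to the other operand for unrelated types.
        if not isinstance(other, ManualCoordinate):
            return NotImplemented
        return (self.lat, self.lon) == (other.lat, other.lon)


moscow = ManualCoordinate(55.76, 37.62)
location = ManualCoordinate(55.76, 37.62)
print(moscow)              # ManualCoordinate(lat=55.76, lon=37.62)
print(location == moscow)  # True
```

One side effect worth knowing: a class that defines `__eq__` without `__hash__` becomes unhashable, so this sketch cannot be used as a dict key or set member.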

In [None]:
from collections import namedtuple

Coordinate = namedtuple("Coordinate", "lat lon")

In [None]:
issubclass(Coordinate, tuple)

In [None]:
moscow = Coordinate(lat=55.756, lon=37.617)
moscow

Useful `__repr__`.

In [None]:
moscow == Coordinate(lat=55.756, lon=37.617)

Meaningful `__eq__`.

In [None]:
Coordinate(lat="100", lon="100")

No type checking: `namedtuple` accepts any values for its fields.

In [None]:
import typing

Coordinate = typing.NamedTuple(
    "Coordinate",
    [("lat", float), ("lon", float)],  # the keyword-argument form is deprecated since Python 3.13
)

In [None]:
issubclass(Coordinate, tuple)

In [None]:
typing.get_type_hints(Coordinate)

In [None]:
Coordinate(lat="100", lon="100")

Example 5-2. `typing_namedtuple/coordinates.py`

In [None]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'

str(Coordinate(lat=100, lon=2))

In [None]:
issubclass(Coordinate, typing.NamedTuple)

In [None]:
issubclass(Coordinate, tuple)

Example 5-3. `dataclass/coordinates.py`

In [None]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'
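
As a rough usage sketch for Example 5-3, the cell below repeats the class so it is self-contained and then exercises what `frozen=True` provides. The coordinate values are illustrative only.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

    def __str__(self):
        ns = 'N' if self.lat >= 0 else 'S'
        we = 'E' if self.lon >= 0 else 'W'
        return f'{abs(self.lat):.1f}°{ns}, {abs(self.lon):.1f}°{we}'


place = Coordinate(-23.5, -46.6)  # illustrative values
print(repr(place))  # Coordinate(lat=-23.5, lon=-46.6)
print(place)        # 23.5°S, 46.6°W

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    place.lat = 0.0
except FrozenInstanceError as exc:
    print(f"cannot assign: {exc}")

# With frozen=True and the default eq=True, instances are hashable.
print(len({place, Coordinate(-23.5, -46.6)}))  # 1
```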

## Classic Named Tuples

Example 5-4. Defining and using a named tuple type

In [None]:
from collections import namedtuple

City = namedtuple("City", "name country population coordinates")

In [None]:
tokyo = City("Tokyo", "JP", 36.933, (35.689722, 139.691667))
tokyo

In [None]:
tokyo.population

In [None]:
tokyo.coordinates

In [None]:
tokyo[1]

In [None]:
list(tokyo)

Example 5-5. Named tuple attributes and methods (continued from the previous example)

In [None]:
City._fields

In [None]:
Coordinate = namedtuple("Coordinate", "lat lon")

In [None]:
delhi_data = ("Delhi NCR", "IN", 21.935, Coordinate(28.613889, 77.208889))
delhi_data

In [None]:
delhi = City._make(delhi_data)  # build an instance from an iterable
delhi = City(*delhi_data)       # equivalent: unpack the iterable into the constructor
delhi

In [None]:
delhi._asdict()

In [None]:
import json

json.dumps(delhi._asdict())

Example 5-6. Named tuple attributes and methods, continued from 5-5

In [None]:
Coordinate = namedtuple("Coordinate", "lat lon reference", defaults=["WGS84"])

In [None]:
Coordinate(0, 0)


In [None]:
Coordinate._field_defaults
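
A small sketch of how `defaults` lines up with the field names: the values apply to the rightmost fields, so a single default covers only `reference`. The sketch also uses `_replace`, another standard named tuple method that is not shown in the cells above, to build a modified copy.

```python
from collections import namedtuple

Coordinate = namedtuple("Coordinate", "lat lon reference", defaults=["WGS84"])

origin = Coordinate(0, 0)
print(origin)                      # Coordinate(lat=0, lon=0, reference='WGS84')
print(Coordinate._field_defaults)  # {'reference': 'WGS84'}

# _replace returns a new tuple; the original is left untouched.
shifted = origin._replace(lat=10)
print(shifted)     # Coordinate(lat=10, lon=0, reference='WGS84')
print(origin.lat)  # 0
```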

## Typed Named Tuples

Example 5-8. `typing_namedtuple/coordinates2.py`

In [None]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    lat: float
    lon: float
    reference: str = 'WGS84'

coord = Coordinate(1,1)
coord

### Type Hints 101

#### No Runtime Effect

Example 5-9. Python does not enforce type hints at runtime

In [None]:
import typing

class Coordinate(typing.NamedTuple):
    lat: float
    lon: float


In [None]:
trash = Coordinate("Ni!", None)

In [None]:
print(trash)
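
Because the hints are not enforced, any runtime check has to be written explicitly. The following is only a sketch: `mismatched_fields` is a hypothetical helper, not part of `typing`, and it handles only hints that are plain classes such as `float` or `str`.

```python
import typing


class Coordinate(typing.NamedTuple):
    lat: float
    lon: float


def mismatched_fields(record):
    """Hypothetical helper: names of fields whose value fails isinstance().

    Works only for hints that are plain classes (float, str, ...), not for
    generics such as List[int] or Optional[float].
    """
    hints = typing.get_type_hints(type(record))
    return [
        name
        for name, hint in hints.items()
        if not isinstance(getattr(record, name), hint)
    ]


trash = Coordinate("Ni!", None)
print(trash)                                        # Coordinate(lat='Ni!', lon=None)
print(mismatched_fields(trash))                     # ['lat', 'lon']
print(mismatched_fields(Coordinate(55.76, 37.62)))  # []
```

Note that a strict `isinstance` check is harsher than a type checker: `Coordinate(1, 2)` would be flagged here because `1` is an `int`, although static checkers accept `int` where `float` is expected.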

## Variable Annotation Syntax

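Example 5-10. `meaning/demo_plain.py`: a plain class with type hints

In [None]:
class DemoPlainClass:
    a: int           # annotation only: recorded in __annotations__, no class attribute created
    b: float = 1.1   # recorded in __annotations__ and set as a class attribute
    c = 'spam'       # plain class attribute; no annotation recorded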

In [None]:
DemoPlainClass.__annotations__

In [None]:
DemoPlainClass.a

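In [None]:
# A plain class gets no generated __init__, so this call raises TypeError
# and `demo` is never bound.
demo = DemoPlainClass(8)
demo.a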

In [None]:
DemoPlainClass.b

In [None]:
DemoPlainClass.c

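Example 5-11. `meaning/demo_nt.py`: a class built with `typing.NamedTuple`

In [None]:
import typing

class DemoNTClass(typing.NamedTuple):
    a: int           # annotated, no default: required field
    b: float = 1.1   # annotated with default: optional field
    c = 'spam'       # no annotation: plain class attribute, not a field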


In [None]:
DemoNTClass.__annotations__

In [None]:
DemoNTClass.a

In [None]:
DemoNTClass.b

In [None]:
DemoNTClass.c

In [None]:
DemoNTClass.__doc__

In [None]:
nt = DemoNTClass(8)
nt

In [None]:
nt.a

In [None]:
nt.b

In [None]:
nt.c

What happens if we try to assign values to `nt.a`, `nt.b`, `nt.c`, or a new attribute `nt.z`? Each assignment below raises `AttributeError`.

In [None]:
nt.a = 9

In [None]:
nt.b = 10

In [None]:
nt.c = "spam spam"

In [None]:
nt.z = "new field"
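
The four assignments can be checked in one pass. As far as I can tell, the reasons differ slightly: `a` and `b` are read-only descriptors on the class, while `c` and `z` fail because named tuple instances have no `__dict__` (the generated class sets `__slots__ = ()`). The loop below is just a sketch to record what happens.

```python
import typing


class DemoNTClass(typing.NamedTuple):
    a: int
    b: float = 1.1
    c = 'spam'


nt = DemoNTClass(8)

outcomes = {}
for name, value in [("a", 9), ("b", 10), ("c", "spam spam"), ("z", "new field")]:
    try:
        setattr(nt, name, value)
    except AttributeError as exc:
        outcomes[name] = type(exc).__name__
    else:
        outcomes[name] = "assigned"

print(outcomes)
# {'a': 'AttributeError', 'b': 'AttributeError', 'c': 'AttributeError', 'z': 'AttributeError'}
print(nt)  # DemoNTClass(a=8, b=1.1)
```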

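Example 5-12. `meaning/demo_dc.py`: a class decorated with `@dataclass`

In [None]:
from dataclasses import dataclass

@dataclass
class DemoDataClass:
    a: int           # annotated, no default: required field, no class attribute
    b: float = 1.1   # annotated with default: field; 1.1 is also kept as a class attribute
    c = 'spam'       # no annotation: plain class attribute, not a field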

# which are class attributes, and which are instance attributes?

In [None]:
DemoDataClass.__annotations__

In [None]:
DemoDataClass.__doc__

In [None]:
DemoDataClass.a

In [None]:
DemoDataClass.b

In [None]:
DemoDataClass.c

In [None]:
class Example:
    class_variable = 1

    def __init__(self, val):
        self.instance_variable = val

e = Example(1)

In [None]:
e.instance_variable

In [None]:
Example.instance_variable

In [None]:
dc = DemoDataClass(9)

In [None]:
dc2 = DemoDataClass(0)
dc2.c = "ham"
dc2.c

In [None]:
dc.c  # still 'spam': the assignment above created an instance attribute on dc2 only

In [None]:
dc.a

In [None]:
dc.b

In [None]:
dc.c

In [None]:
dc.a = 10
dc.a

In [None]:
dc.b = "oops"
dc.b

In [None]:
dc.c = "whatever"
dc.c

In [None]:
dc.z = "secret stash"
dc.z
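
To answer the question of which names are fields and which are class attributes, the sketch below inspects the class with `dataclasses.fields` and `vars`. It repeats the class definition so it can run on its own.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoDataClass:
    a: int
    b: float = 1.1
    c = 'spam'


dc = DemoDataClass(9)

print([f.name for f in fields(DemoDataClass)])  # ['a', 'b']
print(vars(dc))                                 # {'a': 9, 'b': 1.1}
print(hasattr(DemoDataClass, 'a'))              # False
print(DemoDataClass.b, DemoDataClass.c)         # 1.1 spam

# Assigning to c creates an instance attribute; the class attribute is unchanged.
dc.c = 'whatever'
print(vars(dc))         # {'a': 9, 'b': 1.1, 'c': 'whatever'}
print(DemoDataClass.c)  # spam
```

Only annotated names become fields, so `c` never appears in the generated `__init__` or `__repr__`.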

## Field Options

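Example 5-13. `dataclass/club_wrong.py`: this class raises `ValueError`

In [None]:
from dataclasses import dataclass

# ValueError is raised while the class is being defined, before any
# instance can be created: mutable defaults such as [] are rejected.
@dataclass
class ClubMember:
    name: str
    guests: list = []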

Example 5-14. `dataclass/club.py`: this `ClubMember` definition works

In [None]:
from dataclasses import dataclass, field

@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)
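
A short sketch of why `default_factory` matters: the factory is called once per instance, so members do not share a guest list. The member names are made up, and `BrokenMember` is a hypothetical class that exists only to trigger the error from Example 5-13.

```python
from dataclasses import dataclass, field


@dataclass
class ClubMember:
    name: str
    guests: list = field(default_factory=list)


ann = ClubMember("Ann")  # hypothetical member names
bob = ClubMember("Bob")
ann.guests.append("Carl")

print(ann.guests)  # ['Carl']
print(bob.guests)  # []

# A mutable default is rejected when the class is created.
rejected = False
try:
    @dataclass
    class BrokenMember:
        name: str
        guests: list = []
except ValueError as exc:
    rejected = True
    print(f"ValueError: {exc}")
```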

Example 5-15. `dataclass/club_generic.py`: this `ClubMember` definition is more precise

In [23]:
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClubMember:
    name: str
    guests: List[str] = field(default_factory=list)  # a list of str; checked only by static type checkers

## Post-init Processing

Example 5-16. `dataclass/hackerclub.py`: doctests for `HackerClubMember`

In [24]:
"""
``HackerClubMember`` objects accept an optional ``handle`` argument::

    >>> anna = HackerClubMember('Anna Ravenscroft', handle='AnnaRaven')
    >>> anna
    HackerClubMember(name='Anna Ravenscroft', guests=[], handle='AnnaRaven')

If ``handle`` is omitted, it's set to the first part of the member's name::

    >>> leo = HackerClubMember('Leo Rochael')
    >>> leo
    HackerClubMember(name='Leo Rochael', guests=[], handle='Leo')

Members must have a unique handle. The following ``leo2`` will not be created,
because its ``handle`` would be 'Leo', which was taken by ``leo``::

    >>> leo2 = HackerClubMember('Leo DaVinci')
    Traceback (most recent call last):
      ...
    ValueError: handle 'Leo' already exists.

To fix, ``leo2`` must be created with an explicit ``handle``::

    >>> leo2 = HackerClubMember('Leo DaVinci', handle='Neo')
    >>> leo2
    HackerClubMember(name='Leo DaVinci', guests=[], handle='Neo')
"""

from dataclasses import dataclass

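@dataclass
class HackerClubMember(ClubMember):             # extends ClubMember
    all_handles = set()                         # class attribute: no annotation, so not a field
    handle: str = ''                            # instance field; the default makes it optional

    def __post_init__(self):
        cls = self.__class__                    # class of the instance
        if self.handle == '':                   # no handle given: use the first part of the name
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:      # reject duplicate handles
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)        # register the new handle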


In [25]:
jam = HackerClubMember("Jamie Brandon")
john = HackerClubMember("John McCain")

jam

HackerClubMember(name='Jamie Brandon', guests=[], handle='Jamie')

In [26]:
john

HackerClubMember(name='John McCain', guests=[], handle='John')

In [27]:
HackerClubMember("Jamie Lynn Spears")

ValueError: handle 'Jamie' already exists.

This example works as intended, but it does not satisfy a static type checker. Read the section "Typed Class Attributes" to see why and how to fix it.
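
The referenced section is not reproduced in these notes. A likely shape of the fix, assuming it relies on `typing.ClassVar`, is sketched below: annotating `all_handles` with `ClassVar` gives the type checker a type for the attribute while telling `@dataclass` not to treat it as a field. `ClubMember` is repeated so the cell runs on its own.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar, List, Set


@dataclass
class ClubMember:
    name: str
    guests: List[str] = field(default_factory=list)


@dataclass
class HackerClubMember(ClubMember):
    # ClassVar marks a class attribute; @dataclass skips it when building fields.
    all_handles: ClassVar[Set[str]] = set()
    handle: str = ''

    def __post_init__(self):
        cls = self.__class__
        if self.handle == '':
            self.handle = self.name.split()[0]
        if self.handle in cls.all_handles:
            msg = f'handle {self.handle!r} already exists.'
            raise ValueError(msg)
        cls.all_handles.add(self.handle)


leo = HackerClubMember('Leo Rochael')
print([f.name for f in fields(HackerClubMember)])  # ['name', 'guests', 'handle']
print(HackerClubMember.all_handles)                # {'Leo'}
```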

## Initialization Variables that are Not Fields

Example 5-18. Example from the `dataclasses` module documentation

In [None]:
from dataclasses import InitVar, dataclass

# DatabaseType and my_database are placeholders from the documentation;
# they are not defined here, so this cell does not run as written.

@dataclass
class C:
    i: int
    j: int = None
    database: InitVar[DatabaseType] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup("j")

c = C(10, database=my_database)

`database` will not be set as an instance attribute, and the `dataclasses.fields` function will not list it!
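
To see this behavior in a runnable form, the sketch below fills in the undefined names with stand-ins. `DatabaseType` here is a hypothetical stub whose `lookup` always returns 42, and `my_database` is an instance of it; neither comes from the `dataclasses` documentation.

```python
from dataclasses import InitVar, dataclass, fields
from typing import Optional


class DatabaseType:
    """Hypothetical stand-in for the DatabaseType left undefined in the docs."""

    def lookup(self, key):
        # Pretend every key maps to the same value.
        return 42


@dataclass
class C:
    i: int
    j: Optional[int] = None
    database: InitVar[Optional[DatabaseType]] = None

    def __post_init__(self, database):
        # The InitVar is passed here as an argument and then discarded.
        if self.j is None and database is not None:
            self.j = database.lookup("j")


my_database = DatabaseType()  # hypothetical instance
c = C(10, database=my_database)

print(c)                            # C(i=10, j=42)
print([f.name for f in fields(c)])  # ['i', 'j']
print("database" in vars(c))        # False
```

Because `database` has a default, the name still resolves on the class (to `None`), which is why the check uses `vars(c)` rather than `hasattr`.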

## `@dataclass` Example: Dublin Core Resource Record

Example 5-19. `dataclass/resource.py`: code for `Resource`, a class based on Dublin Core terms

In [34]:
from dataclasses import dataclass, field
from typing import Optional
from enum import Enum, auto
from datetime import date


class ResourceType(Enum):  # type-safe values for the Resource.type field
    BOOK = auto()
    EBOOK = auto()
    VIDEO = auto()

ResourceType.EBOOK == ResourceType.BOOK


False

In [36]:
from typing import List

@dataclass
class Resource:
    """Media resource description."""
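    identifier: str                                    # the only required field
    title: str = '<untitled>'                          # first default; every later field needs one too
    creators: List[str] = field(default_factory=list)
    date: Optional[date] = None                        # a datetime.date instance or None
    type: ResourceType = ResourceType.BOOK             # defaults to ResourceType.BOOK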
    description: str = ''
    language: str = ''
    subjects: List[str] = field(default_factory=list)


Resource(
    identifier="1"
)

Resource(identifier='1', title='<untitled>', creators=[], date=None, type=<ResourceType.BOOK: 1>, description='', language='', subjects=[])

In [None]:

from typing import TypedDict


class ResourceDict(TypedDict):
    identifier: str
    title: str
    creators: List[str]
    date: Optional[date]
    type: ResourceType
    description: str
    language: str
    subjects: List[str]

In [37]:

r = Resource('0')
r


Resource(identifier='0', title='<untitled>', creators=[], date=None, type=<ResourceType.BOOK: 1>, description='', language='', subjects=[])

Example 5-20. `dataclass/resource.py`: code for `Resource`, a class based on Dublin Core terms

In [38]:
description = 'Improving the design of existing code'
book = Resource(
    '978-0-13-475759-9',
    'Refactoring, 2nd Edition',
    ['Martin Fowler', 'Kent Beck'],
    date(2018, 11, 19),
    ResourceType.BOOK,
    description,
    'EN',
    ['computer programming', 'OOP'],
)
print(book)


Resource(identifier='978-0-13-475759-9', title='Refactoring, 2nd Edition', creators=['Martin Fowler', 'Kent Beck'], date=datetime.date(2018, 11, 19), type=<ResourceType.BOOK: 1>, description='Improving the design of existing code', language='EN', subjects=['computer programming', 'OOP'])


In [39]:
book_dict: ResourceDict = {
    'identifier': '978-0-13-475759-9',
    'title': 'Refactoring, 2nd Edition',
    'creators': ['Martin Fowler', 'Kent Beck'],
    'date': date(2018, 11, 19),
    'type': ResourceType.BOOK,
    'description': 'Improving the design of existing code',
    'language': 'EN',
    'subjects': ['computer programming', 'OOP'],
}
book2 = Resource(**book_dict)
print(book == book2)


True


In [41]:
class A:
    def __init__(self, value):
        self.value = value

first = A(1)
second = A(1)

first == second

False

The repr for `book` is long and a bit hard to read. We'd like it to look like this:
```
Resource(
    identifier='978-0-13-475759-9',
    title='Refactoring, 2nd Edition',
    creators=['Martin Fowler', 'Kent Beck'],
    date=datetime.date(2018, 11, 19),
    type=<ResourceType.BOOK: 1>,
    description='Improving the design of existing code',
    language='EN',
    subjects=['computer programming', 'OOP'],
    )
```

Example 5-21. `dataclass/resource_repr.py`: code for the `__repr__` method implemented in the `Resource` class from Example 5-19

In [43]:
from dataclasses import dataclass, field, fields

@dataclass
class Resource:
    """Media resource description."""
    identifier: str
    title: str = '<untitled>'
    creators: List[str] = field(default_factory=list)
    date: Optional[date] = None
    type: ResourceType = ResourceType.BOOK
    description: str = ''
    language: str = ''
    subjects: List[str] = field(default_factory=list)

    def __repr__(self):
        cls = self.__class__
        cls_name = cls.__name__
        indent = ' ' * 4
        res = [f'{cls_name}(']                            # start with the class name and open parenthesis
        for f in fields(cls):                             # for each field in the class...
            value = getattr(self, f.name)                 # ...get its value from the instance
            res.append(f'{indent}{f.name} = {value!r},')  # indented "name = repr(value)," line

        res.append(')')                                   # closing parenthesis
        return '\n'.join(res)                             # join into one multiline string

r = Resource(**book_dict)
r

Resource(
    identifier = '978-0-13-475759-9',
    title = 'Refactoring, 2nd Edition',
    creators = ['Martin Fowler', 'Kent Beck'],
    date = datetime.date(2018, 11, 19),
    type = <ResourceType.BOOK: 1>,
    description = 'Improving the design of existing code',
    language = 'EN',
    subjects = ['computer programming', 'OOP'],
)

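## Extra Insert: Pydantic Models

These examples use the pydantic 1.x API. In pydantic 2.x, fields without a type annotation (such as `name = "Jane Doe"`) are rejected, and `validator`, `.dict()`, `__fields_set__`, and `PydanticValueError` were deprecated or removed, so the cells and the error messages shown here may differ.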

In [44]:
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = "Jane Doe"

In [48]:
user = User(id="123")
user

User(id=123, name='Jane Doe')

In [49]:
user_x = User(id="123.45")

ValidationError: 1 validation error for User
id
  value is not a valid integer (type=type_error.integer)

In [50]:
user.id

123

In [51]:
user.__fields_set__

{'id'}

In [52]:
user.dict() == dict(user) == {"id": 123, "name": "Jane Doe"}

True

In [55]:
user2 = User(id=12, name={"a": 1})
user2

ValidationError: 1 validation error for User
name
  str type expected (type=type_error.str)

Recursive models

In [57]:
from typing import List, Optional
from pydantic import BaseModel


class Foo(BaseModel):
    count: int
    size: Optional[float] = None


class Bar(BaseModel):
    apple = 'x'
    banana = 'y'


class Spam(BaseModel):
    foo: Foo
    bars: List[Bar]


m = Spam(
    foo={'count': 4},
    bars=[{'apple': 'x1'}, {'apple': 'x2'}],
)
m

Spam(foo=Foo(count=4, size=None), bars=[Bar(apple='x1', banana='y'), Bar(apple='x2', banana='y')])

In [58]:
m.dict()

{'foo': {'count': 4, 'size': None},
 'bars': [{'apple': 'x1', 'banana': 'y'}, {'apple': 'x2', 'banana': 'y'}]}

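Pydantic collects all validation errors into a single `ValidationError` and reports the location of each one, including positions inside lists and nested models: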

In [59]:
from typing import List
from pydantic import BaseModel, ValidationError, conint


class Location(BaseModel):
    lat = 0.1
    lng = 10.1


class Model(BaseModel):
    is_required: float
    gt_int: conint(gt=42)
    list_of_ints: List[int] = None
    a_float: float = None
    recursive_model: Location = None


data = dict(
    list_of_ints=['1', 2, 'bad'],
    a_float='not a float',
    recursive_model={'lat': 4.2, 'lng': 'New York'},
    gt_int=21,
)

try:
    Model(**data)
except ValidationError as e:
    print(e)

5 validation errors for Model
is_required
  field required (type=value_error.missing)
gt_int
  ensure this value is greater than 42 (type=value_error.number.not_gt; limit_value=42)
list_of_ints -> 2
  value is not a valid integer (type=type_error.integer)
a_float
  value is not a valid float (type=type_error.float)
recursive_model -> lng
  value is not a valid float (type=type_error.float)


Custom Validators

In [60]:
from pydantic import BaseModel, ValidationError, validator


class Model(BaseModel):
    foo: str

    @validator('foo')
    def value_must_equal_bar(cls, v):
        if v != 'bar':
            raise ValueError('value must be "bar"')

        return v


try:
    Model(foo='ber')
except ValidationError as e:
    print(e)

1 validation error for Model
foo
  value must be "bar" (type=value_error)


You can also define your own custom error classes:

In [61]:
from pydantic import BaseModel, PydanticValueError, ValidationError, validator


class NotABarError(PydanticValueError):
    code = 'not_a_bar'
    msg_template = 'value is not "bar", got "{wrong_value}"'


class Model(BaseModel):
    foo: str

    @validator('foo')
    def value_must_equal_bar(cls, v):
        if v != 'bar':
            raise NotABarError(wrong_value=v)
        return v


Model(foo='ber')


ValidationError: 1 validation error for Model
foo
  value is not "bar", got "ber" (type=value_error.not_a_bar; wrong_value=ber)
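
For comparison with the custom validators above, a similar check can be sketched with only the standard library by putting it in `__post_init__`. This is not a substitute for pydantic: nothing is coerced, and the first failing check raises immediately instead of being collected with others.

```python
from dataclasses import dataclass


@dataclass
class Model:
    foo: str

    def __post_init__(self):
        # Hand-written checks; the type hint alone enforces nothing at runtime.
        if not isinstance(self.foo, str):
            raise TypeError(f'foo must be str, got {type(self.foo).__name__}')
        if self.foo != 'bar':
            raise ValueError('value must be "bar"')


try:
    Model(foo='ber')
except ValueError as e:
    print(e)  # value must be "bar"

print(Model(foo='bar'))  # Model(foo='bar')
```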