# Chapter 14: Dataclasses and Named Tuples

Python provides several ways to create structured data types. This notebook covers `dataclasses` (added in Python 3.7) and `typing.NamedTuple`, comparing them to regular classes and `collections.namedtuple`.

## Key Concepts
- **dataclass**: Auto-generates `__init__`, `__repr__`, `__eq__` and more
- **frozen dataclass**: Immutable instances, hashable by default
- **field()**: Fine-grained control over defaults and metadata
- **__post_init__**: Validation and derived attributes after init
- **typing.NamedTuple**: Typed, immutable records
- **slots=True**: Memory-efficient dataclasses (Python 3.10+)

## Basic Dataclass

The `@dataclass` decorator inspects type annotations and auto-generates common dunder methods. This eliminates the boilerplate of writing `__init__`, `__repr__`, and `__eq__` by hand.

In [None]:
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# Auto-generated __init__
p1 = Point(1.0, 2.0)
p2 = Point(1.0, 2.0)
p3 = Point(3.0, 4.0)

# Auto-generated __repr__
print(f"p1 = {p1}")
print(f"repr(p1) = {repr(p1)}")

# Auto-generated __eq__ (compares field values)
print(f"\np1 == p2: {p1 == p2}")
print(f"p1 == p3: {p1 == p3}")

# Dataclasses are mutable by default
p1.x = 10.0
print(f"\nAfter mutation: p1 = {p1}")
print(f"p1 == p2 now: {p1 == p2}")
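
To make "inspects type annotations" concrete, the following minimal sketch uses `dataclasses.fields()` to list what the decorator actually picked up. The `Sample` class and its `scale` attribute are hypothetical names chosen for illustration; the point is that only annotated names become fields.

```python
from dataclasses import dataclass, fields


@dataclass
class Sample:  # hypothetical class, for illustration only
    x: float        # annotated: becomes a field and an __init__ parameter
    y: float = 0.0  # annotated with a default: field with a default value
    scale = 10      # no annotation: plain class attribute, not a field


# fields() returns one Field object per dataclass field, in definition order
field_names = [f.name for f in fields(Sample)]
print(field_names)  # ['x', 'y']

s = Sample(1.5)
print(s)        # Sample(x=1.5, y=0.0)
print(s.scale)  # 10 (looked up on the class, absent from __init__ and __repr__)
```

Forgetting an annotation is therefore a silent mistake: the attribute still exists, but it takes no part in `__init__`, `__repr__`, or `__eq__`.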

## Frozen Dataclasses: Immutability and Hashability

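Setting `frozen=True` makes instances immutable. Attempting to set or delete an attribute raises `dataclasses.FrozenInstanceError`, a subclass of `AttributeError` (which is why the cell below can catch `AttributeError`). With the default `eq=True`, a frozen dataclass also gets a generated `__hash__`, so instances can be used as dict keys and set members as long as every field value is itself hashable.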

In [None]:
from dataclasses import dataclass

@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int

red = Color(255, 0, 0)
blue = Color(0, 0, 255)

print(f"red = {red}")
print(f"hash(red) = {hash(red)}")

# Cannot modify frozen instances
try:
    red.r = 128
except AttributeError as e:
    print(f"\nCannot modify: {e}")

# Frozen dataclasses can be used as dict keys or set members
color_names = {red: "Red", blue: "Blue"}
print(f"\nColor lookup: {color_names[Color(255, 0, 0)]}")

unique_colors = {red, blue, Color(255, 0, 0)}  # Duplicate removed
print(f"Unique colors: {len(unique_colors)}")
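
Since a frozen instance cannot be changed in place, the usual way to "modify" one is to build a copy. The sketch below, using hypothetical `Rgb` and `MutableRgb` classes, shows `dataclasses.replace()` doing that, and contrasts the hashability of frozen and non-frozen dataclasses.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Rgb:  # hypothetical frozen value object
    r: int
    g: int
    b: int


@dataclass
class MutableRgb:  # same fields, but not frozen
    r: int
    g: int
    b: int


red = Rgb(255, 0, 0)

# Assignment on a frozen instance raises FrozenInstanceError
try:
    red.r = 128
except FrozenInstanceError as exc:
    print(f"{type(exc).__name__}: {exc}")

# replace() returns a new instance with the given fields changed
dark_red = replace(red, r=128)
print(dark_red)  # Rgb(r=128, g=0, b=0)
print(red)       # Rgb(r=255, g=0, b=0), the original is untouched

# A non-frozen dataclass with the default eq=True has __hash__ set to None
try:
    hash(MutableRgb(255, 0, 0))
except TypeError as exc:
    print(f"Not hashable: {exc}")
```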

## field() and Default Factories

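Use `field(default_factory=...)` for mutable default values. A mutable default written directly (e.g., `tags: list = []`) would be a single object shared by every instance, so `@dataclass` refuses it: defining a field with a `list`, `dict`, or `set` default raises `ValueError` when the class is created.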

In [None]:
from dataclasses import dataclass, field

@dataclass
class Config:
    name: str
    debug: bool = False
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)

# Each instance gets its own list and dict
c1 = Config("app1")
c2 = Config("app2", debug=True)

c1.tags.append("web")
c1.tags.append("production")
c2.tags.append("cli")

print(f"c1 = {c1}")
print(f"c2 = {c2}")
print(f"\nc1.tags: {c1.tags}")
print(f"c2.tags: {c2.tags} (independent list!)")

# field() also supports repr, compare, and metadata
@dataclass
class User:
    name: str
    email: str
    password: str = field(repr=False)  # Hide from repr
    login_count: int = field(default=0, compare=False)  # Exclude from ==

u1 = User("Alice", "alice@example.com", "secret123")
u2 = User("Alice", "alice@example.com", "secret123", login_count=5)
print(f"\nUser: {u1}")  # password hidden
print(f"u1 == u2: {u1 == u2}")  # login_count ignored
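
The sketch below illustrates why `default_factory` exists. It first shows the shared-state bug in a plain class, then shows `@dataclass` rejecting the same pattern at class-definition time. `PlainConfig` and `BrokenConfig` are hypothetical names used only for this demonstration.

```python
from dataclasses import dataclass


class PlainConfig:  # hypothetical plain class with a shared mutable attribute
    tags = []  # one list object, stored on the class and shared by all instances


first = PlainConfig()
second = PlainConfig()
first.tags.append("web")
print(second.tags)  # ['web'], the change leaked into the other instance

# @dataclass refuses a list default instead of silently sharing it
error_message = ""
try:
    @dataclass
    class BrokenConfig:  # hypothetical class that never finishes being defined
        name: str
        tags: list = []
except ValueError as exc:
    error_message = str(exc)

print(f"ValueError: {error_message}")
```

The error message itself points to `default_factory` as the fix.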

## Post-Init Processing with __post_init__

`__post_init__` runs after the auto-generated `__init__`. Use it for validation, derived fields, or any setup logic.

In [None]:
from dataclasses import dataclass, field
import math

@dataclass
class Circle:
    radius: float
    area: float = field(init=False)      # Computed, not passed to __init__
    perimeter: float = field(init=False)  # Computed, not passed to __init__

    def __post_init__(self) -> None:
        if self.radius <= 0:
            raise ValueError(f"Radius must be positive, got {self.radius}")
        self.area = math.pi * self.radius ** 2
        self.perimeter = 2 * math.pi * self.radius

c = Circle(5.0)
print(f"Circle: {c}")
print(f"Area: {c.area:.2f}")
print(f"Perimeter: {c.perimeter:.2f}")

# Validation in __post_init__
try:
    bad = Circle(-1.0)
except ValueError as e:
    print(f"\nValidation error: {e}")
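
One wrinkle appears when `__post_init__` is combined with `frozen=True`: ordinary assignment to a derived field would raise `FrozenInstanceError`. A common workaround, sketched here with a hypothetical `FrozenCircle` class, is to call `object.__setattr__` directly, which is the same mechanism the generated `__init__` uses for frozen classes.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrozenCircle:  # hypothetical frozen variant of the Circle example
    radius: float
    area: float = field(init=False)  # derived, not an __init__ parameter

    def __post_init__(self) -> None:
        if self.radius <= 0:
            raise ValueError(f"Radius must be positive, got {self.radius}")
        # self.area = ... would raise FrozenInstanceError here, so bypass
        # the frozen __setattr__ for this one-time initialization
        object.__setattr__(self, "area", math.pi * self.radius ** 2)


unit = FrozenCircle(1.0)
print(unit)
print(f"Area: {unit.area:.4f}")  # Area: 3.1416

# The instance is still immutable and hashable after construction
try:
    unit.radius = 2.0
except AttributeError as exc:
    print(f"Cannot modify: {exc}")
print(f"Usable as a set member: {unit in {unit}}")
```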

## Dataclass Inheritance

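Dataclasses support inheritance. Fields are collected from base classes first, so parent fields come first and child fields are appended after them. Because the generated `__init__` takes its parameters in that order, a field without a default cannot follow a field with a default anywhere in the combined list; defining such a class raises `TypeError`.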

In [None]:
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    species: str

@dataclass
class Pet(Animal):
    owner: str
    vaccinated: bool = False

pet = Pet(name="Buddy", species="Dog", owner="Alice", vaccinated=True)
print(f"Pet: {pet}")
print(f"Is Animal? {isinstance(pet, Animal)}")

# Parent fields come first in __init__
pet2 = Pet("Whiskers", "Cat", "Bob")
print(f"Pet2: {pet2}")

# Equality compares inherited and child fields (both operands must be the same class)
pet3 = Pet("Buddy", "Dog", "Alice", True)
print(f"\npet == pet3: {pet == pet3}")

## typing.NamedTuple: Typed Named Tuples

`typing.NamedTuple` is the modern alternative to `collections.namedtuple`. It uses class syntax with type annotations, making it cleaner and more IDE-friendly.

In [None]:
from typing import NamedTuple

class Coordinate(NamedTuple):
    latitude: float
    longitude: float
    label: str = "unknown"

# Create instances
home = Coordinate(40.7128, -74.0060, "New York")
office = Coordinate(37.7749, -122.4194, "San Francisco")
mystery = Coordinate(51.5074, -0.1278)  # Uses default label

print(f"Home: {home}")
print(f"Office: {office}")
print(f"Mystery: {mystery}")

# Named access and indexing both work (it's still a tuple)
print(f"\nLatitude by name: {home.latitude}")
print(f"Latitude by index: {home[0]}")

# Immutable
try:
    home.latitude = 0.0
except AttributeError as e:
    print(f"\nImmutable: {e}")

# Tuple unpacking works
lat, lon, label = home
print(f"\nUnpacked: lat={lat}, lon={lon}, label={label}")

# Hashable and usable as dict keys
locations = {home: "home", office: "work"}
print(f"Dict lookup: {locations[home]}")
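
Named tuples also carry a few underscore-prefixed helpers inherited from `collections.namedtuple`. The following sketch, using a hypothetical `GeoPoint` type, shows `_replace`, `_asdict`, and `_fields`, plus one consequence of being a real tuple: an instance compares equal to a plain tuple with the same values.

```python
from typing import NamedTuple


class GeoPoint(NamedTuple):  # hypothetical record type for illustration
    latitude: float
    longitude: float
    label: str = "unknown"


p = GeoPoint(51.5074, -0.1278)

# _replace() returns a new tuple with selected fields changed
moved = p._replace(label="London")
print(moved)    # GeoPoint(latitude=51.5074, longitude=-0.1278, label='London')
print(p.label)  # unknown (the original is unchanged)

# _asdict() converts to a dict; _fields lists the field names
as_dict = p._asdict()
print(as_dict["label"])  # unknown
print(GeoPoint._fields)  # ('latitude', 'longitude', 'label')

# Equality is tuple equality, so field names and the class are ignored
same_as_tuple = p == (51.5074, -0.1278, "unknown")
print(same_as_tuple)  # True
```

That last behavior differs from dataclasses, whose generated `__eq__` only compares instances of the same class.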

## slots=True for Memory Efficiency (Python 3.10+)

By default, Python objects store attributes in a `__dict__`. Using `slots=True` eliminates the per-instance dict, reducing memory usage and slightly speeding up attribute access.

In [None]:
import sys
from dataclasses import dataclass

@dataclass
class RegularPoint:
    x: float
    y: float

@dataclass(slots=True)
class SlottedPoint:
    x: float
    y: float

rp = RegularPoint(1.0, 2.0)
sp = SlottedPoint(1.0, 2.0)

print(f"Regular point: {rp}")
print(f"Slotted point: {sp}")

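# Memory comparison: sys.getsizeof() is shallow, so add the instance __dict__
# that a regular instance carries alongside the object itself
regular_size = sys.getsizeof(rp) + sys.getsizeof(rp.__dict__)
print(f"\nRegular size: {regular_size} bytes (object + __dict__)")
print(f"Slotted size: {sys.getsizeof(sp)} bytes")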

# Regular has __dict__, slotted does not
print(f"\nRegular has __dict__: {hasattr(rp, '__dict__')}")
print(f"Slotted has __dict__: {hasattr(sp, '__dict__')}")
print(f"Slotted has __slots__: {hasattr(sp, '__slots__')}")

# Slotted instances cannot have arbitrary attributes
rp.z = 3.0  # Works fine on regular
print(f"\nAdded rp.z = {rp.z}")

try:
    sp.z = 3.0  # Fails on slotted
except AttributeError as e:
    print(f"Cannot add to slotted: {e}")

## Comparing dataclass vs namedtuple vs Regular Class

When should you use each? Here is a side-by-side comparison.

In [None]:
from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple

# Approach 1: Regular class (most verbose)
class RegularProduct:
    def __init__(self, name: str, price: float) -> None:
        self.name = name
        self.price = price

    def __repr__(self) -> str:
        return f"RegularProduct(name={self.name!r}, price={self.price!r})"

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, RegularProduct):
            return NotImplemented
        return self.name == other.name and self.price == other.price

# Approach 2: collections.namedtuple (immutable, no types)
NTProduct = namedtuple("NTProduct", ["name", "price"])

# Approach 3: typing.NamedTuple (immutable, typed)
class TypedProduct(NamedTuple):
    name: str
    price: float

# Approach 4: dataclass (mutable, typed)
@dataclass
class DCProduct:
    name: str
    price: float

# All produce similar results
products = [
    RegularProduct("Widget", 9.99),
    NTProduct("Widget", 9.99),
    TypedProduct("Widget", 9.99),
    DCProduct("Widget", 9.99),
]

for p in products:
    print(f"{type(p).__name__:20s} | {p}")

# Key differences
print("\n--- Feature Comparison ---")
print(f"{'Feature':<25} {'Regular':>10} {'namedtuple':>12} {'NamedTuple':>12} {'dataclass':>12}")
print("-" * 75)
print(f"{'Mutable':<25} {'Yes':>10} {'No':>12} {'No':>12} {'Yes*':>12}")
print(f"{'Type hints':<25} {'Manual':>10} {'No':>12} {'Yes':>12} {'Yes':>12}")
print(f"{'Auto __repr__':<25} {'No':>10} {'Yes':>12} {'Yes':>12} {'Yes':>12}")
print(f"{'Auto __eq__':<25} {'No':>10} {'Yes':>12} {'Yes':>12} {'Yes':>12}")
print(f"{'Hashable':<25} {'No**':>10} {'Yes':>12} {'Yes':>12} {'No***':>12}")
print(f"{'Index access':<25} {'No':>10} {'Yes':>12} {'Yes':>12} {'No':>12}")
print("\n* frozen=True makes dataclass immutable")
print("** unless you implement __hash__")
print("*** unless frozen=True")
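
The "Hashable" row of the table can be checked directly rather than taken on trust. This sketch defines one small hypothetical type per approach (plus a frozen dataclass) and probes each with `hash()`; the `is_hashable` helper is an illustrative name, not a library function.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple


class PlainItem:  # regular class defining __eq__ but not __hash__
    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, PlainItem):
            return NotImplemented
        return self.name == other.name


TupleItem = namedtuple("TupleItem", ["name"])


class TypedItem(NamedTuple):
    name: str


@dataclass
class DataItem:
    name: str


@dataclass(frozen=True)
class FrozenItem:
    name: str


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


candidates = [PlainItem, TupleItem, TypedItem, DataItem, FrozenItem]
results = {cls.__name__: is_hashable(cls("Widget")) for cls in candidates}
for type_name, hashable in results.items():
    print(f"{type_name:<12} hashable: {hashable}")
```

`PlainItem` is unhashable only because it defines `__eq__` without `__hash__`; a regular class that defines neither keeps the default identity-based hash.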

## Summary

### Key Takeaways

- **`@dataclass`** is the go-to choice for most structured data in modern Python. It reduces boilerplate while keeping full flexibility.
- **`frozen=True`** gives you immutability and hashability -- use it for value objects, dict keys, and set members.
- **`field(default_factory=...)`** is essential for mutable defaults. Writing `tags: list = []` in a dataclass raises `ValueError` when the class is defined.
- **`__post_init__`** is the right place for validation and computing derived fields.
- **`typing.NamedTuple`** is ideal when you need an immutable, lightweight record with tuple unpacking support.
- **`slots=True`** (Python 3.10+) reduces memory usage for classes with many instances.

### Decision Guide
- Need mutability? Use `@dataclass`
- Need immutability + hashability? Use `@dataclass(frozen=True)` or `NamedTuple`
- Need tuple unpacking? Use `NamedTuple`
- Need full control (custom `__init__`, etc.)? Use a regular class
- Creating many small instances? Add `slots=True`