**⭐ 1. What This Pattern Solves**

Represents structured records or rows in a pipeline with named fields.

Makes ETL transformations more readable by replacing tuple indices with field names.

Provides a lightweight immutable record (namedtuple) or a mutable-by-default one (dataclass, which becomes immutable with frozen=True) for analytics pipelines.

Useful for small-to-medium datasets where an explicit, named schema improves code clarity (the field types are documentation; Python does not check them at runtime).
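A minimal sketch of what enforcing the schema takes in practice. Dataclass type hints are not checked when an object is created, so any validation has to be added by hand. The class names LooseTransaction and ValidatedTransaction, and the check inside __post_init__, are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class LooseTransaction:
    id: int
    name: str
    amount: float


@dataclass
class ValidatedTransaction:
    # Hypothetical class for illustration only.
    id: int
    name: str
    amount: float

    def __post_init__(self):
        # Runs right after the generated __init__; reject non-numeric amounts.
        if not isinstance(self.amount, (int, float)):
            raise TypeError(
                f"amount must be a number, got {type(self.amount).__name__}"
            )


# Type hints alone are not enforced: this succeeds with a string amount.
loose = LooseTransaction(1, "Alice", "not a number")

try:
    ValidatedTransaction(1, "Alice", "not a number")
    validation_error = None
except TypeError as exc:
    validation_error = str(exc)

print(validation_error)
```

Whether a hand-written check like this is worth it depends on how much you trust the upstream source. For trusted inputs the plain dataclass is usually enough.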

**⭐ 2. SQL Equivalent**

In [0]:
%sql
-- Each named field maps to a column
SELECT id, name, amount
FROM transactions;

**⭐ 3. Core Idea**

Use structured, self-documenting objects instead of raw tuples or dicts for pipeline records.
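To make the contrast concrete, here is a small sketch that reads the same field from a raw tuple, a dict, and a namedtuple. The sample row is made up for illustration:

```python
from collections import namedtuple

row_tuple = (1, "Alice", 100)
row_dict = {"id": 1, "name": "Alice", "amount": 100}

Transaction = namedtuple("Transaction", ["id", "name", "amount"])
row_named = Transaction(1, "Alice", 100)

# Raw tuple: the reader has to remember that index 2 is the amount.
amount_from_tuple = row_tuple[2]

# Dict: readable, but the set of keys is not fixed, so rows can drift apart.
amount_from_dict = row_dict["amount"]

# Namedtuple: field names are fixed at definition time.
amount_from_named = row_named.amount

# A namedtuple is still a tuple, so positional unpacking keeps working.
record_id, name, amount = row_named
```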

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
from collections import namedtuple
from dataclasses import dataclass

# Namedtuple (immutable)
Transaction = namedtuple('Transaction', ['id', 'name', 'amount'])
t1 = Transaction(1, 'Alice', 100)
print(t1.name)

# Dataclass (mutable by default)
@dataclass
class TransactionData:
    id: int
    name: str
    amount: float

t2 = TransactionData(2, 'Bob', 200)
t2.amount = 250  # mutable
print(t2.amount)
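Two companions to the template that tend to come up in pipelines are producing an updated copy instead of mutating in place, and converting a record to a plain dict. The sketch below uses only standard-library calls (namedtuple's _replace and _asdict, and dataclasses.replace and dataclasses.asdict):

```python
from collections import namedtuple
from dataclasses import dataclass, asdict, replace

Transaction = namedtuple("Transaction", ["id", "name", "amount"])


@dataclass
class TransactionData:
    id: int
    name: str
    amount: float


t1 = Transaction(1, "Alice", 100)
# _replace returns a new namedtuple; t1 itself is unchanged.
t1_updated = t1._replace(amount=150)

t2 = TransactionData(2, "Bob", 200)
# dataclasses.replace builds a new instance with the given fields changed.
t2_updated = replace(t2, amount=250)

# Both kinds of record convert to a dict, e.g. before JSON serialization.
t1_as_dict = dict(t1_updated._asdict())
t2_as_dict = asdict(t2_updated)

print(t1_as_dict)
print(t2_as_dict)
```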

**⭐ 5. Detailed Example**

In [0]:
records = [(1, 'Alice', 100), (2, 'Bob', 200)]

from collections import namedtuple

Transaction = namedtuple('Transaction', ['id', 'name', 'amount'])
structured_records = [Transaction(*r) for r in records]

for r in structured_records:
    print(f"{r.name} paid {r.amount}")

Alice paid 100
Bob paid 200

**⭐ 6. Mini Practice Problems**

Convert a list of (user_id, product, price) tuples into namedtuples and print only the products that cost more than 50.

Create a dataclass for a LogEntry with fields (timestamp, level, message) and update the level for certain entries.

Compare the immutability of namedtuple and dataclass by trying to reassign a field on an instance of each.
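One possible way to approach the three problems, as a sketch rather than a reference solution. The sample data, the Purchase name, and the rule that promotes timeout messages to ERROR are all assumptions made for illustration:

```python
from collections import namedtuple
from dataclasses import dataclass

# Problem 1: keep only products costing more than 50 (sample data is made up).
Purchase = namedtuple("Purchase", ["user_id", "product", "price"])
raw_purchases = [(1, "keyboard", 80), (2, "cable", 10), (3, "monitor", 250)]
purchases = [Purchase(*row) for row in raw_purchases]
expensive_products = [p.product for p in purchases if p.price > 50]
print(expensive_products)


# Problem 2: a mutable LogEntry whose level is updated for some entries.
@dataclass
class LogEntry:
    timestamp: str
    level: str
    message: str


entries = [
    LogEntry("2024-01-01T09:00:00", "INFO", "job started"),
    LogEntry("2024-01-01T09:01:00", "WARN", "timeout contacting source"),
]
for entry in entries:
    # Hypothetical rule: treat any timeout as an error.
    if "timeout" in entry.message:
        entry.level = "ERROR"

# Problem 3: assigning to a namedtuple field raises AttributeError.
try:
    purchases[0].price = 0
    namedtuple_is_mutable = True
except AttributeError:
    namedtuple_is_mutable = False

# The same kind of assignment works on a default (non-frozen) dataclass.
entries[0].level = "DEBUG"
print(namedtuple_is_mutable, entries[0].level)
```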

**⭐ 7. Full Data Engineering Scenario**

Problem Statement:
You ingest user activity logs (user_id, action, timestamp) and need structured objects for downstream transformations like sessionization or aggregation.

Expected Output:
Structured list of records with field access by name.

In [0]:
from dataclasses import dataclass

@dataclass
class Activity:
    user_id: int
    action: str
    timestamp: str

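# Sample input for illustration; in a real pipeline raw_logs comes from the ingestion step
raw_logs = [
    (1, 'login', '2024-01-01T09:00:00'),
    (1, 'click', '2024-01-01T09:05:00'),
    (2, 'login', '2024-01-01T10:00:00'),
]

# Unpack each raw tuple into a named, typed record
activities = [Activity(*r) for r in raw_logs]
# Use activities in group-by, session calculations, etc.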
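A minimal sketch of the downstream step mentioned above: counting events per user and collecting each user's actions in time order. The sample rows are invented, and the sort assumes the timestamps are ISO-8601 strings in a single timezone, so that string order matches chronological order:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Activity:
    user_id: int
    action: str
    timestamp: str


# Sample rows, invented for illustration (deliberately out of order).
raw_logs = [
    (1, "login", "2024-01-01T09:00:00"),
    (2, "login", "2024-01-01T09:30:00"),
    (1, "click", "2024-01-01T09:05:00"),
    (1, "logout", "2024-01-01T09:20:00"),
]
activities = [Activity(*r) for r in raw_logs]

# Aggregation: number of events per user.
events_per_user = Counter(a.user_id for a in activities)

# Group-by: each user's actions, ordered by timestamp.
actions_by_user = defaultdict(list)
for a in sorted(activities, key=lambda a: a.timestamp):
    actions_by_user[a.user_id].append(a.action)

print(dict(events_per_user))
print(dict(actions_by_user))
```

Field access by name (a.user_id, a.timestamp) keeps the grouping and sorting keys readable, which is the main payoff of the structured records here.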

**⭐ 8. Time & Space Complexity**

Time Complexity: O(n) — converting n raw rows into structured objects.

Space Complexity: O(n) — storing n objects with fields.

**⭐ 9. Common Pitfalls & Mistakes**

❌ Accessing tuple indices instead of fields → unreadable code.
❌ Using namedtuple when mutability is required.
✔ Use dataclass for mutable fields, namedtuple for immutable records.
✔ Use slots in a dataclass for memory efficiency when creating many objects: @dataclass(slots=True) on Python 3.10+, or a hand-written __slots__ on older versions.
✔ Avoid deep nesting inside dataclasses for high-volume pipelines; prefer flat structures.
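The slots and immutability points can be sketched as follows. The class names are hypothetical. The manual __slots__ form is used so the example also runs on Python versions before 3.10; it only works when no field has a default value. On 3.10+ the shorter @dataclass(slots=True) does the same job.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass
class SlottedActivity:
    # Manual __slots__: instances get no per-object __dict__, which saves memory.
    __slots__ = ("user_id", "action")
    user_id: int
    action: str


@dataclass(frozen=True)
class FrozenActivity:
    # frozen=True gives a dataclass namedtuple-like immutability.
    user_id: int
    action: str


slotted = SlottedActivity(1, "login")
has_instance_dict = hasattr(slotted, "__dict__")

frozen = FrozenActivity(1, "login")
try:
    frozen.action = "logout"
    frozen_is_mutable = True
except FrozenInstanceError:
    frozen_is_mutable = False

print(has_instance_dict, frozen_is_mutable)
```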