**Pattern 1** — Represent a log row

**Problem:** Raw logs come as tuples or dicts. Accessing by index/key is noisy and error-prone.

**Task:** Wrap each log row in a lightweight, immutable structure with named fields.

In [0]:
from collections import namedtuple

# Define schema once
Log = namedtuple("Log", ["user", "ts", "action"])

# Raw data from a file / DB
raw_rows = [
    ("alice", "2025-10-21 12:00", "click"),
    ("bob",   "2025-10-21 12:01", "view"),
]

# Convert to structured rows
logs = [Log(*row) for row in raw_rows]

# Use with dot-notation (much cleaner)
for entry in logs:
    print(entry.user, entry.ts, entry.action)

**Why this pattern?**

Clear schema (Log(user, ts, action)), no magic indices like row[0].

Still lightweight: behaves like a tuple (comparable, and hashable as long as every field is hashable).

Great for passing rows between functions in ETL code.
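The tuple-like behaviour listed above can be checked directly. This is a minimal sketch reusing the `Log` schema from the cell above; the value `"mallory"` and the variable names are illustrative only.

```python
from collections import namedtuple

Log = namedtuple("Log", ["user", "ts", "action"])
entry = Log("alice", "2025-10-21 12:00", "click")

# Tuple behaviour is preserved: equality with a plain tuple, and unpacking.
same_as_tuple = entry == ("alice", "2025-10-21 12:00", "click")
user, ts, action = entry

# Hashable (all fields here are strings), so a row can be a dict key or set member.
counts = {entry: 1}

# Fields cannot be reassigned; the attempt raises AttributeError.
try:
    entry.user = "mallory"
except AttributeError:
    mutation_blocked = True
else:
    mutation_blocked = False

# _asdict() converts the row back to a dict, e.g. before JSON serialization.
as_dict = entry._asdict()
print(same_as_tuple, mutation_blocked, as_dict)
```
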

**Time & Space Complexity**

Building n log rows:

Time: create list of n namedtuples → O(n)

Space: store n rows → O(n)

Accessing fields (entry.user, entry.ts) is direct attribute access → O(1)

**Pattern 2** — Pack JSON into structured rows

**Problem:** Logs arrive as JSON/dicts from an API (e.g., Kafka, REST).
Accessing dict keys everywhere is noisy and easy to typo.

**Task:** Convert JSON log dicts into Log rows once, then work with a structure that has fixed, named fields.

In [0]:
from collections import namedtuple

Log = namedtuple("Log", ["user", "ts", "action"])

# JSON-like input (dicts)
json_logs = [
    {"user": "alice", "timestamp": "2025-10-21 12:00", "event": "click"},
    {"user": "bob",   "timestamp": "2025-10-21 12:01", "event": "view"},
]

def pack_logs(json_rows):
    """Normalize JSON fields into our Log schema."""
    return [
        Log(
            user=row["user"],
            ts=row["timestamp"],      # map JSON key → schema field
            action=row["event"],
        )
        for row in json_rows
    ]

logs = pack_logs(json_logs)

# Downstream code is clean:
active_users = {log.user for log in logs if log.action == "click"}

**Why this pattern?**

One place to handle messy JSON keys → normalized schema for the rest of the pipeline.

Easier refactors: if JSON changes, you fix pack_logs only.

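Better IDE support (autocompletion on named fields). Note that collections.namedtuple carries no type annotations; use typing.NamedTuple if you want type hints.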
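A minimal sketch of the same packing step with `typing.NamedTuple`, the standard-library variant that lets each field carry a type annotation. The names `TypedLog` and `pack_typed_logs` are illustrative and not part of the original pipeline; the annotations are hints for tools and are not enforced at runtime.

```python
from typing import Any, Dict, List, NamedTuple


class TypedLog(NamedTuple):  # hypothetical name, same fields as Log
    user: str
    ts: str
    action: str


def pack_typed_logs(json_rows: List[Dict[str, Any]]) -> List[TypedLog]:
    """Map JSON keys onto the TypedLog schema in one place."""
    return [
        TypedLog(user=row["user"], ts=row["timestamp"], action=row["event"])
        for row in json_rows
    ]


typed_logs = pack_typed_logs(
    [{"user": "alice", "timestamp": "2025-10-21 12:00", "event": "click"}]
)
print(typed_logs[0].user, typed_logs[0].action)
```

A type checker or IDE can now flag `typed_logs[0].usr` as an unknown attribute, which a plain dict lookup would only reveal at runtime.
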

**Time & Space Complexity**

Packing n JSON rows:

Time: one pass over input, constant work per row → O(n)

Space: output list of n namedtuples → O(n)

Dict lookups (row["user"]) and attribute access (log.user) are both O(1).

**Pattern 3** — Immutable row for CDC comparison

**Problem:** In Change Data Capture (CDC), you compare “before” vs “after” snapshots.
If rows are mutable dicts, you can accidentally mutate them and corrupt comparisons.

**Task:** Use immutable, hashable rows (namedtuple) so you can safely put them in sets and compare snapshots.

In [0]:
from collections import namedtuple

Account = namedtuple("Account", ["id", "balance", "status"])

# Snapshot at T1
before_rows = [
    Account(1, 100, "ACTIVE"),
    Account(2, 200, "ACTIVE"),
]

# Snapshot at T2
after_rows = [
    Account(1, 150, "ACTIVE"),   # balance changed
    Account(3, 300, "ACTIVE"),   # new account
]

# Use sets for CDC-style comparison
before_set = set(before_rows)
after_set  = set(after_rows)

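inserted  = after_set - before_set   # rows only in "after": new rows plus the new version of changed rows
deleted   = before_set - after_set   # rows only in "before": removed rows plus the old version of changed rows
unchanged = before_set & after_set   # rows identical in both snapshots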

In [0]:
# Detect updated rows by matching on the key (id), then comparing the full rows:

before_by_id = {a.id: a for a in before_rows}
after_by_id  = {a.id: a for a in after_rows}

updated = [
    (before_by_id[acc_id], after_by_id[acc_id])
    for acc_id in (before_by_id.keys() & after_by_id.keys())
    if before_by_id[acc_id] != after_by_id[acc_id]
]

**Why this pattern?**

namedtuple rows are immutable → no accidental changes mid-pipeline.

Hashable → can put them into sets/dicts for fast diffing.

Very natural for CDC, slowly changing dimensions, snapshot comparisons.
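Because a row cannot be modified in place, a changed version has to be built as a new object, for example with `_replace`. The sketch below also shows one caveat: immutability is shallow, so a mutable field (here a list) makes the row unhashable. The `Tagged` schema is a hypothetical example, not part of the snapshots above.

```python
from collections import namedtuple

Account = namedtuple("Account", ["id", "balance", "status"])

before = Account(1, 100, "ACTIVE")
after = before._replace(balance=150)  # returns a new row; `before` is untouched

print(before, after, before != after)

# Caveat: a mutable field breaks hashability, so such rows cannot go into sets.
Tagged = namedtuple("Tagged", ["id", "tags"])  # hypothetical schema
tagged = Tagged(1, ["vip"])
try:
    hash(tagged)
except TypeError:
    hashable = False
else:
    hashable = True
```

Keeping every field a hashable scalar (or a tuple of them) preserves the set-based comparisons shown above.
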

**Time & Space Complexity**

Let n = len(before_rows), m = len(after_rows):

Building sets/dicts:

before_set, after_set, before_by_id, after_by_id each built in O(n) or O(m).

Set operations (-, &) and key-set intersection:

inserted, deleted, unchanged, updated detection → O(n + m) on average.

Space:

Storing snapshots + sets/dicts → O(n + m) additional space.

Per-row comparisons (before_by_id[acc_id] != after_by_id[acc_id]) compare field by field, so they cost O(k) for k fields, which is effectively O(1) for a fixed schema.
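The O(n + m) bound can be seen in a single key-based diff: two dict builds plus three key-set operations. This is a minimal sketch, assuming `id` is unique within each snapshot; `diff_snapshots` and its `key` parameter are hypothetical names introduced for illustration.

```python
from collections import namedtuple

Account = namedtuple("Account", ["id", "balance", "status"])


def diff_snapshots(before_rows, after_rows, key=lambda row: row.id):
    """Classify rows by key into inserted, deleted, and updated."""
    before_by_key = {key(r): r for r in before_rows}  # O(n)
    after_by_key = {key(r): r for r in after_rows}    # O(m)

    new_keys = after_by_key.keys() - before_by_key.keys()
    gone_keys = before_by_key.keys() - after_by_key.keys()
    shared_keys = before_by_key.keys() & after_by_key.keys()

    inserted = [after_by_key[k] for k in new_keys]
    deleted = [before_by_key[k] for k in gone_keys]
    updated = [
        (before_by_key[k], after_by_key[k])
        for k in shared_keys
        if before_by_key[k] != after_by_key[k]
    ]
    return inserted, deleted, updated


inserted, deleted, updated = diff_snapshots(
    [Account(1, 100, "ACTIVE"), Account(2, 200, "ACTIVE")],
    [Account(1, 150, "ACTIVE"), Account(3, 300, "ACTIVE")],
)
print(inserted, deleted, updated)
```

Unlike the plain set difference, this keeps a changed row out of the inserted and deleted results and reports it once as a (before, after) pair.
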

**namedtuple — Lightweight, Fast Row Schema**


Smaller memory footprint than dicts (field names are stored once on the class, not per row)

More readable than tuples

Immutable

Access by attribute
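The "lightweight" claim can be checked roughly with `sys.getsizeof`. This is a sketch that assumes CPython; the exact byte counts vary by version and platform, and `getsizeof` measures only the container, not the strings it references.

```python
import sys
from collections import namedtuple

Log = namedtuple("Log", ["user", "ts", "action"])

row_tuple = Log("alice", "2025-10-21 12:00", "click")
row_dict = {"user": "alice", "ts": "2025-10-21 12:00", "action": "click"}

# Container size only; the field values are shared objects in both cases.
tuple_size = sys.getsizeof(row_tuple)
dict_size = sys.getsizeof(row_dict)
print(tuple_size, dict_size)
```
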

**When to use namedtuple**

Row schemas

Lightweight models

CDC-friendly immutable structures