**Pattern 1** — Count Error Codes from Logs

**Problem:** Count how many times each HTTP status code appears in log records.

**Task:** Given a list of log dicts, compute frequency of status codes.

In [0]:
from collections import Counter

logs = [
    {"status": 200, "path": "/home"},
    {"status": 500, "path": "/api"},
    {"status": 200, "path": "/login"},
    {"status": 404, "path": "/missing"},
    {"status": 500, "path": "/api"},
]

"""
Idea:
Feed a generator of status codes into Counter.
Counter builds a hashmap: status_code -> count
"""

cnt = Counter(log["status"] for log in logs)
print(cnt)  # Counter({200: 2, 500: 2, 404: 1})

**Why this pattern**
Fast way to summarize logs by status, error type, response code, etc. One pass, hash-based counting.

**Time & Space Complexity**

Time: Single pass over logs → O(n), where n = number of log records.

Space: One entry per distinct status → O(m), where m = unique status codes.
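The same one-pass idea extends to coarser summaries. As a minimal sketch, assuming every status is an integer HTTP code, the snippet below buckets codes into classes such as 2xx and 5xx. `status_class` is a hypothetical helper introduced here for illustration, not part of `collections`.

```python
from collections import Counter

logs = [
    {"status": 200, "path": "/home"},
    {"status": 500, "path": "/api"},
    {"status": 200, "path": "/login"},
    {"status": 404, "path": "/missing"},
    {"status": 500, "path": "/api"},
]

def status_class(code):
    """Map an integer HTTP status such as 404 to its class label '4xx' (hypothetical helper)."""
    return f"{code // 100}xx"

# Still a single pass: transform each status, then count the transformed value.
class_cnt = Counter(status_class(log["status"]) for log in logs)
print(class_cnt)  # Counter({'2xx': 2, '5xx': 2, '4xx': 1})
```

Transforming the key before counting keeps the result small: at most one entry per status class instead of one per distinct code.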

**Pattern 2** — Top-K Items

**Problem:** Find the most frequent error/status codes or categories.

**Task:** Given a Counter, return the top K most common items.

In [0]:
from collections import Counter

logs = [
    {"status": 200}, {"status": 500}, {"status": 500},
    {"status": 404}, {"status": 404}, {"status": 404},
]

cnt = Counter(log["status"] for log in logs)

"""
Idea:
Counter.most_common(k) returns a list of (item, count),
sorted by count descending, limited to top k.
"""

top_2 = cnt.most_common(2)
print(top_2)  # [(404, 3), (500, 2)]

**Why this pattern**
Used for “Top 10 error codes”, “Top 5 users by events”, “Top N slow endpoints”, etc.

**Time & Space Complexity**
Let m = number of unique keys.

Time: most_common(k) is approximately O(m log k) (heap-based selection). Called with no argument, it sorts all m items instead, which is O(m log m).

Space: Uses extra space up to O(k) to hold the top K items.
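To make the heap-based selection concrete, here is a small sketch that reproduces `most_common(k)` by hand with `heapq.nlargest`, which is what CPython's `Counter` uses for this call. The counts are the ones from the example above, written out directly.

```python
import heapq
from collections import Counter
from operator import itemgetter

cnt = Counter({200: 1, 500: 2, 404: 3})

top_2 = cnt.most_common(2)

# Manual equivalent: select the k largest (item, count) pairs by count.
top_2_heap = heapq.nlargest(2, cnt.items(), key=itemgetter(1))

print(top_2)       # [(404, 3), (500, 2)]
print(top_2_heap)  # [(404, 3), (500, 2)]

# With no argument, every item comes back, sorted by count descending.
print(cnt.most_common())  # [(404, 3), (500, 2), (200, 1)]
```

When k is much smaller than m, the heap selection avoids sorting the full set of keys.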

**Pattern 3** — Category Drift Detection

**Problem:** Detect category distribution drift between training and production.

**Task:** Compare category counts in train vs prod and find categories that increased in production.

In [0]:
from collections import Counter

train = [
    {"category": "A"},
    {"category": "A"},
    {"category": "B"},
]

prod = [
    {"category": "A"},
    {"category": "B"},
    {"category": "B"},
    {"category": "C"},
]

train_cnt = Counter(row["category"] for row in train)
prod_cnt = Counter(row["category"] for row in prod)

"""
Idea:
Subtract counters: prod_cnt - train_cnt
Negative and zero counts are dropped.
Result: categories that increased (or appeared new) in prod.
"""

drift = prod_cnt - train_cnt
print(drift)  # Counter({'B': 1, 'C': 1})


**Why this pattern**
Great for monitoring: new categories appearing, shifts in label distribution, schema/category drift in production pipelines.

**Time & Space Complexity**
Let mt and mp be the number of unique categories in train and prod, and u = mt + mp (an upper bound on the distinct keys across both).

Time: Building counters is O(n_train + n_prod).
Subtraction is O(u) over unique keys.

Space: Counters store unique categories → O(u).
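Raw-count subtraction only reports increases, and it is sensitive to the two datasets having different sizes. As a hedged sketch of two complementary views, the snippet below keeps signed differences with `Counter.subtract` and compares each category's share of its own dataset. The `shares` helper is hypothetical, and the counts are the ones from the example above.

```python
from collections import Counter

train_cnt = Counter({"A": 2, "B": 1})
prod_cnt = Counter({"A": 1, "B": 2, "C": 1})

# Signed differences: subtract() works in place and keeps negative counts.
signed = Counter(prod_cnt)
signed.subtract(train_cnt)
print(dict(signed))  # {'A': -1, 'B': 1, 'C': 1}

def shares(cnt):
    """Convert raw counts to fractions of the total (empty Counter -> {})."""
    total = sum(cnt.values())
    return {k: v / total for k, v in cnt.items()} if total else {}

train_share = shares(train_cnt)
prod_share = shares(prod_cnt)

# Change in share per category; a category absent from one side counts as 0.0.
share_delta = {
    k: prod_share.get(k, 0.0) - train_share.get(k, 0.0)
    for k in sorted(set(train_share) | set(prod_share))
}
for k, d in share_delta.items():
    print(k, f"{d:+.2f}")
# A -0.42
# B +0.17
# C +0.25
```

Shares make train and prod comparable even when one has far more rows than the other; what size of change counts as drift is left to the reader's monitoring policy.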

**Pattern 4** — Count Composite Keys

**Problem:** Count how many events per (customer, date) pair (or any multi-column key).

**Task:** Given rows with cust and date, count frequency per (cust, date).

In [0]:
from collections import Counter

rows = [
    {"cust": "alice", "date": "2025-11-19"},
    {"cust": "alice", "date": "2025-11-19"},
    {"cust": "bob",   "date": "2025-11-19"},
    {"cust": "alice", "date": "2025-11-20"},
]

"""
Idea:
Use a tuple (cust, date) as the key.
Tuples are hashable → perfect as composite keys in Counter.
"""

pair_cnt = Counter((row["cust"], row["date"]) for row in rows)

for key, c in pair_cnt.items():
    print(key, "->", c)

# ('alice', '2025-11-19') -> 2
# ('bob', '2025-11-19') -> 1
# ('alice', '2025-11-20') -> 1


**Why this pattern**
Very common in data engineering (DE): daily counts per user, per account per day, per store per SKU, etc., without string concatenation.

**Time & Space Complexity**
Let n = number of rows, m = unique (cust, date) pairs.

Time: One pass to build keys and count → O(n).

Space: Store m composite keys → O(m).
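Because the key is a tuple, the composite counts can be rolled up to either dimension afterwards. A minimal sketch, using the pair counts from the example above written out directly:

```python
from collections import Counter

pair_cnt = Counter({
    ("alice", "2025-11-19"): 2,
    ("bob", "2025-11-19"): 1,
    ("alice", "2025-11-20"): 1,
})

per_cust = Counter()
per_date = Counter()

# Unpack each composite key and add its count to both marginal totals.
for (cust, date), c in pair_cnt.items():
    per_cust[cust] += c
    per_date[date] += c

print(per_cust)  # Counter({'alice': 3, 'bob': 1})
print(per_date)  # Counter({'2025-11-19': 3, '2025-11-20': 1})
```

The roll-up iterates over the m unique pairs rather than the n original rows, so it is cheap once the composite counts exist.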

**Pattern 5** — Increment Counters Safely

**Problem:** Increment counts without worrying about missing keys or initializing to 0.

**Task:** Process a stream of items and maintain counts in a Counter.

In [0]:
from collections import Counter

items = ["A", "B", "A", "C", "A", "B"]

"""
Idea:
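Counter returns 0 for missing keys (reading a missing key does not insert it).
So we can do cnt[item] += 1 directly without checks.
This is cleaner than the dict.get(item, 0) + 1 pattern. If the whole iterable
is already available, Counter(items) is usually faster than this loop in CPython.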
"""

cnt = Counter()

for item in items:
    cnt[item] += 1

print(cnt)  # Counter({'A': 3, 'B': 2, 'C': 1})


**Why this pattern**
Ideal for streaming-style counting in loops, especially when category set is unknown ahead of time.

**Time & Space Complexity**
Let n = number of items, m = unique items.

Time: Each increment is average O(1) (hash map), over n items → O(n).

Space: Store counts for m unique items → O(m).
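When the stream arrives in chunks rather than one item at a time, `Counter.update` adds a whole batch at once. A small sketch, assuming a hypothetical `batches` list that stands in for the stream:

```python
from collections import Counter

# Hypothetical stream chunks; in practice these might be file lines or message batches.
batches = [["A", "B"], ["A", "C"], ["A", "B"]]

cnt = Counter()
for batch in batches:
    # Counter.update adds to existing counts; unlike dict.update, it does not overwrite them.
    cnt.update(batch)

print(cnt)         # Counter({'A': 3, 'B': 2, 'C': 1})
print(cnt["Z"])    # 0
print("Z" in cnt)  # False: reading a missing key does not insert it
```

The result matches the item-by-item loop above, and the Counter still only grows with the number of unique items.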

In [0]:
## Problem: Find top-3 customers by events

# Assumes `events` is an iterable of dicts that each have a 'cust_id' key.
Counter(r['cust_id'] for r in events).most_common(3)

In [0]:
## Problem: Detect unexpected categories in prod

unexpected = set(prod_cnt) - set(train_cnt)
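A self-contained version of the set-difference check, as a sketch using the Pattern 3 counts written out directly. It also reports categories that vanished from prod and how much prod traffic the unexpected categories represent.

```python
from collections import Counter

train_cnt = Counter({"A": 2, "B": 1})
prod_cnt = Counter({"A": 1, "B": 2, "C": 1})

# Iterating a Counter yields its keys, so set(cnt) is the set of categories seen.
unexpected = set(prod_cnt) - set(train_cnt)  # seen only in prod
missing = set(train_cnt) - set(prod_cnt)     # seen only in train

print(unexpected)  # {'C'}
print(missing)     # set()

# Fraction of prod rows that fall into an unexpected category.
unexpected_rows = sum(prod_cnt[c] for c in unexpected)
print(unexpected_rows / sum(prod_cnt.values()))  # 0.25
```

Note that `set(cnt)` includes keys whose count is zero or negative, so this check assumes the Counters were built by plain counting, as in Pattern 3.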

In [0]:
## Problem: Count frequency of (product, store)

# Assumes `records` is an iterable of dicts with 'product' and 'store' keys.
Counter((r['product'], r['store']) for r in records)

**Counter** recap: count anything fast, top-K, category drift, composite key frequency.