**Pattern 1** — Counters per Key (Frequency Map)

**Problem:** Count how many times each event type appears in a stream of logs.

**Task:** Given a list of log rows, return a mapping event_type -> count.

In [0]:
from collections import defaultdict

def count_events_by_type(rows):
    """
    rows = [
        {"event_type": "INFO"},
        {"event_type": "ERROR"},
        {"event_type": "INFO"},
        {"event_type": "WARN"},
        {"event_type": "ERROR"},
    ]
    → {"INFO": 2, "ERROR": 2, "WARN": 1}
    """
    counts = defaultdict(int)
    for row in rows:
        key = row["event_type"]
        counts[key] += 1
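    # Return a plain dict so later lookups of missing keys raise KeyError
    # instead of silently inserting zero counts.
    return dict(counts)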

**Idea (Hash Map Counter per Key)**
Use a hash map where each key is an event type and each value is the running count.
Every row updates exactly one bucket in O(1) average time.
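
The same counter-per-key idea can also be sketched with `collections.Counter` from the standard library, which wraps the hash map and the `+= 1` update. The function name `count_events_with_counter` is hypothetical and only serves this illustration.

```python
from collections import Counter

def count_events_with_counter(rows):
    # Counter tallies every hashable item produced by the generator,
    # one O(1) average-time update per row.
    return Counter(row["event_type"] for row in rows)

rows = [
    {"event_type": "INFO"},
    {"event_type": "ERROR"},
    {"event_type": "INFO"},
]
counts = count_events_with_counter(rows)
print(counts.most_common(1))  # [('INFO', 2)]
```

`Counter` is a `dict` subclass, so it can be used wherever the plain mapping is expected, and `most_common` gives the top categories without extra code.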

**Why this is Aggregation**
You compress many rows into a small summary: counts per category.

**Time & Space Complexity**

Time: One pass over rows → O(n)

Space: At most one entry per distinct event type k → O(k)

**Pattern 2** — Summation per Key (Totals per Group)

**Problem:** Compute total revenue per user from a transaction log.

**Task:** Given rows with user and amount, return user -> total_amount.

In [0]:
from collections import defaultdict

def total_spend_per_user(rows):
    """
    rows = [
        {"user": "alice", "amount": 10.0},
        {"user": "bob",   "amount": 5.0},
        {"user": "alice", "amount": 2.5},
    ]
    → {"alice": 12.5, "bob": 5.0}
    """
    totals = defaultdict(float)
    for row in rows:
        u = row["user"]
        totals[u] += row["amount"]
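    # Return a plain dict so later lookups of unknown users raise KeyError
    # instead of silently inserting 0.0 totals.
    return dict(totals)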

**Idea (Running Sum per Key)**
Same hash map idea, but the value is a running sum instead of a count.
Each new row just adds to the existing total.
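
When the summed values are money, binary floats can accumulate rounding error. A minimal sketch of the same running-sum pattern with `decimal.Decimal` follows; it assumes each `amount` has a clean decimal string form, and the function name `total_spend_per_user_decimal` is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

def total_spend_per_user_decimal(rows):
    # Decimal() with no arguments is Decimal("0"), so it works as the
    # default running total for a new user.
    totals = defaultdict(Decimal)
    for row in rows:
        # Going through str() turns the float 0.1 into Decimal("0.1")
        # rather than its long binary expansion.
        totals[row["user"]] += Decimal(str(row["amount"]))
    return dict(totals)

rows = [
    {"user": "alice", "amount": 0.1},
    {"user": "alice", "amount": 0.2},
    {"user": "bob", "amount": 5.0},
]
totals = total_spend_per_user_decimal(rows)
print(totals["alice"])  # 0.3
```

The structure is unchanged from the float version; only the type of the accumulated value differs.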

**Why this is Aggregation**
Multiple rows collapse into a single numeric statistic per key: total revenue, total latency, etc.

**Time & Space Complexity**

Time: Single pass through rows → O(n)

Space: One entry per distinct user k → O(k)

**Pattern 3** — Average per Key (Mean per Group)

**Problem:** Compute average response time per API endpoint.

**Task:** Given rows with endpoint and latency_ms, return endpoint -> avg_latency.

In [0]:
from collections import defaultdict

def avg_latency_per_endpoint(rows):
    """
    rows = [
        {"endpoint": "/login", "latency_ms": 100},
        {"endpoint": "/login", "latency_ms": 200},
        {"endpoint": "/search", "latency_ms": 300},
    ]
    → {"/login": 150.0, "/search": 300.0}
    """
    # stats[endpoint] = [sum_latency, count]
    stats = defaultdict(lambda: [0, 0])

    for r in rows:
        ep = r["endpoint"]
        s, c = stats[ep]
        stats[ep] = [s + r["latency_ms"], c + 1]

    averages = {ep: s / c for ep, (s, c) in stats.items()}
    return averages


**Idea (Track Sum & Count per Key)**
An average can’t be maintained as a single running value: updating it needs both the total so far and the number of rows seen.
Maintain two aggregates per key, the sum and the count, then divide once at the end.
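
One consequence of keeping sum and count separately is that partial results can be merged, which averages alone cannot. The sketch below assumes two hypothetical partial aggregates (`shard_a`, `shard_b`) in the same `{key: [sum, count]}` shape that `stats` uses; `merge_sum_count` is an illustrative helper, not part of the notebook.

```python
def merge_sum_count(a, b):
    """Merge two partial aggregates of the form {key: [sum, count]}."""
    # Copy the inner lists so the inputs are left untouched.
    merged = {k: list(v) for k, v in a.items()}
    for k, (s, c) in b.items():
        if k in merged:
            merged[k][0] += s
            merged[k][1] += c
        else:
            merged[k] = [s, c]
    return merged

shard_a = {"/login": [300, 2]}   # latencies 100 and 200, average 150
shard_b = {"/login": [400, 1]}   # latency 400, average 400
merged = merge_sum_count(shard_a, shard_b)
averages = {ep: s / c for ep, (s, c) in merged.items()}
# averages["/login"] is 700 / 3, the true mean of the three rows.
# Averaging the two shard averages, (150 + 400) / 2 = 275, would be wrong.
```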

**Why this is Aggregation**
You’re combining rows into two statistics (sum, count) and then deriving a third (average).

**Time & Space Complexity**

Time:

First pass to build stats: O(n)

Second pass to compute averages: O(k)

Overall: O(n + k) = O(n), since k ≤ n

Space: Store sum & count per distinct key k → O(k)

**Pattern 4** — Min/Max per Key (Extremes per Group)

**Problem:** Find the minimum latency observed per user.

**Task:** Given rows with user and latency, return user -> min_latency.

In [0]:
def min_latency_per_user(rows):
    """
    rows = [
        {"user": "alice", "latency": 120},
        {"user": "bob",   "latency": 300},
        {"user": "alice", "latency": 80},
    ]
    → {"alice": 80, "bob": 300}
    """
    mins = {}
    for r in rows:
        u = r["user"]
        if u not in mins:
            mins[u] = r["latency"]
        else:
            mins[u] = min(mins[u], r["latency"])
    return mins

(Swap min for max to track maximum instead.)

**Idea (Running Extremum per Key)**
For each key, keep the best-so-far value (min or max).
Each new row is compared once against the current extremum for its key.
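
Because min and max differ only in the comparison, the running extremum can be sketched as one function that takes the comparison as a parameter. The name `extreme_per_key` and its parameters `key_field`, `value_field`, and `pick` are hypothetical, chosen for this illustration.

```python
def extreme_per_key(rows, key_field, value_field, pick=min):
    # best[k] holds the best-so-far value for key k.
    best = {}
    for r in rows:
        k, v = r[key_field], r[value_field]
        # First sighting of a key stores the value; later rows do one compare.
        best[k] = v if k not in best else pick(best[k], v)
    return best

rows = [
    {"user": "alice", "latency": 120},
    {"user": "bob",   "latency": 300},
    {"user": "alice", "latency": 80},
]
fastest = extreme_per_key(rows, "user", "latency", pick=min)
slowest = extreme_per_key(rows, "user", "latency", pick=max)
# fastest == {"alice": 80, "bob": 300}
# slowest == {"alice": 120, "bob": 300}
```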

**Why this is Aggregation**
You collapse many measurements per key into a single extreme value (SLA checks, worst-case, best-case).

**Time & Space Complexity**

Time: One scan over rows, one compare per row → O(n)

Space: One stored value per distinct key k → O(k)

**Pattern 5** — Multi-Field Aggregation per Key

**Problem:** For each customer, compute multiple stats at once:
total spend, number of orders, minimum order value, maximum order value.

**Task:** Given rows with customer and amount, return:
customer -> {sum, count, min, max}.

In [0]:
from collections import defaultdict
import math

def stats_per_customer(rows):
    """
    rows = [
        {"customer": "alice", "amount": 10},
        {"customer": "alice", "amount": 5},
        {"customer": "bob",   "amount": 20},
    ]
    → {
        "alice": {"sum": 15, "count": 2, "min": 5,  "max": 10},
        "bob":   {"sum": 20, "count": 1, "min": 20, "max": 20},
      }
    """
    # value: [sum, count, min, max]
    agg = defaultdict(lambda: [0, 0, math.inf, -math.inf])

    for r in rows:
        c = r["customer"]
        amt = r["amount"]
        s, cnt, mn, mx = agg[c]
        s += amt
        cnt += 1
        mn = min(mn, amt)
        mx = max(mx, amt)
        agg[c] = [s, cnt, mn, mx]

    # Optional: convert to nicer dicts
    out = {}
    for c, (s, cnt, mn, mx) in agg.items():
        out[c] = {"sum": s, "count": cnt, "min": mn, "max": mx}
    return out

**Idea (One Pass, Many Aggregates per Key)**
Instead of making separate passes for sum, count, min, and max, keep a small fixed-size state record per key (here a 4-element list) and update all fields in a single scan.
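
The per-key state can also be given named fields. The following is a minimal sketch using a dataclass in place of the 4-element list; `RunningStats`, its field names, and `stats_per_customer_dc` are hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
import math

@dataclass
class RunningStats:
    total: float = 0
    count: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value):
        # Update all four aggregates from a single observation.
        self.total += value
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

def stats_per_customer_dc(rows):
    # A fresh RunningStats is created the first time a customer is seen.
    agg = defaultdict(RunningStats)
    for r in rows:
        agg[r["customer"]].add(r["amount"])
    return dict(agg)

rows = [
    {"customer": "alice", "amount": 10},
    {"customer": "alice", "amount": 5},
    {"customer": "bob",   "amount": 20},
]
stats = stats_per_customer_dc(rows)
# stats["alice"] == RunningStats(total=15, count=2, minimum=5, maximum=10)
```

Named fields remove the need to remember the positional order `[sum, count, min, max]`, and this version also skips the second formatting pass.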

**Why this is Aggregation**
You’re building a compact statistics object per key, representing many different metrics from many rows.

**Time & Space Complexity**

Time:

First pass to update [sum, count, min, max] for each row → O(n)

Second pass to format output (over k keys) → O(k)

Overall: O(n + k) = O(n), since k ≤ n

Space: State per distinct key k, constant size (4 numbers) → O(k)