**⭐ 1. What This Pattern Solves**

Efficient multi-level grouping in ETL pipelines.

Aggregates or collects values by key without manually initializing empty lists or dicts.

Avoids repetitive `if key not in dict` checks.

Common in analytics pipelines where events/logs are grouped by user, time, or category.
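
As a rough sketch of the boilerplate this pattern removes, the snippet below groups the same pairs twice: once with a plain dict and an explicit membership check, and once with `defaultdict`. The `events` sample data and the variable names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (user, event) pairs
events = [('Alice', 'click'), ('Bob', 'view'), ('Alice', 'purchase')]

# Plain dict: every append needs a membership check first
plain = {}
for user, event in events:
    if user not in plain:
        plain[user] = []
    plain[user].append(event)

# defaultdict: the list factory covers the missing-key case
grouped = defaultdict(list)
for user, event in events:
    grouped[user].append(event)

# Both approaches build the same mapping
print(plain == dict(grouped))
```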

**⭐ 2. SQL Equivalent**

In [0]:
%sql
SELECT user_id, ARRAY_AGG(event) AS events
FROM user_events
GROUP BY user_id;

**⭐ 3. Core Idea**

`defaultdict` takes a factory (such as list, set, or int) and calls it to create the value the first time a missing key is accessed, which simplifies accumulation.
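
To make the role of the factory concrete, here is a small sketch that runs the same input through three factories. The `events` data is made up for illustration; only the choice of factory differs between the three containers.

```python
from collections import defaultdict

# Hypothetical sample data: (user, event) pairs
events = [('Alice', 'click'), ('Bob', 'view'), ('Alice', 'click')]

as_list = defaultdict(list)   # missing key starts as []
as_set = defaultdict(set)     # missing key starts as set()
as_count = defaultdict(int)   # missing key starts as 0

for user, event in events:
    as_list[user].append(event)   # keeps duplicates, in input order
    as_set[user].add(event)       # keeps unique values only
    as_count[user] += 1           # counts events per user

print(dict(as_list))
print(dict(as_set))
print(dict(as_count))
```

Choosing list, set, or int here is the Python-side counterpart of choosing between collecting, de-duplicating, and counting in an aggregation.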

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
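from collections import defaultdict

# Placeholder inputs; replace with real records
pairs = [('a', 1), ('b', 2), ('a', 3)]                    # (key, value)
triples = [('x', 'a', 1), ('x', 'b', 2), ('y', 'a', 3)]   # (outer, inner, value)

# Single-level grouping
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Multi-level grouping: the lambda builds a fresh inner defaultdict per outer key
nested_group = defaultdict(lambda: defaultdict(list))
for outer, inner, value in triples:
    nested_group[outer][inner].append(value)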

**⭐ 5. Detailed Example**

In [0]:
data = [
    ('Alice', 'click'),
    ('Bob', 'view'),
    ('Alice', 'purchase'),
    ('Bob', 'click'),
]

from collections import defaultdict

events_by_user = defaultdict(list)
for user, event in data:
    events_by_user[user].append(event)

# Convert to a plain dict so the output omits the defaultdict wrapper
print(dict(events_by_user))

{'Alice': ['click', 'purchase'], 'Bob': ['view', 'click']}


**⭐ 6. Mini Practice Problems**

Group transaction amounts by account ID using defaultdict.

Count occurrences of each word in a list using defaultdict(int).

Create a nested dictionary of sales per region per month.
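
One possible way to approach these three problems is sketched below. All sample data is invented, and the third problem is read here as "total sales per region per month", which is an assumption; collecting the individual amounts in a list would be an equally valid reading.

```python
from collections import defaultdict

# 1. Group transaction amounts by account ID (hypothetical data)
transactions = [('acc-1', 50.0), ('acc-2', 20.0), ('acc-1', 75.5)]
amounts_by_account = defaultdict(list)
for account_id, amount in transactions:
    amounts_by_account[account_id].append(amount)

# 2. Count word occurrences: int() returns 0, so += 1 works on first sight
words = ['etl', 'spark', 'etl', 'sql', 'etl']
word_counts = defaultdict(int)
for word in words:
    word_counts[word] += 1

# 3. Total sales per region per month (assumed interpretation: sum the amounts)
sales = [('EMEA', '2025-01', 100), ('EMEA', '2025-01', 50), ('APAC', '2025-02', 70)]
sales_by_region_month = defaultdict(lambda: defaultdict(int))
for region, month, amount in sales:
    sales_by_region_month[region][month] += amount

print(dict(amounts_by_account))
print(dict(word_counts))
print({region: dict(months) for region, months in sales_by_region_month.items()})
```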

**⭐ 7. Full Data Engineering Scenario**

Problem Statement: Collect all page views per user per day from web logs.


In [0]:
"""
Expected Output:

{
  '2025-12-15': {'Alice': ['home', 'checkout'], 'Bob': ['home']},
  '2025-12-16': {'Alice': ['profile']}
}

"""
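from collections import defaultdict

# Sample input (assumed for illustration): (date, user, page) tuples
web_logs = [
    ('2025-12-15', 'Alice', 'home'),
    ('2025-12-15', 'Alice', 'checkout'),
    ('2025-12-15', 'Bob', 'home'),
    ('2025-12-16', 'Alice', 'profile'),
]

# Outer key: date; inner key: user; value: pages viewed, in log order
logs_by_date = defaultdict(lambda: defaultdict(list))
for date, user, page in web_logs:
    logs_by_date[date][user].append(page)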
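
The expected output above is written as plain dicts, while printing a nested `defaultdict` also shows the factory wrappers. A small recursive helper can bridge the two; `to_plain` is a hypothetical name introduced here, and the tiny `nested` structure only stands in for the real grouping result.

```python
from collections import defaultdict

def to_plain(obj):
    """Recursively convert nested defaultdicts (or dicts) into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj

# Stand-in for a grouped result: date -> user -> pages
nested = defaultdict(lambda: defaultdict(list))
nested['2025-12-16']['Alice'].append('profile')

result = to_plain(nested)
print(result)
```

Converting at the end also means later lookups of missing keys raise `KeyError` instead of silently adding empty entries.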


**⭐ 8. Time & Space Complexity**

Time Complexity: O(n), one pass through the data; each dict lookup and list append is O(1) on average.

Space Complexity: O(n), since every value is stored once under its key.

**⭐ 9. Common Pitfalls & Mistakes**

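❌ Using normal dicts with repetitive `if key not in dict` checks → verbose and error-prone.

❌ Passing an instance instead of a factory: `defaultdict([])` raises a TypeError; write `defaultdict(list)`. `dict.setdefault(key, [])` also works on a plain dict, but it builds a new empty list on every call.

✔ Correct: defaultdict creates the default value the first time a missing key is accessed.

✔ Use a lambda for nested structures: `defaultdict(lambda: defaultdict(list))` builds a fresh inner defaultdict for each new outer key, so no mutable object is shared.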
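
The points above can be checked with a short sketch. The keys and values are made up; the snippet only illustrates that the argument must be callable, that `dict.setdefault` is a plain-dict alternative, and that each outer key gets its own inner container.

```python
from collections import defaultdict

# The argument must be a callable factory, not an instance
try:
    defaultdict([])
    factory_error = None
except TypeError as exc:
    factory_error = exc  # raised because [] is not callable

# dict.setdefault achieves the same grouping on a plain dict
plain = {}
plain.setdefault('Alice', []).append('click')

# The lambda runs once per missing outer key, so inner dicts are never shared
nested = defaultdict(lambda: defaultdict(list))
nested['Alice']['2025-12-15'].append('home')
nested['Bob']['2025-12-15'].append('profile')

print(type(factory_error).__name__)
print(plain)
print(nested['Alice'] is nested['Bob'])
```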