**⭐ 1. What This Pattern Solves**

Organizes hierarchical or multi-level data efficiently in ETL pipelines

Converts flat records into nested structures for analytics, reporting, or JSON exports

Useful for aggregating metrics by multiple keys (e.g., country → city → store → sales)

Simplifies lookups in deeply structured datasets without multiple joins

**⭐ 2. SQL Equivalent**

In [0]:
%sql
SELECT country, city, store, SUM(sales) AS total_sales
FROM sales_table
GROUP BY country, city, store;

**⭐ 3. Core Idea**

Use dictionaries recursively to represent multiple levels of aggregation

Each level corresponds to a key in the hierarchy
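
To make the idea concrete, here is a minimal sketch that builds the same hierarchy twice: once with a plain `dict`, where each level is created by hand via `setdefault`, and once with a recursive `defaultdict`. The keys and the value are illustrative only.

```python
from collections import defaultdict

# Plain dict: every intermediate level must be created explicitly.
plain = {}
plain.setdefault("USA", {}).setdefault("NY", {})["A"] = 100

# Recursive defaultdict: a missing key at any level yields another nested dict.
def nested_dict():
    return defaultdict(nested_dict)

auto = nested_dict()
auto["USA"]["NY"]["A"] = 100

# Both approaches describe the same hierarchy.
print(plain == auto)
```

The recursive version trades explicitness for brevity: no level needs to exist before it is written to.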

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

data = nested_dict()

# Insert a value
data['level1']['level2']['level3'] = 'value'

# Optional: serialize to JSON (json.dumps accepts a defaultdict directly,
# since it is a dict subclass; no conversion is required for this step)
import json
json.dumps(data)
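
If downstream code should receive ordinary dicts, so that a missing key raises `KeyError` instead of being created silently, a recursive copy along the following lines may help. This is only a sketch; `to_plain_dict` is a hypothetical helper name, not part of the standard library.

```python
from collections import defaultdict
import json

def nested_dict():
    return defaultdict(nested_dict)

def to_plain_dict(node):
    """Recursively copy nested defaultdicts into ordinary dicts (hypothetical helper)."""
    if isinstance(node, dict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node  # leaf value: returned unchanged

data = nested_dict()
data["level1"]["level2"]["level3"] = "value"

plain = to_plain_dict(data)
print(type(plain).__name__)
print(json.dumps(plain))
```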

**⭐ 5. Detailed Example**

In [0]:
records = [
    {'country': 'USA', 'city': 'NY', 'store': 'A', 'sales': 100},
    {'country': 'USA', 'city': 'NY', 'store': 'B', 'sales': 150},
    {'country': 'USA', 'city': 'LA', 'store': 'A', 'sales': 200},
    {'country': 'CAN', 'city': 'Toronto', 'store': 'A', 'sales': 50},
]

from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

agg = nested_dict()

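for r in records:
    # Each (country, city, store) appears once here, so assignment is enough;
    # repeated combinations would overwrite earlier values rather than sum them.
    agg[r['country']][r['city']][r['store']] = r['sales']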

import json
print(json.dumps(agg, indent=2))

{
  "USA": {
    "NY": {
      "A": 100,
      "B": 150
    },
    "LA": {
      "A": 200
    }
  },
  "CAN": {
    "Toronto": {
      "A": 50
    }
  }
}
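
Going the other way, the nested result can be flattened back into rows that resemble the output of the SQL `GROUP BY` query. The sketch below assumes the leaves are non-dict values; `flatten` is a hypothetical helper name, and the input dict is copied from the output shown above.

```python
def flatten(tree, path=()):
    """Yield (key1, key2, ..., leaf) tuples from a nested dict (hypothetical helper)."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Internal level: descend and remember the key on the path.
            yield from flatten(value, path + (key,))
        else:
            # Leaf: emit the full path plus the stored value.
            yield path + (key, value)

agg = {
    "USA": {"NY": {"A": 100, "B": 150}, "LA": {"A": 200}},
    "CAN": {"Toronto": {"A": 50}},
}

rows = list(flatten(agg))
for row in rows:
    print(row)
```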


**⭐ 6. Mini Practice Problems**

Convert a list of website clicks into a nested dictionary: date → user_id → page → click_count.

Aggregate employee salaries into a nested structure: department → team → employee → salary.

Build a multi-level inventory dictionary: warehouse → aisle → shelf → product → quantity.

**⭐ 7. Full Data Engineering Scenario**

Problem: Log analytics pipeline requires nested metrics by region → app → event_type → count.

In [0]:
## Expected Output
{
  "US": {"App1": {"click": 120, "view": 300}, "App2": {"click": 50}},
  "EU": {"App1": {"view": 200}}
}

In [0]:
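from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

metrics = nested_dict()

# `logs` is assumed to be an iterable of dicts with the keys used below.
for log in logs:
    events = metrics[log['region']][log['app']]
    # A missing leaf would itself be a defaultdict, so `+=` raises TypeError;
    # read the current count with .get() instead.
    events[log['event_type']] = events.get(log['event_type'], 0) + log['count']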
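
An alternative worth considering when the depth is known in advance: declare the three levels explicitly and give the leaves an `int` default, so that `+=` starts from 0. The sketch below is self-contained; the `logs` list is hypothetical sample data, chosen only so that the totals match the expected output above.

```python
from collections import defaultdict
import json

# Hypothetical sample logs (not from a real pipeline).
logs = [
    {"region": "US", "app": "App1", "event_type": "click", "count": 70},
    {"region": "US", "app": "App1", "event_type": "click", "count": 50},
    {"region": "US", "app": "App1", "event_type": "view", "count": 300},
    {"region": "US", "app": "App2", "event_type": "click", "count": 50},
    {"region": "EU", "app": "App1", "event_type": "view", "count": 200},
]

# Fixed depth: region -> app -> event_type -> int (missing leaves default to 0).
metrics = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

for log in logs:
    metrics[log["region"]][log["app"]][log["event_type"]] += log["count"]

print(json.dumps(metrics, indent=2))
```

The fixed-depth form is less flexible than the recursive template, but it makes the shape of the data explicit and fails early if code tries to nest deeper than intended.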


**⭐ 8. Time & Space Complexity**

Time Complexity: O(n), one pass over the records with average O(1) dictionary operations per level

Space Complexity: O(k), where k is the number of unique keys across all levels; for n records and hierarchy depth d, k is at most n × d
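
As a rough check on the space bound, the keys at every level can be counted directly. The sketch below uses the four-record, three-level structure from the detailed example; `count_keys` is a hypothetical helper name.

```python
def count_keys(tree):
    """Count the keys stored at every level of a nested dict (hypothetical helper)."""
    total = 0
    for value in tree.values():
        total += 1                      # the key at this level
        if isinstance(value, dict):
            total += count_keys(value)  # keys in the levels below
    return total

agg = {
    "USA": {"NY": {"A": 100, "B": 150}, "LA": {"A": 200}},
    "CAN": {"Toronto": {"A": 50}},
}

n_records, depth = 4, 3
k = count_keys(agg)
print(k, "keys stored; upper bound", n_records * depth)
```

Shared prefixes such as `USA` and `NY` are stored once, which is why k stays below the n × d bound here.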

**⭐ 9. Common Pitfalls & Mistakes**

❌ Forgetting to use defaultdict → KeyError when assigning nested levels
❌ Over-nesting without a clear structure → hard to query or maintain
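❌ Using += on a leaf of a recursive defaultdict → TypeError, because the missing leaf is itself a defaultdict
❌ Passing the defaultdict downstream unchanged → json.dumps() works, but any lookup of a missing key silently creates an empty entry; convert to a plain dict first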
✔ Always define a recursive defaultdict or helper function
✔ Keep depth limited to meaningful hierarchy
✔ Use json.dumps() for serialization and downstream pipelines
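
Some of these failure modes can be reproduced in a few lines. The sketch below shows the `KeyError` from a plain dict, then a related trap: merely reading a missing key from a recursive `defaultdict` inserts it. The keys are illustrative only.

```python
from collections import defaultdict

def nested_dict():
    return defaultdict(nested_dict)

# A plain dict raises KeyError when an intermediate level is missing.
plain = {}
try:
    plain["USA"]["NY"] = 100
except KeyError as exc:
    print("KeyError:", exc)

# Reading a missing key from a defaultdict inserts it as a side effect.
agg = nested_dict()
agg["USA"]["NY"]["A"] = 100
_ = agg["CAN"]["Toronto"]   # lookup only, yet both keys now exist
print(sorted(agg))

# Membership tests and .get() do not create entries.
clean = nested_dict()
clean["USA"]["NY"]["A"] = 100
print("CAN" in clean, clean.get("CAN"))
```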