**Pattern 1** — Safe Get With Default (dict.get)

**Problem:** Optional fields in a JSON API response (e.g., "user" might be missing).

**Task:** Read user info safely without raising KeyError.

In [0]:
data = {
    "request_id": "abc-123",
    # "user" may or may not exist
    "meta": {"source": "web"}
}

"""
Idea (Safe Get With Default)
Use dict.get(key, default) to avoid KeyError and fall back to {} or a sensible default.
"""

user = data.get("user", {})                # {} when "user" is absent
user_id = user.get("id")                   # None is already the default
user_name = user.get("name", "anonymous")  # explicit fallback value

**Why this pattern**

Robust against missing fields in real-world APIs.

Lets you chain .get() or use defaults instead of try/except everywhere.

**Time & Space Complexity**

Time: dict.get is average O(1). A few constant lookups → O(1).

Space: Only a couple of variables, no extra data structures → O(1).
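One caveat: the default passed to .get() is used only when the key is absent. If an API sends "user": null, data.get("user", {}) returns None and the next .get() raises AttributeError. A minimal sketch of a common workaround, assuming a null user should be treated the same as a missing one (the payload below is made up for illustration):

```python
# Hypothetical payload: "user" is present but null in the JSON.
data = {"request_id": "abc-123", "user": None}

# data.get("user", {}) would return None here, because the key exists.
# `or {}` also replaces None (and any other falsy value) with an empty dict.
user = data.get("user") or {}
user_id = user.get("id")
user_name = user.get("name", "anonymous")

print(user_id, user_name)  # None anonymous
```

The trade-off is that `or {}` also swallows other falsy values such as 0 or "", which is usually acceptable when the field is expected to be an object.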

**Pattern 2** — Drill Through Nested Objects Safely

**Problem:** Deeply nested JSON (e.g., user → address → location → city), where any level might be missing.

**Task:** Get city safely without blowing up on missing keys.

In [0]:
data = {
    "user": {
        "address": {
            "location": {
                "city": "Jacksonville",
                "state": "FL"
            }
        }
    }
}

"""
Idea (Chained Safe Get)
At each level, use .get(key, {}) so you always have a dict to ask the next key from.
Final .get uses None or "" as default for the value.
"""

city = (
    data
    .get("user", {})
    .get("address", {})
    .get("location", {})
    .get("city", None)
)

**Why this pattern**

Avoids big if "user" in data and "address" in data["user"] ... nests.

Perfect for messy or partially-populated JSON from microservices.

**Time & Space Complexity**

Time: Fixed number of get calls (4 lookups here) → O(1).

Space: Only scalar variables, no growing structures → O(1).
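When the same drill-down is needed for many paths, the chained .get() calls can be wrapped in a small helper. The sketch below is one possible shape; deep_get is a hypothetical name, not a standard-library function, and it assumes every intermediate level is a dict:

```python
def deep_get(obj, path, default=None):
    """Follow `path` (a sequence of keys) through nested dicts.

    Returns `default` as soon as a level is missing or is not a dict.
    `deep_get` is a hypothetical helper name, not a standard-library function.
    """
    current = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = {"user": {"address": {"location": {"city": "Jacksonville", "state": "FL"}}}}

city = deep_get(data, ["user", "address", "location", "city"])
zip_code = deep_get(data, ["user", "address", "zip"], default="unknown")
broken = deep_get({"user": None}, ["user", "address"], default="n/a")
print(city, zip_code, broken)  # Jacksonville unknown n/a
```

Unlike the plain chain, this version also survives a level that is present but null, because it checks the type before each lookup.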

**Pattern 3** — Extract All Users From Nested JSON (List Comprehension)

**Problem:** JSON has a nested "team" → "members" array, each with name, id, etc.

**Task:** Extract all member names into a Python list.

In [0]:
data = {
    "team": {
        "name": "Data Platform",
        "members": [
            {"name": "alice", "role": "DE"},
            {"name": "bob", "role": "SRE"},
            {"name": "charlie", "role": "PM"},
        ]
    }
}

"""
Idea (Flat Map Over Inner List)
We know where the list lives: data["team"]["members"].
Use a list comprehension to map each member dict → member["name"].
"""

members = data.get("team", {}).get("members", [])
user_names = [m["name"] for m in members]
# ['alice', 'bob', 'charlie']

**Why this pattern**

Super common for flattening a known array out of a JSON payload.

Plays nicely with further filtering (if m["role"] == "DE" inside the comprehension).

**Time & Space Complexity**

Let n = number of members.

Time: One pass over members, O(1) work per member → O(n).

Space: New list of size n (plus a small members reference) → O(n).
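The filtering mentioned above can be sketched as follows. The third member record is a made-up malformed entry, added only to show how a missing "name" can be skipped rather than raising KeyError:

```python
data = {
    "team": {
        "members": [
            {"name": "alice", "role": "DE"},
            {"name": "bob", "role": "SRE"},
            {"role": "DE"},  # hypothetical malformed record with no "name"
        ]
    }
}

members = data.get("team", {}).get("members", [])

# Keep only data engineers; skip records without a name instead of raising KeyError.
de_names = [
    m["name"]
    for m in members
    if m.get("role") == "DE" and "name" in m
]
print(de_names)  # ['alice']
```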

**Pattern 4** — Extract All Keys Recursively (Schema / Drift Detection)

**Problem:** Need to see all keys in a complex JSON (maps of maps, nested dicts) to detect schema drift.

**Task:** Return a set of every key at every level.

In [0]:
def all_keys(d):
    """
    Idea (DFS Over Nested Dicts)
    Perform a depth-first search:
    - For each key k in current dict, add k.
    - If value is another dict, recurse and union keys.
    """
    keys = set()
    for k, v in d.items():
        keys.add(k)
        if isinstance(v, dict):
            keys |= all_keys(v)  # union with nested keys
    return keys


data = {
    "user": {
        "id": 1,
        "profile": {
            "name": "alice",
            "email": "a@example.com"
        }
    },
    "meta": {
        "request_id": "abc-123"
    }
}

schema_keys = all_keys(data)
# {'user', 'id', 'profile', 'name', 'email', 'meta', 'request_id'}  (set order may vary)

**Why this pattern**

Great for comparing JSON schema between “yesterday vs today” to detect drift.

Works for arbitrarily deep nesting without hardcoding paths.

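**Time & Space Complexity**

Let N = total number of key–value pairs across all nested dicts, and D = maximum nesting depth.

Time: Each dict entry is visited exactly once, but keys |= all_keys(v) copies the nested keys again at every level on the way up, so the worst case is O(N·D). For typical shallow JSON (small D) this is effectively O(N).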

Space:

Output set stores up to N keys → O(N).

Recursion stack depth = nesting depth D (usually small) → O(D) extra.

Overall dominated by keys set → O(N).
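all_keys only follows dict values, so keys inside arrays of objects are not collected. A sketch of a variant that also walks lists and adds to one shared set instead of building and unioning a new set per level; all_keys_deep is a hypothetical name and the sample data is made up:

```python
def all_keys_deep(obj, keys=None):
    """Collect every dict key found anywhere in nested dicts and lists.

    `all_keys_deep` is a hypothetical variant of all_keys: it also walks
    lists, and it adds to one shared set instead of unioning per level.
    """
    if keys is None:
        keys = set()
    if isinstance(obj, dict):
        for k, v in obj.items():
            keys.add(k)
            all_keys_deep(v, keys)
    elif isinstance(obj, list):
        for item in obj:
            all_keys_deep(item, keys)
    return keys


data = {
    "user": {"id": 1, "profile": {"name": "alice"}},
    "orders": [{"sku": "A"}, {"sku": "B", "gift": True}],
}
print(sorted(all_keys_deep(data)))
# ['gift', 'id', 'name', 'orders', 'profile', 'sku', 'user']
```

Because every key is added to the shared set exactly once, the work stays proportional to the number of entries regardless of nesting depth.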

**Pattern 5** — Extract All Values Recursively (Scan for PII / Anomalies)

**Problem:** You have a big JSON event and want to scan all values for PII patterns (emails, SSNs, etc.) or anomalies.

**Task:** Flatten all values (including nested dicts/lists) into a list or generator so you can inspect them.

In [0]:
def all_values(obj):
    """
    Idea (Recursive Walk Over Dicts and Lists)
    - If obj is a dict: recurse on each value.
    - If obj is a list/tuple: recurse on each element.
    - Otherwise: it's a leaf value → yield it.
    """
    if isinstance(obj, dict):
        for v in obj.values():
            yield from all_values(v)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from all_values(item)
    else:
        yield obj


data = {
    "user": {
        "name": "alice",
        "email": "alice@example.com",
        "phones": ["555-1234", "555-5678"],
    },
    "transactions": [
        {"id": 1, "amount": 100.0},
        {"id": 2, "amount": 250.5},
    ]
}

values = list(all_values(data))
# ['alice', 'alice@example.com', '555-1234', '555-5678', 1, 100.0, 2, 250.5]

**Why this pattern**

Lets you run regex checks or anomaly detection on every value without caring about its position.

Works for mixed structures (dicts of lists of dicts, etc.).

**Time & Space Complexity**

Let M = total number of “nodes” (dict entries + list items + leaf values).

Time: Each node is processed once → O(M).

Space:

If you materialize all values into a list, that list size is number of leaf values L → O(L).

Recursion stack proportional to nesting depth D → O(D) extra.

Overall output-dominated → O(L) additional space when the values are materialized. If you consume the generator lazily instead, only the O(D) stack of nested generators remains.
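As an illustration of the PII use case, the sketch below feeds the recursive walk into a regex check. The email pattern is deliberately simple and is an assumption for this example, not a production-grade detector; find_emails and the sample event are hypothetical:

```python
import re

# Deliberately simple email pattern for illustration; real PII detection needs stricter rules.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def all_values(obj):
    """Same recursive walk as above, repeated so this snippet runs on its own."""
    if isinstance(obj, dict):
        for v in obj.values():
            yield from all_values(v)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from all_values(item)
    else:
        yield obj


def find_emails(obj):
    """Yield every email-like substring found in any string leaf value."""
    for value in all_values(obj):
        if isinstance(value, str):
            yield from EMAIL_RE.findall(value)


event = {
    "user": {"name": "alice", "email": "alice@example.com", "phones": ["555-1234"]},
    "notes": [{"text": "contact bob@example.org"}, {"text": "n/a"}],
}
print(list(find_emails(event)))  # ['alice@example.com', 'bob@example.org']
```

Because find_emails is itself a generator, the scan never holds more than one value at a time.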

In [0]:
## Flat Key-Value Flattening

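def flatten(d, parent='', sep='.'):
    """Flatten nested dicts into one dict keyed by dotted paths (lists are kept as values)."""
    out = {}
    for k, v in d.items():
        key = f"{parent}{sep}{k}" if parent else k
        if isinstance(v, dict):
            out.update(flatten(v, key, sep))  # recurse, carrying the path so far
        else:
            out[key] = v  # leaf (or list): store under the full path
    return out

flat = flatten({"user": {"id": 1, "name": "alice"}})
# {'user.id': 1, 'user.name': 'alice'}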

In [0]:
## Flatten JSON With Arrays

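def flatten_with_arrays(d, parent='', sep='.'):
    """Like flatten(), but list elements get indexed paths such as items[0].sku.

    Empty dicts and empty lists produce no entries.
    """
    out = {}
    if isinstance(d, dict):
        for k, v in d.items():
            key = f"{parent}{sep}{k}" if parent else k
            out.update(flatten_with_arrays(v, key, sep))
    elif isinstance(d, list):
        for i, v in enumerate(d):
            key = f"{parent}[{i}]"  # index suffix instead of a separator
            out.update(flatten_with_arrays(v, key, sep))
    else:
        out[parent] = d  # scalar leaf: store under the accumulated path
    return out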

In [0]:
## Convert JSON → Rows (Explode Arrays)

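## input
order = {
    "order_id": 1,
    "items": [
        {"sku": "A", "qty": 2},
        {"sku": "B", "qty": 1}
    ]
}

## Produce one row per item: (order_id, sku, qty)

def explode_items(order):
    for item in order['items']:
        yield {
            "order_id": order["order_id"],
            "sku": item["sku"],
            "qty": item["qty"]
        }

rows = list(explode_items(order))
# [{'order_id': 1, 'sku': 'A', 'qty': 2}, {'order_id': 1, 'sku': 'B', 'qty': 1}]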
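explode_items assumes every order has an "items" list and every item has "sku" and "qty". A sketch of a more tolerant variant, assuming that orders without items should simply produce no rows and that missing item fields should become None (explode_items_safe is a hypothetical name, and the sample orders are made up):

```python
def explode_items_safe(order):
    """Yield one row per item; tolerate a missing or null "items" list."""
    for item in order.get("items") or []:
        yield {
            "order_id": order.get("order_id"),
            "sku": item.get("sku"),
            "qty": item.get("qty"),
        }


orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B"}]},
    {"order_id": 2},                 # no "items" key
    {"order_id": 3, "items": None},  # explicit null
]

rows = [row for order in orders for row in explode_items_safe(order)]
for row in rows:
    print(row)
# {'order_id': 1, 'sku': 'A', 'qty': 2}
# {'order_id': 1, 'sku': 'B', 'qty': None}
```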


In [0]:
## Schema Inference From JSON

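def infer_schema(d):
    """Map each key to a type name; nested dicts recurse, lists are described by their first element only."""
    schema = {}
    for k, v in d.items():
        if isinstance(v, dict):
            schema[k] = infer_schema(v)
        elif isinstance(v, list):
            if v and isinstance(v[0], dict):
                schema[k] = [infer_schema(v[0])]  # list of objects: sample the first one
            else:
                schema[k] = [type(v[0]).__name__ if v else "empty"]  # list of scalars, or empty list
        else:
            schema[k] = type(v).__name__  # scalar leaf, e.g. "int", "str", "NoneType"
    return schema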

In [0]:
## JSON Schema Drift Detection

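def drift(old_schema, new_schema):
    # Compares dotted key paths only: a type change under an unchanged path is not reported.
    # flatten() does not descend into lists, so fields inside array elements are not compared.
    old = set(flatten(old_schema).keys())
    new = set(flatten(new_schema).keys())
    return {
        "added": new - old,
        "removed": old - new
    }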
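drift reports only added and removed paths. A sketch of how type changes could be reported as well, assuming both schemas have already been flattened into {path: type_name} dicts; type_changes is a hypothetical helper and the two schemas are made-up examples:

```python
def type_changes(old_flat, new_flat):
    """Return {path: (old_type, new_type)} for paths present in both schemas
    whose inferred type differs."""
    return {
        path: (old_flat[path], new_flat[path])
        for path in old_flat.keys() & new_flat.keys()
        if old_flat[path] != new_flat[path]
    }


# Hypothetical flattened schemas, shaped like flatten(infer_schema(record)).
old_flat = {"user.id": "int", "user.name": "str", "meta.source": "str"}
new_flat = {"user.id": "str", "user.name": "str", "meta.region": "str"}

print(type_changes(old_flat, new_flat))  # {'user.id': ('int', 'str')}
```

Paths that exist on only one side are ignored here, since the added/removed sets already cover them.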

In [0]:
## Detecting Missing Fields in JSON Records

# expected_schema: a set of dotted key paths, e.g. {"user.id", "user.name"}
missing_fields = expected_schema - set(flatten(json_record).keys())

In [0]:
## Detecting Extra Unexpected Fields

unexpected = set(flatten(json_record)) - expected_schema

In [0]:
## Safe JSON Processing With Try/Except

import json

def safe_load(s):
    """Parse a JSON string; return None instead of raising on malformed input."""
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None
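Note that safe_load returns None both for malformed input and for the valid JSON document null, so the caller cannot tell the two apart. A sketch of one way to keep failures separate when parsing many lines, for example a JSON Lines log; parse_json_lines is a hypothetical name and the input strings are made up:

```python
import json


def parse_json_lines(lines):
    """Split an iterable of JSON strings into (parsed_records, bad_lines).

    bad_lines holds (line_number, raw_text) pairs so failures can be inspected.
    """
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append((lineno, line))
    return records, bad


raw = ['{"id": 1}', '{"id": 2', 'null', '{"id": 3}']
records, bad = parse_json_lines(raw)
print(records)  # [{'id': 1}, None, {'id': 3}]
print(bad)      # [(2, '{"id": 2')]
```

Here the literal null is kept as a parsed record, while the truncated second line is reported with its line number.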

In [0]:
## Problem 1 — Flatten JSON

# Uses flatten() from the "Flat Key-Value Flattening" cell above;
# flatten_with_arrays() from the "Flatten JSON With Arrays" cell also handles lists.
flat = flatten({"user": {"id": 1, "name": "alice"}})
# {'user.id': 1, 'user.name': 'alice'}


In [0]:
## Problem 2 — Extract all user_id from nested JSON logs

def extract_users(records):
    out = []
    for r in records:
        out.append(r["event"]["user"]["id"])
    return out
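extract_users raises KeyError as soon as one log record lacks event, user, or id. A sketch of a tolerant variant built on the safe-get pattern, assuming incomplete records should be skipped rather than reported; extract_user_ids is a hypothetical name and the log records are made up:

```python
def extract_user_ids(records):
    """Collect event.user.id from each record, skipping records where it is missing."""
    out = []
    for r in records:
        # `or {}` covers both a missing key and an explicit null at each level.
        user = (r.get("event") or {}).get("user") or {}
        user_id = user.get("id")
        if user_id is not None:
            out.append(user_id)
    return out


logs = [
    {"event": {"user": {"id": 7}}},
    {"event": {"user": None}},   # null user
    {"event": {}},               # no user
    {"other": 1},                # no event
    {"event": {"user": {"id": 9}}},
]
print(extract_user_ids(logs))  # [7, 9]
```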

In [0]:
## Problem 3 — Detect schema changes across two JSON datasets

def detect_changes(old, new):
    return drift(infer_schema(old), infer_schema(new))

In [0]:
## Problem 4 — Convert JSON array → table rows

# o is a single order dict, e.g. {"id": 1, "items": [{"sku": "A", "qty": 2}]}
rows = [{"order_id": o["id"], **item} for item in o["items"]]

In [0]:
## Problem 5 — Validate JSON fields against expected schema

# expected: a set of required top-level field names; nested fields need flatten() first
missing = expected - set(json_record.keys())

In [0]:
## Problem 6 — Find all nested dict keys

keys = all_keys(data)

**Summary** 

JSON in Python = nested dicts + lists

Learn recursive patterns for flattening, extracting, navigating

Use get() for safe access

Schema inference + drift detection are must-know

Parsing patterns: flatten, explode arrays, recursive traversal, type inference

Avoid repeated json.load()/json.loads() calls on the same payload; parse once and reuse the result

Handle malformed input and missing or null values explicitly

JSON processing is core to event ingestion, logs, CDC, and lakehouse ETL