**⭐ 1. What This Pattern Solves**

Enrich a primary dataset with optional reference data

Preserve all rows from the left dataset even when no match exists

**Common in:**

Dimension lookups in ETL

Fact-to-dimension enrichment

Handling missing or late reference data

**⭐ 2. SQL Equivalent**

In [0]:
%sql
SELECT l.*, r.value
FROM left_table l
LEFT JOIN right_table r
  ON l.key = r.key;

**⭐ 3. Core Idea**

Build a lookup from the right dataset

Iterate over the left dataset

Attach the matching value, or None when the key has no match

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
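# Build the lookup from the right dataset (key -> value)
lookup = {k: v for k, v in right}

# Scan the left dataset; .get() returns None when there is no match
result = []
for k, v in left:
    result.append((k, v, lookup.get(k)))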
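
The template assumes each key appears at most once in the right dataset. A dict comprehension silently keeps only the last value for a repeated key, whereas a SQL LEFT JOIN emits one row per match. The following is a minimal sketch of a variant that mirrors the SQL behavior; the function name `left_join_multi` and the sample data are hypothetical, not part of the original template.

```python
from collections import defaultdict

def left_join_multi(left, right):
    """Left join that emits one output row per matching right row."""
    # Group right-side values by key so duplicates are not overwritten.
    lookup = defaultdict(list)
    for k, v in right:
        lookup[k].append(v)

    result = []
    for k, v in left:
        matches = lookup.get(k)  # .get() does not create an entry for missing keys
        if matches:
            for match in matches:
                result.append((k, v, match))
        else:
            # No match: keep the left row and pad with None.
            result.append((k, v, None))
    return result

# Hypothetical sample data: key 1 appears twice on the right.
left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y")]
rows = left_join_multi(left, right)
```

If the right dataset is guaranteed to have unique keys, the simpler template is enough and avoids the per-key list.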

**⭐ 5. Detailed Example**

In [0]:
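# Each order is (customer_id, amount)
orders = [
    (1, 100),
    (2, 200),
    (3, 300)   # no customer record exists for id 3
]

# Each customer is (customer_id, name)
customers = [
    (1, "Alice"),
    (2, "Bob")
]

# Build the lookup from the right dataset
cust_map = {cid: name for cid, name in customers}

joined = []
for customer_id, amount in orders:
    joined.append((customer_id, amount, cust_map.get(customer_id)))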

[
    (1, 100, 'Alice'),
    (2, 200, 'Bob'),
    (3, 300, None)
]



**⭐ 6. Mini Practice Problems**

Left join web events with user profiles (some users missing)

Enrich transactions with FX rates (missing dates allowed)

Join IoT readings with device metadata (new devices appear first)
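
As a starting point for the FX-rate problem, here is a minimal sketch that assumes the join key is the pair (date, currency). The field layout, the sample values, and the choice to leave the converted amount as None when no rate exists are all illustrative assumptions.

```python
# Hypothetical sample data:
#   transactions: (date, currency, amount)
#   fx_rates:     (date, currency, rate_to_usd)
transactions = [
    ("2024-01-01", "EUR", 100.0),
    ("2024-01-02", "EUR", 50.0),
    ("2024-01-02", "GBP", 10.0),
]
fx_rates = [
    ("2024-01-01", "EUR", 1.10),
    ("2024-01-02", "GBP", 1.25),
]

# Composite key: a tuple of (date, currency).
rate_lookup = {(d, ccy): rate for d, ccy, rate in fx_rates}

enriched = []
for d, ccy, amount in transactions:
    rate = rate_lookup.get((d, ccy))
    # Leave the converted amount as None when no rate exists for that date.
    usd = amount * rate if rate is not None else None
    enriched.append((d, ccy, amount, rate, usd))
```

Tuples are hashable, so they work directly as dict keys. The same approach applies to any multi-column join condition.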

**⭐ 7. Full Data Engineering Scenario**

**Problem**
You have daily transactions and a customer dimension table.
Some transactions arrive before the customer record exists.

**Expected Output**
All transactions retained, customer fields nullable.

In [0]:
cust_lookup = {...}   # customer_id -> attributes

for txn in transactions:
    customer = cust_lookup.get(txn.customer_id)
    emit_enriched_record(txn, customer)
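
The cell above is a skeleton. A runnable sketch under stated assumptions might look like the following: the `Transaction` dataclass, the customer attributes, and the `enrich` function (a stand-in for `emit_enriched_record` that returns the record instead of emitting it) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transaction:
    txn_id: int
    customer_id: int
    amount: float

# Hypothetical dimension data: customer_id -> attributes.
cust_lookup = {
    10: {"name": "Alice", "segment": "retail"},
}

def enrich(txn: Transaction, customer: Optional[dict]) -> dict:
    """Stand-in for emit_enriched_record: returns the enriched record."""
    customer = customer or {}
    return {
        "txn_id": txn.txn_id,
        "customer_id": txn.customer_id,
        "amount": txn.amount,
        # Customer fields stay None when the dimension row has not arrived yet.
        "customer_name": customer.get("name"),
        "customer_segment": customer.get("segment"),
    }

# Customer 99 has no dimension record yet.
transactions = [Transaction(1, 10, 25.0), Transaction(2, 99, 40.0)]
enriched = [enrich(t, cust_lookup.get(t.customer_id)) for t in transactions]
```

Every transaction produces exactly one output record with the same set of fields, so downstream consumers see a stable schema whether or not the customer was found.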

**⭐ 8. Time & Space Complexity**

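Time: O(n + m) on average, where n is the number of left rows and m is the number of right rows

Build lookup: O(m)

Scan left dataset: O(n), with an average O(1) hash lookup per row

Space: O(m) auxiliary

Hash map for right dataset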
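
To make the difference concrete, the sketch below counts work for a nested-loop join versus the hash-based join on the same input. The function names and the synthetic data are hypothetical; the data is chosen so that no left key matches, which is the worst case for the nested loop.

```python
def nested_loop_join(left, right):
    """O(n * m) left join; returns rows and the number of key comparisons."""
    comparisons = 0
    result = []
    for k, v in left:
        match = None
        for rk, rv in right:
            comparisons += 1
            if rk == k:
                match = rv
                break
        result.append((k, v, match))
    return result, comparisons

def hash_join(left, right):
    """O(n + m) left join; returns rows and the number of dict operations."""
    lookup = {k: v for k, v in right}                   # m inserts
    result = [(k, v, lookup.get(k)) for k, v in left]   # n lookups
    return result, len(right) + len(left)

# Hypothetical data: unique right keys, and no left key has a match.
left = [(i, i * 10) for i in range(100)]
right = [(i, str(i)) for i in range(1000, 1100)]

nested_rows, nested_ops = nested_loop_join(left, right)
hash_rows, hash_ops = hash_join(left, right)
```

With 100 rows on each side, the nested loop performs 10,000 comparisons while the hash join performs 200 dict operations, and both produce the same rows.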

**⭐ 9. Common Pitfalls & Mistakes**

❌ Iterating both lists with nested loops, which costs O(n × m)
✔ Always hash the right dataset first

❌ Assuming matches always exist
✔ Use .get() and handle None explicitly

❌ Overwriting left-side fields with right-side fields that share a name
✔ Preserve left-side schema as the source of truth
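
The last two pitfalls can be illustrated with dict-shaped records. This is a minimal sketch, assuming right-side fields are attached under a prefix; the function `merge_preserving_left`, the `r_` prefix, and the sample records are hypothetical.

```python
def merge_preserving_left(left_row, right_row, right_fields, prefix="r_"):
    """Attach right-side fields under a prefix so left fields are never overwritten."""
    merged = dict(left_row)
    for field in right_fields:
        # Handle the no-match case explicitly: every right field becomes None.
        merged[prefix + field] = None if right_row is None else right_row.get(field)
    return merged

# Hypothetical records: both sides carry a "status" field.
left_row = {"id": 1, "status": "shipped", "customer_id": 7}
right_row = {"status": "active", "name": "Alice"}
right_fields = ["status", "name"]

merged = merge_preserving_left(left_row, right_row, right_fields)
unmatched = merge_preserving_left(left_row, None, right_fields)
```

Passing the expected right-side field names explicitly keeps the output schema identical for matched and unmatched rows.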