**⭐ 1. What This Pattern Solves**

Keep all rows from the left table and add matching right-side columns when available (useful for enriching a main dataset with optional reference data).

**⭐ 2. SQL Equivalent**

In [0]:
%sql

SELECT *
FROM orders o
LEFT JOIN customers c
  ON o.customer_id = c.id;

**⭐ 3. Core Idea**

Use how="left" to preserve every left row; where no right row matches, the right-side columns are NULL. This fits cases where the left table is the source of truth (for example, events) and the right table holds optional metadata.

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
joined = df_left.join(df_right, on=join_keys, how="left")

**⭐ 5. Detailed Example**

In [0]:
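# Order 3 references customer 3, which is missing from customers.
orders = spark.createDataFrame([(1,1,100),(2,2,50),(3,3,75)], ['order_id','customer_id','amount'])
customers = spark.createDataFrame([(1,'Alice'),(2,'Bob')], ['id','name'])

# Join on the customer key (as in the SQL above), not on order_id.
res = orders.join(customers, orders.customer_id == customers.id, how='left') \
            .select('order_id','amount','name')
res.show()  # order 3 is kept, with name = NULL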

**⭐ 6. Mini Practice Problems**

Left join logins to users to keep every login even for deleted users.

Enrich sales with promo table but keep sales without promo (NULL promo_id).

Left join events to geo_lookup and fill missing country with 'UNKNOWN'.

**⭐ 7. Full Data Engineering Problem**

Merge nightly orders_delta into orders_enriched by left joining with product_catalog to add product categories, then fill missing categories and write partitioned by order_date.

**⭐ 8. Time & Space Complexity**

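Same distributed cost as an inner join: typically a shuffle of both sides on the join key, or a broadcast of the right side when it is small. The extra work of emitting NULLs for unmatched left rows is negligible.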

**⭐ 9. Common Pitfalls**

Assuming a matching right row always exists. Unmatched left rows carry NULLs in the right-side columns, so handle them explicitly (coalesce, when).

Passing NULLs to downstream steps that expect non-null values, which causes bugs. Filter or fill them before handing the data on.

Joining on non-unique right keys, which causes row explosion (1-to-many). Verify that the right key is unique before joining.