**⭐ 1. What This Pattern Solves**

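The Upsert pattern handles incremental updates by inserting new records and updating existing ones in a single atomic operation.

In Delta Lake an upsert is expressed with MERGE. It is the simplest form of MERGE (update on match, insert otherwise), which makes it a good fit for streaming ingestion and small batch updates.

It keeps your tables in sync with source changes without a full overwrite.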

Used for:

Syncing a Silver/Gold table with new batch data

Avoiding duplicates when data arrives out-of-order

Maintaining incremental ETL pipelines

**⭐ 2. SQL Equivalent**

In [0]:
%sql
MERGE INTO target_table AS t
USING updates_table AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;


**⭐ 3. Core Idea**

Row already exists in the target → update

Row is new → insert

Only the incoming changes are applied, so the pipeline avoids a full reload

Reusability: The same merge call works for any Delta table that has a reliable key, whether the changes arrive as incremental streams or daily batches.
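The update-or-insert rule can be modeled in plain Python before involving Spark. The sketch below only illustrates the semantics; it is not how Delta Lake executes a merge. The function name `upsert` and the list-of-dicts table representation are assumptions made for this illustration.

```python
def upsert(target, updates, key="id"):
    """Return a new list of rows: matched keys updated, unmatched keys inserted."""
    # Index the target rows by key so matches can be found directly.
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched key: the source row replaces the target row (like UPDATE SET *).
        # Unmatched key: the source row is added (like INSERT *).
        merged[row[key]] = dict(row)
    # Sort by key only to make the result deterministic for display.
    return sorted(merged.values(), key=lambda r: r[key])


target = [{"id": "A", "amount": 100}, {"id": "B", "amount": 50}]
updates = [{"id": "A", "amount": 200}, {"id": "C", "amount": 300}]
result = upsert(target, updates)
print(result)
```

One difference worth noting: if `updates` contained the same key twice, this model would silently keep the last row, whereas a Delta merge fails when several source rows try to update the same target row.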

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
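from delta.tables import DeltaTable

# Load the target Delta table by path
delta_table = DeltaTable.forPath(spark, "/delta/target")

# source_updates is a DataFrame of incoming rows (defined elsewhere),
# assumed to have the same columns as the target
(
    delta_table.alias("target")
    .merge(source_updates.alias("source"), "target.id = source.id")  # match on the key
    .whenMatchedUpdateAll()      # key exists in target: overwrite all columns
    .whenNotMatchedInsertAll()   # key is new: insert the row
    .execute()
)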

**⭐ 5. Detailed Example**

In [0]:
from delta.tables import DeltaTable

# Target table: two existing rows, written as a Delta table
data = [("A", 100), ("B", 50)]
df_target = spark.createDataFrame(data, ["id", "amount"])
df_target.write.format("delta").mode("overwrite").save("/delta/target")

# Updates: "A" already exists in the target, "C" is new
updates = [("A", 200), ("C", 300)]
source_updates = spark.createDataFrame(updates, ["id", "amount"])

# Upsert: update rows whose id matches, insert the rest
delta_table = DeltaTable.forPath(spark, "/delta/target")
(
    delta_table.alias("target")
    .merge(source_updates.alias("source"), "target.id = source.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Read the table back; order by id so the rows display in a stable order
spark.read.format("delta").load("/delta/target").orderBy("id").show()

**Step-by-step:**

"A" exists in the target → amount updated from 100 to 200

"C" does not exist in the target → inserted with amount 300

"B" is not in the updates → remains unchanged at 50

**⭐ 6. Mini Practice Problems**

Upsert daily stock prices into a Delta table.

Sync user activity logs with incremental updates.

Merge a streaming IoT dataset into the Bronze table using Upsert logic.
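For the streaming problem, one common approach is to run the merge once per micro-batch through `foreachBatch`. The following is a minimal sketch under stated assumptions, not a complete solution: the function name `upsert_micro_batch`, the key column `id`, and the reuse of the `/delta/target` path are illustrative choices, and `DataFrame.sparkSession` assumes a reasonably recent Spark version.

```python
TARGET_PATH = "/delta/target"  # assumption: same path as the earlier examples


def upsert_micro_batch(micro_batch_df, batch_id):
    """Merge one streaming micro-batch into the target Delta table."""
    # Imported inside the function so this module loads even without Delta installed.
    from delta.tables import DeltaTable

    spark = micro_batch_df.sparkSession
    # A merge fails if several source rows match the same target row, so keep
    # one row per key. dropDuplicates keeps an arbitrary row for each key.
    deduped = micro_batch_df.dropDuplicates(["id"])
    (
        DeltaTable.forPath(spark, TARGET_PATH).alias("target")
        .merge(deduped.alias("source"), "target.id = source.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )


# Hypothetical wiring, where stream_df is a streaming DataFrame you define
# and the checkpoint path is a placeholder:
# (stream_df.writeStream
#     .foreachBatch(upsert_micro_batch)
#     .option("checkpointLocation", "/delta/_checkpoints/target")
#     .start())
```

If the order of events within a batch matters, replace `dropDuplicates` with logic that keeps the latest row per key.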

**⭐ 7. Full Data Engineering Problem**

Scenario: Retail company ingests nightly product inventory updates:

Some products have changed stock (update)

New products are added (insert)

Implement Upsert into Silver Delta table to maintain accurate inventory for downstream analytics.

**⭐ 8. Time & Space Complexity**

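Time: Roughly O(n + m) → the merge joins the source (m rows) against the target (n rows); file skipping on the merge key can reduce how much of the target is actually read

Space: Delta typically rewrites every data file that contains a matched row, not just the matched rows themselves; superseded files stay on disk until VACUUM removes them

Scales: Well for medium-size tables; large tables benefit from partitioning and from a partition filter in the merge condition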
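Partitioning helps a merge most when the merge condition itself names the partitions that can contain matches, so files in other partitions can be skipped. The helper below is a small sketch of how such a condition string could be assembled; `build_merge_condition` and the partition column `event_date` are hypothetical names, and the naive quoting assumes trusted, string-valued partition values.

```python
def build_merge_condition(key_columns, partition_column=None, partition_values=None):
    """Build a merge condition string, optionally restricted to given partitions."""
    # Equality on every key column between target and source.
    clauses = [f"target.{col} = source.{col}" for col in key_columns]
    if partition_column and partition_values:
        # Restrict the target side to the partitions present in the incoming batch.
        quoted = ", ".join(f"'{value}'" for value in sorted(partition_values))
        clauses.append(f"target.{partition_column} IN ({quoted})")
    return " AND ".join(clauses)


condition = build_merge_condition(["id"], "event_date", {"2024-01-02", "2024-01-01"})
print(condition)
# target.id = source.id AND target.event_date IN ('2024-01-01', '2024-01-02')
```

The resulting string can be passed as the second argument of `merge(...)` in place of `"target.id = source.id"`.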

**⭐ 9. Common Pitfalls**

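Using a merge condition that does not uniquely identify a row, or a source batch with duplicate keys → duplicate rows, or a failed merge when several source rows match the same target row

Using Upsert on very large tables without partitions → slow, because much of the table must be scanned and rewritten

Not handling schema evolution → the merge fails or new source columns are ignored; enable `spark.databricks.delta.schema.autoMerge.enabled` if the schema is expected to change

Applying a plain Upsert to SCD2 tables → matched rows are overwritten in place, which destroys the history the table is meant to keep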
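The duplicate-key pitfall is usually avoided by reducing the source batch to one row per key before merging. Here is a plain-Python sketch of that "latest row wins" rule; the function name `latest_per_key` and the `updated_at` column are assumptions for illustration. In Spark the same idea is commonly expressed with a window function that ranks rows per key by timestamp.

```python
def latest_per_key(rows, key="id", order_by="updated_at"):
    """Keep only the most recent row for each key."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Keep this row if the key is new or the row is newer than the one held.
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    # Sort by key only to make the result deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


# The two "A" rows arrive out of order; ISO dates compare correctly as strings.
batch = [
    {"id": "A", "amount": 200, "updated_at": "2024-01-02"},
    {"id": "A", "amount": 150, "updated_at": "2024-01-01"},
    {"id": "C", "amount": 300, "updated_at": "2024-01-01"},
]
deduped = latest_per_key(batch)
print(deduped)
```

After this step each key appears once, so at most one source row can match any target row.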