# Delta Lake Merge (Upsert) & Soft Deletes

**Objective:** Master the `MERGE` command in Delta Lake. `MERGE` is the primary way to perform upserts (update a row if it exists, insert it if it is new), a common requirement in ETL pipelines that is often called SCD Type 1. We will also use `MERGE` to implement soft deletes.

---

## 1. Setup Data
We need two tables:
1.  **Target Table (`emp`):** The main table we want to update.
2.  **Source Table (`emp_updates`):** The table containing new changes (updates and inserts).

In [None]:
USE CATALOG dev;
USE SCHEMA bronze;

-- 1. Create Target Table with some initial data
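CREATE OR REPLACE TABLE emp (
    emp_id INT,
    emp_name STRING,
    salary DOUBLE,
    is_active STRING DEFAULT 'Y' -- Column for soft delete logic
)
-- Delta needs this table feature enabled before it accepts a column DEFAULT
TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');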

INSERT INTO emp VALUES 
(1001, 'John', 5000, 'Y'),
(1002, 'Jane', 6000, 'Y');

-- 2. Create Source Table with updates
-- 1001: Salary increased (Update)
-- 1003: New employee (Insert)
CREATE OR REPLACE TABLE emp_updates (
    emp_id INT,
    emp_name STRING,
    salary DOUBLE
);

INSERT INTO emp_updates VALUES 
(1001, 'John', 5500), -- Update
(1003, 'Mike', 4000); -- Insert

-- View data
SELECT * FROM emp;
SELECT * FROM emp_updates;

## 2. Standard Upsert (SCD Type 1)
The standard merge logic:
*   **WHEN MATCHED:** Update the existing record.
*   **WHEN NOT MATCHED:** Insert the new record.
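
Before running the merge, it can help to confirm that the source has at most one row per key, because Delta Lake rejects a `MERGE` in which several source rows match the same target row. A minimal check, assuming `emp_id` is the only matching key:

In [None]:
-- Sanity check (illustrative): list any emp_id that appears more than once in the source.
-- With the sample data above this returns no rows.
SELECT emp_id, COUNT(*) AS row_count
FROM emp_updates
GROUP BY emp_id
HAVING COUNT(*) > 1;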

In [None]:
MERGE INTO emp AS target
USING emp_updates AS source
ON target.emp_id = source.emp_id

-- Update existing records
WHEN MATCHED THEN
  UPDATE SET 
    target.salary = source.salary,
    target.emp_name = source.emp_name

-- Insert new records
WHEN NOT MATCHED THEN
  INSERT (emp_id, emp_name, salary, is_active)
  VALUES (source.emp_id, source.emp_name, source.salary, 'Y');

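-- Verify Result
-- Expected with the sample data:
--   1001 | John | 5500 | Y   (updated)
--   1002 | Jane | 6000 | Y   (unchanged)
--   1003 | Mike | 4000 | Y   (inserted)
SELECT * FROM emp ORDER BY emp_id;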

## 3. Merge with Soft Delete
Sometimes we want to mark records as "deleted" (inactive) in the target when they are **missing** from the source, or when the source flags them for deletion, rather than physically deleting them.

**Scenario:** A record exists in `emp` but NOT in `emp_updates`, and we want to mark it inactive instead of removing it.
*Note: The `WHEN NOT MATCHED BY SOURCE` clause handles records that are present in the Target but missing from the Source.*

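For demonstration, we soft delete every record in the Target that is NOT in the Source update batch. In practice, do this only when the source is a full snapshot; with an incremental batch, every employee who has no change in that batch would be marked inactive.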
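
Because this clause touches every target row that has no match, a quick preview of the affected rows before merging is a cheap safeguard. A sketch using an anti join on the same key (the aliases `t` and `s` are just for illustration):

In [None]:
-- Preview: target rows with no matching emp_id in the source batch.
-- These are the rows the soft delete would mark inactive.
-- After the upsert in section 2, only 1002 (Jane) is returned for the sample data.
SELECT t.emp_id, t.emp_name, t.is_active
FROM emp AS t
LEFT ANTI JOIN emp_updates AS s
  ON t.emp_id = s.emp_id
ORDER BY t.emp_id;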

In [None]:
-- Let's say our source batch only contains active employees.
-- Anyone NOT in this batch should be marked inactive.

MERGE INTO emp AS target
USING emp_updates AS source
ON target.emp_id = source.emp_id

-- Standard Upsert logic
WHEN MATCHED THEN
  UPDATE SET
    target.salary = source.salary,
    target.emp_name = source.emp_name,
    target.is_active = 'Y' -- Reactivate if previously soft deleted
WHEN NOT MATCHED THEN
  INSERT (emp_id, emp_name, salary, is_active)
  VALUES (source.emp_id, source.emp_name, source.salary, 'Y')

-- Soft Delete Logic
WHEN NOT MATCHED BY SOURCE THEN
  UPDATE SET target.is_active = 'N';

-- Verify Result
-- Note: 'Jane' (1002) is in Target but not in Source, so she becomes inactive ('N').
SELECT * FROM emp ORDER BY emp_id;

## 4. Summary of Merge Clauses

| Clause | Trigger Condition | Common Action |
| :--- | :--- | :--- |
| `ON <condition>` | Defines the matching key (e.g., ID). | N/A |
| `WHEN MATCHED` | Key exists in both Source and Target. | `UPDATE SET ...` |
| `WHEN NOT MATCHED` | Key exists in Source but NOT in Target. | `INSERT (...) VALUES (...)` |
| `WHEN NOT MATCHED BY SOURCE` | Key exists in Target but NOT in Source. | `DELETE` or `UPDATE SET is_active = 'N'` |
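
Each `WHEN` clause also accepts an optional `AND <condition>`, which is one way to drive soft deletes from a flag in the source rather than from missing rows. The sketch below is illustrative only: the `emp_changes` table and its `is_deleted` column are hypothetical and not part of the setup above. It has no `WHEN NOT MATCHED BY SOURCE` clause, so target rows that are absent from `emp_changes` are left untouched.

In [None]:
-- Hypothetical change feed with an explicit delete flag (assumed schema)
CREATE OR REPLACE TABLE emp_changes (
    emp_id INT,
    emp_name STRING,
    salary DOUBLE,
    is_deleted STRING
);

INSERT INTO emp_changes VALUES
(1001, 'John', 5800, 'N'), -- Regular update
(1003, 'Mike', 4000, 'Y'); -- Flagged for soft delete

MERGE INTO emp AS target
USING emp_changes AS source
ON target.emp_id = source.emp_id

-- Flagged rows: mark inactive instead of updating
WHEN MATCHED AND source.is_deleted = 'Y' THEN
  UPDATE SET target.is_active = 'N'

-- Remaining matches: regular update
WHEN MATCHED THEN
  UPDATE SET
    target.salary = source.salary,
    target.emp_name = source.emp_name,
    target.is_active = 'Y'

-- New rows, unless they arrive already flagged as deleted
WHEN NOT MATCHED AND source.is_deleted = 'N' THEN
  INSERT (emp_id, emp_name, salary, is_active)
  VALUES (source.emp_id, source.emp_name, source.salary, 'Y');

SELECT * FROM emp ORDER BY emp_id;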

## Next Steps
In the next video, we will cover more advanced Delta features such as **Delta Live Tables (DLT)** or **optimization commands** (`OPTIMIZE`, Z-Order, `VACUUM`).