In [0]:
# 2.4 DEMO: Implementing Change Data Feed (CDF) with Delta Sharing

## Provider Workspace - Tracking Data Changes

**Learning Objectives:**
- Enable Change Data Feed (CDF) on Delta tables
- Track INSERT, UPDATE, and DELETE operations
- Share CDF-enabled tables with recipients
- Understand the benefits of CDF for incremental data processing

**Scenario:**
You are managing a customer database that is shared with analytics partners. Instead of sending full snapshots every time data changes, you want to share only the changes (deltas) so partners can efficiently update their systems.

**What is Change Data Feed (CDF)?**
- CDF records row-level changes (inserts, updates, deletes) for every commit made to a Delta table after it is enabled
- Recipients can query `table_changes()` to see what changed between versions
- Enables efficient incremental processing and near-real-time data synchronization
- Reduces data transfer and processing costs

**Use Cases:**
- Near-real-time data synchronization
- Audit trails and compliance
- Incremental ETL pipelines
- Event-driven architectures

In [None]:
%run ../../setup/00-provider-setup

## Step 1: Create Catalog and Schema

First, ensure we have the catalog and schema set up.

In [None]:
-- Create catalog and schema
CREATE CATALOG IF NOT EXISTS ${c.catalog_name};
USE CATALOG ${c.catalog_name};

CREATE SCHEMA IF NOT EXISTS ${c.schema_name};
USE SCHEMA ${c.schema_name};

## Step 2: Create a Table with CDF Enabled

**Key Point:** We enable Change Data Feed by setting the `delta.enableChangeDataFeed` property in `TBLPROPERTIES` when creating the table. From that point on, Delta Lake records row-level changes for every commit.
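CDF can also be turned on for a table that already exists. A minimal sketch, assuming a hypothetical existing Delta table named `customer_accounts_legacy` in the same schema (it is not part of this demo); only changes committed after the property is set are recorded:

```sql
-- Hypothetical table name; substitute an existing Delta table
ALTER TABLE ${c.catalog_name}.${c.schema_name}.customer_accounts_legacy
SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true');
```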

In [None]:
-- Create a customer table with Change Data Feed enabled
CREATE OR REPLACE TABLE ${c.catalog_name}.${c.schema_name}.customer_accounts (
  customer_id INT,
  account_number STRING,
  customer_name STRING,
  email STRING,
  account_status STRING,
  account_balance DECIMAL(10,2),
  last_activity_date DATE,
  created_date DATE
)
TBLPROPERTIES (
  'delta.enableChangeDataFeed' = 'true'
)
COMMENT 'Customer accounts with CDF enabled for change tracking';

-- Insert initial data
INSERT INTO ${c.catalog_name}.${c.schema_name}.customer_accounts VALUES
  (1001, 'ACC-1001', 'Alice Johnson', 'alice@example.com', 'Active', 5000.00, '2024-10-15', '2024-01-15'),
  (1002, 'ACC-1002', 'Bob Smith', 'bob@example.com', 'Active', 3500.00, '2024-10-14', '2024-02-20'),
  (1003, 'ACC-1003', 'Carol White', 'carol@example.com', 'Active', 7200.00, '2024-10-13', '2024-03-10'),
  (1004, 'ACC-1004', 'David Brown', 'david@example.com', 'Active', 2100.00, '2024-10-12', '2024-04-05'),
  (1005, 'ACC-1005', 'Emma Davis', 'emma@example.com', 'Active', 8500.00, '2024-10-11', '2024-05-18');

-- View the initial data
SELECT * FROM ${c.catalog_name}.${c.schema_name}.customer_accounts ORDER BY customer_id;

## Step 3: Verify CDF is Enabled

Check that `delta.enableChangeDataFeed` is set to `true` in the table properties.
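`SHOW TBLPROPERTIES` also accepts a single property key, which avoids scanning the full property list. A small sketch against the table created in Step 2:

```sql
-- Return only the CDF property for the demo table
SHOW TBLPROPERTIES ${c.catalog_name}.${c.schema_name}.customer_accounts ('delta.enableChangeDataFeed');
```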

In [None]:
-- Show table properties to confirm CDF is enabled
SHOW TBLPROPERTIES ${c.catalog_name}.${c.schema_name}.customer_accounts;

## Step 4: Make Changes to the Table

Now let's perform INSERT, UPDATE, and DELETE operations to generate change data.

In [None]:
-- INSERT: Add new customers
INSERT INTO ${c.catalog_name}.${c.schema_name}.customer_accounts VALUES
  (1006, 'ACC-1006', 'Frank Miller', 'frank@example.com', 'Active', 4500.00, '2024-10-16', '2024-10-16'),
  (1007, 'ACC-1007', 'Grace Lee', 'grace@example.com', 'Active', 6200.00, '2024-10-16', '2024-10-16');

SELECT * FROM ${c.catalog_name}.${c.schema_name}.customer_accounts ORDER BY customer_id;

In [None]:
-- UPDATE: Update account balances and activity dates
UPDATE ${c.catalog_name}.${c.schema_name}.customer_accounts
SET 
  account_balance = account_balance + 1000.00,
  last_activity_date = '2024-10-16'
WHERE customer_id IN (1001, 1003);

SELECT * FROM ${c.catalog_name}.${c.schema_name}.customer_accounts ORDER BY customer_id;

In [None]:
-- DELETE: Remove a customer account (customer 1004)
DELETE FROM ${c.catalog_name}.${c.schema_name}.customer_accounts
WHERE customer_id = 1004;

SELECT * FROM ${c.catalog_name}.${c.schema_name}.customer_accounts ORDER BY customer_id;

## Step 5: View Change Data Feed

Query the change data to see what operations were performed.
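`table_changes()` takes a starting version and, optionally, an ending version, both inclusive. One way to find those versions is `DESCRIBE HISTORY`. The sketch below assumes a freshly created table, where versions 2 to 4 would likely correspond to the three operations from Step 4. If the table existed before this notebook ran, the numbers will differ, so the range shown is an assumption to be replaced with values from the history output.

```sql
-- List commits with their version numbers and operations
DESCRIBE HISTORY ${c.catalog_name}.${c.schema_name}.customer_accounts;

-- Changes in an inclusive version range
-- (2 and 4 are assumed values; take the real ones from the history output)
SELECT _change_type, _commit_version, customer_id, account_balance
FROM table_changes('${c.catalog_name}.${c.schema_name}.customer_accounts', 2, 4)
ORDER BY _commit_version, customer_id;
```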

In [None]:
-- View all changes since version 0 (the table creation commit, if the table is new)
SELECT 
  _change_type,
  _commit_version,
  _commit_timestamp,
  customer_id,
  account_number,
  customer_name,
  account_status,
  account_balance
FROM table_changes('${c.catalog_name}.${c.schema_name}.customer_accounts', 0)
ORDER BY _commit_version, customer_id;

### Understanding Change Types

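The `_change_type` column takes one of four values. An update produces two rows, one `update_preimage` and one `update_postimage`, with the same `_commit_version`:
- **insert** - New rows added
- **update_preimage** - Row values BEFORE the update
- **update_postimage** - Row values AFTER the update
- **delete** - Rows that were removed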
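Consumers often need only the latest state of each changed row rather than every intermediate image. A minimal sketch, assuming `customer_id` uniquely identifies a row and that version 1 is a suitable starting point for this table: it drops the `update_preimage` rows and keeps the most recent change per customer. Deleted customers still appear, with `_change_type = 'delete'`.

```sql
-- Latest change per customer since version 1 (assumed starting version)
SELECT customer_id, _change_type, _commit_version, account_status, account_balance
FROM table_changes('${c.catalog_name}.${c.schema_name}.customer_accounts', 1)
WHERE _change_type != 'update_preimage'
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY _commit_version DESC) = 1
ORDER BY customer_id;
```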

## Step 6: Create a Share with CDF-Enabled Table

Now let's share this CDF-enabled table so recipients can track changes.

In [None]:
-- Create a share for CDF demo
CREATE SHARE IF NOT EXISTS ${c.cdf_share_name}
COMMENT 'Share with Change Data Feed enabled tables';

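-- Add the CDF-enabled table to the share
-- WITH HISTORY shares the table's history, which recipients need in order to query table_changes()
ALTER SHARE ${c.cdf_share_name}
ADD TABLE ${c.catalog_name}.${c.schema_name}.customer_accounts
COMMENT 'Customer accounts with change tracking'
WITH HISTORY;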

-- View the share contents
SHOW ALL IN SHARE ${c.cdf_share_name};

## Step 7: Create Recipient and Grant Access

Create a recipient and grant them access to the CDF-enabled share.
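For a Databricks-to-Databricks recipient, the provider identifies the recipient's Unity Catalog metastore by its sharing identifier. The following is a sketch under assumptions: `partner_recipient` is a hypothetical name, and the identifier string is a placeholder that the recipient would supply (for example, from `SELECT CURRENT_METASTORE();` run in their workspace).

```sql
-- Hypothetical recipient name and placeholder sharing identifier (<cloud>:<region>:<uuid>)
CREATE RECIPIENT IF NOT EXISTS partner_recipient
USING ID '<cloud>:<region>:<metastore-uuid>'
COMMENT 'Databricks-to-Databricks recipient for CDF demo';

-- Grant the same share to the hypothetical recipient
GRANT SELECT ON SHARE ${c.cdf_share_name} TO RECIPIENT partner_recipient;
```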

In [None]:
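-- Create recipient (reusing d2d_recipient from setup)
-- Without USING ID this creates a token-based (open sharing) recipient.
-- In a real Databricks-to-Databricks scenario, the recipient would be a different
-- workspace, identified by the sharing identifier of its metastore.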
CREATE RECIPIENT IF NOT EXISTS ${c.d2d_recipient}
COMMENT 'Recipient for CDF demo';

-- Grant access to the CDF share
GRANT SELECT ON SHARE ${c.cdf_share_name} TO RECIPIENT ${c.d2d_recipient};

-- Show recipient details, including the activation link for a token-based recipient
DESCRIBE RECIPIENT ${c.d2d_recipient};

## Step 8: Make Additional Changes

Let's make more changes so the recipient can see incremental updates.

In [None]:
-- Add more customers
INSERT INTO ${c.catalog_name}.${c.schema_name}.customer_accounts VALUES
  (1008, 'ACC-1008', 'Henry Wilson', 'henry@example.com', 'Active', 3800.00, '2024-10-16', '2024-10-16');

-- Update account status
UPDATE ${c.catalog_name}.${c.schema_name}.customer_accounts
SET 
  account_status = 'Suspended',
  last_activity_date = '2024-10-16'
WHERE customer_id = 1002;

-- View current state
SELECT * FROM ${c.catalog_name}.${c.schema_name}.customer_accounts ORDER BY customer_id;

## Summary: Change Data Feed Benefits

### What We Accomplished:
✅ Created a table with CDF enabled  
✅ Performed INSERT, UPDATE, and DELETE operations  
✅ Queried change data using `table_changes()`  
✅ Shared CDF-enabled table with recipients  
✅ Enabled incremental data processing

### Key Benefits of CDF:
- **Efficiency** - Process only changed data, not full snapshots
- **Near real-time** - Changes become available as each commit completes
- **Audit Trail** - Row-level history of changes, for as long as the table's history is retained
- **Cost Savings** - Reduce data transfer and processing costs
- **Flexibility** - Recipients can choose when to sync

### Use Cases:
- **Incremental ETL** - Update data warehouses efficiently
- **Change Data Capture** - Sync with downstream systems
- **Audit and Compliance** - Track all data modifications
- **Event Streaming** - Feed changes to event-driven systems
- **Replication** - Keep multiple systems in sync

### Next Steps:
Recipients can now use `table_changes()` to:
1. Query changes since their last sync
2. Process only new/modified/deleted records
3. Build incremental pipelines
4. Track data lineage and history
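
On the recipient side, steps 1 to 3 can be sketched as a single `MERGE`. Every name here is an assumption for illustration: `shared_cdf` is a catalog the recipient created from the share, `demo` is the shared schema, `local_db.customer_accounts_replica` is the recipient's own copy with the same columns, and `6` stands for the version after the last one already synced. The sketch also assumes that `customer_id` is unique and that the table was shared with history.

```sql
MERGE INTO local_db.customer_accounts_replica AS t
USING (
  -- Latest change per customer since the last synced version (hypothetical: 6)
  SELECT *
  FROM table_changes('shared_cdf.demo.customer_accounts', 6)
  WHERE _change_type != 'update_preimage'
  QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY _commit_version DESC) = 1
) AS s
ON t.customer_id = s.customer_id
-- Rows deleted at the source are removed from the replica
WHEN MATCHED AND s._change_type = 'delete' THEN DELETE
-- Rows that exist on both sides take the latest source values
WHEN MATCHED THEN UPDATE SET
  t.account_number     = s.account_number,
  t.customer_name      = s.customer_name,
  t.email              = s.email,
  t.account_status     = s.account_status,
  t.account_balance    = s.account_balance,
  t.last_activity_date = s.last_activity_date,
  t.created_date       = s.created_date
-- New rows are inserted unless their latest change is a delete
WHEN NOT MATCHED AND s._change_type != 'delete' THEN INSERT (
  customer_id, account_number, customer_name, email,
  account_status, account_balance, last_activity_date, created_date
) VALUES (
  s.customer_id, s.account_number, s.customer_name, s.email,
  s.account_status, s.account_balance, s.last_activity_date, s.created_date
);
```

After a successful run, the recipient would store the highest `_commit_version` it processed and use the next version as the starting point for the following sync.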