# Delta Change Data Feed Demo

## Overview

This notebook demonstrates **Delta Change Data Feed** in Oracle AI Data Platform (AIDP) Workbench. Change Data Feed allows you to capture and process row-level changes (inserts, updates, deletes) from Delta tables, enabling powerful use cases like:

- **Change Data Capture (CDC)**: Track all changes for auditing and compliance
- **Incremental Processing**: Process only changed data for efficiency
- **Real-time Analytics**: Build streaming pipelines on table changes
- **Data Replication**: Sync changes to downstream systems

### What is Delta Change Data Feed?

Change Data Feed is a Delta Lake feature that:
- **Captures all changes**: Records inserts, updates, and deletes, each tagged with its commit version and commit timestamp
- **Maintains history**: Keeps change history alongside current table state
- **Enables efficient processing**: Allows reading only changed data
- **Supports streaming**: Integrates with Spark Structured Streaming

### Use Case: Customer Order Processing

We'll demonstrate change feed with an e-commerce order processing scenario:
- Track customer orders and their lifecycle changes
- Process order updates in real-time
- Maintain audit trails for compliance
- Enable incremental data pipelines

### AIDP Environment Setup

This notebook leverages the existing Spark session in your AIDP environment.

## Step 1: Create Catalog and Schema

### Setup Strategy

We'll create a dedicated catalog and schema for our change feed demo:
- **Catalog**: `retail` for data isolation
- **Schema**: `orders` for order processing data

This follows AIDP best practices for data governance and organization.

In [1]:
# Create retail catalog and orders schema
# In AIDP, catalogs provide data isolation and governance

spark.sql("CREATE CATALOG IF NOT EXISTS retail")
spark.sql("CREATE SCHEMA IF NOT EXISTS retail.orders")

print("Retail catalog and orders schema created successfully!")

Retail catalog and orders schema created successfully!


## Step 2: Create Delta Table with Change Data Feed Enabled

### Table Design

Our `orders` table will store:
- **order_id**: Unique order identifier
- **customer_id**: Customer who placed the order
- **order_date**: Date the order was placed
- **product_id**: Product being ordered
- **quantity**: Quantity ordered
- **unit_price**: Price per unit
- **total_amount**: Total order amount
- **status**: Order status (pending, confirmed, shipped, delivered, cancelled)
- **last_updated**: Timestamp of last update

### Enabling Change Data Feed

To enable change data feed, we use the table property:
```sql
TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
```

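Setting the property at creation time captures changes from the first commit onward. Delta Lake also lets you enable it on an existing table with `ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true)`; in that case only changes committed after the property is set are recorded.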

In [1]:
# Create Delta table with change data feed enabled
# TBLPROPERTIES enables change data feed for capturing all changes

spark.sql("""
CREATE TABLE IF NOT EXISTS retail.orders.orders (
    order_id STRING,
    customer_id STRING,
    order_date DATE,
    product_id STRING,
    quantity INT,
    unit_price DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    status STRING,
    last_updated TIMESTAMP
)
USING DELTA TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

print("Delta table with change data feed enabled created successfully!")
print("All changes (inserts, updates, deletes) will now be captured automatically.")

Delta table with change data feed enabled created successfully!
All changes (inserts, updates, deletes) will now be captured automatically.
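
One way to confirm that the property took effect is to inspect the table properties. The sketch below assumes the pairs come from `SHOW TBLPROPERTIES retail.orders.orders`, which returns `key` and `value` columns. The helper name `cdf_enabled` is hypothetical, and `sample_pairs` stands in for the collected rows.

```python
from typing import Iterable, Tuple

CDF_PROPERTY = "delta.enableChangeDataFeed"


def cdf_enabled(properties: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the (key, value) pairs mark change data feed as enabled."""
    for key, value in properties:
        if key == CDF_PROPERTY:
            return str(value).strip().lower() == "true"
    return False


# In the notebook, the pairs could be collected like this (needs a Spark session):
# rows = spark.sql("SHOW TBLPROPERTIES retail.orders.orders").collect()
# pairs = [(row["key"], row["value"]) for row in rows]
sample_pairs = [
    ("delta.enableChangeDataFeed", "true"),
    ("delta.minReaderVersion", "1"),
]
print(cdf_enabled(sample_pairs))
```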


## Step 3: Generate and Insert Initial Order Data

### Data Generation Strategy

We'll create realistic e-commerce order data:
- **1,000 customers** with multiple orders
- **Various products** across different categories
- **Realistic order statuses** and timestamps
- **Historical data** spanning several months

This simulates a real e-commerce scenario where orders evolve over time.

In [1]:
# Generate sample customer order data
# Only standard-library modules are needed for data generation

import random
from datetime import datetime, timedelta

# Define e-commerce data constants
PRODUCTS = [
    ('LAPTOP-001', 'Gaming Laptop', 1299.99),
    ('PHONE-001', 'Smartphone', 799.99),
    ('TABLET-001', 'Tablet', 499.99),
    ('HEADPHONES-001', 'Wireless Headphones', 199.99),
    ('MOUSE-001', 'Gaming Mouse', 79.99),
    ('KEYBOARD-001', 'Mechanical Keyboard', 149.99),
    ('MONITOR-001', '4K Monitor', 399.99),
    ('SPEAKERS-001', 'Bluetooth Speakers', 129.99)
]

STATUSES = ['pending', 'confirmed', 'shipped', 'delivered']

# Generate order data
order_data = []
base_date = datetime(2024, 1, 1)

# Create 1,000 customers with 1-5 orders each
for customer_num in range(1, 1001):
    customer_id = f"CUST{customer_num:06d}"
    
    # Each customer places 1-5 orders
    num_orders = random.randint(1, 5)
    
    for order_num in range(1, num_orders + 1):
        order_id = f"ORDER{customer_num:06d}-{order_num:02d}"
        
        # Random order date within the year
        days_offset = random.randint(0, 365)
        order_date = base_date + timedelta(days=days_offset)
        
        # Select random product
        product_id, product_name, unit_price = random.choice(PRODUCTS)
        
        # Random quantity (1-5)
        quantity = random.randint(1, 5)
        total_amount = round(unit_price * quantity, 2)
        
        # Random status
        status = random.choice(STATUSES)
        
        # Last updated timestamp (could be different from order date for status changes)
        update_offset = random.randint(0, 30)  # Up to 30 days after order
        last_updated = order_date + timedelta(days=update_offset)
        
        order_data.append({
            "order_id": order_id,
            "customer_id": customer_id,
            "order_date": order_date.date(),
            "product_id": product_id,
            "quantity": int(quantity),
            "unit_price": float(unit_price),
            "total_amount": float(total_amount),
            "status": status,
            "last_updated": last_updated
        })

print(f"Generated {len(order_data)} customer order records")
print("Sample record:", order_data[0])

Generated 3030 customer order records
Sample record: {'order_id': 'ORDER000001-01', 'customer_id': 'CUST000001', 'order_date': datetime.date(2024, 7, 21), 'product_id': 'TABLET-001', 'quantity': 2, 'unit_price': 499.99, 'total_amount': 999.98, 'status': 'delivered', 'last_updated': datetime.datetime(2024, 8, 8, 0, 0)}


In [1]:
# Insert initial data into the Delta table
# This is the first data commit (version 1; CREATE TABLE itself was version 0)
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, DecimalType

# Create DataFrame from generated data
df_orders = spark.createDataFrame(order_data)
df_orders = df_orders.withColumn('quantity', col('quantity').cast(IntegerType()))
df_orders = df_orders.withColumn('total_amount', col('total_amount').cast(DecimalType(12,2)))
df_orders = df_orders.withColumn('unit_price', col('unit_price').cast(DecimalType(10,2)))

# Display schema and sample data
print("DataFrame Schema:")
df_orders.printSchema()

print("\nSample Data:")
df_orders.show(5)

# Insert data into Delta table
# CREATE TABLE already produced version 0, so this append is committed as version 1
df_orders.write.mode("append").saveAsTable("retail.orders.orders")

print(f"\nSuccessfully inserted {df_orders.count()} records into retail.orders.orders")
print("Change data feed is now capturing this initial data load.")

DataFrame Schema:
root
 |-- customer_id: string (nullable = true)
 |-- last_updated: timestamp (nullable = true)
 |-- order_date: date (nullable = true)
 |-- order_id: string (nullable = true)
 |-- product_id: string (nullable = true)
 |-- quantity: integer (nullable = true)
 |-- status: string (nullable = true)
 |-- total_amount: decimal(12,2) (nullable = true)
 |-- unit_price: decimal(10,2) (nullable = true)


Sample Data:
+-----------+-------------------+----------+--------------+--------------+--------+---------+------------+----------+
|customer_id|       last_updated|order_date|      order_id|    product_id|quantity|   status|total_amount|unit_price|
+-----------+-------------------+----------+--------------+--------------+--------+---------+------------+----------+
| CUST000001|2024-08-08 00:00:00|2024-07-21|ORDER000001-01|    TABLET-001|       2|delivered|      999.98|    499.99|
| CUST000001|2024-10-26 00:00:00|2024-10-15|ORDER000001-02|  SPEAKERS-001|       2|confirmed|      


Successfully inserted 3030 records into retail.orders.orders
Change data feed is now capturing this initial data load.


## Step 4: Make Changes to Demonstrate Change Feed

### Simulating Real-World Changes

Now we'll simulate real e-commerce operations:
1. **New orders**: Customers place additional orders
2. **Status updates**: Orders progress through their lifecycle
3. **Order cancellations**: Some orders are cancelled

Each change will be captured by the change data feed.

In [1]:
# Simulate new orders (INSERT operations)
print("=== Adding New Orders ===")

new_orders = [
    {
        "order_id": "ORDER001001-01",
        "customer_id": "CUST001001",
        "order_date": datetime(2024, 12, 1).date(),
        "product_id": "LAPTOP-001",
        "quantity": 1,
        "unit_price": 1299.99,
        "total_amount": 1299.99,
        "status": "pending",
        "last_updated": datetime(2024, 12, 1, 10, 30)
    },
    {
        "order_id": "ORDER001002-01",
        "customer_id": "CUST001002",
        "order_date": datetime(2024, 12, 1).date(),
        "product_id": "PHONE-001",
        "quantity": 2,
        "unit_price": 799.99,
        "total_amount": 1599.98,
        "status": "confirmed",
        "last_updated": datetime(2024, 12, 1, 11, 15)
    }
]

# Insert new orders
df_new_orders = spark.createDataFrame(new_orders)
df_new_orders = df_new_orders.withColumn('quantity', col('quantity').cast(IntegerType()))
df_new_orders = df_new_orders.withColumn('total_amount', col('total_amount').cast(DecimalType(12,2)))
df_new_orders = df_new_orders.withColumn('unit_price', col('unit_price').cast(DecimalType(10,2)))
df_new_orders.write.mode("append").saveAsTable("retail.orders.orders")
print("Added 2 new orders")

=== Adding New Orders ===


Added 2 new orders


In [1]:
# Simulate status updates (UPDATE operations)
print("=== Updating Order Statuses ===")

# Update some existing orders to new statuses (3 UPDATE statements touching 4 rows)
spark.sql("""
UPDATE retail.orders.orders 
SET status = 'shipped', last_updated = '2024-12-02 14:30:00'
WHERE order_id = 'ORDER000001-01'
""")

spark.sql("""
UPDATE retail.orders.orders 
SET status = 'delivered', last_updated = '2024-12-02 16:45:00'
WHERE order_id = 'ORDER000002-01'
""")

spark.sql("""
UPDATE retail.orders.orders 
SET status = 'confirmed', last_updated = '2024-12-01 12:00:00'
WHERE order_id IN ('ORDER000003-01', 'ORDER000004-01')
""")

print("Updated order statuses (3 updates)")

=== Updating Order Statuses ===


Updated order statuses (3 updates)


In [1]:
# Simulate order cancellations (DELETE operations)
print("=== Cancelling Orders (DELETE operations) ===")

# Cancel some orders (soft delete by setting status, or hard delete)
# For demo purposes, we'll do hard deletes
spark.sql("""
DELETE FROM retail.orders.orders 
WHERE order_id = 'ORDER000005-01'
""")

spark.sql("""
DELETE FROM retail.orders.orders 
WHERE order_id = 'ORDER000006-01'
""")

print("Deleted 2 orders (simulating cancellations)")

# Check current table state
current_count = spark.sql("SELECT COUNT(*) as current_orders FROM retail.orders.orders").collect()[0][0]
print(f"Current total orders in table: {current_count}")

=== Cancelling Orders (DELETE operations) ===


Deleted 2 orders (simulating cancellations)


Current total orders in table: 3030


In [1]:
# Now look at the history
spark.sql("DESCRIBE HISTORY retail.orders.orders").show()

+-------+-------------------+------+--------+------------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|version|          timestamp|userId|userName|   operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|    operationMetrics|userMetadata|          engineInfo|
+-------+-------------------+------+--------+------------+--------------------+----+--------+---------+-----------+--------------+-------------+--------------------+------------+--------------------+
|      7|2025-12-03 01:40:14|  NULL|    NULL|      DELETE|{predicate -> ["(...|NULL|    NULL|     NULL|          6|  Serializable|        false|{numRemovedFiles ...|        NULL|Apache-Spark/3.5....|
|      6|2025-12-03 01:40:09|  NULL|    NULL|      DELETE|{predicate -> ["(...|NULL|    NULL|     NULL|          5|  Serializable|        false|{numRemovedFiles ...|        NULL|Apache-Spark/3.5....|


## Step 5: Reading the Change Data Feed

### How Change Feed Works

The change data feed captures:
- **_change_type**: `insert`, `update_preimage`, `update_postimage`, or `delete` (the values are lowercase)
- **_commit_version**: Delta table version when change occurred
- **_commit_timestamp**: When the change was committed
- All table columns with their values at the time of change
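
To make the role of each change type concrete, here is a minimal pure-Python sketch that replays change rows onto a keyed snapshot, the way a downstream replica might. The function name `apply_changes` and the dictionary-based rows are illustrative assumptions, not part of Delta Lake. The sample rows mirror two orders from this notebook, reduced to two columns.

```python
from typing import Dict, Iterable


def apply_changes(state: Dict[str, dict], changes: Iterable[dict],
                  key: str = "order_id") -> Dict[str, dict]:
    """Replay change rows, in commit order, onto a snapshot keyed by `key`."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        change_type = row["_change_type"]
        # Keep only table columns; drop the _change_type/_commit_* metadata.
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        if change_type in ("insert", "update_postimage"):
            state[row[key]] = data
        elif change_type == "delete":
            state.pop(row[key], None)
        # update_preimage rows hold the old values and need no action here.
    return state


sample_changes = [
    {"order_id": "ORDER000001-01", "status": "delivered",
     "_change_type": "insert", "_commit_version": 1},
    {"order_id": "ORDER000005-01", "status": "confirmed",
     "_change_type": "insert", "_commit_version": 1},
    {"order_id": "ORDER000001-01", "status": "delivered",
     "_change_type": "update_preimage", "_commit_version": 3},
    {"order_id": "ORDER000001-01", "status": "shipped",
     "_change_type": "update_postimage", "_commit_version": 3},
    {"order_id": "ORDER000005-01", "status": "confirmed",
     "_change_type": "delete", "_commit_version": 6},
]
replica = apply_changes({}, sample_changes)
print(replica)
```

After the replay, only `ORDER000001-01` remains, with status `shipped`; the deleted order is gone.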

### Reading Change Feed

You can read the change feed using:
```python
(
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    # Batch reads also need a starting point, e.g. .option("startingVersion", 0)
    .table("table_name")
)
```

Batch reads require a starting point, given as `startingVersion` or `startingTimestamp`. To read the full history, start from version 0:
```python
.option("startingVersion", 0)
```
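
A bounded read adds the `endingVersion` option, which is inclusive like `startingVersion`. The helper below is a small sketch that builds the option dictionary; the name `cdf_read_options` is hypothetical. Versions 3 to 5 are used because they are the three UPDATE commits in this notebook's history.

```python
from typing import Dict, Optional


def cdf_read_options(starting_version: int,
                     ending_version: Optional[int] = None) -> Dict[str, str]:
    """Build reader options for a change feed read over a version range."""
    if starting_version < 0:
        raise ValueError("starting_version must be non-negative")
    if ending_version is not None and ending_version < starting_version:
        raise ValueError("ending_version must not precede starting_version")
    options = {
        "readChangeFeed": "true",
        "startingVersion": str(starting_version),
    }
    if ending_version is not None:
        options["endingVersion"] = str(ending_version)
    return options


opts = cdf_read_options(3, 5)
print(opts)

# In the notebook (needs a Spark session):
# updates_only = spark.read.format("delta").options(**opts).table("retail.orders.orders")
```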

In [1]:
# Read the complete change data feed from the beginning
print("=== Complete Change Data Feed ===")

changes_df = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .table("retail.orders.orders")

# Show change feed schema
print("Change Feed Schema:")
changes_df.printSchema()

# Display all changes
print("\nAll Changes:")
changes_df.orderBy("_commit_version", "order_id").show(20, truncate=False)

=== Complete Change Data Feed ===


Change Feed Schema:
root
 |-- order_id: string (nullable = true)
 |-- customer_id: string (nullable = true)
 |-- order_date: date (nullable = true)
 |-- product_id: string (nullable = true)
 |-- quantity: integer (nullable = true)
 |-- unit_price: decimal(10,2) (nullable = true)
 |-- total_amount: decimal(12,2) (nullable = true)
 |-- status: string (nullable = true)
 |-- last_updated: timestamp (nullable = true)
 |-- _change_type: string (nullable = true)
 |-- _commit_version: long (nullable = true)
 |-- _commit_timestamp: timestamp (nullable = true)


All Changes:


+--------------+-----------+----------+--------------+--------+----------+------------+---------+-------------------+------------+---------------+-----------------------+
|order_id      |customer_id|order_date|product_id    |quantity|unit_price|total_amount|status   |last_updated       |_change_type|_commit_version|_commit_timestamp      |
+--------------+-----------+----------+--------------+--------+----------+------------+---------+-------------------+------------+---------------+-----------------------+
|ORDER000001-01|CUST000001 |2024-07-21|TABLET-001    |2       |499.99    |999.98      |delivered|2024-08-08 00:00:00|insert      |1              |2025-12-03 01:39:27.062|
|ORDER000001-02|CUST000001 |2024-10-15|SPEAKERS-001  |2       |129.99    |259.98      |confirmed|2024-10-26 00:00:00|insert      |1              |2025-12-03 01:39:27.062|
|ORDER000001-03|CUST000001 |2024-12-11|LAPTOP-001    |1       |1299.99   |1299.99     |pending  |2024-12-24 00:00:00|insert      |1              

In [1]:
# Analyze changes by type
print("=== Changes by Type ===")

change_summary = changes_df.groupBy("_change_type").count().orderBy("_change_type")
change_summary.show()

# Show INSERT operations
print("\nINSERT Operations:")
changes_df.filter("_change_type = 'insert'").select("order_id", "customer_id", "status", "total_amount", "_commit_version").show(10)

# Show UPDATE operations (both pre and post images)
print("\nUPDATE Operations:")
changes_df.filter("_change_type LIKE 'update%'").select("order_id", "status", "_change_type", "_commit_version").show(10)

# Show DELETE operations
print("\nDELETE Operations:")
changes_df.filter("_change_type = 'delete'").select("order_id", "customer_id", "status", "_commit_version").show()

=== Changes by Type ===


+----------------+-----+
|    _change_type|count|
+----------------+-----+
|          delete|    2|
|          insert| 3032|
|update_postimage|    4|
| update_preimage|    4|
+----------------+-----+


INSERT Operations:


+--------------+-----------+---------+------------+---------------+
|      order_id|customer_id|   status|total_amount|_commit_version|
+--------------+-----------+---------+------------+---------------+
|ORDER000248-04| CUST000248|  shipped|      599.96|              1|
|ORDER000248-05| CUST000248|confirmed|     1999.95|              1|
|ORDER000249-01| CUST000249|  pending|      999.95|              1|
|ORDER000250-01| CUST000250|confirmed|     1999.96|              1|
|ORDER000250-02| CUST000250|confirmed|      399.95|              1|
|ORDER000251-01| CUST000251|delivered|      239.97|              1|
|ORDER000251-02| CUST000251|delivered|     1299.99|              1|
|ORDER000252-01| CUST000252|  pending|      199.99|              1|
|ORDER000252-02| CUST000252|delivered|     2399.97|              1|
|ORDER000252-03| CUST000252|delivered|     1499.97|              1|
+--------------+-----------+---------+------------+---------------+
only showing top 10 rows


UPDATE Operations:


+--------------+---------+----------------+---------------+
|      order_id|   status|    _change_type|_commit_version|
+--------------+---------+----------------+---------------+
|ORDER000003-01|  pending| update_preimage|              5|
|ORDER000003-01|confirmed|update_postimage|              5|
|ORDER000004-01|  pending| update_preimage|              5|
|ORDER000004-01|confirmed|update_postimage|              5|
|ORDER000002-01|  pending| update_preimage|              4|
|ORDER000002-01|delivered|update_postimage|              4|
|ORDER000001-01|delivered| update_preimage|              3|
|ORDER000001-01|  shipped|update_postimage|              3|
+--------------+---------+----------------+---------------+


DELETE Operations:


+--------------+-----------+---------+---------------+
|      order_id|customer_id|   status|_commit_version|
+--------------+-----------+---------+---------------+
|ORDER000005-01| CUST000005|confirmed|              6|
|ORDER000006-01| CUST000006|  pending|              7|
+--------------+-----------+---------+---------------+
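
The change counts can be cross-checked against the table's current row count: when the feed is read from the table's first commit, inserts minus deletes should equal the number of rows now in the table, and every update should contribute one pre-image and one post-image. A small sketch using the counts printed above (the helper name `net_row_change` is hypothetical):

```python
from typing import Mapping


def net_row_change(change_counts: Mapping[str, int]) -> int:
    """Net effect on row count: updates rewrite rows, so only inserts and deletes count."""
    return change_counts.get("insert", 0) - change_counts.get("delete", 0)


# Counts copied from the "Changes by Type" output of this notebook.
counts = {"delete": 2, "insert": 3032, "update_postimage": 4, "update_preimage": 4}

images_balanced = counts["update_preimage"] == counts["update_postimage"]
print(net_row_change(counts), images_balanced)
```

The result, 3030, matches the row count reported after the deletes in Step 4.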



## Step 6: Processing Changes with a Batch Job

### Batch Processing Strategy

For batch processing of changes, we can:
1. Read change feed from a specific version
2. Process changes (e.g., send notifications, update downstream systems)
3. Track the last processed version
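
The third step can be sketched as a small file-backed checkpoint. This is only an illustration under stated assumptions: the class name `VersionCheckpoint`, the JSON layout, and the file location are all hypothetical, and a production job might keep this state in a table or another durable store instead.

```python
import json
import os
import tempfile
from typing import Optional


class VersionCheckpoint:
    """Persist the last processed Delta version in a small JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def last_processed(self) -> Optional[int]:
        if not os.path.exists(self.path):
            return None
        with open(self.path, "r", encoding="utf-8") as handle:
            return json.load(handle)["last_version"]

    def next_start(self) -> int:
        # startingVersion is inclusive, so resume one past the last processed version.
        last = self.last_processed()
        return 0 if last is None else last + 1

    def commit(self, version: int) -> None:
        # Write to a temp file and rename it, so a crash cannot leave a partial file.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump({"last_version": version}, handle)
        os.replace(tmp_path, self.path)


checkpoint = VersionCheckpoint(os.path.join(tempfile.mkdtemp(), "orders_cdf.json"))
first_start = checkpoint.next_start()
checkpoint.commit(7)
print(first_start, checkpoint.next_start())
```

A job built on this would typically compare `next_start()` with the latest table version and skip the read when there is nothing new.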

### Use Case: Order Status Notifications

We'll simulate a batch job that processes order status changes to send notifications.

In [1]:
# Get the latest table version
latest_version = spark.sql("DESCRIBE HISTORY retail.orders.orders").select("version").collect()[0][0]
print(f"Latest table version: {latest_version}")

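# Simulate batch processing starting at version 1.
# startingVersion is inclusive, so this run still includes the initial load,
# which was committed as version 1. A real job would persist the last processed
# version and resume from last_processed_version + 1.
last_processed_version = 1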

print(f"Processing changes from version {last_processed_version} to {latest_version}")

# Read incremental changes
incremental_changes = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", last_processed_version) \
    .table("retail.orders.orders")

print(f"Found {incremental_changes.count()} changes to process")

Latest table version: 7
Processing changes from version 1 to 7


Found 3042 changes to process


In [1]:
# Process changes - simulate order status notifications
print("=== Processing Order Status Changes ===")

# Filter for status updates (update operations)
status_updates = incremental_changes.filter("_change_type = 'update_postimage'") \
    .select("order_id", "customer_id", "status", "_commit_timestamp")

# Simulate sending notifications for status changes
notifications_sent = []

for row in status_updates.collect():
    notification = {
        "order_id": row.order_id,
        "customer_id": row.customer_id,
        "new_status": row.status,
        "notification_time": row._commit_timestamp,
        "message": f"Order {row.order_id} status changed to {row.status}"
    }
    notifications_sent.append(notification)
    print(f"Notification: {notification['message']}")

print(f"\nProcessed {len(notifications_sent)} status update notifications")

# In a real implementation, you'd update the checkpoint
# spark.sql(f"UPDATE checkpoint_table SET last_version = {latest_version}")
print(f"Checkpoint updated to version {latest_version}")

=== Processing Order Status Changes ===


Notification: Order ORDER000003-01 status changed to confirmed
Notification: Order ORDER000004-01 status changed to confirmed
Notification: Order ORDER000002-01 status changed to delivered
Notification: Order ORDER000001-01 status changed to shipped

Processed 4 status update notifications
Checkpoint updated to version 7
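
Filtering on `update_postimage` alone reports every updated row, even when the status itself did not change. Pairing each post-image with its pre-image lets a job notify only on real transitions and include the old status. The following pure-Python sketch shows the pairing logic on rows taken from the UPDATE output in Step 5; the function name `status_transitions` and the dictionary rows are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def status_transitions(changes: Iterable[dict]) -> List[Tuple[str, str, str]]:
    """Pair pre/post images per (order_id, commit version); keep real status changes."""
    pairs: Dict[Tuple[str, int], Dict[str, str]] = {}
    for row in changes:
        if row["_change_type"] in ("update_preimage", "update_postimage"):
            key = (row["order_id"], row["_commit_version"])
            pairs.setdefault(key, {})[row["_change_type"]] = row["status"]

    transitions = []
    # Emit transitions in commit order.
    for (order_id, _version), images in sorted(pairs.items(), key=lambda item: item[0][1]):
        old = images.get("update_preimage")
        new = images.get("update_postimage")
        if old is not None and new is not None and old != new:
            transitions.append((order_id, old, new))
    return transitions


sample_updates = [
    {"order_id": "ORDER000003-01", "status": "pending",
     "_change_type": "update_preimage", "_commit_version": 5},
    {"order_id": "ORDER000003-01", "status": "confirmed",
     "_change_type": "update_postimage", "_commit_version": 5},
    {"order_id": "ORDER000001-01", "status": "delivered",
     "_change_type": "update_preimage", "_commit_version": 3},
    {"order_id": "ORDER000001-01", "status": "shipped",
     "_change_type": "update_postimage", "_commit_version": 3},
]
transitions = status_transitions(sample_updates)
for order_id, old, new in transitions:
    print(f"Order {order_id}: {old} -> {new}")
```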


## Step 7: Real-time Change Processing with Structured Streaming

### Streaming Change Feed

For real-time processing, we can use Structured Streaming with change feed:
- Continuously process new changes as they arrive
- Enable real-time dashboards and alerts
- Integrate with streaming analytics platforms

### Streaming Query Setup

We'll create a streaming query that processes new changes in real-time.

In [1]:
# Set up streaming change feed reader
print("=== Setting up Streaming Change Feed ===")

# Read change feed in streaming mode
streaming_changes = spark.readStream.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", latest_version + 1) \
    .table("retail.orders.orders")

# Define streaming processing logic
def process_batch(batch_df, batch_id):
    """Process a batch of changes"""
    print(f"Processing batch {batch_id} with {batch_df.count()} changes")
    
    # Count changes by type
    change_counts = batch_df.groupBy("_change_type").count()
    change_counts.show()
    
    # Process new orders
    new_orders = batch_df.filter("_change_type = 'insert'")
    if new_orders.count() > 0:
        print(f"New orders: {new_orders.count()}")
        new_orders.select("order_id", "customer_id", "total_amount").show()
    
    # Process status updates
    status_changes = batch_df.filter("_change_type = 'update_postimage'")
    if status_changes.count() > 0:
        print(f"Status changes: {status_changes.count()}")
        status_changes.select("order_id", "status").show()

# Note: In a real scenario, you'd set up a proper streaming sink
# For demo purposes, we'll show the streaming DataFrame setup
print("Streaming DataFrame created (would start query in production)")
print("Schema:")
streaming_changes.printSchema()

=== Setting up Streaming Change Feed ===


Streaming DataFrame created (would start query in production)
Schema:
root
 |-- order_id: string (nullable = true)
 |-- customer_id: string (nullable = true)
 |-- order_date: date (nullable = true)
 |-- product_id: string (nullable = true)
 |-- quantity: integer (nullable = true)
 |-- unit_price: decimal(10,2) (nullable = true)
 |-- total_amount: decimal(12,2) (nullable = true)
 |-- status: string (nullable = true)
 |-- last_updated: timestamp (nullable = true)
 |-- _change_type: string (nullable = true)
 |-- _commit_version: long (nullable = true)
 |-- _commit_timestamp: timestamp (nullable = true)
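
The cell above defines `process_batch` but never starts a query. A minimal sketch of how the two pieces could be wired together with `foreachBatch` follows. The helper name `start_change_stream` and the checkpoint path are hypothetical; the path must be a location your environment can write to.

```python
def start_change_stream(streaming_df, batch_handler, checkpoint_location: str):
    """Start a streaming query that hands each micro-batch of changes to batch_handler."""
    return (
        streaming_df.writeStream
        .foreachBatch(batch_handler)
        # The checkpoint lets a restarted query resume where it left off.
        .option("checkpointLocation", checkpoint_location)
        .start()
    )


# In the notebook (needs a running Spark session and a writable path):
# query = start_change_stream(streaming_changes, process_batch, "/tmp/checkpoints/orders_cdf")
# query.awaitTermination(60)  # wait up to 60 seconds
# query.stop()
```

Because the stream keeps its own checkpoint, it does not need the manual version tracking used by the batch job in Step 6.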



## Step 8: Change Feed Analytics and Insights

### Analyzing Change Patterns

Let's analyze the change data to understand:
- Change frequency and patterns
- Order lifecycle analytics
- Data quality and audit insights

### Key Metrics

- Total changes over time
- Change type distribution
- Order status transition patterns
- Processing latency insights

In [1]:
from pyspark.sql.functions import col, date_format, count

# Analyze change patterns
print("=== Change Feed Analytics ===")

# Changes over time
commit_day = date_format(col("_commit_timestamp"), "yyyy-MM-dd")
changes_over_time = changes_df.groupBy(commit_day).count().orderBy(commit_day)
changes_over_time.show()

# Change type distribution
change_types = changes_df.groupBy("_change_type").count().orderBy("count", ascending=False)
print("\nChange Type Distribution:")
change_types.show()

# Simplified status analysis
current_statuses = spark.sql("SELECT status, COUNT(*) as count FROM retail.orders.orders GROUP BY status")
print("Current Order Status Distribution:")
current_statuses.show()

# Expose the change feed to SQL and list each order's status as of every insert or update
changes_df.createOrReplaceTempView("orders_v")
status_transitions = spark.sql("""
SELECT order_id, status, _commit_timestamp
FROM orders_v
WHERE _change_type IN ('insert', 'update_postimage')
ORDER BY _commit_timestamp
""")
status_transitions.show()


=== Change Feed Analytics ===


+------------------------------------------+-----+
|date_format(_commit_timestamp, yyyy-MM-dd)|count|
+------------------------------------------+-----+
|                                2025-12-03| 3042|
+------------------------------------------+-----+


Change Type Distribution:


+----------------+-----+
|    _change_type|count|
+----------------+-----+
|          insert| 3032|
| update_preimage|    4|
|update_postimage|    4|
|          delete|    2|
+----------------+-----+



Current Order Status Distribution:


+---------+-----+
|   status|count|
+---------+-----+
|  shipped|  725|
|delivered|  790|
|  pending|  759|
|confirmed|  756|
+---------+-----+



+--------------+---------+--------------------+
|      order_id|   status|   _commit_timestamp|
+--------------+---------+--------------------+
|ORDER000248-04|  shipped|2025-12-03 01:39:...|
|ORDER000001-01|delivered|2025-12-03 01:39:...|
|ORDER000248-05|confirmed|2025-12-03 01:39:...|
|ORDER000001-02|confirmed|2025-12-03 01:39:...|
|ORDER000249-01|  pending|2025-12-03 01:39:...|
|ORDER000001-03|  pending|2025-12-03 01:39:...|
|ORDER000250-01|confirmed|2025-12-03 01:39:...|
|ORDER000002-01|  pending|2025-12-03 01:39:...|
|ORDER000250-02|confirmed|2025-12-03 01:39:...|
|ORDER000002-02|  pending|2025-12-03 01:39:...|
|ORDER000251-01|delivered|2025-12-03 01:39:...|
|ORDER000002-03|  shipped|2025-12-03 01:39:...|
|ORDER000251-02|delivered|2025-12-03 01:39:...|
|ORDER000003-01|  pending|2025-12-03 01:39:...|
|ORDER000252-01|  pending|2025-12-03 01:39:...|
|ORDER000004-01|  pending|2025-12-03 01:39:...|
|ORDER000252-02|delivered|2025-12-03 01:39:...|
|ORDER000004-02|confirmed|2025-12-03 01:

## Key Takeaways: Delta Change Data Feed in AIDP

### What We Demonstrated

1. **Change Capture**: Enabled change feed on a Delta table to automatically capture all changes
2. **Change Types**: Demonstrated INSERT, UPDATE, and DELETE change capture
3. **Change Reading**: Read complete change history and incremental changes
4. **Batch Processing**: Processed changes in batch mode for downstream systems
5. **Streaming Setup**: Configured streaming change feed for real-time processing
6. **Analytics**: Analyzed change patterns for insights and monitoring

### AIDP Advantages

- **Unified Platform**: Change feed integrates seamlessly with other AIDP services
- **Performance**: Efficient change capture with minimal overhead
- **Governance**: Catalog and schema isolation for change data
- **Streaming**: Native integration with Spark Structured Streaming
- **Audit Trail**: Complete change history for compliance and debugging

### Best Practices for Change Data Feed

1. **Enable Early**: Set the property at table creation when possible; enabling it later with `ALTER TABLE` captures changes only from that point on
2. **Version Management**: Track processed versions for incremental processing
3. **Change Type Handling**: Handle `insert`, `update_preimage`, `update_postimage`, and `delete` rows appropriately
4. **Performance**: Use startingVersion to avoid reprocessing old changes
5. **Monitoring**: Monitor change volume and processing latency

### Production Considerations

- **Checkpointing**: Store last processed version reliably
- **Error Handling**: Implement retry logic for failed processing
- **Scaling**: Consider partitioning strategy for high-volume tables
- **Retention**: Change data files follow the table's retention policy, so `VACUUM` also removes change history older than the retention period
- **Security**: Apply appropriate permissions for change feed access
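
The checkpointing and error-handling points work together: the stored version should advance only after processing succeeds. The sketch below shows one way to combine them with exponential backoff. The function name `process_with_retry` and its parameters are hypothetical, and the flaky function only simulates a transient failure.

```python
import time
from typing import Callable


def process_with_retry(process: Callable[[], None], commit: Callable[[], None],
                       max_attempts: int = 3, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Run process(); call commit() only after it succeeds. Returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            process()
        except Exception:
            if attempt == max_attempts:
                raise  # give up without committing the checkpoint
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        else:
            commit()
            return attempt
    raise ValueError("max_attempts must be at least 1")


calls = {"n": 0}
committed = []
delays = []


def flaky() -> None:
    # Simulated processing step that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")


attempts = process_with_retry(flaky, lambda: committed.append(7), sleep=delays.append)
print(attempts, delays, committed)
```

Because a failed attempt may have partially run before the retry, the processing step should be idempotent, for example by keying downstream writes on `order_id` and `_commit_version`.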

### Next Steps

- Explore integration with AIDP's streaming analytics
- Enable change feed on existing tables with `ALTER TABLE ... SET TBLPROPERTIES` (changes are captured from that point on)
- Build end-to-end CDC pipelines to external systems
- Monitor change feed performance at scale
- Implement automated alerting for data quality issues

This notebook demonstrates how Oracle AI Data Platform makes change data capture accessible while maintaining enterprise-grade performance, governance, and reliability.