# Backfill Demo: Process Data Notebook

This notebook processes data for a single position date. It's designed to be run as a Databricks Job task.

## What It Does:
1. Accepts `position_date` parameter via widget
2. Reads data from source table for that date
3. Writes to destination table, replacing any existing rows for that date so reruns are idempotent

**Note:** Business day validation is handled by the orchestrator, not in this notebook.
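
For context, the orchestrator-side check mentioned above could look roughly like the sketch below. It is not the orchestrator's actual code: the `business_days` helper and its `holidays` argument are hypothetical, and it assumes a plain Monday to Friday calendar with an optional set of holiday dates.

```python
from datetime import date, timedelta


def business_days(start: date, end: date, holidays: frozenset = frozenset()) -> list[str]:
    """Return ISO dates (YYYY-MM-DD) for Mon-Fri days in [start, end], skipping holidays."""
    days = []
    current = start
    while current <= end:
        # weekday(): Monday is 0, Sunday is 6
        if current.weekday() < 5 and current not in holidays:
            days.append(current.isoformat())
        current += timedelta(days=1)
    return days


# Hypothetical backfill window; each value would be passed as position_date
print(business_days(date(2025, 1, 13), date(2025, 1, 20)))
```

Each returned string is already in the `YYYY-MM-DD` form this notebook's widget expects.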

In [0]:
# Setup and Parameters
import sys
from pyspark.sql import functions as F

# Get current notebook path dynamically
notebook_path = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
workspace_path = f"/Workspace{notebook_path}"
base_path = workspace_path.rsplit('/', 1)[0]

# Add to sys.path for importing config
sys.path.append(base_path)

from config import SOURCE_TABLE, DEST_TABLE

# Widget for position_date parameter
dbutils.widgets.text("position_date", "", "Position Date (YYYY-MM-DD)")
position_date_str = dbutils.widgets.get("position_date")

print(f"📅 Position Date: {position_date_str}")
print(f"📥 Source: {SOURCE_TABLE}")
print(f"📤 Destination: {DEST_TABLE}")

📅 Position Date: 2025-01-15
📥 Source: demos.backfill_demo.source_data
📤 Destination: demos.backfill_demo.destination_data
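
The widget defaults to an empty string, so a run without a parameter would most likely fall through to the "No data found" branch rather than fail. If failing fast is preferred, a guard along these lines could run right after the widget is read. This is a minimal sketch: `parse_position_date` is a hypothetical helper, not part of the notebook or its `config` module.

```python
from datetime import date, datetime


def parse_position_date(raw: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError for empty or malformed input."""
    value = raw.strip()
    if not value:
        raise ValueError("position_date is required (expected YYYY-MM-DD)")
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"Invalid position_date {raw!r}: expected YYYY-MM-DD") from exc


# In the notebook, the argument would be position_date_str from the widget
print(parse_position_date("2025-01-15"))
```

Raising here fails the job task before any table is read, which makes a missing parameter visible to the orchestrator.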


In [0]:
# Perform ETL - Read, Transform, Write
try:
    print(f"\n🔄 Processing data for {position_date_str}...")
    
    # Read source data for the specific date.
    # The parameter string is cast to a date so it compares correctly with position_date.
    df_source = (spark.table(SOURCE_TABLE)
                 .filter(F.col("position_date") == F.to_date(F.lit(position_date_str))))
    
    record_count = df_source.count()
    
    if record_count == 0:
        print(f"⚠️ No data found for {position_date_str}")
    else:
        print(f"📊 Found {record_count:,} records")
        
        # Write to destination table using partition overwrite
        # This ensures idempotency - reruns will replace data for this date
        (df_source.write
            .format("delta")
            .mode("overwrite")
            .option("replaceWhere", f"position_date = cast('{position_date_str}' as date)")
            .saveAsTable(DEST_TABLE))
        
        print(f"✓ Successfully processed {position_date_str}")
        print(f"  └─ {record_count:,} records written to {DEST_TABLE}")

except Exception as e:
    error_msg = str(e)
    print(f"✗ Error processing {position_date_str}: {error_msg}")
    raise  # Re-raise to fail the job


🔄 Processing data for 2025-01-15...
📊 Found 10 records
✓ Successfully processed 2025-01-15
  └─ 10 records written to demos.backfill_demo.destination_data
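
To see why the `replaceWhere` overwrite makes reruns safe, here is a toy model in plain Python. It only imitates the semantics with lists of dicts (delete the rows matching the date, then add the new batch); it is not how Delta Lake implements the operation, and `replace_where` is a hypothetical name.

```python
def replace_where(dest_rows, new_rows, position_date):
    """Toy model: drop destination rows for position_date, then add the new rows."""
    kept = [row for row in dest_rows if row["position_date"] != position_date]
    return kept + list(new_rows)


# Made-up rows purely for illustration
dest = [{"position_date": "2025-01-14", "value": 1}]
batch = [
    {"position_date": "2025-01-15", "value": 2},
    {"position_date": "2025-01-15", "value": 3},
]

first = replace_where(dest, batch, "2025-01-15")
second = replace_where(first, batch, "2025-01-15")  # simulated rerun

print(len(first), len(second), first == second)  # 3 3 True
```

A plain append would have left five rows after the rerun, while the replace keeps the destination at three and leaves the 2025-01-14 row untouched.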


## Notebook Complete! ✓

This notebook is designed to be called by:
- Databricks Workflow Jobs
- Orchestration notebooks (like `03_backfill_orchestrator.ipynb`)
- Manual runs for testing
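
As a rough illustration of the calling side, an orchestration loop might look like the sketch below. It is an assumption, not the contents of `03_backfill_orchestrator.ipynb`: `run_backfill` is a hypothetical helper, and the runner is passed in as a callable so the loop can be tried outside Databricks.

```python
from typing import Callable


def run_backfill(dates: list[str], run_one: Callable[[str], None]) -> dict[str, str]:
    """Call run_one for each date in order; return {date: error message} for failures."""
    failures = {}
    for position_date in dates:
        try:
            run_one(position_date)
        except Exception as exc:  # keep going so one bad date does not stop the backfill
            failures[position_date] = str(exc)
    return failures


# Stand-in runner for local experimentation. In Databricks, run_one might instead wrap
# dbutils.notebook.run("./02_process_data", 3600, {"position_date": d}), where the
# notebook path and the 3600-second timeout are hypothetical.
def fake_run(position_date: str) -> None:
    if position_date == "2025-01-16":
        raise RuntimeError("simulated failure")


print(run_backfill(["2025-01-15", "2025-01-16", "2025-01-17"], fake_run))
```

Because this notebook re-raises its exceptions, a failed date surfaces as an error in the caller, and the idempotent write means the failed dates can simply be retried.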

**Key Features:**
- ✅ Simple ETL logic (read → write)
- ✅ Idempotent writes (partition overwrite)
- ✅ Error handling: exceptions are logged and re-raised so the job task fails
- ✅ Clear logging and status messages

**Note:** Business day validation should be handled by the caller/orchestrator.