# Load Bronze Data to Silver Table - Customer

## Overview
Load the Customer sample data from a CSV file in the Bronze lakehouse into a Delta table in the Silver lakehouse.

## Data Flow
- **Source**: MAAG_LH_Bronze/Files/samples_fabric/shared/Customer_Samples.csv
- **Target**: MAAG_LH_Silver.shared.Customer table (the code writes to `shared.Customer` in whichever lakehouse is attached as the notebook's default)
- **Process**: Read CSV, validate schema, load to Delta table

---

In [11]:
# `spark` (the SparkSession) is pre-defined by the notebook runtime,
# so only the column helpers used below are imported.
from pyspark.sql.functions import col, sum as spark_sum

# Configuration: cross-lakehouse ABFS path to the Bronze file, as shown in the Fabric portal
WORKSPACE_NAME = "Fabric_MAAG"
SOURCE_LAKEHOUSE_NAME = "MAAG_LH_Bronze"
SOURCE_PATH = f"abfss://{WORKSPACE_NAME}@onelake.dfs.fabric.microsoft.com/{SOURCE_LAKEHOUSE_NAME}.Lakehouse/Files/samples_fabric/shared/Customer_Samples.csv"

TARGET_SCHEMA = "shared"
TARGET_TABLE = "Customer"
TARGET_FULL_PATH = f"{TARGET_SCHEMA}.{TARGET_TABLE}"

print(f"🔄 Loading Customer data")
print(f"📂 Source: {SOURCE_PATH}")
print(f"🎯 Target: {TARGET_FULL_PATH}")

# Read CSV from Bronze lakehouse
df = spark.read.option("header", "true").option("inferSchema", "true").csv(SOURCE_PATH)

print(f"✅ Data loaded successfully")
print(f"📊 Records: {df.count()}")
print(f"📋 Columns: {df.columns}")

# Display sample data
print(f"\n📖 Sample data:")
df.show(10, truncate=False)

🔄 Loading Customer data
📂 Source: abfss://Fabric_MAAG@onelake.dfs.fabric.microsoft.com/MAAG_LH_Bronze.Lakehouse/Files/samples_fabric/shared/Customer_Samples.csv
🎯 Target: shared.Customer
✅ Data loaded successfully
📊 Records: 513
📋 Columns: ['CustomerId', 'CustomerTypeId', 'CustomerRelationshipTypeId', 'DateOfBirth', 'CustomerEstablishedDate', 'IsActive', 'FirstName', 'LastName', 'Gender', 'PrimaryPhone', 'SecondaryPhone', 'PrimaryEmail', 'SecondaryEmail', 'CreatedBy']

📖 Sample data:
+----------+--------------+--------------------------+-----------+-----------------------+--------+---------+---------+------+--------------+--------------+---------------------+--------------+---------+
|CustomerId|CustomerTypeId|CustomerRelationshipTypeId|DateOfBirth|CustomerEstablishedDate|IsActive|FirstName|LastName |Gender|PrimaryPhone  |SecondaryPhone|PrimaryEmail         |SecondaryEmail|CreatedBy|
+----------+--------------+--------------------------+-----------+-----------------------+--------+----

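The `SOURCE_PATH` in the cell above follows the pattern `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/<relative path>`. As a minimal sketch, the same string could be assembled by a small helper so that other Bronze files reuse the pattern. The name `build_onelake_file_path` is hypothetical; it is not part of this notebook or of any Fabric library, and it does nothing beyond string formatting.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"  # host taken from SOURCE_PATH above


def build_onelake_file_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Assemble an abfss URI for a file under a lakehouse's Files area.

    Hypothetical helper: it only mirrors the string pattern used by
    SOURCE_PATH and does not check that the workspace or file exists.
    """
    # Normalise slashes so "a/b.csv" and "/a/b.csv" give the same URI.
    cleaned = relative_path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Files/{cleaned}"


source_path = build_onelake_file_path(
    "Fabric_MAAG",
    "MAAG_LH_Bronze",
    "samples_fabric/shared/Customer_Samples.csv",
)
print(source_path)
```

Keeping the workspace and lakehouse names as arguments means a second sample file only needs a different relative path.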
In [10]:
# Validate and conform to target schema
print(f"🔍 Validating data quality...")

# Columns expected by the Customer table defined in Model_Shared_Data.ipynb
required_columns = [
    "CustomerId", "CustomerTypeId", "IsActive", "CustomerNamePrefix", "FirstName", "LastName", "MiddleName",
    "Gender", "DateOfBirth", "PrimaryPhone", "SecondaryPhone", "PrimaryEmail", "SecondaryEmail",
    "CustomerEstablishedDate", "CustomerRelationshipTypeId", "CustomerNote", "CreatedBy", "UpdatedBy"
]

# Add an UpdatedBy audit column only when the source does not already provide one
from pyspark.sql import functions as F
if "UpdatedBy" not in df.columns:
    df = df.withColumn("UpdatedBy", F.lit("Source_Data_Loader"))
    print("✅ Added UpdatedBy column with value 'Source_Data_Loader'.")

print(f"✅ Schema reference (required_columns) retained for documentation/model awareness.")

missing_columns = [c for c in required_columns if c not in df.columns]
if missing_columns:
    print(f"⚠️ Warning: Missing columns in source data: {missing_columns}")
else:
    print(f"✅ All required columns present in source data.")

print(f"✅ Schema validation complete (no error raised for missing columns).")

# Data quality checks: count nulls for every column in a single aggregation
null_counts = df.select([spark_sum(col(c).isNull().cast("int")).alias(c) for c in df.columns]).collect()[0]
print(f"\n📊 Data Quality Check:")
for col_name in df.columns:
    null_count = null_counts[col_name]
    if null_count > 0:
        print(f"  {col_name}: {null_count} null values")
    else:
        print(f"  {col_name}: ✅ No nulls")

# Show value distributions for CustomerTypeId
print(f"\n🎯 CustomerTypeId Distribution:")
df.groupBy("CustomerTypeId").count().orderBy("CustomerTypeId").show()

🔍 Validating data quality...
✅ Schema reference (required_columns) retained for documentation/model awareness.
✅ Schema validation complete (no error raised for missing columns).

📊 Data Quality Check:
  CustomerId: ✅ No nulls
  CustomerTypeId: ✅ No nulls
  CustomerRelationshipTypeId: ✅ No nulls
  DateOfBirth: ✅ No nulls
  CustomerEstablishedDate: ✅ No nulls
  IsActive: ✅ No nulls
  FirstName: ✅ No nulls
  LastName: ✅ No nulls
  Gender: ✅ No nulls
  PrimaryPhone: ✅ No nulls
  SecondaryPhone: 379 null values
  PrimaryEmail: ✅ No nulls
  SecondaryEmail: 433 null values
  CreatedBy: ✅ No nulls
  UpdatedBy: ✅ No nulls

🎯 CustomerTypeId Distribution:
+--------------+-----+
|CustomerTypeId|count|
+--------------+-----+
|      Business|  102|
|    Government|   58|
|    Individual|  353|
+--------------+-----+

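The schema check in this cell only warns when required columns are absent. Where a missing column should stop the load instead, a stricter variant could raise an error. The sketch below is plain Python over column-name lists copied from the cell and its output; `find_missing_columns` and `require_columns` are hypothetical names, not functions from this notebook, and the `optional` parameter is an assumption about how tolerated gaps might be expressed.

```python
from typing import Iterable, List


def find_missing_columns(required: Iterable[str], actual: Iterable[str]) -> List[str]:
    """Return the required column names absent from `actual`, in order."""
    present = set(actual)
    return [name for name in required if name not in present]


def require_columns(required, actual, optional=()):
    """Raise ValueError if a required, non-optional column is missing."""
    tolerated = set(optional)
    missing = [c for c in find_missing_columns(required, actual) if c not in tolerated]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


required_columns = [
    "CustomerId", "CustomerTypeId", "IsActive", "CustomerNamePrefix", "FirstName",
    "LastName", "MiddleName", "Gender", "DateOfBirth", "PrimaryPhone",
    "SecondaryPhone", "PrimaryEmail", "SecondaryEmail", "CustomerEstablishedDate",
    "CustomerRelationshipTypeId", "CustomerNote", "CreatedBy", "UpdatedBy",
]

# Source columns as printed by the read cell, plus the UpdatedBy column added above.
source_columns = [
    "CustomerId", "CustomerTypeId", "CustomerRelationshipTypeId", "DateOfBirth",
    "CustomerEstablishedDate", "IsActive", "FirstName", "LastName", "Gender",
    "PrimaryPhone", "SecondaryPhone", "PrimaryEmail", "SecondaryEmail",
    "CreatedBy", "UpdatedBy",
]

missing = find_missing_columns(required_columns, source_columns)
print(missing)  # ['CustomerNamePrefix', 'MiddleName', 'CustomerNote']

# Passes only because the three absent columns are declared optional here.
require_columns(required_columns, source_columns, optional=missing)
```

Calling `require_columns` without the `optional` argument would raise for this sample file, which is the behaviour a strict load would want.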


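Raw null counts are easier to judge as a share of the row count. A minimal sketch, using only the figures printed above (513 records, with 379 and 433 nulls in the two secondary contact columns); `null_rates` is a hypothetical helper name and works on plain Python numbers rather than on the Spark DataFrame.

```python
def null_rates(null_counts: dict, total_rows: int) -> dict:
    """Return the percentage of nulls per column, skipping columns with none."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return {
        column: round(100.0 * count / total_rows, 1)
        for column, count in null_counts.items()
        if count > 0
    }


# Counts copied from the Data Quality Check output above.
rates = null_rates(
    {"PrimaryPhone": 0, "SecondaryPhone": 379, "SecondaryEmail": 433},
    total_rows=513,
)
print(rates)  # {'SecondaryPhone': 73.9, 'SecondaryEmail': 84.4}
```

Both columns are optional contact fields, so high null rates here are plausible rather than a sign of a broken load.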
In [12]:
# Load data to Silver table
print(f"💾 Loading data to Silver table: {TARGET_FULL_PATH}")

try:
    df.write \
      .format("delta") \
      .mode("overwrite") \
      .option("overwriteSchema", "true") \
      .saveAsTable(TARGET_FULL_PATH)

    print(f"✅ Data loaded successfully to {TARGET_FULL_PATH}")

    # Verify the load
    result_count = spark.sql(f"SELECT COUNT(*) as count FROM {TARGET_FULL_PATH}").collect()[0]["count"]
    print(f"📊 Records in target table: {result_count}")

    # Show sample of loaded data
    print(f"\n📖 Sample from Silver table:")
    spark.sql(f"SELECT * FROM {TARGET_FULL_PATH} ORDER BY CustomerId").show(10, truncate=False)

    print(f"🎉 Customer data load complete!")

except Exception as e:
    # Report the failure, then re-raise so the notebook run is marked as failed
    print(f"❌ Error loading data to table: {e}")
    raise

💾 Loading data to Silver table: shared.Customer
✅ Data loaded successfully to shared.Customer
📊 Records in target table: 513

📖 Sample from Silver table:
+----------+--------------+--------------------------+-----------+-----------------------+--------+---------+---------+------+--------------+--------------+---------------------+--------------+---------+
|CustomerId|CustomerTypeId|CustomerRelationshipTypeId|DateOfBirth|CustomerEstablishedDate|IsActive|FirstName|LastName |Gender|PrimaryPhone  |SecondaryPhone|PrimaryEmail         |SecondaryEmail|CreatedBy|
+----------+--------------+--------------------------+-----------+-----------------------+--------+---------+---------+------+--------------+--------------+---------------------+--------------+---------+
|CID-001   |Individual    |VIP                       |1960-03-18 |2021-02-08             |true    |Tsehayetu|Abera    |Female|(985) 555-0158|NULL          |tsehayetu@contoso.com|NULL          |Sales    |
|CID-002   |Government    |Loc