# Load Silver Table to Gold Table - Invoice

## Overview
Load Invoice data from the Silver lakehouse table into the Gold lakehouse table.

## Data Flow
- **Source**: MAAG_LH_Silver.finance.Invoice (Silver lakehouse table)
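- **Target**: MAAG_LH_Gold.finance.Invoice (Gold lakehouse, attached to the notebook as the default lakehouse)
- **Process**: Read the Silver table, add a `GoldLoadTimestamp` audit column, replace null or blank `CreatedBy` values with `"Sample script"`, check for duplicate `InvoiceId` values and nulls in key fields, then overwrite the Gold Delta table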
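The first code cell builds the Silver source path inline from four settings. If other notebooks in this Silver-to-Gold flow need the same pattern, a small helper keeps it in one place. The sketch below is hypothetical (`onelake_table_path` is not part of the original notebook) and assumes the schema-enabled lakehouse layout `<Lakehouse>.Lakehouse/Tables/<schema>/<table>` that this notebook already uses.

```python
# Hypothetical helper (not part of the original notebook): builds the same
# OneLake ABFSS table path that the configuration cell assembles inline.
def onelake_table_path(workspace: str, lakehouse: str, schema: str, table: str) -> str:
    """Return the abfss:// path of a Delta table in a schema-enabled lakehouse."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{schema}/{table}"
    )


# Usage with the notebook's own configuration values
path = onelake_table_path("Fabric_MAAG", "MAAG_LH_Silver", "finance", "invoice")
print(path)
# abfss://Fabric_MAAG@onelake.dfs.fabric.microsoft.com/MAAG_LH_Silver.Lakehouse/Tables/finance/invoice
```

The printed value matches the `📂 Source:` line in the notebook's output, so swapping the inline f-string for this helper would not change which table is read.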


In [2]:
```python
# Only the functions used later in the notebook are imported
# (pandas, os, and pyspark.sql.types were unused)
from pyspark.sql.functions import col, sum as spark_sum, current_timestamp

# Configuration - Silver to Gold data flow
WORKSPACE_NAME = "Fabric_MAAG"
SOURCE_LAKEHOUSE_NAME = "MAAG_LH_Silver"
SOURCE_SCHEMA = "finance"
SOURCE_TABLE = "invoice"

# Source: absolute OneLake (ABFSS) path to the Silver lakehouse table
SOURCE_TABLE_PATH = f"abfss://{WORKSPACE_NAME}@onelake.dfs.fabric.microsoft.com/{SOURCE_LAKEHOUSE_NAME}.Lakehouse/Tables/{SOURCE_SCHEMA}/{SOURCE_TABLE}"

# Target: Gold lakehouse (attached as the notebook's default lakehouse)
TARGET_SCHEMA = "finance"
TARGET_TABLE = "invoice"
TARGET_FULL_PATH = f"{TARGET_SCHEMA}.{TARGET_TABLE}"

print("🔄 Loading Invoice from Silver to Gold")
print(f"📂 Source: {SOURCE_TABLE_PATH}")
print(f"🎯 Target: {TARGET_FULL_PATH}")
print("=" * 50)

# Read from the Silver lakehouse table (`spark` is provided by the notebook session)
df = spark.read.format("delta").load(SOURCE_TABLE_PATH)

print("✅ Data loaded from Silver table")
print(f"📊 Records: {df.count()}")
print(f"📋 Columns: {df.columns}")

# Display sample data
print("\n📖 Sample data from Silver:")
df.show(10, truncate=False)
```

```text
🔄 Loading Invoice from Silver to Gold
📂 Source: abfss://Fabric_MAAG@onelake.dfs.fabric.microsoft.com/MAAG_LH_Silver.Lakehouse/Tables/finance/invoice
🎯 Target: finance.invoice
✅ Data loaded from Silver table
📊 Records: 3619
📋 Columns: ['InvoiceId', 'InvoiceNumber', 'CustomerId', 'OrderId', 'InvoiceDate', 'DueDate', 'SubTotal', 'TaxAmount', 'TotalAmount', 'InvoiceStatus', 'CreatedBy']

📖 Sample data from Silver:
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+---------+
|InvoiceId                           |InvoiceNumber|CustomerId|OrderId                             |InvoiceDate|DueDate   |SubTotal|TaxAmount|TotalAmount|InvoiceStatus|CreatedBy|
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+---------+
|e59c993d-80cc-4f4c-a9da-feec349b3a13|IN-A100000
```
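The Silver read only prints the column list; nothing stops the notebook if a column disappears upstream. A minimal schema guard, assuming the eleven columns shown in the output above are the expected Silver contract, could fail fast before the Gold step runs. `EXPECTED_SILVER_COLUMNS`, `missing_columns`, and `check_columns` are hypothetical names; in the notebook you would call `check_columns(df.columns)` right after `spark.read`.

```python
# Hypothetical schema guard (not in the original notebook). The column list is
# copied from the Silver output above; adjust it if the Silver contract changes.
EXPECTED_SILVER_COLUMNS = [
    "InvoiceId", "InvoiceNumber", "CustomerId", "OrderId", "InvoiceDate",
    "DueDate", "SubTotal", "TaxAmount", "TotalAmount", "InvoiceStatus", "CreatedBy",
]


def missing_columns(actual: list[str], expected: list[str]) -> list[str]:
    """Return expected columns absent from `actual`, in expected order."""
    present = set(actual)
    return [name for name in expected if name not in present]


def check_columns(actual: list[str], expected: list[str] = EXPECTED_SILVER_COLUMNS) -> None:
    """Raise ValueError if any expected column is missing from `actual`."""
    missing = missing_columns(actual, expected)
    if missing:
        raise ValueError(f"Silver table is missing columns: {missing}")


print(missing_columns(["InvoiceId", "InvoiceNumber"], EXPECTED_SILVER_COLUMNS[:3]))
# ['CustomerId']
```

Checking for missing columns rather than exact equality lets Silver gain columns without breaking this notebook; since the Gold write uses `overwriteSchema`, any extra Silver columns would simply flow through to Gold.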

In [3]:
```python
# --- Gold layer transformations and data quality ---
from pyspark.sql.functions import when, trim

print("🔧 Applying Gold layer transformations...")

# Add a GoldLoadTimestamp audit column and default CreatedBy to "Sample script"
# when it is null or blank: trim() of null stays null, so isNull() catches
# nulls and == "" catches blank strings
created_by_trimmed = trim(col("CreatedBy"))
df_gold = (
    df.withColumn("GoldLoadTimestamp", current_timestamp())
    .withColumn(
        "CreatedBy",
        when(created_by_trimmed.isNull() | (created_by_trimmed == ""), "Sample script")
        .otherwise(col("CreatedBy")),
    )
)

# Data quality checks for Gold layer
print("\n🔍 Gold layer data quality validation...")

# Duplicates: number of distinct InvoiceId values that occur more than once
duplicate_count = df_gold.groupBy("InvoiceId").count().filter(col("count") > 1).count()
if duplicate_count > 0:
    print(f"⚠️ Found {duplicate_count} duplicate InvoiceId values")
else:
    print("✅ No duplicates found")

# Nulls in key fields (sum() returns null on an empty DataFrame, hence `or 0`)
null_checks = df_gold.select(
    spark_sum(col("InvoiceId").isNull().cast("int")).alias("null_invoiceid"),
    spark_sum(col("InvoiceStatus").isNull().cast("int")).alias("null_invoicestatus"),
).collect()[0]
null_invoiceid = null_checks["null_invoiceid"] or 0
null_invoicestatus = null_checks["null_invoicestatus"] or 0

if null_invoiceid > 0 or null_invoicestatus > 0:
    print(f"⚠️ Found nulls: InvoiceId={null_invoiceid}, InvoiceStatus={null_invoicestatus}")
else:
    print("✅ No nulls in key fields")

print("\n📖 Sample Gold data:")
df_gold.show(10, truncate=False)
```

```text
🔧 Applying Gold layer transformations...

🔍 Gold layer data quality validation...
✅ No duplicates found
✅ No nulls in key fields

📖 Sample Gold data:
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+-------------+--------------------------+
|InvoiceId                           |InvoiceNumber|CustomerId|OrderId                             |InvoiceDate|DueDate   |SubTotal|TaxAmount|TotalAmount|InvoiceStatus|CreatedBy    |GoldLoadTimestamp         |
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+-------------+--------------------------+
|e59c993d-80cc-4f4c-a9da-feec349b3a13|IN-A100000   |CID-001   |6c4014de-1b39-4836-93de-700d42bd626d|2024-09-27 |2024-09-27|1853.82 |92.69    |1946.51    |Issued       |Sample script|2025-08-25 20:13:52.344762|
|fef6e776-
```
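The Gold cell applies two rules: null or blank `CreatedBy` values become `"Sample script"`, and the checks count duplicate `InvoiceId` values and nulls in key fields. The pure-Python sketch below mirrors those rules on plain row dictionaries so they can be unit-tested without a Spark session; it illustrates the logic and is not a replacement for the PySpark code. The function names and the sample rows are hypothetical. It makes two details explicit: Spark's `trim` removes only leading and trailing space characters (as the Spark SQL function reference describes it), and `groupBy(...).count().filter(col("count") > 1).count()` returns the number of distinct IDs that repeat, not the number of surplus rows.

```python
from collections import Counter
from typing import Iterable, Optional

DEFAULT_CREATED_BY = "Sample script"  # same literal the notebook writes


def default_created_by(value: Optional[str]) -> str:
    """Mirror of the CreatedBy rule: null or blank values get the default.

    Spark's trim() removes leading/trailing space characters only, so this
    uses strip(" ") instead of strip(), which would also drop tabs/newlines.
    Otherwise the original (untrimmed) value is kept, as with .otherwise().
    """
    if value is None or value.strip(" ") == "":
        return DEFAULT_CREATED_BY
    return value


def quality_report(rows: Iterable[dict], key: str, required: list[str]) -> dict:
    """Mirror of the duplicate and null checks on a list of row dicts.

    duplicate_keys counts distinct key values that occur more than once,
    matching groupBy(key).count().filter(count > 1).count(); it is not the
    number of surplus rows. As with Spark's groupBy, None forms one group.
    """
    rows = list(rows)
    key_counts = Counter(row.get(key) for row in rows)
    duplicate_keys = sum(1 for n in key_counts.values() if n > 1)
    nulls = {name: sum(1 for row in rows if row.get(name) is None) for name in required}
    return {"duplicate_keys": duplicate_keys, "nulls": nulls}


# Hypothetical sample rows (not taken from the Silver table)
sample = [
    {"InvoiceId": "a", "InvoiceStatus": "Issued", "CreatedBy": "  "},
    {"InvoiceId": "a", "InvoiceStatus": None, "CreatedBy": None},
    {"InvoiceId": "a", "InvoiceStatus": "Issued", "CreatedBy": "etl_user"},
    {"InvoiceId": "b", "InvoiceStatus": "Issued", "CreatedBy": "\t"},
]
print([default_created_by(row["CreatedBy"]) for row in sample])
# ['Sample script', 'Sample script', 'etl_user', '\t']
print(quality_report(sample, "InvoiceId", ["InvoiceId", "InvoiceStatus"]))
# {'duplicate_keys': 1, 'nulls': {'InvoiceId': 0, 'InvoiceStatus': 1}}
```

Note the tab-only `CreatedBy` value survives the rule; if such values should also get the default, the PySpark condition would need a whitespace-aware comparison rather than `trim`.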

In [4]:
```python
# --- Load data to Gold table ---
print(f"💾 Loading data to Gold table: {TARGET_FULL_PATH}")

try:
    # Write to the Gold Delta table in the default lakehouse. mode("overwrite")
    # replaces the existing rows; overwriteSchema lets the table schema follow
    # df_gold (including the added GoldLoadTimestamp column).
    (
        df_gold.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(TARGET_FULL_PATH)
    )

    print("✅ Data loaded successfully to Gold table")

    # Verify the load
    result_count = spark.sql(f"SELECT COUNT(*) AS row_count FROM {TARGET_FULL_PATH}").collect()[0]["row_count"]
    print(f"📊 Records in Gold table: {result_count}")

    # Show a sample of the loaded Gold data
    print("\n📖 Sample from Gold table:")
    spark.sql(f"SELECT * FROM {TARGET_FULL_PATH} ORDER BY InvoiceId").show(10, truncate=False)

    print("🎉 Silver to Gold data load complete!")

except Exception as e:
    print(f"❌ Error loading data to Gold table: {e}")
    raise
```

```text
💾 Loading data to Gold table: finance.invoice
✅ Data loaded successfully to Gold table
📊 Records in Gold table: 3619

📖 Sample from Gold table:
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+-------------+--------------------------+
|InvoiceId                           |InvoiceNumber|CustomerId|OrderId                             |InvoiceDate|DueDate   |SubTotal|TaxAmount|TotalAmount|InvoiceStatus|CreatedBy    |GoldLoadTimestamp         |
+------------------------------------+-------------+----------+------------------------------------+-----------+----------+--------+---------+-----------+-------------+-------------+--------------------------+
|000065ef-8ff9-49d6-a92a-b2100ca8b166|IN-F101587   |CID-448   |1dec299d-9af2-4e89-b109-84be24fb65f6|2022-06-22 |2022-06-22|7057.91 |352.89   |7410.8     |Issued       |Sample script|2025-08-25 20:14:13.006272|
|0002da04-349b-4
```
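The verification step prints the Gold row count but never compares it with the Silver input. Because the Gold step adds columns without filtering rows and the write uses `mode("overwrite")`, the two counts are expected to match. A minimal reconciliation sketch follows; `verify_row_count` is a hypothetical name, not part of the original notebook.

```python
def verify_row_count(source_count: int, target_count: int) -> None:
    """Raise ValueError if the Gold row count differs from the Silver count.

    The Gold step adds columns but never filters rows, and the write uses
    mode("overwrite"), so the two counts are expected to be equal.
    """
    if source_count != target_count:
        raise ValueError(
            f"Row count mismatch: Silver={source_count}, Gold={target_count}"
        )


# Counts from the run above: Silver 3619, Gold 3619, so this passes silently.
verify_row_count(3619, 3619)
```

In the notebook this could sit right after `result_count` is computed, for example `verify_row_count(df.count(), result_count)`, so the existing `except` block reports a mismatch and re-raises. Because `df` is never cached, `df.count()` reads the Silver table again, so a write to Silver between the two reads could produce a false mismatch.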