# Silver → Gold: Customer Dimension

## Purpose
Build the customer master dimension table (`dim_customers`) for the Gold-layer star schema.

## Source
- Silver: `customers_clean`

## Output
- Gold: `dim_customers`
- Grain: One row per `customer_id` (exposed as `customer_key`); `customer_unique_id` is kept as an attribute
- Type: SCD Type 1 (current state only)

**Author:** Kevin  
**Date:** Feb 9, 2026
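
To make the "SCD Type 1 (current state only)" choice concrete, here is a minimal plain-Python sketch of Type 1 semantics: an incoming row replaces the stored attributes for its key, and no history row is kept. The function name `apply_scd_type1` and the sample rows are made up for illustration. This notebook gets the same effect more simply, by rewriting the whole table on every run.

```python
from typing import Dict, Iterable, Mapping


def apply_scd_type1(
    current: Mapping[str, dict],
    incoming: Iterable[dict],
    key: str = "customer_key",
) -> Dict[str, dict]:
    """Return a new snapshot in which incoming rows overwrite existing ones."""
    # Copy so the caller's snapshot is left untouched.
    merged = {k: dict(v) for k, v in current.items()}
    for row in incoming:
        # Type 1: replace the stored attributes outright; no history is kept.
        merged[row[key]] = dict(row)
    return merged


# Illustrative data only (not taken from the Silver table).
current = {"c1": {"customer_key": "c1", "customer_city": "Camboriu"}}
incoming = [
    {"customer_key": "c1", "customer_city": "Sao Paulo"},  # changed attribute
    {"customer_key": "c2", "customer_city": "Curitiba"},   # new customer
]
snapshot = apply_scd_type1(current, incoming)
```

After the merge, `c1` carries only its latest city. The previous value is gone, which is the trade-off a Type 1 dimension accepts.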


In [0]:
from pyspark.sql.functions import col, current_timestamp

storage_account_name = "stgolistmigration"
# Left blank here; supply the key from a secret store instead of hard-coding it.
account_key = ""

spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net",
    account_key
)

def get_silver_path(table):
    return f"abfss://silver@{storage_account_name}.dfs.core.windows.net/{table}/"

def get_gold_path(table):
    return f"abfss://gold@{storage_account_name}.dfs.core.windows.net/{table}/"

print("✅ Config loaded")


✅ Config loaded
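
The config cell leaves `account_key` blank and defines one path helper per layer. Here is a hedged sketch of one way to tighten both, assuming the key is available in an environment variable. The variable name `AZURE_STORAGE_ACCOUNT_KEY` and the helpers `load_account_key` and `get_layer_path` are hypothetical, not part of the original notebook. On Databricks a secret scope would be the more usual source for the key.

```python
import os


def load_account_key(env_var: str = "AZURE_STORAGE_ACCOUNT_KEY") -> str:
    """Read the storage key from an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "")
    if not key:
        # Fail early instead of passing an empty key to spark.conf.set.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


def get_layer_path(layer: str, table: str,
                   account: str = "stgolistmigration") -> str:
    """Build the abfss:// path for a table in the given layer."""
    if layer not in {"silver", "gold"}:
        raise ValueError(f"Unknown layer: {layer!r}")
    return f"abfss://{layer}@{account}.dfs.core.windows.net/{table}/"
```

With this, `get_layer_path("silver", "customers_clean")` produces the same string as `get_silver_path("customers_clean")`, and a typo in the layer name fails immediately rather than at read time.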


In [0]:
print("📖 Loading customers from Silver...")

customers_path = get_silver_path("customers_clean")
df_customers = spark.read.format("delta").load(customers_path)

print(f"✅ Loaded: {df_customers.count():,} customers")


📖 Loading customers from Silver...
✅ Loaded: 99,441 customers
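
The dimension depends on five Silver columns, so a quick check before the transform can give a clearer error than a failed `select`. This is a small sketch. The helper `missing_columns` is hypothetical, and the required list is taken from the columns the next cell selects.

```python
REQUIRED_COLUMNS = ["customer_id", "customer_unique_id", "zip_code", "city", "state"]


def missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# In the notebook this would be called as missing_columns(df_customers.columns).
# The list below is made up to show the failure case.
example_columns = ["customer_id", "customer_unique_id", "zip_code", "city"]
missing = missing_columns(example_columns)
if missing:
    print(f"Missing Silver columns: {missing}")
```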


In [0]:
print("🔄 Creating customer dimension...")

# Rename Silver columns to the dimension's naming convention and stamp the load time.
df_dim_customers = (
    df_customers
    .select(
        col("customer_id").alias("customer_key"),
        "customer_unique_id",
        col("zip_code").alias("customer_zip_code"),
        col("city").alias("customer_city"),
        col("state").alias("customer_state"),
    )
    .withColumn("dim_load_timestamp", current_timestamp())
)

print(f"✅ Dimension created: {df_dim_customers.count():,} customers")

# Show sample
df_dim_customers.limit(3).show(truncate=False, vertical=True)


🔄 Creating customer dimension...
✅ Dimension created: 99,441 customers
-RECORD 0----------------------------------------------
 customer_key       | cb96e80748675729d4d3857321cacb26 
 customer_unique_id | 67626c49069bdf05ff127d59a6649948 
 customer_zip_code  | 88348                            
 customer_city      | Camboriu                         
 customer_state     | SC                               
 dim_load_timestamp | 2026-02-09 13:22:39.312859       
-RECORD 1----------------------------------------------
 customer_key       | 279a316b31aac761c16483859fa9d33b 
 customer_unique_id | 7fe108cef30f4cca2f8696a711a07d0b 
 customer_zip_code  | 4003.                            
 customer_city      | Sao Paulo                        
 customer_state     | SP                               
 dim_load_timestamp | 2026-02-09 13:22:39.312859       
-RECORD 2----------------------------------------------
 customer_key       | 1914576a74b2122c63a456293cf7e52c 
 customer_unique_id | 8574faf241b
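
The sample shows `4003.` as the zip code of a Sao Paulo customer. That may mean the value went through a numeric cast upstream and lost a leading zero. Here is a hedged sketch of a normalizer, assuming the column is meant to hold a five-digit prefix. That width is an assumption, and `normalize_zip_prefix` is a hypothetical helper. Confirm the intended format against the Silver layer before applying anything like it.

```python
import re


def normalize_zip_prefix(raw, width: int = 5):
    """Return a zero-padded digit string, or None if the value is unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    # Drop a trailing decimal part such as "4003." or "4003.0".
    text = re.sub(r"\.0*$", "", text)
    # Accept only plain ASCII digits that fit in the assumed width.
    if not (text.isascii() and text.isdigit()) or len(text) > width:
        return None
    return text.zfill(width)


print(normalize_zip_prefix("4003."))   # 04003
print(normalize_zip_prefix("88348"))   # 88348
```

In Spark the same rule could be wrapped in a UDF or expressed with built-in string functions. The plain function is shown here so the rule itself is easy to test.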

In [0]:
output_path = get_gold_path("dim_customers")

print(f"💾 Writing to: {output_path}")

df_dim_customers.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(output_path)

print("✅ dim_customers complete!")
print(f"   Rows: {df_dim_customers.count():,}")


💾 Writing to: abfss://gold@stgolistmigration.dfs.core.windows.net/dim_customers/
✅ dim_customers complete!
   Rows: 99,441


In [0]:
print("🔍 Verifying...")

df_verify = spark.read.format("delta").load(output_path)

print(f"✅ Verified: {df_verify.count():,} customers")

print("\nCustomers by state:")
df_verify.groupBy("customer_state").count().orderBy(col("count").desc()).limit(10).show()

print("🎉 dim_customers → Gold complete!")


🔍 Verifying...
✅ Verified: 99,441 customers

Customers by state:
+--------------+-----+
|customer_state|count|
+--------------+-----+
|            SP|41746|
|            RJ|12852|
|            MG|11635|
|            RS| 5466|
|            PR| 5045|
|            SC| 3637|
|            BA| 3380|
|            DF| 2140|
|            ES| 2033|
|            GO| 2020|
+--------------+-----+

🎉 dim_customers → Gold complete!
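
The verification cell confirms the row count and the state distribution, but it does not check that `customer_key` is unique and non-null, which the stated grain relies on. Here is a minimal sketch of such a check over a list of keys. The helpers `duplicate_keys` and `check_dimension` are hypothetical. In the notebook the keys could be collected from `df_verify.select("customer_key")`, which should be fine at this table size.

```python
from collections import Counter


def duplicate_keys(keys):
    """Return {key: occurrences} for every key seen more than once."""
    counts = Counter(keys)
    return {k: n for k, n in counts.items() if n > 1}


def check_dimension(keys, expected_rows):
    """Return a list of problems; an empty list means all checks passed."""
    keys = list(keys)
    problems = []
    if len(keys) != expected_rows:
        problems.append(f"row count {len(keys)} != expected {expected_rows}")
    if any(k is None for k in keys):
        problems.append("null customer_key found")
    dupes = duplicate_keys(k for k in keys if k is not None)
    if dupes:
        problems.append(f"{len(dupes)} duplicated customer_key value(s)")
    return problems


# Illustrative keys only.
print(check_dimension(["a", "b", "c"], expected_rows=3))   # []
print(check_dimension(["a", "a", None], expected_rows=3))
```

Returning a list of problems rather than raising on the first one lets a single run report every failed check.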
