#Silver Layer Scripting : Transformation Notebook

This notebook focuses exclusively on transforming the **customer information** dataset from the Bronze layer into a clean and trusted Silver table.
Each transformation step targets data quality, consistency, and analytics readiness.

**Full dataset name**: bike_lakehouse.bronze.crm_cust_info



###Load functions and libraries 

In [0]:
# Import the Spark SQL functions module under the alias F
import pyspark.sql.functions as F

# Import StringType so string columns can be detected explicitly
from pyspark.sql.types import StringType

# Import specific functions directly so they can be used without the F. prefix
from pyspark.sql.functions import trim, col

# Note: col("name") refers to a DataFrame column by name, similar to df["name"]

###Load Bronze Table 
Read the Bronze table into a Spark DataFrame to begin transformations.

In [0]:
# spark.table() returns the catalog table as a DataFrame, assigned here to df.
df = spark.table("bike_lakehouse.bronze.crm_cust_info")

###Trim String Columns
Automatically remove leading/trailing spaces from all string columns.

In [0]:
for field in df.schema.fields:
    if isinstance(field.dataType, StringType):
        df = df.withColumn(field.name, trim(col(field.name)))

# Note: df.schema.fields returns metadata about the DataFrame structure (column name, data type, nullability)

###Normalize Categorical Columns
Convert coded values to readable, standardized categories.

In [0]:
df = (
    df.withColumn(
        'cst_marital_status',
        F.when(F.upper(F.col('cst_marital_status')) == 'S', 'Single')
        .when(F.upper(F.col('cst_marital_status')) == 'M', 'Married')
        .otherwise('n/a')
    )
    .withColumn(
        'cst_gndr',
        F.when(F.upper(F.col('cst_gndr')) == 'M', 'Male')
        .when(F.upper(F.col('cst_gndr')) == 'F', 'Female')
        .otherwise('n/a')
    )
)

# Note: withColumn() creates or replaces a column and returns a new DataFrame
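
The `when`/`otherwise` chain above can be read as a case-insensitive lookup with a fallback. The plain-Python sketch below mirrors that logic outside Spark so the edge cases are easy to see. `normalize_code`, `MARITAL_STATUS_CODES`, and `GENDER_CODES` are hypothetical names used only for illustration; they are not part of the notebook.

```python
# Illustrative sketch only: plain Python, not Spark. All names are hypothetical.
MARITAL_STATUS_CODES = {"S": "Single", "M": "Married"}
GENDER_CODES = {"M": "Male", "F": "Female"}


def normalize_code(value, mapping, default="n/a"):
    """Map a coded value to its label, ignoring case; fall back to default."""
    if value is None:
        # In Spark, a comparison against null is never true,
        # so such rows end up in the otherwise() branch.
        return default
    return mapping.get(value.upper(), default)


samples = ["s", "M", "x", "", None]
print([normalize_code(v, MARITAL_STATUS_CODES) for v in samples])
```

Nulls, empty strings, and unexpected codes all collapse into `n/a`, so the Silver table can no longer tell them apart. If that distinction matters downstream, it may be worth counting the unmapped codes before normalizing.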

###Remove Rows with Null Keys
Filter out records with null primary keys to maintain referential integrity.

In [0]:
df = df.filter(col('cst_id').isNotNull())

# Note: .filter()     keeps only rows where the condition is True.
#       .isNotNull()  checks whether the column value is not null.

###Rename Columns
Standardize column names across the dataset using a mapping dictionary.

In [0]:
RENAME_MAP = {
    "cst_id": "customer_id",
    "cst_key": "customer_number",
    "cst_firstname": "first_name",
    "cst_lastname": "last_name",
    "cst_marital_status": "marital_status",
    "cst_gndr": "gender",
    "cst_create_date": "created_date"
}

In [0]:
for old_name, new_name in RENAME_MAP.items():
    df = df.withColumnRenamed(old_name, new_name)

# Note: withColumnRenamed() renames a column and returns a new DataFrame
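
`withColumnRenamed()` does nothing when the source column is absent, so a typo in `RENAME_MAP` or a schema change in Bronze would pass silently. A minimal guard, sketched here with a hypothetical helper `missing_source_columns`, compares the mapping keys against a list of column names. In the notebook that list would presumably be `df.columns`, evaluated before the rename loop; the example values below are stand-ins.

```python
def missing_source_columns(columns, rename_map):
    """Return the mapping keys that are not present in the given column names."""
    present = set(columns)
    return [old for old in rename_map if old not in present]


# Stand-in for df.columns; the real list would come from the Bronze DataFrame.
example_columns = ["cst_id", "cst_key", "cst_firstname", "cst_lastname"]
# Shortened, hypothetical mapping used only for this example.
example_map = {"cst_id": "customer_id", "cst_gndr": "gender"}

missing = missing_source_columns(example_columns, example_map)
if missing:
    print(f"Columns not found, rename would be skipped: {missing}")
```

Whether a missing column should stop the notebook or only log a warning is a design choice; raising an error is the stricter option for a Silver table that other jobs depend on.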

###Sanity Check of the DataFrame
Quickly inspect the result of the transformations before writing the DataFrame.

In [0]:
df.limit(10).display()


customer_id,customer_number,first_name,last_name,marital_status,gender,created_date
11000,AW00011000,Jon,Yang,Married,Male,2025-10-06
11001,AW00011001,Eugene,Huang,Single,Male,2025-10-06
11002,AW00011002,Ruben,Torres,Married,Male,2025-10-06
11003,AW00011003,Christy,Zhu,Single,Female,2025-10-06
11004,AW00011004,Elizabeth,Johnson,Single,Female,2025-10-06
11005,AW00011005,Julio,Ruiz,Single,Male,2025-10-06
11006,AW00011006,Janet,Alvarez,Single,Female,2025-10-06
11007,AW00011007,Marco,Mehta,Married,Male,2025-10-06
11008,AW00011008,Rob,Verhoff,Single,Female,2025-10-06
11009,AW00011009,Shannon,Carlson,Single,Male,2025-10-06
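
Beyond eyeballing ten rows, a small check can confirm that the normalized columns hold only the expected labels. The sketch below uses a hypothetical helper `unexpected_values` on a plain list. In the notebook the input could be collected with something like `[r[0] for r in df.select("gender").distinct().collect()]`, which is an assumption about how one might wire it in, not part of the original cells.

```python
def unexpected_values(values, allowed):
    """Return the distinct values that fall outside the allowed set."""
    # key=str keeps the sort stable even if a None slips through
    return sorted({v for v in values if v not in allowed}, key=str)


# Labels produced by the normalization step in this notebook
ALLOWED_GENDER = {"Male", "Female", "n/a"}
ALLOWED_MARITAL_STATUS = {"Single", "Married", "n/a"}

# Stand-in for the distinct values of the gender column
observed = ["Male", "Female", "n/a", "Male"]
problems = unexpected_values(observed, ALLOWED_GENDER)
print(problems if problems else "gender column contains only expected labels")
```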


###Write Silver Table
Persist the cleaned DataFrame as a Delta table in the Silver layer.

In [0]:
df.write.mode("overwrite").format("delta").saveAsTable("bike_lakehouse.silver.crm_customers")

# Notes:
# - .mode("overwrite") : defines how Spark handles existing data ("overwrite" replaces the existing table contents)
# - .format("delta")   : sets the storage format to Delta
# - .saveAsTable(..)   : creates the managed table, or replaces it when it already exists