# Silver Layer Scripting: Transformation Notebook

This notebook focuses exclusively on transforming the **ERP customer** dataset (customer ID, birth date, gender) from the Bronze layer into a clean and trusted Silver table.
Each transformation targets data quality, consistency, and analytics readiness.

**Dataset full name**: bike_lakehouse.bronze.erp_cust_az12


### Load Functions and Libraries

In [0]:
import pyspark.sql.functions as F
from pyspark.sql.functions import col, trim

# StringType is used below to detect string columns in the schema
from pyspark.sql.types import StringType

### Load Bronze Table
Read the Bronze table into a Spark DataFrame to begin transformations.

In [0]:
df = spark.table('bike_lakehouse.bronze.erp_cust_az12')

In [0]:
df.limit(10).display()

CID,BDATE,GEN
NASAW00011000,1971-10-06,Male
NASAW00011001,1976-05-10,Male
NASAW00011002,1971-02-09,Male
NASAW00011003,1973-08-14,Female
NASAW00011004,1979-08-05,Female
NASAW00011005,1976-08-01,Male
NASAW00011006,1976-12-02,Female
NASAW00011007,1969-11-06,Male
NASAW00011008,1975-07-04,Female
NASAW00011009,1969-09-29,Male


### Trim String Columns
Automatically remove leading/trailing spaces from all string columns.

In [0]:
for field in df.schema.fields:
    # Check the column's data type, not its name (the name is always a plain str)
    if isinstance(field.dataType, StringType):
        df = df.withColumn(field.name, trim(col(field.name)))
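The column selection can also be factored into a small pure function, which makes it easy to unit test without a Spark session. The sketch below assumes the `(name, type)` pairs returned by `df.dtypes`, where string columns are reported as `'string'`. The helper name `string_columns` and the example schema are hypothetical, not taken from the Bronze table.

```python
def string_columns(dtypes):
    """Return the names of columns whose Spark type string is 'string'.

    `dtypes` is a list of (column_name, type_string) pairs, the shape
    that `df.dtypes` returns in PySpark.
    """
    return [name for name, dtype in dtypes if dtype == "string"]


# Hypothetical schema for illustration only; the real Bronze types may differ.
example_dtypes = [("CID", "string"), ("BDATE", "date"), ("GEN", "string")]
columns_to_trim = string_columns(example_dtypes)
```

In the notebook this would be called as `string_columns(df.dtypes)`, and the resulting names passed to the same `withColumn(..., trim(...))` loop.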

### Customer ID Cleanup

In [0]:
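df = (
    df.withColumn(
        'CID',
        # Drop the leading 'NAS' prefix: keep everything from the 4th character on
        # (Spark's substring is 1-based); IDs without the prefix pass through unchanged
        F.when(col('CID').startswith('NAS'), F.substring(col('CID'), 4, F.length(col('CID'))))
        .otherwise(col('CID'))
    )
)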
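As a reference for what this rule should produce, here is a minimal plain-Python sketch of the same logic, intended for checking expected values in unit tests rather than for running on the cluster. The function name `strip_nas_prefix` is hypothetical; the sample ID comes from the Bronze preview above.

```python
def strip_nas_prefix(cid):
    """Mirror of the Spark rule: remove a leading 'NAS', otherwise keep the value.

    None is returned unchanged, matching the .otherwise(col('CID')) branch
    that Spark takes when CID is null.
    """
    if cid is not None and cid.startswith("NAS"):
        # Python slicing is 0-based, so index 3 equals Spark's 1-based position 4
        return cid[3:]
    return cid


cleaned = strip_nas_prefix("NASAW00011000")
```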

### Birthdate Validation

In [0]:
df = df.withColumn(
    'BDATE',
    # A birthdate in the future cannot be valid, so replace it with null
    F.when(col('BDATE') > F.current_date(), None)
    .otherwise(col('BDATE'))
)
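The rule is a strict comparison: a birthdate equal to today is kept, and only dates after today are nulled. A minimal plain-Python sketch of that behaviour, assuming the value is already a `datetime.date`, is shown below. The function name `validate_birth_date` and the injectable `today` parameter are hypothetical additions that make the rule deterministic in tests.

```python
from datetime import date


def validate_birth_date(bdate, today=None):
    """Return None for birthdates strictly after `today`, else the date itself."""
    if today is None:
        today = date.today()
    if bdate is not None and bdate > today:
        return None
    return bdate


# Fixed reference date so the example does not depend on when it runs
reference_day = date(2024, 1, 1)
kept = validate_birth_date(date(1971, 10, 6), today=reference_day)
rejected = validate_birth_date(date(2050, 1, 1), today=reference_day)
```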

### Gender Normalization

In [0]:
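df = df.withColumn(
    'GEN',
    # Map known spellings (case-insensitive) to a standard label;
    # nulls and unrecognized values fall through to 'n/a'
    F.when(F.upper(col('GEN')).isin('MALE', 'M'), 'Male')
    .when(F.upper(col('GEN')).isin('FEMALE', 'F'), 'Female')
    .otherwise('n/a')
)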
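The mapping can be summarized in plain Python as a quick reference for expected outputs. This is a sketch of the rule only, with a hypothetical function name `normalize_gender`. It assumes, as in the notebook, that whitespace has already been removed by the trim step, so it does not strip the input itself.

```python
def normalize_gender(gen):
    """Map 'M'/'MALE' and 'F'/'FEMALE' (any case) to labels, everything else to 'n/a'."""
    value = gen.upper() if gen is not None else None
    if value in ("MALE", "M"):
        return "Male"
    if value in ("FEMALE", "F"):
        return "Female"
    # Covers None, empty strings, and any unrecognized spelling
    return "n/a"


labels = [normalize_gender(g) for g in ["Male", "f", "FEMALE", "", None, "unknown"]]
```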

### Renaming Columns

In [0]:
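
# Keys match the Bronze column names exactly, so the rename also works
# when spark.sql.caseSensitive is enabled
RENAME_MAP = {
    "CID": "customer_number",
    "BDATE": "birth_date",
    "GEN": "gender"
}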

In [0]:
for old_name, new_name in RENAME_MAP.items():

    df = df.withColumnRenamed(old_name , new_name)
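Because `withColumnRenamed` is a no-op when the source column does not exist, a misspelled key in the map fails silently. A small guard like the one below can catch that before the loop runs. It is a sketch under stated assumptions: the helper name `missing_rename_keys` is hypothetical, and the `case_sensitive` flag is meant to mirror Spark's `spark.sql.caseSensitive` setting, which defaults to case-insensitive matching.

```python
def missing_rename_keys(columns, rename_map, case_sensitive=False):
    """Return the rename-map keys that do not match any existing column."""
    if case_sensitive:
        present = set(columns)
        return [key for key in rename_map if key not in present]
    present = {column.lower() for column in columns}
    return [key for key in rename_map if key.lower() not in present]


# Illustrative inputs; in the notebook `columns` would be df.columns
example_columns = ["CID", "BDATE", "GEN"]
example_map = {"cid": "customer_number", "bdat": "birth_date"}
unmatched = missing_rename_keys(example_columns, example_map)
```

A typical use is to raise an error when the returned list is non-empty, so that a typo in the map stops the run instead of leaving a Bronze column name in the Silver table.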

### Sanity Check of the DataFrame
Quickly check the result of the transformations before moving forward with the DataFrame.

In [0]:
df.limit(10).display()

customer_number,birth_date,gender
AW00011000,1971-10-06,Male
AW00011001,1976-05-10,Male
AW00011002,1971-02-09,Male
AW00011003,1973-08-14,Female
AW00011004,1979-08-05,Female
AW00011005,1976-08-01,Male
AW00011006,1976-12-02,Female
AW00011007,1969-11-06,Male
AW00011008,1975-07-04,Female
AW00011009,1969-09-29,Male


### Write Silver Table
Persist the cleaned DataFrame as a Delta table in the Silver layer.


In [0]:
df.write.mode('overwrite').format('delta').saveAsTable('bike_lakehouse.silver.erp_customers')