# Silver Layer Scripting: Transformation Notebook

This notebook focuses exclusively on transforming the **sales details** dataset from the Bronze layer into a clean and trusted Silver table.
Each transformation ensures data quality, consistency, and analytics readiness.

**Dataset full name**: bike_lakehouse.bronze.crm_sales_details


### Load Functions and Libraries

In [0]:
import pyspark.sql.functions as F
from pyspark.sql.functions import col, trim, length

from pyspark.sql.types import StringType, DateType

### Load Bronze Table
Read the Bronze table into a Spark DataFrame to begin transformations.

In [0]:
df = spark.table('bike_lakehouse.bronze.crm_sales_details')

In [0]:
df.limit(10).display() 

sls_ord_num,sls_prd_key,sls_cust_id,sls_order_dt,sls_ship_dt,sls_due_dt,sls_sales,sls_quantity,sls_price
SO43697,BK-R93R-62,21768,20101229,20110105,20110110,3578,1,3578
SO43698,BK-M82S-44,28389,20101229,20110105,20110110,3400,1,3400
SO43699,BK-M82S-44,25863,20101229,20110105,20110110,3400,1,3400
SO43700,BK-R50B-62,14501,20101229,20110105,20110110,699,1,699
SO43701,BK-M82S-44,11003,20101229,20110105,20110110,3400,1,3400
SO43702,BK-R93R-44,27645,20101230,20110106,20110111,3578,1,3578
SO43703,BK-R93R-62,16624,20101230,20110106,20110111,3578,1,3578
SO43704,BK-M82B-48,11005,20101230,20110106,20110111,3375,1,3375
SO43705,BK-M82S-38,11011,20101230,20110106,20110111,3400,1,3400
SO43706,BK-R93R-48,27621,20101231,20110107,20110112,3578,1,3578


### Trim String Columns
Automatically remove leading/trailing spaces from all string columns.

In [0]:
for field in df.schema.fields:
    # Only string columns are trimmed; other types are left unchanged
    if isinstance(field.dataType, StringType):
        df = df.withColumn(field.name, trim(col(field.name)))

### Date Casting Transformation
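The following cell converts the columns sls_order_dt, sls_ship_dt, and sls_due_dt from yyyyMMdd values into a proper Spark Date data type. Values equal to 0, or not exactly 8 characters long, become null.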

In [0]:
df = (
    df
    .withColumn(
        'sls_order_dt',
        F.when((col('sls_order_dt') == 0) | (length(col('sls_order_dt')) != 8), None)
        .otherwise(F.to_date(col('sls_order_dt').cast('string'), 'yyyyMMdd'))
    )
    .withColumn(
        'sls_ship_dt',
        F.when((col('sls_ship_dt') == 0) | (length(col('sls_ship_dt')) != 8), None)
        .otherwise(F.to_date(col('sls_ship_dt').cast('string'), 'yyyyMMdd'))
    )
    .withColumn(
        'sls_due_dt',
        F.when((col('sls_due_dt') == 0) | (length(col('sls_due_dt')) != 8), None)
        .otherwise(F.to_date(col('sls_due_dt').cast('string'), 'yyyyMMdd'))
    )
)
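
The rule applied to each date column can be checked without a Spark session. The sketch below is a plain-Python mirror of that rule, not part of the notebook: `parse_yyyymmdd` is a hypothetical helper name, and returning None for a well-formed but impossible date (such as month 13) is an assumption about the behaviour you want.

```python
from datetime import date, datetime
from typing import Optional, Union


def parse_yyyymmdd(value: Union[int, str, None]) -> Optional[date]:
    """Mirror the cell's rule: 0 or a value that is not 8 characters long becomes None."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "0" or len(text) != 8:
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Assumption: values that are 8 characters long but not a real date map to None
        return None


print(parse_yyyymmdd(20101229))  # 2010-12-29
print(parse_yyyymmdd(0))         # None
print(parse_yyyymmdd(32154))     # None (only 5 characters)
```

A helper like this is handy for listing the edge cases (0, short values, impossible dates) before checking how the Spark cell treats the same inputs.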

### Sales and Price Corrections
Where the price is missing or not positive, derive it from the total sales amount divided by the quantity sold. If the quantity is zero or missing, the price is set to null.

In [0]:
df = (
    df
    .withColumn(
        "sls_price",
        F.when(
            (col("sls_price").isNull()) | (col("sls_price") <= 0),
            F.when(
                col("sls_quantity") != 0,
                col("sls_sales") / col("sls_quantity")
            ).otherwise(None)
        ).otherwise(col("sls_price"))
    )
)
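
The branching in this cell is easier to follow on single values. The following is a minimal plain-Python sketch of the same decision logic, with `derive_price` as a hypothetical name introduced only for illustration; it treats None the way the Spark expression treats null.

```python
from typing import Optional


def derive_price(sales, quantity, price) -> Optional[float]:
    """Keep a positive price; otherwise fall back to sales / quantity."""
    if price is not None and price > 0:
        return float(price)
    # Price is missing or not positive: derive it only when division is possible
    if sales is None or quantity is None or quantity == 0:
        return None
    return sales / quantity


print(derive_price(3578, 1, 3578))  # 3578.0 (valid price is kept)
print(derive_price(100, 4, None))   # 25.0   (derived from sales / quantity)
print(derive_price(100, 0, -5))     # None   (quantity of 0, nothing to derive)
```

Note that the sketch, like the cell, does not validate the sales amount itself, so a negative sales value produces a negative derived price.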


### Rename Columns
Standardize column names across the dataset using a mapping dictionary.

In [0]:
RENAME_MAP = {
    "sls_ord_num": "order_number",
    "sls_prd_key": "product_number",
    "sls_cust_id": "customer_id",
    "sls_order_dt": "order_date",
    "sls_ship_dt": "ship_date",
    "sls_due_dt": "due_date",
    "sls_sales": "sales_amount",
    "sls_quantity": "quantity",
    "sls_price": "price"
}

In [0]:
for old_name, new_name in RENAME_MAP.items():
    df = df.withColumnRenamed(old_name, new_name)
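
Because `withColumnRenamed` does nothing when the source column is absent, a misspelled key in the mapping passes silently. One possible guard, sketched here in plain Python, lists the mapping keys that are not among the DataFrame's columns. The helper name `missing_rename_sources` and the deliberately misspelled key are assumptions made for the example.

```python
def missing_rename_sources(columns, rename_map):
    """Return the keys of rename_map that are not present in columns, in map order."""
    present = set(columns)
    return [old for old in rename_map if old not in present]


# Illustrative inputs: the second key is misspelled on purpose
bronze_columns = ["sls_ord_num", "sls_prd_key", "sls_cust_id"]
rename_map = {"sls_ord_num": "order_number", "sls_prd_ky": "product_number"}

print(missing_rename_sources(bronze_columns, rename_map))  # ['sls_prd_ky']
```

In the notebook, the same check could be run as `missing_rename_sources(df.columns, RENAME_MAP)` before the rename loop, failing the run if the result is not empty.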

### Sanity Checks on the DataFrame
Quickly check the result of the transformations before moving forward with the DataFrame.

In [0]:
df.limit(10).display()

order_number,product_number,customer_id,order_date,ship_date,due_date,sales_amount,quantity,price
SO43697,BK-R93R-62,21768,2010-12-29,2011-01-05,2011-01-10,3578,1,3578.0
SO43698,BK-M82S-44,28389,2010-12-29,2011-01-05,2011-01-10,3400,1,3400.0
SO43699,BK-M82S-44,25863,2010-12-29,2011-01-05,2011-01-10,3400,1,3400.0
SO43700,BK-R50B-62,14501,2010-12-29,2011-01-05,2011-01-10,699,1,699.0
SO43701,BK-M82S-44,11003,2010-12-29,2011-01-05,2011-01-10,3400,1,3400.0
SO43702,BK-R93R-44,27645,2010-12-30,2011-01-06,2011-01-11,3578,1,3578.0
SO43703,BK-R93R-62,16624,2010-12-30,2011-01-06,2011-01-11,3578,1,3578.0
SO43704,BK-M82B-48,11005,2010-12-30,2011-01-06,2011-01-11,3375,1,3375.0
SO43705,BK-M82S-38,11011,2010-12-30,2011-01-06,2011-01-11,3400,1,3400.0
SO43706,BK-R93R-48,27621,2010-12-31,2011-01-07,2011-01-12,3578,1,3578.0
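
Beyond eyeballing the preview, a few row-level rules can be asserted. The sketch below is one possible set of checks, written in plain Python against two of the rows shown above. The rules themselves (dates in order, sales amount equal to quantity times price) are assumptions about what counts as sane for this dataset, and `row_problems` is a hypothetical helper that expects non-null values.

```python
from datetime import date


def row_problems(row):
    """Return a list of rule violations for one row (empty list means the row passes)."""
    problems = []
    if not (row["order_date"] <= row["ship_date"] <= row["due_date"]):
        problems.append("dates out of order")
    if row["sales_amount"] != row["quantity"] * row["price"]:
        problems.append("sales_amount != quantity * price")
    return problems


# Two rows copied from the preview above
sample = [
    {"order_number": "SO43697", "order_date": date(2010, 12, 29),
     "ship_date": date(2011, 1, 5), "due_date": date(2011, 1, 10),
     "sales_amount": 3578, "quantity": 1, "price": 3578.0},
    {"order_number": "SO43700", "order_date": date(2010, 12, 29),
     "ship_date": date(2011, 1, 5), "due_date": date(2011, 1, 10),
     "sales_amount": 699, "quantity": 1, "price": 699.0},
]

for row in sample:
    print(row["order_number"], row_problems(row))
```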


### Write Silver Table
Persist the cleaned DataFrame as a Delta table in the Silver layer.

In [0]:
df.write.mode('overwrite').format('delta').saveAsTable('bike_lakehouse.silver.crm_sales')

In [0]:
%sql
drop table if exists bike_lakehouse.silver.sales ;