### Silver Layer - Data Cleaning & Standardization

### Purpose
The Silver layer converts raw Bronze data into a clean, reliable dataset suitable for analytics.

### Transformations Applied
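- Filters out rows whose `Purchase_Amount` is not greater than 0 (rows with a null amount are dropped as well)
- Trims leading and trailing spaces from the string columns `User_Name`, `Country`, `Product_Category`, and `Payment_Method`
- Derives reusable columns `transaction_year` and `transaction_month` from `Transaction_Date`
- Removes duplicate transactions, keeping one row per `Transaction_ID`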
- Applies basic data quality rules
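
To make the semantics of these rules easier to reason about (or to unit test without a cluster), they can be sketched on plain Python dicts. This is only an illustration, not the notebook's implementation: `clean_rows`, `STRING_COLUMNS`, and the sample rows are hypothetical names and data, and it assumes `Transaction_Date` is already a `datetime.date`.

```python
from datetime import date

# Columns that get trimmed in the Silver step
STRING_COLUMNS = ["User_Name", "Country", "Product_Category", "Payment_Method"]


def clean_rows(rows):
    """Apply the Silver rules to a list of dict rows (plain-Python illustration)."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        amount = row["Purchase_Amount"]
        # Keep only strictly positive amounts; missing amounts are dropped too
        if amount is None or amount <= 0:
            continue
        # Keep the first valid row seen for each Transaction_ID
        if row["Transaction_ID"] in seen_ids:
            continue
        seen_ids.add(row["Transaction_ID"])

        out = dict(row)
        # Strip leading/trailing space characters from the text columns
        for col in STRING_COLUMNS:
            if out[col] is not None:
                out[col] = out[col].strip(" ")
        # Derive the reusable date parts
        out["transaction_year"] = out["Transaction_Date"].year
        out["transaction_month"] = out["Transaction_Date"].month
        cleaned.append(out)
    return cleaned


# Hypothetical sample rows: one valid, one duplicate ID, one zero amount, one missing amount
base = {
    "Transaction_ID": 1,
    "User_Name": "  Noah Thompson ",
    "Country": "India ",
    "Product_Category": "Books",
    "Payment_Method": " UPI",
    "Purchase_Amount": 97.84,
    "Transaction_Date": date(2025, 1, 4),
}
rows = [
    base,
    dict(base),                                            # duplicate Transaction_ID
    dict(base, Transaction_ID=2, Purchase_Amount=0),       # non-positive amount
    dict(base, Transaction_ID=3, Purchase_Amount=None),    # missing amount
]
cleaned = clean_rows(rows)
```

Only the first row survives here: the duplicate, the zero amount, and the missing amount are all removed.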

### Why Silver Layer?
- Ensures trusted, high-quality data
- Centralizes cleaning logic
- Prevents bad data from reaching analytics & dashboards

### Execution
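- Reads from the Bronze Delta table (widget `bronze_table`, default `ecom_bronze.transactions_bronze`)
- Writes cleaned data to the Silver Delta table (widget `silver_table`, default `ecom_silver.transactions_silver`)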
- Orchestrated via Databricks Jobs with task dependencies
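
As a rough sketch of what "task dependencies" could look like, the dict below is shaped like a Databricks Jobs API 2.1 job definition, where `depends_on` makes the Silver task wait for the Bronze task and `base_parameters` supplies the notebook widgets. The job name, task keys, and notebook paths are placeholders invented for illustration, compute settings are omitted, and `upstream_tasks` is a hypothetical helper, not part of any Databricks library.

```python
# Hypothetical job definition shaped like a Jobs API 2.1 request body.
# Job name, task keys, and notebook paths are placeholders, not taken from the project.
job_settings = {
    "name": "ecom_medallion_pipeline",
    "tasks": [
        {
            "task_key": "bronze_ingest",
            "notebook_task": {"notebook_path": "/Workspace/ecom/01_bronze"},
        },
        {
            "task_key": "silver_clean",
            # Silver runs only after the Bronze task has succeeded
            "depends_on": [{"task_key": "bronze_ingest"}],
            "notebook_task": {
                "notebook_path": "/Workspace/ecom/02_silver",
                # These values are read in the notebook via dbutils.widgets.get(...)
                "base_parameters": {
                    "bronze_table": "ecom_bronze.transactions_bronze",
                    "silver_table": "ecom_silver.transactions_silver",
                },
            },
        },
    ],
}


def upstream_tasks(settings, task_key):
    """Return the task keys that must finish before `task_key` starts."""
    for task in settings["tasks"]:
        if task["task_key"] == task_key:
            return [dep["task_key"] for dep in task.get("depends_on", [])]
    raise KeyError(task_key)


print(upstream_tasks(job_settings, "silver_clean"))  # ['bronze_ingest']
```

Passing the table names as task parameters keeps the notebook itself environment-agnostic; the widget defaults only apply when the notebook is run interactively.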


In [0]:
# Widgets (parameters)
dbutils.widgets.text("bronze_table", "ecom_bronze.transactions_bronze", "bronze_table")
dbutils.widgets.text("silver_table", "ecom_silver.transactions_silver", "silver_table")

bronze_table = dbutils.widgets.get("bronze_table")
silver_table = dbutils.widgets.get("silver_table")

In [0]:
bronze = spark.table(bronze_table)


In [0]:
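from pyspark.sql import functions as F

silver_df = (
    bronze
    # Keep strictly positive amounts; rows with a null Purchase_Amount are dropped too
    .filter(F.col("Purchase_Amount") > 0)
    # Trim leading/trailing spaces from the text columns
    .withColumn("User_Name", F.trim(F.col("User_Name")))
    .withColumn("Country", F.trim(F.col("Country")))
    .withColumn("Product_Category", F.trim(F.col("Product_Category")))
    .withColumn("Payment_Method", F.trim(F.col("Payment_Method")))
    # Derive reusable date parts from the transaction date
    .withColumn("transaction_year", F.year("Transaction_Date"))
    .withColumn("transaction_month", F.month("Transaction_Date"))
    # Keep one row per Transaction_ID; which duplicate survives is not guaranteed
    .dropDuplicates(["Transaction_ID"])
)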


In [0]:
# Full refresh: mode("overwrite") replaces the Silver table contents on every run
silver_df.write.format("delta").mode("overwrite").saveAsTable(silver_table)


In [0]:
silver = spark.table(silver_table)
print("Silver count:", silver.count())
display(silver.limit(5))


Silver count: 50000


Transaction_ID,User_Name,Age,Country,Product_Category,Purchase_Amount,Payment_Method,Transaction_Date,ingest_ts,source_name,ingest_date,transaction_year,transaction_month
271,Noah Thompson,52,India,Home & Kitchen,97.84,Debit Card,2025-01-04,2026-01-15T16:20:24.516Z,default.ecommerce_transactions,2026-01-15,2025,1
1785,Emma Harris,35,Germany,Grocery,55.06,PayPal,2023-08-04,2026-01-15T16:20:24.516Z,default.ecommerce_transactions,2026-01-15,2023,8
1898,Sophia Thompson,65,Germany,Books,323.11,Debit Card,2024-06-18,2026-01-15T16:20:24.516Z,default.ecommerce_transactions,2026-01-15,2024,6
2851,James Clark,64,France,Grocery,179.78,Credit Card,2024-05-21,2026-01-15T16:20:24.516Z,default.ecommerce_transactions,2026-01-15,2024,5
7312,Elijah Rodriguez,41,Germany,Clothing,117.59,UPI,2024-02-07,2026-01-15T16:20:24.516Z,default.ecommerce_transactions,2026-01-15,2024,2
