# Bronze Ingestion — Sales Table

This notebook ingests raw sales data into the Bronze layer of the Lakehouse. It reads CSV files from `/lakehouse/default/Files/raw/sales`, casts `order_id` and `customer_id` to string, `order_amount` to double, and `order_date` to date, and then overwrites the Bronze Delta table `lakehouse.bronze_sales` with the result. All steps run in a single code cell, so every run executes the same sequence from read to write in every environment.

This ingestion pattern serves as the foundation for downstream Data Quality checks, SCD2 processing, and Silver transformations within the Lakehouse Expansion pillar.
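
The type enforcement described above can be illustrated on a single row in plain Python, without a Spark session. This is only a sketch. It assumes ANSI mode is disabled, so that a failed cast yields null rather than an error, and it approximates Spark's parsing rules rather than reproducing them. The helper names `to_double`, `to_date`, and `normalize_row`, and the sample values, are hypothetical and do not come from the notebook.

```python
from datetime import date, datetime
from typing import Any, Dict, Optional


def to_double(value: Optional[str]) -> Optional[float]:
    """Parse a string as a float; return None when it cannot be parsed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def to_date(value: Optional[str]) -> Optional[date]:
    """Parse a yyyy-MM-dd string; return None when it cannot be parsed."""
    if value is None:
        return None
    try:
        return datetime.strptime(value.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def normalize_row(row: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Apply the same four casts the notebook applies, to one raw CSV row."""
    return {
        "order_id": row.get("order_id"),        # already a string
        "customer_id": row.get("customer_id"),  # already a string
        "order_amount": to_double(row.get("order_amount")),
        "order_date": to_date(row.get("order_date")),
    }


# Hypothetical sample rows: one clean, one with values that fail to parse.
good = normalize_row(
    {"order_id": "1001", "customer_id": "C-17",
     "order_amount": "19.99", "order_date": "2024-03-05"}
)
bad = normalize_row(
    {"order_id": "1002", "customer_id": "C-18",
     "order_amount": "n/a", "order_date": "05/03/2024"}
)
```

In this sketch the malformed row is kept, with `None` in the two fields that failed to parse. That is why null counts on `order_amount` and `order_date` are a natural first check for the downstream Data Quality step.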

In [None]:
# Step 1 — Read raw source data
raw_df = (
    spark.read
    .format("csv")
    .option("header", True)
    .load("/lakehouse/default/Files/raw/sales")
)

# Step 2 — Apply schema normalization
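from pyspark.sql.functions import col

# No inferSchema option is set on the reader, so every CSV column arrives as a string.
# The casts below make the intended types explicit. With ANSI mode disabled,
# a value that cannot be cast becomes null instead of raising an error.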
bronze_df = (
    raw_df
    .withColumn("order_id", col("order_id").cast("string"))
    .withColumn("customer_id", col("customer_id").cast("string"))
    .withColumn("order_amount", col("order_amount").cast("double"))
    .withColumn("order_date", col("order_date").cast("date"))
)

# Step 3 — Write to Bronze table
# mode("overwrite") replaces the table's existing contents on every run.
bronze_df.write.mode("overwrite").format("delta").saveAsTable("lakehouse.bronze_sales")

# Step 4 — Return preview
# limit(10) bounds the number of rows that toPandas() collects to the driver.
bronze_df.limit(10).toPandas()