# Bronze Ingestion — Sales Analytics

This notebook ingests raw CSV files from the Lakehouse Files area and writes
Delta tables into the Bronze layer. Bronze is intentionally light‑touch:
no business logic, no conformance, no joins. The goal is to land raw‑but‑readable
Delta tables for downstream Silver and Gold transformations.

### Source Files
- `customers.csv`
- `products.csv`
- `sales.csv`

### Workflow
1. Read raw CSVs from Lakehouse Files  
2. Apply minimal Bronze cleanup (trim whitespace)  
3. Write Delta tables into the Lakehouse as `bronze_*`  
4. Validate row counts  
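
As a rough illustration of steps 2 and 4 outside Spark, the sketch below trims the fields of a small in-memory CSV and checks that cleanup leaves the row count unchanged. The sample rows and the names `SAMPLE_CSV`, `trim_row`, and `ingest` are made up for illustration; the notebook itself does this work with Spark DataFrames.

```python
import csv
import io

# Hypothetical sample standing in for customers.csv (not real data).
SAMPLE_CSV = "1,  Ada ,Lovelace \n2,Grace,  Hopper\n"

def trim_row(row):
    """Strip leading/trailing whitespace from every field; nothing else changes.

    Note: str.strip() also removes tabs and newlines, which is slightly
    broader than Spark's trim(), which targets space characters.
    """
    return [field.strip() for field in row]

def ingest(text):
    """Parse CSV text that has no header row and apply the Bronze-style trim."""
    raw_rows = list(csv.reader(io.StringIO(text)))
    clean_rows = [trim_row(r) for r in raw_rows]
    # Bronze cleanup must never add or drop rows, so the counts are compared.
    assert len(clean_rows) == len(raw_rows)
    return clean_rows

rows = ingest(SAMPLE_CSV)
print(len(rows), rows[0])
```

The point of the sketch is the boundary it draws: values are trimmed, but no rows are filtered, joined, or retyped, which is what keeps Bronze light-touch.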

### Notes
- This notebook reflects the chosen ingestion pattern after evaluating
  Dataflow Gen2, Warehouse SQL, Pipelines, and Notebooks.
- Notebooks provide a modern, Fabric‑native ingestion surface using Python and Spark.

In [None]:
# ============================================================
# BRONZE INGESTION — SALES ANALYTICS (EXPLICIT SCHEMA VERSION)
# ============================================================

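from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

# Fabric notebooks predefine `spark`; getOrCreate() reuses that session there
# and creates one when the code runs elsewhere (avoids NameError on `spark`).
spark = SparkSession.builder.getOrCreate()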

# ------------------------------------------------------------
# 1. CONFIGURATION
# ------------------------------------------------------------
lakehouse_path = "Files/"  # relative path, resolved against the notebook's default Lakehouse

files = {
    "customers": "customers.csv",
    "products":  "products.csv",
    "sales":     "sales.csv"
}

# ------------------------------------------------------------
# 2. DEFINE SCHEMAS
# ------------------------------------------------------------
schema_customers = StructType([
    StructField("customer_id", IntegerType(), True),
    StructField("first_name", StringType(), True),
    StructField("last_name", StringType(), True),
    StructField("address", StringType(), True),
    StructField("city", StringType(), True),
    StructField("state", StringType(), True),
    StructField("zip", StringType(), True),
    StructField("country", StringType(), True)
])

schema_products = StructType([
    StructField("product_id", IntegerType(), True),
    StructField("product_name", StringType(), True),
    StructField("category", StringType(), True),
    StructField("price", DoubleType(), True)
])

schema_sales = StructType([
    StructField("sale_id", IntegerType(), True),
    StructField("customer_id", IntegerType(), True),
    StructField("product_id", IntegerType(), True),
    StructField("quantity", IntegerType(), True),
    StructField("sale_date", StringType(), True)  # We'll cast this later in Silver
])

# ------------------------------------------------------------
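# 3. LOAD RAW CSV FILES (EXPLICIT SCHEMA, NO HEADER ROW)
# ------------------------------------------------------------
# header=False makes Spark treat the first line as data, so this assumes the
# files have no header row; if they do, header=True would be needed instead.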
df_customers = spark.read.csv(f"{lakehouse_path}{files['customers']}", header=False, schema=schema_customers)
df_products  = spark.read.csv(f"{lakehouse_path}{files['products']}",  header=False, schema=schema_products)
df_sales     = spark.read.csv(f"{lakehouse_path}{files['sales']}",     header=False, schema=schema_sales)

# ------------------------------------------------------------
# 4. BRONZE CLEANUP (TRIM STRINGS)
# ------------------------------------------------------------
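def bronze_trim(df):
    """Trim leading/trailing spaces from every string column; other columns pass through."""
    # A single select avoids chaining one withColumn call per column.
    return df.select([
        F.trim(F.col(col_name)).alias(col_name) if dtype == "string" else F.col(col_name)
        for col_name, dtype in df.dtypes
    ])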

df_customers = bronze_trim(df_customers)
df_products  = bronze_trim(df_products)
df_sales     = bronze_trim(df_sales)

# ------------------------------------------------------------
# 5. WRITE BRONZE DELTA TABLES
# ------------------------------------------------------------
df_customers.write.format("delta").mode("overwrite").saveAsTable("bronze_customers")
df_products.write.format("delta").mode("overwrite").saveAsTable("bronze_products")
df_sales.write.format("delta").mode("overwrite").saveAsTable("bronze_sales")

# ------------------------------------------------------------
# 6. VALIDATION — ROW COUNTS
# ------------------------------------------------------------
print("Bronze table row counts:")
spark.sql("SELECT 'bronze_customers' AS table_name, COUNT(*) AS row_count FROM bronze_customers").show()
spark.sql("SELECT 'bronze_products'  AS table_name, COUNT(*) AS row_count FROM bronze_products").show()
spark.sql("SELECT 'bronze_sales'     AS table_name, COUNT(*) AS row_count FROM bronze_sales").show()

# ============================================================
# END OF BRONZE INGESTION
# ============================================================

NameError: name 'spark' is not defined