# 01 – Bronze Schema Drift

## Context
This notebook demonstrates how schema drift behaves at the Bronze layer and why strict schema enforcement at ingestion time is discouraged in a Databricks medallion architecture.

This aligns directly with **Scenario 1** in Week 1.

## Exam Lens
The Databricks Data Engineer Professional exam expects Bronze ingestion to:
- Absorb upstream variability
- Preserve replayability
- Avoid early enforcement


## Step 1: Simulate Raw JSON Data with Schema Drift

In [None]:

raw_data_v1 = [
    '{"id": 1, "event": "click", "ts": "2025-01-01T10:00:00"}',
    '{"id": 2, "event": "view", "ts": "2025-01-01T10:05:00"}'
]

raw_data_v2 = [
    '{"id": 3, "event": "click", "ts": "2025-01-01T11:00:00", "device": "mobile"}'
]
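To make the drift concrete before involving Spark, the following plain-Python sketch parses both batches and reports which top-level fields differ between them. The helper name `field_names` is illustrative and not part of the notebook; the two lists are copied from the cell above so the block runs on its own.

```python
import json

# Copied from the cell above so this block is self-contained.
raw_data_v1 = [
    '{"id": 1, "event": "click", "ts": "2025-01-01T10:00:00"}',
    '{"id": 2, "event": "view", "ts": "2025-01-01T10:05:00"}',
]
raw_data_v2 = [
    '{"id": 3, "event": "click", "ts": "2025-01-01T11:00:00", "device": "mobile"}',
]


def field_names(batch):
    """Return the set of top-level keys seen across all JSON records in a batch."""
    names = set()
    for line in batch:
        names.update(json.loads(line).keys())
    return names


# Fields present only in the newer batch, and fields that disappeared.
added_fields = field_names(raw_data_v2) - field_names(raw_data_v1)
removed_fields = field_names(raw_data_v1) - field_names(raw_data_v2)

print("added:", sorted(added_fields))
print("removed:", sorted(removed_fields))
```

The drift here is purely additive: one new field and nothing removed.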


## Step 2: Ingest with Schema Inference (Bronze-style)

In [None]:

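# Batch 1: infer the schema from the JSON and create the table.
df_v1 = spark.read.json(spark.sparkContext.parallelize(raw_data_v1))
df_v1.write.mode("overwrite").saveAsTable("de_pro_week1.bronze_events")

# Batch 2 adds `device`. A Delta append that introduces a new column fails
# on schema mismatch unless schema evolution (mergeSchema) is enabled.
df_v2 = spark.read.json(spark.sparkContext.parallelize(raw_data_v2))
df_v2.write.mode("append").option("mergeSchema", "true").saveAsTable("de_pro_week1.bronze_events")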


Inspect the resulting schema:

In [None]:

spark.sql("DESCRIBE TABLE de_pro_week1.bronze_events").show(truncate=False)


### Observation
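- With schema evolution enabled on the append (`mergeSchema`), the table schema gains the `device` column
- The second write succeeds instead of failing on a schema mismatch
- All source data is preserved; rows from the first batch hold `NULL` for `device`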
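Conceptually, an additive schema merge takes the union of the column names and backfills older rows with NULL. The sketch below models that idea in plain Python. It is a simplified illustration, not how Delta Lake implements schema evolution, and `merge_batches` is a hypothetical helper.

```python
import json


def merge_batches(*batches):
    """Model an additive schema merge over batches of JSON strings.

    Returns (columns, rows). Columns keep first-seen order, and every row
    carries every column, with None standing in for NULL.
    """
    records = [json.loads(line) for batch in batches for line in batch]

    # Union of field names, preserving the order in which they first appear.
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)

    # Backfill fields that a record does not have with None.
    rows = [{name: record.get(name) for name in columns} for record in records]
    return columns, rows


batch_v1 = ['{"id": 1, "event": "click", "ts": "2025-01-01T10:00:00"}']
batch_v2 = ['{"id": 3, "event": "click", "ts": "2025-01-01T11:00:00", "device": "mobile"}']

columns, rows = merge_batches(batch_v1, batch_v2)
print(columns)
for row in rows:
    print(row)
```

No record is altered or rejected; the older row simply gains an empty `device` value.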


## Step 3: Enforce a Strict Schema (Anti-pattern at Bronze)

In [None]:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

strict_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("event", StringType(), True),
    StructField("ts", StringType(), True)
])

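# Read the v2 batch with the fixed schema. `device` is not declared,
# so it never reaches the table.
spark.read.schema(strict_schema).json(
    spark.sparkContext.parallelize(raw_data_v2)
).write.mode("overwrite").saveAsTable("de_pro_week1.bronze_events_strict")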


### Observation
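- The new field (`device`) is silently dropped: with an explicit read schema, Spark's JSON reader ignores fields the schema does not declare and raises no error
- Replayability is compromised because the dropped values cannot be recovered from this table
- Bronze becomes fragile: each upstream change either loses data or, if mismatches are made to raise errors, halts ingestion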
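The loss is easy to see outside Spark. This plain-Python sketch mimics a reader that keeps only the declared fields and reports what is discarded. `project_strict` is a hypothetical helper, and the behavior is a simplified model of reading with a fixed schema, not Spark's actual parser.

```python
import json

# Field names mirror strict_schema from the cell above.
STRICT_FIELDS = ("id", "event", "ts")


def project_strict(line, fields=STRICT_FIELDS):
    """Split one JSON record into declared fields and everything else."""
    record = json.loads(line)
    kept = {name: record.get(name) for name in fields}
    dropped = {name: value for name, value in record.items() if name not in fields}
    return kept, dropped


line = '{"id": 3, "event": "click", "ts": "2025-01-01T11:00:00", "device": "mobile"}'
kept, dropped = project_strict(line)

print("kept:", kept)
print("dropped:", dropped)
```

Anything that lands in `dropped` is unavailable to later layers unless the original payload is retained somewhere.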


## Reflection Questions
1. Where should schema enforcement happen instead?
2. How would this failure propagate downstream?
3. Why does the exam prefer permissive Bronze ingestion?


## Exam Takeaway
Bronze ingestion should **absorb change, not reject it**.

Strict schema enforcement belongs in Silver, where contracts are intentional.
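
As a rough illustration of what an intentional contract at Silver might look like, the sketch below checks each Bronze record against expected field types and routes failures to a quarantine list instead of rejecting the whole batch. The contract, the function name `apply_contract`, and the quarantine approach are assumptions made for illustration; the notebook does not prescribe a specific mechanism.

```python
# Assumed Silver contract: field name -> required Python type.
SILVER_CONTRACT = {"id": int, "event": str, "ts": str}


def apply_contract(records, contract=SILVER_CONTRACT):
    """Split records into contract-conforming rows and quarantined rows."""
    valid, quarantined = [], []
    for record in records:
        # A field fails if it is missing, None, or of the wrong type.
        failed = [
            name for name, expected in contract.items()
            if not isinstance(record.get(name), expected)
        ]
        if failed:
            quarantined.append({"record": record, "failed_fields": failed})
        else:
            # Keep only contract fields; extras stay available in Bronze.
            valid.append({name: record[name] for name in contract})
    return valid, quarantined


bronze_rows = [
    {"id": 1, "event": "click", "ts": "2025-01-01T10:00:00", "device": None},
    {"id": 3, "event": "click", "ts": "2025-01-01T11:00:00", "device": "mobile"},
    {"id": "oops", "event": "view", "ts": "2025-01-01T12:00:00", "device": None},
]

valid, quarantined = apply_contract(bronze_rows)
print("valid:", valid)
print("quarantined:", quarantined)
```

Because Bronze kept every field, the contract can be tightened or extended later and the data replayed without going back to the source.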