# Enabling Schema Evolution in Delta Lake using PySpark

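This notebook demonstrates how to enable **schema evolution** in Delta Lake using PySpark. By default, Delta Lake enforces the table schema and rejects a write whose DataFrame contains columns the table does not have. Setting the `mergeSchema` option on a write lets an append add those new columns to the existing table instead.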
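
To make the column-merging rule concrete, here is a toy model in plain Python. It is not Delta's implementation. It assumes a schema is an ordered list of `(name, type)` pairs: columns already in the table keep their position, new columns from the incoming DataFrame are appended at the end, and a column whose type differs is rejected. The helper name `merge_schemas` is hypothetical, and the sketch deliberately ignores nested structs, case-insensitive name matching, and any type-widening Delta may allow.

```python
# Toy model of the mergeSchema column rule (hypothetical helper, not Delta's code).
# A schema is an ordered list of (column_name, type_name) pairs.
from typing import List, Tuple

Schema = List[Tuple[str, str]]


def merge_schemas(table: Schema, incoming: Schema) -> Schema:
    """Return the table schema extended with any new columns from `incoming`."""
    table_types = dict(table)
    merged = list(table)  # existing columns keep their order and position
    for name, dtype in incoming:
        if name not in table_types:
            merged.append((name, dtype))  # a new column goes at the end
        elif table_types[name] != dtype:
            # Simplification: treat any type difference as a conflict
            raise TypeError(f"column {name!r}: {table_types[name]} vs {dtype}")
    return merged


schema_v1 = [("id", "int"), ("name", "string")]
schema_v2 = [("id", "int"), ("name", "string"), ("age", "int")]
print(merge_schemas(schema_v1, schema_v2))
# → [('id', 'int'), ('name', 'string'), ('age', 'int')]
```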

## 🔧 Setup SparkSession

In [None]:
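from pyspark.sql import SparkSession

# The two configs below register Delta's SQL extension and catalog.
# Assumes the Delta Lake JARs are already on the classpath, e.g. via
# `pip install delta-spark` plus `delta.configure_spark_with_delta_pip(builder)`,
# or a runtime that bundles Delta Lake.
spark = SparkSession.builder \
    .appName("DeltaSchemaEvolution") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()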

## 📦 Create Initial Delta Table
We start by creating a simple DataFrame and saving it as a Delta table.

In [None]:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema
schema_v1 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True)
])

# Create DataFrame
data_v1 = [(1, "Alice"), (2, "Bob")]
df_v1 = spark.createDataFrame(data_v1, schema=schema_v1)

# Save as Delta table
df_v1.write.format("delta").mode("overwrite").save("/tmp/delta/schema_evolution_demo")

## ➕ Append Data with New Column
Now we create a new DataFrame with an additional column `age` and append it to the existing Delta table, passing `.option("mergeSchema", "true")` to the writer so the new column is added to the table schema.

In [None]:
# Define new schema with an additional column
schema_v2 = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Create new DataFrame
data_v2 = [(3, "Charlie", 30), (4, "Diana", 28)]
df_v2 = spark.createDataFrame(data_v2, schema=schema_v2)

# Append with schema evolution
df_v2.write.format("delta") \
    .option("mergeSchema", "true") \
    .mode("append") \
    .save("/tmp/delta/schema_evolution_demo")
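
If the `.option("mergeSchema", "true")` line were omitted, Delta's schema enforcement would reject this append because `age` is not in the table schema; in PySpark the failure surfaces as a schema-mismatch `AnalysisException` and no data is written. The reverse case is allowed: if the DataFrame lacks a column the table has, that column is written as null. The plain-Python sketch below models only this name-level check; `validate_append` is a hypothetical helper, and column types are ignored.

```python
# Toy model of Delta's schema enforcement on append (hypothetical helper, not Delta's code).
from typing import List, Tuple

Schema = List[Tuple[str, str]]


def validate_append(table: Schema, incoming: Schema, merge_schema: bool = False) -> List[str]:
    """Return the new column names the write would add; raise if they are not allowed."""
    table_cols = {name for name, _ in table}
    extra = [name for name, _ in incoming if name not in table_cols]
    if extra and not merge_schema:
        # Models the write being rejected when the DataFrame has unknown columns
        raise ValueError(f"schema mismatch: {extra} not in table; set mergeSchema to add them")
    # Table columns missing from `incoming` are allowed; they would be written as null
    return extra


schema_v1 = [("id", "int"), ("name", "string")]
schema_v2 = [("id", "int"), ("name", "string"), ("age", "int")]

print(validate_append(schema_v1, schema_v2, merge_schema=True))  # → ['age']
print(validate_append(schema_v1, [("id", "int")]))               # → []
```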

## 📄 Read and Display Final Table
We read the Delta table to verify that the schema has evolved to include the new `age` column. The two rows written before the change (`Alice`, `Bob`) should show `null` for `age`.

In [None]:
df_final = spark.read.format("delta").load("/tmp/delta/schema_evolution_demo")
df_final.printSchema()
df_final.show()
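
Delta does not rewrite the Parquet files written before the schema change; when the table is read with the evolved schema, columns missing from those older files come back as null. A toy model of that read path in plain Python, assuming each data file is a list of dicts holding only the columns it was written with (`project_rows` is a hypothetical name, and real row order from `show()` is not guaranteed):

```python
# Toy model: reading files written under different schemas with the latest table schema.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def project_rows(files: List[List[Row]], columns: List[str]) -> List[Dict[str, Optional[Any]]]:
    """Project every stored row onto `columns`, filling absent columns with None (null)."""
    return [{col: row.get(col) for col in columns} for rows in files for row in rows]


file_v1 = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]  # written before `age` existed
file_v2 = [{"id": 3, "name": "Charlie", "age": 30}, {"id": 4, "name": "Diana", "age": 28}]

for row in project_rows([file_v1, file_v2], ["id", "name", "age"]):
    print(row)
# → {'id': 1, 'name': 'Alice', 'age': None}
#   {'id': 2, 'name': 'Bob', 'age': None}
#   {'id': 3, 'name': 'Charlie', 'age': 30}
#   {'id': 4, 'name': 'Diana', 'age': 28}
```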

## ✅ Summary
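- Schema evolution lets an append add new columns to a Delta table; without it, schema enforcement rejects the write.
- Use `.option("mergeSchema", "true")` on the write that introduces the new columns; the option applies only to that write.
- Existing data files are not rewritten: rows written before the change read back with null in the new columns, so the data model can evolve without rewriting existing data.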