# 🧊 Apache Iceberg - Complex Schema Evolution Scenario
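This notebook walks through a multi-step schema evolution sequence on an Apache Iceberg table using PySpark. The sequence adds columns, renames and reorders them, widens a column type, drops a column, and adds a nested struct column. Each of these changes is a metadata-only operation in Iceberg, so existing data files are not rewritten.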

## ⚙️ SparkSession Setup

In [1]:

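```python
from pyspark.sql import SparkSession

# Assumes an iceberg-spark-runtime JAR matching your Spark/Scala version is
# already on the classpath (e.g. provided by the notebook image).
spark = (
    SparkSession.builder
    .appName("IcebergComplexSchemaEvolution")
    # Register an Iceberg catalog named "local", backed by a Hadoop (file-system) catalog
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/home/jovyan/iceberg/warehouse")
    # Enable Iceberg's SQL extensions (stored procedures and extra DDL)
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)
```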
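With `type` set to `hadoop`, the catalog keeps everything on the file system under the configured warehouse. A table `db.orders` lives in `<warehouse>/db/orders`, with data files under `data/` and metadata under `metadata/`. Each committed change writes a new `v<N>.metadata.json`, and `version-hint.text` records the latest version number. The helpers below sketch that path convention so you can locate the files and inspect them after each step. They assume default settings; in particular, metadata compression is assumed to be off, since enabling it would change the file suffix. The function names are hypothetical and not part of any Iceberg API.

```python
from pathlib import PurePosixPath


def table_location(warehouse: str, namespace: str, table: str) -> PurePosixPath:
    """Directory a Hadoop-catalog table is assumed to live in: <warehouse>/<namespace>/<table>."""
    return PurePosixPath(warehouse) / namespace / table


def metadata_file(warehouse: str, namespace: str, table: str, version: int) -> PurePosixPath:
    """Path of the (uncompressed) metadata JSON for a given table version: v1, v2, ..."""
    return table_location(warehouse, namespace, table) / "metadata" / f"v{version}.metadata.json"


def version_hint_file(warehouse: str, namespace: str, table: str) -> PurePosixPath:
    """File in which the Hadoop catalog records the current metadata version number."""
    return table_location(warehouse, namespace, table) / "metadata" / "version-hint.text"


WAREHOUSE = "/home/jovyan/iceberg/warehouse"
print(table_location(WAREHOUSE, "db", "orders"))
print(metadata_file(WAREHOUSE, "db", "orders", 1))
print(version_hint_file(WAREHOUSE, "db", "orders"))
```

Opening the newest metadata file after running the cells below should show each schema version in its `schemas` list, with `current-schema-id` pointing at the latest one.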

In [2]:
```python
# Start from a clean slate so the notebook can be re-run
spark.sql("DROP TABLE IF EXISTS local.db.orders")
```

DataFrame[]

## 📘 Step 1: Create Initial Orders Table

In [3]:

```python
spark.sql("""
    CREATE TABLE local.db.orders (
        order_id INT,
        customer_id INT,
        order_date STRING
    )
    USING iceberg
""")
```


DataFrame[]
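Iceberg stores this schema as JSON in the table metadata. Besides its name, every column receives a unique integer field ID. The sketch below shows roughly what the initial schema should look like, under a few assumptions. Fresh IDs are assumed to be assigned as 1, 2, 3 in declaration order. The first schema is assumed to get `schema-id` 0. Columns declared without `NOT NULL` are optional (`"required": false`). Spark's `INT` and `STRING` map to Iceberg's `int` and `string`. The `iceberg_schema` helper is hypothetical.

```python
import json


def iceberg_schema(schema_id: int, columns: list[tuple[str, str]]) -> dict:
    """Build an Iceberg-style struct schema, assigning field IDs 1..n in column order."""
    return {
        "type": "struct",
        "schema-id": schema_id,
        "fields": [
            # Spark columns declared without NOT NULL are optional in Iceberg
            {"id": i, "name": name, "required": False, "type": iceberg_type}
            for i, (name, iceberg_type) in enumerate(columns, start=1)
        ],
    }


initial = iceberg_schema(0, [("order_id", "int"), ("customer_id", "int"), ("order_date", "string")])
print(json.dumps(initial, indent=2))
```

The field IDs, not the names, are what later steps rely on to keep old data readable.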

## ➕ Step 2: Add Columns for Order Details

In [4]:

```python
spark.sql("ALTER TABLE local.db.orders ADD COLUMN amount DOUBLE")
spark.sql("ALTER TABLE local.db.orders ADD COLUMN currency STRING")
spark.sql("ALTER TABLE local.db.orders ADD COLUMN is_priority BOOLEAN")
```


DataFrame[]
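Each `ADD COLUMN` is a metadata-only change. Iceberg appends an optional field whose ID comes from the table's `last-column-id` counter, which only ever grows, so IDs are never reused. Rows written before the change read the new column as `NULL`. If the three original columns hold IDs 1-3, then `amount`, `currency`, and `is_priority` would be expected to receive 4, 5, and 6. The toy model below sketches that counter in plain Python. `SchemaSketch` is hypothetical, not Iceberg's `UpdateSchema` API.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaSketch:
    """Toy model of an Iceberg schema: ordered (field_id, name, type) triples plus an ID counter."""
    fields: list = field(default_factory=list)
    last_column_id: int = 0

    def add_column(self, name: str, iceberg_type: str) -> int:
        # Every new column takes the next unused ID; the counter never goes backwards.
        self.last_column_id += 1
        self.fields.append((self.last_column_id, name, iceberg_type))
        return self.last_column_id


schema = SchemaSketch()
for name, typ in [("order_id", "int"), ("customer_id", "int"), ("order_date", "string")]:
    schema.add_column(name, typ)

# The three columns added in this step
for name, typ in [("amount", "double"), ("currency", "string"), ("is_priority", "boolean")]:
    print(name, "->", schema.add_column(name, typ))
```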

## 🔄 Step 3: Rename and Reorder Columns

In [5]:

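```python
# Rename keeps the column's field ID, so existing data still resolves to it
spark.sql("ALTER TABLE local.db.orders RENAME COLUMN order_date TO placed_at")
# Move is_priority right after order_id (changes column order only)
spark.sql("ALTER TABLE local.db.orders ALTER COLUMN is_priority AFTER order_id")
```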


DataFrame[]
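Iceberg resolves columns by field ID rather than by name or position. `RENAME COLUMN` therefore only changes the name attached to an ID, and `ALTER COLUMN ... AFTER` only changes the order of fields in the schema; no data files are touched. The sketch below applies both operations to a list of `(field_id, name)` pairs. It assumes IDs 1-6 were assigned in the order the columns were created and added. `rename` and `move_after` are hypothetical helpers.

```python
def rename(fields: list[tuple[int, str]], old: str, new: str) -> list[tuple[int, str]]:
    """Change a column's name while keeping its field ID."""
    return [(fid, new if name == old else name) for fid, name in fields]


def move_after(fields: list[tuple[int, str]], col: str, anchor: str) -> list[tuple[int, str]]:
    """Move `col` so it sits immediately after `anchor`; IDs are untouched."""
    moving = next(f for f in fields if f[1] == col)
    rest = [f for f in fields if f[1] != col]
    pos = next(i for i, f in enumerate(rest) if f[1] == anchor)
    return rest[: pos + 1] + [moving] + rest[pos + 1 :]


fields = [(1, "order_id"), (2, "customer_id"), (3, "order_date"),
          (4, "amount"), (5, "currency"), (6, "is_priority")]
fields = rename(fields, "order_date", "placed_at")
fields = move_after(fields, "is_priority", "order_id")
print(fields)
```

ID 3 now carries the name `placed_at`, and ID 6 sits in second position. Every ID still points at the same column it did before.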

## 📝 Step 4: Insert Data After Changes

In [6]:

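```python
# Positional VALUES must follow the current column order:
# order_id, is_priority, customer_id, placed_at, amount, currency
spark.sql("""
    INSERT INTO local.db.orders VALUES
    (1, true, 101, '2024-12-01', 150.0, 'USD'),
    (2, false, 102, '2024-12-02', 75.5, 'EUR')
""")
```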


DataFrame[]
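`INSERT INTO ... VALUES` without a column list binds values by position against the table's current column order. That is why `is_priority` is the second value in each tuple after the reorder in Step 3. A tuple written in the pre-reorder order would put values under the wrong columns or fail type checks. The sketch below models the positional binding, including a length check, with a hypothetical `bind_positional` helper.

```python
CURRENT_COLUMNS = ["order_id", "is_priority", "customer_id", "placed_at", "amount", "currency"]


def bind_positional(columns: list[str], values: tuple) -> dict:
    """Pair each VALUES entry with the column at the same position, as a positional INSERT does."""
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


rows = [
    (1, True, 101, "2024-12-01", 150.0, "USD"),
    (2, False, 102, "2024-12-02", 75.5, "EUR"),
]
for row in rows:
    print(bind_positional(CURRENT_COLUMNS, row))
```

Listing the target columns explicitly in the `INSERT` statement, where your Spark version supports it, removes this dependence on column order.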

## 🔧 Step 5: Widen a Column Type, Drop an Unused Column

In [7]:

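```python
# INT -> BIGINT is a safe widening (Iceberg int -> long); old files stay readable
spark.sql("ALTER TABLE local.db.orders ALTER COLUMN customer_id TYPE BIGINT")
# Removes currency from the schema; existing data files are not rewritten
spark.sql("ALTER TABLE local.db.orders DROP COLUMN currency")
```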


DataFrame[]
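Iceberg only allows type changes that are safe widenings, because old data files are read as-is under the new type. The classic promotions are:

- `int` to `long` (Spark's `BIGINT`)
- `float` to `double`
- increasing a `decimal`'s precision while keeping its scale

Newer format versions may permit more. A change such as `int` to `string` would be expected to be rejected. The checker below sketches only those three rules; `can_promote` is a hypothetical name. Dropping `currency` is likewise metadata-only. Its field ID is retired and never reused, and its values remain in existing data files but are no longer read.

```python
import re

# Matches Iceberg-style decimal type strings such as "decimal(9,2)"
DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def can_promote(old: str, new: str) -> bool:
    """Return True if old -> new is one of Iceberg's classic safe type promotions."""
    if old == new:
        return True  # no-op change
    if (old, new) in {("int", "long"), ("float", "double")}:
        return True
    old_dec, new_dec = DECIMAL.fullmatch(old), DECIMAL.fullmatch(new)
    if old_dec and new_dec:
        p1, s1 = map(int, old_dec.groups())
        p2, s2 = map(int, new_dec.groups())
        return s1 == s2 and p2 > p1  # widen precision, same scale
    return False


for old, new in [("int", "long"), ("long", "int"), ("int", "string"),
                 ("decimal(9,2)", "decimal(18,2)"), ("decimal(9,2)", "decimal(18,4)")]:
    print(f"{old} -> {new}: {can_promote(old, new)}")
```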

## 🧱 Step 6: Add Nested Struct Column

In [8]:

```python
spark.sql("""
    ALTER TABLE local.db.orders ADD COLUMN shipping_info STRUCT<city:STRING, country:STRING>
""")
```


DataFrame[]
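Nested fields get field IDs too: the struct column and each of its children draw fresh IDs from the same `last-column-id` counter. Assume the original three columns took IDs 1-3 and the Step 2 columns took 4-6. The highest assigned ID is then 6, and dropping `currency` does not lower it. We would therefore expect `shipping_info` = 7, `city` = 8, `country` = 9. That ordering is an assumption about the implementation: the parent first, then all direct children of a struct before recursing into deeper structs. The sketch below, with a hypothetical `assign_ids` helper, only illustrates the counter behavior.

```python
from itertools import count


def assign_ids(type_, counter):
    """Give every direct child of a struct an ID first, then recurse into nested structs.

    Structs are modeled as dicts {child_name: child_type}; anything else is a primitive type.
    """
    if not isinstance(type_, dict):
        return type_
    child_ids = {name: next(counter) for name in type_}
    return [
        {"id": child_ids[name], "name": name, "type": assign_ids(child_type, counter)}
        for name, child_type in type_.items()
    ]


# Resume after the highest ID ever assigned (6); the dropped column's ID is not reused.
counter = count(start=7)
struct_id = next(counter)
children = assign_ids({"city": "string", "country": "string"}, counter)
shipping_info = {"id": struct_id, "name": "shipping_info", "type": children}
print(shipping_info)
```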

## 📊 Step 7: Final Schema and Sample Data

In [9]:

```python
spark.sql("DESCRIBE TABLE local.db.orders").show(truncate=False)
spark.sql("SELECT * FROM local.db.orders").show()
```


```text
+-------------+----------------------------------+-------+
|col_name     |data_type                         |comment|
+-------------+----------------------------------+-------+
|order_id     |int                               |NULL   |
|is_priority  |boolean                           |NULL   |
|customer_id  |bigint                            |NULL   |
|placed_at    |string                            |NULL   |
|amount       |double                            |NULL   |
|shipping_info|struct<city:string,country:string>|NULL   |
+-------------+----------------------------------+-------+

+--------+-----------+-----------+----------+------+-------------+
|order_id|is_priority|customer_id| placed_at|amount|shipping_info|
+--------+-----------+-----------+----------+------+-------------+
|       1|       true|        101|2024-12-01| 150.0|         NULL|
|       2|      false|        102|2024-12-02|  75.5|         NULL|
+--------+-----------+-----------+----------+------+-------------+
```
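The output shows the whole evolution coming together at read time. The rows inserted in Step 4 were never rewritten: their data file still holds `customer_id` as `int` and a `currency` value. Yet the query returns `customer_id` as `bigint`, omits `currency`, and fills `shipping_info` with `NULL`. Conceptually, the reader matches data-file columns to the current schema by field ID. The sketch below models that projection in plain Python. Its field IDs assume that columns were numbered 1-6 in the order they were created and added, and that `shipping_info` received 7. `project` and the row layout are hypothetical, not Iceberg's reader API.

```python
# Current schema as (field_id, name) in display order; IDs are assumptions (see lead-in).
CURRENT_SCHEMA = [(1, "order_id"), (6, "is_priority"), (2, "customer_id"),
                  (3, "placed_at"), (4, "amount"), (7, "shipping_info")]

# Rows as stored in the data file written in Step 4, keyed by field ID.
# Field 5 (currency) was dropped afterwards; field 7 did not exist yet.
STORED_ROWS = [
    {1: 1, 6: True, 2: 101, 3: "2024-12-01", 4: 150.0, 5: "USD"},
    {1: 2, 6: False, 2: 102, 3: "2024-12-02", 4: 75.5, 5: "EUR"},
]


def project(stored: dict, schema: list[tuple[int, str]]) -> dict:
    """Read a stored row through the current schema by field ID.

    IDs missing from the file become None (NULL); IDs absent from the schema are ignored.
    Renames need no handling because lookup uses only the ID, and the int -> long
    widening needs no conversion here because Python ints are unbounded.
    """
    return {name: stored.get(field_id) for field_id, name in schema}


for stored in STORED_ROWS:
    print(project(stored, CURRENT_SCHEMA))
```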

