# 🧊 Apache Iceberg Time Travel Tutorial
This notebook demonstrates **time travel** features in Apache Iceberg using PySpark SQL, progressing from basic to advanced examples.

✅ Requirements:
- Spark 3.5+ or 4.0
- Iceberg Spark extensions enabled
- An Iceberg catalog named `local` (Hadoop type) with namespace `db`
- Format version 2

## 🔧 1. Spark & Iceberg Setup

In [1]:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("IcebergTimeTravel") \
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.local.type", "hadoop") \
    .config("spark.sql.catalog.local.warehouse", "/home/jovyan/iceberg/warehouse") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .getOrCreate()

## 📦 2. Create Table and Initial Load

In [2]:
spark.sql("""
CREATE TABLE IF NOT EXISTS local.db.timeTravelTest (
    id INT,
    name STRING
) 
USING iceberg
TBLPROPERTIES ('format-version'='2')
""")

DataFrame[]

In [3]:
spark.sql("""
INSERT INTO local.db.timeTravelTest VALUES
(1, 'Alpha'),
(2, 'Beta')
""")

DataFrame[]

## ➕ 3. Insert More Data (Second Snapshot)

In [4]:
spark.sql("""
INSERT INTO local.db.timeTravelTest VALUES
(3, 'Gamma')
""")

DataFrame[]

## 📸 4. View Snapshot History

In [5]:
spark.sql("SELECT * FROM local.db.timeTravelTest.snapshots").show(truncate=False)

+------------+-----------+---------+---------+-------------+-------+
|committed_at|snapshot_id|parent_id|operation|manifest_list|summary|
+------------+-----------+---------+---------+-------------+-------+
[rows truncated]
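
Copying snapshot IDs by hand from a wide table is error-prone, so a small helper can pull them out programmatically. The following is a minimal sketch, not part of the original notebook: `list_snapshots` is a hypothetical name, and it assumes `spark` is the session created in section 1.

```python
def list_snapshots(spark, table):
    """Return (snapshot_id, committed_at, operation) tuples, oldest first.

    `table` is the fully qualified name, e.g. "local.db.timeTravelTest".
    The `.snapshots` suffix addresses Iceberg's snapshots metadata table.
    """
    rows = spark.sql(
        f"SELECT snapshot_id, committed_at, operation "
        f"FROM {table}.snapshots ORDER BY committed_at"
    ).collect()
    # Each row supports lookup by column name.
    return [(r["snapshot_id"], r["committed_at"], r["operation"]) for r in rows]
```

In this notebook it could be called as `list_snapshots(spark, "local.db.timeTravelTest")`, and the first element of each tuple is the value to pass to `VERSION AS OF`.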

## ⏳ 5. Time Travel by Snapshot ID

In [7]:
# Replace with actual snapshot ID
spark.sql("""
SELECT * FROM local.db.timeTravelTest VERSION AS OF 6399747780325267945
""").show()

+---+-----+
| id| name|
+---+-----+
|  1|Alpha|
|  2| Beta|
|  3|Gamma|
+---+-----+
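
The same read can also go through the DataFrame reader instead of SQL. This is a hedged sketch: `read_at_snapshot` is a hypothetical helper, and it assumes the Iceberg `snapshot-id` read option and that the `iceberg` source resolves a catalog-qualified table name in `load`.

```python
def read_at_snapshot(spark, table, snapshot_id):
    """Load `table` as of `snapshot_id` via the DataFrame reader.

    Assumes `spark` is the session from section 1 and `table` is a
    catalog-qualified name such as "local.db.timeTravelTest".
    """
    return (
        spark.read
        .option("snapshot-id", str(snapshot_id))  # Iceberg read option
        .format("iceberg")
        .load(table)
    )
```

For example, `read_at_snapshot(spark, "local.db.timeTravelTest", 6399747780325267945).show()` should return the same rows as the SQL query above.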



## 🕰️ 6. Time Travel by Timestamp

In [11]:
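# Replace with your own timestamp. This one falls just before the second
# commit (18:35:21.835 in the history table), so only the first snapshot is visible.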
spark.sql("""
SELECT * FROM local.db.timeTravelTest TIMESTAMP AS OF TIMESTAMP '2025-07-11 18:35:21'
""").show()

+---+-----+
| id| name|
+---+-----+
|  1|Alpha|
|  2| Beta|
+---+-----+
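
When reading through the DataFrame API, Iceberg's `as-of-timestamp` option expects milliseconds since the Unix epoch rather than a timestamp string. The sketch below shows one way to do the conversion. `to_epoch_millis` and `read_as_of` are hypothetical helpers, and treating the string as UTC is an assumption (the SQL literal is interpreted in the Spark session time zone, which may differ).

```python
from datetime import datetime, timezone


def to_epoch_millis(ts, tz=timezone.utc):
    """Convert 'YYYY-MM-DD HH:MM:SS', interpreted in `tz`, to epoch milliseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return round(dt.timestamp() * 1000)


def read_as_of(spark, table, ts, tz=timezone.utc):
    """Load `table` as it was at wall-clock time `ts` (assumed UTC by default)."""
    return (
        spark.read
        .option("as-of-timestamp", str(to_epoch_millis(ts, tz)))
        .format("iceberg")
        .load(table)
    )
```

With the timestamp used above, `to_epoch_millis("2025-07-11 18:35:21")` gives `1752258921000` under the UTC assumption.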



## 🔍 7. Compare Data Between Versions

In [None]:
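# Snapshot IDs taken from the history table in section 9; replace with your own
df_v1 = spark.sql("SELECT * FROM local.db.timeTravelTest VERSION AS OF 2893334066603151014")
df_v2 = spark.sql("SELECT * FROM local.db.timeTravelTest VERSION AS OF 6399747780325267945")

# Rows present in the newer snapshot but not in the older one
df_diff = df_v2.subtract(df_v1)
df_diff.show()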
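
A one-directional `subtract` only reveals rows that appear in the newer snapshot. To also see rows that disappeared, the comparison can be run both ways. This is a minimal sketch with a hypothetical helper name, `diff_versions`. Note that `subtract` has set semantics (duplicates are collapsed), which is an accepted simplification here.

```python
def diff_versions(spark, table, old_id, new_id):
    """Return (added, removed) DataFrames between two snapshots of `table`.

    added:   rows in snapshot `new_id` that are absent from `old_id`
    removed: rows in snapshot `old_id` that are absent from `new_id`
    """
    old = spark.sql(f"SELECT * FROM {table} VERSION AS OF {old_id}")
    new = spark.sql(f"SELECT * FROM {table} VERSION AS OF {new_id}")
    return new.subtract(old), old.subtract(new)
```

Usage would look like `added, removed = diff_versions(spark, "local.db.timeTravelTest", first_id, second_id)`, followed by `added.show()` and `removed.show()`.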

## 🔁 8. Rollback to Previous Snapshot

In [None]:
# Roll back with the Iceberg stored procedure; replace the ID with your target snapshot
spark.sql("""
CALL local.system.rollback_to_snapshot('db.timeTravelTest', 2893334066603151014)
""")

## 🧾 9. Show Table History

In [9]:
spark.sql("SELECT * FROM local.db.timeTravelTest.history").show(truncate=False)

+-----------------------+-------------------+-------------------+-------------------+
|made_current_at        |snapshot_id        |parent_id          |is_current_ancestor|
+-----------------------+-------------------+-------------------+-------------------+
|2025-07-11 18:31:09.257|2893334066603151014|NULL               |true               |
|2025-07-11 18:35:21.835|6399747780325267945|2893334066603151014|true               |
+-----------------------+-------------------+-------------------+-------------------+



## 🧹 10. Expire Old Snapshots

In [12]:
spark.sql("""
CALL spark_catalog.system.expire_snapshots(
    'local.db.timeTravelTest',
    TIMESTAMP => TIMESTAMP '2024-07-11 11:00:00'
)
""")

AnalysisException: Cannot use catalog spark_catalog: not a ProcedureCatalog
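
Because expiring snapshots permanently removes the ability to time travel to them, it can help to build the call explicitly and keep a minimum number of snapshots. The following is a sketch with a hypothetical helper, `expire_snapshots_sql`. It assumes the procedure's `table`, `older_than`, and `retain_last` named arguments and a catalog called `local`.

```python
def expire_snapshots_sql(catalog, table, older_than, retain_last=1):
    """Build a CALL statement for Iceberg's expire_snapshots procedure.

    catalog:     Iceberg catalog name, e.g. "local"
    table:       namespace-qualified table, e.g. "db.timeTravelTest"
    older_than:  'YYYY-MM-DD HH:MM:SS' cutoff; older snapshots become eligible
    retain_last: number of most recent snapshots to keep regardless of age
    """
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{older_than}', "
        f"retain_last => {retain_last})"
    )
```

The resulting string can be passed to `spark.sql(...)`, for example `spark.sql(expire_snapshots_sql("local", "db.timeTravelTest", "2024-07-11 11:00:00", retain_last=2))`.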