Apache Iceberg provides versioning and time travel capabilities, allowing users to query data as it existed at a specific point in time. This feature can be extremely useful for debugging, auditing, and historical analysis.

Time travel in Iceberg lets users query historical snapshots of a table. A new snapshot is created whenever the table is modified, for example when data is added or deleted, and each snapshot is assigned a unique identifier. Each snapshot is a consistent, complete view of the table at a given point in time.
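
To make the timestamp-based lookup concrete, here is a minimal sketch in plain Python. It is not Iceberg's implementation. It assumes that a timestamp query resolves to the most recent snapshot that became current at or before the requested time. The `HistoryEntry` class, the `snapshot_as_of` function, and the sample rows are all hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """Hypothetical stand-in for one row of a table's snapshot history."""
    made_current_at: datetime
    snapshot_id: int


def snapshot_as_of(history: List[HistoryEntry], as_of: datetime) -> Optional[int]:
    """Return the id of the latest snapshot made current at or before `as_of`.

    Returns None when `as_of` predates every snapshot in `history`.
    """
    # Sort oldest to newest so a binary search over the timestamps is valid.
    ordered = sorted(history, key=lambda entry: entry.made_current_at)
    times = [entry.made_current_at for entry in ordered]
    # bisect_right counts the entries whose timestamp is <= as_of.
    idx = bisect_right(times, as_of)
    return ordered[idx - 1].snapshot_id if idx > 0 else None


# Made-up history with small ids; real Iceberg snapshot ids are 64-bit integers.
history = [
    HistoryEntry(datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc), 101),
    HistoryEntry(datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc), 102),
    HistoryEntry(datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 103),
]

# 11:30 falls between the second and third snapshots, so the second one applies.
print(snapshot_as_of(history, datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)))  # 102
```

A timestamp that matches a snapshot exactly selects that snapshot, and any timestamp after the last entry selects the latest one. The sketch returns `None` for a timestamp older than the first snapshot; how a real engine reports that case is not modeled here.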

In [4]:
from pyspark.sql import SparkSession

# Set the absolute paths to the Iceberg tables and JAR files
iceberg_tables_path = "/Users/france.cama/code/iceberg-practice/iceberg_tables"
iceberg_jars_path = "/Users/france.cama/code/iceberg-practice/jars/iceberg-spark-runtime-3.5_2.13-1.5.0.jar"

# Create a Spark session
spark = SparkSession.builder \
    .appName("Iceberg time travel feature") \
    .config("spark.driver.extraJavaOptions", "-Dderby.system.home=" + iceberg_tables_path) \
    .config("spark.jars", iceberg_jars_path) \
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog") \
    .config("spark.sql.catalog.spark_catalog.type", "hadoop") \
    .config("spark.sql.catalog.spark_catalog.warehouse", iceberg_tables_path) \
    .getOrCreate()

# Show the snapshot history, most recent first
spark.sql("SELECT * FROM default.titanic.history ORDER BY made_current_at DESC").show()

# Time travel by timestamp: read the table as it existed at that moment
df = spark.sql("SELECT * FROM default.titanic TIMESTAMP AS OF '2024-04-27T15:13:00.289+00:00'")
df.show(5)

# Time travel by snapshot ID (a value from the snapshot_id column of the history table)
df_id = spark.sql("SELECT * FROM default.titanic VERSION AS OF 8059457254979550641")
df_id.show(5)

+--------------------+-------------------+-------------------+-------------------+
|     made_current_at|        snapshot_id|          parent_id|is_current_ancestor|
+--------------------+-------------------+-------------------+-------------------+
|2024-04-27 15:29:...|6711211621056771738|2707077687595228535|               true|
|2024-04-27 15:27:...|2707077687595228535|8059457254979550641|               true|
|2024-04-27 15:26:...|8059457254979550641|4316376306380043017|               true|
|2024-04-27 15:25:...|4316376306380043017|6945889682975321469|               true|
|2024-04-27 15:25:...|6945889682975321469|2500024997980005591|               true|
|2024-04-27 15:23:...|2500024997980005591|9015970749471366070|               true|
|2024-04-27 15:13:...|9015970749471366070|1036892303015044680|               true|
|2024-04-27 15:11:...|1036892303015044680|3663590870506627609|               true|
|2024-04-27 13:29:...|3663590870506627609| 922663229200831164|               true|
|202