In [3]:
from pyspark.sql import SparkSession

# Set the absolute paths to the Iceberg tables and JAR files
iceberg_tables_path = "/Users/france.cama/code/iceberg-practice/iceberg_tables"
iceberg_jars_path = "/Users/france.cama/code/iceberg-practice/jars/iceberg-spark-runtime-3.5_2.13-1.5.0.jar"

# Create a Spark session
spark = SparkSession.builder \
    .appName("Iceberg time travel feature") \
    .config("spark.driver.extraJavaOptions", "-Dderby.system.home=" + iceberg_tables_path) \
    .config("spark.jars", iceberg_jars_path) \
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog") \
    .config("spark.sql.catalog.spark_catalog.type", "hadoop") \
    .config("spark.sql.catalog.spark_catalog.warehouse", iceberg_tables_path) \
    .getOrCreate()

df = spark.sql("SELECT * FROM default.titanic;").show(4)

# show snapshots details
spark.sql("SELECT * FROM default.titanic.history ORDER BY made_current_at DESC;").show()

# modify partitioning schema
df.updateSpec()
    .addField(bucket("Age", 5))
    .commit();

+----+-----+--------+-------+--------------------+-----+-----------+------+------+-----+--------+----------------+-----+-----------+----------+---------------+
| Age|Cabin|Embarked|   Fare|                Name|Parch|PassengerId|Pclass|   Sex|SibSp|Survived|          Ticket|Title|Family_Size|new_column|choose_a_column|
+----+-----+--------+-------+--------------------+-----+-----------+------+------+-----+--------+----------------+-----+-----------+----------+---------------+
|22.0| NULL|       S|   7.25|Braund, Mr. Owen ...|    0|          1|     3|  male|    1|     0.0|       A/5 21171|   Mr|          1|      NULL|           NULL|
|38.0|  C85|       C|71.2833|Cumings, Mrs. Joh...|    0|          2|     1|female|    1|     1.0|        PC 17599|  Mrs|          1|      NULL|           NULL|
|26.0| NULL|       S|  7.925|Heikkinen, Miss. ...|    0|          3|     3|female|    0|     1.0|STON/O2. 3101282| Miss|          0|      NULL|           NULL|
|35.0| C123|       S|   53.1|Futrelle, M

ParseException: 
[PARSE_SYNTAX_ERROR] Syntax error at or near 'FIELD': missing '('.(line 1, pos 42)

== SQL ==
ALTER TABLE default.titanic ADD PARTITION FIELD bucket(Age);
------------------------------------------^^^


Iceberg table partitioning logic can be updated in an existing table because queries do not reference partition values directly.
When you evolve a partition spec, the old data written with an earlier spec remains unchanged (are not physically rewritten on the disk). New data is written using the new spec in a new layout. Metadata for each of the partition versions is kept separately. Because of this, when you start writing queries, you get split planning: 

<img src="./images/partition-spec-evolution.png" style="max-width: 50%;"></img>
The data for 2008 is partitioned by month. Starting from 2009 the table is updated so that the data is instead partitioned by day. Both partitioning layouts are able to coexist in the same table.

#### Iceberg vs Hive table format
- When making a query using iceberg you don't have to know which is the partitioning layout of a table, thanks to the hidden partitioning feature.
- iceberg partition layouts can evolve as needed.
- iceberg handles partition logic values on its own, in Hive you have to write the query correctly and working queries are tied to the table's partitioning scheme, so partitioning configuration cannot be changed without breaking queries.

Most importantly, queries no longer depend on a table's physical layout. With a separation between physical and logical, Iceberg tables can evolve partition schemes over time as data volume changes. Iceberg does not require costly distractions, like rewriting table data or migrating to a new table.
Partitioning schema is contained in the metadata file.

#### Sort order evolution
Similar to partition spec, Iceberg sort order can also be updated in an existing table. When you evolve a sort order, the old data written with an earlier order remains unchanged. Engines can always choose to write data in the latest sort order or unsorted when sorting is prohibitively expensive.