# First Delta Tables

In [None]:
from utils.spark import get_spark

spark = get_spark()

In [None]:
!rm -rf /data/delta-table

## First Delta Table

Create a simple DataFrame and save it as usual, but in "delta" format instead of "parquet".

In [None]:
data = spark.range(0, 5)
data.write.format("delta").save("/data/delta-table")

The folder contains some Parquet files plus the delta log in a `_delta_log` directory:

In [None]:
!ls /data/delta-table

In [None]:
!ls /data/delta-table/_delta_log

In [None]:
!cat /data/delta-table/_delta_log/00000000000000000000.json
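
The commit file printed above is newline-delimited JSON: each line holds one action object with a single top-level key such as `commitInfo`, `protocol`, `metaData` or `add`. As a rough sketch, the same file can be inspected from Python using only the standard library. The helper name `read_commit_actions` is made up for this illustration and is not part of the Delta API.

```python
import json
from pathlib import Path


def read_commit_actions(commit_file):
    """Parse one Delta commit file into a list of (action_type, payload) pairs.

    Hypothetical helper for illustration; assumes the file is
    newline-delimited JSON with one single-key object per line.
    """
    actions = []
    for line in Path(commit_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for action_type, payload in record.items():
            actions.append((action_type, payload))
    return actions


commit_file = Path("/data/delta-table/_delta_log/00000000000000000000.json")
if commit_file.exists():  # only runs where the table from this notebook exists
    for action_type, payload in read_commit_actions(commit_file):
        # For "add" actions, print the data file path; otherwise just the type.
        detail = payload.get("path", "") if action_type == "add" else ""
        print(action_type, detail)
```

The `add` lines are the interesting ones here: each one names a Parquet file that belongs to this version of the table.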

The table can now be loaded into a Spark DataFrame as usual; just choose "delta" instead of "parquet".

In [None]:
df = spark.read.format("delta").load("/data/delta-table")
df.show()

New rows can be appended. This is nothing new; it works the same way as with Parquet files:

In [None]:
new_data = spark.range(5,10)
new_data.write.format("delta").mode("append").save("/data/delta-table")

In [None]:
df = spark.read.format("delta").load("/data/delta-table")
df.show()

We get a new entry in the delta-log for this modification:

In [None]:
!ls /data/delta-table/_delta_log

In [None]:
!cat /data/delta-table/_delta_log/00000000000000000001.json
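
Each commit file is named after the table version it creates, zero-padded to 20 digits, so the log directory doubles as a list of versions. The sketch below reads the versions from the file names. The helpers `list_versions` and `commit_file_name` are hypothetical names for this example, and the sketch deliberately ignores anything that is not a plain JSON commit file (for instance checkpoint or `.crc` files).

```python
import re
from pathlib import Path

# Commit files look like 00000000000000000001.json (20 digits).
COMMIT_NAME = re.compile(r"^(\d{20})\.json$")


def list_versions(log_dir):
    """Return the sorted table versions found in a _delta_log directory.

    Hypothetical helper; ignores checkpoint, .crc and other non-commit files.
    """
    versions = []
    for entry in Path(log_dir).iterdir():
        match = COMMIT_NAME.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    return sorted(versions)


def commit_file_name(version):
    """Build the commit file name for a version number."""
    return f"{version:020d}.json"


log_dir = Path("/data/delta-table/_delta_log")
if log_dir.is_dir():  # only runs where the table from this notebook exists
    print(list_versions(log_dir))
```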

## Using the Delta API

To do Delta-specific things, let's use the Delta API.
Be aware that the resulting object is not a DataFrame but a DeltaTable object.
If you want to work with the data, you can still convert it to a DataFrame with `toDF()`.

In [None]:
from delta.tables import DeltaTable

delta_df = DeltaTable.forPath(spark, "/data/delta-table")

In [None]:
delta_df.toDF().show()

We can now delete rows from the Delta table using the Delta API.
This is different from the workflow we had in Spark with Parquet tables.
There we would modify the DataFrame and then overwrite the Parquet files.
This implies that the old data is gone!
Here, the delta log records which data files were removed from the current version of the table.
The old data is still on disk, but Delta will no longer show it (unless you time travel, but more about this later).

In [None]:
delta_df.delete("id<=5")

In [None]:
delta_df.toDF().show()

In [None]:
!cat /data/delta-table/_delta_log/00000000000000000002.json

In [None]:
# Read one Parquet file directly, bypassing the delta log.
spark.read.parquet("/data/delta-table/part-00002-e20c4fdb-197e-4296-9c83-9b8aa05459e4-c000.snappy.parquet").show()
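
To see which Parquet files still belong to the table after the delete, one option is to replay the `add` and `remove` actions from the log in version order. The following is a simplified sketch, not how Delta itself resolves a snapshot: it assumes the log starts at version 0, reads only the JSON commit files, and ignores checkpoints and deletion vectors. The function name `replay_log` is made up for this example.

```python
import json
from pathlib import Path


def replay_log(log_dir):
    """Replay add/remove actions and return (active, removed) sets of file paths.

    Simplified sketch: reads only the JSON commit files in version order and
    ignores checkpoints and deletion vectors.
    """
    active, removed = set(), set()
    # Zero-padded names sort lexicographically in version order.
    commit_files = sorted(
        p for p in Path(log_dir).glob("*.json") if p.stem.isdigit()
    )
    for commit in commit_files:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                path = action["add"]["path"]
                active.add(path)
                removed.discard(path)
            elif "remove" in action:
                path = action["remove"]["path"]
                active.discard(path)
                removed.add(path)
    return active, removed


log_dir = Path("/data/delta-table/_delta_log")
if log_dir.is_dir():  # only runs where the table from this notebook exists
    active, removed = replay_log(log_dir)
    print("active: ", sorted(active))
    print("removed:", sorted(removed))  # still on disk until the table is vacuumed
```

Files in the `removed` set are the ones a plain `spark.read.parquet` can still open even though the Delta table no longer shows their rows.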

We can also update rows according to conditions:

In [None]:
delta_df.update(condition="id = 8", set={"id": "888"})

In [None]:
delta_df.toDF().show()

In [None]:
!cat /data/delta-table/_delta_log/00000000000000000003.json

In [None]:
spark.read.parquet("/data/delta-table/part-00004-e42f66d2-9083-46b0-be7a-17066de0443d-c000.snappy.parquet").show()

In [None]:
spark.read.parquet("/data/delta-table/part-00000-a697701f-b8f9-4f76-a8b6-d4f8b8b240c3-c000.snappy.parquet").show()
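
The Parquet file names used in the cells above contain random UUIDs, so they will differ on every run. Instead of copying them by hand, the files touched by one commit can be looked up in its log entry. This is a minimal sketch with a hypothetical helper `files_changed`; it assumes the commit file for the requested version exists and that the paths in the log are plain relative file names.

```python
import json
from pathlib import Path


def files_changed(table_path, version):
    """Return the data files added and removed by one commit of a Delta table.

    Hypothetical helper: reads <table_path>/_delta_log/<version, 20 digits>.json.
    """
    commit = Path(table_path) / "_delta_log" / f"{version:020d}.json"
    changed = {"add": [], "remove": []}
    for line in commit.read_text().splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        for kind in changed:
            if kind in action:
                changed[kind].append(action[kind]["path"])
    return changed


table_path = Path("/data/delta-table")
if (table_path / "_delta_log" / f"{3:020d}.json").exists():
    changes = files_changed(table_path, 3)  # version 3 is the update above
    for kind, paths in changes.items():
        for path in paths:
            print(kind, table_path / path)
```

The printed paths can then be passed as strings to `spark.read.parquet(...)` to compare the file written by the update with the one it replaced.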