# Use Case:
Delta Lake is one of the most popular storage formats for Azure data lakes. It lets you apply ACID transactions to your files on the lake and perform operations such as update, delete, and merge. It also provides time travel capabilities for querying historical versions of your data.
This sample notebook shows you how to create, update, and query a Delta Lake table.
Documentation: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-delta-lake-overview?pivots=programming-language-python#sql-support

## Setup
First, we create the Delta table. We use a copy of the FactCallCenter Parquet files and partition the copy by two columns, WageType and Shift.

In [None]:
# Variables used in this example
synapse_data_lake = "REPLACE_DATALAKE_NAME"
source_path = "abfss://data@" + synapse_data_lake + ".dfs.core.windows.net/Sample/AdventureWorks/FactCallCenter/"
target_path = "abfss://data@" + synapse_data_lake + ".dfs.core.windows.net/Sample/AdventureWorks/FactCallCenter_Delta/"
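
The two paths above share the same account, container, and URI layout, so a small helper can keep that format in one place. This is a minimal sketch: `abfss_path` is a hypothetical helper name (not part of any Synapse library), and the `data` container default simply mirrors the variables above.

```python
def abfss_path(account: str, relative_path: str, container: str = "data") -> str:
    """Build an abfss:// URI for a folder in an ADLS Gen2 storage account."""
    # Strip slashes so "Sample/x", "/Sample/x" and "Sample/x/" all behave the same
    cleaned = relative_path.strip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{cleaned}/"


# Same placeholder account name as in the cell above
synapse_data_lake = "REPLACE_DATALAKE_NAME"
source_path = abfss_path(synapse_data_lake, "Sample/AdventureWorks/FactCallCenter")
target_path = abfss_path(synapse_data_lake, "Sample/AdventureWorks/FactCallCenter_Delta")
```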

In [None]:
# First we will create a copy of the sample data
# and we will convert the copy from Parquet to Delta Lake Format
# adding Partition columns in the process
from delta.tables import *
df = spark.read.format('parquet').load(source_path)
df.write.partitionBy('WageType', 'Shift').format('delta').save(target_path)
# Register the FactCallCenter_Delta table.
spark.sql("CREATE DATABASE DeltaTest")
spark.sql("CREATE TABLE DeltaTest.FactCallCenter_Delta USING DELTA LOCATION '" + target_path + "'")

## Query the Delta Table
Now you can use PySpark or Spark SQL to query the data stored in your Delta table. Notice that you don't have to do anything special to work with the partition folders, because both Synapse Spark and Synapse serverless SQL pools understand Delta Lake partitions.

In [None]:
# Now you can query the new Delta Table using PySpark
spark.sql("SELECT WageType, COUNT(*) FROM DeltaTest.FactCallCenter_Delta GROUP BY WageType").show()
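
The same aggregation can also be expressed with the DataFrame API by reading the Delta folder directly, without going through the registered table. A minimal sketch, assuming the notebook's `spark` session and `target_path` variable; `count_by_wage_type` is a hypothetical helper name used only for illustration.

```python
def count_by_wage_type(spark, delta_path):
    """Return a DataFrame with one row per WageType and its record count."""
    # Reading by path skips the metastore; the partition layout is resolved
    # from the Delta transaction log, so no partition handling is needed here
    df = spark.read.format("delta").load(delta_path)
    return df.groupBy("WageType").count()


# In the notebook, where `spark` and `target_path` already exist:
# count_by_wage_type(spark, target_path).show()
```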

In [None]:
%%sql
-- You can also run queries using Spark SQL
SELECT WageType, COUNT(*) 
FROM DeltaTest.FactCallCenter_Delta 
GROUP BY WageType 

### Using Synapse SQL Serverless Pool and T-SQL
Now you can open a new SQL script window and use the serverless SQL pool to query the newly created Delta table.
Synapse takes care of "sharing" the Spark table definition with the Synapse serverless SQL engine.
Notice the use of the **dbo** schema:
```sql
SELECT WageType, COUNT(*) 
FROM DeltaTest.dbo.FactCallCenter_Delta 
GROUP BY WageType 
```

## Update Data
This example shows you how to perform update and delete operations on your Delta table and how to use time travel to look at previous versions of the table.
In this example, we cap the value stored in the calls column at 500 (that is, we update every record whose calls value is greater than 500).

In [None]:
%%sql
-- First we check the number of records we will update:
SELECT COUNT(*)
FROM DeltaTest.FactCallCenter_Delta 
WHERE calls > 500 

In [None]:
%%sql
-- Now we update the records
UPDATE DeltaTest.FactCallCenter_Delta 
    SET calls = 500
WHERE calls > 500 

In [None]:
%%sql
-- If we check again, we shouldn't have any records with calls > 500
SELECT COUNT(*)
FROM DeltaTest.FactCallCenter_Delta 
WHERE calls > 500 
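
The update can also be issued from Python through the `DeltaTable` API instead of the SQL cell above. A minimal sketch, assuming a `DeltaTable` handle obtained with `DeltaTable.forPath(spark, target_path)`; `cap_calls` is a hypothetical helper name. Run it in place of the SQL `UPDATE`, not in addition to it, because each update may add a new version to the table history.

```python
def cap_calls(delta_table, limit=500):
    """Set calls to `limit` on every row where calls exceeds it."""
    # Both the condition and the new value are SQL expression strings
    delta_table.update(
        condition=f"calls > {limit}",
        set={"calls": str(limit)},
    )


# In the notebook:
# from delta.tables import DeltaTable
# cap_calls(DeltaTable.forPath(spark, target_path))
```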

In [None]:
%%sql
-- We also want to delete all the records where TotalOperators is equal to 12
-- First, we check how many records we will delete
SELECT COUNT(*) FROM DeltaTest.FactCallCenter_Delta 
WHERE TotalOperators == 12

In [None]:
%%sql
-- Now we proceed to delete the records where TotalOperators is 12 
DELETE FROM DeltaTest.FactCallCenter_Delta 
WHERE TotalOperators == 12

In [None]:
%%sql
-- If we check again, there shouldn't be any records with that condition
SELECT COUNT(*) FROM DeltaTest.FactCallCenter_Delta 
WHERE TotalOperators == 12
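
The delete has a Python equivalent as well. A minimal sketch, again assuming a `DeltaTable` handle for the same folder; `delete_by_total_operators` is a hypothetical helper name. As with the update, use it as an alternative to the SQL `DELETE` cell rather than alongside it.

```python
def delete_by_total_operators(delta_table, total_operators=12):
    """Delete every row whose TotalOperators equals the given value."""
    # The predicate is a SQL expression string evaluated against the table
    delta_table.delete(f"TotalOperators = {total_operators}")


# In the notebook:
# from delta.tables import DeltaTable
# delete_by_total_operators(DeltaTable.forPath(spark, target_path))
```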

## Time Travel Operations
Let's look at how we can see the different versions of our table before and after the Update and Delete operations

In [None]:
%%sql
-- We should see 3 versions of the table:
-- Version 0 represents the creation of the Delta table
-- Version 1 represents the table after the update of the calls column
-- Version 2 represents the table after the records were deleted
DESCRIBE HISTORY DeltaTest.FactCallCenter_Delta
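
The table history is also available as a DataFrame through `DeltaTable.history()`, which is convenient when you only need a few columns. A minimal sketch, assuming a `DeltaTable` handle for the same folder; `summarize_history` is a hypothetical helper name.

```python
def summarize_history(delta_table):
    """Return version, timestamp and operation for each commit, oldest first."""
    return (
        delta_table.history()
        .select("version", "timestamp", "operation")
        .orderBy("version")
    )


# In the notebook:
# from delta.tables import DeltaTable
# summarize_history(DeltaTable.forPath(spark, target_path)).show(truncate=False)
```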

In [None]:
# Now we use time travel to get table counts before and after the Delete operation
df = spark.read.format("delta").option("versionAsOf", 1).load(target_path)
before_delete = df.count()
df = spark.read.format("delta").option("versionAsOf", 2).load(target_path)
after_delete = df.count()
print(f'Records before delete:{before_delete}')
print(f'Records after delete:{after_delete}')
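
Besides `versionAsOf`, the Delta reader accepts a `timestampAsOf` option for reading the table as it was at a point in time. The sketch below wraps both in one function; `read_as_of` is a hypothetical helper name, and the timestamp in the usage comment is a placeholder that you would replace with a value taken from the table history.

```python
def read_as_of(spark, delta_path, version=None, timestamp=None):
    """Read a Delta folder at a given version or timestamp (exactly one)."""
    if (version is None) == (timestamp is None):
        raise ValueError("Pass exactly one of version or timestamp")
    reader = spark.read.format("delta")
    if version is not None:
        reader = reader.option("versionAsOf", version)
    else:
        reader = reader.option("timestampAsOf", timestamp)
    return reader.load(delta_path)


# In the notebook:
# before_delete = read_as_of(spark, target_path, version=1).count()
# snapshot = read_as_of(spark, target_path, timestamp="2024-01-01 00:00:00")  # placeholder
```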

## Clean Up
We now delete the Delta table definition, the database, and the folder we created.

In [None]:
from notebookutils import mssparkutils
# Delete Delta Table definitions
spark.sql("DROP TABLE DeltaTest.FactCallCenter_Delta")
spark.sql("DROP DATABASE DeltaTest")
# Delete Delta Folder
mssparkutils.fs.rm(target_path, recurse=True)