## Fundamentals

A **Delta table** is a table stored in the format defined by **Delta Lake**, an open-source storage layer that brings ACID transactions and other data management features to Apache Spark and big data workloads.
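
On disk, a Delta table is a set of Parquet data files plus a `_delta_log` directory of numbered JSON commit files, and that log is what backs the transactional guarantees. As a rough illustration, the sketch below lists the committed versions of a table stored on a local filesystem using only the standard library. The helper name `list_delta_versions` is hypothetical and not part of the Delta Lake API.

```python
import re
from pathlib import Path

# Commit files use a zero-padded 20-digit version number,
# e.g. 00000000000000000000.json for version 0.
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def list_delta_versions(table_path):
    """Return the sorted commit versions found in <table_path>/_delta_log.

    Hypothetical helper for illustration: it only inspects file names and
    returns an empty list when the directory has no transaction log.
    """
    log_dir = Path(table_path) / "_delta_log"
    if not log_dir.is_dir():
        return []
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_FILE.match(entry.name)
        if match:  # skips checkpoints and other non-commit files
            versions.append(int(match.group(1)))
    return sorted(versions)
```

Calling `list_delta_versions("/opt/workspace/data/delta_lake/netflix_titles")` after the steps in this notebook should return one entry per commit made to the table.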


## Import libraries

In [1]:
from delta import configure_spark_with_delta_pip, DeltaTable

from pyspark.sql import SparkSession

## Create SparkSession Object

In [3]:
builder = (SparkSession.builder
           .appName("create-delta-table")
           .master("spark://spark-master:7077")
           .config("spark.executor.memory", "512m")
           .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
           .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          )

spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.sparkContext.setLogLevel("ERROR")

## Create a Delta Table

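**Jupyter Cell Magic**:
Jupyter supports special commands called “cell magics” (prefixed with %%) that change the way the cell’s content is processed. For example, %%bash runs the cell as a shell script, and %%python runs the cell as Python code in a separate process. A %%sparksql magic, which needs a separately installed extension, would run the cell as Spark SQL; the cell below passes the same statement to spark.sql() instead.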

In [7]:

sql_query = """ 
CREATE OR REPLACE TABLE default.netflix_titles(
    show_id STRING,
    type STRING,
    title STRING,
    director STRING,
    cast STRING,
    country STRING,
    date_added STRING,
    release_year STRING,
    rating STRING,
    duration STRING,
    listed_in STRING,
    description STRING
) USING DELTA LOCATION '/opt/workspace/data/delta_lake/netflix_titles'
"""

spark.sql(sql_query)

DataFrame[]
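
Because every column in this table is a STRING, the same statement can also be generated from a list of column names. The following is a minimal sketch under that assumption. `build_create_table_sql` is a hypothetical helper, not part of Spark or Delta Lake. It only builds the SQL text and assumes the identifiers and path come from a trusted source.

```python
def build_create_table_sql(table, columns, location, col_type="STRING"):
    """Build a CREATE OR REPLACE TABLE ... USING DELTA statement.

    Hypothetical helper: every column gets the same type (STRING by default).
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_lines = ",\n".join(f"    {name} {col_type}" for name in columns)
    return (
        f"CREATE OR REPLACE TABLE {table} (\n"
        f"{column_lines}\n"
        f") USING DELTA LOCATION '{location}'"
    )


# Column names taken from the CREATE TABLE statement above.
netflix_columns = [
    "show_id", "type", "title", "director", "cast", "country",
    "date_added", "release_year", "rating", "duration", "listed_in",
    "description",
]

sql_query = build_create_table_sql(
    "default.netflix_titles",
    netflix_columns,
    "/opt/workspace/data/delta_lake/netflix_titles",
)
print(sql_query)
```

The resulting string can be passed to `spark.sql(sql_query)` in the same way as the hand-written statement.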

## Reading the data

In [8]:
df = (spark.read.format("csv")
      .option("header", "true")
      .load("../../data/netflix_titles.csv"))


## Write data to Delta Lake

Once the data is in a DataFrame, we can write it to Delta Lake through the DataFrame's write interface.
Specify the format as "delta" and choose a save mode. The cell below uses "overwrite", which replaces any existing contents of the table.

In [9]:
df.write.format("delta").mode("overwrite").saveAsTable("default.netflix_titles")
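
Other save modes behave differently: "append" adds rows to the existing table, "error" (or "errorifexists") fails if the table already exists, and "ignore" leaves an existing table untouched. A small wrapper can catch a mistyped mode before any data is written. This is only a sketch, and `write_delta` and `VALID_MODES` are hypothetical names introduced for illustration.

```python
# Commonly used save modes for Spark's DataFrameWriter.mode().
VALID_MODES = {"append", "overwrite", "error", "errorifexists", "ignore"}


def write_delta(df, table_name, mode="append"):
    """Write a DataFrame to a Delta table after checking the save mode.

    Hypothetical wrapper: `df` is expected to be a PySpark DataFrame,
    such as the one loaded from the CSV file above.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported save mode: {mode!r}")
    df.write.format("delta").mode(mode).saveAsTable(table_name)
```

With this helper, `write_delta(df, "default.netflix_titles", mode="overwrite")` performs the same write as the cell above.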

In [14]:
sql_query = """
SELECT * FROM default.netflix_titles LIMIT 5;
"""

result_df = spark.sql(sql_query)
result_df.show()

+-------+-------+--------------------+---------------+--------------------+-------------+------------------+------------+------+---------+--------------------+--------------------+
|show_id|   type|               title|       director|                cast|      country|        date_added|release_year|rating| duration|           listed_in|         description|
+-------+-------+--------------------+---------------+--------------------+-------------+------------------+------------+------+---------+--------------------+--------------------+
|     s1|  Movie|Dick Johnson Is Dead|Kirsten Johnson|                null|United States|September 25, 2021|        2020| PG-13|   90 min|       Documentaries|As her father nea...|
|     s2|TV Show|       Blood & Water|           null|Ama Qamata, Khosi...| South Africa|September 24, 2021|        2021| TV-MA|2 Seasons|International TV ...|After crossing pa...|
|     s3|TV Show|           Ganglands|Julien Leclercq|Sami Bouajila, Tr...|         null|Septem