# Databricks Delta
## Load a CSV file

In [2]:
file_name = "UsedCars.csv"
PATH = '/mnt/rawdata/{}'.format(file_name)

used_cars = spark.read.option("header", "true").csv(PATH)

In [3]:
display(used_cars)

## Persist in Delta format

In [5]:
used_cars.write.mode("overwrite").format("delta").save("/delta/{}".format(file_name))

In [6]:
display(dbutils.fs.ls("dbfs:/delta/{}".format(file_name)))

## Create table using Delta Lake

In [8]:
spark.sql("""
  DROP TABLE IF EXISTS used_cars
""")

spark.sql("""
  CREATE TABLE used_cars 
  USING DELTA 
  LOCATION '/delta/{}' 
""".format(file_name))

In [9]:
%sql
SELECT * FROM used_cars

### Review Spark Catalog

In [11]:
spark.catalog.listTables()

## Split by partition
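Partition the data by the `FuelType` column. Each distinct fuel type is written to its own directory, so queries that filter on `FuelType` can skip the partitions they do not need.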

In [13]:
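file_name_by_partition = "UsedCars_by_fuel_type"

# Write one sub-directory per distinct FuelType value
used_cars.write.mode("overwrite").partitionBy("FuelType").format("delta").save("/delta/{}".format(file_name_by_partition))

# Drop any earlier registration so this cell can be re-run without error
spark.sql("DROP TABLE IF EXISTS used_cars_by_fuel_type")
spark.sql("CREATE TABLE used_cars_by_fuel_type USING DELTA LOCATION '/delta/{}/'".format(file_name_by_partition))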

In [14]:
display(dbutils.fs.ls("dbfs:/delta"))
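
To see the effect of `partitionBy`, it can help to list the partitioned table's own directory rather than `/delta` as a whole. The helper below is a minimal sketch, assuming the table uses the default Hive-style layout in which each partition directory is named `column=value/`. The function name `partition_values` and the sample listing are illustrative and not part of the original notebook.

```python
def partition_values(entry_names, column):
    """Extract partition values from Hive-style directory names.

    entry_names: directory entry names such as 'FuelType=Petrol/'
    column:      the partition column to look for, e.g. 'FuelType'
    """
    prefix = column + "="
    values = []
    for name in entry_names:
        name = name.rstrip("/")          # directory names end with '/'
        if name.startswith(prefix):      # skips entries such as '_delta_log'
            values.append(name[len(prefix):])
    return sorted(values)


# Hypothetical listing for illustration only, not real output
sample_names = ["_delta_log/", "FuelType=Petrol/", "FuelType=Diesel/"]
print(partition_values(sample_names, "FuelType"))
```

In the notebook, the names could come from `[f.name for f in dbutils.fs.ls("dbfs:/delta/UsedCars_by_fuel_type")]`. Partition values that contain special characters may appear escaped in the directory name, which this sketch does not decode.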

In [15]:
%sql
SELECT FuelType, COUNT(*) FROM used_cars_by_fuel_type GROUP BY FuelType

In [16]:
%sql
DESCRIBE DETAIL used_cars_by_fuel_type

## Time Travel
Delta Lake records every write as a new table version in its transaction log, so you can query past versions of the data. Start by looking at the history of the `used_cars` table.

In [18]:
%sql
DESCRIBE HISTORY used_cars
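
The same history is available from Python, which is convenient when only a few columns matter. This is a small sketch, assuming the `spark` session provided by the notebook. The helper name `history_summary` is hypothetical, while `version`, `timestamp` and `operation` are columns returned by `DESCRIBE HISTORY`.

```python
def history_summary(spark, table_name):
    """Return version, timestamp and operation for each commit of a Delta table.

    spark:      the active SparkSession (provided by the notebook)
    table_name: name of a registered Delta table, e.g. 'used_cars'
    """
    history = spark.sql("DESCRIBE HISTORY {}".format(table_name))
    # Keep only the columns needed to pick a version for time travel
    return history.select("version", "timestamp", "operation")
```

In a notebook cell this could be shown with `display(history_summary(spark, "used_cars"))`.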

### Insert a new row

In [20]:
%sql
INSERT INTO used_cars
SELECT 5500, 44, 34000, 'Petrol', 110, 1, 1, 1300, 3, 1005

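Querying an older version is as easy as adding `VERSION AS OF desired_version` after the table name. The two counts below compare version 1 with version 0. If the earlier cells were run more than once, check `DESCRIBE HISTORY` for the version numbers on either side of the insert.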

In [22]:
%sql
SELECT count(*)
FROM used_cars
VERSION AS OF 1

In [23]:
%sql
SELECT count(*)
FROM used_cars
VERSION AS OF 0
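
Time travel is also available through the DataFrame reader, using the `versionAsOf` option on a Delta path. The wrapper below is a minimal sketch, assuming the notebook's `spark` session. The function name `read_delta_version` is illustrative and not part of the original notebook.

```python
def read_delta_version(spark, path, version):
    """Load a Delta table from `path` as it was at the given version.

    spark:   the active SparkSession (provided by the notebook)
    path:    storage location of the Delta table
    version: integer version number, as listed by DESCRIBE HISTORY
    """
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # select the snapshot to read
        .load(path)
    )
```

For example, `read_delta_version(spark, "/delta/UsedCars.csv", 0).count()` should match the count returned by the `VERSION AS OF 0` query above, because the `used_cars` table points at that location.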

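### Delete the Delta files
This removes everything under `/delta/`. The table definitions stay registered in the metastore until they are dropped.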

In [25]:
dbutils.fs.rm("/delta/", True)