# Creating a Delta Lake table

In Databricks, tables are organized in a hierarchical structure (the Data Catalog). Clicking the Catalog icon shows the Data Catalog and the traditional **hive_metastore**. While Unity Catalog is the latest and default data governance solution, the **hive_metastore** is used in this notebook for simplicity, as it requires no additional setup.

Configuring the notebook to use the hive_metastore catalog:

In [0]:
%sql

USE CATALOG hive_metastore

Creating a Delta Lake table requires:
* `CREATE TABLE` statement
* Table name
* Table schema

Delta Lake is the default table format on Databricks, so there is no need to specify `USING DELTA`.

In [0]:
%sql

CREATE TABLE employees
  (id INT, name STRING, salary DOUBLE)

Once the table is created, this can be confirmed by opening the **hive_metastore**. The table can also be opened in the Catalog Explorer. A new window will open and show:
* Data source (Delta)
* Table schema

Insert some values into the table. After executing the following cell, which contains 4 statements, there will be 4 data files in the table directory.

In [0]:
%sql
INSERT INTO employees
VALUES
  (1, "Adam", 3500.0),
  (2, "Sarah", 4020.5);

INSERT INTO employees
VALUES
  (3, "John", 2999.3),
  (4, "Thomas", 4000.3);

INSERT INTO employees
VALUES
  (5, "Anna", 2500.0);

INSERT INTO employees
VALUES
  (6, "Kim", 6200.3);

Only the result of the last statement is displayed in the output. For this reason, the output shows only one record.

Query the table with a `SELECT` statement:

In [0]:
%sql
SELECT * FROM employees

Sometimes it is important to look at the table metadata. The `DESCRIBE DETAIL` command shows information such as:
* Table location
* Number of current data files
* ...

In [0]:
%sql

DESCRIBE DETAIL employees

Exploring the files using the `%fs` magic command:

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'

This directory contains:
* Four data files in Parquet format
* The Delta log directory (`_delta_log`)

# Update Operation

The salary of the employees whose name starts with the letter 'A' will be increased by $100:

In [0]:
%sql
UPDATE employees
SET salary = salary + 100
WHERE name LIKE "A%"

Two records (Adam and Anna) were affected by the update operation. When these records were inserted into the table, they were written to two different data files.

In [0]:
%sql
SELECT * FROM employees

What happened in the table directory?

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'

Two new data files have been added to the directory. Delta Lake doesn't modify existing files directly when performing updates. **Parquet files are immutable**: once created, they cannot be modified.
Instead, Delta Lake writes new versions of the files that contain the updated records. At query time, the engine uses the transaction log to determine which files are valid in the current version of the table.

In [0]:
%sql
DESCRIBE DETAIL employees

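The number of files is 4 instead of 6, because `DESCRIBE DETAIL` counts only the files that belong to the current version of the table. Two of these 4 files were created as part of the update. The 2 files they replaced are still in the directory but are no longer referenced.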
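
The gap between 6 files on disk and 4 files in the current version can be reproduced with a small toy model. The sketch below is plain Python, not Delta Lake code: the file names are hypothetical placeholders, and it assumes the simple copy-on-write behaviour described above, where every file holding a matching row is rewritten in full.

```python
# Toy model of copy-on-write updates; this is NOT the Delta Lake implementation.
# File names are hypothetical placeholders for the Parquet files in the table directory.
files_on_disk = {
    "part-0.parquet": [(1, "Adam", 3500.0), (2, "Sarah", 4020.5)],
    "part-1.parquet": [(3, "John", 2999.3), (4, "Thomas", 4000.3)],
    "part-2.parquet": [(5, "Anna", 2500.0)],
    "part-3.parquet": [(6, "Kim", 6200.3)],
}
active_files = set(files_on_disk)  # files referenced by the current table version


def update_salaries(prefix, raise_by):
    """Rewrite every active file that holds at least one matching row."""
    for old_name in sorted(active_files):
        rows = files_on_disk[old_name]
        if not any(name.startswith(prefix) for _, name, _ in rows):
            continue  # files without matching rows are left untouched
        new_rows = [
            (id_, name, salary + raise_by if name.startswith(prefix) else salary)
            for id_, name, salary in rows
        ]
        new_name = f"part-{len(files_on_disk)}.parquet"
        files_on_disk[new_name] = new_rows  # a new file is written ("add")
        active_files.discard(old_name)      # the old file is only unreferenced ("remove")
        active_files.add(new_name)


update_salaries("A", 100)
print(len(files_on_disk))  # files in the directory: 6
print(len(active_files))   # files in the current version: 4
```

Note that in this model the unchanged row for Sarah is copied into the new file alongside Adam's updated row, since the whole file is rewritten.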

Since the transaction log stores all the changes to the Delta Lake table, the table history can easily be reviewed using the `DESCRIBE HISTORY` command:

In [0]:
%sql
DESCRIBE HISTORY employees

There are 6 versions of the table:
* 0: table is created
* 1-4: insert statements
* 5: table is updated

Thanks to the transaction log, there is a complete record of all operations performed on the table. The transaction log is located in the `_delta_log` folder in the table directory.

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees/_delta_log'

Each transaction is a new JSON file written to the Delta Lake transaction log. There are 6 JSON files representing the 6 transactions that have been performed on the table, starting from version 0. The rest of the files are just checksums for the JSON files.
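
The JSON file names follow a fixed pattern: the table version, zero-padded to 20 digits, followed by `.json`. As a minimal sketch, the helper below (`commit_file_name` is a hypothetical name, not part of any Delta Lake API) builds the file name for a given version:

```python
def commit_file_name(version: int) -> str:
    """Return the transaction log file name for a table version (hypothetical helper)."""
    return f"{version:020d}.json"


# The 6 versions of the table map to 6 commit files, starting from version 0.
for version in range(6):
    print(commit_file_name(version))
```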

What is inside the last JSON file?
* The `add` elements show the new files that have been written to the table.
* The `remove` elements show the files that have been logically removed from the table. These files, therefore, should no longer be included in the current version of the table.

In [0]:
%fs head 'dbfs:/user/hive/warehouse/employees/_delta_log/00000000000000000005.json'
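
To illustrate how `add` and `remove` elements determine the valid files, the following sketch applies one commit to a set of file names. It is plain Python and only an approximation of what a reader of the log does: the commit contents are hypothetical and heavily trimmed (real entries carry more fields), the file names are placeholders, and `apply_commit` is an illustrative helper rather than a Delta Lake function.

```python
import json

# Hypothetical, trimmed commit: one JSON object per line, each holding one action.
sample_commit = "\n".join([
    json.dumps({"commitInfo": {"operation": "UPDATE"}}),
    json.dumps({"remove": {"path": "part-0.parquet", "dataChange": True}}),
    json.dumps({"remove": {"path": "part-2.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-4.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-5.parquet", "dataChange": True}}),
])


def apply_commit(live_files, commit_text):
    """Return the set of valid files after applying one commit's actions."""
    live = set(live_files)
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
        # other actions (e.g. commitInfo) do not change the file set
    return live


before = {"part-0.parquet", "part-1.parquet", "part-2.parquet", "part-3.parquet"}
after = apply_commit(before, sample_commit)
print(sorted(after))
```

Applying the commits for versions 0 through 5 in order, in the same way, would yield the files that make up the current version of the table.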