# Delta Live Tables

![bookstore dataset schema](../Includes/images/image1.png)

This SQL notebook declares Delta Live Tables (DLT) that together implement a simple multi-hop architecture.

In [0]:
SET datasets.path=dbfs:/mnt/demo-datasets/bookstore;

## Bronze Layer Tables

Two DLT tables are declared to implement the **bronze layer**. They represent the data in its rawest form.

### orders_raw Table

DLT tables will always be preceded by the `LIVE` keyword.

The table `orders_raw` incrementally ingests JSON data from the dataset directory using Auto Loader. Incremental processing via Auto Loader requires adding the `STREAMING` keyword to the declaration.

The `cloud_files` method enables Auto Loader to be used natively with SQL. This method takes three parameters:
* Data file source location ("${datasets.path}/orders-json-raw")
* Source data format ("json")
* Map of reader options (here, `cloudFiles.inferColumnTypes` to infer column types)

The `COMMENT` will be visible to anyone exploring the data catalog.

In [0]:
CREATE OR REFRESH STREAMING LIVE TABLE orders_raw
COMMENT "The raw books orders, ingested from orders-raw"
AS SELECT * FROM cloud_files("${datasets.path}/orders-json-raw", "json",
                             map("cloudFiles.inferColumnTypes", "true"))

Running a DLT query from a notebook only validates its syntax. To define and populate this table, a DLT pipeline must be created.
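
The third `cloud_files` argument can carry more than one reader option. The following is a minimal sketch, assuming one wants to pin the types of a few columns while letting Auto Loader infer the rest. It uses the Auto Loader option `cloudFiles.schemaHints`; the table name `orders_raw_hinted` and the hinted column types are hypothetical and not part of this notebook.

```sql
-- Hypothetical variant of orders_raw: same source and format,
-- with schema hints added to the reader options map.
CREATE OR REFRESH STREAMING LIVE TABLE orders_raw_hinted
COMMENT "Sketch: raw books orders with explicit type hints (assumed types)"
AS SELECT * FROM cloud_files("${datasets.path}/orders-json-raw", "json",
                             map("cloudFiles.inferColumnTypes", "true",
                                 -- Assumed types; adjust to the actual data
                                 "cloudFiles.schemaHints", "order_timestamp BIGINT, quantity BIGINT"))
```

As with the other declarations, this cell would only be checked for syntax in a notebook; it takes effect when run inside a DLT pipeline.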

### customers Table

This table presents JSON customer data. It is used below in a `JOIN` operation to look up customer information.

In [0]:
CREATE OR REFRESH LIVE TABLE customers
COMMENT "The customers lookup table, ingested from customers-json"
AS SELECT * FROM json.`${datasets.path}/customers-json`

## Silver Layer Tables

This layer represents a refined copy of data from the bronze layer. At this level, operations such as data cleansing and enrichment are applied.

### orders_cleaned Table

The silver table `orders_cleaned` enriches the order data with customer information.

Quality control is implemented with the `CONSTRAINT` keyword, here by rejecting records with no `order_id`. A `CONSTRAINT` lets DLT collect metrics on constraint violations, and its optional `ON VIOLATION` clause specifies the action to take on records that violate the constraint.

DLT currently supports three modes:
- `DROP ROW`: discard records that violate constraints
- `FAIL UPDATE`: violated constraints cause the pipeline to fail
- Omitted: records violating constraints are kept and reported in metrics

The `LIVE` prefix is needed to refer to other DLT tables.

For streaming DLT tables, the `STREAM()` method has to be used.

In [0]:
CREATE OR REFRESH STREAMING LIVE TABLE orders_cleaned (
  CONSTRAINT valid_order_number EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "The cleaned books orders with valid order_id"
AS
  SELECT order_id, quantity, o.customer_id, c.profile:first_name as f_name, c.profile:last_name as l_name,
         cast(from_unixtime(order_timestamp, 'yyyy-MM-dd HH:mm:ss') AS timestamp) order_timestamp, o.books,
         c.profile:address:country as country
  FROM STREAM(LIVE.orders_raw) o
  LEFT JOIN LIVE.customers c
    ON o.customer_id = c.customer_id
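
The declaration above uses `DROP ROW`. For comparison, here is a minimal sketch of the other two violation modes on one table. The table name `orders_checked` and the constraint `positive_quantity` are hypothetical; the sketch assumes that `orders_raw` exposes the `order_id` and `quantity` columns used elsewhere in this notebook.

```sql
-- Hypothetical table illustrating the remaining ON VIOLATION modes.
CREATE OR REFRESH STREAMING LIVE TABLE orders_checked (
  -- FAIL UPDATE: a single record without order_id fails the pipeline update
  CONSTRAINT valid_order_number EXPECT (order_id IS NOT NULL) ON VIOLATION FAIL UPDATE,
  -- No ON VIOLATION clause: violating records are kept and only counted in metrics
  CONSTRAINT positive_quantity EXPECT (quantity > 0)
)
COMMENT "Sketch: orders validated with FAIL UPDATE and a metrics-only constraint"
AS
  SELECT *
  FROM STREAM(LIVE.orders_raw)
```

Which mode fits depends on how costly a bad record is: `FAIL UPDATE` suits invariants that must never break, while omitting the clause is useful for monitoring data quality without losing rows.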

## Gold Layer Tables

The gold tables keep the daily number of books per customer for a specific region: one table for China and one for France.

### cn_daily_customer_books Table

In [0]:
CREATE OR REFRESH LIVE TABLE cn_daily_customer_books
COMMENT "Daily number of books per customer in China"
AS
  SELECT customer_id, f_name, l_name, date_trunc("DD", order_timestamp) order_date, sum(quantity) books_counts
  FROM LIVE.orders_cleaned
  WHERE country = "China"
  GROUP BY customer_id, f_name, l_name, date_trunc("DD", order_timestamp)

### fr_daily_customer_books Table

In [0]:
CREATE OR REFRESH LIVE TABLE fr_daily_customer_books
COMMENT "Daily number of books per customer in France"
AS
  SELECT customer_id, f_name, l_name, date_trunc("DD", order_timestamp) order_date, sum(quantity) books_counts
  FROM LIVE.orders_cleaned
  WHERE country = "France"
  GROUP BY customer_id, f_name, l_name, date_trunc("DD", order_timestamp)
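
The two gold tables differ only in their `WHERE` filter. As an alternative design, the same aggregation could be kept in a single table with `country` as a grouping column. The following is only a sketch under that assumption; the table name `daily_customer_books` is hypothetical and does not appear in this notebook.

```sql
-- Hypothetical combined gold table: one row per customer, day, and country.
CREATE OR REFRESH LIVE TABLE daily_customer_books
COMMENT "Sketch: daily number of books per customer in China and France"
AS
  SELECT customer_id, f_name, l_name, country,
         date_trunc("DD", order_timestamp) order_date,
         sum(quantity) books_counts
  FROM LIVE.orders_cleaned
  WHERE country IN ("China", "France")
  GROUP BY customer_id, f_name, l_name, country, date_trunc("DD", order_timestamp)
```

Separate per-region tables, as declared above, keep each consumer's table small and simple to grant access to; a combined table avoids duplicating the query when more regions are added.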