
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Lab: Migrating SQL Notebooks to Delta Live Tables

This notebook describes the overall structure for the lab exercise, configures the environment for the lab, provides simulated data streaming, and performs cleanup once you are done. A notebook like this is not typically needed in a production pipeline scenario.

## Learning Objectives
By the end of this lab, you should be able to:
* Convert existing data pipelines to Delta Live Tables

## Datasets Used

This lab uses simplified, artificially generated medical data. The schemas of our two datasets are shown below. Note that we will be manipulating these schemas during various steps.

#### Recordings
The main dataset consists of heart rate recordings from medical devices, delivered in JSON format.

| Field | Type |
| --- | --- |
| device_id | int |
| mrn | long |
| time | double |
| heartrate | double |
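
As a rough illustration of the schema above, the sketch below checks a single JSON record against the four fields using only the Python standard library. The `validate_recording` helper and the sample values are hypothetical and are not part of the lab's setup code; the mapping of `int`/`long` to Python `int` and `double` to Python `float` is an assumption made for this sketch.

```python
import json

# Expected fields and the Python types that JSON values decode to.
# Assumption: int/long -> Python int, double -> Python float.
RECORDING_SCHEMA = {
    "device_id": int,
    "mrn": int,
    "time": float,
    "heartrate": float,
}


def validate_recording(raw: str) -> dict:
    """Parse one JSON record and check it against RECORDING_SCHEMA."""
    record = json.loads(raw)
    for field, expected in RECORDING_SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{field} should be {expected.__name__}")
    return record


# Hypothetical sample record; the values are invented for illustration.
sample = '{"device_id": 1, "mrn": 12345678, "time": 1.5e9, "heartrate": 72.5}'
record = validate_recording(sample)
```

A check like this is only a local sanity test; in the lab itself the schema is handled by the pipeline.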

#### PII
These data will later be joined with a static table of patient information stored in an external system to identify patients by name.

| Field | Type |
| --- | --- |
| mrn | long |
| name | string |
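
The join described above can be sketched in plain Python, using `mrn` as the key shared by both datasets. The rows, patient names, and the `join_on_mrn` helper are hypothetical; in the lab the join is expressed in the pipeline notebook rather than in code like this.

```python
# Hypothetical rows; names and values are invented for illustration only.
recordings = [
    {"device_id": 1, "mrn": 100, "time": 1.0, "heartrate": 70.0},
    {"device_id": 2, "mrn": 200, "time": 2.0, "heartrate": 65.5},
    {"device_id": 1, "mrn": 100, "time": 3.0, "heartrate": 71.2},
]
pii = [
    {"mrn": 100, "name": "Patient A"},
    {"mrn": 200, "name": "Patient B"},
]


def join_on_mrn(recordings, pii):
    """Inner-join recordings to patient names using mrn as the key."""
    name_by_mrn = {row["mrn"]: row["name"] for row in pii}
    return [
        {**rec, "name": name_by_mrn[rec["mrn"]]}
        for rec in recordings
        if rec["mrn"] in name_by_mrn
    ]


enriched = join_on_mrn(recordings, pii)
```

Because the PII table is static and small, a lookup keyed on `mrn` is enough to show the idea: each recording gains a `name` column, and recordings without a matching patient are dropped.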

## Getting Started

Begin by running the following cell to configure the lab environment.

In [0]:
%run ../../Includes/Classroom-Setup-8.2.1L

## Land Initial Data
Seed the landing zone with some data before proceeding. You will re-run this command to land additional data later.

In [0]:
DA.data_factory.load()

Run the next cell to print the values you will need in the configuration steps below.

In [0]:
DA.print_pipeline_config()

## Create and Configure a Pipeline

1. Click the **Jobs** button on the sidebar, then select the **Delta Live Tables** tab.
1. Click **Create Pipeline**.
1. Leave **Product Edition** as **Advanced**.
1. Fill in a **Pipeline Name**. Because these names must be unique, we suggest using the **Pipeline Name** provided in the cell above.
1. For **Notebook Libraries**, use the navigator to locate and select the notebook **`DE 8.2.2L - Migrating a SQL Pipeline to DLT Lab`**.
1. Configure the Source
    * Click **`Add configuration`**
    * Enter the word **`source`** in the **Key** field
    * Enter the **Source** value printed by the cell above in the **`Value`** field
1. Enter the database name printed next to **`Target`** above in the **Target** field.
1. Enter the location printed next to **`Storage Location`** above in the **Storage Location** field.
1. Set **Pipeline Mode** to **Triggered**.
1. Disable autoscaling.
1. Set the number of **`workers`** to **`1`** (one).
1. Click **Create**.
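
For reference, the choices made in the steps above can be collected into a single settings dictionary. This is a minimal sketch: the field names (`edition`, `libraries`, `configuration`, `target`, `storage`, `continuous`, `clusters`) reflect our understanding of the pipeline settings JSON and should be verified against the JSON view in the pipeline UI. The `build_pipeline_settings` helper and the angle-bracket placeholder values are hypothetical; substitute the values printed by the cell above.

```python
import json


def build_pipeline_settings(name, notebook_path, source, target, storage):
    """Assemble a settings dict mirroring the UI choices in the steps above.

    Field names are an assumption based on the pipeline settings JSON;
    confirm them in your workspace before relying on them.
    """
    return {
        "name": name,
        "edition": "ADVANCED",  # Product Edition left as Advanced
        "libraries": [{"notebook": {"path": notebook_path}}],
        "configuration": {"source": source},  # key "source", value from setup
        "target": target,
        "storage": storage,
        "continuous": False,  # Pipeline Mode: Triggered
        # Fixed-size cluster with one worker, i.e. autoscaling disabled.
        "clusters": [{"label": "default", "num_workers": 1}],
    }


# Placeholder values; replace them with the output of the setup cell.
settings = build_pipeline_settings(
    name="<pipeline-name>",
    notebook_path="<path-to-DE-8.2.2L-notebook>",
    source="<source>",
    target="<target>",
    storage="<storage-location>",
)
settings_json = json.dumps(settings, indent=2)
```

Printing `settings_json` gives a document you can compare field by field with what the UI shows after you click **Create**.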

## Open and Complete DLT Pipeline Notebook

You will perform your work in the companion notebook [DE 8.2.2L - Migrating a SQL Pipeline to DLT Lab]($./DE 8.2.2L - Migrating a SQL Pipeline to DLT Lab),<br/>
which you will ultimately deploy as a pipeline.

Open the notebook and, following the guidelines it provides, fill in the cells where prompted to<br/>
implement a multi-hop architecture similar to the one we worked with in the previous section.

## Run your Pipeline

Select **Development** mode, which accelerates the development lifecycle by reusing the same cluster across runs.<br/>
Development mode also turns off automatic retries when an update fails.

Click **Start** to begin the first update to your tables.

Delta Live Tables will automatically deploy all the necessary infrastructure and resolve the dependencies between all datasets.

**NOTE**: The first table update may take several minutes as relationships are resolved and infrastructure deploys.

## Troubleshooting Code in Development Mode

Don't despair if your pipeline fails the first time. Delta Live Tables is in active development, and error messages are improving all the time.

Because relationships between tables are mapped as a DAG, error messages will often indicate that a dataset isn't found.

Let's consider our DAG below:

<img src="https://files.training.databricks.com/images/dlt-dag.png">

If the error message **`Dataset not found: 'recordings_parsed'`** is raised, there may be several culprits:
1. The logic defining **`recordings_parsed`** is invalid
1. There is an error reading from **`recordings_bronze`**
1. A typo exists in either **`recordings_parsed`** or **`recordings_bronze`**

The safest way to identify the culprit is to add table/view definitions back into your DAG one at a time, starting from your initial ingestion tables. Comment out the later table/view definitions, then uncomment them one by one between runs.
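
The order in which to re-enable definitions follows from the dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to compute that order and to list the upstream datasets worth checking when one dataset is reported as not found. Only the `recordings_bronze` to `recordings_parsed` edge is taken from the text above; the other dataset names and both helper functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Each dataset maps to the set of datasets it reads from.
# Only recordings_bronze -> recordings_parsed comes from the text above;
# the remaining entries are hypothetical placeholders.
dependencies = {
    "recordings_bronze": set(),
    "pii": set(),
    "recordings_parsed": {"recordings_bronze"},
    "recordings_enriched": {"recordings_parsed", "pii"},
}


def reenable_order(dependencies):
    """Return datasets ordered so each appears after everything it reads."""
    return list(TopologicalSorter(dependencies).static_order())


def upstream_of(dataset, dependencies):
    """Return every dataset that `dataset` depends on, directly or not."""
    seen = set()
    stack = [dataset]
    while stack:
        for parent in dependencies.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


order = reenable_order(dependencies)
suspects = upstream_of("recordings_parsed", dependencies)
```

Here `order` lists ingestion tables before anything that reads from them, and `suspects` contains `recordings_bronze`, matching the culprits listed above for a missing `recordings_parsed`.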

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>