
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Last Mile ETL with Databricks SQL

Before we continue, let's do a recap of some of the things we've learned so far:
1. The Databricks workspace contains a suite of tools to simplify the data engineering development lifecycle
1. Databricks notebooks allow users to mix SQL with other programming languages to define ETL workloads
1. Delta Lake provides ACID compliant transactions and makes incremental data processing easy in the Lakehouse
1. Delta Live Tables extends the SQL syntax to support many design patterns in the Lakehouse, and simplifies infrastructure deployment
1. Multi-task jobs allow for full task orchestration, adding dependencies while scheduling a mix of notebooks and DLT pipelines
1. Databricks SQL allows users to edit and execute SQL queries, build visualizations, and define dashboards
1. Data Explorer simplifies managing Table ACLs, making Lakehouse data available to SQL analysts (soon to be expanded greatly by Unity Catalog)

In this section, we'll focus on exploring more DBSQL functionality to support production workloads. 

We'll start by using Databricks SQL to configure queries that support last mile ETL for analytics. Note that while we'll be using the Databricks SQL UI for this demo, SQL Endpoints <a href="https://docs.databricks.com/integrations/partners.html" target="_blank">integrate with a number of other tools to allow external query execution</a>, and also have <a href="https://docs.databricks.com/sql/api/index.html" target="_blank">full API support for executing arbitrary queries programmatically</a>.

From these query results, we'll generate a series of visualizations, which we'll combine into a dashboard.

Finally, we'll walk through scheduling updates for queries and dashboards, and demonstrate setting alerts to help monitor the state of production datasets over time.
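
As a rough illustration of the programmatic query execution mentioned above, the sketch below builds (but does not send) an HTTP request that submits a single SQL statement. It assumes your workspace exposes the SQL Statement Execution REST endpoint at **`/api/2.0/sql/statements`**; the host name, token, and warehouse ID are placeholders, and **`build_statement_request`** is a hypothetical helper rather than part of this course's setup code.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement):
    """Build (but do not send) a POST request that submits one SQL statement."""
    payload = {
        "warehouse_id": warehouse_id,  # the SQL endpoint/warehouse that runs the query
        "statement": statement,        # the SQL text to execute
        "wait_timeout": "30s",         # assumed: how long the call waits for results
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with your own workspace details.
request = build_statement_request(
    host="example.cloud.databricks.com",
    token="<personal-access-token>",
    warehouse_id="<warehouse-id>",
    statement="SELECT 1",
)
# To actually submit the statement: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before running anything against a real endpoint.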

## Learning Objectives
By the end of this lesson, you should be able to:
* Use Databricks SQL as a tool to support production ETL tasks backing analytic workloads
* Configure SQL queries and visualizations with the Databricks SQL Editor
* Create dashboards in Databricks SQL
* Schedule updates for queries and dashboards
* Set alerts for SQL queries

## Run Setup Script
The following cell runs a notebook that defines a class we'll use to generate SQL queries.

In [0]:
%run ../Includes/Classroom-Setup-12.1

## Create a Demo Database
Execute the following cell and copy the results into the Databricks SQL Editor.

These queries:
* Create a new database
* Declare two tables (we'll use these for loading data)
* Declare two functions (we'll use these for generating data)

Once copied, execute the query using the **Run** button.

In [0]:
DA.generate_config()

**NOTE**: The queries above are designed to be run only once, after completely resetting the demo, to reconfigure the environment. Users will need to have **`CREATE`** and **`USAGE`** permissions on the catalog to execute them.

<img src="https://files.training.databricks.com/images/icon_warn_32.png"> 
**WARNING:** Make sure to select your database before proceeding, as the **`USE`** statement<br/>doesn't yet change the database against which your queries will execute.

## Create a Query to Load Data
Steps:
1. Execute the cell below to print out a formatted SQL query for loading data into the **`user_ping`** table created in the previous step.
1. Save this query with the name **Load Ping Data**.
1. Run this query to load a batch of data.

In [0]:
DA.generate_load()

Executing the query should load some data and return a preview of the data in the table.

**NOTE**: Random numbers are being used to define and load data, so each user will have slightly different values present.

## Set a Query Refresh Schedule

Steps:
1. Locate the **Refresh Schedule** field at the bottom right of the SQL query editor box; click the blue **Never**
1. Use the drop-down to change the setting to Refresh every **1 minute**
1. For **Ends**, click the **On** radio button
1. Select tomorrow's date
1. Click **OK**

## Create a Query to Track Total Records
Steps:
1. Execute the cell below.
1. Save this query with the name **User Counts**.
1. Run the query to calculate the current results.

In [0]:
DA.generate_user_counts()
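
To make the shape of the **User Counts** result concrete, here is a minimal Python sketch of a per-user record count. It is not the SQL that **`DA.generate_user_counts()`** prints; it only assumes, based on the column names used for the visualization in this lesson, that the query returns one row per **`user_id`** with a **`total_records`** count. The sample rows and the **`ping`** field are made up for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for the user_ping table.
sample_pings = [
    {"user_id": "user_a", "ping": 12},
    {"user_id": "user_b", "ping": 48},
    {"user_id": "user_a", "ping": 30},
    {"user_id": "user_a", "ping": 21},
]


def count_records_by_user(rows):
    """Return one row per user with the number of records seen for that user."""
    counts = Counter(row["user_id"] for row in rows)
    return [
        {"user_id": user_id, "total_records": total}
        for user_id, total in sorted(counts.items())
    ]


user_counts = count_records_by_user(sample_pings)
print(user_counts)
```

Because the load query keeps adding batches on its refresh schedule, rerunning a count like this over time should show **`total_records`** growing for each user.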

## Create a Bar Graph Visualization

Steps:
1. Click the **Add Visualization** button, located beneath the Refresh Schedule button in the bottom right-hand corner of the query window
1. Click on the name (should default to something like **`Visualization 1`**) and change the name to **Total User Records**
1. Set **`user_id`** for the **X Column**
1. Set **`total_records`** for the **Y Columns**
1. Click **Save**

## Create a New Dashboard

Steps:
1. Click the button with three vertical dots at the bottom of the screen and select **Add to Dashboard**.
1. Click the **Create new dashboard** option
1. Name your dashboard <strong>User Ping Summary **`<your_initials_here>`**</strong>
1. Click **Save** to create the new dashboard
1. Your newly created dashboard should now be selected as the target; click **OK** to add your visualization

## Create a Query to Calculate the Recent Average Ping
Steps:
1. Execute the cell below to print out the formatted SQL query.
1. Save this query with the name **Avg Ping**.
1. Run the query to calculate the current results.

In [0]:
DA.generate_avg_ping()
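
The query printed above is not reproduced here, but the idea of a windowed average can be sketched in plain Python. This example assumes each record carries a timestamp and a ping value, groups records per user into fixed one-minute windows, and labels each window by its end time, mirroring the **`user_id`**, **`end_time`**, and **`avg_ping`** columns used in the next visualization. The window length, the **`time`** and **`ping`** field names, and the sample rows are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)  # assumed window length


def avg_ping_by_window(rows, window=WINDOW):
    """Average ping per user within fixed, non-overlapping time windows."""
    buckets = defaultdict(list)
    for row in rows:
        # Align the timestamp to the start of its window, then key by the window end.
        elapsed = row["time"] - datetime.min
        start = datetime.min + (elapsed // window) * window
        buckets[(row["user_id"], start + window)].append(row["ping"])
    return [
        {"user_id": user_id, "end_time": end_time, "avg_ping": sum(pings) / len(pings)}
        for (user_id, end_time), pings in sorted(buckets.items())
    ]


# Hypothetical sample rows.
sample_rows = [
    {"user_id": "user_a", "time": datetime(2022, 1, 1, 12, 0, 10), "ping": 20},
    {"user_id": "user_a", "time": datetime(2022, 1, 1, 12, 0, 50), "ping": 40},
    {"user_id": "user_a", "time": datetime(2022, 1, 1, 12, 1, 5), "ping": 30},
    {"user_id": "user_b", "time": datetime(2022, 1, 1, 12, 0, 30), "ping": 10},
]

windowed = avg_ping_by_window(sample_rows)
for row in windowed:
    print(row)
```

Each output row corresponds to one point on a line plot: **`end_time`** on the X axis, **`avg_ping`** on the Y axis, and one line per **`user_id`**.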

## Add a Line Plot Visualization to your Dashboard

Steps:
1. Click the **Add Visualization** button
1. Click on the name (should default to something like **`Visualization 1`**) and change the name to **Avg User Ping**
1. Select **`Line`** for the **Visualization Type**
1. Set **`end_time`** for the **X Column**
1. Set **`avg_ping`** for the **Y Columns**
1. Set **`user_id`** for the **Group by**
1. Click **Save**
1. Click the button with three vertical dots at the bottom of the screen and select **Add to Dashboard**.
1. Select the dashboard you created earlier
1. Click **OK** to add your visualization

## Create a Query to Report Summary Statistics
Steps:
1. Execute the cell below.
1. Save this query with the name **Ping Summary**.
1. Run the query to calculate the current results.

In [0]:
DA.generate_summary()

## Add the Summary Table to your Dashboard

Steps:
1. Click the button with three vertical dots at the bottom of the screen and select **Add to Dashboard**.
1. Select the dashboard you created earlier
1. Click **OK** to add your visualization

## Review and Refresh your Dashboard

Steps:
1. Use the left side bar to navigate to **Dashboards**
1. Find the dashboard you've added your queries to
1. Click the blue **Refresh** button to update your dashboard
1. Click the **Schedule** button to review dashboard scheduling options
  * Note that scheduling a dashboard to update will execute all queries associated with that dashboard
  * Do not schedule the dashboard at this time

## Share your Dashboard

Steps:
1. Click the blue **Share** button
1. Select **All Users** from the top field
1. Choose **Can Run** from the right field
1. Click **Add**
1. Change the **Credentials** to **Run as viewer**

**NOTE**: At present, no other users should have any permissions to run your dashboard, as they have not been granted permissions to the underlying databases and tables using Table ACLs. If you wish other users to be able to trigger updates to your dashboard, you will either need to grant them permissions to **Run as owner** or add permissions for the tables referenced in your queries.

## Set Up an Alert

Steps:
1. Use the left side bar to navigate to **Alerts**
1. Click **Create Alert** in the top right
1. Click the field at the top left of the screen and name the alert **`<your_initials> Count Check`**
1. Select your **User Counts** query
1. For the **Trigger when** options, configure:
  * **Value column**: **`total_records`**
  * **Condition**: **`>`**
  * **Threshold**: **`15`**
1. For **Refresh**, select **Never**
1. Click **Create Alert**
1. On the next screen, click the blue **Refresh** in the top right to evaluate the alert
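
The trigger configured above boils down to a simple comparison, sketched below in Python. This sketch assumes the alert compares the value column of the first row returned by the query against the threshold; confirm the exact evaluation rule in your workspace's documentation. **`alert_triggered`** and the sample rows are hypothetical and exist only to illustrate the logic.

```python
import operator

# Map the condition symbols offered in the alert form to comparison functions.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def alert_triggered(rows, value_column, condition, threshold):
    """Return True when the first row's value satisfies the condition."""
    if not rows:
        # Assumption: an empty result never triggers the alert.
        return False
    return CONDITIONS[condition](rows[0][value_column], threshold)


# Hypothetical results from the User Counts query.
rows = [
    {"user_id": "user_a", "total_records": 20},
    {"user_id": "user_b", "total_records": 9},
]

# Mirrors the settings above: total_records > 15.
print(alert_triggered(rows, "total_records", ">", 15))
```

With a threshold of **`15`**, the alert stays quiet until enough batches have been loaded for the checked value to exceed it.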

## Review Alert Destination Options

Steps:
1. From the preview of your alert, click the blue **Add** button to the right of **Destinations** on the right side of the screen
1. At the bottom of the window that pops up, locate and click the blue text in the message **Create new destinations in Alert Destinations**
1. Review the available alerting options

In [0]:
DA.cleanup()

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>