<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Reset Pipelines

This notebook provides code to remove all existing databases, data, and tables.

It then redeclares each table used in the architecture.

Run this notebook before scheduling jobs.

In [0]:
%run ../../Includes/Classroom-Setup-8.4.1

We'll be using the **`bronze_dev`** table, which is a clone that already contains all of our daily data.

In [0]:
%sql
SHOW TABLES

Code to declare all the other tables in our pipelines is provided below.

In [0]:
%sql
CREATE TABLE IF NOT EXISTS heart_rate_silver
(device_id LONG, time TIMESTAMP, heartrate DOUBLE, bpm_check STRING)
USING DELTA
LOCATION '${da.paths.user_db}/heart_rate_silver'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS workouts_silver
(user_id INT, workout_id INT, time TIMESTAMP, action STRING, session_id INT)
USING DELTA
LOCATION '${da.paths.user_db}/workouts_silver'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS users
(alt_id STRING, dob DATE, sex STRING, gender STRING, first_name STRING, last_name STRING, street_address STRING, city STRING, state STRING, zip INT, updated TIMESTAMP)
USING DELTA
LOCATION '${da.paths.user_db}/users'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS completed_workouts
(user_id INT, workout_id INT, session_id INT, start_time TIMESTAMP, end_time TIMESTAMP, in_progress BOOLEAN)
USING DELTA
LOCATION '${da.paths.user_db}/completed_workouts'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS workout_bpm
(user_id INT, workout_id INT, session_id INT, time TIMESTAMP, heartrate DOUBLE)
USING DELTA
LOCATION '${da.paths.user_db}/workout_bpm'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS user_bins
(user_id BIGINT, age STRING, gender STRING, city STRING, state STRING)
USING DELTA
LOCATION '${da.paths.user_db}/user_bins'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS registered_users
(device_id LONG, mac_address STRING, registration_timestamp DOUBLE, user_id LONG)
USING DELTA
LOCATION '${da.paths.user_db}/registered_users'

In [0]:
%sql
CREATE TABLE IF NOT EXISTS workout_bpm_summary
(workout_id INT, session_id INT, user_id BIGINT, age STRING, gender STRING, city STRING, state STRING, min_bpm DOUBLE, avg_bpm DOUBLE, max_bpm DOUBLE, num_recordings BIGINT)
USING DELTA 
LOCATION '${da.paths.user_db}/workout_bpm_summary'
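
The eight declarations above share one pattern: a column list, **`USING DELTA`**, and a **`LOCATION`** under the user database path. As a minimal sketch of that pattern, the Python below builds equivalent statements from a dictionary of schemas. The names **`TABLE_SCHEMAS`**, **`build_create_table`**, and **`build_all`**, as well as the example path, are hypothetical and not part of the course code; only three of the tables are included for brevity.

```python
from typing import Dict, List

# Hypothetical registry: table name -> column definitions.
# The three entries are copied from the declarations above.
TABLE_SCHEMAS: Dict[str, str] = {
    "heart_rate_silver": "device_id LONG, time TIMESTAMP, heartrate DOUBLE, bpm_check STRING",
    "workouts_silver": "user_id INT, workout_id INT, time TIMESTAMP, action STRING, session_id INT",
    "workout_bpm": "user_id INT, workout_id INT, session_id INT, time TIMESTAMP, heartrate DOUBLE",
}


def build_create_table(name: str, columns: str, base_path: str) -> str:
    """Return a CREATE TABLE IF NOT EXISTS statement for a Delta table under base_path."""
    # Strip a trailing slash so the location never contains "//".
    location = f"{base_path.rstrip('/')}/{name}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name}\n"
        f"({columns})\n"
        f"USING DELTA\n"
        f"LOCATION '{location}'"
    )


def build_all(base_path: str) -> List[str]:
    """Build one statement per entry in TABLE_SCHEMAS, in insertion order."""
    return [build_create_table(name, cols, base_path) for name, cols in TABLE_SCHEMAS.items()]


if __name__ == "__main__":
    # "/tmp/example_user_db" is a placeholder, not the real da.paths.user_db value.
    for statement in build_all("/tmp/example_user_db"):
        print(statement, end="\n\n")
```

In a notebook, each returned string could be passed to **`spark.sql`** instead of being printed. Keeping one cell per table, as this notebook does, is easier to read; the loop is only worth considering if the list of tables grows.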

For this demo, we focus only on the data coming through our multiplex bronze table, so we bypass incremental loading for the **`gym_mac_logs`** and **`user_lookup`** tables and recreate their final results with a direct read of all files.

In [0]:
DA.create_gym_mac_logs()
DA.create_user_lookup()

In [0]:
%sql
CREATE VIEW IF NOT EXISTS gym_user_stats AS (
-- Join each gym visit log to the workouts the same device completed that day
SELECT gym, mac_address, date, workouts,
       (last_timestamp - first_timestamp)/60 AS minutes_in_gym,
       (to_unix_timestamp(end_workout) - to_unix_timestamp(start_workout))/60 AS minutes_exercising
FROM gym_mac_logs c
INNER JOIN (
  -- One row per device per day: the set of workouts, the earliest start, and the latest end
  SELECT b.mac_address,
         to_date(start_time) AS date,
         collect_set(workout_id) AS workouts,
         min(start_time) AS start_workout,
         max(end_time) AS end_workout
  FROM completed_workouts a
  INNER JOIN user_lookup b
  ON a.user_id = b.user_id
  GROUP BY mac_address, to_date(start_time)
) d
ON c.mac = d.mac_address AND
   to_date(CAST(c.first_timestamp AS timestamp)) = d.date)
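
To make the arithmetic in **`gym_user_stats`** concrete, here is a small plain-Python sketch of the two duration columns and the date used in the join. It assumes that **`first_timestamp`** and **`last_timestamp`** hold epoch seconds (inferred from the **`CAST`** and the division by 60, not stated in the notebook) and that dates are resolved in UTC, whereas Spark uses the session time zone. The function names and sample values are hypothetical.

```python
from datetime import date, datetime, timezone


def minutes_in_gym(first_timestamp: float, last_timestamp: float) -> float:
    """Mirror (last_timestamp - first_timestamp)/60, assuming epoch seconds."""
    return (last_timestamp - first_timestamp) / 60


def minutes_exercising(start_workout: datetime, end_workout: datetime) -> float:
    """Mirror (to_unix_timestamp(end) - to_unix_timestamp(start))/60.

    to_unix_timestamp yields whole seconds, so each value is truncated first.
    """
    return (int(end_workout.timestamp()) - int(start_workout.timestamp())) / 60


def visit_date(first_timestamp: float) -> date:
    """Mirror to_date(CAST(first_timestamp AS timestamp)), assuming UTC."""
    return datetime.fromtimestamp(first_timestamp, tz=timezone.utc).date()


if __name__ == "__main__":
    utc = timezone.utc
    # Hypothetical visit: in the gym from 09:00 to 10:30, exercising from 09:10 to 09:55.
    first = datetime(2019, 12, 1, 9, 0, tzinfo=utc).timestamp()
    last = datetime(2019, 12, 1, 10, 30, tzinfo=utc).timestamp()
    print(minutes_in_gym(first, last))
    print(minutes_exercising(datetime(2019, 12, 1, 9, 10, tzinfo=utc),
                             datetime(2019, 12, 1, 9, 55, tzinfo=utc)))
    print(visit_date(first))
```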

In [0]:
%sql
SHOW TABLES

# Load Sample Data
Note that the following cell runs a script that continues to execute for approximately 30 minutes.

It lands batches of data in the location that our **`bronze`** table is currently configured to load data from.

This gives an idea of how data flows through the pipelines.

In [0]:
dbutils.widgets.text("batch_delay", "5", "Batch Delay")
batch_delay = int(dbutils.widgets.get("batch_delay"))

In [0]:
DA.bronze_data_stream.load(from_batch=0, batch_delay=batch_delay)
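
The internals of **`DA.bronze_data_stream.load`** are not shown in this notebook. As a rough mental model only, the sketch below lands batches one at a time and pauses **`batch_delay`** seconds between them. Everything here is an assumption: the function **`land_batches`**, its **`land`** and **`sleep`** parameters, and the reading of **`from_batch`** as the index of the first batch to land are hypothetical and not the course helper's actual behavior.

```python
import time
from typing import Any, Callable, Sequence


def land_batches(
    batches: Sequence[Any],
    batch_delay: float,
    land: Callable[[Any], None],
    sleep: Callable[[float], None] = time.sleep,
    from_batch: int = 0,
) -> int:
    """Land each batch from index from_batch onward, pausing between batches.

    Hypothetical stand-in for a batch loader; returns the number of batches landed.
    """
    landed = 0
    for batch in batches[from_batch:]:
        if landed > 0:
            # Pause only between batches, not before the first one.
            sleep(batch_delay)
        land(batch)
        landed += 1
    return landed


if __name__ == "__main__":
    # Record pauses instead of sleeping so the example finishes immediately.
    pauses = []
    count = land_batches(["batch-0", "batch-1", "batch-2"], batch_delay=5,
                         land=print, sleep=pauses.append)
    print(count, pauses)
```

Under this model, the total runtime grows with both the number of batches and the delay, which is consistent with the cell running for an extended period.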

Unlike in other lessons, we will **NOT** be executing our **`DA.cleanup()`** command<br/>
as we want these assets to persist through all the notebooks in this demo.

&copy; 2022 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>