
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Workflows Lab

In this lab, you'll configure a multi-task job consisting of three notebook tasks.

## Learning Objectives
By the end of this lab, you should be able to:
* Schedule a notebook as a task in a Databricks Job
* Configure dependencies between tasks using the Databricks Workflows UI

## A. Classroom Setup

Run the following cell to configure your working environment for this course. It also uses the `USE` statements below to set your default catalog to **dbacademy** and your default schema to the unique schema name shown in the cell output.
<br></br>
```sql
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```

**NOTE:** The **DA** object is only used in Databricks Academy courses and is not available outside of these courses.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - In the drop-down, select **More**.

   - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

```
%run ./Includes/Classroom-Setup-2L
```

Note: you may need to restart the kernel using `%restart_python` or `dbutils.library.restartPython()` to use updated packages.

The cell output displays your **Course Catalog** and **Your Schema** values.


## B. Generate Job Configuration
1. Run the cell below to print the values you'll use to configure your job in the following steps. Use the job name and notebook paths exactly as printed.

```python
DA.print_job_config(
    job_name_extension='Lesson_02',
    notebook_paths='/Task Notebooks/Lesson 2 Notebooks',
    notebooks=[
        '2.01 - Ingest CSV',
        '2.02 - Create Invalid Region Table',
        '2.02 - Create Valid Region Table'
    ],
    job_tasks={
        'Ingest_CSV': [],
        'Create_Invalid_Region_Table': ['Ingest_CSV'],
        'Create_Valid_Region_Table': ['Ingest_CSV']
    },
    check_task_dependencies=True
)
```

The cell output lists the **Job Name** and the **Notebook #1**, **Notebook #2**, and **Notebook #3** paths.
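The `job_tasks` argument above maps each task to the tasks it depends on. As a minimal sketch in plain Python (not part of the course's `DA` helper), the standard-library `graphlib` module can turn that mapping into the stages in which tasks become runnable, which makes the intended shape of the job visible before you build it in the UI. The function name `execution_stages` is hypothetical.

```python
from graphlib import TopologicalSorter

# Same mapping passed to DA.print_job_config above: task -> tasks it depends on.
job_tasks = {
    "Ingest_CSV": [],
    "Create_Invalid_Region_Table": ["Ingest_CSV"],
    "Create_Valid_Region_Table": ["Ingest_CSV"],
}


def execution_stages(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into stages; each task's dependencies all sit in earlier stages."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(execution_stages(job_tasks))
# [['Ingest_CSV'], ['Create_Invalid_Region_Table', 'Create_Valid_Region_Table']]
```

Both region-table tasks land in the same stage because each depends only on **Ingest_CSV** and not on each other; this is the fan-out you will build in section C.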


## C. Configure a Job With Multiple Tasks
The job will complete three simple tasks:

- (Notebook #1) Ingest a CSV file and create the **customers_bronze** table in your schema.
- (Notebook #2) Create a table called **customers_invalid_region** in your schema.
- (Notebook #3) Create a table called **customers_valid_region** in your schema.

### C1. Add a Single Notebook Task

Let's start by scheduling the first notebook, [2.01 - Ingest CSV]($./Task Notebooks/Lesson 2 Notebooks/2.01 - Ingest CSV). Click the link to review its code.

The notebook creates a table named **customers_bronze** in your schema from the CSV file in the volume */Volumes/dbacademy_retail/v01/source_files/customers.csv*. 

1. Right-click **Workflows** in the sidebar and select *Open Link in New Tab*. 

2. In **Workflows** select the **Jobs** tab, and then click the **Create Job** button.

3. In the top-left of the screen, enter the **Job Name** printed in the cell output above. You must use this exact name.

4. Configure the task as specified below. You'll need the values provided in the cell output above for this step.


| Setting | Instructions |
|--|--|
| Task name | Enter **Ingest_CSV** |
| Type | Choose **Notebook** |
| Source | Choose **Workspace** |
| Path | Use the navigator to specify the **Notebook #1** path provided above (notebook **Task Notebooks/Lesson 2 Notebooks/2.01 - Ingest CSV**) |
| Compute | From the dropdown menu, select a **Serverless** cluster (We will be using Serverless clusters for jobs in this course. You can also specify a different cluster if required outside of this course) |

**NOTE**: If you select an all-purpose cluster instead, you may see a warning that the job will be billed as all-purpose compute. Production jobs should always run on new job clusters sized appropriately for the workload, because job compute is billed at a much lower rate.
<br>

![Lesson02_Lab_OneTask](files/images/deploy-workloads-with-databricks-workflows-2.0.1/Lesson02_Lab_OneTask.png)

5. Click the **Create task** button.

6. Click the blue **Run now** button in the top right to start the job.

7. Select the **Runs** tab in the navigation bar and verify that the job completes successfully.

![Lesson02_Lab_OneTaskSuccess](files/images/deploy-workloads-with-databricks-workflows-2.0.1/Lesson02_Lab_OneTaskSuccess.png)

8. From **Catalog**, navigate to your schema in the **dbacademy** catalog and confirm that the **customers_bronze** table was created (you might have to refresh your schema).
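For reference, the settings in the task table above correspond to fields in a Jobs API 2.1 job definition. The sketch below builds such a definition as a plain Python dictionary for illustration only: the job name and notebook path are hypothetical placeholders (use the **Job Name** and **Notebook #1** path printed earlier), the helper `notebook_task` is hypothetical, and compute settings are deliberately left out rather than guessing how your workspace expresses serverless compute.

```python
import json

# Hypothetical placeholders: substitute the Job Name and Notebook #1 path printed earlier.
JOB_NAME = "<your job name>"
NOTEBOOK_1 = "<path>/Task Notebooks/Lesson 2 Notebooks/2.01 - Ingest CSV"


def notebook_task(task_key: str, notebook_path: str) -> dict:
    """One notebook task, mirroring the Task name, Type, Source, and Path rows."""
    return {
        "task_key": task_key,                # Task name
        "notebook_task": {                   # Type: Notebook
            "notebook_path": notebook_path,  # Path
            "source": "WORKSPACE",           # Source: Workspace
        },
    }


# Compute settings are intentionally omitted from this sketch.
job_settings = {"name": JOB_NAME, "tasks": [notebook_task("Ingest_CSV", NOTEBOOK_1)]}
print(json.dumps(job_settings, indent=2))
```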

### C2. Add the Second Task to the Job

Now, configure a second task that runs only after the first task, **Ingest_CSV**, completes successfully. The second task is the notebook [2.02 - Create Invalid Region Table]($./Task Notebooks/Lesson 2 Notebooks/2.02 - Create Invalid Region Table). Open the notebook and review the code.

The notebook creates a table named **customers_invalid_region** in your schema from the **customers_bronze** table created by the previous task.

Steps:
1. Go back to your job. On the Job details page, click the **Tasks** tab.

2. Click the blue **+ Add task** button at the center bottom of the screen and select **Notebook** in the dropdown menu.

3. Configure the task:

| Setting | Instructions |
|--|--|
| Task name | Enter **Create_Invalid_Region_Table** |
| Type | Choose **Notebook** |
| Source | Choose **Workspace** |
| Path | Use the navigator to specify the **Notebook #2** path provided above (notebook **Task Notebooks/Lesson 2 Notebooks/2.02 - Create Invalid Region Table**) |
| Compute | From the dropdown menu, select a **Serverless** cluster (We will be using Serverless clusters for jobs in this course. You can also specify a different cluster if required outside of this course) |
| Depends on | Verify **Ingest_CSV** (the previous task we defined) is listed |

<br>

4. Click the blue **Create task** button.

<br></br>

![Lesson02_Lab_TwoTasks](files/images/deploy-workloads-with-databricks-workflows-2.0.1/Lesson02_Lab_TwoTasks.png)
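In a Jobs API 2.1 job definition, the **Depends on** setting is expressed as a `depends_on` list of `{"task_key": ...}` entries. A minimal sketch, assuming that field layout, of a check that every dependency names a task that actually exists in the job; notebook paths are omitted for brevity and the function name is hypothetical.

```python
def unknown_dependencies(tasks: list[dict]) -> list[str]:
    """Return 'task -> dependency' strings for dependencies that name no task in the job."""
    defined = {task["task_key"] for task in tasks}
    problems = []
    for task in tasks:
        for dep in task.get("depends_on", []):
            if dep["task_key"] not in defined:
                problems.append(f"{task['task_key']} -> {dep['task_key']}")
    return problems


# The two tasks configured so far (notebook_task fields omitted).
tasks = [
    {"task_key": "Ingest_CSV"},
    {"task_key": "Create_Invalid_Region_Table", "depends_on": [{"task_key": "Ingest_CSV"}]},
]
print(unknown_dependencies(tasks))  # []
```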

### C3. Add the Third Task to the Job

Now, configure a third task that also runs only after **Ingest_CSV** completes successfully. The third task is the notebook [2.02 - Create Valid Region Table]($./Task Notebooks/Lesson 2 Notebooks/2.02 - Create Valid Region Table). 

The notebook creates a table named **customers_valid_region** in your schema from the **customers_bronze** table created by the first task.

Steps:
1. On the Job details page, confirm you are on the **Tasks** tab.

2. Click the **Ingest_CSV** task to select it.

3. Click the blue **+ Add task** button at the center bottom of the screen and select **Notebook** in the dropdown menu.

4. Configure the task:

| Setting | Instructions |
|--|--|
| Task name | Enter **Create_Valid_Region_Table** |
| Type | Choose **Notebook** |
| Source | Choose **Workspace** |
| Path | Use the navigator to specify the **Notebook #3** path provided above (notebook **Task Notebooks/Lesson 2 Notebooks/2.02 - Create Valid Region Table**) |
| Compute | From the dropdown menu, select a **Serverless** cluster (We will be using Serverless clusters for jobs in this course. You can also specify a different cluster if required outside of this course) |
| Depends on | Verify that **Ingest_CSV** (the first task) is the only task listed. If a different task appears, remove it and select **Ingest_CSV** instead |

5. Click the blue **Create task** button.

<br></br>

![Lesson02_Lab_ThreeTasks](files/images/deploy-workloads-with-databricks-workflows-2.0.1/Lesson02_Lab_ThreeTasks.png)
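The third task must depend on **Ingest_CSV**, not on **Create_Invalid_Region_Table**. The sketch below, which assumes the Jobs API 2.1 `depends_on` layout and uses hypothetical helper names, lists each task's downstream tasks so you can compare the intended fan-out with a linear chain in which the third task accidentally depends on the second.

```python
def downstream(tasks: list[dict]) -> dict[str, list[str]]:
    """Map each task_key to the sorted task_keys that list it in depends_on.

    Assumes every dependency refers to a task defined in `tasks`.
    """
    children = {task["task_key"]: [] for task in tasks}
    for task in tasks:
        for dep in task.get("depends_on", []):
            children[dep["task_key"]].append(task["task_key"])
    return {key: sorted(value) for key, value in children.items()}


def after(*keys: str) -> list[dict]:
    """Build a depends_on list from task keys."""
    return [{"task_key": key} for key in keys]


fan_out = [
    {"task_key": "Ingest_CSV"},
    {"task_key": "Create_Invalid_Region_Table", "depends_on": after("Ingest_CSV")},
    {"task_key": "Create_Valid_Region_Table", "depends_on": after("Ingest_CSV")},
]
# Mistake to avoid: the third task depends on the second instead of on Ingest_CSV.
linear = fan_out[:2] + [
    {"task_key": "Create_Valid_Region_Table", "depends_on": after("Create_Invalid_Region_Table")},
]

print(downstream(fan_out)["Ingest_CSV"])  # ['Create_Invalid_Region_Table', 'Create_Valid_Region_Table']
print(downstream(linear)["Ingest_CSV"])   # ['Create_Invalid_Region_Table']
```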

## D. Verify the Job is Configured Correctly
Run the cell below to check whether you configured the job correctly. If it reports any errors, fix them in the job and rerun the cell.

```python
DA.validate_job_config()
```

```
1. Required job Id has been found.
2. Required job name labuser9084188_1738337208_Lesson_02 has been found.
3. Required task notebooks set correctly.
4. Job task names set correctly.
5. Task dependencies are set correctly.
-------------------------------------------
Your Job has been validated. Tests passed!
```
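The implementation of `DA.validate_job_config()` is not shown in this lab. As a rough, hypothetical illustration of the kind of comparison such a check could make, the sketch below diffs the expected dependencies (the `job_tasks` mapping from section B) against the `depends_on` entries of a configured job, assuming the Jobs API 2.1 field layout. The function name `dependency_diff` and the `configured` sample are hypothetical.

```python
def dependency_diff(expected: dict[str, list[str]], tasks: list[dict]) -> dict[str, tuple]:
    """Return {task_key: (expected_deps, actual_deps)} for every task that differs.

    None on either side means the task is absent from that side.
    """
    actual = {
        task["task_key"]: sorted(dep["task_key"] for dep in task.get("depends_on", []))
        for task in tasks
    }
    diff = {}
    for key in sorted(set(expected) | set(actual)):
        want = sorted(expected[key]) if key in expected else None  # None: task not expected
        got = actual.get(key)  # None: task missing from the job
        if want != got:
            diff[key] = (want, got)
    return diff


job_tasks = {
    "Ingest_CSV": [],
    "Create_Invalid_Region_Table": ["Ingest_CSV"],
    "Create_Valid_Region_Table": ["Ingest_CSV"],
}
# Hypothetical misconfigured job: the third task depends on the second task.
configured = [
    {"task_key": "Ingest_CSV"},
    {"task_key": "Create_Invalid_Region_Table", "depends_on": [{"task_key": "Ingest_CSV"}]},
    {"task_key": "Create_Valid_Region_Table", "depends_on": [{"task_key": "Create_Invalid_Region_Table"}]},
]
print(dependency_diff(job_tasks, configured))
# {'Create_Valid_Region_Table': (['Ingest_CSV'], ['Create_Invalid_Region_Table'])}
```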


## E. Run the Job
1. Click the blue **Run now** button in the top right to run this job. It should take a few minutes to complete.

2. On the **Runs** tab, click the start time for this run under **Active runs** to track the progress of each task visually.

3. On the **Runs** tab, confirm that the job completed successfully.

<br></br>
![Lesson02_Lab_SuccessRun](files/images/deploy-workloads-with-databricks-workflows-2.0.1/Lesson02_Lab_SuccessRun.png)
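If you check a run programmatically rather than in the **Runs** tab, its details come back as JSON. The sketch below assumes the response shape of the Jobs API 2.1 `runs/get` endpoint, where each entry in `tasks` carries a `task_key` and a `state` whose `result_state` is a value such as `SUCCESS` or `FAILED`; verify the field names against the Jobs API reference. The function name is hypothetical, and the sample response is hand-written for illustration, not output from this lab.

```python
def unsuccessful_tasks(run: dict) -> list[str]:
    """Return task_keys whose result_state is anything other than SUCCESS (or not yet set)."""
    return [
        task["task_key"]
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") != "SUCCESS"
    ]


# Hypothetical, trimmed-down runs/get response used only to exercise the function.
sample_run = {
    "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"},
    "tasks": [
        {"task_key": "Ingest_CSV", "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
        {"task_key": "Create_Invalid_Region_Table", "state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
        {"task_key": "Create_Valid_Region_Table", "state": {"life_cycle_state": "TERMINATED", "result_state": "FAILED"}},
    ],
}
print(unsuccessful_tasks(sample_run))  # ['Create_Valid_Region_Table']
```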



## F. View the New Tables
1. In the left pane, select **Catalog**.

2. Expand the **dbacademy** catalog.

3. Expand your unique schema name.

4. Confirm that the job created the **customers_bronze**, **customers_invalid_region**, and **customers_valid_region** tables.

You can also use the `SHOW TABLES` statement to view available tables in your schema.

```
%sql
SHOW TABLES;
```

| database | tableName | isTemporary |
|--|--|--|
| labuser9084188_1738337208 | customers_bronze | False |
| labuser9084188_1738337208 | customers_invalid_region | False |
| labuser9084188_1738337208 | customers_valid_region | False |
| labuser9084188_1738337208 | lesson1_workflow_users | False |
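To make the check in step 4 repeatable, you can compare the `tableName` column of `SHOW TABLES` with the tables this job is expected to create. The function below is a hypothetical helper in plain Python. In a notebook attached to your cluster, you could pass it `[row.tableName for row in spark.sql("SHOW TABLES").collect()]`.

```python
EXPECTED_TABLES = {"customers_bronze", "customers_invalid_region", "customers_valid_region"}


def missing_tables(table_names: list[str], expected: set[str] = EXPECTED_TABLES) -> list[str]:
    """Return the expected tables that do not appear in table_names, sorted by name.

    Extra tables (for example, ones left over from earlier lessons) are ignored.
    """
    return sorted(expected - set(table_names))


# Table names from the SHOW TABLES output above.
shown = ["customers_bronze", "customers_invalid_region", "customers_valid_region", "lesson1_workflow_users"]
print(missing_tables(shown))  # []
```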



&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>