
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# 07L - Make a Machine Learning Task

### Estimated Duration: 25-30 minutes

In this lab, you will
- Create and maintain variable configurations for data assets with DABs.
- Understand and modify bundle YAML configuration files.
- Use the Databricks CLI with notebooks to validate and deploy DABs with an ML asset.

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ../Includes/Classroom-Setup-7L

## IMPORTANT LAB INFORMATION

Recall that your credentials were stored in a file when you ran [0 - REQUIRED - Course Setup and Authentication]($../0 - REQUIRED - Course Setup and Authentication).

If you end your lab or your lab session times out, your environment will be reset.

If you encounter an error regarding unavailable catalogs or if your Databricks CLI is not authenticated, you will need to rerun the [0 - REQUIRED - Course Setup and Authentication]($../0 - REQUIRED - Course Setup and Authentication) notebook to recreate the catalogs and your Databricks CLI credentials.

**Use classic compute to use the CLI through a notebook.**

## SCENARIO

Congratulations! You’ve successfully built the bulk of your workflow. The ML team has asked you to ensure your tests meet their requirements for running inference with a model they've deployed in the Dev environment. You don’t need to learn ML; you only need to know how to attach the model to the workflow using the bundle you’ve already built.

**Optional task before starting:** If you are a data scientist with ML knowledge, you can inspect the pre-trained model by navigating to **Experiments**. For this lab, you don't need to understand the model; your goal is simply to add it to your bundle.


Run the Databricks CLI command below to confirm the Databricks CLI is authenticated.

<br>
##### DATABRICKS CLI ERROR TROUBLESHOOTING:
  - If you encounter a Databricks CLI authentication error, it means you haven't created the personal access token (PAT) specified in notebook **0 - REQUIRED - Course Setup and Authentication**. You will need to set up Databricks CLI authentication as shown in that notebook.

  - If you encounter the error below, it means your `databricks.yml` file is invalid due to a modification. Even for non-DAB CLI commands, the `databricks.yml` file is still required, as it may contain important authentication details, such as the host and profile, which are utilized by the CLI commands.

![CLI Invalid YAML](../Includes/images/databricks_cli_error_invalid_yaml.png)
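
For orientation, the authentication-related part of a `databricks.yml` file often looks something like the sketch below. The bundle name, host URL, and profile name are placeholders, not values from this course, so your file will differ.

```yaml
# Hypothetical sketch of the top of a databricks.yml file.
bundle:
  name: my_bundle                          # placeholder bundle name

workspace:
  host: "https://<your-workspace-url>"     # placeholder workspace URL
  profile: "DEFAULT"                       # placeholder CLI profile name
```

If a mapping like this is malformed, the CLI may fail before it reaches the authentication step, which is why even non-DAB commands can report a YAML error.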

In [0]:
%sh 
databricks workspace list /Users

## B. Update **variables.yml**

Within the folder where this notebook is located, you will find a folder called [**TODO - Lab DABs Workflow**]($./TODO - Lab DABs Workflow). You will be updating some of the files in this folder to attach a machine learning model to the workflow. You *do not* need to know what this model does. The goal of this exercise is to understand how to attach an additional Unity Catalog asset, which is a registered ML model in this case.

### Instructions:
1. Navigate to the **src** folder. Here you will find two folders, **dlt_pipelines** and **helpers**, and two notebooks, **Final Visualization** and **Inference**. The notebook we’ll focus on for the lab is **Inference**. Click on it and inspect the cells.

2. In a separate tab, navigate to **resources** and click on **variables.yml**. We will need to update this YAML file with some additional variables.

3. In the **Inference** notebook, you’ll see some variables being read in the cell under the header **Parameterize the notebook for our workflow and passing variables**, namely `base_model_name` and `silver_table_name`.

    - We will need to point these to our bundle YAML files.

    - Add two new variables, `base_model_name` and `silver_table_name`, to **variables.yml** in the section marked **PLEASE ONLY CHANGE THE VARIABLES IN THE FOLLOWING SECTION**.

    - To find the default value for `base_model_name`, locate the model in the dev catalog under **Models**.

    - The default value for `silver_table_name` is the silver table you created within the dev catalog. You can provide whatever description you like for these two new variables.

        - **HINT**: If you didn't run through the previous demonstration, the silver table's name is also in the `ingest-bronze-silver_dlt` notebook located in **src/dlt_pipelines**.

4. We will use a cluster for this ML task. Define a third variable called `cluster_id`. You have four options for defining this variable:

    - Option 1: Use `lookup` and set the `cluster` value to `${var.username}`, which references the `username` variable.

    - Option 2: Use `lookup` and set the `cluster` value to `${workspace.current_user.userName}`.

    - Option 3: Use `lookup` and hardcode your cluster name as the `cluster` value.

    - Option 4: Find your cluster ID by navigating to Compute on the left menu, clicking on your cluster, selecting the three vertical dots, and clicking **View JSON**. Copy the cluster ID near the top of the JSON. Alternatively, you can get the cluster ID by running the following code snippet in a new cell: `print(spark.conf.get("spark.databricks.clusterUsageTags.clusterId"))`. Paste this value as the `default` value of `cluster_id` in the `variables.yml` file.
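
The variable definitions described in these steps might look roughly like the following sketch. It assumes the standard bundle `variables` mapping with `description`, `default`, and `lookup` keys. Every value in angle brackets is a placeholder that you must replace with your own model name, table name, and cluster ID, and the descriptions are only examples. Because **variables.yml** already contains a `variables:` mapping, you would add only the new entries beneath it.

```yaml
variables:
  base_model_name:
    description: Registered model used by the Inference notebook (example description)
    default: "<your-model-name>"          # placeholder: find it under Models in the dev catalog

  silver_table_name:
    description: Silver table read by the Inference notebook (example description)
    default: "<your-silver-table-name>"   # placeholder

  # Option 4: paste the cluster ID as a default value.
  cluster_id:
    description: Classic compute cluster for the ML task (example description)
    default: "<your-cluster-id>"          # placeholder

  # Options 1-3 instead resolve the ID from a cluster name with lookup, for example:
  # cluster_id:
  #   description: Classic compute cluster for the ML task
  #   lookup:
  #     cluster: "${workspace.current_user.userName}"
```

The `lookup` form keeps the file free of a hardcoded ID, at the cost of depending on the cluster name matching the looked-up value.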

### Summary:
By completing the tasks, you should have created three new variables: `base_model_name`, `silver_table_name`, and `cluster_id`. Each variable will have a description and a default value.

## C. Update **dabs_workflow_with_ml.job.yml**

Now that we’ve updated our **variables.yml** file, let's move on to updating our workflow. We will not be configuring the DLT pipeline in this step.

### Instructions:
Navigate to **resources** and open the **job** folder. Here you will find all the tasks previously created. Create a new task that runs inference with the ML model and meets the following requirements:

  - The task name can be anything you choose.
  - The task must depend on **Health_ETL**.
  - Add a key called `existing_cluster_id` and set its value to reference the `cluster_id` variable you created in the previous step.
  - Add a `notebook_task` that contains a `notebook_path`, `base_parameters`, and `source`:
    - `notebook_path` should reference our `Inference` notebook.
    - `base_parameters` should have three keys: two that reference the variables we created earlier (`base_model_name` and `silver_table_name`) and one that references the dev catalog (_Hint: use a variable that was already pre-configured in the `variables.yml` file_).
  - You can also provide a description if desired.

**HINT**: Use the existing tasks as templates to help with this step.
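
As a rough illustration of these requirements, a new task entry could be shaped like the sketch below. The task key `Inference`, the relative notebook path, the `catalog_name` parameter key, and the `${var.catalog_name}` reference are all hypothetical. Use the parameter names the **Inference** notebook actually reads and the catalog variable already defined in **variables.yml**. The `source: WORKSPACE` value is likewise an assumption; copy whatever the existing tasks use.

```yaml
# Sketch only: add an entry like this to the existing `tasks` list of the job.
tasks:
  - task_key: Inference                   # hypothetical name; any name works
    description: Run inference with the registered model
    depends_on:
      - task_key: Health_ETL              # this task runs after Health_ETL
    existing_cluster_id: ${var.cluster_id}
    notebook_task:
      notebook_path: ../../src/Inference.ipynb    # hypothetical relative path
      base_parameters:
        base_model_name: ${var.base_model_name}
        silver_table_name: ${var.silver_table_name}
        catalog_name: ${var.catalog_name}         # hypothetical key and variable
      source: WORKSPACE                   # assumption; mirror the existing tasks
```

Referencing `${var.cluster_id}` rather than a literal ID means the same task definition can resolve to a different cluster per target.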

### Summary:
By completing this task, you should have created a new task in your workflow within `dabs_workflow.job.yml` and be ready to validate the bundle.


## D. Bundle Validation and Deployment
Now that you have updated your **variables.yml** and **dabs_workflow.job.yml** files, you are ready to validate your bundle before deployment!

Use the Databricks CLI to print out a summary of all of the resources defined in the project and the corresponding names which will be generated after deploying the bundle. Note, you will have to `cd` into the bundle folder.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle summary

Use the Databricks CLI to validate the bundle. Note, you will have to `cd` into the bundle folder.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
pwd;
databricks bundle validate -t development;

Deploy the bundle to the development environment. Note, you will have to `cd` into the bundle folder.

In [0]:
%sh
cd "./TODO - Lab DABs Workflow"
databricks bundle deploy -t development

## E. Run the Job
Option 1: Use the UI to run the job and visually watch the tasks kick off. Click on each task to inspect the notebooks or DLT pipeline. 

Option 2: Run the job using the CLI.

Note: You will need to delete the DLT pipeline if you worked through the previous demonstration. The name of the pipeline created in the previous demonstration is of the form **`[dev <username>] health_etl_pipeline_development`**.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle run ml_health_etl_workflow -t development

Destroy the bundle for development.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle destroy -t development --auto-approve

### Staging Bundle Validation, Deployment, and Run

Imagine now that you have reviewed your code, analyzed code coverage, and so on, and you are ready to deploy and test within a staging environment. DABs make this straightforward: you only need to change and pass a few parameter values.
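
Switching environments works because each target in the bundle configuration can carry its own settings and variable overrides. The sketch below illustrates that idea under stated assumptions: the target names `development` and `stage` are the ones passed to `-t` in this lab, while the `mode` and `default` settings, the `catalog_name` variable, and the placeholder catalog values are hypothetical. The actual target definitions in your bundle may differ.

```yaml
targets:
  development:
    mode: development            # assumption
    default: true                # assumption: used when no -t flag is passed
    variables:
      catalog_name: "<your-dev-catalog>"      # hypothetical variable, placeholder value
  stage:
    variables:
      catalog_name: "<your-stage-catalog>"    # hypothetical variable, placeholder value
```

With a layout like this, passing `-t stage` to a bundle command selects the `stage` overrides without any change to the job or variable definitions.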

#### Instructions:
Using the stage environment (catalog), do the following:
1. Run a summary on the bundle. 
1. Validate the bundle. 
1. Deploy the bundle. 
1. Run the bundle. 
1. Destroy the bundle.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle summary -t stage

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle validate -t stage

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle deploy -t stage

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle run ml_health_etl_workflow -t stage

Destroy the bundle for stage.

In [0]:
%sh 
cd "./TODO - Lab DABs Workflow"
databricks bundle destroy -t stage --auto-approve

## Summary:

In this lab, you used a new data asset stored in Unity Catalog to create a new task for your workflow. By understanding how the YAML files are structured, you were able to update your workflow by defining new variables and creating a new task, even though you did not need to understand the model behind it.

### Next Steps:
Try to create your own DAB from scratch using the results from this lab. We recommend building your workflow incrementally, one task at a time, and making small changes until you are comfortable with the architecture of your workflow.



&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>