
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# Demonstration - Integration Tests with Databricks Workflows

This demonstration walks through creating a Model Training Pipeline workflow in Databricks from three notebooks: **01 Silver to Feature Store** (feature preparation), **02 Feature Validation**, and **03 Train Model on Validated Features**. You first create a Databricks job with the first notebook as its only task, add the task's parameters, and get it to run successfully. You then add the remaining notebooks as dependent tasks, each configured with its own parameters and run on the same cluster. The result is an end-to-end pipeline in Databricks Workflows that chains data preparation, validation, and model training in a structured way.

**Learning Objectives**

By the end of this demonstration, you will be able to: 

- Navigate the Databricks Workflows interface to create and manage jobs.
- Configure a Databricks job with tasks, including task dependencies and parameters.
- Integrate multiple notebooks into a single workflow to build an end-to-end pipeline.
- Use Databricks Workflows to automate data preparation, feature validation, and model training tasks.
- Parameterize tasks to customize workflow execution for different data schemas and configurations.
- Explain why task dependencies matter for sequential execution within a pipeline.
- Troubleshoot a failed task and validate task success before progressing to subsequent steps in the workflow.



## Step 1: Create the Job and Run the First Task

**🚨 WARNING: The following instructions are designed to show you what a failed run looks like. DO NOT PANIC, the next cell describes what to fix.**

Using the folder titled **2.2.1 - Model Training Pipeline**, you will find three different notebooks:
1. **01 Silver to Feature Store**
2. **02 Feature Validation**
3. **03 Train Model on Validated Features**

These notebooks will be used to create a workflow in Databricks.

## Steps to Create the Workflow

1. **Navigate to Jobs & Pipelines**:
   - Click on **Jobs & Pipelines** in the Databricks UI.

2. **Create a New Job**:
   - Click on **Create** and select **Job** from the dropdown.
   - Set the Job name to ***Integration Tests with Databricks Workflows*** or something similar for easy identification.

3. **Set Up the First Task**:
   - Select `Notebook`.
   - Provide a **Task Name** (e.g., `Silver to Feature Store`).
   - Set the **Type** to `Notebook`.
   - For the **Source**, select `Workspace`.
   - In **Path**, navigate to the first notebook: `01 Silver to Feature Store`.

4. **Configure Compute**:
   - In the **Compute** section, select the cluster you have been working on.

5. **Add Parameters**:
   - In the **Parameters** section, add the required parameters for the notebook using the keys and values given here (key: value):
     - **catalog: dbacademy**
     - **column: Age**
     - **schema: <your_schema>**
     - **silver_table_name: diabetes**
     - **target_column: Diabetes_binary**
   Alternatively, click **JSON** next to the parameter entries and paste the following:

   ```json
      {
         "catalog": "dbacademy",
         "target_column": "Diabetes_binary",
         "schema": "<your_schema>",
         "column": "Age",
         "silver_table_name": "diabetes"
      }
   ```

6. **Create the Task**:
   - Once all details and parameters are entered, click on **Create Task**.
7. **Run the Job**:
   - Click **Run now** at the top right.
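
For reference, the task configured in the UI above can also be written down as data. The following is a minimal sketch, assuming a Jobs API style layout (`tasks`, `task_key`, `notebook_task`, `base_parameters`, `existing_cluster_id`). The notebook folder, the cluster ID, and the underscore-separated task key are hypothetical placeholders, not values from this course. The parameters mirror exactly what the steps above list.

```python
import json

# Hypothetical placeholders: replace with your own workspace folder and cluster ID.
NOTEBOOK_DIR = "/Workspace/Users/<your_username>/2.2.1 - Model Training Pipeline"
CLUSTER_ID = "<your_cluster_id>"

# The first task, with the same five parameters entered in the UI.
first_task = {
    "task_key": "Silver_to_Feature_Store",  # assumed key; the UI name uses spaces
    "notebook_task": {
        "notebook_path": f"{NOTEBOOK_DIR}/01 Silver to Feature Store",
        "source": "WORKSPACE",
        "base_parameters": {
            "catalog": "dbacademy",
            "column": "Age",
            "schema": "<your_schema>",
            "silver_table_name": "diabetes",
            "target_column": "Diabetes_binary",
        },
    },
    "existing_cluster_id": CLUSTER_ID,
}

job_settings = {
    "name": "Integration Tests with Databricks Workflows",
    "tasks": [first_task],
}

# Print the settings as JSON so they can be compared with the job's JSON view.
print(json.dumps(job_settings, indent=2))
```

Keeping the configuration as plain data makes it easy to compare against what the UI shows for the job.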

## Handling the Error

1. **Navigate to Jobs & Pipelines**:
   - In the left sidebar, click on **Jobs & Pipelines**.
   - Find the job you just ran.

2. **Identify the Failed Run**:
   - Look to the right under **Recent Runs**.
   - Find the **red X** with a circle around it, indicating a failed run.
   - Click on the red X to view details about the failure.

3. **Analyze the Error**:
   - On the screen, you'll see the error message explaining why the job failed.
   - In this case, the error states: **"No input widget named primary_key is defined."**
   - This happened because we forgot to define the **primary key** parameter when setting up the task.

4. **Fix the Issue**:
   - Navigate to the top right and click **Edit Task**, located next to **Repair Run**.
   - Go to the **Parameters** section and click **Add**.
   - Enter the following:
     - **Key**: `primary_key`
     - **Value**: `id`
   - Click **Save Task**.

5. **Re-run the Workflow**:
   - Click **Run now** at the top right.
   - Once the run starts, a message box appears. Click **View Run** in the message box.
   - This opens the run's notebook output, where you can watch the task complete successfully.

With the missing parameter added, the re-run task executes as expected.
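
This kind of failure can be caught before a run by comparing the parameters a task supplies against the widget names its notebook reads. The sketch below is illustrative only: `missing_parameters` is a hypothetical helper, and the list of required widgets is an assumption. Only `primary_key` is confirmed by the error message above; the other names mirror the parameters entered in Step 1.

```python
# Widget names the first notebook is assumed to read (see the note above).
required_widgets = {
    "catalog",
    "column",
    "schema",
    "silver_table_name",
    "target_column",
    "primary_key",
}


def missing_parameters(required, provided):
    """Return the required names that are absent from the provided parameters."""
    return sorted(set(required) - set(provided))


# Parameters as originally entered in Step 1.
task_parameters = {
    "catalog": "dbacademy",
    "column": "Age",
    "schema": "<your_schema>",
    "silver_table_name": "diabetes",
    "target_column": "Diabetes_binary",
}

print(missing_parameters(required_widgets, task_parameters))  # ['primary_key']

# Apply the fix described above and check again.
task_parameters["primary_key"] = "id"
print(missing_parameters(required_widgets, task_parameters))  # []
```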


## Step 2: Add and Run the Remaining Tasks

**1. Navigate to Jobs & Pipelines:**
- Click on **Jobs & Pipelines** in the Databricks UI.
- Open the job you just created.

**2. Add the Feature Validation Task:**
  1. Click on the **Tasks** tab.
  2. Click on the **Add Task** button.
  3. Set up the task:
     - **Task Name**: `Feature Validation`
     - **Type**: `Notebook`
     - **Source**: `Workspace`
     - **Path**: Select the notebook **`02 Feature Validation`**.
  4. Configure **Compute**:
     - Choose the **same cluster** as the previous task.
  5. Set **Task Dependencies**:
     - In the **Depends On** section, ensure it is set to **`Silver to Feature Store`**.
  6. **Add Parameters**:
     - Copy and paste the following parameters:
      ```json
      {
        "catalog": "dbacademy",
        "normalized_column": "Age",
        "schema": "<your_schema>"
      }
      ```
  7. Click **Create Task**.

**3. Add the Train Model on Features Task:**

Now, we need to add the **third task** to the workflow, which will train the model on validated features.

  1. Click on **Add Task** again.
  2. Set up the **third task**:

      - **Task Name**: `Train Model on Features` (or any preferred name).
      - **Type**: `Notebook`.
      - **Source**: `Workspace`.
      - **Path**: Navigate to the **third notebook**: `03 Train Model on Validated Features`.

  3. Configure **Compute**:
     - Choose the **same cluster** as the previous tasks.
  4. **Set Task Dependencies**:
     - In the **Depends On** section, select **`Feature Validation`**.
  5. **Add Parameters**:
     - Copy and paste the following parameters:
      ```json
      {
        "catalog": "dbacademy",
        "username": "<your_username>",
        "target_column": "Diabetes_binary",
        "schema": "<your_schema>",
        "primary_key": "id",
        "silver_table_name": "diabetes"
      }
      ```
  6. Click **Create Task**.

**4. Run the Workflow**
  1. Once all tasks are created, click **Run Now**.
  2. Wait for the pipeline to complete execution.
  3. Ensure that **all tasks run successfully** before proceeding to the next step.
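
The **Depends On** settings above form a simple chain, and the order in which the tasks run follows from it. The sketch below models those dependencies with Python's standard-library `graphlib` (Python 3.9+) and derives the execution order. It also shows how the same dependencies might look as Jobs API style `depends_on` entries. The underscore-separated task keys are an assumption for illustration; the UI names use spaces.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks it depends on, as set in the Depends On fields.
depends_on = {
    "Silver_to_Feature_Store": [],
    "Feature_Validation": ["Silver_to_Feature_Store"],
    "Train_Model_on_Features": ["Feature_Validation"],
}

# A topological sort yields an order in which every task runs after its upstream task.
execution_order = list(TopologicalSorter(depends_on).static_order())
print(execution_order)
# ['Silver_to_Feature_Store', 'Feature_Validation', 'Train_Model_on_Features']

# The same dependencies as lists of {"task_key": ...} entries, one list per task.
api_depends_on = {
    task: [{"task_key": upstream} for upstream in upstreams]
    for task, upstreams in depends_on.items()
}
print(api_depends_on["Train_Model_on_Features"])
# [{'task_key': 'Feature_Validation'}]
```

Because each task has exactly one upstream task, only one order is possible. A failure in an earlier task therefore prevents the later ones from starting.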

## Step 3: Verify Model Registration in Unity Catalog
After completing Step 2, the final step is to confirm that the trained model has been successfully **registered in Unity Catalog**.

**1. Check the Model Naming Convention**
- The model will be named following this pattern:
  ```
  dbacademy.<schema_name>.workflows_classifier_model
  ```
  - Replace `<schema_name>` with the **schema name you used in Step 2**.
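
The naming pattern above is a three-level Unity Catalog name (catalog, schema, model). As a small sketch, the full name can be assembled from the schema like this. `registered_model_name` is a hypothetical helper and `my_schema` is a placeholder schema name, not a value from this course.

```python
def registered_model_name(schema, catalog="dbacademy", model="workflows_classifier_model"):
    """Build the three-level name: <catalog>.<schema>.<model>."""
    return f"{catalog}.{schema}.{model}"


# "my_schema" stands in for the schema name you used in Step 2.
print(registered_model_name("my_schema"))
# dbacademy.my_schema.workflows_classifier_model
```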

**2. Locate the Model in Unity Catalog**
1. Navigate to **Catalog** in the Databricks UI.
2. Open the **schema** you have been working on.
3. Look for the model named **`workflows_classifier_model`**.

**3. Verify Model Lineage (Optional)**
- You can inspect the model lineage within **Catalog Explorer**:
  1. Click on the **model version**.
  2. Select **Lineage**.
  3. Choose **Workflows** to see that this model was created using the workflow you just ran.

By confirming the model's presence in Unity Catalog, you ensure that the **end-to-end workflow has successfully stored and registered the model for future use**.


## Concluding Remarks

In this demonstration, we explored how to perform an **integration test** using **Databricks Workflows**. The process involved the following steps:

1. **Creating a Simple Job**:
   - We started by creating a workflow that read silver-layer data from **Unity Catalog**, transformed it, and wrote a **feature table** to the **Databricks Feature Store**.

2. **Validating Features**:
   - We attached a second notebook to the workflow to validate that the features in the feature table were behaving as expected, such as confirming normalization.

3. **Training a Model**:
   - A third notebook was added to train a model using the validated features, and the resulting model was stored in **Unity Catalog** for centralized management.

By completing this workflow, we demonstrated how to build and validate an end-to-end pipeline for machine learning using Databricks Workflows, ensuring both the data and model meet expectations at every step.

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>