
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# 04L - Deploy a DAB to Multiple Environments

### Estimated Duration: 15-20 minutes

In this lab, you will:

1. Modify the variables in the **databricks.yml** file to reference the correct resources and variables (dev and prod).

2. Deploy a project to the **development** and **production** environments with different configurations for each.


## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course.

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ../Includes/Classroom-Setup-04L

## IMPORTANT LAB INFORMATION

Recall that your credentials are stored in a file when running [0 - REQUIRED - Course Setup and Authentication]($../0 - REQUIRED - Course Setup and Authentication).

If you end your lab or your lab session times out, your environment will be reset.

If you encounter an error regarding unavailable catalogs or if your Databricks CLI is not authenticated, you will need to rerun the [0 - REQUIRED - Course Setup and Authentication]($../0 - REQUIRED - Course Setup and Authentication) notebook to recreate the catalogs and your Databricks CLI credentials.

**Use classic compute to use the CLI through a notebook.**

## SCENARIO

You are in charge of deploying Databricks projects in your organization using Databricks Asset Bundles. So far, you've configured the project to deploy to the **development** environment (**02L - Deploy a Simple DAB**). Your next task is to modify the **databricks.yml** file to deploy the project into the development and production environments with different configurations. You will accomplish this with variable substitution.

#### Development configuration target requirements:
- Use the development data in your **username_1_dev** catalog

#### Production configuration target requirements:
- Use the production data in your **username_3_prod** catalog

## B. Preview the Development and Production Data

1. Preview the **nyctaxi_raw** development data within your **username_1_dev** catalog. Notice that the dev data contains 100 rows.


In [0]:
spark.sql(f'''
          SELECT * 
          FROM {DA.catalog_dev}.default.nyctaxi_raw
          ''').display()

2. View the tables in your **username_1_dev** catalog. Notice that the **nyctaxi_bronze** and **nyctaxi_silver** tables do not exist.


In [0]:
spark.sql(f'SHOW TABLES IN {DA.catalog_dev}.default').display()

3. Preview the **nyctaxi_raw** production data within your **username_3_prod** catalog. Notice that the production data contains about 22,000 rows.

In [0]:
spark.sql(f'''
          SELECT count(*) AS TotalRows
          FROM {DA.catalog_prod}.default.nyctaxi_raw
          ''').display()

In [0]:
spark.sql(f'''
          SELECT * 
          FROM {DA.catalog_prod}.default.nyctaxi_raw
          ''').display()

4. View the tables in your **username_3_prod** catalog. Notice that the **nyctaxi_bronze** and **nyctaxi_silver** tables do not exist.

In [0]:
spark.sql(f'SHOW TABLES IN {DA.catalog_prod}.default').display()

## C. TO DO: STEPS



1. Run the cell below to obtain your lab user name.

In [0]:
print(DA.catalog_name)

2. In a new tab, open the **./resources/lab04_nyc.job.yml** file. Explore the file and complete the following:

   a. Name the actual job **lab04_dab_`${workspace.current_user.userName}`**. This dynamically appends your user name to the end of the job name.

   b. Under **parameters**, add the bundle target variable as the default value of the **target** parameter.
    - **HINT:** [Variable substitutions](https://docs.databricks.com/aws/en/dev-tools/bundles/variables)


<br></br>
**Solution Resources File**
```YAML
resources:
  jobs:
    lab04_dab:
      name: lab04_dab_${workspace.current_user.userName}  # <----- lab04_dab_ + Append your user name variable value to the end of the job name
      tasks:
        - task_key: create_nyc_tables
          notebook_task:
            notebook_path: ../src/our_project_code.sql
            source: WORKSPACE
      parameters:
        - name: target
          default: ${bundle.target}       # <---- Add the bundle.target variable here as a job value
```
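To make the substitution in the solution file concrete, here is a minimal Python sketch of how a `${...}` reference could be replaced by a value. This is only an illustration: the Databricks CLI performs the real resolution at deploy time, and the `substitute` helper and the example values (`labuser1234`, `dev`) are hypothetical.

```python
import re

# Hypothetical, simplified model of bundle variable substitution.
# The Databricks CLI does the real work; this only illustrates the idea.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def substitute(text: str, values: dict) -> str:
    """Replace each ${key} in text with values[key]; unknown keys are left as-is."""
    def _lookup(match):
        key = match.group(1)
        return values.get(key, match.group(0))
    return _REFERENCE.sub(_lookup, text)


# Placeholder values for illustration only (not real lab output).
values = {
    "workspace.current_user.userName": "labuser1234",
    "bundle.target": "dev",
}

job_name = substitute("lab04_dab_${workspace.current_user.userName}", values)
target_param = substitute("${bundle.target}", values)

print(job_name)
print(target_param)
```

Because the job name and the `target` parameter are both built from references, the same resources file can serve every target without edits.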

3. In the new tab, open the **databricks.yml** file and explore the bundle configuration. Notice the following:

  - The bundle name is **demo04_lab_bundle**.

  - The **include** mapping is empty.

  - The **variables** mapping contains a variety of variables. Explore the variables.

  - The **targets** mapping contains a **dev** and a **prod** target environment.

  Leave the **databricks.yml** file open.


4. In the **databricks.yml**, complete the following:

   a. In **include**, add the **./resources/lab04_nyc.job.yml** file.
      - **HINT:** [include mapping](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#include)

   b. In **variables**, add your username to the variable `user_name` (your lab username can be found in step 1 of this section).
      - The `user_name` variable populates the `catalog_dev` and `catalog_prod` variables dynamically.

   c. Under **targets**, complete the following to modify the catalog for the **dev** and **prod** targets:

      - In **dev**, below **resources** within your job, add the `catalog_dev` variable as the value for `target_catalog`.

      - In **prod**, below **resources** within your job, add the `catalog_prod` variable as the value for `target_catalog`.

   **NOTE:** You can add or modify the configuration of your resources within the **targets** configuration based on the environment requirements. In this example, we are adding an additional job parameter to our job defined in the **./resources/lab04_nyc.job.yml** file:
   
   - For **dev**, create a job parameter named **catalog_name** that uses the `catalog_dev` value.
   
   - For **prod**, create a job parameter named **catalog_name** that uses the `catalog_prod` value.

   This is a great way to change the deployment based on the target environment. This example keeps it simple by adding basic job parameters, but you can modify a variety of configuration values using this method.
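The effect of the per-target configuration can be sketched in Python as a merge of a shared parameter list with a target-specific one. This is a simplified model under stated assumptions, not the CLI's actual merge logic: the helper names are hypothetical, the catalog naming follows the **username_1_dev** and **username_3_prod** pattern used in this lab, and `labuser1234` is a placeholder user name.

```python
# Simplified illustration of per-target overrides. The Databricks CLI performs
# the real merge when it resolves the bundle for a target.

def catalogs_for(user_name):
    """Derive the lab's catalog names from a user name (naming taken from this lab)."""
    return {"dev": f"{user_name}_1_dev", "prod": f"{user_name}_3_prod"}


def merge_parameters(base, override):
    """Merge two lists of {"name", "default"} dicts; override wins on a name clash."""
    merged = {param["name"]: dict(param) for param in base}
    for param in override:
        merged[param["name"]] = dict(param)
    return list(merged.values())


def job_parameters(target, user_name):
    """Combine the shared 'target' parameter with the target-specific catalog parameter."""
    base = [{"name": "target", "default": target}]
    override = [{"name": "catalog_name", "default": catalogs_for(user_name)[target]}]
    return merge_parameters(base, override)


# "labuser1234" is a placeholder, not a real lab user name.
for target in ("dev", "prod"):
    print(target, job_parameters(target, "labuser1234"))
```

Each target ends up with the same job definition plus its own `catalog_name` value, which is the behavior the steps above ask you to configure.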


### C1. Deploy to Development

1. Run the Databricks CLI command below to confirm the Databricks CLI is authenticated.

<br></br>
##### DATABRICKS CLI ERROR TROUBLESHOOTING:
  - If you encounter a Databricks CLI authentication error, it means you haven't created the personal access token (PAT) specified in notebook **0 - REQUIRED - Course Setup and Authentication**. You will need to set up Databricks CLI authentication as shown in that notebook.

  - If you encounter the error below, it means your `databricks.yml` file is invalid due to a modification. Even for non-DAB CLI commands, the `databricks.yml` file is still required, as it may contain important authentication details, such as the host and profile, which are utilized by the CLI commands.

![CLI Invalid YAML](../Includes/images/databricks_cli_error_invalid_yaml.png)

In [0]:
%sh
databricks catalogs list

2. Check the version of the Databricks CLI. Confirm the version is **v0.240.0**.

In [0]:
%sh
databricks -v
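If you prefer to check the version programmatically, a small parser like the one below could compare the reported version against the expected one. This is a sketch under an assumption: it treats the output of `databricks -v` as a line containing a `vMAJOR.MINOR.PATCH` string (for example `Databricks CLI v0.240.0`), and the `parse_cli_version` helper is hypothetical.

```python
import re


def parse_cli_version(output):
    """Extract (major, minor, patch) from a version string such as 'v0.240.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"No version found in: {output!r}")
    return tuple(int(part) for part in match.groups())


EXPECTED_VERSION = (0, 240, 0)

# The sample string is an assumed output format, not captured from a real run.
found = parse_cli_version("Databricks CLI v0.240.0")
print("Version matches:", found == EXPECTED_VERSION)
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically.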

3. Validate your **databricks.yml** bundle configuration file using the Databricks CLI. Run the cell and confirm the validation was successful. If there is an error, fix it and run the cell again.

    **HINT:** You can refer to the documentation for the [bundle command group](https://docs.databricks.com/en/dev-tools/cli/bundle-commands.html) for help with validating, deploying, running, and destroying a bundle.


    **NOTE:** For an example solution you can view the **databricks_solution.yml** file within the **solutions** folder. 


In [0]:
%sh
databricks bundle validate

4. Deploy the bundle to the development environment using the Databricks CLI.

    After the cell completes:
    - Manually check that the job was created successfully. The job name will be **[dev user_name] lab04_dab_username**.
    - Check the **job parameters** and confirm it's using your **username_1_dev** catalog and that the **target** is *dev*.

    **NOTE:** This will take about a minute to complete.

    **HINT:** You can refer to the documentation for the [bundle command group](https://docs.databricks.com/en/dev-tools/cli/bundle-commands.html) for help with validating, deploying, running, and destroying a bundle.


In [0]:
%sh
databricks bundle deploy -t dev

5. Run the bundle in the development target environment using the Databricks CLI.

    **NOTE:** This will take about 1-2 minutes to complete.

    **HINT:** You can refer to the documentation for the [bundle command group](https://docs.databricks.com/en/dev-tools/cli/bundle-commands.html) for help with validating, deploying, running, and destroying a bundle.


    **HINT:** Remember to use the key name from the resources mapping in the databricks.yml file (your name will differ):
```
...
resources:
  jobs:
    lab04_dab:    # <--- The job key name here
      name: lab04_dab_${var.user_name}
```


In [0]:
%sh
databricks bundle run -t dev lab04_dab
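The validate, deploy, and run steps can also be scripted. The sketch below assembles the same CLI commands used in this lab and, optionally, runs them with `subprocess`. The helper names `bundle_command` and `run_bundle` are hypothetical, and `run_bundle` assumes the Databricks CLI is installed and authenticated, so it is defined but not called here.

```python
import subprocess


def bundle_command(action, target, resource_key=None):
    """Build a 'databricks bundle <action> -t <target> [resource_key]' argument list."""
    command = ["databricks", "bundle", action, "-t", target]
    if resource_key is not None:
        command.append(resource_key)
    return command


def run_bundle(action, target, resource_key=None):
    """Run the command; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(
        bundle_command(action, target, resource_key),
        check=True,
        capture_output=True,
        text=True,
    )


# Print the commands for the development target without executing them.
for action, key in (("validate", None), ("deploy", None), ("run", "lab04_dab")):
    print(" ".join(bundle_command(action, "dev", key)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a target or resource key contains unusual characters.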

6. After the job successfully completes, run the following cells to confirm that both the **nyctaxi_bronze** and **nyctaxi_silver** tables were created in the **username_1_dev** catalog, and that the **nyctaxi_bronze** table contains 100 rows.

In [0]:
spark.sql(f'SHOW TABLES IN {DA.catalog_dev}.default').display()

In [0]:
check_nyctaxi_bronze_table(user_catalog = DA.catalog_dev, total_count=100)

### C2. Deploy to Production

1. Deploy the bundle to the production environment using the Databricks CLI. This will take about a minute to complete.

    After the cell completes:
    - Manually check that the job was created successfully. The job name will be **lab04_dab_username**.
    - Check the **job parameters** and confirm it's using your **username_3_prod** catalog and that the **target** is *prod*.

**HINT:** You can refer to the documentation for the [bundle command group](https://docs.databricks.com/en/dev-tools/cli/bundle-commands.html) for help with validating, deploying, running, and destroying a bundle.

**NOTE:** Typically, when running in production, you will want to run the job using a service principal. For more information, check out the [Set a bundle run identity](https://docs.databricks.com/aws/en/dev-tools/bundles/run-as) documentation. For demonstration purposes, we are simply running the production job as the user.

In [0]:
%sh
databricks bundle deploy -t prod

2. Run the bundle in the production target environment using the Databricks CLI.

    **NOTE:** This will take about 1-2 minutes to complete.

    **HINT:** You can refer to the documentation for the [bundle command group](https://docs.databricks.com/en/dev-tools/cli/bundle-commands.html) for help with validating, deploying, running, and destroying a bundle.

In [0]:
%sh
databricks bundle run -t prod lab04_dab

3. After the job successfully completes, run the following cells to confirm that both the **nyctaxi_bronze** and **nyctaxi_silver** tables were created in the **username_3_prod** catalog, and that the **nyctaxi_bronze** table contains 21,932 rows.

In [0]:
spark.sql(f'SHOW TABLES IN {DA.catalog_prod}.default').display()

In [0]:
check_nyctaxi_bronze_table(user_catalog = DA.catalog_prod, total_count=21932)

### BONUS
This was a simple example of deploying a DAB to multiple environments.

- There are a variety of ways to set a variable's value. In this lab, we set values within the **databricks.yml** configuration file. You can also set variable values within the Databricks CLI. For more information, view the [Set a variable’s value](https://docs.databricks.com/en/dev-tools/bundles/variables.html#set-a-variables-value) documentation.

- For additional information on overriding configuration values for environments, view the [Override cluster settings in Databricks Asset Bundles](https://docs.databricks.com/aws/en/dev-tools/bundles/cluster-override) documentation.
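As an illustration of setting variable values from the CLI, the sketch below builds a deploy command that passes values with the `--var` option, plus the equivalent `BUNDLE_VAR_<name>` environment variables. Treat it as an assumption-laden example: the helper names are hypothetical, `labuser1234` is a placeholder, and the linked documentation remains the authority on precedence and exact syntax.

```python
def deploy_command_with_vars(target, variables):
    """Build a deploy command that sets bundle variables with --var=name=value."""
    command = ["databricks", "bundle", "deploy", "-t", target]
    for name, value in variables.items():
        command.append(f"--var={name}={value}")
    return command


def bundle_var_env(variables):
    """Express the same values as BUNDLE_VAR_<name> environment variables."""
    return {f"BUNDLE_VAR_{name}": value for name, value in variables.items()}


# "labuser1234" is a placeholder value for the lab's user_name variable.
overrides = {"user_name": "labuser1234"}

print(" ".join(deploy_command_with_vars("dev", overrides)))
print(bundle_var_env(overrides))
```

Setting values at the command line keeps user-specific or secret-adjacent values out of the **databricks.yml** file that is committed to source control.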



&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>