# Databricks Asset Bundles

[Databricks Asset Bundles](https://www.databricks.com/resources/demos/tours/data-engineering/databricks-asset-bundles):

> Databricks Asset Bundles (DAB) is a new capability on Databricks that **standardizes and unifies the deployment strategy** for all data products developed on the platform.
> It allows developers to describe the infrastructure and resources of their project through a **YAML configuration file**.

The main take-aways from the above introduction about DAB are as follows:

1. DAB is all about standardizing deployment of Databricks projects
1. DAB is an [Infrastructure as code (IaC)](https://en.wikipedia.org/wiki/Infrastructure_as_code) tool
1. DAB uses a YAML configuration file to declaratively describe what/when/how


See the [slides](https://docs.google.com/presentation/d/1bnnTR19j_nZhB0bDCMoGga-8Sq6eBjhBAom-6NJ6F0I/edit) from the talk on Databricks Asset Bundles at Data & AI Summit 2023.


[Databricks Asset Bundle deployment modes](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html):

> Bundles enable programmatic management of Databricks Workflows


Databricks Asset Bundles make it possible to express complete data, analytics, and ML projects as a collection of source files called a bundle.
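
To make the idea of a bundle more concrete, here is a minimal sketch of what a `databricks.yml` bundle configuration file could look like. The bundle name, the `demo_job` resource key, the notebook path, the cluster ID and the workspace URL are all hypothetical placeholders and are not taken from a real project.

```yaml
# databricks.yml: a minimal, illustrative bundle configuration.
# Every name, path and URL below is a hypothetical placeholder.
bundle:
  name: my_project

resources:
  jobs:
    demo_job:                                 # hypothetical resource key
      name: demo_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb   # assumed to exist in the bundle
          existing_cluster_id: <cluster-id>       # placeholder, replace with a real ID

targets:
  dev:
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com   # placeholder
```

With a file like this at the project root, `databricks bundle validate` checks the configuration and `databricks bundle deploy` deploys the job to the `dev` target.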

➡️ Learn more in the [official documentation](https://docs.databricks.com/en/dev-tools/bundles/index.html)

## Automate Databricks Deployments

DAB is not alone in the IaC/deployment 'market'.

Developers have been using the following for quite some time:

1. [Databricks REST API](https://docs.databricks.com/api/)
1. [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html)
1. [Databricks Terraform provider](https://docs.databricks.com/en/dev-tools/terraform/index.html)
1. ~[dbx by Databricks Labs](https://docs.databricks.com/en/archive/dev-tools/dbx/dbx.html)~

### Migrate from dbx to Databricks Asset Bundles

[Migrate from dbx to bundles](https://docs.databricks.com/en/archive/dev-tools/dbx/dbx-migrate.html)

From [databrickslabs/dbx](https://github.com/databrickslabs/dbx#legal-information):

> Databricks recommends using Databricks asset bundles for CI/CD. Please see migration guidance on how to migrate from dbx to dabs


## Fun Fact: DAB == terraform

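`databricks bundle deploy` runs Terraform under the hood. Note `terraform apply` and the generated `bundle.tf.json` in the following error output of `databricks bundle deploy`.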

```
Starting resource deployment
Error: terraform apply: exit status 1

Error: cannot create job: Invalid quartz_cron_expression: '44 37 8 * * ?'. Databricks uses Quartz cron syntax, which is different from the standard cron syntax. See https://docs.databricks.com/jobs.html#schedule-a-job  for more details.

  with databricks_job.jacek_demo_meetup_job,
  on bundle.tf.json line 82, in resource.databricks_job.jacek_demo_meetup_job:
  82:       }
```

## 🚀 Demo: On Fast Track to Deploy

[Develop a job on Databricks by using Databricks asset bundles](https://docs.databricks.com/en/workflows/jobs/how-to/use-bundles-with-jobs.html)


```shell
$ databricks --version
Databricks CLI v0.208.1
```


```shell
$ databricks bundle
Databricks Asset Bundles

Online documentation: https://docs.databricks.com/en/dev-tools/bundles

Usage:
  databricks bundle [command]

Available Commands:
  deploy      Deploy bundle
  destroy     Destroy deployed bundle resources
  init        Initialize using a bundle template
  run         Run a resource (e.g. a job or a pipeline)
  schema      Generate JSON Schema for bundle configuration
  sync        Synchronize bundle tree to the workspace
  validate    Validate configuration

Flags:
  -h, --help          help for bundle
      --var strings   set values for variables defined in bundle config. Example: --var="foo=bar"

Global Flags:
      --log-file file            file to write logs to (default stderr)
      --log-format type          log output format (text or json) (default text)
      --log-level format         log level (default disabled)
  -o, --output type              output type: text or json (default text)
  -p, --profile string           ~/.databrickscfg profile
      --progress-format format   format for progress logs (append, inplace, json) (default default)
  -t, --target string            bundle target to use (if applicable)

Use "databricks bundle [command] --help" for more information about a command.
```


Typical development flow using `databricks bundle`:

* `init`
* `deploy`
* `run`


```shell
$ databricks bundle init
Template to use [default-python]:
Unique name for this project [my_project]:
Include a stub (sample) notebook in 'my_project/src': yes
Include a stub (sample) Delta Live Tables pipeline in 'my_project/src': yes
Include a stub (sample) Python package in 'my_project/src': no

✨ Your new project has been created in the 'my_project' directory!

Please refer to the README.md of your project for further instructions on getting started.
Or read the documentation on Databricks Asset Bundles at https://docs.databricks.com/dev-tools/bundles/index.html.
```

```shell
$ cd my_project
```

```shell
$ databricks bundle deploy
Starting upload of bundle files
Uploaded bundle files at /Users/jacek@japila.pl/.bundle/my_project/dev/files!

Starting resource deployment
Resource deployment completed!
```

```shell
$ databricks bundle run
Update URL: https://training-partners.cloud.databricks.com/#joblist/pipelines/84f3895d-a910-4d9a-b8ec-ac275d4985bd/updates/ce459b1d-5323-46de-b0a4-86b459c13301

2023-10-21T12:58:16.972Z update_progress INFO "Update ce459b is WAITING_FOR_RESOURCES."
2023-10-21T13:01:50.065Z update_progress INFO "Update ce459b is INITIALIZING."
2023-10-21T13:02:36.634Z update_progress INFO "Update ce459b is SETTING_UP_TABLES."
2023-10-21T13:03:01.865Z update_progress INFO "Update ce459b is RUNNING."
2023-10-21T13:03:01.871Z flow_progress   INFO "Flow 'filtered_taxis' is QUEUED."
2023-10-21T13:03:01.893Z flow_progress   INFO "Flow 'filtered_taxis' is PLANNING."
2023-10-21T13:03:02.673Z flow_progress   INFO "Flow 'filtered_taxis' is STARTING."
2023-10-21T13:03:02.712Z flow_progress   INFO "Flow 'filtered_taxis' is RUNNING."
2023-10-21T13:03:42.162Z flow_progress   INFO "Flow 'filtered_taxis' has COMPLETED."
2023-10-21T13:03:43.702Z update_progress INFO "Update ce459b is COMPLETED."
```

## Variables

[Custom variables](https://docs.databricks.com/en/dev-tools/bundles/settings.html#custom-variables):

* Use custom variables to make your bundle configuration files more modular and reusable
* Variables work only with string-based values.
* E.g., the ID of an existing cluster for various workflow runs within multiple targets


`variables` mapping in a bundle configuration file

```yaml
variables:
  <variable-name>:
    description: <optional-description>
    default: <optional-default-value>
```


* You should provide the same values during both the deployment and run stages
* For variables, use substitutions in the format `${var.<variable_name>}`
* Use Databricks CLI's `--var` option to define the value of a variable


```shell
databricks bundle deploy --var "quartz_cron_expression=1"
```
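
As a sketch of how the pieces above could fit together, the following fragment declares a variable with a default and references it from a job schedule through the `${var.<variable_name>}` substitution. The `demo_job` resource key and the default cron expression are assumptions made for illustration. The job's tasks are left out to keep the fragment short.

```yaml
# Fragment of a bundle configuration file (illustrative only).
variables:
  quartz_cron_expression:
    description: Quartz cron schedule for the demo job
    default: "0 0 8 * * ?"        # Quartz syntax: every day at 08:00:00

resources:
  jobs:
    demo_job:                     # hypothetical resource key
      name: demo_job
      schedule:
        # Resolved at deploy time from the default or from --var
        quartz_cron_expression: ${var.quartz_cron_expression}
        timezone_id: UTC
      # tasks omitted for brevity
```

The default could then be overridden at deploy time, e.g. `databricks bundle deploy --var "quartz_cron_expression=0 30 6 * * ?"`.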

## Deployment Modes

[Databricks Asset Bundle deployment modes](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html)


1. In CI/CD workflows, developers typically code, test, deploy, and run solutions in various phases, or modes.
1. The most common deployment modes include:
    * A development mode for pre-production validation
    * A production mode for validated deliverables
1. Databricks Asset Bundles provides an optional collection of default behaviors that correspond to each of these modes.
1. Modes specify (declaratively) intended behaviors
1. `mode` mapping in a target (under `targets`)
    * `databricks bundle deploy -t <target-name>`

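A minimal sketch of the `mode` mapping, assuming two targets with the hypothetical names `dev` and `prod` (workspace settings and resources are omitted):

```yaml
# Fragment of a bundle configuration file (illustrative only).
targets:
  dev:                    # hypothetical target name
    mode: development
    default: true         # used when no -t option is given
  prod:                   # hypothetical target name
    mode: production
```

With this fragment, `databricks bundle deploy -t prod` would deploy with the production-mode behaviors, while a plain `databricks bundle deploy` would use the `dev` target.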

### Development mode

[Development mode](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html#development-mode)

1. `mode: development`
1. Tags deployed jobs and pipelines with a `dev` Databricks tag
1. Delta Live Tables pipelines run in `development: true`
1. _others_

### Production mode

[Production mode](https://docs.databricks.com/en/dev-tools/bundles/deployment-modes.html#production-mode)

1. `mode: production`
1. Validates that all related deployed Delta Live Tables pipelines are marked as `development: false`.
1. Validates that the current git branch is equal to the git branch that is specified in the target
      ```yaml
      git:
        branch: main
      ```

## Bundle Templates

[Databricks Asset Bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)

`databricks bundle init` accepts an optional path of the template to use to initialize a DAB project:
- `default-python` for the default Python template
- a local file system path with a template directory
- a git repository URL, e.g. https://github.com/my/repository


```shell
$ databricks bundle init --help
Initialize using a bundle template.

TEMPLATE_PATH optionally specifies which template to use. It can be one of the following:
- 'default-python' for the default Python template
- a local file system path with a template directory
- a Git repository URL, e.g. https://github.com/my/repository

See https://docs.databricks.com//dev-tools/bundles/templates.html for more information on templates.

Usage:
  databricks bundle init [TEMPLATE_PATH] [flags]

Flags:
      --config-file string    File containing input parameters for template initialization.
  -h, --help                  help for init
      --output-dir string     Directory to write the initialized template to.
      --template-dir string   Directory path within a Git repository containing the template.

Global Flags:
      --log-file file            file to write logs to (default stderr)
      --log-format type          log output format (text or json) (default text)
      --log-level format         log level (default disabled)
  -o, --output type              output type: text or json (default text)
  -p, --profile string           ~/.databrickscfg profile
      --progress-format format   format for progress logs (append, inplace, json) (default default)
  -t, --target string            bundle target to use (if applicable)
      --var strings              set values for variables defined in bundle config. Example: --var="foo=bar"
```

## 🚀 Demo: Create DAB Template (WIP)


The idea is to run the following command with a random template name and walk the audience through the errors.

```
databricks bundle init
```

## Source Code

Judging by [this recent PR](https://github.com/databricks/cli/pull/795/files), the source code of the `bundle` command lives in the [Databricks CLI](https://github.com/databricks/cli/tree/main/cmd/bundle) repository itself.

> **Note**
>
> Phew, the source code is Go! 😬

## Questions

1. Is there any relationship between DAB and the Databricks SDK?
1. What is the `fixtures` directory for?
1. What is the `tests` directory for?