# Asset bundles

Databricks Asset Bundles are a way to define a Databricks project as code. You can develop your project locally, following the typical Databricks project patterns, and deploy it to the platform with your CI/CD pipeline.

## Configuration

The bundle's configuration is defined in the `databricks.yml` file. The main configuration options are listed below. For the official descriptions, see [Databricks Asset Bundle configuration](https://docs.databricks.com/aws/en/dev-tools/bundles/settings).

- `bundle`: Specifies the bundle's basic properties, such as its name.
- `include`: Lists other configuration files to merge into the bundle. When the configuration is relatively complex, it is convenient to keep parts of it in separate files.
- [`scripts`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#scripts): Defines scripts that run in the local environment, but with the configuration of the bundle's Databricks environment applied. Run a script with `databricks bundle run <name specified for the script>`.
- [`sync`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#sync): Specifies which files are pushed to the Databricks workspace during `databricks bundle deploy`.
- [`artifacts`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#artifacts): If your project produces output files during the build (Python wheel, Java JAR, etc.), specify them with the `artifacts` mapping. The key detail is that this is where you define the command that builds the artifact; it is executed by the `databricks bundle build` command and also as part of `databricks bundle deploy`.
- [`variables`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#variables): Defines variables that can be used in substitutions.
- [`resources`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#resources): Specifies the Databricks [resources](https://docs.databricks.com/aws/en/dev-tools/bundles/resources#supported-resources) that the project uses, such as jobs, dashboards, and clusters.
- [`targets`](https://docs.databricks.com/aws/en/dev-tools/bundles/settings#targets): Defines several setups for the same project. The most common case is separate `dev` and `production` targets.
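Putting the top-level mappings together, a `databricks.yml` might look like the sketch below. All names (`my_project`, `my_script`, `my_job`, `environment_name`) and the workspace hosts are hypothetical placeholders, and the job is a stub without tasks, so this illustrates the structure rather than a deployable project.

```yaml
# Hypothetical skeleton of databricks.yml; all names and hosts are placeholders.
bundle:
  name: my_project

include:
  - resources/*.yml            # merge extra configuration files

scripts:
  my_script:
    content: echo "running for ${bundle.target}"

sync:
  exclude:
    - "*.log"                  # do not push local log files

artifacts:
  default:
    build: echo "built" > result

variables:
  environment_name:
    default: dev

resources:
  jobs:
    my_job:                    # stub: tasks omitted
      name: my_job_${bundle.target}

targets:
  dev:
    mode: development
    default: true              # used when no -t/--target is given
    workspace:
      host: https://<your-dev-workspace>
  production:
    mode: production
    workspace:
      host: https://<your-prod-workspace>
```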

### Artifacts

Consider the simplest possible `artifacts` usage. The following configuration produces the artifact as a file named `result`.

```yaml
artifacts:
  default:
    build: echo "this is new configuration" > result
```

Running `databricks bundle build` creates the `result` file, which is then published to the Databricks workspace.
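A more realistic artifact is a Python wheel. The sketch below assumes the `type`, `build`, and `path` fields from the artifacts configuration reference and a project with a `pyproject.toml` in the bundle root; the artifact key `my_wheel` is hypothetical, and the build command assumes the `build` package is installed locally.

```yaml
artifacts:
  my_wheel:                           # hypothetical artifact key
    type: whl                         # the build produces a Python wheel
    build: python3 -m build --wheel   # assumes `pip install build` was run
    path: .                           # local directory of the artifact (bundle root)
```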

## Substitutions

With the substitution mechanism, you can retrieve values and substitute them into the configuration during `bundle build` or `bundle run`. Substitutions are written in the `${<variable name>}` format.

For example, consider the following configuration:

```yaml
artifacts:
  default:
    build: echo "This is ${bundle.name} bundle" > ${bundle.target}
```

This creates a file named after the bundle's target (for example, `dev`) that contains a string with the bundle's name.
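Substitutions are not limited to build commands. The sketch below, assuming the `workspace.root_path` setting and the `${workspace.current_user.userName}` substitution from the configuration reference, reproduces the deployment location that appears in the deploy logs later in this document (`/Workspace/Users/<user>/.bundle/<bundle name>/<target>`). The job key `my_job` is hypothetical.

```yaml
workspace:
  # Deployment folder built from the current user, bundle name, and target
  root_path: /Workspace/Users/${workspace.current_user.userName}/.bundle/${bundle.name}/${bundle.target}

resources:
  jobs:
    my_job:                              # hypothetical job key, tasks omitted
      name: "[${bundle.target}] my_job"  # e.g. "[dev] my_job"
```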

## Variables

Variables are declared with the following syntax:

```yaml
variables:
  <var_name1>:
    ...
  <var_name2>:
    ...
```
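In practice, each variable usually carries a `description` and, optionally, a `default`. A minimal sketch, assuming those two fields from the variables reference; the variable names are hypothetical. A variable without a default has to receive its value from elsewhere (for example, the environment variable described next), and it is referenced as `${var.<name>}`.

```yaml
variables:
  catalog_name:                  # hypothetical variable
    description: Catalog used by the project's jobs
    default: dev_catalog
  warehouse_id:                  # hypothetical variable without a default
    description: SQL warehouse ID; must be supplied at deploy time
```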

To pass a value to a variable, set an environment variable that follows the pattern `BUNDLE_VAR_<name of variable>`. Databricks CLI commands executed in that environment substitute this value automatically.

---

As an example, consider the following variables configuration:

```yaml
variables:
  var1:
    default: value1
  var2:
    default: value2

artifacts:
  default:
    build: echo "${var.var1} and ${var.var2}" > result
```

The variables `var1` and `var2` are defined in the bundle and then used in the command that creates the file.

After running `databricks bundle deploy`, the `result` file contains the default values of the variables.

```bash
$ databricks bundle deploy

Building default...
Uploading bundle files to /Workspace/Users/fedor.kobak@innowise.com/.bundle/python_default/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!

$ cat result

value1 and value2
```

To override a value, set the environment variable `BUNDLE_VAR_<name of the variable>`:

```bash
$ BUNDLE_VAR_var1="hello" databricks bundle deploy

Building default...
Uploading bundle files to /Workspace/Users/fedor.kobak@innowise.com/.bundle/python_default/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!

$ cat result

hello and value2
```
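Values can also differ per target. A minimal sketch, assuming the `targets.<name>.variables` override mapping from the targets reference; the values `dev_value` and `prod_value` are hypothetical.

```yaml
variables:
  var1:
    default: value1

targets:
  dev:
    default: true
    variables:
      var1: dev_value      # used for the dev target
  production:
    variables:
      var1: prod_value     # used for the production target
```

With this configuration, `databricks bundle deploy -t production` would build `result` with `prod_value` in place of `var1`.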

## Execute scripts

To execute a script with the bundle's configuration and credentials, use `databricks bundle run [reference to the script]`. The script can be defined *inline* (after `--`) or in the `scripts` section of *databricks.yml*.

---

For example, reading `DATABRICKS_HOST` from a plain local Python process fails when the variable is not set in your shell:

```bash
$ python3 -c 'import os; print(os.environ["DATABRICKS_HOST"])'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    import os; print(os.environ["DATABRICKS_HOST"])
                     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "<frozen os>", line 717, in __getitem__
KeyError: 'DATABRICKS_HOST'
```

The same command works when run through `databricks bundle run`, because the CLI exposes the target workspace's connection settings to the process:

```bash
$ databricks bundle run -- python3 -c 'import os; print(os.environ["DATABRICKS_HOST"][:20])'
https://dbc-da0651ae
```
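The same check can be defined in *databricks.yml* instead of inline. A minimal sketch, assuming the `scripts.<name>.content` mapping from the scripts reference; the script name `print_host` is hypothetical.

```yaml
scripts:
  print_host:                    # hypothetical script name
    # Runs locally with the bundle's target configuration applied
    content: |
      python3 -c 'import os; print(os.environ["DATABRICKS_HOST"][:20])'
```

It would then be run with `databricks bundle run print_host`.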