# Use Databricks Asset Bundles to deploy a Kedro project


Deploying Kedro pipelines to Databricks with asset bundles offers a streamlined way to productionise data workflows. An asset bundle packages your Kedro project, including code, dependencies, and configuration, into a self-contained artifact that can be uploaded and executed on Databricks. This approach separates development from execution, which supports reproducibility and minimises setup overhead. Once the asset bundle is built, it can be run as a Databricks job, making it easier to schedule and monitor pipelines in a scalable cloud environment.


## kedro-databricks

`kedro-databricks` is a Kedro plugin to develop Kedro pipelines for Databricks. The plugin provides three main features:

1. Initialisation: Transform your local Kedro project into a Databricks Asset Bundle project with a single command.
2. Generation: Generate the Asset Bundle resources definition with a single command.
3. Deployment: Deploy your Kedro project to Databricks with a single command.

This [online tutorial](https://www.youtube.com/watch?v=9ZttTN6zDM0) demonstrates how the plugin works.

## Install kedro-databricks

To install the plugin, simply run:

```bash
pip install kedro-databricks
```

Now you can use the plugin to develop Kedro pipelines for Databricks.



## How to get started

### Prerequisites

Before you begin, ensure that the Databricks CLI is installed and configured. For more information on installation and configuration, please refer to the [Databricks CLI documentation](https://docs.databricks.com/dev-tools/cli/index.html).

- [Installation Help](https://docs.databricks.com/en/dev-tools/cli/install.html)
- [Configuration Help](https://docs.databricks.com/en/dev-tools/cli/authentication.html)
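
Before continuing, it can help to confirm that the required tools are on your `PATH`. The following is a minimal sketch, not part of the plugin: the helper names `have_cmd` and `check_prereqs` are hypothetical, and the check only tests that the executables can be found, not that authentication is configured.

```bash
# Hypothetical helper: return 0 if a command is available on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one status line for each tool this guide relies on.
check_prereqs() {
    for tool in databricks kedro; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

check_prereqs
```

If `databricks` is reported as found, `databricks --version` prints the installed CLI version. Authentication still needs to be set up as described in the configuration help linked above.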

Now you can initialise the Databricks Asset Bundle:

```bash
kedro databricks init
```

Next, generate the Asset Bundle resources definition:

```bash
kedro databricks bundle
```

Finally, deploy the Kedro project to Databricks:

```bash
kedro databricks deploy
```

That's it! Your pipelines have now been deployed to Databricks as a workflow named `[dev <user>] <project_name>`. Try running the workflow to see the results.
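
For a first-time setup, the three commands above can be chained in a small wrapper so that a failing step stops the sequence. This is only a sketch: `run_step`, `deploy_project`, and the `DRY_RUN` variable are hypothetical names introduced here for illustration, not features of `kedro-databricks`.

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: run the three plugin commands in order.
# The && chain stops at the first step that fails.
deploy_project() {
    run_step kedro databricks init &&
    run_step kedro databricks bundle &&
    run_step kedro databricks deploy
}

# Preview the commands without executing anything.
DRY_RUN=1 deploy_project
```

Calling `deploy_project` without `DRY_RUN=1` from the project root runs the commands for real. Initialisation is normally a one-off step, so later deployments may only need the last two commands.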

## Commands

### `kedro databricks init`

To initialise a Kedro project for Databricks, run:

```bash
kedro databricks init
```

This command will create the following files:

```
├── databricks.yml # Databricks Asset Bundle configuration
├── conf/
│   └── base/
│       └── databricks.yml # Workflow overrides
```

The `databricks.yml` file is the main configuration file for the Databricks Asset Bundle. The `conf/base/databricks.yml` file is used to override the Kedro workflow configuration for Databricks.
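
To confirm that initialisation produced both files, a quick check along these lines can be run from the project root. It is a sketch under the assumption that the files are in the locations shown above; the function name `check_init_files` is hypothetical.

```bash
# Hypothetical helper: report files that `kedro databricks init` is expected
# to create but that are absent.
# $1: path to the Kedro project root (defaults to the current directory).
# Returns 0 when both files exist, 1 otherwise.
check_init_files() {
    root="${1:-.}"
    status=0
    for f in databricks.yml conf/base/databricks.yml; do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}

check_init_files . || echo "run 'kedro databricks init' from the project root first"
```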


### `kedro databricks bundle`

To generate the Asset Bundle resources definition, run:

```bash
kedro databricks bundle
```

This command will generate the following files:

```
├── resources/
│   ├── <project>.yml # Asset Bundle resources definition, corresponding to `kedro run`
│   └── <project-pipeline>.yml # One Asset Bundle resources definition per pipeline, corresponding to `kedro run --pipeline <pipeline-name>`
```

The generated resources definition files are used to define the resources required to run the Kedro pipeline on Databricks.
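
To see which resource definition files were generated, the `resources/` directory can be listed. The snippet below is a small sketch; `list_resources` is a hypothetical helper name, and it assumes the files use the `.yml` extension shown above.

```bash
# Hypothetical helper: print the resource definition files under
# <root>/resources, one file name per line.
# $1: path to the Kedro project root (defaults to the current directory).
list_resources() {
    root="${1:-.}"
    for f in "$root"/resources/*.yml; do
        # When nothing matches, the glob pattern stays literal; skip that case.
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

list_resources .
```

If the Databricks CLI is configured, `databricks bundle validate` can additionally be used to check the bundle configuration before deploying.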

### `kedro databricks deploy`

To deploy a Kedro project to Databricks, run:

```bash
kedro databricks deploy
```

The deployment process includes the following steps:

1. Package the Kedro project for a specific environment (`dev` by default).
2. Generate the Asset Bundle resources definition for that environment.
3. Upload the environment-specific `/conf` files to Databricks.
4. Upload `/data/raw/*` and ensure the other `/data` directories are created.
5. Deploy the Asset Bundle to Databricks.