# Overview of Deployment Patterns while working with Databricks

Databricks offers integration with Kedro through three principal workflows:
1. Work within Databricks workspace
2. Hybrid workflow combining local IDE with Databricks
3. Deploy a packaged Kedro project to Databricks


## Work within Databricks workspace

<b>Pros & Cons:</b>
- Ideal for developers who prefer developing their projects in notebooks rather than in an IDE
- Avoids the overhead of setting up and syncing a local environment with Databricks
- Flexibility for quick iteration
- However, for production deployment you should use a job-based deployment workflow

<b>Prerequisites:</b>
- An active Databricks account
- A Databricks cluster configured with a recent version (>= 11.3 is recommended) of the Databricks runtime.
- Python >= 3.9 installed.
- A GitHub account.
- Git installed.
- A Python environment manager, such as venv or conda.
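To confirm the runtime prerequisite from a notebook attached to your cluster, you could run a cell like the one below. It is a minimal sketch that assumes the cluster exposes its runtime version through the `DATABRICKS_RUNTIME_VERSION` environment variable (worth verifying on your workspace); `parse_runtime_version` and `runtime_meets_minimum` are hypothetical helpers, not part of Kedro or Databricks.

```python
import os
import re

MIN_RUNTIME = (11, 3)  # recommended minimum from the prerequisites


def parse_runtime_version(value):
    """Turn a string such as "13.3" into (13, 3); return None if it does not start with major.minor."""
    match = re.match(r"(\d+)\.(\d+)", value or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def runtime_meets_minimum(value, minimum=MIN_RUNTIME):
    """True only when the version parses and is at least `minimum`."""
    version = parse_runtime_version(value)
    return version is not None and version >= minimum


if __name__ == "__main__":
    # Assumption: the cluster sets DATABRICKS_RUNTIME_VERSION; outside Databricks this is None.
    raw = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    print(f"Runtime {raw!r} meets >= 11.3: {runtime_meets_minimum(raw)}")
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where `"9.1"` would sort after `"11.3"`.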

<b>Steps:</b>
1. On your machine, install Kedro in a new virtual environment and create a new Kedro project
2. Create a GitHub repository and push your Kedro project to it
3. In your Databricks workspace, create a Repo (Git folder) linked to the GitHub repository from step 2
4. Create a new Databricks notebook
5. On Databricks, Kedro cannot access data stored directly in your project’s directory, so you need to move your project’s data
to an accessible location by copying the files to DBFS or a Unity Catalog Volume
6. Install project requirements - `%pip install -r "/Workspace/Repos/<databricks_username>/<project-name>/requirements.txt"`
7. Load Kedro IPython extension - `%load_ext kedro.ipython`
8. Load your Kedro project - `%reload_kedro /Workspace/Repos/<databricks_username>/<project-name>`. Loading your Kedro project 
with the `%reload_kedro` line magic defines four global variables in your notebook: `context`, `session`, `catalog` and `pipelines`. 
You will use the `session` variable to run your project.
9. Run your Kedro project - `session.run()`. You can then modify the project, re-run it, and sync your changes with the remote GitHub repository.
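Step 5 can be done from a notebook cell. The sketch below assumes both the Repo and the destination are reachable through the cluster's local filesystem, as workspace files and Unity Catalog Volumes generally are on recent runtimes (DBFS is typically mounted under `/dbfs/...`). `copy_project_data` is a hypothetical helper, not part of Kedro.

```python
import shutil
from pathlib import Path


def copy_project_data(repo_path, target_path):
    """Copy <repo_path>/data into target_path, keeping the folder layout.

    Returns the copied files as POSIX paths relative to target_path.
    """
    src = Path(repo_path) / "data"
    dst = Path(target_path)
    if not src.is_dir():
        raise FileNotFoundError(f"No data directory at {src}")
    # dirs_exist_ok lets you re-run the copy after adding files (Python 3.8+)
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

For example, `copy_project_data("/Workspace/Repos/<databricks_username>/<project-name>", "/Volumes/<catalog>/<schema>/<volume>/<project-name>/data")`. The `filepath` entries in `conf/base/catalog.yml` would then need to point at the new location.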

<b>Reference</b>
https://docs.kedro.org/en/stable/deployment/databricks/databricks_notebooks_development_workflow.html


## Hybrid workflow combining local IDE with Databricks

<b>Pros & Cons:</b>
- Use your IDE’s capabilities for faster, less error-prone development while testing on Databricks
- Supports the frequent adjustments typical of the early stages of learning Kedro
- Auto-completion and suggestions for code, improving your development speed and accuracy
- Linters like Ruff can be integrated to catch potential issues in your code.
- Static type checkers like Mypy can check types in your code, helping to identify potential type-related issues 
early in the development process.
- Use Databricks Connect to run your project on Databricks compute.
- Use `kedro-databricks` and Databricks Asset Bundles to package your code for running pipelines on Databricks.
- However, for production deployment you should use a job-based deployment workflow

<b>Prerequisites:</b>
- An active Databricks account
- A Databricks cluster configured with a recent version (>= 11.3 is recommended) of the Databricks runtime.
- Python >= 3.9 installed.
- A Python environment manager, such as venv or conda.
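Before step 1, you could confirm the local Python prerequisites with a short script. This is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper. It treats a venv as active when `sys.prefix` differs from `sys.base_prefix`, and a conda environment as active when `CONDA_PREFIX` is set, which conda also does for its base environment, so that check is only a hint.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # minimum version from the prerequisites


def describe_environment(version_info=None, prefix=None, base_prefix=None, environ=None):
    """Report whether Python is new enough and whether an isolated env appears active."""
    version_info = sys.version_info if version_info is None else version_info
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    environ = os.environ if environ is None else environ
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # venv/virtualenv make sys.prefix differ from sys.base_prefix;
        # `conda activate` sets CONDA_PREFIX (also for the base env)
        "isolated_env": prefix != base_prefix or "CONDA_PREFIX" in environ,
    }


if __name__ == "__main__":
    print(describe_environment())
```

The parameters default to the running interpreter, so calling `describe_environment()` with no arguments checks your current shell.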

<b>Steps:</b>
1. On your machine, install Kedro in a new virtual environment and create a new Kedro project
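2. Install and authenticate the Databricks CLI, then run `databricks configure --token`. When prompted, provide your Databricks
host URL and a personal access token created in your workspace. Databricks Asset Bundles (steps 5 to 7) require the current
Databricks CLI: the legacy package installed with `pip install databricks-cli` does not support `databricks bundle` commands.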
3. Iterate on and develop your Kedro project as required.
4. Run your Kedro project on either local compute or Databricks compute (using Databricks Connect)
5. Create Databricks Asset Bundles using `kedro-databricks` - `pip install kedro-databricks`, `kedro databricks init` 
and `kedro databricks bundle`. The last command writes the Databricks job configuration to the `resources` folder
6. Deploy the Databricks job using Databricks Asset Bundles - `kedro databricks deploy`
7. Run the Databricks job with the Databricks CLI - `databricks bundle run`
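If the deploy or run steps fail with authentication errors, it can help to confirm what step 2 saved. `databricks configure` stores profiles in `~/.databrickscfg`, with `host` and `token` keys under `[DEFAULT]` unless you name another profile. The sketch below assumes personal access token authentication, as in step 2 (OAuth profiles have no `token` key); `read_databricks_profile` is a hypothetical helper.

```python
import configparser
from pathlib import Path


def read_databricks_profile(path=None, profile="DEFAULT"):
    """Return the host of a CLI profile after checking it has both a host and a token."""
    path = Path(path) if path is not None else Path.home() / ".databrickscfg"
    # interpolation=None so characters such as '%' in values are read literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"No Databricks config file at {path}")
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {path}")
    section = parser[profile]
    missing = [key for key in ("host", "token") if not section.get(key)]
    if missing:
        raise ValueError(f"Profile {profile!r} is missing: {', '.join(missing)}")
    # Only the host is returned; the token itself is never printed or returned
    return section["host"]
```

Calling `read_databricks_profile()` with no arguments checks the default profile in your home directory.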

<b>Reference</b>
https://docs.kedro.org/en/stable/deployment/databricks/databricks_ide_databricks_asset_bundles_workflow.html

## Deploy a packaged Kedro project to Databricks

<b>Pros & Cons:</b>
- The go-to choice for complex project requirements and production-ready pipelines
- Provides a structured and reproducible way to run your code
- Significantly slower than running the project from a notebook on a cluster that is already running
- The project’s code cannot be changed once it has been packaged; any change requires repackaging and redeploying
- Unsuitable for development projects where rapid iteration is necessary

<b>Prerequisites:</b>
- An active Databricks account
- A Databricks cluster configured with a recent version (>= 11.3 is recommended) of the Databricks runtime.
- Python >= 3.9 installed.
- A Python environment manager, such as venv or conda.
    
<b>Steps:</b>
1. On your machine, install Kedro in a new virtual environment and create a new Kedro project
2. Install and authenticate the Databricks CLI - `pip install databricks-cli` and `databricks configure --token`. 
When prompted, provide your Databricks host URL and a personal access token created in your workspace.
3. Iterate on and develop your Kedro project as required.
4. Run your Kedro project on either local compute or Databricks compute (using Databricks Connect)
5. Package your project - `kedro package`. This command generates a `.whl` file in the `dist` directory within your project’s root directory.
6. Upload your project’s data and configuration to DBFS or Volumes. A Kedro project’s configuration and data are not included when it is packaged, 
so they must be stored somewhere your packaged project can access at run time.
7. Deploy and run your Kedro project using the Workspace UI, as described in 
https://docs.kedro.org/en/stable/deployment/databricks/databricks_deployment_workflow.html#deploy-and-run-your-kedro-project-using-the-workspace-ui
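Steps 5 and 6 leave you with a set of files to upload. A minimal sketch of collecting them, assuming the standard Kedro layout (`dist/` for the wheel, `conf/` for configuration): `collect_deployment_artifacts` is a hypothetical helper. It deliberately skips `conf/local`, since Kedro reserves that folder for credentials and machine-specific settings that should not be shipped with the project.

```python
from pathlib import Path


def collect_deployment_artifacts(project_root):
    """Return (newest wheel in dist/, config files under conf/ excluding conf/local)."""
    root = Path(project_root)
    # Sort by modification time so the most recent `kedro package` output wins
    wheels = sorted((root / "dist").glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError("No .whl file in dist/; run `kedro package` first")
    conf_dir = root / "conf"
    config_files = sorted(
        p.relative_to(root).as_posix()
        for p in conf_dir.rglob("*")
        if p.is_file() and p.relative_to(conf_dir).parts[0] != "local"
    )
    return wheels[-1], config_files
```

The returned paths can then be uploaded with whichever tool you already use for DBFS or Volumes, such as the Databricks CLI or the workspace UI.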

<b>Reference</b>
https://docs.kedro.org/en/stable/deployment/databricks/databricks_deployment_workflow.html#use-a-databricks-job-to-deploy-a-kedro-project