# Databricks Asset Bundles (DABs) - Part 1
## Configuration & Dev Deployment

**Databricks Asset Bundles (DABs)** allow you to express your Databricks data, analytics, and ML projects as code (Infrastructure as Code). They provide a unified interface to manage your project's resources (Jobs, Pipelines, Notebooks) and deploy them across different environments (Dev, Staging, Prod) using CI/CD.

### Learning Objectives
1.  Understand the structure of a DABs project.
2.  Configure `databricks.yml` manually from scratch.
3.  Define resources (Jobs, DLT Pipelines, Schemas) using YAML.
4.  Understand **Targets**, **Variables**, and **Presets**.
5.  Deploy bundles using the **UI** and **Databricks CLI**.
6.  Understand **Source-Linked Deployment** vs. **Bundle Deployment**.

### Prerequisites
*   Databricks Workspace (with Serverless enabled for UI deployment features).
*   Databricks CLI installed (v0.218.0 or higher recommended).
*   Knowledge of Git and basic YAML syntax.

## 1. Project Structure & Initialization

While you can use `databricks bundle init` to generate a template, understanding the manual structure is crucial for customization.

We will create a project structure within Databricks Repos (Git Folders) as follows:

```text
my_dab_project/
├── databricks.yml          # Main configuration file
├── resources/              # Folder for resource definitions
│   ├── jobs/               # YAML files for Jobs
│   ├── pipelines/          # YAML files for DLT Pipelines
│   ├── schemas/            # YAML files for Schemas
│   └── notebooks/          # Source code notebooks
└── src/                    # Source code folder
    └── env/
        └── variables.yml   # Bundle variable definitions
```

**Note:** The project root must contain the `databricks.yml` file for Databricks to recognize the folder as a bundle.

## 2. Configuring `databricks.yml`

The `databricks.yml` file is the heart of your bundle. It defines the bundle name, the additional configuration files to include, permissions, and deployment targets.

### Basic Configuration Structure

```yaml
bundle:
  name: dab_demo  # Unique name for the bundle

include:
  - src/env/*.yml           # Include variable files
  - resources/jobs/*.yml    # Include job definitions
  - resources/pipelines/*.yml # Include pipeline definitions
  - resources/schemas/*.yml   # Include schema definitions

permissions:
  - group_name: de_grp      # Grant permissions to specific groups
    level: CAN_MANAGE

targets:
  dev:
    mode: development       # optimized for interactive dev loops
    default: true
    workspace:
      host: https://<your-databricks-instance-url>
    
  qa:
    workspace:
      host: https://<your-qa-databricks-instance-url>
```

### Key Concepts
1.  `include`: Lets you split the configuration across multiple files for better organization.
2.  `targets`: Defines the deployment environments (e.g., dev, qa, prod).
3.  `mode: development`:
    *   Adds a prefix (e.g., `[dev <user>]`) to resource names to avoid collisions.
    *   Enables **Source-Linked Deployment** when the bundle is deployed from within the workspace (files are read directly from the Repo/IDE, skipping the upload step for faster iteration).

## 3. Defining Resources (YAML)

Instead of clicking through the UI, we define resources in YAML files under the `resources/` folder.

### A. Defining a Job (`resources/jobs/demo_job.yml`)

```yaml
resources:
  jobs:
    dab_demo_job:
      name: dab_demo_job
      tasks:
        - task_key: demo_notebook
          job_cluster_key: job_cluster # Run this task on the job cluster defined below
          notebook_task:
            notebook_path: ../notebooks/demo_repo_notebook.py # Path relative to this YAML file
            source: WORKSPACE
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D4ds_v5
            num_workers: 1
```
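
Settings in a resource definition can also be adjusted per target. As a minimal sketch, assuming the `qa` target should run the same job on a larger cluster (the worker count is an illustrative value, not taken from the original project), a target in `databricks.yml` can redefine part of the job. Job cluster entries are matched to the base definition by `job_cluster_key`:

```yaml
targets:
  qa:
    resources:
      jobs:
        dab_demo_job:
          job_clusters:
            - job_cluster_key: job_cluster   # Must match the key in the base job definition
              new_cluster:
                num_workers: 2               # Illustrative value; replaces num_workers: 1 for qa only
```

The remaining cluster settings (`spark_version`, `node_type_id`) are taken from the base definition, so only the differences need to be listed per target.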

### B. Defining a DLT Pipeline (`resources/pipelines/demo_pipeline.yml`)
We can use variables (like `${var.catalog}`) to make configurations dynamic across environments.

```yaml
resources:
  pipelines:
    dlt_orders_pipeline:
      name: dlt_orders_pipeline
      catalog: ${var.catalog}  # Dynamic catalog name
      target: bronze           # Target schema (assumed here: the bronze schema from section C)
      libraries:
        - notebook:
            path: ../notebooks/dlt_dab_demo_orders.sql
      configuration:
        my_param: value
      development: true
```
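
A job can also trigger this pipeline. The following is a sketch, assuming a hypothetical job named `dlt_orders_job` (not part of the original project). It references the pipeline through the `${resources.pipelines.<key>.id}` substitution, which is resolved when the bundle is deployed:

```yaml
resources:
  jobs:
    dlt_orders_job:                # Hypothetical job name, for illustration only
      name: dlt_orders_job
      tasks:
        - task_key: refresh_orders
          pipeline_task:
            # Resolved at deploy time to the ID of the pipeline defined above
            pipeline_id: ${resources.pipelines.dlt_orders_pipeline.id}
```

Referencing the pipeline by its resource key rather than a hard-coded ID keeps the job definition valid in every target, since each environment gets its own pipeline ID.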

### C. Defining a Schema (`resources/schemas/bronze_schema.yml`)
We can also manage Unity Catalog schemas via DABs.

```yaml
resources:
  schemas:
    bronze_schema:
      name: bronze
      catalog_name: ${var.catalog}
```
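
A schema definition can carry more than a name. As a sketch, assuming the `de_grp` group from `databricks.yml` should be able to read from the schema (the comment text and the chosen privileges are illustrative assumptions), a schema resource can also declare a description and Unity Catalog grants:

```yaml
resources:
  schemas:
    bronze_schema:
      name: bronze
      catalog_name: ${var.catalog}
      comment: Raw ingested data         # Illustrative description
      grants:
        - principal: de_grp              # Group used in the permissions block of databricks.yml
          privileges:
            - USE_SCHEMA
            - SELECT
```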

## 4. Variables and Environment Specifics

To handle differences between environments (like Catalog names), we use variables.

**1. Define Variables (`src/env/variables.yml`)**
```yaml
variables:
  catalog:
    description: The target catalog for the environment
    default: dev  # Default value for dev target
```

**2. Override in `databricks.yml` Targets**
```yaml
targets:
  dev:
    # uses default catalog: dev
  
  qa:
    variables:
      catalog: qa_catalog  # Overrides catalog for QA
```
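
Variables can also be passed into the notebook itself. The following is a minimal sketch, assuming the demo notebook reads a widget named `catalog` (a hypothetical parameter name, not shown in the original notebook). It forwards the resolved variable through the task's `base_parameters`:

```yaml
resources:
  jobs:
    dab_demo_job:
      name: dab_demo_job
      tasks:
        - task_key: demo_notebook
          notebook_task:
            notebook_path: ../notebooks/demo_repo_notebook.py
            base_parameters:
              catalog: ${var.catalog}   # "dev" by default, "qa_catalog" on the qa target
```

Inside the notebook, the value would then be available via `dbutils.widgets.get("catalog")`. A variable can also be set at deploy time with the CLI's `--var` flag, e.g. `databricks bundle deploy -t qa --var="catalog=qa_catalog"`.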

## 5. Development Workflow: Presets & Source Linking

In a shared development environment, multiple developers might work on the same bundle. To prevent conflicts, we use **Presets**.

Add this to your `databricks.yml` under `targets: dev`:

```yaml
targets:
  dev:
    presets:
      name_prefix: "dev_${workspace.current_user.short_name}_" # Trailing underscore separates the prefix from the resource name
```

#### What this does:
When you deploy to dev, the job `dab_demo_job` is created as `dev_<username>_dab_demo_job`, so every developer gets their own copy of the resources.
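
Presets cover more than name prefixes. As a sketch (the tag key and value are illustrative assumptions), the following combines the prefix with a few other preset keys that customize how resources are deployed to a target:

```yaml
targets:
  dev:
    presets:
      name_prefix: "dev_${workspace.current_user.short_name}_"
      trigger_pause_status: PAUSED      # Deploy job schedules and triggers in a paused state
      pipelines_development: true       # Deploy pipelines in development mode
      tags:
        owner: ${workspace.current_user.short_name}   # Illustrative tag applied to deployed resources
```

Keeping schedules paused in dev helps avoid several developers' copies of the same job all firing on the same schedule.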

### Source-Linked Deployment
By default, `mode: development` enables source-linked deployment when the bundle is deployed from within the workspace. The `source_linked_deployment` preset controls this behavior:

1.  `true` (default): Jobs reference the notebook files directly in your Workspace/Repo folder. Changes to code take effect immediately without redeploying the bundle.
2.  `false`: The CLI uploads the files to a hidden `.bundle` folder, and jobs reference these isolated copies.

To force isolation (simulating production behavior in dev), set:

```yaml
targets:
  dev:
    presets:
      source_linked_deployment: false
```

### CLI Commands Reference
You can run these commands in the Databricks Web Terminal or your local terminal.

1.  Validate the bundle configuration (checks for syntax errors): `databricks bundle validate -t dev`
2.  Deploy the bundle to the `dev` target: `databricks bundle deploy -t dev`
3.  View a summary of what is deployed: `databricks bundle summary -t dev`
4.  Run a specific job defined in the bundle: `databricks bundle run dab_demo_job -t dev`
5.  Destroy/clean up the resources created by the bundle: `databricks bundle destroy -t dev`

## 6. Deployment via UI

If **"Collaborate on Databricks Asset Bundles"** is enabled in your Workspace Preview settings, you will see a **Deployments** tab (Rocket icon) in the left sidebar when inside a Bundle folder.

1.  Click the **Deployments** icon.
2.  Select the **Target** (e.g., `dev`).
3.  Click **Deploy**.
4.  You can also inspect variables and override them directly in the UI before deployment.

### Note on Gitignore
Ensure you add the hidden `.databricks` folder to your `.gitignore` file. This folder contains local state information about deployments and should not be committed to the repository.

```text
# .gitignore
.databricks/
```

## 7. Summary & Next Steps

We have successfully:
1.  Created a DABs project structure manually.
2.  Configured `databricks.yml` with targets and permissions.
3.  Defined Jobs, Pipelines, and Schemas as code.
4.  Used Variables for environment-agnostic configuration.
5.  Deployed to a `dev` environment using both source-linking and isolated bundle deployment.

In the **next notebook**, we will take this to the next level by configuring an **Azure DevOps Pipeline** to automate the deployment of this bundle to a **QA environment** using CI/CD.