This project demonstrates how Python DABs (Databricks Asset Bundles with Python resource generation) can replace custom scripting workflows for managing Databricks jobs.
A deployment pattern seen in the field is managing Databricks jobs by exporting them as JSON, then running a script that dynamically injects task blocks based on a config file. This works, but it means maintaining both the config and the script that translates it into job definitions. Environment-specific differences (cluster sizes, parameters) add more conditional logic to the script.
Python DABs lets you write Python code that runs at deploy time to generate Databricks resources. Instead of a separate script that patches JSON exports, the bundle itself reads a YAML config and produces the job definition natively.
The workflow becomes:
- Drop a YAML file in `config/` (the filename becomes the job name)
- Run `databricks bundle deploy -t <target>`
- The Python code in `resources/jobs.py` discovers all YAML files, builds a job for each one with environment-appropriate settings, and deploys them
No intermediate scripts, no JSON patching, no manual environment switching. Adding a new job is just adding a new YAML file.
```
python-dab-demo/
├── databricks.yml                    # Bundle config: enables Python resources, defines targets
├── pyproject.toml                    # Python dependencies
├── config/
│   ├── python_dab_demo_pipeline.yaml # One job per YAML file (filename = job name)
│   └── daily_reporting_pipeline.yaml # Add more YAML files to create more jobs
├── resources/
│   └── jobs.py                       # Discovers config/*.yaml and generates jobs at deploy time
└── src/
    └── sample_task.py                # Parameterized notebook (placeholder for real task notebooks)
```
The `python` block is what enables Python resource generation:

```yaml
python:
  venv_path: .venv
  resources:
    - resources.jobs:load_resources
```

This tells the bundle to call `load_resources()` from `resources/jobs.py` during validation and deployment. The function returns resource definitions (jobs, in this case) that get merged into the bundle just like YAML-defined resources would.
Three targets are defined: dev, stage, and prod. The target name is passed into the Python code so it can adjust cluster sizing and task parameters per environment.
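A minimal sketch of how those targets might be declared in `databricks.yml` (the exact per-target fields here are assumptions; only the target names come from this repo):

```yaml
targets:
  dev:
    mode: development
    default: true
  stage:
    mode: production
  prod:
    mode: production
```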
Each YAML file in `config/` defines a separate job. The filename (minus `.yaml`) becomes the job name. A file just needs a list of tasks:

```yaml
tasks:
  - name: ingest_raw_data
    notebook: src/sample_task.py
    description: Ingest raw data from source systems into bronze layer
```

This is the file you'd hand to someone and say "add your tasks here." No Databricks API knowledge needed. Want another job? Create another YAML file.
This is where the generation happens. At deploy time:

- `load_resources()` globs all `*.yaml` files in `config/`
- For each file, `build_job()` creates a condition task (`check_is_monday`) that gates the pipeline on whether the trigger day is Monday, using `{{job.trigger.time.iso_weekday}}`
- It loops over the YAML tasks and builds notebook task dicts, each depending on the condition task's `true` outcome
- Cluster sizing scales per target: `dev` = 1 worker, `stage` = 2, `prod` = 5
- Each task receives `task_name` and `environment` as notebook parameters
- `Job.from_dict()` constructs the job (it accepts the same structure as YAML job definitions, making it easy to translate between formats)
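The steps above can be sketched in plain Python. The helper names follow this README, but the dict shapes and the Spark runtime version are illustrative assumptions; in the real `resources/jobs.py` the returned dict would be handed to `Job.from_dict()`:

```python
def get_cluster_config(target: str) -> dict:
    # Per-target sizing from the README: dev = 1, stage = 2, prod = 5
    worker_counts = {"dev": 1, "stage": 2, "prod": 5}
    return {
        "spark_version": "15.4.x-scala2.12",  # assumed runtime version
        "node_type_id": "Standard_D4s_v3",    # Azure default (see cloud notes below)
        "num_workers": worker_counts[target],
    }


def build_job(job_name: str, tasks: list[dict], target: str) -> dict:
    # Condition task that gates the pipeline on Monday (iso_weekday == 1)
    condition = {
        "task_key": "check_is_monday",
        "condition_task": {
            "op": "EQUAL_TO",
            "left": "{{job.trigger.time.iso_weekday}}",
            "right": "1",
        },
    }
    job_tasks = [condition]
    for task in tasks:
        # Each YAML task becomes a notebook task gated on the condition's
        # "true" outcome, parameterized with task_name and environment
        job_tasks.append({
            "task_key": task["name"],
            "depends_on": [{"task_key": "check_is_monday", "outcome": "true"}],
            "notebook_task": {
                "notebook_path": task["notebook"],
                "base_parameters": {
                    "task_name": task["name"],
                    "environment": target,
                },
            },
            "new_cluster": get_cluster_config(target),
        })
    return {"name": job_name, "tasks": job_tasks}
```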
A Databricks notebook that receives task_name and environment via widget parameters. In a real pipeline, each task entry in the YAML would point to its own notebook. Here they all share one notebook that branches on the task name for demonstration purposes.
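A sketch of what `src/sample_task.py` might contain, assuming the branching logic is hypothetical placeholder work. `dbutils.widgets.get()` is only available inside Databricks, so the widget reads are guarded for local runs:

```python
def run_task(task_name: str, environment: str) -> str:
    # Placeholder branching on the task name; a real pipeline would
    # dispatch to actual ingestion/transform logic per task.
    if task_name == "ingest_raw_data":
        return f"ingesting into bronze ({environment})"
    return f"no-op for {task_name} ({environment})"


try:
    # Inside Databricks, the job passes these as widget parameters
    task_name = dbutils.widgets.get("task_name")      # noqa: F821
    environment = dbutils.widgets.get("environment")  # noqa: F821
    print(run_task(task_name, environment))
except NameError:
    pass  # running outside Databricks; dbutils is undefined
```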
Prerequisites: Databricks CLI and uv installed, with a CLI profile configured.
```shell
# Clone and set up
git clone <repo-url>
cd python-dab-demo
uv venv && uv pip install -e .

# Validate (checks that Python resource generation works)
databricks bundle validate -t dev

# Deploy
databricks bundle deploy -t dev
```

If you use a non-default CLI profile, either set it in your environment or pass it as an env var:

```shell
DATABRICKS_CONFIG_PROFILE=myprofile databricks bundle deploy -t dev
```

Add a new job: Create a new YAML file in `config/` with a `tasks` list. The filename becomes the job name. Redeploy.
Add a task to an existing job: Add an entry to that job's YAML file and redeploy.

Change cluster sizing: Edit the `worker_counts` dict in `get_cluster_config()` inside `resources/jobs.py`.

Change the condition logic: The `check_is_monday` condition task uses `{{job.trigger.time.iso_weekday}}` (1 = Monday, 7 = Sunday). Swap the operator or reference value to gate on a different day, or replace it with `{{job.trigger.time.is_weekday}}` to run on any weekday.

Different notebooks per task: Update the `notebook` field in each YAML task entry to point to different notebook paths.

Cloud provider: The default `node_type_id` is `Standard_D4s_v3` (Azure). For AWS, use something like `i3.xlarge`. For GCP, use `n1-standard-4`.
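As a sketch, the two variants amount to swapping the dynamic value reference inside the `condition_task` dict (field names follow the Jobs API; values shown are the ones this README describes):

```python
# Gate on Monday only: iso_weekday is 1 (Monday) through 7 (Sunday)
monday_only = {
    "op": "EQUAL_TO",
    "left": "{{job.trigger.time.iso_weekday}}",
    "right": "1",
}

# Gate on any weekday instead: is_weekday resolves to "true" Mon-Fri
any_weekday = {
    "op": "EQUAL_TO",
    "left": "{{job.trigger.time.is_weekday}}",
    "right": "true",
}
```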