
dbt-databricks-factory


Creates dbt-based Databricks workflows.

Installation

Use the package manager pip to install dbt-databricks-factory for dp (data-pipelines-cli):

pip install dbt-databricks-factory

Usage

To generate a Databricks job definition (JSON) for a dbt workflow, run:

python -m dbt_databricks_factory.cli create-job \
    --job-name '<job name>' \
    --project-dir '<dbt project directory>' \
    --profiles-dir '<path to profiles directory>' \
    --git-provider '<git provider>' \
    --git-url 'https://url.to/repo.git' \
    --git-branch 'main' \
    --job-cluster my-cluster-name @path/to/cluster_config.json \
    --default-task-cluster my-cluster-name \
    --library 'dbt-databricks>=1.0.0,<2.0.0' \
    --library 'dbt-bigquery==1.3.0' \
    --pretty \
    path/to/dbt/manifest.json > workflow.json
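The generated file follows the Databricks Jobs API 2.1 `jobs/create` payload shape: a job name plus a list of tasks (one per dbt node) wired together with `depends_on`. A minimal sketch of inspecting such a file in Python — the field values below are illustrative assumptions, not the tool's verbatim output:

```python
import json

# Hypothetical sample of a Jobs API 2.1 create payload; the real
# workflow.json produced by create-job will contain your job name,
# clusters, and one task per dbt model.
sample_workflow = {
    "name": "my-dbt-job",
    "job_clusters": [{"job_cluster_key": "my-cluster-name"}],
    "tasks": [
        {"task_key": "model-a", "job_cluster_key": "my-cluster-name"},
        {"task_key": "model-b", "depends_on": [{"task_key": "model-a"}]},
    ],
}

def summarize_job(job: dict) -> str:
    """Return a one-line summary: job name plus task count."""
    return f"{job['name']}: {len(job.get('tasks', []))} task(s)"

# After running create-job, load the real file instead:
# with open("workflow.json") as f:
#     sample_workflow = json.load(f)
print(summarize_job(sample_workflow))  # → my-dbt-job: 2 task(s)
```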

This command writes a JSON file containing the Databricks workflow (job) definition. You can then use it to create a new job in Databricks, for example with POST requests to the Jobs API:

curl --fail-with-body -X POST "${DATABRICKS_HOST}api/2.1/jobs/create" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-H "Content-Type: application/json" \
-d "@workflow.json" >job_id.json

echo "Job ID:"
cat job_id.json
curl --fail-with-body -X POST "${DATABRICKS_HOST}api/2.1/jobs/run-now" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-H "Content-Type: application/json" \
-d @job_id.json >run_id.json

echo "Run ID:"
cat run_id.json
curl --fail-with-body -X GET -G "${DATABRICKS_HOST}api/2.1/jobs/runs/get" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-d "run_id=$(jq -r '.run_id' < run_id.json)" >run_status.json

jq < run_status.json
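The same create → run-now → runs/get sequence can be scripted in Python with only the standard library. This is a sketch under the same assumptions as the curl example (DATABRICKS_HOST ends with a slash, a bearer token is available); the `opener` parameter is an illustrative injection point, not part of any Databricks SDK:

```python
import json
import urllib.parse
import urllib.request

def _request(url, token, body=None, opener=urllib.request.urlopen):
    # Shared helper: bearer-token auth, JSON in (POST when body is set,
    # GET otherwise), JSON out.
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with opener(req) as resp:
        return json.loads(resp.read())

def create_and_trigger(host, token, workflow, opener=urllib.request.urlopen):
    """Mirror the curl steps: jobs/create -> jobs/run-now -> jobs/runs/get."""
    job = _request(f"{host}api/2.1/jobs/create", token,
                   json.dumps(workflow).encode(), opener)
    run = _request(f"{host}api/2.1/jobs/run-now", token,
                   json.dumps({"job_id": job["job_id"]}).encode(), opener)
    query = urllib.parse.urlencode({"run_id": run["run_id"]})
    return _request(f"{host}api/2.1/jobs/runs/get?{query}", token, None, opener)

# Example (requires a live workspace):
# with open("workflow.json") as f:
#     status = create_and_trigger(host, token, json.load(f))
# print(status["state"])
```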

To get more information about the command, run:

python -m dbt_databricks_factory.cli create-job --help