Creates dbt-based Databricks workflows.
Use the package manager pip to install dbt-databricks-factory:
pip install dbt-databricks-factory
To create a new dbt workflow definition in JSON, run:
python -m dbt_databricks_factory.cli create-job \
--job-name '<job name>' \
--project-dir '<dbt project directory>' \
--profiles-dir '<path to profiles directory>' \
--git-provider '<git provider>' \
--git-url 'https://url.to/repo.git' \
--git-branch 'main' \
--job-cluster my-cluster-name @path/to/cluster_config.json \
--default-task-cluster my-cluster-name \
--library 'dbt-databricks>=1.0.0,<2.0.0' \
--library 'dbt-bigquery==1.3.0' \
--pretty \
path/to/dbt/manifest.json > workflow.json
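The --job-cluster option above points at a cluster configuration file. Its exact schema is defined by dbt-databricks-factory; as a rough sketch, assuming the file holds a Databricks new_cluster spec, it could look like this (all field values are illustrative, not a recommendation):

```shell
# Illustrative only: write a minimal Databricks cluster spec.
# Adjust the Spark version, node type and worker count to your workspace.
mkdir -p path/to
cat > path/to/cluster_config.json <<'EOF'
{
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 1
}
EOF
```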
This command creates a JSON file with the dbt workflow definition. You can then use it to create a new job in Databricks, for example with a POST request to the Jobs API:
curl --fail-with-body -X POST "${DATABRICKS_HOST}api/2.1/jobs/create" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-H "Content-Type: application/json" \
-d "@workflow.json" >job_id.json
echo "Job ID:"
cat job_id.json
curl --fail-with-body -X POST "${DATABRICKS_HOST}api/2.1/jobs/run-now" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-H "Content-Type: application/json" \
-d @job_id.json >run_id.json
echo "Run ID:"
cat run_id.json
curl --fail-with-body -X GET -G "${DATABRICKS_HOST}api/2.1/jobs/runs/get" \
-H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
-d "run_id=$(jq -r '.run_id' < run_id.json)" >run_status.json
jq < run_status.json
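The jobs/runs/get call above returns a snapshot of the run's current state. If you want to block until the run finishes (e.g. in CI), you can poll it in a loop. A minimal sketch, assuming the standard Jobs 2.1 response shape with state.life_cycle_state and state.result_state fields; wait_for_run is a hypothetical helper, not part of dbt-databricks-factory:

```shell
# Hypothetical helper: poll the Databricks Jobs API until the run reaches
# a terminal life-cycle state, then print its result state.
wait_for_run() {
  run_id=$1
  while :; do
    curl --fail-with-body -X GET -G "${DATABRICKS_HOST}api/2.1/jobs/runs/get" \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
      -d "run_id=${run_id}" > run_status.json
    state=$(jq -r '.state.life_cycle_state' < run_status.json)
    case "$state" in
      TERMINATED|SKIPPED|INTERNAL_ERROR) break ;;
    esac
    sleep 10
  done
  jq -r '.state.result_state' < run_status.json
}
```

You could then call it with the run ID captured earlier: wait_for_run "$(jq -r '.run_id' < run_id.json)".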
To get more information about the command, run:
python -m dbt_databricks_factory.cli create-job --help