A collection of dbt tips and tricks. Includes links to the dbt Slack community, which you can join here. See something that needs to be changed? See the contributing guidelines.
- New to dbt - start here for beginner tutorials
- Toolbox - basic tools that make working with dbt way easier
- Infrastructure and Deploying dbt
- dbt CLI - tips and tricks for the dbt CLI
- General Command Line - useful tricks for model and yml file manipulation from the command line
- Git Tips - useful in large dbt projects
- Jinja - it's surprisingly hard to learn basic Jinja concepts
- dbt Macros and dbt Specific Jinja - reference for dbt Macros and Jinja functions that come with dbt
- How dbt Labs structures dbt projects (prior version)
- dbt Labs dbt style guide
- HowTheydbt - repo of company specific dbt practice publications
- Overview of tests in dbt (updated for 0.20)
- Is Kimball Modeling Still Relevant? and Coalesce 2020: Kimball in the Context of the Modern Data Warehouse
- Make changes to your dbt source on your global project
- Overriding default schema and table names, using model file names as sources for schema and table names
- Creating date dimensions tables with dbt
- Examples of custom schema tests
- Sessionization best practices for large event tables
- Updating a global dictionary value with Jinja
- None and undefined in logical tests in Jinja
- Analyzing dbt project performance with artifacts
- dbt course refactoring legacy SQL to dbt
- dbt analytics engineering guide
- Clean up orphaned tables and views without matching current models
- Use a specific warehouse to run certain models (Snowflake) h/t
- Using macros to manage UDFs (user defined functions)
- dbt slack thread on approaches to CI builds and testing
- The JaffleGaggle Story: Data Modeling for a Customer 360 View
- dbt - start with the Fundamentals Course
- SQL - check out the Mode SQL tutorial
- git - try Think Like Git if you prefer to learn by reading, or Learn Git Branching if you prefer something visual and interactive
- Jinja - check out these tutorials by Przemek Rogala, who wrote the live parser I recommend
- dbt-utils : Many multipurpose macros. If you're writing something complex or custom, there's probably a better way using functionality from dbt-utils
- dbt-completion.bash : autocompletion for the dbt CLI h/t
- dbt-codegen : macros that generate dbt code to the command line h/t
- dbt-audit-helper : Zen and the art of data auditing. This package will change your life.
- pre-commit-dbt : Package of dbt pre-commit hooks for checking the quality of your dbt project's documentation, tests, etc.
- dbt-helper : Utility functions to compare WH to dbt, create schema files, and list dependencies
- live Jinja Parser : Useful tool for writing complex Jinja, does not include dbt-specific Jinja functions
- yaml Checker : yaml syntax validator
- palm and palm-dbt: automate dbt workflows, expedite onboarding, and containerize your dbt project
- dagrules : A linter to enforce DAG organization conventions in your project h/t
- countries-states-cities-database: Useful dbt seed file source for geographic data and information h/t
- dbt-github-workflow - example CI / CD pipeline with BigQuery, GCP, and Airflow
- Example Meltano deployment - using AWS and Terraform
- Slim CI in Docker for BigQuery h/t
- dbt-artifacts - Snowflake-specific dbt package that builds dimensional models from dbt artifacts and provides macros to upload artifacts to Snowflake
- slack thread coordinating dbt & Looker deployments
dbt run -m state:modified+ && dbt test -m state:modified+
h/t and dbt's caveats
To list modifications since your last local run, use `dbt ls -m state:modified --state ./target/`.
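A sketch of the full state-based workflow; the `./prod-artifacts` path is an assumption standing in for wherever you've saved a previous run's artifacts:

```shell
# list what changed relative to the saved manifest (path is an assumption)
dbt ls -m state:modified --state ./prod-artifacts

# build and test only the changed models and everything downstream of them
dbt run -m state:modified+ --state ./prod-artifacts && \
dbt test -m state:modified+ --state ./prod-artifacts
```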
`dbt run -m model_name_here+1` to only run the children one layer downstream in the DAG, or `dbt run -m model_name_here+n` where n is the offset.
`dbt run -m staging.airtable.*` to run all models in models/staging/airtable h/t, or `dbt run -m staging/airtable`.
`dbt ls -m mymodel+` to list a model and everything downstream of it.
`dbt run -m +tag:foo` to run all models tagged `foo` along with their upstream parents. h/t
`dbt snapshot --select order_snapshot` to run a single snapshot, or `dbt snapshot --select model1 model2 ... modeln` to run several.
`dbt --debug run` gives output similar to the logs h/t
dbt test --models source:*
dbt docs
`dbt test -m config.materialized:snapshot` or `dbt test -m config.materialized:seed` (not guaranteed to be supported, h/t and caveat)
# dbt_project.yml
...
snapshots:
  +tags: ['snapshot']
To run, use `dbt test -m tag:snapshot`
h/t
dbt run-operation my_macro --args '{"myarg1":"arrrgh", "myarg2":"aaaaaaaargh"}'
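A minimal sketch of a macro that could receive those arguments; the macro name and argument names match the example above, and the log call is just for illustration:

```jinja
{% macro my_macro(myarg1, myarg2) %}
    {# log both arguments to stdout so you can see the values arrive #}
    {% do log("myarg1=" ~ myarg1 ~ " myarg2=" ~ myarg2, info=true) %}
{% endmacro %}
```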
- `Command + /` to block comment / uncomment yml sections
- `Command + /` to block comment SQL sections in dbt Cloud
- `Command + ENTER` to preview data in dbt Cloud
- Use `F1` in dbt Cloud to bring up a list of dbt Cloud keyboard shortcuts h/t
- Incremental model strategy performance comparisons for BigQuery
- Incrementally --full-refresh an incremental model
- Adding a column to an incremental model
- Performance improving ideas for large incremental models
- dbt guide to snapshots
- Handling hard deletes in snapshots
- Handling late arriving records in snapshots
- Adding a column to a snapshot with the `check_cols` strategy
- Use dynamic schemas for snapshots
- Combine unique_where and combination_of_columns (see the sketch after this list)
- Opposite of accepted_values
- Store test failures in a custom schema
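A sketch of the first item above, combining a `where` config (dbt 0.20+) with `dbt_utils.unique_combination_of_columns`; the model and column names are hypothetical:

```yaml
models:
  - name: my_model
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - col_a
            - col_b
          config:
            where: "created_at > '2020-01-01'"
```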
This section is stubby for now, but I'd welcome contributions for how to make VS Code a more effective editor for dbt work!
- Guide to using dbt in VSCode with lots of great setup tips
- vscode-dbt-power-user
- dbt-power-user has commands for `Show Compiled SQL` and `Show Run SQL` h/t
- brew-installed dbt giving a `dbt is not installed` message in VS Code
touch {prefix1,prefix2}_model.sql
Sometimes I'll use a spreadsheet to easily generate the list of prefixes; check out the `TEXTJOIN()` function in Google Sheets.
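For example, a hypothetical set of staging models stubbed out in one command via brace expansion:

```shell
# creates stg_orders_model.sql, stg_customers_model.sql, stg_payments_model.sql
touch {stg_orders,stg_customers,stg_payments}_model.sql
```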
Find first, using `find path/to/directories/with*ifneeded -type f -name "example_file_prefix*"` to test your search criteria. When happy, run `find path/to/directories/with*ifneeded -type f -name "example_file_prefix*" -delete` to delete. source
First, test with `sed -e '/regex_goes_here/{action goes here}' path_to_file(s)_to_act_on`. An example action to delete the matching line and subsequent lines could look like `{N;N;d}`, which deletes the matching line and the two lines after it. More syntax examples for this use case.
When you're happy with your output, add `-i ''` right after `sed`; this replaces in place without creating a new file (the empty string is required by the BSD sed that ships with macOS). For example, `sed -i '' -e '/ - name: my_model_prefix[a-z]*/{N;N;N;d;}' models/staging/*.yml` will delete the matching line and the 3 lines that follow it in the matching file(s).
Add `function dbtmrt () { dbt run -m "$1" && dbt test -m "$1"; }` to your shell config, then run `dbtmrt model_name` to use it. h/t
- run `dbt docs generate`
- open a new terminal window to run `dbt docs serve`
- just always have the docs open in the background; re-run `dbt docs generate` in the first terminal and refresh if needed h/t
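A sketch of the two-terminal setup; the `--port` flag is optional (dbt defaults to 8080):

```shell
# terminal 1: regenerate the docs artifacts whenever models change
dbt docs generate

# terminal 2: serve the docs site and leave it running in the background
dbt docs serve --port 8001
```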
# Functions
# run a model, failing fast; 'say' is macOS text-to-speech
function dbtr() {
    dbt run -m "$1" --fail-fast
    say done
}

# run a model and, if it succeeds, test it
function dbtrt() {
    dbt run -m "$1" && dbt test -m "$1"
    say done
}
Note that the below has only been tested on a Mac, and requires you to create a SQL file called `audit_helper_template` in the `analysis` folder of your dbt project.
function dbtah() {
    # substitute the model name from the argument
    gsed -i "s/model_to_audit/$1/" analysis/audit_helper_template.sql
    # enable the audit_helper_template
    gsed -i 's/enabled = false/enabled = true/' analysis/audit_helper_template.sql
    # compile
    dbt compile -m audit_helper_template
    # strip blank lines and copy the compiled SQL to the clipboard
    cat target/compiled/*/analysis/audit_helper_template.sql | awk NF | pbcopy
    # modify the template back to the defaults
    gsed -i 's/enabled = true/enabled = false/' analysis/audit_helper_template.sql
    gsed -i "s/$1/model_to_audit/" analysis/audit_helper_template.sql
    say copy pasta
}
-- The model needs to be disabled so it will be ignored during typical compilation
-- This is required because dbt won't find a node named 'model_to_audit'
-- Substitute the correct production schema and database for your environment
{{
    config(
        enabled = false,
    )
}}
{%- set audit_model = "model_to_audit" -%}
{%- set prod_schema = "prod_schema_name" -%}
{%- set dbt_database = "prod_database_name" -%}
{%- set dbt_relation = ref(audit_model) -%}
{%- set old_etl_relation = adapter.get_relation(
    database=dbt_database,
    schema=prod_schema,
    identifier=audit_model
) -%}
{# Generate the audit query - update the primary key as needed #}
{{ audit_helper.compare_relations(
    a_relation=old_etl_relation,
    b_relation=dbt_relation,
    primary_key="primary_key_update_before_running"
) }}
To run, use `dbtah my_target_model`. Template heavily inspired by a tool created by Lewis Davies.
If you need to move files from one directory to another, cd into the parent of the directories where the files are located, then run `git mv child_dir1/model_name child_dir2/with_nested_dirs_if_needed`. To make sure it worked, run `git status` afterwards and you should see your pending changes with `renamed:` ahead of the file name. Commit as usual, and keep rocking with your history!
You can find lots of arguments about whether or not this really keeps the history on stack overflow and elsewhere around the internet. I've found that using this method preserves commit history, but ymmv.
`git mv old_file_name.sql new_file_name.sql`, for example. The same but-does-it-really comment from above applies.
Remove local branches with `git branch -d my-merged-branch-name` for merged branches, or `git branch -D unmerged-local-branch` for unmerged branches. You can run `git remote show origin` to show all branches locally and on the origin. Running `git fetch --prune` will remove all branches in this listing that look like `refs/remotes/origin/branch-name`, which are local snapshots of remote branches. source
You've been working along, and now git says your branch is out of date with the main branch. To fix this, swap over to the main branch with `git checkout main`, then get any new changes with `git pull --rebase`. Now you're good locally, so switch back to your branch with `git checkout feature` and use `git merge main` to update your branch.
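Collected into one sequence, with `feature` standing in for your branch name:

```shell
git checkout main       # switch to the main branch
git pull --rebase       # bring main up to date with the remote
git checkout feature    # back to your working branch
git merge main          # fold the new main commits into it
```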
Assuming your branch has not been pushed up to the remote repo, you can get around this! First, create a temporary branch to save your work with `git checkout -b temp_save`, then switch back to your development branch with `git checkout my-feat`. Next, reset your branch and working tree to the last commit you're confident in with `git reset --hard 44e447a1`; this rolls your branch and working tree all the way back to commit `44e447a1`. Then apply the commits from your saved branch to the reset branch to 'walk forward' in time one by one until you find the problem that was introduced. To apply a commit from one branch to another, use `git cherry-pick 1b132878`, which creates a totally new commit on your branch with the changes from commit `1b132878`. It's not a good idea to do this if you've already pushed your branch, since you'd be rewriting history and potentially others' commits.
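The whole walk-forward sequence, using the example hashes and branch names from above:

```shell
git checkout -b temp_save    # snapshot your current work on a throwaway branch
git checkout my-feat         # return to the development branch
git reset --hard 44e447a1    # roll branch and working tree back to a good commit
git cherry-pick 1b132878     # replay one saved commit; repeat and test as you go
```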
A live parser is an incredibly useful tool while you're learning and writing Jinja. I like this one.
- A statement looks like `{% ... %}`
- An expression looks like `{{ ... }}`
- A comment looks like `{# ... #}` h/t
- To escape a sequence you can wrap it in a string expression: `{{ '{{ my_escaped_var }}' }}`
- To escape longer blocks of code you can use `{% raw %} ... {% endraw %}` Jinja2 docs
`{# You won't see me #}` vs `--I will show up all day` vs `{# /* I won't show up AND your linter won't freak out */ #}` dbt discourse
`{% set my_list = ['one', 'two', 'three'] %}` and accessing items from lists: `{{ my_list[1] }}` will return `two` (list indexes start at zero).
{% set my_dict = {
    'key1': 'value1',
    'key2': 'value2'
} %}
A general best practice is to do this at the top of your file.
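Values are then looked up by key; a minimal sketch using the dict above:

```jinja
{{ my_dict['key1'] }} {# returns value1 #}
```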
{% set condition = True %}
{% if condition %}
    what to execute when true
{% else %}
    what to execute otherwise
{% endif %}
{% set my_list = ['one', 'two', 'three'] %}
{% for item in my_list %}
    {{ item }}
{% endfor %}
When you have queries that need to execute during the parse phase!
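A common pattern pairs the `run_query` macro with the `execute` flag, so the query result is only consumed when dbt is actually able to run SQL; the table and column names here are hypothetical:

```jinja
{% set results = run_query("select distinct payment_method from " ~ ref("stg_payments")) %}

{% if execute %}
    {# pull the first column of the result set into a plain list #}
    {% set payment_methods = results.columns[0].values() %}
{% else %}
    {% set payment_methods = [] %}
{% endif %}
```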
Example of a Macro Calling Another Macro
For loops for select statements need special handling to prevent a comma after the last item in the select. The hyphens are used to trim the whitespace for better looking compiled SQL.
SELECT
{% for item in list_of_items -%}
{{item}}
{%- if not loop.last -%}
,
{%- endif %}
{% endfor -%}
{% macro my_macro(required_parameter, optional_parameter = default_value) %}
...
{% endmacro %}
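A hypothetical example with a default value, and two ways to call it:

```jinja
{% macro limit_rows(relation, row_limit=100) %}
select * from {{ relation }} limit {{ row_limit }}
{% endmacro %}
```

Calling `{{ limit_rows(ref('stg_orders')) }}` uses the default of 100, while `{{ limit_rows(ref('stg_orders'), row_limit=10) }}` overrides it.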
The adapter object exposes many useful functions and adapts them to the specific database context you're using
- dispatch - database-specific versions of macros
- get_missing_columns - find columns existing in one relation but missing in another, returned as a list - can identify new columns in sources
- expand_target_column_types - make one relation's column types match another's, with some limitations
- get_relation - returns a relation object from the database for the provided database, schema, and identifier
- get_columns_in_relation - returns a list of columns in a relation
- create / drop schema or relation - methods for creating, dropping, or renaming schemas or relations in the database
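For instance, a minimal sketch looping over the columns of a hypothetical model with `get_columns_in_relation`:

```jinja
{% set cols = adapter.get_columns_in_relation(ref("stg_orders")) %}

{% for col in cols %}
    {# each entry is a Column object with name and data_type attributes #}
    {% do log(col.name ~ ": " ~ col.data_type, info=true) %}
{% endfor %}
```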