This is a copier template for dbt projects. It's useful for scaffolding out a basic project structure and configuration with modern tooling quickly.
Note
This is a very new project. It's so far only been tested using MacOS with BigQuery, DuckDB, and Snowflake. For it to become robust, it needs to be tested on other platforms and with other data warehouses. I very much welcome Issues, Discussions, and Pull Requests to help make this project better. Even if you're not comfortable contributing code, testing it out with various platforms and warehouses and reporting any issues you encounter is incredibly helpful. When we get further along I'll create a support matrix with platform and warehouse coverage detailed.
You will need python3
, pipx
, and git
installed to use this template. You will also need a database to connect to. If you don't want have or want access to a cloud warehouse I suggest using DuckDB, which runs locally and thus is simpler to connect to.
The following features are implemented or planned:
- dbt Best Practices project structure
- Basic initial dbt Project configuration
- Coverage of all non-password-based authentication methods for the below warehouses1
- Warehouse-aware dbt profile configuration for the following options, check out the dbt docs on warehouse-specific profile configs for more details:
- Snowflake - using
authenticator: externalbrowser
with SSO - BigQuery - using
method: oauth
viagcloud
CLI - Databricks - using
token: <empty>
, you will need to create a personal access token in Databricks, and fill it into that field in to your~/.dbt/profiles.yml
manually once the project is created1 - Redshift - using
method: IAM
viaawscli
- Postgres - still haven't decided the best way to handle Postgres
- DuckDB - local warehouse, no authentication required
- Snowflake - using
- Linting and formatting of SQL with SQLFluff
- Configurable linting rules in the setup process
- Use
sqlfmt
instead ofSQLFluff
for formatting - Option to not use a SQL linter or formatter
- Modern Python tooling from astral.sh:
ruff
anduv
for formatting, linting, and dependency management - Pre-commit hooks for automated linting, formatting, and fixes on commit
- Selection of various pre-commit hooks or not using pre-commit at all
- A selection of recommended dbt packages:
dbt-utils
dbt-expectations
dbt-date
dbtplyr
dbt-codgen
- A selection of useful macros relevant to most projects (e.g.
limit_in_dev
,cents_to_dollars
, etc.) -
dbt-codegen
scripts to build sources and staging models from warehouse metadata on project creation - Support for generating a new dbt Cloud project and setting up the dbt Cloud CLI instead of a
profiles.yml
file - CI/CD configurations for various major git hosting services (GitHub Actions, GitLab CI, Bitbucket Pipelines, etc.)
These are the things at present I don't plan on implementing. I'm quite open to changing my mind on these, but I have reasons for not including them right now which are listed below.
- A selection of models and tests for common use cases:
- You should use dbt packages on the dbt Package Hub for this. We don't want to reinvent the wheel here. This one probably won't change.
- Password-based authentication for warehouses:
- We move the profiles.yml out of the project on template creation, so credentials are never stored or committed to the repo created from this template. Thus, while highly unlikely that this would be a security risk, I just don't feel comfortable supporting it for the time being. I'm very open to hearing from more experienced database security professionals on differing opinions, or any ideas on how to leverage something like environment variables for increased security while still providing a good developer experience. That said, I want to complete the main goals of this project before tackling this.
- Using
pip
instead ofuv
:uv
is a much faster and more modern tool for Python package management, and I'd like to encourage its adoption. While I want to provide as much optionality as possible, changing this would be more complicated and I decided to make a choice β that said you're more than welcome to fork this repo and change it to usepip
if you'd like! Like the rest of the non-goals, once I've accomplished the main goals I'm open to revisiting this.
- All of the above re
uv
forruff
as well. The astral.sh tooling is just really great, handles the job of multiple tools in one, and is so fast, I don't think I can be persuaded to go backwards on this one. Now if only somebody would makeruff
for SQL...
Before embarking on this quick journey: if your data platform has a CLI tool that lets you authenticate, like gcloud
for BigQuery, awscli
for Redshift, etc., make sure you have it installed and authenticated before running the setup process, as we will use these tools to authenticate and configure your ~/.dbt/profiles.yml
file in the most simple and secure way possible.1
-
Install
copier
if you haven't already:pipx install copier
- pipx is like pip, but for installing global Python CLI tools
- It houses each tool in a dedicated virtual environment, so you don't have to worry about dependency conflicts
-
Create a new dbt project from this template:
# read below re the --trust flag copier gh:gwenwindflower/copier-dbt <path/to/project-name> --trust
-
gh:
tells copier to use a GitHub repository as the source for the template -
The directory you specify is where the new project will be created, you don't need to create it beforehand, but do make sure there isn't already a directory with the same name there with work you don't want to mess up
-
π¨
--trust
will allow copier to optionally run a series of commands to set up your project after it templates everything. These are listed in thecopier.yml
at the bottom in the_tasks
list, and they're detailed below. I highly encourage you to look through these to make sure you really do trust and understand them before using the--trust
flag above that will allow them to (potentially) run. These commands are very straightforward and standard, this is very similar to using a project'smake
commands,dbt init
, or other build scripts, but letting somebody's code run commands on your machine should always be considered carefully. They are chunked up logically into sections which can be opted into, they all default toFalse
(no commands run, just templating). The command chunks the template can run for you are:-
virtual_environment
β Create and activate a virtual environment for the project in your newly templated project directory, installuv
, compile arequirements.txt
, and install the dependencies from that file:python3 -m venv <virual_environment_name> source <virual_environment_name>/bin/activate python3 -m pip install --upgrade pip python3 -m pip install uv uv pip compile requirements.in -o requirements.txt uv pip install -r requirements.txt
-
Put the contents of the
profiles.yml
file in the correct place in your home directory then remove the file from your project for security (again, no credentials ever get entered but in case you do edit it and put in credentials I don't want you to accidentally commit it)mkdir -p ~/.dbt && cat profiles.yml >> ~/.dbt/profiles.yml rm profiles.yml
-
Initialize a new git repo in your project and make an initial commit, then install the pre-commit hooks in your project for future commits (we need a
.git
directory to install the pre-commit hooks, so we have to do this after the initial commit)git init git add --all git commit -m "Initial commit." source/<virual_environment_name>/bin/activate && pre-commit install
-
-
If you feel more comfortable with it you can just clone or fork the repo, delete the tasks section, skip the
--trust
flag, and run the commands manually after the project is created β it will accomplish the same thing just with a bit more manual work β in that case the command to run copier would becopier copy <path/to/cloned-repo> <path/to/project-name>
. As mentioned though, the tasks all default toFalse
so you can opt out of any or all of them even with the--trust
flag.
-
-
Follow the prompts to configure your project, depending on your answers to certain prompts, different prompts may appear or disappear (e.g. if you choose your
data_warehouse
asbigquery
you'll get a different set of questions to configure theprofiles.yml
, if you leavevirtual_environment
asFalse
, we won't prompt you for a virtual environment name). -
Your project is now ready to use!
cd
into the newly created project and run:dbt deps dbt debug
dbt deps
will install the dbt packages included in the templatedbt debug
will run a series of tests to ensure that your dbt project is configured correctly and connects to your data warehouse properly
-
Start building your dbt project!
- Consider using the included
dbt-codegen
package to build some initial sources and staging models from your data warehouse metadata. - Once you've got some models built, try running
dbt build
to run and test your models.
- Consider using the included
-
Push it!
- The setup process will have initialized a git repository for you and made an initial commit of the starting state, so you can go right ahead and push your new project to your favorite git hosting service. It will run the pre-commit hooks automatically on commit, so you don't have to worry about linting or formatting your code before you commit it.
-
If you're looking to just explore dbt, try using some of the public datasets potentially available on your platform. Most have a lot of cool ones! For example, BigQuery has a public dataset for the New York City Taxi and Limousine Commission that's really fun to play with. If you use DuckDB and connect to MotherDuck they have a bunch of Hacker News data that's quite fun to play with.
-
This project, thanks to the incredible
uv
and it's native support ofpip-tools
'pip compile
functionality, uses a more readablerequirements.in
file to define top-level dependencies, which then compiles that to a highly detailedrequirements.txt
file which maps all sub-dependencies to the top-level packages they are required by. This makes it much easier to deal with versions and upgrading. Alsouv
is wildly fast. Take a peek at these files to get the gist, and check outuv
's documentation to learn more. -
If you decide you like
uv
, it may be a good idea to install it globally so you can use it for initializing new projects and other things. You can find the installation instructions in theuv
documentation . -
Always make sure you're installing Python packages in a virtual environment to avoid dependency conflicts (or using
pipx
if it really is supposed to be global). Not to be a broken record, but yet another cool thinguv
does is always install your packages into a virtual environment by default, even if it's not activated (unlikepip
), and it will prompt you to create one if one doesn't exist yet. This comes in super handy to save you from accidentally installing a project's dependencies globally.- If you need to update any dependencies you can change the version(s) in the
requirements.in
file and runuv pip compile requirements.in -o requirements.txt
to compile an updatedrequirements.txt
file. Then runuv pip install -r requirements.txt
to install the updated dependencies.
- If you need to update any dependencies you can change the version(s) in the
-
If you don't want to use a cloud warehouse, I recommend using
duckdb
as your local warehouse. It's a really neat database that's super fast on medium-sized data and has one of the best SQL syntaxes in the game right now. It can run completely locally, but you can also easily wire it up to cloud storage like S3 or GCS, or even a cloud warehouse SaaS called MotherDuck.
If you're new to dbt, SQL, or Jinja, I highly recommend the following learning resources:
- dbt Learn - dbt Labs' official learning platform, with a bunch of great free courses to get you started
- Mode's SQL Tutorial - IMO the best free resource to learn SQL from the ground up
- Jinja's official documentation - specifically the Template Designer Docs in the link. Jinja is a really powerful templating language that dbt and many other projects use (including
copier
i.e. this repo!). Once you get the basics of dbt and SQL down, learning Jinja will take your dbt projects to the next level. - dbt Labs' How we structure our dbt projects guide - the standard resource covering the best way to structure your dbt projects and why. This template follows these guidelines.2
If you're looking to deploy the dbt project you create with this template, the best way is with dbt Cloud.2 It includes advanced orchestration, a cloud-based IDE, an interactive visual Explorer with column-level lineage, flexible alerts, auto-deferral, version control, and a lot more. It's the best way to get a dbt project into production quickly, easily, and reliably β and to get multiple people with varied knowledge working on the same project efficiently. If you're interested in trying it out, you can sign up for a free trial and get started in minutes.
There are some really useful command line tools for folks developing dbt projects locally (meaning they're using SQL, Jinja, Python, and the command line a lot). Here are a few I recommend:
zoxide
- a faster, easier-to-use, and more flexible replacement for thecd
command that learns your habits and saves you a lot of typing with a combination of fuzzy search and frecency (frequency + recency) sorting of your directory changing historyrip
- a safer and easier-to-use replacement for therm
command that moves files to the trash instead of deleting them and lets you recover them if you make a mistakefzf
- a fuzzy finder that makes it easy to search through your command history, files, and directories super fastbat
- acat
replacement that adds syntax highlighting and line numbers, alias it tocat
and never look backeza
- a faster and more powerful replacement for thels
commandfd
- a faster and easier-to-use replacement for thefind
commandripgrep
- a much faster and more powerful replacement for thegrep
commandatuin
- a more powerful and magical shell history tool, with fuzzy search and a lot of other cool featuresstarship
- a really cool and fast shell prompt that's highly customizable (using TOML so it's very easy and readable) and has a lot of cool features, and the default settings are great if you don't want to bother customizing itkitty
- a fast, feature-rich (great font, image, and mouse support, for example), and highly customizable terminal emulator that's a joy to use
Typing long commands is a bummer, if you plan on doing a lot of Python and dbt development, I highly recommend setting up aliases for common commands in your shell configuration (~/.bashrc
, ~/.zshrc
, etc.). For example, you could add the following to your shell configuration to make running dbt and python commands easier (just make sure they don't conflict with existing aliases or commands, customize to your liking!):
export EDITOR=<your favorite text editor>
# dbt alias suggestions
alias dbtp="$EDITOR ~/.dbt/profiles.yml"
alias db="dbt build"
alias dbs="dbt build -s"
alias dt="dbt test"
alias dts="dbt test -s"
alias dr="dbt run"
alias drs="dbt run -s"
alias dp="dbt parse"
alias dmv="dbt parse && mf validate-configs"
# Python alias suggestions
alias python="python3"
alias venv="uv venv .venv"
alias va="source .venv/bin/activate"
alias venva="venv && va"
alias pi="uv pip"
alias pir="uv pip install -r"
alias pirr="uv pip install -r requirements.txt"
alias pc="uv pip compile requirements.in -o requirements.txt"
alias piup="uv pip install --upgrade pip"
alias vpi="venva && piup && pirr"
alias vpci="venva && piup && pc && pirr"
# Go to your project, activate the virtual environment, and open it in your text editor
alias <something short and memorable>="cd <path/to/project> && venva && $EDITOR ."
- Notice we can use previously defined aliases in new aliases. For example,
vpci
usesvenva
andpirr
to update the project's dependencies and install them.
Footnotes
-
I've only selected the most secure and simple authentication method for each warehouse for the time being. You can manually configure more complex and specific authentication methods like password-based authentication, SSO, JSON keys, etc. in the
~/.dbt/profiles.yml
file after the setup process is complete. Wherever possible though, I've opted for simplicity and security β for example the configuration for BigQuery requires that you have installed thegcloud
CLI and authenticated using OAuth through that. The Redshift authentication method is also the most secure and simple method available, using IAM roles and theawscli
's~/.aws/config
credentials to authenticate. I highly recommend sticking with these methods and using these tools if it's an option. β© β©2 β©3 -
I work for dbt Labs, I'm very biased! π€·π»ββοΈ Also I wrote the How we structure our dbt projects guide, so y'know, maybe a bit biased there too πΉ. β© β©2