Resources and scripts to start with Dbt
conda env create --file environment.yml
conda env update --name yq e '.name' environment.yml
--file environment.yml --prune
- create the environment with snowflake connector
# create config folder if it doesn't exist
[ -d ~/.dbt ] && echo "dbt config folder already exists" || mkdir ~/.dbt
- Create new project:
dbt init <project-name>
dbt init dbtlearns
Try running the following commands:
- dbt run
- dbt test
This can be done in two ways
- Manually using
dbt init <project_name>
- Using dbt-init
$ dbt-init --client <project_name> --warehouse <warehouse> --target-dir <project_folder_path>/
e.g. dbt-init --client dbtlearn --warehouse <warehouse> --target-dir ~/dbtlearn/
Topic | Purpose | Command |
---|---|---|
Dependencies installation | Install the dependencies based on the packages.yml definition | dbt deps |
Generate documentation | Go through all the yaml files and pick up the documentation and compile it into a HTML page | dbt docs generate |
Documentation serving | lightweight serve the documentation | |
Model execution | Restrict dbt run to only a specific model | dbt run <model_name> |
Model execution | Run a model and every model that builds on it | dbt run -s src_hosts+ |
Model execution | Run a full refresh on a model e.g. fct_reviews | |
TESTING A SINGLE MODEL | dbt test --select dim_listings_w_hosts |
|
Testing individual sources | dbt test --select source:airbnb.listings |
|
Debugging dbt | This will show all the SQL queries that dbt runs against the data-warehouse | dbt --debug test --select dim_listings_w_hosts |
Snapshots | Create snapshot | dbt snapshot |
Tests | Run all tests | dbt test |
Tests | Run only the tests that concern a certain model | dbt test --select dim_listings_cleansed |
Source freshness | Check the sources freshness | dbt source freshness |
Database connections | Check if all the database connections and the YAML files make sense | dbt compile |
dbt Seeds | Populate the seed in e.g. Snowflake | dbt seed |
Source table changes | If you change the schema, a source model or a source table a full refresh will rebuild the whole table. | dbt run --full-refresh |
https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup
- We want to make sure that our expectations about the input data stand all the time
- We need to test the suer facing tables of our pipeline
- The original Great Expectations project on GitHub: https://github.com/great-expectations/great_expectations
- dbt-expectations: https://github.com/calogica/dbt-expectations
- Find problems in dbt project with health checks
- Interpret queries via query explanations
When moving from SQL based environment, you can accelerate the process using the extension for example for generating dbt model from source definition or SQL It enable data lineage view on model and column level
https://github.com/dbt-labs/dbt-completion.bash
dbt seeds are CSV files with static data that can be loaded into your data platform with dbt. They are used for providing initial data or reference data that does not change frequently, such as a list of countries or product categories. Option A is not true as dbt seeds are not SQL scripts used for database migration. Option B is not true as dbt seeds are not raw data but rather preprocessed data. Option D is not accurate as macros are not used in dbt seeds.
Using an adapter for a data platform that supports a fully featured Python runtime
It holds tables and views accessible by the BI tool
dbthub There are also other packages outside of the hub like great expectations
dbt docs serve
Serve the files as static files in an nginx server References
- How to customize and serve DBT documentation in Google Cloud Platform
- https://www.entechlog.com/blog/data/how-to-host-dbt-docs/
- with mage https://docs.mage.ai/dbt/docs
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices
- The Complete dbt Bootcamp Course and Course repo
- dbt certification FAQ
- A curated list of awesome dbt resources
- dbt Python models