Skip to content

slitayem/dbt-practice

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Resources and scripts to start with Dbt

Setup

Setup virtual environment

conda env create --file environment.yml conda env update --name yq e '.name' environment.yml --file environment.yml --prune

Setup dbt

  1. create the environment with snowflake connector
 # create config folder if it doesn't exist
[ -d ~/.dbt ] && echo "dbt config folder already exists" || mkdir ~/.dbt 
  1. Create new project: dbt init <project-name>
dbt init dbtlearns

Try running the following commands:

  • dbt run
  • dbt test

How to create a dbt project

This can be done in two ways

  • Manually using dbt init <project_name>
  • Using dbt-init
$ dbt-init --client <project_name> --warehouse <warehouse> --target-dir <project_folder_path>/

e.g. dbt-init --client dbtlearn --warehouse <warehouse> --target-dir ~/dbtlearn/

Useful commands

Topic Purpose Command
Dependencies installation Install the dependencies based on the packages.yml definition dbt deps
Generate documentation Go through all the yaml files and pick up the documentation and compile it into a HTML page dbt docs generate
Documentation serving lightweight serve the documentation
Model execution Restrict dbt run to only a specific model dbt run <model_name>
Model execution Run a model and every model that builds on it dbt run -s src_hosts+
Model execution Run a full refresh on a model e.g. fct_reviews
TESTING A SINGLE MODEL dbt test --select dim_listings_w_hosts
Testing individual sources dbt test --select source:airbnb.listings
Debugging dbt This will show all the SQL queries that dbt runs against the data-warehouse dbt --debug test --select dim_listings_w_hosts
Snapshots Create snapshot dbt snapshot
Tests Run all tests dbt test
Tests Run only the tests that concern a certain model dbt test --select dim_listings_cleansed
Source freshness Check the sources freshness dbt source freshness
Database connections Check if all the database connections and the YAML files make sense dbt compile
dbt Seeds Populate the seed in e.g. Snowflake dbt seed
Source table changes If you change the schema, a source model or a source table a full refresh will rebuild the whole table. dbt run --full-refresh

Custom generic tests are nothing but macros with special signature

Interesting dbt packages and tools

https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup

Debugging Tests and Testing with dbt-expectations

dbt Power User extension for VSCode

When moving from SQL based environment, you can accelerate the process using the extension for example for generating dbt model from source definition or SQL It enable data lineage view on model and column level

Autocompletion to the dbt CLI

https://github.com/dbt-labs/dbt-completion.bash

FAQs

What are dbt Seeds?

dbt seeds are CSV files with static data that can be loaded into your data platform with dbt. They are used for providing initial data or reference data that does not change frequently, such as a list of countries or product categories. Option A is not true as dbt seeds are not SQL scripts used for database migration. Option B is not true as dbt seeds are not raw data but rather preprocessed data. Option D is not accurate as macros are not used in dbt seeds.

What are the prerequisites for dbt Python models?

Using an adapter for a data platform that supports a fully featured Python runtime

What is the Mart layer?

It holds tables and views accessible by the BI tool

Where to find dbt third-party packages?

dbthub There are also other packages outside of the hub like great expectations

How to serve documentation?

lightweight server

dbt docs serve

Production grade serving

Serve the files as static files in an nginx server References

Resources: