This repository for course dbt and bigquery in digitalskola
- Install dbt core and dbt-bigquery
- Setup BigQuery connection for dbt
- Testing dbt installation
- Setup dbt with bigquery as data platform
- Testing dbt connection
- ELT Data Architecture & Data Modelling
- Upload raw data files
- Create Models
- Create Test
- Create documentations
- Create a new venv
python -m venv dbt-env # create the environment
- Activate that same virtual environment each time you create a shell window or session:
source dbt-env/bin/activate # activate the environment for Mac and Linux OR dbt-env\Scripts\activate # activate the environment for Windows
There are many data platform adapter that support dbt but in this course is using BigQuery
python -m pip install dbt-bigquery
Please refer to BigQuery Configs
Type command bellow in terminal or command line to testing are dbt is succcessfully installed
dbt --version
- Run dbt cli to init dbt with BigQuery as data platform
dbt init <your project name>
- Choose bigquery for database
- Choose service account file
- Input full path location service account file
- Input GCP Project ID
- Input dataset in BigQuery
- Input threads 1
- Input job execution timeout seconds 300
- Choose US as desired location option
Type command bellow in terminal or command line to testing are dbt can connect to BigQuery or other data platform
dbt debug
Also don't forget to copy dbt_project.yml to your dbt project folder location
In this course, for Data Modelling use Kimbal's Method - Star Schema
- Upload Raw data csv files in folder seeds use dbt cli command
dbt seed
- After that, there are new table in raw dataset
Copy paste all folder staging and warehouse to your local then run
dbt run
- Create schema.yml under models/warehouse and copy paste bellow code
version: 2
models:
- name: dim_product
columns:
- name: product_id
tests:
- not_null
- unique
- Run dbt cli bellow
dbt test
- Modify schema.yml under models/warehouse and copy paste bellow code
version: 2
models:
- name: dim_product
description: Cleansed table which contains list of products.
columns:
- name: product_id
description: Primary key for the product
tests:
- not_null
- unique
- name: product_code
description: code of each product
tests:
- not_null
- name: fact_inventory
columns:
- name: inventory_id
description: Primary key for the fact inventory table
tests:
- not_null
- unique
- Run dbt cli bellow for generate your project's documentation.
dbt docs generate
- Run dbt cli bellow for display dbt webserver in browser. Default port is 8080 but you can use different port using --port flag
dbt docs serve --port 8001