## Data Modeling

Use this notebook to run the dbt commands that build, test, and publish the data warehouse models.

### What is dbt?

dbt (data build tool) is a tool that simplifies data transformation within your data warehouse, allowing you to build, test, and document data models using SQL. It promotes data quality and consistency through a modular approach and facilitates collaboration between data engineers and analysts.

![Data modeling lineage](../../images/ozkary-data-engineering-process-data-warehouse-lineage.png)


In [3]:
# Make sure the dependencies are in the version you need
!dbt deps

02:30:27  Running with dbt=1.8.2
02:30:28  Updating lock file in file path: /home/ozkary/workspace/de-mta/Step4-Data-Warehouse/dbt/package-lock.yml
02:30:28  Installing dbt-labs/dbt_utils
02:30:29  Installed from version 1.0.0
02:30:29  Updated version available: 1.3.1
02:30:29
02:30:29  Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps


- Build and test the models. The `is_test_run` variable controls how much data is processed: setting it to `false` materializes the entire dataset.

```bash
$ dbt build --select stg_station --vars 'is_test_run: false'
$ dbt build --select dim_station
```  

> 👉 **stg_station** builds the view, and **dim_station** builds the table.

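The `build` command above can also be assembled from a notebook cell. The following is a minimal sketch, not part of the project: the helper name `dbt_build_args` is hypothetical, and it assumes the variables are passed through dbt's `--vars` flag as a JSON string, which dbt reads as YAML.

```python
import json
import shlex


def dbt_build_args(model, variables=None):
    """Return the argument list for `dbt build` on a single model (hypothetical helper)."""
    args = ["dbt", "build", "--select", model]
    if variables:
        # JSON is valid YAML, so json.dumps gives a safely quoted value
        # (Python False is written as the YAML/JSON literal false).
        args += ["--vars", json.dumps(variables)]
    return args


args = dbt_build_args("stg_station", {"is_test_run": False})
# Print the command as it would be typed in a shell
print(shlex.join(args))
```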
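**Understanding --model vs. --select**
- --select: The current flag. It accepts one or more model names, file names, or selection patterns.
- --model: A legacy alias for --select that accepts the same values. Prefer --select in new commands.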

In [5]:
!dbt run --select dim_station.sql

02:36:33  Running with dbt=1.8.2
An error occurred: module 'importlib.metadata' has no attribute 'packages_distributions'
02:36:35  Registered adapter: bigquery=1.8.2
02:36:35  Unable to do partial parsing because of a version mismatch
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
02:36:38  Found 10 models, 1 seed, 30 data tests, 1 source, 596 macros
02:36:38
02:36:40  Concurrency: 2 threads (target='dev')
02:36:40
02:36:40  1 of 1 START sql incremental model mta_data.dim_station ........................ [RUN]
02:36:44  1 of 1 OK created sql incremental model mta_data.dim_station ................... [MERGE (0.0 rows, 85.6 MiB processed) in 4.35s]
02:36:44
02:36:44  Finished running 1 incremental model in 0 hours 0 minutes and 5.96 seconds (5.96s).
02:36:44
02:36:44  Completed successfully
02:36:44

- Materialize the data for dim_booth

In [6]:
!dbt run --select dim_booth.sql

02:36:58  Running with dbt=1.8.2
An error occurred: module 'importlib.metadata' has no attribute 'packages_distributions'
02:37:00  Registered adapter: bigquery=1.8.2
02:37:00  Found 10 models, 1 seed, 30 data tests, 1 source, 596 macros
02:37:00
02:37:02  Concurrency: 2 threads (target='dev')
02:37:02
02:37:02  1 of 1 START sql incremental model mta_data.dim_booth .......................... [RUN]
02:37:07  1 of 1 OK created sql incremental model mta_data.dim_booth ..................... [MERGE (0.0 rows, 85.7 MiB processed) in 4.59s]
02:37:07
02:37:07  Finished running 1 incremental model in 0 hours 0 minutes and 6.15 seconds (6.15s).
02:37:07
02:37:07  Completed successfully
02:37:07
02:37:07  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

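The final `Done. PASS=... TOTAL=...` line is the quickest way to confirm a run from a notebook. As a minimal sketch, the snippet below extracts those counters from captured output. The function name `parse_summary` is hypothetical, and the pattern assumes the summary format shown in the dbt 1.8.2 output above; other dbt versions may print different fields.

```python
import re

# Matches the summary line as printed by dbt 1.8.2 (assumed format)
SUMMARY_RE = re.compile(
    r"Done\. PASS=(\d+) WARN=(\d+) ERROR=(\d+) SKIP=(\d+) TOTAL=(\d+)"
)


def parse_summary(log_text):
    """Return the summary counters as a dict, or None if no summary line is found."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    keys = ("pass", "warn", "error", "skip", "total")
    return dict(zip(keys, map(int, match.groups())))


log = "02:37:07  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1"
summary = parse_summary(log)
print(summary)
```

A cell could then fail fast with a check such as `assert summary and summary["error"] == 0`.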

- Run all the tests in the project

In [7]:
!dbt test

02:37:16  Running with dbt=1.8.2
An error occurred: module 'importlib.metadata' has no attribute 'packages_distributions'
02:37:18  Registered adapter: bigquery=1.8.2
02:37:18  Found 10 models, 1 seed, 30 data tests, 1 source, 596 macros
02:37:18
02:37:19  Concurrency: 2 threads (target='dev')
02:37:19
02:37:19  1 of 30 START test not_null_dim_booth_booth_id ................................. [RUN]
02:37:19  2 of 30 START test not_null_dim_booth_booth_name ............................... [RUN]
02:37:20  2 of 30 PASS not_null_dim_booth_booth_name ..................................... [PASS in 1.48s]
02:37:20  3 of 30 START test not_null_dim_booth_remote ................................... [RUN]
02:37:20  1 of 30 PASS not_null_dim_booth_booth_id ....................................... [PASS in 1.58s]
02:37:20  4 of 30 START test not_null_dim_date_date_id ................................... [RUN]
02:37:23  3 of 30 PA

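With two threads, test results arrive out of order, so a long log is hard to scan for failures. The sketch below collects the status of each test from captured output. It is an illustration only: `collect_test_results` is a hypothetical name, the pattern assumes the line layout shown above, and it assumes that failing or warning lines carry a count before the test name. dbt colors its console output with ANSI escape sequences, so those are stripped first.

```python
import re

# ANSI color sequences such as ESC[32m and ESC[0m
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# "<n> of <total> <STATUS> [count] <test name>"; START lines do not match
RESULT_RE = re.compile(r"(\d+) of (\d+) (PASS|FAIL|WARN|ERROR|SKIP) (?:\d+ )?(\S+)")


def collect_test_results(log_text):
    """Map each test name to its reported status."""
    results = {}
    for line in ANSI_RE.sub("", log_text).splitlines():
        match = RESULT_RE.search(line)
        if match:
            results[match.group(4)] = match.group(3)
    return results


log = """02:37:19  1 of 30 START test not_null_dim_booth_booth_id ................................. [RUN]
02:37:20  2 of 30 PASS not_null_dim_booth_booth_name ..................................... [\x1b[32mPASS\x1b[0m in 1.48s]
02:37:20  1 of 30 PASS not_null_dim_booth_booth_id ....................................... [\x1b[32mPASS\x1b[0m in 1.58s]"""

results = collect_test_results(log)
# Names of the tests that did not pass
not_passed = [name for name, status in results.items() if status != "PASS"]
print(results, not_passed)
```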
- Run an individual model using this pattern.

> 👉 The run command only materializes the data. It does not run any tests.

```bash
$ dbt run --model <model.sql>
```

In [8]:
!dbt run --model fact_turnstile.sql

02:37:59  Running with dbt=1.8.2
An error occurred: module 'importlib.metadata' has no attribute 'packages_distributions'
02:38:00  Registered adapter: bigquery=1.8.2
02:38:01  Found 10 models, 1 seed, 30 data tests, 1 source, 596 macros
02:38:01
02:38:02  Concurrency: 2 threads (target='dev')
02:38:02
02:38:02  1 of 1 START sql incremental model mta_data.fact_turnstile ..................... [RUN]
02:38:08  1 of 1 OK created sql incremental model mta_data.fact_turnstile ................ [MERGE (0.0 rows, 297.6 MiB processed) in 6.23s]
02:38:08
02:38:08  Finished running 1 incremental model in 0 hours 0 minutes and 7.52 seconds (7.52s).
02:38:08
02:38:08  Completed successfully
02:38:08
02:38:08  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

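Because `dbt run` does not execute tests, a model that is materialized this way still needs a separate `dbt test` call. The sketch below chains the two steps and stops at the first failure. It is a hypothetical helper, not part of the project: the name `run_then_test` and the injectable `runner` parameter are assumptions made so the logic can be exercised without a warehouse connection. It relies on dbt returning a non-zero exit code when a command fails.

```python
import subprocess


def run_then_test(model, runner=subprocess.run):
    """Materialize a model, then run its tests only if the run succeeded.

    Returns None when both steps succeed, otherwise the name of the failed step.
    """
    for command in ("run", "test"):
        completed = runner(["dbt", command, "--select", model])
        # Exit code 0 means success; any other value means the step failed
        if completed.returncode != 0:
            return command
    return None
```

In a notebook cell, `run_then_test("fact_turnstile")` would return `None` on success, `"run"` if the model failed to build, or `"test"` if a test failed.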

## Show the Documentation from the CLI
> Make sure to run this command in the same folder where the dbt_project.yml file is located.

```bash
cd ./Step4-Data-Warehouse/dbt
# Generate the documentation site first; docs serve hosts the generated files
dbt docs generate
dbt docs serve
```