## Data Modeling

Use this notebook to run the dbt commands that build, test, and publish the data warehouse models.

### What is dbt?

dbt (data build tool) is a tool that simplifies data transformation within your data warehouse, allowing you to build, test, and document data models using SQL. Its modular approach helps keep data quality consistent and makes it easier for data engineers and analysts to collaborate.

![Data modeling lineage](../../images/ozkary-data-engineering-process-data-warehouse-lineage.png)


In [1]:
# Make sure the dependencies are in the version you need
!dbt deps

20:20:04  Running with dbt=1.4.5
20:20:05  Installing dbt-labs/dbt_utils
20:20:06    Installed from version 1.0.0
20:20:06    Updated version available: 1.1.1
20:20:06
20:20:06  Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps


- Build and test the model. The `is_test_run` variable controls how much data is processed: setting it to `false` materializes the entire dataset.

```bash
$ dbt build --select stg_station --vars 'is_test_run: false'
$ dbt build --select dim_station
```

> 👉 **stg_station** to build the view, **dim_station** to build the table
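Because this notebook runs in a Python kernel, the same commands can be assembled programmatically. The following is a minimal sketch, not part of the original project: the helper name `dbt_build_command` is hypothetical, and it assumes `--vars` is given a YAML dictionary, for which a JSON string is one valid form.

```python
import json
import shlex


def dbt_build_command(model, variables=None):
    """Return the argument list for `dbt build --select <model>` (hypothetical helper)."""
    args = ["dbt", "build", "--select", model]
    if variables:
        # --vars expects a YAML dictionary; JSON is accepted as YAML,
        # so json.dumps is a simple way to serialize the values.
        args += ["--vars", json.dumps(variables)]
    return args


staging = dbt_build_command("stg_station", {"is_test_run": False})
dimension = dbt_build_command("dim_station")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(staging))
print(shlex.join(dimension))
```

The argument lists can also be passed to `subprocess.run` directly, which avoids shell quoting altogether.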

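**Understanding --model vs. --select**
- --select (-s): The current flag for choosing what to run. It accepts a model name, a path, a tag, or a pattern that matches several models.
- --model (-m): A legacy alias kept for backward compatibility. It accepts the same selectors as --select, so prefer --select in new commands.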
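To illustrate how a selector can match more than one model, here is a small sketch that builds selector strings with dbt's `+` graph operators (a leading `+` includes a model's upstream parents, a trailing `+` includes its downstream children). The `selector` helper is hypothetical, and the model names are simply the ones used in this notebook.

```python
def selector(model, parents=False, children=False):
    """Build a dbt selector string using the + graph operators (hypothetical helper)."""
    prefix = "+" if parents else ""
    suffix = "+" if children else ""
    return f"{prefix}{model}{suffix}"


examples = [
    selector("dim_station"),                  # only the model itself
    selector("fact_turnstile", parents=True), # the model and everything upstream of it
    selector("stg_station", children=True),   # the model and everything downstream of it
]

for item in examples:
    print(f"dbt run --select {item}")
```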

In [1]:
!dbt run --select dim_station.sql

16:31:16  Running with dbt=1.8.4
16:31:17  Registered adapter: bigquery=1.8.2
16:31:18  Found 10 models, 1 seed, 30 data tests, 1 source, 585 macros
16:31:18
16:31:19  Concurrency: 2 threads (target='dev')
16:31:19
16:31:19  1 of 1 START sql incremental model mta_data.dim_station ........................ [RUN]
16:31:24  1 of 1 OK created sql incremental model mta_data.dim_station ................... [MERGE (0.0 rows, 83.8 MiB processed) in 4.86s]
16:31:24
16:31:24  Finished running 1 incremental model in 0 hours 0 minutes and 6.16 seconds (6.16s).
16:31:24
16:31:24  Completed successfully
16:31:24
16:31:24  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1


- Materialize the data for dim_booth

In [2]:
!dbt run --select dim_booth.sql

16:31:40  Running with dbt=1.8.4
16:31:42  Registered adapter: bigquery=1.8.2
16:31:42  Found 10 models, 1 seed, 30 data tests, 1 source, 585 macros
16:31:42
16:31:44  Concurrency: 2 threads (target='dev')
16:31:44
16:31:44  1 of 1 START sql incremental model mta_data.dim_booth .......................... [RUN]
16:31:48  1 of 1 OK created sql incremental model mta_data.dim_booth ..................... [MERGE (0.0 rows, 83.8 MiB processed) in 4.93s]
16:31:48
16:31:48  Finished running 1 incremental model in 0 hours 0 minutes and 6.35 seconds (6.35s).
16:31:49
16:31:49  Completed successfully
16:31:49
16:31:49  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1


- Run all the tests in the project

In [4]:
!dbt test

23:56:15  Running with dbt=1.4.5
23:56:15  Found 8 models, 21 tests, 0 snapshots, 0 analyses, 449 macros, 0 operations, 1 seed file, 1 source, 0 exposures, 0 metrics
23:56:15
23:56:15  Concurrency: 2 threads (target='dev')
23:56:15
23:56:15  1 of 21 START test not_null_dim_booth_booth_id ................................. [RUN]
23:56:15  2 of 21 START test not_null_dim_booth_booth_name ............................... [RUN]
23:56:16  1 of 21 PASS not_null_dim_booth_booth_id ....................................... [PASS in 0.88s]
23:56:16  3 of 21 START test not_null_dim_booth_remote ................................... [RUN]
23:56:16  2 of 21 PASS not_null_dim_booth_booth_name ..................................... [PASS in 0.98s]
23:56:16  4 of 21 START test not_null_dim_station_station_id ............................. [RUN]
23:56:17  3 of 21 PASS not_null_dim_booth_remote ......................................... [

- Run any individual model using this pattern.

> 👉 The run command only materializes the data. It does not run any test cases.

```bash
$ dbt run --model <model.sql>
```

In [3]:
!dbt run --model fact_turnstile.sql

16:32:17  Running with dbt=1.8.4
16:32:18  Registered adapter: bigquery=1.8.2
16:32:19  Found 10 models, 1 seed, 30 data tests, 1 source, 585 macros
16:32:19
16:32:20  Concurrency: 2 threads (target='dev')
16:32:20
16:32:20  1 of 1 START sql incremental model mta_data.fact_turnstile ..................... [RUN]
16:32:27  1 of 1 OK created sql incremental model mta_data.fact_turnstile ................ [MERGE (988.3k rows, 255.7 MiB processed) in 6.73s]
16:32:27
16:32:27  Finished running 1 incremental model in 0 hours 0 minutes and 8.09 seconds (8.09s).
16:32:27
16:32:27  Completed successfully
16:32:27
16:32:27  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

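When the commands are run from a notebook, it can be handy to read the final `Done.` line programmatically instead of scanning the log. The sketch below is one possible approach, assuming the summary format shown in the output above (it may differ in other dbt versions); `parse_summary` is a hypothetical helper, not a dbt API.

```python
import re

# Matches ANSI color sequences that dbt may emit when writing to a terminal.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")


def parse_summary(output):
    """Return the counters from dbt's final 'Done.' line, or None if it is absent."""
    clean = ANSI_ESCAPE.sub("", output)
    for line in clean.splitlines():
        if "Done." in line:
            # Collect every KEY=number pair, e.g. PASS=1 or ERROR=0.
            return {key: int(value) for key, value in re.findall(r"([A-Z-]+)=(\d+)", line)}
    return None


# Sample line copied from the run output above.
sample = "16:32:27  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1"
summary = parse_summary(sample)
print(summary)

if summary and summary["ERROR"] > 0:
    print("At least one model or test failed")
```

For automation, the exit code of the `dbt` process is usually a more dependable signal than parsed log text; the parsed counters are mainly useful for reporting.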

## Show the documentation from the CLI

```bash
cd ./Step4-Data-Warehouse
# Generate the documentation site before serving it
dbt docs generate
dbt docs serve
```