## Data Modeling

Use this notebook to run the dbt commands that build, test, and publish the data warehouse models.

### What is dbt?

dbt (data build tool) is a tool that simplifies data transformation within your data warehouse, allowing you to build, test, and document data models using SQL. Its modular approach helps enforce data quality and consistency, and it facilitates collaboration between data engineers and analysts.

![Data modeling lineage](../../images/ozkary-data-engineering-process-data-warehouse-lineage.png)


In [1]:
# Install the package dependencies listed in packages.yml and check that they are at the versions you need
!dbt deps

20:20:04  Running with dbt=1.4.5
20:20:05  Installing dbt-labs/dbt_utils
20:20:06    Installed from version 1.0.0
20:20:06    Updated version available: 1.1.1
20:20:06  
20:20:06  Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps


- Build and test the model. The `is_test_run` variable controls how much data is created: setting it to `false` materializes the entire dataset.

```bash
$ dbt build --select stg_station --vars 'is_test_run: false'
$ dbt build --select dim_station
```  

> 👉 **stg_station** to build the view, **dim_station** to build the table
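
As a rough sketch, the same build command can also be composed from a Python cell, which avoids hand-quoting the variable string. The helper name `dbt_command` is hypothetical and not part of dbt. The sketch assumes the documented `--vars` flag, which takes a YAML dictionary string; because JSON is a subset of YAML, `json.dumps` is used here to produce that string. Nothing is executed; the code only builds the argument list.

```python
import json
import shlex


def dbt_command(subcommand, select=None, variables=None):
    """Return the argument list for a dbt CLI call (hypothetical helper, runs nothing)."""
    args = ["dbt", subcommand]
    if select:
        args += ["--select", select]
    if variables:
        # --vars expects a YAML dictionary string; JSON output is valid YAML
        args += ["--vars", json.dumps(variables)]
    return args


# Same intent as the stg_station build above: materialize the full dataset
cmd = dbt_command("build", select="stg_station", variables={"is_test_run": False})

# Show the shell-quoted form of the command
print(shlex.join(cmd))
```

To actually run it from the notebook, the list could be passed to `subprocess.run(cmd, check=True)`, assuming dbt is installed and the cell runs from the project directory.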

In [5]:
!dbt build --select dim_station.sql

20:55:43  Running with dbt=1.4.5
20:55:43  Found 8 models, 21 tests, 0 snapshots, 0 analyses, 449 macros, 0 operations, 1 seed file, 2 sources, 0 exposures, 0 metrics
20:55:43  
20:55:45  Concurrency: 2 threads (target='dev')
20:55:45  
20:55:45  1 of 6 START sql incremental model mta_data.dim_station ........................ [RUN]
20:55:50  1 of 6 OK created sql incremental model mta_data.dim_station ................... [MERGE (2.0 rows, 234.7 MB processed) in 4.68s]
20:55:50  2 of 6 START test not_null_dim_station_station_id .............................. [RUN]
20:55:50  3 of 6 START test not_null_dim_station_station_name ............................ [RUN]
20:55:51  3 of 6 PASS not_null_dim_station_station_name .................................. [PASS in 1.31s]
20:55:51  4 of 6 START test unique_dim_station_station_id ................................ [RUN]
20:55:51  2 of 6 PASS not_null_dim_station_station_id .........

- Materialize the data for dim_booth

In [9]:
!dbt build --select dim_booth.sql

21:39:09  Running with dbt=1.4.5
21:39:09  Found 8 models, 21 tests, 0 snapshots, 0 analyses, 449 macros, 0 operations, 1 seed file, 2 sources, 0 exposures, 0 metrics
21:39:09  
21:39:10  Concurrency: 2 threads (target='dev')
21:39:10  
21:39:10  1 of 7 START sql incremental model mta_data.dim_booth .......................... [RUN]
21:39:15  1 of 7 OK created sql incremental model mta_data.dim_booth ..................... [MERGE (0.0 rows, 299.1 MB processed) in 4.62s]
21:39:15  2 of 7 START test not_null_dim_booth_booth_id .................................. [RUN]
21:39:15  3 of 7 START test not_null_dim_booth_booth_name ................................ [RUN]
21:39:16  3 of 7 PASS not_null_dim_booth_booth_name ...................................... [PASS in 1.44s]
21:39:16  4 of 7 START test not_null_dim_booth_remote .................................... [RUN]
21:39:16  2 of 7 PASS not_null_dim_booth_booth_id .............

- Run all the tests in the project

In [2]:
!dbt test

20:46:15  Running with dbt=1.4.5
20:46:15  Found 8 models, 21 tests, 0 snapshots, 0 analyses, 449 macros, 0 operations, 1 seed file, 2 sources, 0 exposures, 0 metrics
20:46:15  
20:46:16  Concurrency: 2 threads (target='dev')
20:46:16  
20:46:16  1 of 21 START test not_null_dim_booth_booth_id ................................. [RUN]
20:46:16  2 of 21 START test not_null_dim_booth_booth_name ............................... [RUN]
20:46:17  2 of 21 PASS not_null_dim_booth_booth_name ..................................... [PASS in 1.61s]
20:46:17  3 of 21 START test not_null_dim_booth_remote ................................... [RUN]
20:46:18  1 of 21 PASS not_null_dim_booth_booth_id ....................................... [PASS in 1.74s]
20:46:18  4 of 21 START test not_null_dim_station_station_id ............................. [RUN]
20:46:19  4 of 21 PASS not_null_dim_station_station_id ................................... [

- Run each model using this pattern.

> 👉 The `run` command only materializes the data. It does not run any tests.

```bash
$ dbt run --model <model.sql>
```
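
To apply the pattern to several models from a Python cell, a minimal sketch might look like the following. The model names are the ones that appear in this notebook, but the order in which they are listed is an assumption, and `run_commands` is a hypothetical helper. It uses `--select`, matching the build commands earlier in the notebook, and only prints the commands rather than executing them.

```python
import shlex

# Model names taken from this notebook; the ordering is an assumption
MODELS = ["dim_station", "dim_booth", "fact_turnstile"]


def run_commands(models):
    """Build one `dbt run` argument list per model (hypothetical helper, runs nothing)."""
    return [["dbt", "run", "--select", model] for model in models]


# Print each command in its shell-quoted form for review before running it
for cmd in run_commands(MODELS):
    print(shlex.join(cmd))
```

Because `run` skips the tests, a follow-up `dbt test` (or using `dbt build` instead) is still needed to validate the materialized data.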

In [1]:
!dbt run --model fact_turnstile.sql

16:44:34  Running with dbt=1.4.5
16:44:36  Found 8 models, 21 tests, 0 snapshots, 0 analyses, 449 macros, 0 operations, 1 seed file, 2 sources, 0 exposures, 0 metrics
16:44:36  
16:44:39  Concurrency: 2 threads (target='dev')
16:44:39  
16:44:39  1 of 1 START sql incremental model mta_data.fact_turnstile ..................... [RUN]
16:44:47  1 of 1 OK created sql incremental model mta_data.fact_turnstile ................ [MERGE (250.8k rows, 434.8 MB processed) in 7.82s]
16:44:47  
16:44:47  Finished running 1 incremental model in 0 hours 0 minutes and 11.20 seconds (11.20s).
16:44:47  
16:44:47  Completed successfully
16:44:47  
16:44:47  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
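
The closing summary line can also be checked programmatically, for example to stop a notebook when a run reports errors. The sketch below is only illustrative: `parse_dbt_summary` is a hypothetical helper, and the line format is copied from the dbt 1.4.5 output in this notebook, so it may differ in other versions.

```python
import re


def parse_dbt_summary(line):
    """Extract the counters from a dbt summary line such as 'Done. PASS=1 WARN=0 ...'."""
    # Each counter is an upper-case name, an equals sign, and an integer
    return {key: int(value) for key, value in re.findall(r"\b([A-Z]+)=(\d+)", line)}


# Summary line as it appears at the end of the run in this notebook
summary = parse_dbt_summary("Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1")
print(summary)

# Fail loudly if the run reported any errors
if summary.get("ERROR", 0) > 0:
    raise RuntimeError("dbt reported errors")
```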
