![Title](images/title-page.png)

![Title](images/title-qr.png)

# What is the TPC?
### The TPC is a non-profit corporation focused on developing data-centric benchmark standards and disseminating objective, verifiable data to the industry.

# What is TPC-DI?
### The TPC-DI benchmark combines and transforms data extracted from an On-Line Transaction Processing (OLTP) system along with other sources of data, and loads it into a data warehouse.

![ETL Diagram](images/tpc-di-etl-diagram.png)

![Logical Model](images/tpc-di-logical-model.png)

# First we generate the source files.
### The JAR file is dated, so we have to use a 1.8 JDK.

In [12]:
!jenv local 1.8
!java -jar ~/dev/Tools/DIGen.jar --help

usage: DIGen
 -h                   print this message
 -jvm <JVM options>   JVM options. E.g. -jvm "-Xms1g -Xmx2g"
 -o <directory>       Specify output directory.  Default is output.
 -sf <sf>             Scale factor.  Default value is 5. (range: 3 -
                      2147483647
 -v                   print DIGen version


In [22]:
!rm -rf ~/dev/tpcdi-output
!mkdir -p ~/dev/tpcdi-output
!cd ~/dev/Tools && java -jar ~/dev/Tools/DIGen.jar -o ~/dev/tpcdi-output -sf 5

/Users/stewartbryson/dev/tpcdi-output
########################################################################################################################
                                                  PDGF v2.5_#1343_b4177
                                            Parallel Data Generation Framework
                (c)bankmark UG (haftungsbeschraenkt), Frank M., Danisch M., Rabl T. http://www.bankmark.de
########################################################################################################################
                                                   License information
                            The Software is provided to you as part of the TPC Benchmark DI. 
 When using this software you must agree to the license provided in LICENSE.TXT of this package. Use is restricted to TPC
DI benchmarking purposes as specified in LICENSE.TXT. If you would like to use the software for other purposes, you must
contact bankmark UG (haftungsbeschraenkt) (http://www.

# When we search on Google for "dbt dynamic tables":

![Google Search](images/dbt-dynamic-tables.png)

# Is it as simple as this?

![Conflict](images/refresh-conflict.png)

# Remember, there's more to dbt than just scheduling refresh jobs. There's a DAG to consider.

# Dynamic Tables need to be (re)created in the correct order. This can become very complex as the number of tables and dependencies increases.
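
To make the ordering problem concrete, here is a minimal sketch of how a creation order can be derived from a dependency graph with Python's standard-library `graphlib` (Python 3.9+). The edges are invented for illustration, and `trades_history` is a hypothetical intermediate model; the real project graph is whatever dbt builds from its `ref()` calls.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each model lists the models it selects from.
# These edges are assumptions for illustration, not the project's real DAG.
deps = {
    "brokerage_trade": set(),
    "brokerage_trade_history": set(),
    "trades_history": {"brokerage_trade", "brokerage_trade_history"},
    "fact_trade": {"trades_history"},
}


def creation_order(dependencies):
    """Return model names so that every model comes after its upstream models."""
    return list(TopologicalSorter(dependencies).static_order())


order = creation_order(deps)
print(order)
```

Every upstream table appears before the tables that read from it, which is the order the `CREATE` statements would have to run in. A cycle in the map raises `graphlib.CycleError` instead of returning an order.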

In [2]:
!dbt docs generate
!dbt docs serve

16:40:18  Running with dbt=1.6.6
16:40:18  Registered adapter: snowflake=1.6.4
16:40:18  Unable to do partial parsing because of a version mismatch
16:40:19  Found 45 models, 17 sources, 0 exposures, 0 metrics, 489 macros, 0 groups, 0 semantic models
16:40:19
16:40:20  Concurrency: 20 threads (target='dev')
16:40:20
16:40:20  Building catalog
16:40:26  Catalog written to /Users/stewartbryson/Source/dbt-tpcdi/target/catalog.json
16:40:27  Running with dbt=1.6.6
Serving docs at 8080
To access from your browser, navigate to: http://localhost:8080



Press Ctrl+C to exit.
127.0.0.1 - - [27/Nov/2023 11:40:28] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Nov/2023 11:40:28] "GET /manifest.json?cb=1701103228480 HTTP/1.1" 200 -
127.0.0.1 - - [27/Nov/2023 11:40:28] "GET /catalog.json?cb=1701103228480 HTTP/1.1" 200 -
^C
16:41:38  Encountered an error:

16:41:38  Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/tp

# Let's take a look at a standard dbt project, using the TPC-DI dataset.

# I'll just pull the `dbt_project.yml` file from another branch:

In [27]:
!git restore --source standard-tables -- dbt_project.yml
!dbt build

23:19:20  Running with dbt=1.6.6
23:19:20  Registered adapter: snowflake=1.6.4
23:19:20  Found 45 models, 1 test, 17 sources, 0 exposures, 0 metrics, 489 macros, 0 groups, 0 semantic models
23:19:20
23:19:22  Concurrency: 20 threads (target='dev')
23:19:22
23:19:22  1 of 45 START sql table model dl_bronze.brokerage_cash_transaction ............. [RUN]
23:19:22  2 of 45 START sql table model dl_bronze.brokerage_daily_market ................. [RUN]
23:19:22  3 of 45 START sql table model dl_bronze.brokerage_holding_history .............. [RUN]
23:19:22  4 of 45 START sql table model dl_bronze.brokerage_trade ........................ [RUN]
23:19:22  5 of 45 START sql table model dl_bronze.brokerage_trade_history ................ [RUN]
23:19:22  6 of 45 START sql table model dl_bronze.brokerage_watch_history ................ [RUN]
23:19:22  7 of 45 START sql table model dl_bronze.crm_customer_mgmt ...................... [RUN]
23:1

# Now we'll restore our original `dbt_project.yml` file.

In [28]:
!git checkout dbt_project.yml

Updated 1 path from the index


# We can see all that's required to enable dynamic tables in our `dbt_project.yml` file:

```yaml
models:
  dbt_tpcdi:
    example:
      +materialized: view
    bronze:
      +schema: bronze
      +materialized: dynamic_table
      +snowflake_warehouse: tpcdi_xlarge
      +target_lag: '10 minutes'
    silver:
      +schema: silver
      +materialized: dynamic_table
      +snowflake_warehouse: tpcdi_xlarge
      +target_lag: '10 minutes'
    gold:
      +schema: gold
      +materialized: dynamic_table
      +snowflake_warehouse: tpcdi_xlarge
      +target_lag: '20 minutes'
    work:
      +schema: work
      +materialized: dynamic_table
      +snowflake_warehouse: tpcdi_xlarge
      +target_lag: downstream
```
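
For a sense of what these settings turn into, the sketch below renders the general shape of a Snowflake `CREATE DYNAMIC TABLE` statement from the same three values: name, warehouse, and target lag. It is an approximation under stated assumptions, not the SQL that the dbt-snowflake materialization actually emits. The helper `render_dynamic_table_ddl`, the `SELECT` bodies, and the model name `dl_work.some_work_model` are hypothetical.

```python
def render_dynamic_table_ddl(name, warehouse, target_lag, select_sql):
    """Build a Snowflake CREATE DYNAMIC TABLE statement as a string."""
    # DOWNSTREAM is a keyword and is emitted unquoted; durations are quoted strings.
    if target_lag.strip().lower() == "downstream":
        lag = "DOWNSTREAM"
    else:
        lag = "'" + target_lag + "'"
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = {lag}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n{select_sql}"
    )


# Values taken from the gold and work blocks of dbt_project.yml above;
# the SELECT statements and the work model name are placeholders.
gold_ddl = render_dynamic_table_ddl(
    "dl_gold.fact_trade", "tpcdi_xlarge", "20 minutes", "SELECT 1 AS placeholder"
)
work_ddl = render_dynamic_table_ddl(
    "dl_work.some_work_model", "tpcdi_xlarge", "downstream", "SELECT 1 AS placeholder"
)
print(gold_ddl)
print(work_ddl)
```

A `downstream` lag means the table has no schedule of its own and refreshes only when a dynamic table that reads from it needs fresh data, which is why it suits the intermediate `work` models.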

In [29]:
!dbt build

23:23:01  Running with dbt=1.6.6
23:23:01  Registered adapter: snowflake=1.6.4
23:23:01  Unable to do partial parsing because a project config has changed
23:23:02  Found 45 models, 1 test, 17 sources, 0 exposures, 0 metrics, 489 macros, 0 groups, 0 semantic models
23:23:02
23:23:05  Concurrency: 20 threads (target='dev')
23:23:05
23:23:05  1 of 46 START sql dynamic_table model dl_bronze.brokerage_cash_transaction ..... [RUN]
23:23:05  2 of 46 START sql dynamic_table model dl_bronze.brokerage_daily_market ......... [RUN]
23:23:05  3 of 46 START sql dynamic_table model dl_bronze.brokerage_holding_history ...... [RUN]
23:23:05  4 of 46 START sql dynamic_table model dl_bronze.brokerage_trade ................ [RUN]
23:23:05  5 of 46 START sql dynamic_table model dl_bronze.brokerage_trade_history ........ [RUN]
23:23:05  6 of 46 START sql dynamic_table model dl_bronze.brokerage_watch_history ........ [RUN]
23:23:05  7 of 46 START s

Click this link to open results:

[Snowflake UI](https://app.snowflake.com/cxmdykz/hib36835/#/data/databases/TPCDI_DT)

# dbt also has Tests.
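
A dbt test is a query that passes when it returns zero rows. The project has one test, `fact_trade__unique_trade`, which by its name checks that trades are unique in `fact_trade`; its actual SQL is not shown in this notebook. The sketch below illustrates the general pattern with an in-memory SQLite table standing in for Snowflake. The column name `sk_trade_id` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for fact_trade; sk_trade_id is an assumed key column.
conn.execute("CREATE TABLE fact_trade (sk_trade_id INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO fact_trade VALUES (?, ?)", [(1, 100), (2, 50), (3, 75)]
)

# Uniqueness check: return every key that occurs more than once.
UNIQUE_TEST_SQL = """
SELECT sk_trade_id, COUNT(*) AS n_records
FROM fact_trade
GROUP BY sk_trade_id
HAVING COUNT(*) > 1
"""


def failing_rows(connection):
    """Run the uniqueness query; an empty result means the test passes."""
    return connection.execute(UNIQUE_TEST_SQL).fetchall()


passing = failing_rows(conn)  # no duplicates yet
conn.execute("INSERT INTO fact_trade VALUES (1, 999)")  # introduce a duplicate key
failing = failing_rows(conn)
print("before duplicate:", passing)
print("after duplicate:", failing)
```

Because the check is just a query against the table, it can run against a Dynamic Table at any time, independently of when the table last refreshed.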

# We can run them when we create the Dynamic Table:

In [30]:
!dbt build --select fact_trade

23:28:07  Running with dbt=1.6.6
23:28:08  Registered adapter: snowflake=1.6.4
23:28:08  Found 45 models, 1 test, 17 sources, 0 exposures, 0 metrics, 489 macros, 0 groups, 0 semantic models
23:28:08
23:28:10  Concurrency: 20 threads (target='dev')
23:28:10
23:28:10  1 of 2 START sql dynamic_table model dl_gold.fact_trade ........................ [RUN]
23:28:12  1 of 2 OK created sql dynamic_table model dl_gold.fact_trade ................... [SUCCESS 1 in 2.19s]
23:28:12  2 of 2 START test fact_trade__unique_trade ..................................... [RUN]
23:28:12  2 of 2 PASS fact_trade__unique_trade ........................................... [PASS in 0.66s]
23:28:12
23:28:12  Finished running 1 dynamic_table model, 1 test in 0 hours 0 minutes and 4.60 seconds (4.60s).
23:28:12
23:28:12  Completed successfully
23:28:12
23:28:12  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2


# Or we can schedule them to run periodically:

In [31]:
!dbt test

23:29:18  Running with dbt=1.6.6
23:29:18  Registered adapter: snowflake=1.6.4
23:29:19  Found 45 models, 1 test, 17 sources, 0 exposures, 0 metrics, 489 macros, 0 groups, 0 semantic models
23:29:19
23:29:19  Concurrency: 20 threads (target='dev')
23:29:19
23:29:19  1 of 1 START test fact_trade__unique_trade ..................................... [RUN]
23:29:20  1 of 1 PASS fact_trade__unique_trade ........................................... [PASS in 0.57s]
23:29:20
23:29:20  Finished running 1 test in 0 hours 0 minutes and 1.29 seconds (1.29s).
23:29:20
23:29:20  Completed successfully
23:29:20
23:29:20  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1


# dbt Cloud will need to be more than just a job scheduler, and it already is.

1. A cloud development environment for those who prefer it (it needs to get better).
1. CI/CD workflows for promoting Dynamic Table changes into Production.
1. Perhaps there's promise in the Semantic Layer.

# Clean-up

In [1]:
!python tpcdi.py drop-schema --schema dl_gold
!python tpcdi.py drop-schema --schema dl_silver
!python tpcdi.py drop-schema --schema dl_bronze
!python tpcdi.py drop-schema --schema dl_work

Schema dl_gold dropped.
Schema dl_silver dropped.
Schema dl_bronze dropped.
Schema dl_work dropped.
