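This notebook runs dbt from a local (Colab) environment against a Databricks cluster.

It creates two new relations in the Databricks cluster: a Silver-layer table (`zzz_game_details`) and a Gold-layer view (`zzz_win_loss_records`).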

## Setup environment

In [None]:
%%capture
!pip install dbt
!apt-get --quiet install tree
!pip install ipython-sql
!pip install dbt-databricks
%reload_ext sql

In [None]:
PROJECT_NAME = "my_new_dbt_project"
HOST = "dbc-4bac4aa4-ed1b.cloud.databricks.com"
HTTP_PATH = "sql/protocolv1/o/4195076978496399/0702-053640-85snnqsk"
TOKEN = "dapi***"

In [None]:
# initiate a project
!dbt init $PROJECT_NAME

Running with dbt=0.21.1
Creating dbt configuration folder at /root/.dbt
With sample profiles.yml for postgres

Your new dbt project "my_new_dbt_project" was created! If this is your first time
using dbt, you'll need to set up your profiles.yml file -- this file will tell dbt how
to connect to your database. You can find this file by running:

  xdg-open /root/.dbt

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!

In [None]:
# go into the newly created directory
%cd $PROJECT_NAME

/content/my_new_dbt_project


In [None]:
profiles = f"""
default:
  outputs:
    dev:
      host: {HOST}
      http_path: {HTTP_PATH}
      schema: default
      threads: 1
      token: {TOKEN}
      type: databricks
  target: dev
"""

%store profiles > ~/.dbt/profiles.yml

Writing 'profiles' (str) to file '/root/.dbt/profiles.yml'.
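
The `%store` magic above is only available inside IPython. Outside a notebook, the same file could be written with plain Python. The following is a minimal sketch that assumes the same `default` profile layout as the cell above; the helper names `render_profile` and `write_profile` are hypothetical and not part of dbt.

```python
from pathlib import Path


def render_profile(host: str, http_path: str, token: str) -> str:
    """Return profiles.yml text with the same keys as the notebook cell above."""
    return (
        "default:\n"
        "  outputs:\n"
        "    dev:\n"
        f"      host: {host}\n"
        f"      http_path: {http_path}\n"
        "      schema: default\n"
        "      threads: 1\n"
        f"      token: {token}\n"
        "      type: databricks\n"
        "  target: dev\n"
    )


def write_profile(text: str, dbt_dir: Path) -> Path:
    """Write profiles.yml into dbt_dir, creating the directory if needed."""
    dbt_dir.mkdir(parents=True, exist_ok=True)
    path = dbt_dir / "profiles.yml"
    path.write_text(text)
    return path
```

Calling `write_profile(render_profile(HOST, HTTP_PATH, TOKEN), Path.home() / ".dbt")` should produce a file equivalent to the one written by `%store`.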


In [None]:
!dbt debug

07:35:51  Running with dbt=1.1.1
dbt version: 1.1.1
python version: 3.7.13
python path: /usr/bin/python3
os info: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
Using profiles.yml file at /root/.dbt/profiles.yml
Using dbt_project.yml file at /content/my_new_dbt_project/dbt_project.yml

The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  host: dbc-4bac4aa4-ed1b.cloud.databricks.com
  http_path: sql/protocolv1/o/4195076978496399/0702-053640-85snnqsk
  schema: default
  Connection test: [OK connection ok]

All checks passed!


## Create models

In [None]:
!rm -r ./models/example

In [None]:
%%writefile ./models/schema.yml
version: 2

models:
  - name: zzz_game_details
    columns:
      - name: game_id
        tests:
          - unique
          - not_null
      - name: home
        tests:
          - not_null
          - accepted_values:
              values: ['Amsterdam', 'San Francisco', 'Seattle']
      - name: visitor
        tests:
          - not_null
          - accepted_values:
              values: ['Amsterdam', 'San Francisco', 'Seattle']
      - name: home_score
        tests:
          - not_null
      - name: visitor_score
        tests:
          - not_null
      - name: winner
        tests:
          - not_null
          - accepted_values:
              values: ['Amsterdam', 'San Francisco', 'Seattle']
      - name: date
        tests:
          - not_null
  - name: zzz_win_loss_records
    columns:
      - name: team
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('zzz_game_details')
              field: home
      - name: wins
        tests:
          - not_null
      - name: losses
        tests:
          - not_null

Writing ./models/schema.yml
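
Each generic test in `schema.yml` selects the rows that violate a rule, and the test passes when no rows come back. The sketch below mimics that idea in plain Python on in-memory column values. It is a simplified illustration, not dbt's implementation; the function names simply reuse the test names, and the sample values are hypothetical.

```python
from collections import Counter


def not_null(values):
    """Return the violating entries: every None in the column."""
    return [v for v in values if v is None]


def unique(values):
    """Return non-null values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


def accepted_values(values, allowed):
    """Return non-null values outside the allowed set."""
    return [v for v in values if v is not None and v not in allowed]


def relationships(child_values, parent_values):
    """Return non-null child values that have no match in the parent column."""
    parents = set(parent_values)
    return [v for v in child_values if v is not None and v not in parents]


# Hypothetical sample columns, for illustration only.
cities = ["Amsterdam", "San Francisco", "Seattle"]
home = ["Seattle", "Amsterdam", "Seattle", "San Francisco"]
team = ["Seattle", "Amsterdam", "San Francisco"]

# An empty list means the corresponding check would pass.
print(accepted_values(home, cities))  # []
print(unique(team))                   # []
print(relationships(team, home))      # []
```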


In [None]:
%%writefile ./models/zzz_win_loss_records.sql
-- Create a view that summarizes the season's win and loss records by team.

-- Step 2 of 2: Calculate the number of wins and losses for each team.
select
  winner as team,
  count(winner) as wins,
  -- Each team played in 4 games.
  (4 - count(winner)) as losses
from (
  -- Step 1 of 2: Determine the winner and loser for each game.
  select
    game_id,
    winner,
    case
      when
        home = winner
          then
            visitor
      else
        home
    end as loser
  from {{ ref('zzz_game_details') }}
)
group by winner
order by wins desc

Writing ./models/zzz_win_loss_records.sql
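
The aggregation in this model can be traced by hand with a small Python sketch. It assumes, as the SQL does, that every team plays exactly 4 games; the list of winners below is hypothetical sample data, not the contents of `zzz_game_details`.

```python
from collections import Counter

GAMES_PER_TEAM = 4  # same constant that is hard-coded in the SQL above


def win_loss_records(winners):
    """Group by winner, count wins, and derive losses as 4 - wins."""
    wins = Counter(winners)
    rows = [(team, n, GAMES_PER_TEAM - n) for team, n in wins.items()]
    # Mirror "order by wins desc".
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical season: 3 teams, each pair meets twice, so 6 games in total.
sample_winners = [
    "Seattle", "Seattle", "Seattle",
    "Amsterdam", "Amsterdam",
    "San Francisco",
]

for team, wins, losses in win_loss_records(sample_winners):
    print(team, wins, losses)
# Seattle 3 1
# Amsterdam 2 2
# San Francisco 1 3
```

Because the rows are grouped by `winner`, a team that never won a game would not appear in the result at all, in this sketch or in the SQL.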


In [None]:
%%writefile ./models/zzz_game_details.sql
{{ config(
  materialized='table',
  file_format='delta'
) }}

-- Step 4 of 4: Replace the visitor team IDs with their city names.
select
  game_id,
  home,
  t.team_city as visitor,
  home_score,
  visitor_score,
  -- Step 3 of 4: Display the city name for each game's winner.
  case
    when
      home_score > visitor_score
        then
          home
    when
      visitor_score > home_score
        then
          t.team_city
  end as winner,
  game_date as date
from (
  -- Step 2 of 4: Replace the home team IDs with their actual city names.
  select
    game_id,
    t.team_city as home,
    home_score,
    visitor_team_id,
    visitor_score,
    game_date
  from (
    -- Step 1 of 4: Combine data from various tables (for example, game and team IDs, scores, dates).
    select
      g.game_id,
      gop.home_team_id,
      gs.home_team_score as home_score,
      gop.visitor_team_id,
      gs.visitor_team_score as visitor_score,
      g.game_date
    from
      default.zzz_games as g,
      default.zzz_game_opponents as gop,
      default.zzz_game_scores as gs
    where
      g.game_id = gop.game_id and
      g.game_id = gs.game_id
  ) as all_ids,
    default.zzz_teams as t
  where
    all_ids.home_team_id = t.team_id
) as visitor_ids,
  default.zzz_teams as t
where
  visitor_ids.visitor_team_id = t.team_id
order by game_date desc

Overwriting ./models/zzz_game_details.sql


## Configure project

In [None]:
%%writefile dbt_project.yml
name: 'my_dbt_demo'
version: '1.0.0'
config-version: 2
profile: 'default'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"
models:
  my_dbt_demo:
    example:
      +materialized: view

Overwriting dbt_project.yml


In [None]:
!dbt run --select models/zzz_game_details.sql models/zzz_win_loss_records.sql

07:42:27  Running with dbt=1.1.1
There are 1 unused configuration paths:
- models.my_dbt_demo.example

07:42:27  Found 2 models, 16 tests, 0 snapshots, 0 analyses, 218 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:42:27  
07:42:29  Concurrency: 1 threads (target='dev')
07:42:29  
07:42:29  1 of 2 START table model default.zzz_game_details .............................. [RUN]
07:42:45  1 of 2 OK created table model default.zzz_game_details ......................... [OK in 15.97s]
07:42:45  2 of 2 START view model default.zzz_win_loss_records ........................... [RUN]
07:42:46  2 of 2 OK created view model default.zzz_win_loss_records ...................... [OK in 1.44s]
07:42:47  
07:42:47  Finished running 1 table model, 1 view model in 19.56s.
07:42:47  
07:42:47  Completed successfully
07:42:47  
07:42:47  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2


## Test models

In [None]:
!dbt test

07:52:21  Running with dbt=1.1.1
There are 1 unused configuration paths:
- models.my_dbt_demo.example

07:52:21  Found 2 models, 16 tests, 0 snapshots, 0 analyses, 218 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:52:21  
07:52:23  Concurrency: 1 threads (target='dev')
07:52:23  
07:52:23  1 of 16 START test accepted_values_zzz_game_details_home__Amsterdam__San_Francisco__Seattle  [RUN]
07:52:26  1 of 16 PASS accepted_values_zzz_game_details_home__Amsterdam__San_Francisco__Seattle  [PASS in 2.43s]
07:52:26  2 of 16 START test accepted_values_zzz_game_details_visitor__Amsterdam__San_Francisco__Seattle  [RUN]
07:52:27  2 of 16 PASS accepted_values_zzz_game_details_visitor__Amsterdam__San_Francisco__Seattle  [PASS in 1.19s]
07:52:27  3 of 16 START test accepted_values_zzz_game_details_winner__Amsterdam__San_Francisco__Seattle  [RUN]
07:52:28  3 of 16 PASS accepted_values_zzz_game_details_winner__Amsterdam__San_Francisco__Seattle  [PASS
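
Besides the console log, dbt writes a `run_results.json` artifact into the `target` directory after `dbt test`. The sketch below summarizes such a file by status. It assumes the artifact has a top-level `results` list whose entries carry `unique_id` and `status` keys; the helper names and the trimmed sample dictionary are hypothetical and are not output from this project.

```python
import json
from collections import Counter
from pathlib import Path


def load_run_results(target_dir: str = "target") -> dict:
    """Read run_results.json from the dbt target directory."""
    return json.loads((Path(target_dir) / "run_results.json").read_text())


def summarize_results(run_results: dict) -> Counter:
    """Count results by status, for example 'pass' or 'fail'."""
    return Counter(r["status"] for r in run_results["results"])


def failed_nodes(run_results: dict) -> list:
    """Return the unique_id of every result with status 'fail' or 'error'."""
    return [
        r["unique_id"]
        for r in run_results["results"]
        if r["status"] in ("fail", "error")
    ]


# Hypothetical, heavily trimmed stand-in for a real artifact.
sample = {
    "results": [
        {"unique_id": "test.my_dbt_demo.example_test_a", "status": "pass"},
        {"unique_id": "test.my_dbt_demo.example_test_b", "status": "fail"},
    ]
}

print(summarize_results(sample))
print(failed_nodes(sample))
```

In the notebook, `summarize_results(load_run_results())` could be run from the project directory once `dbt test` has finished.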

## Generate docs

In [None]:
!dbt docs generate

07:46:47  Running with dbt=1.1.1
There are 1 unused configuration paths:
- models.my_dbt_demo.example

07:46:47  Found 2 models, 16 tests, 0 snapshots, 0 analyses, 218 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
07:46:47  
07:46:49  Concurrency: 1 threads (target='dev')
07:46:49  
07:46:49  Done.
07:46:49  Building catalog
07:46:49  Catalog written to /content/my_new_dbt_project/target/catalog.json


In [None]:
import portpicker
from google.colab.output import eval_js
port = portpicker.pick_unused_port()
print(port)
print(eval_js("google.colab.kernel.proxyPort({})".format(port)))

In [None]:
!dbt docs serve --port $port --no-browser

07:50:04  Running with dbt=1.1.1
07:50:04  Serving docs at 0.0.0.0:24038
07:50:04  To access from your browser, navigate to:  http://localhost:24038
07:50:04  
07:50:04  
07:50:04  Press Ctrl+C to exit.
127.0.0.1 - - [07/Jul/2022 07:50:10] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [07/Jul/2022 07:50:15] "GET /manifest.json?cb=1657180214358 HTTP/1.1" 200 -
127.0.0.1 - - [07/Jul/2022 07:50:15] "GET /catalog.json?cb=1657180214358 HTTP/1.1" 200 -
07:51:40  ctrl-c
