# dbt CLI and Amazon Redshift

This workshop contains a series of hands-on labs to help you get started with the dbt CLI and Amazon Redshift. Each lab introduces a particular dbt feature at an introductory level.

The features covered in this workshop will help you understand dbt and how it can manage data transformations in Amazon Redshift, including modular programming and data lineage documentation.

## Environment Setup

Tip: Instead of this setup, you can use Redshift Serverless and your local machine. This is typically much cheaper and faster.

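Upload the local `cfn` folder of CloudFormation templates to S3:

```shell
aws s3 sync cfn s3://wysde-assets/cfn
```

Then create the stack. Replace the `wysde-assets` bucket and the `Cloud9LoginUser` ARN with values from your own AWS account:

```shell
aws cloudformation create-stack --stack-name dbt-redshift-workshop \
    --template-url https://wysde-assets.s3.us-east-1.amazonaws.com/cfn/dbt-redshift.json \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters ParameterKey=Cloud9LoginUser,ParameterValue=arn:aws:iam::684199068947:user/sparsh \
    --region us-east-2
```

On success, the command returns the ID of the new stack:

```json
{
    "StackId": "arn:aws:cloudformation:us-east-2:684199068947:stack/dbt-redshift-workshop/f1298c20-6d95-11ed-b1bd-024ac3f5b3dc"
}
```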


## Set up dbt

### Verify data set

In this workshop, you will use the TICKIT sample data set. Before starting the other labs, verify that the TICKIT sample data set is available in your Amazon Redshift cluster.

1. If TICKIT sample data set is not available in your Amazon Redshift cluster, follow the steps in [this](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-create-sample-db.html) guide to load it.
2. In Amazon Redshift Query Editor V2, on the left navigation panel, connect to the database with a user name and password. You will find these credentials in AWS Secrets Manager.
3. Check that the following seven tables exist in your cluster under database dev and schema public.
    ```
    users
    venue
    category
    date
    event
    listing
    sales
    ```
4. You can also query a table to verify the data.
    ```sql
    select * from dev.public.category order by catid;
    ```
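If you prefer to script this check, the following Python sketch compares the tables you found against the seven expected TICKIT tables. The query string and the helper `missing_tickit_tables` are illustrative assumptions: run the query in Query Editor V2 (or any SQL client connected to the cluster) and pass the returned table names to the function.

```python
# Minimal sketch (hypothetical helper): report which TICKIT tables are missing.

# The seven tables expected in database dev, schema public.
EXPECTED_TICKIT_TABLES = frozenset(
    {"users", "venue", "category", "date", "event", "listing", "sales"}
)

# Query that lists the tables in dev.public; run it in Query Editor V2
# or any SQL client connected to the cluster.
LIST_TABLES_SQL = """
select table_name
from information_schema.tables
where table_catalog = 'dev' and table_schema = 'public';
"""


def missing_tickit_tables(found_tables):
    """Return the expected TICKIT tables absent from found_tables, sorted."""
    found = {name.lower() for name in found_tables}
    return sorted(EXPECTED_TICKIT_TABLES - found)


if __name__ == "__main__":
    # Example: pretend the query returned every table except sales.
    print(missing_tickit_tables(["users", "venue", "category", "date", "event", "listing"]))
    # prints ['sales']
```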

### Install dbt

As part of this workshop, you will be using Python for both dbt and general automation scripting.

1. In AWS Cloud9, select Open IDE.
2. In a terminal, install the two Python packages for dbt, dbt-core and dbt-redshift:
    ```
    pip3 install dbt-core
    pip3 install dbt-redshift
    ```
3. After installation, run dbt --version to show the installed versions.
    ```
    dbt --version
    ```
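If you automate environment checks, this Python sketch reports the installed versions of both packages from pip's metadata without invoking the dbt CLI. It uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is hypothetical.

```python
# Minimal sketch: report installed dbt package versions from pip metadata.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dbt-core", "dbt-redshift")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```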

### Create dbt project

Creating a dbt project requires you to provide a number of inputs. The following values are used in this workshop if you used the provided AWS CloudFormation template to satisfy the prerequisites. Change the values to reflect your environment.

| Key                   | Value                | Comment                |
| --------------------- | -------------------- | ---------------------- |
| Project name          | demo\_project        |                        |
| Database              | redshift             | dbt adapter to use     |
| Redshift host         |                      | Check your environment |
| Redshift port         | 5439                 | Check your environment |
| Redshift user         | admin                | Check your environment |
| Authentication method | iam                  | Check your environment |
| Default database      | dev                  | Check your environment |
| Default schema        | dbt                  |                        |
| Threads               | 1                    |                        |
| Cluster ID            | dbt-redshift-cluster | Check your environment |

You can get the Redshift host, Redshift port, Redshift user, Default database, and Cluster ID values from the Amazon Redshift console.

To authenticate with a password instead, choose the password option for the authentication method.
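To see how the values in the table map to profiles.yml, here is a small Python sketch that assembles the profile and prints it as YAML. The key names (type, host, port, user, method, dbname, schema, threads, cluster_id) are the ones used in the multi-environment profiles.yml example later in this section; the host value is a placeholder, and the tiny YAML writer is a simplification that does not quote strings, so prefer a real YAML library for anything beyond this illustration.

```python
# Minimal sketch: map the dbt init inputs from the table to a profiles.yml entry.


def render_yaml(mapping, indent=0):
    """Render nested dicts of scalar values as block-style YAML lines (no quoting)."""
    lines = []
    pad = "  " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


def build_profile(host, cluster_id, user="admin", port=5439, method="iam",
                  dbname="dev", schema="dbt", threads=1, target="dev"):
    """Return the demo_project profile with a single output named after target."""
    output = {
        "type": "redshift",        # "Database" in the table: the dbt adapter
        "host": host,
        "port": port,
        "user": user,
        "method": method,          # "iam" or "password"
        "dbname": dbname,          # "Default database"
        "schema": schema,          # "Default schema"
        "threads": threads,
        "cluster_id": cluster_id,  # added manually after dbt init
    }
    return {"demo_project": {"outputs": {target: output}, "target": target}}


if __name__ == "__main__":
    # Placeholder host; copy the real endpoint from the Amazon Redshift console.
    profile = build_profile("YOUR-CLUSTER-ENDPOINT", "dbt-redshift-cluster")
    print("\n".join(render_yaml(profile)))
```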

**Steps**

1. In AWS Cloud9, run dbt init to start creating your dbt project. After creating demo_project, in the left navigation panel, you can expand demo_project to view the default structures and contents of a dbt project.
    ```
    dbt init
    ```
2. The output of dbt init mentions profiles.yml, the file that stores the inputs you provided during dbt init. You have to add cluster_id to profiles.yml manually. If you are comfortable with vim, you can use it to edit profiles.yml.
3. Alternatively, because profiles.yml is in a hidden folder under your home directory, you can install the c9 command in AWS Cloud9 with npm install --location=global c9 and open the file with c9 ~/.dbt/profiles.yml. Remember to save profiles.yml after adding cluster_id.
    ```
    npm install --location=global c9
    c9 ~/.dbt/profiles.yml
    ```
4. After adding cluster_id to profiles.yml, run dbt debug from inside demo_project to test the connection. Note: If dbt debug reports an error, check that you have 1) completed step 2 to add cluster_id to profiles.yml and 2) configured an inbound rule that lets your Amazon Redshift cluster accept traffic from AWS Cloud9 or from your local environment.
    ```
    dbt debug
    ```
5. When dealing with multiple environments (development, staging, and production), you can configure multiple targets as separate outputs of one profile in profiles.yml. In this example, a single cluster is used and the environments are differentiated at the database level. Note: This is only an example; you do not need to configure profiles.yml this way for the workshop.
    ```yaml
    demo_project:
      outputs:
        dev:
          cluster_id: YYY-cluster
          dbname: dev
          host: YYY-cluster.YYY.ap-southeast-1.redshift.amazonaws.com
          method: iam
          port: 5439
          schema: dbt
          threads: 1
          type: redshift
          user: YYY
        stg:
          cluster_id: YYY-cluster
          dbname: stg
          host: YYY-cluster.YYY.ap-southeast-1.redshift.amazonaws.com
          method: iam
          port: 5439
          schema: dbt
          threads: 1
          type: redshift
          user: YYY
        prd:
          cluster_id: YYY-cluster
          dbname: prd
          host: YYY-cluster.YYY.ap-southeast-1.redshift.amazonaws.com
          method: iam
          port: 5439
          schema: dbt
          threads: 1
          type: redshift
          user: YYY
      target: dev
    ```
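When you pass --target, dbt uses the output with that name; otherwise it falls back to the profile's target key. The Python sketch below models that selection on a trimmed copy of the example above. `resolve_output` is a hypothetical helper for illustration, not part of dbt.

```python
# Simplified model of how a target name selects one output from a profile.

PROFILES = {
    "demo_project": {
        "outputs": {
            "dev": {"dbname": "dev", "schema": "dbt"},
            "stg": {"dbname": "stg", "schema": "dbt"},
            "prd": {"dbname": "prd", "schema": "dbt"},
        },
        "target": "dev",  # default when --target is not given
    }
}


def resolve_output(profiles, profile_name, target=None):
    """Return (target_name, output) for an explicit target or the profile default."""
    profile = profiles[profile_name]
    name = target or profile["target"]
    outputs = profile["outputs"]
    if name not in outputs:
        raise KeyError(f"target {name!r} not found; choose one of {sorted(outputs)}")
    return name, outputs[name]


if __name__ == "__main__":
    print(resolve_output(PROFILES, "demo_project"))         # like: dbt run
    print(resolve_output(PROFILES, "demo_project", "stg"))  # like: dbt run --target stg
```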

## Set up base layer

In this lab, you will set up a base layer for your models to reference. What is a base layer, and what are models? Models are SQL Select statements that represent your data transformation logic, including CASE expressions and joins. The base layer is itself made up of models, which represent existing objects (tables and views) in your Amazon Redshift cluster.

In addition, depending on the nature of your dbt project, a base layer can be made up of different objects. For instance, a data engineer's base layer likely relates to tables containing raw data while a data analyst's base layer likely relates to tables containing cleaned data.

### Semi-automated

The first step you can automate is writing the SQL Select statements that represent the tables in schema public. Writing them manually is time-consuming and error-prone. The dbt-labs/codegen package, created by dbt Labs, can generate these statements for you.

1. To use the dbt-labs/codegen package, create packages.yml under demo_project with the package information. Run the provided curl command to download a pre-created packages.yml into the right location.
   ```
   curl 'https://static.us-east-1.prod.workshops.aws/public/ba38c170-1f4d-4335-b8ce-bdbd4d76abff/static/lab_2a/code/packages.yml' --output packages.yml
    ```
2. After packages.yml is created, install the dbt-labs/codegen package by running dbt deps. The output shows that dbt-labs/codegen and its dependency dbt-labs/dbt_utils are installed. The first time you install a dbt package, dbt creates the dbt_packages folder, where installed packages are saved.
    ```
    dbt deps
    ```
3. After the dbt-labs/codegen package is installed, run dbt run-operation generate_source --args "{'schema_name': 'public'}" to generate schema information. The YAML output lists the tables in schema public that you will include as models in the base layer.
    ```
    dbt run-operation generate_source --args "{'schema_name': 'public'}"
    ```
4. To save the generated schema information and SQL Select statements, create the folder base_public under demo_project/models. Note: You can choose another name instead of base_public, but continue with base_public for this workshop. Note: The other folder, example, and its contents were created by dbt init and can be deleted; for this workshop, leave them as they are.
    ```
    mkdir models/base_public
    ```
5. Save the generated schema information to a file called schema.yml under demo_project/models/base_public. Include version: 2 at the start of schema.yml, as dbt requires it.
6. With schema.yml created, run dbt run-operation generate_base_model --args "{'source_name': 'public', 'table_name': 'category'}" to generate SQL Select statement for table category in schema public. This achieves the objective of automating writing of SQL Select statements for tables in schema public.
    ```
    dbt run-operation generate_base_model --args "{'source_name': 'public', 'table_name': 'category'}"
    ```
7. Save generated SQL Select statement into a file called base_public_category.sql under demo_project/models/base_public.
8. Repeat this process manually for the remaining tables in schema public, or skip this step: later, you will use a Python script to automate the rest of the base layer setup.
9. Update dbt_project.yml to configure this new layer. Materialization is an important concept; for base_public, set materialized to ephemeral.

In brief, dbt provides four options to configure materialization:

- view - results in a dbt model being created as a view
- table - results in a dbt model being created as a table with full refresh
- incremental - results in a dbt model being created as a table with incremental refresh
- ephemeral - does not create a view or table; dbt inlines the model as a common table expression (CTE) in the models that reference it

Note: example configuration is created as part of dbt init and can be deleted. For this workshop, leave it as it is.
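Step 9 does not show the configuration itself. A minimal sketch, assuming the `models:` block of dbt_project.yml is keyed by project name and then folder name (the same shape this workshop uses later for dept_marketing), generates that entry in Python and rejects materializations outside the four listed above. Packages can add further materializations, so the allowed set is a parameter; the helper names are hypothetical.

```python
# Minimal sketch: generate the models: block of dbt_project.yml for each layer.

BUILTIN_MATERIALIZATIONS = frozenset({"view", "table", "incremental", "ephemeral"})


def models_config(project, layers, allowed=BUILTIN_MATERIALIZATIONS):
    """Build {'models': {project: {folder: {'materialized': ...}}}} with validation."""
    folders = {}
    for folder, materialized in layers.items():
        if materialized not in allowed:
            raise ValueError(f"{folder}: unknown materialization {materialized!r}")
        folders[folder] = {"materialized": materialized}
    return {"models": {project: folders}}


def render_yaml(mapping, indent=0):
    """Render nested dicts of scalar values as block-style YAML lines (no quoting)."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append("  " * indent + f"{key}:")
            lines.extend(render_yaml(value, indent + 1))
        else:
            lines.append("  " * indent + f"{key}: {value}")
    return lines


if __name__ == "__main__":
    config = models_config("demo_project", {"base_public": "ephemeral"})
    print("\n".join(render_yaml(config)))
```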

### Fully automated

To fully automate the process of setting up a base layer, use a language of your choice to repeat the steps for every table. A Python script, `generate_base_tables.py`, is provided, and you can modify it to suit your needs beyond the scope of this workshop.

1. Run generate_base_tables.py with the required arguments
   1. 1st argument - name of your dbt project
   2. 2nd argument - name of schema in your Amazon Redshift cluster
2. After generate_base_tables.py run completes, you can check that remaining models under demo_project/models/base_public are created.
3. Note: generate_base_tables.py is located outside of demo_project because it is meant to be a generic automation script that can build the base layer of any dbt project. To run generate_base_tables.py, first navigate to the folder that contains it.
   ```
   python3 generate_base_tables.py demo_project public
   ```
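The contents of generate_base_tables.py are not reproduced here. As an illustration only, the sketch below swaps in a different technique: instead of calling dbt, it templates the files directly, one schema.yml source definition plus one base_<schema>_<table>.sql model per table, from a hypothetical table-to-columns mapping that you would normally read from the warehouse (for example from information_schema.columns). The SQL layout approximates codegen's generate_base_model output (a source CTE followed by a renamed CTE); exact formatting varies between codegen versions. Note that generate_base_layer overwrites existing files.

```python
# Illustrative sketch (not the workshop's generate_base_tables.py):
# template schema.yml and base models for every table in a schema.
from pathlib import Path

# Hypothetical input: table -> column list (would normally come from the warehouse).
TABLE_COLUMNS = {
    "category": ["catid", "catgroup", "catname", "catdesc"],
    "event": ["eventid", "venueid", "catid", "dateid", "eventname", "starttime"],
}


def source_yaml(schema, tables):
    """Return a schema.yml declaring `schema` as a dbt source with the given tables."""
    lines = ["version: 2", "", "sources:", f"  - name: {schema}", "    tables:"]
    lines += [f"      - name: {table}" for table in tables]
    return "\n".join(lines) + "\n"


def base_model_sql(schema, table, columns):
    """Return a base model that selects every listed column from the source table."""
    column_list = ",\n".join(f"        {column}" for column in columns)
    return (
        "with source as (\n\n"
        f"    select * from {{{{ source('{schema}', '{table}') }}}}\n\n"
        "),\n\n"
        "renamed as (\n\n"
        "    select\n"
        f"{column_list}\n\n"
        "    from source\n\n"
        ")\n\n"
        "select * from renamed\n"
    )


def generate_base_layer(project_dir, schema, table_columns):
    """Write schema.yml and base_<schema>_<table>.sql files; return the paths written."""
    folder = Path(project_dir) / "models" / f"base_{schema}"
    folder.mkdir(parents=True, exist_ok=True)
    schema_file = folder / "schema.yml"
    schema_file.write_text(source_yaml(schema, sorted(table_columns)))
    written = [schema_file]
    for table, columns in sorted(table_columns.items()):
        path = folder / f"base_{schema}_{table}.sql"
        path.write_text(base_model_sql(schema, table, columns))
        written.append(path)
    return written
```

For example, generate_base_layer("demo_project", "public", TABLE_COLUMNS) would write schema.yml, base_public_category.sql, and base_public_event.sql under demo_project/models/base_public.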

## Create models

To explore how dbt lets objects reference other objects, in this lab you will simulate a Finance department that maintains two models, where the second model references the first. Referencing lets you reuse code instead of duplicating it.

- Model 1 - Quarterly Total Sales By Event
- Model 2 - Quarterly Top Events By Sales (references Model 1 to rank and filter for top 3 events by sales for each quarter)


1. Create folder dept_finance, under demo_project/models, to represent Finance models layer.
    ```
    mkdir models/dept_finance
    ```
2. Create the file rpt_finance_qtr_total_sales_by_event.sql under demo_project/models/dept_finance, containing the SQL Select statement for Model 1. Instead of writing table names directly, use dbt's {{ref('file_name')}} syntax, where file_name is the model's file name without .sql, to reference models in demo_project/models/base_public.
   ```sql
    select
        date_part('year', a.saletime) as year,
        date_part('quarter', a.saletime) as quarter,
        b.eventname,
        count(a.salesid) as sales_made,
        sum(a.pricepaid) as sales_revenue,
        sum(a.commission) as staff_commission,
        staff_commission / sales_revenue as commission_pcnt
    from {{ref('base_public_sales')}} a
    left join {{ref('base_public_event')}} b on a.eventid = b.eventid
    group by
        year,
        quarter,
        b.eventname
    order by
        year,
        quarter,
        b.eventname
   ```
3. Create file rpt_finance_qtr_top_events_by_sales.sql, under demo_project/models/dept_finance, containing SQL Select statement for Model 2 which references Model 1.
   ```sql
    select *
    from
    (
        select
            *,
            rank() over (partition by year, quarter order by sales_revenue desc) as row_num
        from {{ref('rpt_finance_qtr_total_sales_by_event')}}
    ) ranked_events
    where row_num <= 3
   ```
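4. After creating the Finance folder and files in the models folder, update dbt_project.yml to configure this new model layer. For dept_finance, set materialized to view and schema to dept_finance; dbt prefixes the custom schema with the target schema, which gives dbt_dept_finance.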
5. Run dbt run --models dept_finance --target dev to create schema (dbt_dept_finance) and views (rpt_finance_qtr_total_sales_by_event and rpt_finance_qtr_top_events_by_sales) for Finance model layer.
   ```
   dbt run --models dept_finance --target dev
   ```
6. Query dbt_dept_finance.rpt_finance_qtr_total_sales_by_event and dbt_dept_finance.rpt_finance_qtr_top_events_by_sales to verify that both views were created in your target Amazon Redshift cluster.
   ```sql
   select * from dbt_dept_finance.rpt_finance_qtr_total_sales_by_event;
   select * from dbt_dept_finance.rpt_finance_qtr_top_events_by_sales order by year, quarter;
   ```
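One detail of Model 2 is worth checking: rank() gives tied rows the same rank, so a quarter can return more than three events when revenues tie, whereas row_number() would return exactly three. The Python sketch below reproduces the rank() filter in memory so you can see the effect; the sample rows are made up for illustration.

```python
# In-memory model of Model 2's logic:
# rank() over (partition by year, quarter order by sales_revenue desc) <= n
from itertools import groupby
from operator import itemgetter


def top_events_by_quarter(rows, n=3):
    """Return rows whose rank within their (year, quarter) is <= n, with the rank attached."""
    partition_key = itemgetter("year", "quarter")
    result = []
    for _, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        ordered = sorted(group, key=itemgetter("sales_revenue"), reverse=True)
        rank, previous_revenue = 0, None
        for position, row in enumerate(ordered, start=1):
            if row["sales_revenue"] != previous_revenue:
                # rank() jumps to the row's position after a tie (1, 2, 3, 3, 5, ...)
                rank, previous_revenue = position, row["sales_revenue"]
            if rank <= n:
                result.append({**row, "row_num": rank})
    return result


if __name__ == "__main__":
    # Made-up sample: events C and D tie for third place in 2008 Q1.
    sample = [
        {"year": 2008, "quarter": 1, "eventname": name, "sales_revenue": revenue}
        for name, revenue in [("A", 100), ("B", 90), ("C", 80), ("D", 80), ("E", 70)]
    ]
    for row in top_events_by_quarter(sample):
        print(row["eventname"], row["row_num"])  # A 1, B 2, C 3, D 3
```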

## Create macros

In this lab, you will use macros to create reusable data transformation logic and to manage users and grants. Macros are a great way to create reusable pieces of SQL code in dbt, much like functions in Python.

### Macros 101

Are you new to dbt macros and Jinja? This lab walks through a series of macro examples; each one helps you understand a specific piece of Jinja code by running a sample macro and relating the code to its output.

The file macro_hello_world.sql, under demo_project/macros, contains the macro examples.

1. Run `dbt run-operation macro_hello_world_1` to see log, which prints a message. Passing True as the second argument makes the message appear in the console output.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_1 #}
    {% macro macro_hello_world_1() %}

        {{ log('Congrats on running your 1st macro in dbt', True) }}

    {% endmacro %}
    ```
2. Run `dbt run-operation macro_hello_world_2` to see set, [], and for, which create a variable, define a list, and loop over a list, respectively. Note that when a variable defined outside a loop is updated with set inside the loop, it keeps its original value after the loop ends, because Jinja scopes assignments made inside a loop to that loop.
   ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_2 #}
    {% macro macro_hello_world_2() %}

        {{ log('Congrats on running your 2nd macro in dbt', True) }}

        {% set external_var = False %}

        {{ log('external_var before loop: ' ~ external_var, True) }}

        {% for i in [1]%}
            {% set external_var = True %}
            {{ log('external_var within loop: ' ~ external_var, True) }}
        {% endfor %}

        {{ log('external_var after  loop: ' ~ external_var, True) }}

    {% endmacro %}
   ```
3. Run `dbt run-operation macro_hello_world_3` to see namespace(), which holds variables created outside a loop so that updates made inside the loop remain visible after it. You can think of variables held in a namespace as global variables.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_3 #}
    {% macro macro_hello_world_3() %}

        {{ log('Congrats on running your 3rd macro in dbt', True) }}

        {% set ns = namespace() %}

        {% set ns.namespace_var = False %}

        {{ log('namespace_var before loop: ' ~ ns.namespace_var, True) }}

        {% for i in [1]%}
            {% set ns.namespace_var = True %}
            {{ log('namespace_var within loop: ' ~ ns.namespace_var, True) }}
        {% endfor %}

        {{ log('namespace_var after  loop: ' ~ ns.namespace_var, True) }}

    {% endmacro %}
    ```
4. Run `dbt run-operation macro_hello_world_4` to see if, elif, and else, which test conditions.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_4 #}
    {% macro macro_hello_world_4() %}

        {{ log('Congrats on running your 4th macro in dbt', True) }}

        {% for i in [1, 2, 3]%}
            {% if i == 1 %}
                {{ log('if   i == 1, A', True) }}
            {% elif i == 2 %}
                {{ log('elif i == 2, B', True) }}
            {% else %}
                {{ log('else i == 3, C', True) }}
            {% endif %}
        {% endfor %}

    {% endmacro %}
    ```
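5. Run `dbt run-operation macro_hello_world_5` to see run_query, which sends a query directly to Amazon Redshift and returns the results. The if execute check makes sure the results are read only when dbt is executing, not while it is parsing the project.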
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_5 #}
    {% macro macro_hello_world_5() %}

        {{ log('Congrats on running your 5th macro in dbt', True) }}

        {% set test_sql %}
            select 'row_1_col_1' as col_1, 'row_1_col_2' as col_2 union select 'row_2_col_1' as col_1, 'row_2_col_2' as col_2;
        {% endset %}

        {% set results = run_query(test_sql) %}

        {% if execute %}
            {% for row in results.rows %}
                {{ log('Query record: ' ~ row.col_1 ~ ' ' ~ row.col_2, True) }}
            {% endfor %}
        {% endif %}

    {% endmacro %}
    ```
6. Run `dbt run-operation macro_hello_world_6 --args "{'input': 'ABC'}"` to see how to define a macro argument and pass a value to it.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_6 --args "{'input': 'ABC'}" #}
    {% macro macro_hello_world_6(input) %}

        {{ log('Congrats on running your 6th macro in dbt', True) }}

        {{ log('Input parameter value: ' ~ input, True) }}

    {% endmacro %}
    ```
7. Run `dbt run-operation macro_hello_world_7` to see how to use a dictionary.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_7 #}
    {% macro macro_hello_world_7() %}

        {{ log('Congrats on running your 7th macro in dbt', True) }}

        {% set dict_var = {'key_1': 'val_1', 'key_2': 'val_2'} %}

        {% for key in dict_var.keys() %}
            {{ log('dict_var key and value: ' ~ key ~ ' ' ~ dict_var[key], True) }}
        {% endfor %}

    {% endmacro %}
    ```
8. Before running `dbt run-operation macro_hello_world_8`, try to infer what this macro does; then run it to check.
    ```sql
    {# RUN COMMAND: dbt run-operation macro_hello_world_8 #}
    {% macro macro_hello_world_8() %}

        {{ log('Congrats on running your 8th macro in dbt', True) }}

        {% set target_dict = target %}

        {% for key in target_dict %}
            {{ log('Key:Value ' ~ key ~ ':' ~ target_dict[key], True) }}
        {% endfor %}

    {% endmacro %}
    ```

### Basic

In this lab, you will simulate a Technology department with one model that contains data masking logic.

- Model 3 - User information with Personally Identifiable Information (PII) masked

1. Create file macro_pii_masking.sql, under demo_project/macros, containing two macros with text and numeric masking logic. The masking logic for both macros is to mask data for all users other than a user called unmasked_user. Text data will be masked with the value 'MASKED' and numeric data will be masked with the value 9999.
    ```sql
    {% macro macro_pii_masking_text(field) %}
        case
            when lower(current_user) in ('unmasked_user') then {{ field }}
            when {{ field }} is null then null
            else 'MASKED'
        end
    {% endmacro %}


    {% macro macro_pii_masking_numeric(field) %}
        case
            when lower(current_user) in ('unmasked_user') then {{ field }}
            when {{ field }} is null then null
            else 9999
        end
    {% endmacro %}
    ```
2. Create folder dept_tech, under demo_project/models, to represent Technology models layer.
    ```
    mkdir models/dept_tech
    ```
3. Create the file rpt_tech_all_users.sql under demo_project/models/dept_tech, containing the SQL Select statement for Model 3, which extracts user information with masking applied through the macros macro_pii_masking_text and macro_pii_masking_numeric.
    ```sql
    select
        {{ macro_pii_masking_numeric('userid') }} as userid,
        {{ macro_pii_masking_text('username') }} as username,
        {{ macro_pii_masking_text('firstname') }} as firstname,
        {{ macro_pii_masking_text('lastname') }} as lastname,
        city,
        state,
        {{ macro_pii_masking_text('email') }} as email,
        phone,
        likesports,
        liketheatre,
        likeconcerts,
        likejazz,
        likeclassical,
        likeopera,
        likerock,
        likevegas,
        likebroadway,
        likemusicals
    from {{ref('base_public_users')}}
    ```
4. After creating the Technology folder and file in the models folder, update dbt_project.yml to configure this new model layer. For dept_tech, set materialized to view and schema to dept_tech.
5. Run `dbt run --models dept_tech --target dev` to create schema (dbt_dept_tech) and view (rpt_tech_all_users) for Technology model layer.
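A macro is expanded into SQL text when dbt compiles the model; the masking decision itself happens later, in Amazon Redshift, each time someone queries the view, because current_user is evaluated at query time. The Python sketch below separates the two stages: `expand_text_mask` mirrors what {{ macro_pii_masking_text('email') }} compiles to (whitespace aside), and `mask_text` / `mask_numeric` mirror how the resulting CASE expressions evaluate for a given user. All three are illustrative helpers, not dbt APIs.

```python
# Illustrative model of the masking macros (hypothetical helpers, not dbt APIs).

UNMASKED_USERS = {"unmasked_user"}


def expand_text_mask(field):
    """Return the SQL that macro_pii_masking_text(field) expands to at compile time."""
    return (
        "case\n"
        f"    when lower(current_user) in ('unmasked_user') then {field}\n"
        f"    when {field} is null then null\n"
        "    else 'MASKED'\n"
        "end"
    )


def mask_text(value, current_user):
    """Evaluate the text-masking CASE expression for one value at query time."""
    if current_user.lower() in UNMASKED_USERS:
        return value
    if value is None:
        return None
    return "MASKED"


def mask_numeric(value, current_user):
    """Evaluate the numeric-masking CASE expression for one value at query time."""
    if current_user.lower() in UNMASKED_USERS:
        return value
    if value is None:
        return None
    return 9999


if __name__ == "__main__":
    print(expand_text_mask("email"))
    for user in ("unmasked_user", "masked_user"):
        print(user, mask_text("someone@example.com", user), mask_numeric(42, user))
```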

### Advanced

Macros can be used to send queries to Amazon Redshift. In this lab, you will use macros to manage users and grants. Similar to the Python script earlier, you can modify the macros created in this lab to better suit your needs beyond the scope of this workshop. For example, you might want to include a new macro to manage Role-based access control (RBAC) in Amazon Redshift.

1. File `demo_project/macros/macro_manage_access.sql` contains two macros, macro_manage_users and macro_manage_users_grants, to manage users and grants.
2. Update dbt_project.yml to create the following variables, which macro_manage_users and macro_manage_users_grants use as configuration:
   1. new_user_default_pwd (default password for newly created users)
   2. dbt_managed_users (list of users)
   3. dbt_managed_grants (list of schemas with users that should have access)
   ```yml
    vars:
        new_user_default_pwd: 'Password123'
        dbt_managed_users: [
            'finance_user',
            'masked_user',
            'unmasked_user'
        ]
        dbt_managed_grants: {
            'dbt_dept_finance': ['does_not_exist_user', 'finance_user'],
            'dbt_dept_tech': ['masked_user', 'unmasked_user'],
            'dbtraw': ['finance_user', 'masked_user', 'unmasked_user'],
            'dbt': ['finance_user', 'masked_user', 'unmasked_user']
        }
   ```
3. Run `dbt run-operation macro_manage_users --target dev` to create users. This command runs macro_manage_users which relies on variables new_user_default_pwd and dbt_managed_users in dbt_project.yml.
4. Running `dbt run-operation macro_manage_users --target dev` again reports that the users already exist.
5. Run `dbt run-operation macro_manage_users_grants --args "{'schema_list': ['dbtraw', 'dbt', 'dbt_dept_finance', 'dbt_dept_tech']}" --target dev` to grant access to the created users. This command runs macro_manage_users_grants, which relies on the variable dbt_managed_grants in dbt_project.yml. Note: If you get a permission denied error on a raw schema, add that schema to schema_list in this command as well.
6. Note: macro_manage_users_grants is customised to handle missing users: does_not_exist_user is listed in dbt_managed_grants but was never created, because it is not listed in dbt_managed_users.
7. Query dbt_dept_tech.rpt_tech_all_users as each of the created users to see data masking in action. To switch users, use the command "set session authorization [user]".
    ```sql
    set session authorization 'unmasked_user';
    select * from dbt_dept_tech.rpt_tech_all_users;
    ```
    ```sql
    set session authorization 'masked_user';
    select * from dbt_dept_tech.rpt_tech_all_users;
    ```
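The file macro_manage_access.sql is not reproduced here. To make the configuration above concrete, the following Python sketch plans the kind of SQL statements such macros could issue from the vars in dbt_project.yml. The statement text, the helper names, and the `existing_users` input (which a real macro would obtain by querying Amazon Redshift, for example from pg_user) are assumptions; the actual macros may differ.

```python
# Hypothetical planner for user and grant statements driven by dbt_project.yml vars.


def plan_user_creation(managed_users, existing_users, default_pwd):
    """Return CREATE USER statements for managed users that do not exist yet."""
    return [
        f"create user {user} password '{default_pwd}';"
        for user in managed_users
        if user not in existing_users
    ]


def plan_grants(schema_list, managed_grants, existing_users):
    """Return GRANT statements; skip schemas without config and users that don't exist."""
    statements = []
    for schema in schema_list:
        for user in managed_grants.get(schema, []):
            if user not in existing_users:
                continue  # e.g. does_not_exist_user: configured but never created
            statements.append(f"grant usage on schema {schema} to {user};")
            statements.append(f"grant select on all tables in schema {schema} to {user};")
    return statements


if __name__ == "__main__":
    managed_users = ["finance_user", "masked_user", "unmasked_user"]
    managed_grants = {
        "dbt_dept_finance": ["does_not_exist_user", "finance_user"],
        "dbt_dept_tech": ["masked_user", "unmasked_user"],
    }
    print(plan_user_creation(managed_users, existing_users=set(), default_pwd="Password123"))
    # Once the users exist, a second run plans no CREATE USER statements.
    print(plan_user_creation(managed_users, set(managed_users), "Password123"))
    for statement in plan_grants(["dbt_dept_finance", "dbt_dept_tech"], managed_grants, set(managed_users)):
        print(statement)
```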

## Create hooks

As models are added or updated, you constantly need to grant access to new views and regrant access to existing views. Regranting is required because dbt updates an existing view by dropping it and creating a new one, and grants on the old view are lost.

This creates the operational burden of remembering to run the macro macro_manage_users_grants. However, you can automate running it with dbt hooks.

1. You can simulate this scenario by re-running `dbt run --models dept_finance` and using finance_user to query `dbt_dept_finance.rpt_finance_qtr_total_sales_by_event`. You will experience a permission denied error.
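2. To configure a hook that runs macro_manage_users_grants at the end of every dbt run, add the following at the bottom of dbt_project.yml. Here, schemas is a variable that dbt provides to on-run-end hooks; it lists the schemas in which the run built resources.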
   ```yaml
    on-run-end:
        - "{{ macro_manage_users_grants(schemas) }}"
   ```
3. After updating dbt_project.yml, re-run `dbt run --models dept_finance`. The output shows that one hook ran.
4. Use finance_user to query `dbt_dept_finance.rpt_finance_qtr_total_sales_by_event`; there will be no permission denied error.
    ```sql
    set session authorization 'finance_user';
    select * from dbt_dept_finance.rpt_finance_qtr_total_sales_by_event;
    ```
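To see why the hook fixes the problem, here is a toy Python model (not how dbt or Amazon Redshift are implemented): each rebuild of a view discards the grants attached to the old view, and an on-run-end style callback reapplies them after every run. The class and function names are hypothetical.

```python
# Toy model: rebuilding a view drops its grants; an on-run-end hook restores them.


class ToyWarehouse:
    def __init__(self):
        self.grants = {}  # view name -> users allowed to select from it

    def replace_view(self, view):
        self.grants[view] = set()  # the new view starts with no grants

    def grant_select(self, view, user):
        self.grants[view].add(user)

    def can_select(self, view, user):
        return user in self.grants.get(view, set())


def dbt_run(warehouse, views, on_run_end=()):
    """Rebuild every selected view, then call each on-run-end hook once."""
    for view in views:
        warehouse.replace_view(view)
    for hook in on_run_end:
        hook(warehouse, views)


def regrant_hook(users):
    """Return a hook that grants select on every rebuilt view to the given users."""
    def hook(warehouse, views):
        for view in views:
            for user in users:
                warehouse.grant_select(view, user)
    return hook


if __name__ == "__main__":
    view = "dbt_dept_finance.rpt_finance_qtr_total_sales_by_event"
    wh = ToyWarehouse()
    dbt_run(wh, [view])
    wh.grant_select(view, "finance_user")       # manual grant, like running the macro
    dbt_run(wh, [view])                         # rebuild without a hook
    print(wh.can_select(view, "finance_user"))  # False: the grant was lost
    dbt_run(wh, [view], on_run_end=[regrant_hook(["finance_user"])])
    print(wh.can_select(view, "finance_user"))  # True: the hook regranted access
```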

## Create seeds

Seeds are a convenient way in dbt to manage manual files (CSV files). A common use case for manual files is data mappings: a mapping kept in a file is easier to maintain and reuse than CASE expressions in SQL, which require code changes whenever the mapping changes.
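The difference between the two approaches can be sketched in Python. The column names catname and full_desc and the sample rows are hypothetical; the real mnl_category_desc.csv may use different columns. With a seed, changing a description means editing the CSV and re-running dbt seed; with a CASE expression, it means changing code.

```python
# Hypothetical comparison: hard-coded mapping (like a SQL CASE) vs a CSV mapping (like a seed).
import csv
import io

# Hypothetical seed contents; columns and values are illustrative only.
SEED_CSV = """catname,full_desc
MLB,Major League Baseball
NBA,National Basketball Association
"""


def full_desc_case(catname):
    """Equivalent of a CASE expression: every change to the mapping is a code change."""
    if catname == "MLB":
        return "Major League Baseball"
    if catname == "NBA":
        return "National Basketball Association"
    return None


def load_mapping(csv_text, key_column, value_column):
    """Read a two-column mapping from CSV text, as a seed table would hold it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row[value_column] for row in reader}


if __name__ == "__main__":
    mapping = load_mapping(SEED_CSV, "catname", "full_desc")
    for name in ("MLB", "NBA", "NHL"):
        print(name, full_desc_case(name), mapping.get(name))
```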

In this lab, you will simulate a Marketing department that maintains a custom data mapping CSV file used by a model.

- Model 4 - Category information with custom data mappings


1. Folder dept_marketing, under demo_project/seeds, represents Marketing seeds layer.
2. The file mnl_category_desc.csv, under demo_project/seeds/dept_marketing, contains the custom data mappings.
3. Update dbt_project.yml to configure this new seed layer by adding:
    ```yaml
    seeds:
        demo_project:
            dept_marketing:
                schema: dept_marketing
    ```
4. Run `dbt seed --models dept_marketing` to create the schema (dbt_dept_marketing) and table (mnl_category_desc) for the Marketing seed layer. You can ignore the message from macro_manage_users_grants that grants are unconfigured; dbt_project.yml has no grants configuration for dbt_dept_marketing.
5. Query dbt_dept_marketing.mnl_category_desc to verify that the table was created in your target Amazon Redshift cluster. You might have to switch back to admin first: your session may still be running as finance_user, which does not have permission to query dbt_dept_marketing.mnl_category_desc.
    ```sql
    set session authorization 'admin';
    select * from dbt_dept_marketing.mnl_category_desc;
    ```
6. Folder dept_marketing, under demo_project/models, represents Marketing models layer.
7. The file rpt_marketing_category_full_desc.sql, under demo_project/models/dept_marketing, contains the SQL Select statement for Model 4. Notice that the statement references the custom data mappings in mnl_category_desc.
8. Update dbt_project.yml to configure this new model layer. For dept_marketing, configure materialized as view.
    ```yaml
    models:
        ...
        dept_marketing:
            materialized: view
            schema: dept_marketing
    ```
9. Run `dbt run --models dept_marketing` to create view (rpt_marketing_category_full_desc) for Marketing model layer. Schema (dbt_dept_marketing) was already created in step 4.
10. You can query `dbt_dept_marketing.rpt_marketing_category_full_desc` to verify that view is created in your targeted Amazon Redshift cluster.
    ```sql
    select * from dbt_dept_marketing.rpt_marketing_category_full_desc;
    ```

## Create documentation

Letting objects reference other objects improves code reusability, but an erroneous change to an object that many other objects reference can have a widespread negative impact. dbt provides an interface to visualize all models and their dependencies on other models, which is useful for impact analysis.

1. Run `dbt docs generate` to generate documentation of demo_project.
2. Run `dbt docs serve` to launch a locally hosted website for browsing the generated documentation, including a visualization of models and their dependencies on other models. Note: Press Ctrl+C to stop the locally hosted website.
3. If you are running dbt on AWS Cloud9, on the top menu bar, select Preview >> Preview Running Application to access the locally hosted website.
4. On the locally hosted website, you can select models to view information, including their direct dependencies.
5. To view indirect dependencies, you can use the lineage graph.
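The lineage graph answers the impact-analysis question "what is affected if this model changes?". The Python sketch below answers the same question with a breadth-first search over a child map built by hand from the ref() calls in this workshop. To work from real data instead, dbt docs generate also writes target/manifest.json, whose child_map key maps each node's unique ID (for example model.demo_project.base_public_sales) to its children; `load_child_map` assumes that layout.

```python
# Minimal impact-analysis sketch: find every node downstream of a changed model.
import json
from collections import deque

# Hand-built from the ref() calls in this workshop (parent -> children).
CHILD_MAP = {
    "base_public_sales": ["rpt_finance_qtr_total_sales_by_event"],
    "base_public_event": ["rpt_finance_qtr_total_sales_by_event"],
    "rpt_finance_qtr_total_sales_by_event": ["rpt_finance_qtr_top_events_by_sales"],
    "base_public_users": ["rpt_tech_all_users"],
    "mnl_category_desc": ["rpt_marketing_category_full_desc"],
}


def load_child_map(manifest_path="target/manifest.json"):
    """Read the child_map from a manifest generated by dbt (assumed layout)."""
    with open(manifest_path) as f:
        return json.load(f)["child_map"]


def downstream(child_map, node):
    """Return all direct and indirect children of node (breadth-first)."""
    seen, queue = set(), deque([node])
    while queue:
        for child in child_map.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(downstream(CHILD_MAP, "base_public_sales")))
```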

## Materialized views

Materialized view stores precomputed results to reduce processing time for complex queries involving multi-table joins and aggregations.
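Here is a toy Python model of the trade-off: a regular view recomputes its query on every read, while a materialized view returns stored results until it is refreshed, so reads are cheap but can be stale. In Amazon Redshift the refresh is done with REFRESH MATERIALIZED VIEW (or automatically, if auto refresh is enabled); the classes below are illustrative only.

```python
# Toy model contrasting a view (recompute on read) with a materialized view (stored result).


class ToyView:
    def __init__(self, query, source):
        self.query, self.source = query, source

    def read(self):
        return self.query(self.source)  # recomputed on every read


class ToyMaterializedView:
    def __init__(self, query, source):
        self.query, self.source = query, source
        self.refresh()

    def refresh(self):
        self._result = self.query(self.source)  # like REFRESH MATERIALIZED VIEW

    def read(self):
        return self._result  # precomputed; may be stale until the next refresh


if __name__ == "__main__":
    sales = [100, 250]
    view, mv = ToyView(sum, sales), ToyMaterializedView(sum, sales)
    sales.append(50)               # a new sale arrives
    print(view.read(), mv.read())  # 400 350
    mv.refresh()
    print(mv.read())               # 400
```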

In this lab, you will simulate an Experimental department that is exploring an experimental dbt feature.

- Model 5 - Same as Model 1 but to be materialized as a materialized view instead of a view

1. Add the dbt-labs-experimental-features package information to packages.yml under demo_project.
    ```yaml
    packages:
      - package: dbt-labs/codegen
        version: 0.6.0

      - git: https://github.com/dbt-labs/dbt-labs-experimental-features
        subdirectory: materialized-views
    ```
2. Before the package can be used, it must be installed with dbt deps. First, run `dbt clean` to clear any previously installed packages.
3. Run `dbt deps`. The warning states that the latest version of dbt-labs-experimental-features will always be pulled, because the git entry does not pin a revision, and this can introduce breaking changes. Because this is an experimental feature that you are only testing, the warning is acceptable here; however, we do not recommend using it in production.
4. In addition to installing dbt-labs-experimental-features, you need a macro that overrides the built-in versions of some adapter macros. Review the macro in `macro_overwrite_for_mv.sql` to understand how it works.
5. Folder dept_experimental, under demo_project/models, represents Experimental models layer.
6. The file rpt_experimental_qtr_total_sales_by_event.sql, under demo_project/models/dept_experimental, contains the SQL Select statement for Model 5, which is similar to Model 1.
7. Update dbt_project.yml to configure this new model layer. Notice the use of materialized_view value for materialized key.
    ```yaml
    models:
        ...
        dept_experimental:
            materialized: materialized_view
            schema: dept_experimental
    ```
8. Run `dbt run --models dept_experimental --target dev` to create schema (dbt_dept_experimental) and materialized view (rpt_experimental_qtr_total_sales_by_event) for Experimental model layer.
9. You can query `dbt_dept_experimental.rpt_experimental_qtr_total_sales_by_event` to verify that materialized view is created in your targeted Amazon Redshift cluster.
    ```sql
    select * from dbt_dept_experimental.rpt_experimental_qtr_total_sales_by_event;
    ```

Thank you for completing this workshop on managing data transformations with dbt in Amazon Redshift.

In this workshop, you covered:

1. Installation of dbt CLI
2. Use of the dbt-labs/codegen package and a Python script to automate creation of a base layer
3. Use of dbt models to maintain data transformations with referencing capabilities
4. Use of dbt macros to maintain common logic as functions and to administer user creation and grants
5. Use of dbt hooks to automate continuous execution of grants
6. Use of dbt seeds to manage manual files
7. Use of dbt docs to generate documentation with visualization
8. Use of an experimental dbt package to explore materialized views

There are other features in dbt not covered in this workshop that you can explore by visiting [What is dbt?](https://docs.getdbt.com/docs/introduction).