# 06.1 - Introduction to transformation processes

In the Silver layer of the lakehouse, data from the Bronze layer is matched, merged, conformed and cleansed ("just enough") so that the Silver layer can provide an "Enterprise view" of all key business entities, concepts and transactions, e.g. master customers, stores, non-duplicated transactions and cross-reference tables.

The Silver layer brings data from different sources into an Enterprise view and enables self-service analytics for ad-hoc reporting, advanced analytics and ML. It serves as a source for Departmental Analysts, Data Engineers and Data Scientists, who build on it to create the enterprise and departmental data projects in the Gold layer that answer business problems.

The lakehouse data engineering paradigm typically follows the ELT methodology rather than ETL, which means only minimal or "just-enough" transformations and data cleansing rules are applied while loading the Silver layer. Speed and agility in ingesting and delivering data in the data lake are prioritized, and most project-specific complex transformations and business rules are applied while loading data from the Silver layer to the Gold layer. From a data modeling perspective, the Silver layer tends to use data models closer to Third Normal Form (3NF). Data Vault-like, write-performant data models can also be used in this layer.

## 1. dbt and transformation engines


At the most basic level, dbt has two components: a compiler and a runner. Users write dbt code in their text editor of choice and then invoke dbt from the command line. dbt compiles all code into raw SQL and executes that SQL against the configured data warehouse. This lets us swap between transformation engines (e.g. Spark, Trino, PostgreSQL) without migrating the pipeline code to the new engine, as long as the models stick to SQL that the target engine supports.
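To make the compile step more concrete, here is a toy sketch of the idea in Python. It is not how dbt is implemented: dbt renders models with Jinja and adapter-specific logic, while this sketch only substitutes `{{ ref('...') }}` placeholders with a regular expression. The model name `stg_customer`, the relation names and the `compile_model` helper are all hypothetical.

```python
import re

# Hypothetical mapping from a model name to the physical relation
# each engine would use. These names are assumptions for illustration.
RELATIONS = {
    "trino": {"stg_customer": "lakehouse.silver.stg_customer"},
    "postgres": {"stg_customer": "silver.stg_customer"},
}

# Matches placeholders of the form {{ ref('model_name') }}.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def compile_model(model_sql: str, target: str) -> str:
    """Replace every {{ ref('name') }} with the relation name for `target`."""
    relations = RELATIONS[target]
    return REF_PATTERN.sub(lambda match: relations[match.group(1)], model_sql)


# The model is written once, independently of the engine.
MODEL_SQL = "select customer_id from {{ ref('stg_customer') }}"

if __name__ == "__main__":
    for target in RELATIONS:
        print(target, "->", compile_model(MODEL_SQL, target))
```

The same model text yields different raw SQL per target, which is the property that makes the engine swappable.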

Each dbt project is organized into folders such as:


| Folder         | Purpose                                                                                   |
| -------------- | ----------------------------------------------------------------------------------------- |
| **models/**    | The heart of dbt: all your `.sql` models. Each becomes a table or view in your warehouse. |
| **macros/**    | Reusable SQL logic using Jinja — for surrogate keys, pivoting, dynamic filters, etc.      |
| **seeds/**     | CSVs that dbt loads into the database as tables. Useful for reference data.               |
| **snapshots/** | Handle slowly changing dimensions (track history of changes).                             |
| **tests/**     | Schema and data tests — either YAML-based or custom SQL.                                  |
| **analyses/**  | Saved queries or analysis code not materialized as models.                                |
| **target/**    | Auto-generated during compilation/runs; contains compiled SQL and run artifacts.          |


### 1.1 Using dbt to transform the data from one layer to another

In the `project/dbt/silver` directory you can find the dbt project that contains all the logic to transform the bronze layer data into the silver layer data.


In the `models` folder you can find two important files:

- `source.yml`: this contains the description of all the bronze layer tables, including basic data integrity checks
- `schema.yml`: this contains the description of all the silver layer tables, including basic data integrity checks

Then, for each table defined in the `schema.yml` file, there should be a `<table_name>.sql` file with the dbt code that produces that table.

For this task you should:

1. Define all the bronze tables in the `source.yml` file with all integrity constraints that may apply
2. Define all the silver tables in the `schema.yml` file with all integrity constraints that may apply
3. For each table in the silver layer create a `.sql` file to create the table
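As a starting point, the three steps above could look roughly like the sketch below, which writes a minimal `source.yml`, `schema.yml` and one model file from a notebook cell. Everything specific in it is an assumption for illustration: the source name `bronze`, the table `customer`, its columns and the `write_skeleton` helper are hypothetical, and the real project will need one entry per actual table. The checks use the `tests:` key; newer dbt versions also accept `data_tests:`.

```python
import tempfile
from pathlib import Path

# Step 1: describe a (hypothetical) bronze table and its integrity checks.
SOURCE_YML = """\
version: 2
sources:
  - name: bronze            # hypothetical source name
    tables:
      - name: customer      # hypothetical bronze table
        columns:
          - name: customer_id
            tests:
              - not_null
"""

# Step 2: describe the matching silver table and its integrity checks.
SCHEMA_YML = """\
version: 2
models:
  - name: customer          # must match the file name customer.sql
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
"""

# Step 3: the dbt model that builds the silver table from the bronze source.
CUSTOMER_SQL = """\
select
    customer_id,
    trim(first_name) as first_name
from {{ source('bronze', 'customer') }}
"""


def write_skeleton(models_dir: Path) -> list[str]:
    """Write the three example files into `models_dir` and return their names."""
    files = {
        "source.yml": SOURCE_YML,
        "schema.yml": SCHEMA_YML,
        "customer.sql": CUSTOMER_SQL,
    }
    models_dir.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (models_dir / name).write_text(content)
    return sorted(files)


if __name__ == "__main__":
    # Written to a temporary directory so the real project is not touched.
    with tempfile.TemporaryDirectory() as tmp:
        print(write_skeleton(Path(tmp) / "models"))
```

The model reads from the bronze table through `source('bronze', 'customer')`, so the source name and table name in the SQL must match the entries declared in `source.yml`.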

Once you have all the tables defined, you can compile the dbt project by running the `dbt compile` command from the project's location (`/var/lib/adventureworks/dbt/silver`):

```python
!cd /var/lib/adventureworks/dbt/silver && dbt clean # always run this
```

```python
!cd /var/lib/adventureworks/dbt/silver && dbt deps # you can run this only one time to get the dependencies
```

```python
!cd /var/lib/adventureworks/dbt/silver && dbt compile
```
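If you prefer the sequence to stop at the first failing step, the three commands above can also be driven from Python. This is a minimal sketch using only the standard library; the `run_dbt_steps` helper and its `runner` parameter are hypothetical names introduced here, and the sketch assumes the `dbt` executable is on the `PATH`.

```python
import subprocess
from pathlib import Path

# Project location taken from the commands above.
PROJECT_DIR = Path("/var/lib/adventureworks/dbt/silver")
DEFAULT_STEPS = ("clean", "deps", "compile")


def run_dbt_steps(steps=DEFAULT_STEPS, project_dir=PROJECT_DIR, runner=subprocess.run):
    """Run each dbt subcommand in order from `project_dir`.

    `runner` defaults to subprocess.run; with check=True it raises
    CalledProcessError on a non-zero exit code, so later steps are skipped.
    Returns the list of commands that were executed.
    """
    executed = []
    for step in steps:
        command = ["dbt", step]
        runner(command, cwd=project_dir, check=True)
        executed.append(command)
    return executed


# In the notebook, call it explicitly:
# run_dbt_steps()
```

Passing `steps=("compile",)` skips `clean` and `deps` once the dependencies are already installed.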

## 2. Running dbt with Dagster

To run dbt with Dagster, we create an asset in the `assets/silver` folder that runs the dbt CLI commands to build the tables. The resources it uses come from the `dagster-dbt` package, which provides dbt support for Dagster.

To materialize the dbt assets in the project, you can use the `silver_assets_job`.