## What is Analytics Engineering?
- The **analytics engineer** role sits between the data engineer, who prepares and maintains the data infrastructure, and the data analyst, who uses data to answer business questions and solve problems.
- The analytics engineer brings good software engineering practices to the work of data analysts and data engineers.

Recall that the traditional data process looks something like this:
```mermaid
flowchart LR
    A[Data loading] --> B[Data storing]
    B[Data storing] --> C[Data modeling]
    C[Data modeling] --> D[Data presentation]
```

Analytics engineers are often focused on the data modeling and data presentation steps. But what does it mean to "model" data? Today there are several popular approaches to modeling data and building data warehouses.

1. Kimball Dimensional Modeling (Star Schema)
- Ralph Kimball is a prominent figure in the field of data warehousing and business intelligence. He pioneered dimensional modeling and wrote the book "The Data Warehouse Toolkit", an essential reference for creating and maintaining data warehouses.
- Kimball's approach focuses on building simple, practical, and accessible data warehouses, optimized for query performance and ease of use by end users. It takes a bottom-up approach: smaller, subject-specific data marts are built first and then integrated into a larger enterprise data warehouse.
- The data model relies on a **star schema** design, where data is organized into a central fact table (measurable, quantitative data) joined to related dimension tables (descriptive attributes).
- Kimball allows denormalization where needed; this introduces some data redundancy but simplifies queries and improves performance.
- It emphasizes iterative development and delivering business value in smaller increments.

2. Inmon Methodology
- Bill Inmon is another prominent figure in the field of data warehousing and business intelligence, often called the "father of data warehousing". He coined the term "data warehouse" and wrote several influential books, including "Building the Data Warehouse".
- Inmon's methodology prioritizes data integration and consistency across an organization. It takes a top-down approach: the enterprise data warehouse is built first as a centralized repository for all organizational data, and data marts are then created from it as needed.
- Inmon emphasizes normalizing data to third normal form (3NF) to reduce data redundancy and ensure data integrity.
- Because of its centralized design, this approach makes data warehouses and data models more scalable and adaptable to changing business needs.
- However, this approach is usually more complex and time-consuming to implement than Kimball's.

3. Data Vault
- Data Vault was created by Dan Linstedt in the late 1990s as a reaction to real-world challenges he encountered while working on large data warehouse projects. His book "Building a Scalable Data Warehouse with Data Vault 2.0" describes the data vault approach.
- A data vault is set up in a hub-and-spoke architecture with three core components: hubs (containing business keys), links (relationships between hubs, stored as pairs of their keys), and satellites (containing descriptive attributes).
- This approach excels in capturing historical data changes and providing a detailed audit trail for data.
- It is highly adaptable to changing business needs and highly scalable due to its modular design.
- However, data vault models can be complex to understand and implement, especially for business users.
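To make the hub/link/satellite split concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in warehouse. All table and column names (`hub_customer`, `link_customer_order`, `sat_customer`, `load_ts`, ...) are hypothetical, and hashing business keys with MD5 is an assumption borrowed from common Data Vault 2.0 practice rather than a requirement. Note how a change in a customer's city is recorded as a new satellite row instead of an update, which is what gives a data vault its audit trail.

```python
import hashlib
import sqlite3


def hash_key(*parts: str) -> str:
    """Deterministic surrogate key built from business key(s); MD5 hashing is an assumed convention."""
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hubs: one row per unique business key
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT NOT NULL, load_ts TEXT NOT NULL);
    CREATE TABLE hub_order (order_hk TEXT PRIMARY KEY, order_id TEXT NOT NULL, load_ts TEXT NOT NULL);
    -- Link: a relationship between two hubs
    CREATE TABLE link_customer_order (
        link_hk TEXT PRIMARY KEY, customer_hk TEXT NOT NULL, order_hk TEXT NOT NULL, load_ts TEXT NOT NULL);
    -- Satellite: descriptive attributes, one row per change (history is kept)
    CREATE TABLE sat_customer (
        customer_hk TEXT NOT NULL, load_ts TEXT NOT NULL, name TEXT, city TEXT,
        PRIMARY KEY (customer_hk, load_ts));
    """
)

c_hk = hash_key("C1")
o_hk = hash_key("O100")
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?)", (c_hk, "C1", "2024-01-01"))
conn.execute("INSERT INTO hub_order VALUES (?, ?, ?)", (o_hk, "O100", "2024-01-02"))
conn.execute(
    "INSERT INTO link_customer_order VALUES (?, ?, ?, ?)",
    (hash_key("C1", "O100"), c_hk, o_hk, "2024-01-02"),
)
# The customer moves: a new satellite row is appended rather than overwriting the old one
conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", (c_hk, "2024-01-01", "Ada", "London"))
conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", (c_hk, "2024-03-01", "Ada", "Paris"))

# Full history stays available for auditing...
history = conn.execute(
    "SELECT load_ts, city FROM sat_customer WHERE customer_hk = ? ORDER BY load_ts", (c_hk,)
).fetchall()

# ...and the current state is the latest satellite row per hub key
current = conn.execute(
    """
    SELECT h.customer_id, s.city, l.order_hk
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
    JOIN link_customer_order l ON l.customer_hk = h.customer_hk
    WHERE s.load_ts = (SELECT MAX(load_ts) FROM sat_customer WHERE customer_hk = h.customer_hk)
    """
).fetchall()

print(history)  # [('2024-01-01', 'London'), ('2024-03-01', 'Paris')]
print(current)
```

Because satellites are append-only, even a simple "current state" question needs a latest-row filter; this extra query complexity is part of why data vaults are harder for business users to consume directly.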

## Elements of dimensional modeling
In this course we will use the dimensional modeling approach. Dimensional modeling includes fact tables and dimension tables.

* Fact tables
    - Record measurements, metrics, or facts that correspond to a business process, e.g., "verbs" like sales, orders, and transactions.
* Dimension tables
    - Correspond to business entities that provide context to a business process, e.g., "nouns" like customers, products, and regions.
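A minimal star schema sketch, using Python's built-in `sqlite3` in-memory database as a stand-in for a warehouse. The tables (`fact_sales`, `dim_customer`, `dim_product`) and sample rows are made up for illustration: the fact table holds foreign keys and numeric measures, and each dimension table holds descriptive attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context ("nouns")
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: one row per sale ("verb"), foreign keys to dimensions plus numeric measures
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Ada', 'EU'), (2, 'Grace', 'US');
    INSERT INTO dim_product VALUES (10, 'Laptop', 'Hardware'), (20, 'IDE license', 'Software');
    INSERT INTO fact_sales VALUES
        (100, 1, 10, 1, 1200.0),
        (101, 2, 20, 3, 300.0),
        (102, 1, 20, 1, 100.0);
    """
)

# Typical star-schema query: aggregate the fact measures, sliced by dimension attributes
rows = conn.execute(
    """
    SELECT c.region, p.category, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
    """
).fetchall()

for row in rows:
    print(row)
```

The query shape (join facts to dimensions, group by dimension attributes, aggregate measures) is the pattern a star schema is designed to make simple and fast.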

An architecture built around a dimensional model usually has three areas: a staging area, a processing area, and a presentation area.

```mermaid
flowchart TB
    A["Staging Area (raw data)"] --> B["Processing Area (data models)"]
    B["Processing Area (data models)"] --> C["Presentation Area (reports, dashboards)"]
```
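The three areas can be sketched with a few SQL statements, again using an in-memory `sqlite3` database as a stand-in warehouse. The table and view names (`stg_orders_raw`, `orders`, `monthly_revenue`) and the cleaning rules are hypothetical; the point is that each layer is built only from the layer before it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging area: raw data landed as-is (all text, inconsistent casing and whitespace)
conn.executescript(
    """
    CREATE TABLE stg_orders_raw (order_id TEXT, customer TEXT, amount TEXT, order_date TEXT);
    INSERT INTO stg_orders_raw VALUES
        ('1', ' ada ', '19.99', '2024-01-05'),
        ('2', 'GRACE', '5.01', '2024-01-20'),
        ('3', 'Ada', '10.00', '2024-02-02');
    """
)

# Processing area: a cleaned, typed model built only from staging
conn.execute(
    """
    CREATE VIEW orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(customer)) AS customer,
           CAST(amount AS REAL) AS amount,
           DATE(order_date) AS order_date
    FROM stg_orders_raw
    """
)

# Presentation area: a business-facing aggregate that a dashboard could read
conn.execute(
    """
    CREATE VIEW monthly_revenue AS
    SELECT STRFTIME('%Y-%m', order_date) AS month, ROUND(SUM(amount), 2) AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
    """
)

report = conn.execute("SELECT * FROM monthly_revenue").fetchall()
print(report)
```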

## Data modeling with dbt
**dbt** (**d**ata **b**uild **t**ool) is a transformation workflow tool that modularizes SQL code into discrete units called **models**. These models represent individual transformations or business logic applied to the data.

Models are `.sql` files that combine SQL with Jinja templating. dbt compiles each model into plain SQL and then runs it against the data warehouse.
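To illustrate what "compiling" means here, the sketch below replaces `{{ ref('...') }}` calls with schema-qualified table names using a regular expression. This is a toy written for this example, not dbt's implementation: dbt renders models with the Jinja engine, resolves `ref()` against its project graph, and the exact relation name it emits depends on the warehouse adapter. `compile_model` and the `analytics` schema are hypothetical.

```python
import re

# Matches {{ ref('model_name') }} or {{ ref("model_name") }}, with optional whitespace
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def compile_model(sql_template: str, target_schema: str) -> str:
    """Toy compiler: swap every ref() call for a schema-qualified table name."""
    return REF_PATTERN.sub(lambda m: f"{target_schema}.{m.group(1)}", sql_template)


model_sql = """
select customer_id, sum(amount) as lifetime_value
from {{ ref('stg_orders') }}
group by customer_id
"""

compiled = compile_model(model_sql, target_schema="analytics")
print(compiled)
```

Writing `ref('stg_orders')` instead of a hard-coded table name is what lets the same model compile against different schemas and lets dbt know which models depend on which.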

dbt provides several other tools for:
- Dependency management: users can define dependencies between models (typically with the `ref()` function) so that dbt runs them in the correct order.
- Version control: dbt integrates with version control systems like git, allowing data transformations and business logic to be tracked across time.
- Testing: dbt includes a testing framework for validating data models, for example checking that a column is unique or contains no nulls.
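Dependency management comes down to ordering a directed acyclic graph (DAG): if model A refs model B, B must be built first. The sketch below, assuming a hypothetical project of five models, extracts `ref()` names with a regular expression and orders them with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It illustrates the idea only; it is not how dbt parses projects internally.

```python
import re
from graphlib import TopologicalSorter

REF_PATTERN = re.compile(r"""ref\(\s*['"](\w+)['"]\s*\)""")

# Hypothetical project: model name -> SQL containing Jinja ref() calls
models = {
    "customer_revenue": "select * from {{ ref('orders') }} join {{ ref('customers') }} using (customer_id)",
    "orders": "select * from {{ ref('stg_orders') }}",
    "customers": "select * from {{ ref('stg_customers') }}",
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

# Dependency graph: each model maps to the set of models it refs (its predecessors)
graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

# A topological order guarantees every model runs after all of its upstream models
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)
```

`static_order()` raises `graphlib.CycleError` if two models ref each other, which mirrors why a dbt project's model graph must be acyclic.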

dbt usually sits on top of a data warehouse and transforms data after it has been loaded: raw data lands in the warehouse, and dbt turns it into transformed tables and views that BI tools can query.

```mermaid
flowchart LR
    c[Data loaders] --> a
    subgraph s1 [dbt]
        subgraph ss1 [Data warehouse]
            a[Raw data] --> b[Transformed data]
        end
    end
    b --> d[BI Tools]
```

There are two ways to use dbt:
1. dbt Core
    - dbt Core is an open-source command line tool used to set up, build, and run dbt projects (which are made up of `.sql` and `.yml` files).
    - See the [installation instructions](https://docs.getdbt.com/docs/core/installation-overview).

2. dbt Cloud
    - [dbt Cloud](https://cloud.getdbt.com/) is a web-based application, including an IDE, used to develop, test, and run dbt projects.