### dbt (Data Build Tool)

### Index <a id="index"></a>

### [ERROR, WARNING, AND QUARANTINE SCOPE](#mark_00)

### [How can we surface test outcomes as data?](#mark_01)

### ERROR, WARNING, AND QUARANTINE SCOPE: <a id="mark_00"></a> [Back to Index](#index)

**Error (broken pipeline):**

    - A serious failure during data ingestion or transformation that prevents the job from finishing.
    - Types of errors: syntax, logic, connection, performance/time-out, corrupted data or malformed files, set-up.
    - Action: fix the error.

**Warning:**

    - A potential issue during data ingestion or transformation that does not prevent the job from finishing.
    - Types of warnings: data quality issues.
    - Action: fix the underlying bug.

*   **Quarantine** (open question: will the quarantine concept be applied during the jobs (in models) or after them (in tests)?):

    - Corrupted data / malformed files.

    - Identify the corrupted data during the test (test flag: corrupted_data).

        - Logic: if corrupted data is identified --> retrieve [source_name, column_names, rows, corrupted data] --> then do one of the following:
            - replicate the same table, where non-corrupted values = null and corrupted values = the data itself
            - or create a new CSV file --> [id, column_names, its corrupted data]
            - or create a table with a specific schema "id, column_name_01, column_name_02,...."

        - Send the corrupted data to the "quarantine_table".

    - positive cases, which should match the pattern ([test online](https://regexr.com/)):

        mi.email_01@host01.com

        mi.email_01%@host01.more.com

        mi.email_01%+@host01.more-andmore.com

        mi.email_01%+-@host.more-more_more.com

    - negative cases, which should not match the pattern ([test online](https://regexr.com/)):

        mi .email_01%+-@host.com

        mi.,email_01%+-@host.com

        mi.;email_01%+-@host.com

        miémail_01@host.com

        miëmail_01@host.com

        mìemail_01@host.com

    - in models [documentation](https://docs.getdbt.com/docs/build/unit-tests#unit-testing-a-model): 
```sql
-- logic to add to the model: flag whether the email matches the pattern
-- (a null email is treated as invalid)
coalesce(
    regexp_like(customers.email, '^[A-Za-z0-9._%+-]+@[a-z0-9._-]+\.com$'),
    false
) as is_valid_email
```
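
Before wiring the pattern into a model, it can help to check it against the positive and negative cases listed above. The following is a minimal sketch in Python, assuming Python's `re` engine behaves like the warehouse's `regexp_like` for this pattern (that may not hold for every warehouse). The helper name `is_valid_email` is hypothetical.

```python
import re

# Same pattern as the SQL snippet. Python's `re` is only a stand-in for the
# warehouse regex engine, which may differ in edge cases.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[a-z0-9._-]+\.com$")

# Cases copied from the lists above.
POSITIVE_CASES = [
    "mi.email_01@host01.com",
    "mi.email_01%@host01.more.com",
    "mi.email_01%+@host01.more-andmore.com",
    "mi.email_01%+-@host.more-more_more.com",
]

NEGATIVE_CASES = [
    "mi .email_01%+-@host.com",
    "mi.,email_01%+-@host.com",
    "mi.;email_01%+-@host.com",
    "miémail_01@host.com",
    "miëmail_01@host.com",
    "mìemail_01@host.com",
]


def is_valid_email(value):
    """Return True only when value is a string that fully matches EMAIL_PATTERN."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for email in POSITIVE_CASES + NEGATIVE_CASES:
        label = "valid" if is_valid_email(email) else "invalid"
        print(f"{email!r}: {label}")
```

Note that the pattern only accepts `.com` addresses with a lowercase host, so it is stricter than a general-purpose email check.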
    - in tests:
```sql
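-- dbt_tests.sql

-- Rows whose email does NOT match the pattern are the ones to quarantine.
-- A dbt singular test is a SELECT that returns the failing rows.
-- Assumes the warehouse supports regexp_like, as in the model snippet above.
WITH invalid_emails AS (
  SELECT *
  FROM customers
  WHERE email IS NULL
     OR NOT regexp_like(email, '^[A-Za-z0-9._%+-]+@[a-z0-9._-]+\.com$')
)
SELECT * FROM invalid_emails;

-- The inserts below are alternatives and run outside the test itself;
-- invalid_emails stands for the query above (e.g. saved as a view or table).

-- if we use a different table schema
INSERT INTO quarantine_table (column1, column2, column3, ...)
SELECT column1, column2, column3, ...
FROM invalid_emails;

-- if we have the same table schema
INSERT INTO quarantine_table
SELECT *
FROM invalid_emails;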
```
        
        
        
    - Files that do not meet the expected format.
    - Action: re-ingest / re-transform.
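
The "[id, column_names, its corrupted data]" option described above can be sketched outside dbt as a small routing step. This is an illustration in Python, not part of dbt itself: the function `split_quarantine`, the output field names, and the sample rows are all hypothetical, and the email pattern is reused only as one example of a corruption check.

```python
import re

# Example corruption check: the email pattern used in this section.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[a-z0-9._-]+\.com$")


def split_quarantine(rows, source_name, column_name, id_column="id"):
    """Split rows into (clean_rows, quarantine_rows) based on one email column.

    Each quarantine row records where the bad value came from:
    source_name, id, column_name and the corrupted value itself.
    """
    clean, quarantined = [], []
    for row in rows:
        value = row.get(column_name)
        if isinstance(value, str) and EMAIL_PATTERN.fullmatch(value):
            clean.append(row)
        else:
            quarantined.append(
                {
                    "source_name": source_name,
                    "id": row.get(id_column),
                    "column_name": column_name,
                    "corrupted_value": value,
                }
            )
    return clean, quarantined


# Made-up sample rows for illustration only.
SAMPLE_CUSTOMERS = [
    {"id": 1, "email": "mi.email_01@host01.com"},
    {"id": 2, "email": "mi .email_01%+-@host.com"},
    {"id": 3, "email": None},
]

if __name__ == "__main__":
    clean_rows, quarantine_rows = split_quarantine(
        SAMPLE_CUSTOMERS, source_name="customers", column_name="email"
    )
    print("clean:", clean_rows)
    print("quarantine:", quarantine_rows)
```

Keeping the source name and row id next to the corrupted value makes it possible to re-ingest or re-transform only the affected rows later.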

### How can we surface test outcomes as data? <a id="mark_01"></a> [Back to Index](#index)

**Scope:**

    - After the test run finishes.
    
**Scenarios:**

    - Outcomes transformation (into structured data) --> **"dbt-artifacts"**, **"dbt-artifacts-cli"**, and/or **"custom scripts"**.
    - Outcomes storage --> **"dbt-artifacts"**, **"dbt-artifacts-cli"**, and/or **"custom scripts"**, for extraction and storage in a database.
    - Outcomes analysis.
    - Outcomes visualization --> integration with **Tableau** / **Looker**, or **custom visualization** (pandas DataFrames --> matplotlib, seaborn).
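
As an illustration of the "custom scripts" route for transformation and storage, the sketch below flattens a `run_results.json`-shaped document into rows and stores them in SQLite. It assumes the artifact exposes `metadata.generated_at` and a `results` list with `unique_id`, `status`, `failures`, `execution_time` and `message`; check these field names against the artifact your dbt version writes. The sample data, the table name `test_outcomes` and the helper functions are hypothetical.

```python
import json
import sqlite3

# Made-up sample shaped like a trimmed run_results.json; node names are hypothetical.
SAMPLE_RUN_RESULTS = {
    "metadata": {"generated_at": "2024-01-01T00:00:00Z"},
    "results": [
        {"unique_id": "test.my_project.not_null_customers_email", "status": "pass",
         "failures": 0, "execution_time": 0.42, "message": None},
        {"unique_id": "test.my_project.valid_customers_email", "status": "fail",
         "failures": 6, "execution_time": 0.57, "message": "6 failing rows"},
        {"unique_id": "test.my_project.recent_events", "status": "warn",
         "failures": 2, "execution_time": 0.31, "message": "2 failing rows"},
    ],
}


def load_run_results(path):
    """Read a run_results.json file from disk (for example target/run_results.json)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def flatten_run_results(run_results):
    """Turn the nested artifact into one flat tuple per executed node."""
    generated_at = run_results.get("metadata", {}).get("generated_at")
    return [
        (
            generated_at,
            result.get("unique_id"),
            result.get("status"),
            result.get("failures"),
            result.get("execution_time"),
            result.get("message"),
        )
        for result in run_results.get("results", [])
    ]


def store_rows(conn, rows):
    """Append the flattened rows to a test_outcomes table, creating it if needed."""
    conn.execute(
        "create table if not exists test_outcomes ("
        "generated_at text, unique_id text, status text, "
        "failures integer, execution_time real, message text)"
    )
    conn.executemany("insert into test_outcomes values (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def count_by_status(conn):
    """Return {status: number_of_nodes} for everything stored so far."""
    cursor = conn.execute("select status, count(*) from test_outcomes group by status")
    return dict(cursor.fetchall())


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store_rows(connection, flatten_run_results(SAMPLE_RUN_RESULTS))
    print(count_by_status(connection))
```

Once the outcomes sit in a table, the analysis and visualization scenarios reduce to ordinary queries, which a BI tool or a pandas DataFrame can read directly.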

**configuration, incremental**: https://docs.getdbt.com/docs/build/incremental-models

models/stg_events.sql

```sql
{{ config(materialized='incremental') }}

select
    *,
    my_slow_function(my_column)

from {{ ref('app_data_events') }}

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  -- (uses >= to include records whose timestamp occurred since the last run of this model)
  where event_time >= (select max(event_time) from {{ this }})

{% endif %}

```
**1. dbt-artifacts:**

