# Run Tests on Delivery Delay Duration

In [None]:
import os
from pathlib import Path

from dotenv import find_dotenv, load_dotenv
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

In [None]:
PROJ_ROOT = Path().resolve().parents[3]
env_file_dir = PROJ_ROOT / '.env'
_ = load_dotenv(env_file_dir, verbose=True)
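
As an optional sanity check, the sketch below lists any connection variables that are still unset after loading `.env`. The helper name `missing_env_vars` is hypothetical (it is not part of `python-dotenv`). The variable names are the ones this notebook reads with `os.getenv` when building the engine.

```python
import os

# Variable names read by os.getenv() in the engine cell of this notebook.
REQUIRED_VARS = [
    "UPLIMIT_SNOWFLAKE_ACCOUNT",
    "UPLIMIT_SNOWFLAKE_USER",
    "UPLIMIT_SNOWFLAKE_PASS",
    "UPLIMIT_SNOWFLAKE_WAREHOUSE",
    "UPLIMIT_SNOWFLAKE_ROLE",
    "UPLIMIT_SNOWFLAKE_DB_NAME",
    "UPLIMIT_SNOWFLAKE_SCHEMA",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper: `env` defaults to os.environ so the function can
    also be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Unset variables:", ", ".join(missing))
```

An empty list means every variable needed by the connection URL has a value.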

## About

Check the calculation of `delivery_delay_seconds` in the `marts/core/fct_orders` model.

### Notes

1. This notebook supports <kbd>Run</kbd> > <kbd>Run All Cells</kbd>.

## User Inputs

In [None]:
#

In [None]:
engine = create_engine(
    URL(
        drivername="driver",
        account=os.getenv("UPLIMIT_SNOWFLAKE_ACCOUNT"),
        user=os.getenv("UPLIMIT_SNOWFLAKE_USER"),
        password=os.getenv("UPLIMIT_SNOWFLAKE_PASS"),
        warehouse=os.getenv("UPLIMIT_SNOWFLAKE_WAREHOUSE"),
        role=os.getenv("UPLIMIT_SNOWFLAKE_ROLE"),
        database=os.getenv("UPLIMIT_SNOWFLAKE_DB_NAME"),
        schema=os.getenv("UPLIMIT_SNOWFLAKE_SCHEMA"),
        timezone='US/Eastern'
    )
)

## Connect

Load Jupyter SQL extension

In [None]:
%load_ext sql

Set the maximum number of rows to be displayed to `None` (shows all rows)

In [None]:
%config SqlMagic.displaylimit = None

Connect to database

In [None]:
%sql engine --alias connection

## Queries

### Checking Accuracy of `delivery_delay_seconds` Column in `fct_orders` Using Tests

#### Sample Data

In [None]:
%%sql
SELECT delivered_at,
       estimated_delivery_at,
       DATEDIFF(
           second, created_at, estimated_delivery_at
       ) AS estimated_delivery_time_seconds,
       DATEDIFF(second, created_at, delivered_at) AS delivery_time_seconds,
       (
           CASE
               WHEN delivered_at > estimated_delivery_at
               THEN ABS(
                   DATEDIFF(second, delivered_at, estimated_delivery_at)
               )
               ELSE NULL
           END
       ) AS delivery_delay_seconds
FROM stg_postgres_orders
WHERE delivered_at IN (
    '2021-02-17 23:30:34',
    '2021-02-13 15:13:09'
)
OR estimated_delivery_at IN ('2021-02-14 23:35:14', '2021-02-16 07:08:04')
OR order_id = '8385cfcd-2b3f-443a-a676-9756f7eb5404'

#### Expected Outputs Captured in Tests

If an order is delivered:
1. with a delay
   - the calculation of `delivery_delay_seconds` produces a correct non-`NULL` value
2. on time (no delay)
   - the calculation of `delivery_delay_seconds` produces a `NULL` value
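
The two expected outcomes can be sketched in plain Python as a mirror of the `CASE WHEN` expression from the sample query. This is an illustration only, not the model's code: the function name and the timestamps are hypothetical, `None` stands in for SQL `NULL`, and whole-second timestamps are assumed.

```python
from datetime import datetime
from typing import Optional


def delivery_delay_seconds(
    delivered_at: Optional[datetime],
    estimated_delivery_at: Optional[datetime],
) -> Optional[int]:
    """Seconds late if delivered after the estimate, otherwise None."""
    if delivered_at is None or estimated_delivery_at is None:
        # In SQL the comparison evaluates to NULL, so the ELSE branch applies.
        return None
    if delivered_at > estimated_delivery_at:
        delta = delivered_at - estimated_delivery_at
        return abs(int(delta.total_seconds()))
    return None


# Hypothetical timestamps, not rows from stg_postgres_orders.
estimate = datetime(2021, 1, 1, 12, 0, 0)
late = delivery_delay_seconds(datetime(2021, 1, 1, 12, 0, 30), estimate)
on_time = delivery_delay_seconds(datetime(2021, 1, 1, 11, 59, 0), estimate)
not_delivered = delivery_delay_seconds(None, estimate)

print(late, on_time, not_delivered)  # 30 None None
```

The on-time and not-yet-delivered cases both return `None`, so the result alone cannot tell them apart.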

With the above in mind, the following three tests were run for the `delivery_delay_seconds` column:
```yaml
- name: delivery_delay_seconds
  tests:
    - dbt_utils.expression_is_true:
        name: unexpected_null_if_no_delay_from_timestamps
        expression: "IS NULL"
        where: "delivered_at < estimated_delivery_at"
    - dbt_utils.expression_is_true:
        name: unexpected_null_if_no_delay_from_timedeltas
        expression: "IS NULL"
        where: "delivery_time_seconds < estimated_delivery_time_seconds"
    - dbt_utils.expression_is_true:
        name: unexpected_null_if_timedelta_value
        expression: "IS NULL"
        where: "delivery_time_seconds IS NOT NULL"
```

#### Checking Test Outcomes (Pass/Fail) Using Sample Data

If an order has not yet been delivered, the following occurs:
1. the `delivered_at` timestamp column is `NULL` (however, this `NULL` does not mean the order was delivered on time)
   - the first test passes since comparing a `NULL` timestamp with `<` evaluates to `NULL`, so the `where` filter excludes the row and the test only checks non-`NULL` timestamp values
2. the `delivery_time_seconds` column is `NULL` (however, this `NULL` also does not mean the order was delivered on time)
   - the second test passes since the `<` operator likewise only compares non-`NULL` timedelta values
   - the passing of this test is expected but reveals a flaw in the logic of assigning `NULL`s in the `delivery_delay_seconds` calculation
     - this flaw is exposed by the third test (next)
3. the `delivery_delay_seconds` column is `NULL`
   - this column is `NULL` (as expected) since the timedelta (using `DATEDIFF(second, delivered_at, estimated_delivery_at)`) cannot be calculated between the `NULL` in the `delivered_at` column and the non-`NULL` value in the `estimated_delivery_at` column
   - the third test fails since the `CASE WHEN` logic
     - expects this `delivery_delay_seconds` column to only be `NULL` for on-time deliveries (when the `delivery_time_seconds` column can be calculated and does not contain a `NULL` value)
     - does not expect this column to be `NULL` in other scenarios (i.e. when the order has not yet been delivered and the `delivery_time_seconds` column cannot be calculated)
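
The `NULL` comparison behaviour described above can be reproduced locally. The sketch below uses an in-memory SQLite database as a stand-in for Snowflake, on the assumption that both treat `NULL < value` as `NULL` and drop such rows from a `WHERE` filter. The table and its three rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id TEXT,
        delivered_at TEXT,
        estimated_delivery_at TEXT,
        delivery_delay_seconds INTEGER
    )
    """
)
# Hypothetical rows: one on-time, one late, one not yet delivered.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("on_time", "2021-01-01 11:59:00", "2021-01-01 12:00:00", None),
        ("late", "2021-01-01 12:00:30", "2021-01-01 12:00:00", 30),
        ("undelivered", None, "2021-01-01 12:00:00", None),
    ],
)

# Rows the first test's `where` filter keeps: the NULL comparison
# silently drops the undelivered order.
checked = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM orders "
        "WHERE delivered_at < estimated_delivery_at ORDER BY order_id"
    )
]

# Rows with a NULL delay: on-time and undelivered orders look identical.
null_delay = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM orders "
        "WHERE delivery_delay_seconds IS NULL ORDER BY order_id"
    )
]
conn.close()

print(checked)     # ['on_time']
print(null_delay)  # ['on_time', 'undelivered']
```

The undelivered order never reaches the first filter, yet it carries the same `NULL` delay as the on-time order.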

## Conclusion

In the absence of the third test, any downstream statistics calculated on the delivery delay in `delivery_delay_seconds` (e.g., what is the average delivery delay for Greenery orders?) would be incorrect.
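
One way such a statistic could go wrong is sketched below, under the assumption that a downstream query treats every `NULL` delay as a zero-second delay. All values are hypothetical, and `None` stands in for SQL `NULL`.

```python
# Hypothetical delays in seconds; None stands in for SQL NULL.
late_orders = [120, 60]
on_time_orders = [None, None]
undelivered_orders = [None]


def average_with_nulls_as_zero(delays):
    """Average delay when every NULL is counted as a zero-second delay."""
    values = [0 if delay is None else delay for delay in delays]
    return sum(values) / len(values)


# Intended statistic: delivered orders only, on-time counted as zero delay.
avg_delivered = average_with_nulls_as_zero(late_orders + on_time_orders)

# Skewed statistic: the undelivered order is also counted as zero delay.
avg_all = average_with_nulls_as_zero(
    late_orders + on_time_orders + undelivered_orders
)

print(avg_delivered, avg_all)  # 45.0 36.0
```

The undelivered order lowers the average even though its delay is simply unknown.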

## Disconnect

Close connection

In [None]:
%sql --close connection