#### Semantic Model Support – Bridge Table (Order ↔ Date)
**Purpose of this notebook**

This notebook creates a semantic support bridge table to enable correct calendar slicing for order-level BI measures, without modifying any promoted or locked Gold dimension tables.

It is intentionally separated from the main Gold dimension build notebooks to ensure:
- No re-execution of promoted dim_* tables
- No changes to validated Gold business logic
- Clear separation between data modeling and semantic enablement

**Background**

In the current Gold model:
- dim_order contains order_purchase_timestamp (DateTime)
- dim_date uses a date_key (YYYYMMDD integer)
- Order-level measures (e.g. average delivery duration) iterate over dim_order

When BI visuals use dim_date as the x-axis, dim_date must be able to filter dim_order.
Because dim_order has no column compatible with date_key, no relationship can be defined between the two tables, so a dim_date selection never reaches dim_order.

As a result, order-level measures are evaluated over all orders for every date, which produces flat or otherwise incorrect trends when plotted by date.

**Solution: Bridge table approach**

Instead of modifying the locked dim_order table, we introduce a small bridge table:
bridge_order_purchase_date

This table:
- Maps each order_key to a corresponding purchase_date_key (YYYYMMDD)
- Is derived from dim_order
- Contains only keys and two audit timestamps (row_insert_timestamp, row_update_timestamp), no business attributes
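
The key mapping described above can be illustrated outside Spark with a minimal Python sketch. The function name `purchase_date_key` and the sample rows are hypothetical; the arithmetic is intended to mirror what `CAST(date_format(..., 'yyyyMMdd') AS INT)` produces for a timestamp, under the assumption that time-of-day is simply discarded.

```python
from datetime import datetime
from typing import Optional


def purchase_date_key(ts: Optional[datetime]) -> Optional[int]:
    """Return the YYYYMMDD integer key for a timestamp, or None if it is missing."""
    if ts is None:
        return None
    # Equivalent to formatting as 'yyyyMMdd' and casting to an integer.
    return ts.year * 10000 + ts.month * 100 + ts.day


# Hypothetical order rows: (order_key, order_purchase_timestamp)
orders = [
    (1, datetime(2018, 1, 5, 14, 30)),
    (2, datetime(2018, 1, 5, 23, 59, 59)),
    (3, None),  # no purchase timestamp, so it gets no bridge row
]

# The bridge keeps one (order_key, purchase_date_key) pair per dated order.
bridge = [(key, purchase_date_key(ts)) for key, ts in orders if ts is not None]
print(bridge)  # [(1, 20180105), (2, 20180105)]
```

Two orders placed at different times on the same day share one date key, which is what lets a single dim_date row select both.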

This allows the semantic model to establish the following relationships:
- dim_date[date_key] (1) → bridge_order_purchase_date[purchase_date_key] (*)
- dim_order[order_key] (1) → bridge_order_purchase_date[order_key] (*)

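Through this bridge, calendar filters from dim_date can propagate to dim_order. Because a one-to-many relationship normally filters only from the one side to the many side, the dim_order relationship must be allowed to filter in both directions (or be modeled as one-to-one, which the one-row-per-order_key guarantee permits) for a date selection to reach dim_order.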
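
To make the effect of the bridge concrete, here is a small Python sketch that imitates filter propagation with plain dictionaries and lists. It is not how a semantic model engine works internally; the sample values and the helper names `avg_delivery_days` and `orders_on` are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical sample data; values are illustrative only.
dim_order = {  # order_key -> delivery duration in days
    101: 4.0,
    102: 6.0,
    103: 11.0,
}
bridge = [  # (order_key, purchase_date_key)
    (101, 20180105),
    (102, 20180105),
    (103, 20180106),
]
dim_date = [20180105, 20180106]


def avg_delivery_days(order_keys):
    """Order-level measure: average delivery duration over the given orders."""
    return mean(dim_order[key] for key in order_keys)


# Without a filter path, every date sees all orders, so the trend is flat.
flat = {d: avg_delivery_days(dim_order.keys()) for d in dim_date}


def orders_on(date_key):
    """Follow the bridge from a date key to the matching order keys."""
    return [key for key, d in bridge if d == date_key]


# With the bridge, each date sees only the orders purchased on that date.
sliced = {d: avg_delivery_days(orders_on(d)) for d in dim_date}

print(flat)    # {20180105: 7.0, 20180106: 7.0}
print(sliced)  # {20180105: 5.0, 20180106: 11.0}
```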

**What this notebook does**

Creates the bridge_order_purchase_date table using Spark SQL

Validates:
- One row per order_key (COUNT(*) equals COUNT(DISTINCT order_key))
- Plausible date key range (MIN and MAX of purchase_date_key)
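
As a rough illustration of these two checks, the sketch below runs them against an in-memory SQLite table that stands in for the Delta table. The sample rows and the helper `is_valid_date_key` are hypothetical, and SQLite is used only so the example runs without a Spark session.

```python
import sqlite3
from datetime import datetime

# In-memory stand-in for the bridge table, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bridge_order_purchase_date "
    "(order_key INTEGER, purchase_date_key INTEGER)"
)
conn.executemany(
    "INSERT INTO bridge_order_purchase_date VALUES (?, ?)",
    [(1, 20180105), (2, 20180105), (3, 20180106)],
)

# Check 1: list any order_key that appears more than once (expected: none).
duplicates = conn.execute(
    "SELECT order_key, COUNT(*) FROM bridge_order_purchase_date "
    "GROUP BY order_key HAVING COUNT(*) > 1"
).fetchall()


def is_valid_date_key(key: int) -> bool:
    """True if the key has eight digits and parses as a real calendar date."""
    text = str(key)
    if len(text) != 8:
        return False
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False


# Check 2: every distinct date key must be a real calendar date.
keys = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT purchase_date_key FROM bridge_order_purchase_date"
    )
]
invalid_keys = [k for k in keys if not is_valid_date_key(k)]

print(duplicates)    # []
print(invalid_keys)  # []
```

Checking that each key parses as a date is slightly stricter than comparing MIN and MAX, since an out-of-calendar value such as 20180230 can sit inside a valid-looking range.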

Does not:
- Rebuild any dim_* tables
- Alter promoted Gold schemas
- Change any fact or dimension logic

This change is additive and reversible (dropping the bridge table restores the previous state), and it affects only semantic-layer behavior.

**Impact on BI**

With this bridge in place:
- Order-level measures can be safely plotted using calendar dates
- Issues such as flat or misleading trends are resolved
- Existing KPIs and dashboards remain unchanged unless they intentionally use the new relationship

**Governance note**

This notebook is classified as semantic support infrastructure, not Gold business logic.

It exists to support:
- Correct BI slicing behavior
- Clean semantic model design
- Minimal risk to promoted datasets


In [1]:
%%sql
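-- Build the bridge once; IF NOT EXISTS makes re-runs a no-op if the table already exists.
CREATE TABLE IF NOT EXISTS bridge_order_purchase_date
USING DELTA
AS
SELECT
    order_key,
    -- Render the timestamp as 'yyyyMMdd' and cast it to match dim_date.date_key
    CAST(date_format(order_purchase_timestamp, 'yyyyMMdd') AS INT) AS purchase_date_key,
    -- Audit columns
    current_timestamp() AS row_insert_timestamp,
    current_timestamp() AS row_update_timestamp
FROM dim_order
-- Orders without a purchase timestamp cannot be mapped to a date and are left out
WHERE order_purchase_timestamp IS NOT NULL;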

<Spark SQL result set with 0 rows and 0 fields>

In [2]:
%%sql
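-- total_rows should equal distinct_order_key (one row per order);
-- min_date_key and max_date_key should fall inside the dim_date range
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT order_key) AS distinct_order_key,
  MIN(purchase_date_key) AS min_date_key,
  MAX(purchase_date_key) AS max_date_key
FROM bridge_order_purchase_date;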

<Spark SQL result set with 1 rows and 4 fields>

In [1]:
%%sql
DESCRIBE TABLE dim_order;
DESCRIBE TABLE bridge_order_purchase_date;

<Spark SQL result set with 13 rows and 3 fields>

<Spark SQL result set with 4 rows and 3 fields>

In [2]:
%%sql
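-- Row counts in the source dimension
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT order_key) AS distinct_order_key
FROM dim_order;

-- Row counts in the bridge; any shortfall should match orders with a NULL purchase timestamp
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT order_key) AS distinct_order_key
FROM bridge_order_purchase_date;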

<Spark SQL result set with 1 rows and 2 fields>

<Spark SQL result set with 1 rows and 2 fields>
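
The two counts above can legitimately differ, because the bridge excludes orders whose order_purchase_timestamp is NULL. The following Python sketch shows one way to reconcile them, using an in-memory SQLite database as a stand-in for the Spark tables. The sample rows are hypothetical, and the anti-join is an assumed approach, not a check this notebook runs.

```python
import sqlite3

# In-memory stand-ins for dim_order and the bridge, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_order (order_key INTEGER, order_purchase_timestamp TEXT);
    CREATE TABLE bridge_order_purchase_date (order_key INTEGER, purchase_date_key INTEGER);
    INSERT INTO dim_order VALUES
        (1, '2018-01-05 14:30:00'),
        (2, '2018-01-06 09:00:00'),
        (3, NULL);
    INSERT INTO bridge_order_purchase_date VALUES (1, 20180105), (2, 20180106);
    """
)

# Anti-join: orders that have no bridge row, flagged by whether the timestamp is NULL.
missing = conn.execute(
    """
    SELECT o.order_key, o.order_purchase_timestamp IS NULL AS ts_is_null
    FROM dim_order AS o
    LEFT JOIN bridge_order_purchase_date AS b ON b.order_key = o.order_key
    WHERE b.order_key IS NULL
    """
).fetchall()

# Any missing order with a non-NULL timestamp would be an unexplained gap.
unexplained = [key for key, ts_is_null in missing if not ts_is_null]

print(missing)      # [(3, 1)]
print(unexplained)  # []
```

If `unexplained` is empty, the difference between the two row counts is fully accounted for by orders without a purchase timestamp.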