# Unity Catalog Metric Views - Building a Semantic Model

**Metric Views** in Unity Catalog provide a centralized way to define and manage consistent, reusable, and governed core business metrics. They act as a **Semantic Layer** on top of your data tables.

**Why use Metric Views?**
1.  **Abstraction:** Hides complex SQL logic (Joins, Window functions, Aggregations) from end-users.
2.  **Consistency:** Defines a "Single Source of Truth". Everyone calculates "Profit" or "Total Orders" the exact same way.
3.  **Reusability:** Defined once, used everywhere (SQL queries, Dashboards, BI Tools).
4.  **Governance:** You can control permissions on who can access specific metrics.

**In this notebook, we will:**
1.  Understand the structure of a Metric View (YAML).
2.  Define Dimensions and Measures.
3.  Implement complex logic like **Year-over-Year (YoY) Growth** using Window functions.
4.  Join Fact and Dimension tables within the view.
5.  Query the Metric View using the special `MEASURE()` syntax.

### Prerequisites
*   A Unity Catalog enabled Workspace.
*   The `orders_raw` and `customer_raw` tables in your catalog/schema (created in previous sessions).

In [None]:
# Setup: Define Catalog and Schema
catalog_name = "dev"
schema_name = "bronze"

spark.sql(f"USE CATALOG {catalog_name}")
spark.sql(f"USE SCHEMA {schema_name}")

print(f"Using Catalog: {catalog_name}, Schema: {schema_name}")

## 1. Understanding the Source Data
We will build our Metric View on top of the `orders_raw` table (Fact Table) and join it with the `customer_raw` table (Dimension Table).

In [None]:
-- Preview the Fact Table
SELECT * FROM orders_raw LIMIT 5;

In [None]:
-- Preview the Dimension Table
SELECT * FROM customer_raw LIMIT 5;

## 2. Structure of a Metric View (YAML)

Metric Views are defined using **YAML**. You can create them through the Databricks UI (Catalog Explorer -> Create -> Metric View), but understanding the syntax is still important for reading and maintaining definitions.

### Basic Components:
1.  **Source:** The underlying table.
2.  **Dimensions:** Attributes to group by (e.g., Date, Status).
3.  **Measures:** Aggregations (e.g., Sum, Count).

### Example YAML Definition:
Below is the logic used to create the initial version of `orders_raw_metric_view`.

```yaml
version: 1
source: dev.bronze.orders_raw
model_name: orders_raw_metric_view

dimensions:
  - name: Order Date
    expr: o_orderdate
  
  - name: Order Year
    expr: DATE_TRUNC('YEAR', o_orderdate)
    
  - name: Order Status
    expr: o_orderstatus
    
  - name: Order Status Readable
    expr: |
      case 
        when o_orderstatus = 'O' then 'Open'
        when o_orderstatus = 'F' then 'Fulfilled'
        when o_orderstatus = 'P' then 'Processing'
      end

measures:
  - name: Total Price
    expr: SUM(o_totalprice)
    
  - name: Total Orders
    expr: COUNT(DISTINCT o_orderkey)
```

**Note:** YAML is not a notebook cell language, so you cannot run this definition as-is. You typically paste it into the Catalog Explorer UI, or wrap it in a `CREATE VIEW ... WITH METRICS LANGUAGE YAML` statement in the Databricks SQL Editor. Once the view exists, we query it using SQL.
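
As a hedged alternative to the UI, the definition can also be registered from a notebook cell by wrapping the YAML in a `CREATE VIEW ... WITH METRICS LANGUAGE YAML` statement. The sketch below only assembles the DDL string and executes it when a `spark` session is present. The helper name `build_metric_view_ddl` is hypothetical, the YAML is a trimmed copy of the definition above, and whether your workspace accepts the statement depends on its runtime version.

```python
def build_metric_view_ddl(view_name: str, yaml_body: str) -> str:
    """Wrap a metric view YAML definition in a CREATE VIEW statement."""
    return (
        f"CREATE OR REPLACE VIEW {view_name}\n"
        "WITH METRICS\n"
        "LANGUAGE YAML\n"
        "AS $$\n"
        f"{yaml_body.strip()}\n"
        "$$"
    )


# Trimmed copy of the YAML shown above (assumption: same source table).
yaml_body = """
version: 1
source: dev.bronze.orders_raw
dimensions:
  - name: Order Status
    expr: o_orderstatus
measures:
  - name: Total Price
    expr: SUM(o_totalprice)
  - name: Total Orders
    expr: COUNT(DISTINCT o_orderkey)
"""

ddl = build_metric_view_ddl("dev.bronze.orders_raw_metric_view", yaml_body)
print(ddl)

# Only execute inside a Databricks notebook, where `spark` is predefined.
if "spark" in globals():
    spark.sql(ddl)
```

Keeping the YAML in a string makes the definition easy to version-control next to the notebook, at the cost of losing the UI's validation while editing.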

## 3. Querying Metric Views with SQL

Unlike standard views, you **cannot** run `SELECT *` on a Metric View.
*   You must select **Dimensions** by name, quoting names that contain spaces with backticks.
*   You must wrap **Measures** in the `MEASURE()` function.
*   You group by the selected dimensions (`GROUP BY ALL` is the shortest form); the aggregation logic itself comes from the view.

*Assuming the view `orders_raw_metric_view` has been created in the UI with the YAML above.*

In [None]:
-- Calculate Total Orders and Total Price grouped by Order Status
-- Notice we don't need to write SUM() or COUNT(DISTINCT ...) here.
-- We just ask for the measure and group by the selected dimension.
-- Names that contain spaces are quoted with backticks, not double quotes.

SELECT 
  `Order Status Readable`, 
  MEASURE(`Total Orders`), 
  MEASURE(`Total Price`)
FROM 
  orders_raw_metric_view
GROUP BY ALL
ORDER BY 
  1;
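
Because every metric view query has the same shape, it can be generated from a list of dimension and measure names. The following is a minimal sketch, assuming backtick-quoted identifiers and `GROUP BY ALL`; the helpers `quote_ident` and `build_metric_query` are hypothetical names introduced here, not part of any Databricks API.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_metric_query(view: str, dimensions: list[str], measures: list[str]) -> str:
    """Build a SELECT over a metric view: plain dimensions, MEASURE()-wrapped measures."""
    select_items = [quote_ident(d) for d in dimensions]
    select_items += [f"MEASURE({quote_ident(m)})" for m in measures]
    sql = "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM {view}"
    if dimensions:
        # Group by every selected dimension.
        sql += "\nGROUP BY ALL"
    return sql


query = build_metric_query(
    "orders_raw_metric_view",
    dimensions=["Order Status Readable"],
    measures=["Total Orders", "Total Price"],
)
print(query)

# Only execute inside a Databricks notebook, where `spark` is predefined.
if "spark" in globals():
    spark.sql(query).show()
```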

## 4. Advanced Logic: Time Intelligence & Window Functions

Metric Views allow creating complex time-series metrics like "Current Year Sales" vs "Last Year Sales" without writing complex SQL self-joins or CTEs.

### Logic Explained:
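We use the `window` property in YAML to define the range. `order` names the dimension that orders the window, and `range` selects which slice of it the measure aggregates over (`current` for the current year, `trailing 1 year` for the year before it).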

**YAML Snippet for Logic:**
```yaml
  - name: Total Orders Current Year
    expr: COUNT(DISTINCT o_orderkey)
    window: 
      order: Order Year
      range: current
      semiadditive: last

  - name: Total Orders Last Year
    expr: COUNT(DISTINCT o_orderkey)
    window: 
      order: Order Year
      range: trailing 1 year
      semiadditive: last
      
  - name: Year on Year Growth %
    expr: 100 * (MEASURE(`Total Orders Current Year`) - MEASURE(`Total Orders Last Year`)) / MEASURE(`Total Orders Last Year`)
```

In [None]:
-- Querying Year-over-Year Growth
-- This query computes current year, previous year (lag), and growth percentage
-- simply by selecting the pre-defined measures.

SELECT 
  `Order Year`,
  MEASURE(`Total Orders Current Year`),
  MEASURE(`Total Orders Last Year`),
  MEASURE(`Year on Year Growth %`)
FROM 
  orders_raw_metric_view
GROUP BY ALL
ORDER BY 
  `Order Year`;
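
To make the growth formula concrete, here is the same arithmetic in plain Python. This is only an illustration of the expression `100 * (current - last) / last`: the yearly counts are made-up numbers, `yoy_growth_pct` is a hypothetical helper, and returning `None` for a missing or zero previous year is an assumption of this sketch rather than documented Metric View behavior.

```python
from typing import Optional


def yoy_growth_pct(current: int, previous: Optional[int]) -> Optional[float]:
    """Year-over-year growth in percent; None when there is no usable prior year."""
    if not previous:  # covers both None and 0
        return None
    return 100 * (current - previous) / previous


# Hypothetical distinct order counts per year.
orders_by_year = {1992: 200, 1993: 250, 1994: 225}

for year in sorted(orders_by_year):
    current = orders_by_year[year]
    previous = orders_by_year.get(year - 1)
    print(year, current, previous, yoy_growth_pct(current, previous))
# 1992 200 None None
# 1993 250 200 25.0
# 1994 225 250 -10.0
```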

## 5. Joining Tables in Metric Views

You can enrich your metric view by joining dimension tables directly in the YAML definition. This avoids the need for downstream users to know join keys or join types.

**YAML Snippet for Join:**
```yaml
joins:
  - name: cust_dim
    source: dev.bronze.customer_raw
    on: cust_dim.c_custkey = source.o_custkey

dimensions:
  - name: Customer Market Segment
    expr: |
      case 
        when cust_dim.c_mktsegment is NULL then 'OTHERS'
        else cust_dim.c_mktsegment
      end
```

In [None]:
-- Querying with Joined Dimensions
-- We can now slice and dice our measures by 'Customer Market Segment'
-- even though that column doesn't exist in the orders table.

SELECT 
  `Order Year`,
  `Customer Market Segment`,
  MEASURE(`Year on Year Growth %`)
FROM 
  orders_raw_metric_view
WHERE 
  `Order Year` >= '1994-01-01'
GROUP BY ALL
ORDER BY 
  1, 2;

## 6. Summary of Benefits

1.  **Code Reduction:** The final query in Section 5 is only a few lines long. To write that in raw SQL, you would need:
    *   Joins between Order and Customer tables.
    *   `CASE WHEN` logic for null handling.
    *   `DATE_TRUNC` logic.
    *   Window functions (`LAG`) for previous year calculation.
    *   Group By clauses.
2.  **Self-Service:** Business analysts can use these metrics in Power BI or Tableau without needing to know the underlying complex SQL.
3.  **Agility:** If the logic for "Total Orders" changes, you update the Metric View YAML once, and all reports update automatically.
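
To make the code-reduction point concrete, here is a rough sketch of what the Section 5 query might look like in raw SQL. It is an approximation, not the SQL that the Metric View actually generates: it assumes a left join, at least one order per segment in every year (so that `LAG` really returns the prior year), and the names `RAW_YOY_SQL`, `yearly`, and `with_lag` are hypothetical.

```python
RAW_YOY_SQL = """
WITH yearly AS (
  -- Join, null handling, DATE_TRUNC, and aggregation
  SELECT
    DATE_TRUNC('YEAR', o.o_orderdate)      AS order_year,
    COALESCE(c.c_mktsegment, 'OTHERS')     AS customer_market_segment,
    COUNT(DISTINCT o.o_orderkey)           AS total_orders
  FROM dev.bronze.orders_raw AS o
  LEFT JOIN dev.bronze.customer_raw AS c
    ON c.c_custkey = o.o_custkey
  GROUP BY 1, 2
),
with_lag AS (
  -- Previous year's count per segment via a window function
  SELECT
    *,
    LAG(total_orders) OVER (
      PARTITION BY customer_market_segment
      ORDER BY order_year
    ) AS total_orders_last_year
  FROM yearly
)
SELECT
  order_year,
  customer_market_segment,
  100 * (total_orders - total_orders_last_year) / total_orders_last_year AS yoy_growth_pct
FROM with_lag
WHERE order_year >= '1994-01-01'
ORDER BY 1, 2
"""

print(RAW_YOY_SQL)

# Only execute inside a Databricks notebook, where `spark` is predefined.
if "spark" in globals():
    spark.sql(RAW_YOY_SQL).show()
```

Every analyst who needs this number would have to reproduce all three steps correctly, which is the duplication the Metric View removes.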