# Semantic Modeling with Metric Views

## The Challenge

Leadership wants metrics! They need to see **standardized KPIs** across dashboards, reports, and Genie conversations. But you know the drill - every analyst has their own version of "defect rate," and your team is drowning in inconsistent metric definitions.

## Note
Run this notebook on a SQL Warehouse.

## What is Semantic Modeling?

**Semantic modeling** creates a business-friendly layer on top of your data warehouse. Instead of writing SQL every time someone asks for "average temperature" or "defect rate," you define these metrics once in a **metric view**.

**Metric Views** in Unity Catalog allow you to:
- **Define metrics once**, use everywhere (dashboards, Genie, queries)
- **Standardize calculations** so everyone uses the same logic
- **Document relationships** between tables with semantic definitions
- **Enable natural language queries** in Genie that understand your business terms

Think of it as creating a "business dictionary" that both humans and AI can understand.

## Your Mission

You have IoT sensor data from airplane parts manufactured in multiple factories. You need to define key metrics like:
- Defect rates by factory and model
- Average sensor readings (temperature, rotation speed, pressure)
- Device counts and statuses
- Time-based trends

Instead of rebuilding everything from scratch, you'll follow Databricks' hands-on demo and adapt it to your airplane IoT data.


## Follow the Databricks Demo

Databricks has created an interactive demo that walks you through creating metric views step by step. Follow along with the `Databricks Assistant` version of the demo.

**👉 [Open the Metric Views with UC Demo](https://www.databricks.com/resources/demos/tours/governance/metric-views-with-uc?itm_data=demo_center)**

### Your Adaptation Instructions

Follow along with the demo, but adapt it to **your airplane IoT data**:

**Your Tables:**
- `sensor_bronze` - Raw sensor readings (temperature, rotation_speed, air_pressure, etc.)
- `inspection_bronze` / `inspection_gold` - Device inspection records with defects
- `dim_factories` - Factory reference data (factory_name, region, city, state)
- `dim_devices` - Device master data (links to factories and models)
- `dim_models` - Aircraft part model information (model_family, model_category)

**Your Metrics to Define:**
1. **defect_rate** - Percentage of inspections with defects
2. **avg_temperature** - Average temperature readings across devices
3. **avg_rotation_speed** - Average rotation speed
4. **device_count** - Count of active devices
5. **total_inspections** - Count of all inspection records

**Your Dimensions:**
- **factory_name** - Group metrics by factory
- **region** - Aggregate by geographic region
- **model_family** - Analyze by aircraft part model family
- **date** - Time-based grouping from timestamps

The demo will show you how to define these in YAML format and deploy them to Unity Catalog.
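As a rough preview of where the demo leads, a defect-rate measure might be sketched as below. The view name `inspection_metrics_sketch`, the location of `inspection_gold` in the shared `db_crash_course` schema, and the columns `device_id` and `defect` (treated here as a 0/1 flag) are all assumptions for illustration. Check the real column names in your table before running anything like this.

```sql
-- Sketch only: the view name and the `defect` / `device_id` columns are assumed,
-- not confirmed by the table descriptions above
CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.inspection_metrics_sketch WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.inspection_gold
dimensions:
  - name: Device ID
    expr: device_id
measures:
  - name: Total Inspections
    expr: COUNT(1)
  - name: Defect Rate
    expr: 100.0 * SUM(CASE WHEN defect = 1 THEN 1 ELSE 0 END) / COUNT(1)
$$
```

Writing the rate as a ratio of two aggregates, rather than storing a precomputed percentage per row, lets the same measure be recomputed at whatever grouping a query asks for.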


## Example: Completed Metric View YAML

The next section walks through complete examples of metric views for your airplane IoT data. After completing the Databricks demo, you can use them as a reference or starting point for your own metric views.

The final example defines a `sensor_metrics` view with sensor performance measures that can be grouped by factory, model, and time.


## Building Your First Metric View

Let's start simple and build up complexity. We'll create three metric views, each adding more features:

1. **Basic** - One measure, one dimension (your "hello world")
2. **Intermediate** - Add more measures and dimensions, plus your first join
3. **Advanced** - Include joins to multiple dimension tables

Each example is a complete, working metric view you can create in your schema.

### Example 1: Basic Metric View (Hello World)

The simplest possible metric view: one data source, one measure, one dimension.

**What this does:** Lets you calculate the average temperature by device.

Make sure you replace `{schema_name}` with your own schema (e.g., `jane_smith`).

In [0]:
%sql
-- Basic Metric View: Just the essentials
-- Replace {schema_name} with YOUR username (e.g., jane_smith)

CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_basic WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
dimensions:
  - name: Device ID
    expr: device_id
measures:
  - name: Average Temperature
    expr: AVG(temperature)
$$
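Once the basic view exists, a first query against it could look like the sketch below. It assumes you created `sensor_metrics_basic` in your own schema exactly as defined above; the alias `avg_temperature` and the `LIMIT` are arbitrary choices.

```sql
-- Average temperature per device, read from the basic metric view
SELECT
  `Device ID`,
  MEASURE(`Average Temperature`) AS avg_temperature
FROM dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_basic
GROUP BY `Device ID`
ORDER BY avg_temperature DESC
LIMIT 10
```

Dimension and measure names contain spaces, so they are wrapped in backticks, and the measure is read through `MEASURE()` rather than selected directly.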

### Example 2: Intermediate Metric View

Add more measures, dimensions, and your first join.

**What this adds:** Multiple sensor metrics (temperature, airflow, delay), time-based grouping, and a join to the factories dimension table.

In [0]:
%sql
-- Intermediate Metric View: Multiple measures and dimensions with a join
-- Replace {schema_name} with YOUR username (e.g., jane_smith)

CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_intermediate WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
joins:
  - name: factories
    source: dwx_airops_insights_platform_dev_working.db_crash_course.dim_factories
    on: source.factory_id = factories.factory_id
dimensions:
  - name: Device ID
    expr: device_id
  - name: Factory ID
    expr: factory_id
  - name: Model ID
    expr: model_id
  - name: Date
    expr: DATE(timestamp)
measures:
  - name: Average Temperature
    expr: AVG(temperature)
  - name: Average Airflow Rate
    expr: AVG(airflow_rate)
  - name: Total Delay
    expr: SUM(delay)
  - name: Record Count
    expr: COUNT(1)
$$

### Example 3: Advanced Metric View with Multiple Joins

The full-featured metric view with joins to multiple dimension tables.

**What this adds:** Joins to both factories AND models dimension tables, enabling rich analysis across multiple business dimensions without complex SQL.

In [0]:
%sql
-- Example Metric View: IoT Sensor Metrics with Joins
-- This demonstrates the correct YAML syntax for metric views with joins
-- 
-- IMPORTANT: Before running, replace:
-- - {schema_name} with: YOUR_USERNAME (extract username before @ symbol)
--   Example: if your email is jane.smith@company.com, use: jane_smith
-- The catalog (dwx_airops_insights_platform_dev_working) is already filled in
--
-- This metric view will be created in YOUR personal schema so you don't conflict with others

CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
joins:
  - name: factories
    source: dwx_airops_insights_platform_dev_working.db_crash_course.dim_factories
    on: source.factory_id = factories.factory_id
  - name: models
    source: dwx_airops_insights_platform_dev_working.db_crash_course.dim_models
    on: source.model_id = models.model_id
dimensions:
  - name: Factory ID
    expr: factory_id
  - name: Model ID
    expr: model_id
  - name: Trip ID
    expr: trip_id
  - name: Device ID
    expr: device_id
  - name: Date
    expr: DATE(timestamp)
measures:
  - name: Average Airflow Rate
    expr: AVG(airflow_rate)
  - name: Max Temperature
    expr: MAX(temperature)
  - name: Total Delay
    expr: SUM(delay)
  - name: Record Count
    expr: COUNT(1)
$$

**Note:** The above example uses the correct YAML syntax for metric views. Here are the key components:

- **source**: The primary table to query from (reads from shared `db_crash_course` schema)
- **joins**: Define how to join with dimension tables (left joins to `dim_factories` and `dim_models`)
- **dimensions**: Business-friendly names for grouping data (Factory ID, Model ID, Date, etc.). Columns from joined tables, such as `factories.factory_name`, can be exposed as dimensions in the same way
- **measures**: Aggregated metrics (counts, averages, etc.)

This metric view enables:
- Genie to understand business terms like "Average Airflow Rate" and "Max Temperature"
- Dashboards to use pre-defined metrics consistently
- Automatic joins when users query by factory or model dimensions
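The `sensor_metrics` definition above declares the joins but only groups by ID columns from the source table. A minimal sketch of exposing joined columns as dimensions follows. The view name `sensor_metrics_named` is hypothetical, and the columns `factory_name`, `region`, and `model_family` are taken from the table descriptions earlier in this notebook, so verify them against the actual tables.

```sql
-- Sketch: expose columns from the joined dimension tables as business-friendly dimensions
-- The view name sensor_metrics_named is hypothetical
CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_named WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
joins:
  - name: factories
    source: dwx_airops_insights_platform_dev_working.db_crash_course.dim_factories
    on: source.factory_id = factories.factory_id
  - name: models
    source: dwx_airops_insights_platform_dev_working.db_crash_course.dim_models
    on: source.model_id = models.model_id
dimensions:
  - name: Factory Name
    expr: factories.factory_name
  - name: Region
    expr: factories.region
  - name: Model Family
    expr: models.model_family
measures:
  - name: Average Temperature
    expr: AVG(temperature)
$$
```

Joined columns are referenced through the join's `name` (for example `factories.region`), while columns of the source table are referenced without a prefix.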

**To create this in your environment:**
1. Leave the catalog as `dwx_airops_insights_platform_dev_working` (it is already written into the examples)
2. Replace `{schema_name}` with your personal schema (the part of your email before the @ symbol, with dots replaced by underscores)
   - Example: if your email is `jane.smith@company.com`, use `jane_smith`
3. Run the SQL on a **SQL Warehouse**, which is the setup this notebook assumes
4. The metric view will be created in YOUR schema, reading data from the shared `db_crash_course` schema

**Documentation:** See [Metric Views](https://docs.databricks.com/aws/en/metric-views/create/sql) and [Data Modeling with Joins](https://docs.databricks.com/aws/en/metric-views/data-modeling/joins).

## Querying Your Metric View

**The power of metric views:** Once you've defined your metrics, anyone can query them consistently without knowing the underlying join logic or calculation details.

### Why This Matters

**Without metric views (the old way):**
```sql
-- Every analyst writes their own version of "average temperature by factory"
-- Version 1 (from Jane):
SELECT f.factory_name, AVG(s.temperature) as avg_temp
FROM sensor_bronze s
JOIN dim_factories f ON s.factory_id = f.factory_id
WHERE s.timestamp >= CURRENT_DATE - 30
GROUP BY f.factory_name

-- Version 2 (from Bob):
SELECT factories.name, AVG(sensors.temp) as average_temperature
FROM sensors
LEFT JOIN factories ON sensors.fid = factories.id
GROUP BY factories.name

-- Version 3 (from Dashboard Team):
-- They hardcoded the joins differently and got different results!
```

**Problems:**
- ❌ Three different SQL queries for the "same" metric
- ❌ Different join logic (INNER vs LEFT JOIN)
- ❌ Inconsistent results across reports
- ❌ No one knows which version is "correct"
- ❌ When the business logic changes, you have to update dozens of queries

**With metric views (the new way):**
```sql
-- Everyone uses the same standardized metric
SELECT 
  `Factory ID`,
  MEASURE(`Average Temperature`)
FROM sensor_metrics_intermediate
GROUP BY `Factory ID`
```

**Benefits:**
- ✅ One source of truth for "Average Temperature"
- ✅ Joins defined once in the metric view
- ✅ Everyone gets the same results
- ✅ Change the logic once, updates everywhere
- ✅ Self-documenting with business-friendly names

### Query Examples

**Important:** Metric views require the `MEASURE()` function to access measures. You cannot use `SELECT *` - you must explicitly name dimensions and measures.

Here are practical examples you can run after creating your metric view:

In [0]:
%sql
-- Example 1: Simple aggregation by factory
-- Shows average airflow rate and max temperature for each factory
-- Note: No JOIN logic needed - the metric view handles it!

SELECT 
  `Factory ID`,
  MEASURE(`Average Airflow Rate`) as avg_airflow,
  MEASURE(`Max Temperature`) as max_temp,
  MEASURE(`Record Count`) as total_records
FROM dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics
GROUP BY `Factory ID`
ORDER BY avg_airflow DESC

In [0]:
%sql
-- Example 2: Time-based trend analysis
-- Track daily metrics over time
-- Perfect for dashboards showing trends!

SELECT 
  `Date`,
  MEASURE(`Average Airflow Rate`) as avg_airflow,
  MEASURE(`Total Delay`) as total_delay,
  MEASURE(`Record Count`) as readings_count
FROM dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics
GROUP BY `Date`
ORDER BY `Date` DESC
LIMIT 30
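A third pattern worth trying combines two dimensions with a filter on one of them. The sketch below assumes the `sensor_metrics` view defined earlier; the 7-day window is an arbitrary choice for illustration.

```sql
-- Example 3 (sketch): daily max temperature per factory over the last 7 days
SELECT
  `Factory ID`,
  `Date`,
  MEASURE(`Max Temperature`) AS max_temp,
  MEASURE(`Record Count`) AS readings_count
FROM dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics
WHERE `Date` >= DATE_SUB(CURRENT_DATE(), 7)
GROUP BY `Factory ID`, `Date`
ORDER BY `Date` DESC, max_temp DESC
```

The filter is applied to a dimension, so the measures are computed only over the rows that fall inside the selected dates.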

## What You've Accomplished

Once you've created your metric views following the demo, you've built a **semantic layer** that:

✅ Defines standardized metrics (defect_rate, avg_temperature, etc.) that everyone uses consistently  
✅ Documents business logic in code (not scattered across emails and wikis)  
✅ Powers natural language queries in Genie - users can ask "What's the defect rate by factory?" and get accurate results  
✅ Enables faster dashboard creation - designers can drag and drop pre-defined metrics  
✅ Maintains data lineage - Unity Catalog tracks where each metric comes from  

## What's Next?

Your metric views are the **foundation** for the rest of the week:

- **Next: Dashboard Deep Dive** - You'll build executive dashboards using these standardized metrics
- **Then: Genie Deep Dive** - These metrics enable Genie to answer complex business questions in natural language
- **Later: Predictive Models** - Your clean metrics feed into ML models for anomaly detection

The time you invest in semantic modeling pays off exponentially as your team builds on this foundation.

---

**Pro Tip:** Start simple! Define 3-5 key metrics that solve immediate pain points, then expand as you see value. Don't try to boil the ocean on day one.


## Try This Out 🚀

Want to go deeper with semantic modeling? Try these exercises:

### 1. **Create Time-Based Metrics**
Define rolling averages and trends:
- 7-day rolling average temperature
- Month-over-month defect rate change
- Peak usage hours by factory
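One possible starting point for the rolling average is to compute it at query time on top of an existing view, rather than inside the metric view itself. The sketch below assumes the `sensor_metrics_intermediate` view from Example 2; the CTE name `daily` and the column aliases are made up for illustration.

```sql
-- Sketch: rolling average over the 7 most recent days that have data
WITH daily AS (
  SELECT
    `Date` AS reading_date,
    MEASURE(`Average Temperature`) AS avg_temperature
  FROM dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_intermediate
  GROUP BY `Date`
)
SELECT
  reading_date,
  avg_temperature,
  AVG(avg_temperature) OVER (
    ORDER BY reading_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7_row_avg_temperature
FROM daily
ORDER BY reading_date
```

This averages the daily averages, so days with few readings weigh as much as days with many. The window also counts rows rather than calendar days, so gaps in the data stretch it beyond a week.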

### 2. **Add Derived Dimensions**
Create calculated dimensions like:
- Device age buckets (0-1 year, 1-3 years, 3+ years)
- Temperature categories (Normal, Warning, Critical)
- Factory performance tiers based on defect rates
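A derived dimension is just a dimension whose `expr` is a calculation instead of a plain column. Here is a minimal sketch for temperature categories. The view name `sensor_health_sketch` and the thresholds 80 and 100 are placeholders, not values from the dataset.

```sql
-- Sketch: bucket raw temperature readings into categories
-- Thresholds (80, 100) are hypothetical; replace them with values agreed with your team
CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_health_sketch WITH METRICS LANGUAGE YAML AS $$
version: 1.1
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
dimensions:
  - name: Temperature Category
    expr: CASE WHEN temperature >= 100 THEN 'Critical' WHEN temperature >= 80 THEN 'Warning' ELSE 'Normal' END
measures:
  - name: Record Count
    expr: COUNT(1)
$$
```

Rows with a NULL temperature fall into the `ELSE` branch and are labeled 'Normal' here; add an explicit `WHEN temperature IS NULL` branch if that is not what you want.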

### 3. **Build Multiple Metric Views**
Create focused views for different use cases:
- `factory_kpis` - Management dashboard metrics
- `sensor_health` - Maintenance team metrics
- `quality_metrics` - Quality assurance metrics

### 4. **Test in Genie**
Go back to Genie and ask questions using your new metrics:
- "Show me the defect rate trend by model family for the last 30 days"
- "Which factory has the highest average temperature?"
- "Compare rotation speed across all regions"

### 5. **Document for Your Team**
Add rich documentation to your metric views:
- Business definitions for each metric
- Calculation logic and assumptions
- Data quality notes
- Links to subject matter experts
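Documentation can live in the definition itself. The sketch below assumes the version 1.1 YAML spec accepts a `comment` field on the view and on individual dimensions and measures; confirm this against the metric view YAML reference for your workspace. The view name `sensor_metrics_documented` and the comment text are illustrative.

```sql
-- Sketch: attach descriptions to the view, a dimension, and a measure
-- Assumes `comment` is accepted at each of these levels in version 1.1
CREATE OR REPLACE VIEW dwx_airops_insights_platform_dev_working.{schema_name}.sensor_metrics_documented WITH METRICS LANGUAGE YAML AS $$
version: 1.1
comment: "Sensor metrics for airplane part devices, built on raw readings in sensor_bronze"
source: dwx_airops_insights_platform_dev_working.db_crash_course.sensor_bronze
dimensions:
  - name: Device ID
    expr: device_id
    comment: "Identifier of the sensor device that produced the reading"
measures:
  - name: Average Temperature
    expr: AVG(temperature)
    comment: "Mean of raw temperature readings, in the units recorded in sensor_bronze"
$$
```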

**Resources:**
- [Unity Catalog Metric Views Documentation](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-metric-view.html)
- [Semantic Modeling Best Practices](https://www.databricks.com/blog/introducing-metrics-views-unity-catalog)

