# Semantic Modeling with Unity Catalog Metric Views

**Duration:** 30 minutes  
**Prerequisites:** Day 1 complete, tables created in Unity Catalog

---

## The Scenario

Leadership just dropped a bombshell: **"We need dashboards, interactive data Q&A, predictive models, and LLM systems for our aircraft IOT data by end of week."**

You have sensor data streaming from planes across multiple factories, inspection records, and anomaly detections. Business users want to ask questions like:
- "What's our average temperature by factory this month?"
- "Show me defect rates by model family"
- "Which devices are running hot?"

**The Problem:** Every dashboard, every Genie Space, every analyst is writing the same metric definitions differently:
- One person calculates `defect_rate` as `SUM(defects)/COUNT(*)`
- Another uses `AVG(CASE WHEN defect=1 THEN 1 ELSE 0 END)`
- Results don't match. Trust is lost. Deadlines slip.
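To make the mismatch concrete, here is a minimal Python sketch on made-up data. It assumes, like the `inspection_gold` table used later in this notebook, that each row carries a `defect` flag (1 or 0) and a `count` of inspections. The `InspectionRow` class and both function names are hypothetical. The row-level average mirrors the second analyst's `AVG(CASE ...)`, while the weighted version divides defective inspections by all inspections.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InspectionRow:
    """Hypothetical inspection_gold-style row (made-up schema for illustration)."""
    device_id: str
    defect: int  # 1 = defect found, 0 = no defect
    count: int   # number of inspections this row represents


def defect_rate_row_average(rows: list[InspectionRow]) -> float:
    """Like AVG(CASE WHEN defect = 1 THEN 1 ELSE 0 END): each row counts once,
    no matter how many inspections it summarizes."""
    return 100.0 * sum(1 for r in rows if r.defect == 1) / len(rows)


def defect_rate_weighted(rows: list[InspectionRow]) -> float | None:
    """Defective inspections divided by all inspections, as a percentage."""
    total = sum(r.count for r in rows)
    if total == 0:
        return None  # no inspections: the rate is undefined
    defects = sum(r.count for r in rows if r.defect == 1)
    return round(100.0 * defects / total, 2)


# Made-up sample data
rows = [
    InspectionRow("D1", defect=1, count=2),
    InspectionRow("D1", defect=0, count=98),
    InspectionRow("D2", defect=0, count=50),
]

print(f"row average: {defect_rate_row_average(rows):.2f}%")  # row average: 33.33%
print(f"weighted:    {defect_rate_weighted(rows):.2f}%")     # weighted:    1.33%
```

Same data, two "defect rates" of roughly 33% and 1.3%: exactly the kind of discrepancy a single governed definition prevents.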

**The Solution:** Build a **semantic layer** once, reuse everywhere.

---

## What is Semantic Modeling?

A **semantic layer** is a business-friendly abstraction over your raw data that:

1. **Defines metrics consistently** - One source of truth for "defect_rate", "avg_temperature", etc.
2. **Powers everything** - Dashboards, Genie, BI tools, and AI agents all consume the same definitions
3. **Enables self-service** - Business users can explore without writing SQL
4. **Ensures governance** - Metrics are documented, versioned, and controlled

### Unity Catalog Metric Views

Databricks implements semantic modeling through **Metric Views**: Unity Catalog objects, defined in YAML, that specify:

- **Source**: The fact table the metrics aggregate over (e.g., `sensor_bronze`)
- **Joins**: Dimension tables attached to the source (e.g., `dim_devices`, `dim_factories`)
- **Dimensions**: How to slice data (e.g., by factory, by model, by time)
- **Measures**: Aggregations and business KPIs (e.g., `AVG(temperature)`, `COUNT(DISTINCT device_id)`, or a ratio such as `defect_rate`)

Once defined, these metrics can be queried in:
- SQL: `SELECT factory_name, MEASURE(avg_temperature) FROM metric_view WHERE factory_id = 'A06' GROUP BY factory_name`
- Dashboards: Visual drag-and-drop
- Genie: "Show me average temperature by factory"
- AI Agents: Natural language to data
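One way to picture what a metric view does when you query it: the dimensions you select become grouping keys, and each measure you request is replaced by its aggregate expression over the source. The Python sketch below models that idea with a hypothetical `MetricView` class. It is a simplified mental model for illustration, not how Databricks actually compiles metric view queries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetricView:
    """Toy model: one source table, named dimensions (row-level expressions)
    and named measures (aggregate expressions)."""
    source: str
    dimensions: dict[str, str] = field(default_factory=dict)
    measures: dict[str, str] = field(default_factory=dict)

    def to_sql(self, dims: list[str], measures: list[str]) -> str:
        """Expand a request for dimensions + measures into plain GROUP BY SQL."""
        unknown = [d for d in dims if d not in self.dimensions]
        unknown += [m for m in measures if m not in self.measures]
        if unknown:
            raise KeyError(f"unknown dimension/measure: {unknown}")
        select = [f"{self.dimensions[d]} AS {d}" for d in dims]
        select += [f"{self.measures[m]} AS {m}" for m in measures]
        sql = f"SELECT {', '.join(select)} FROM {self.source}"
        if dims:
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dims)
        return sql


sensor_view = MetricView(
    source="sensor_bronze",
    dimensions={"reading_date": "DATE(timestamp)"},
    measures={
        "avg_temperature": "AVG(temperature)",
        "device_count": "COUNT(DISTINCT device_id)",
    },
)

print(sensor_view.to_sql(["reading_date"], ["avg_temperature", "device_count"]))
```

Because every consumer goes through the same `measures` mapping, a dashboard and a Genie question asking for `avg_temperature` by date end up with the same expression.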

---

## Follow Along with the Demo

Rather than reinvent the wheel, follow Databricks' **excellent interactive demo**, which walks through creating metric views step by step.

### 🎯 Your Task

**Follow along with this demo:** [Metric Views with Unity Catalog Demo](https://www.databricks.com/resources/demos/tours/governance/metric-views-with-uc?itm_data=demo_center)

**BUT** - adapt it to **your IOT aircraft data** instead of their sample data.

### Use These Tables/Metrics:

| Demo Concept | Your IOT Tables | Your Measures / Dimensions |
|--------------|-----------------|----------------------------|
| **Source (fact table)** | `sensor_bronze` | `avg_temperature`, `avg_rotation_speed`, `device_count` |
| **Dimension Table** | `dim_factories` | `factory_name`, `region` |
| **Dimension Table** | `dim_models` | `model_family`, `model_category` |
| **Time Dimension** | `timestamp` column | Date hierarchies (day, week, month) |
| **Ratio Measure** | `inspection_gold` | `defect_rate` = `defects / total_inspections` |

### Key Concepts to Watch For

As you work through the demo, pay attention to:

1. **YAML Structure** - How the source, joins, dimensions, and measures are defined
2. **Measure Expressions** - Each measure's `expr` is an aggregate SQL expression (e.g., `SUM`, `AVG`, `COUNT(DISTINCT ...)`)
3. **Joins** - How dimension tables attach to the source table
4. **Dimension Types** - Categorical vs time-based
5. **Composability** - How complex metrics build on simpler ones
6. **Querying** - SQL with the `MEASURE()` function, plus natural language access


## Example: Complete IOT Sensor Metric View

Below is an example metric view for your aircraft IOT data, written in the Unity Catalog metric view YAML format (`version`, `source`, `joins`, `dimensions`, `measures`). Use it as a reference while following the demo, and adjust catalog, schema, table, and column names to match your environment.

**Metric view name:** `iot_sensor_metrics`

```yaml
# IOT Aircraft Sensor Metrics
# Unity Catalog Metric View Definition
# Replace my_catalog.my_schema with your own catalog and schema.

version: 0.1

# Fact table: one row per sensor reading
source: my_catalog.my_schema.sensor_bronze

joins:
  # sensor_bronze -> dim_devices (many readings per device)
  - name: devices
    source: my_catalog.my_schema.dim_devices
    on: source.device_id = devices.device_id
    # Nested (snowflake) joins: dim_devices -> dim_factories / dim_models.
    # These require a recent Databricks Runtime; check the metric view join docs.
    joins:
      - name: factories
        source: my_catalog.my_schema.dim_factories
        on: devices.factory_id = factories.factory_id
      - name: models
        source: my_catalog.my_schema.dim_models
        on: devices.model_id = models.model_id

dimensions:
  # Factory dimensions
  - name: factory_id
    expr: devices.factories.factory_id
  - name: factory_name          # full name of the factory
    expr: devices.factories.factory_name
  - name: region                # geographic region of the factory
    expr: devices.factories.region
  - name: city                  # city where the factory is located
    expr: devices.factories.city

  # Model dimensions
  - name: model_family          # product family of the aircraft model
    expr: devices.models.model_family
  - name: model_category
    expr: devices.models.model_category
  - name: model_name
    expr: devices.models.model_name

  # Device dimensions
  - name: device_id
    expr: device_id
  - name: device_status         # current device status (Active/Maintenance)
    expr: devices.status

  # Time dimensions
  - name: reading_date
    expr: DATE(timestamp)
  - name: reading_hour
    expr: DATE_TRUNC('HOUR', timestamp)
  - name: reading_week
    expr: DATE_TRUNC('WEEK', timestamp)
  - name: reading_month
    expr: DATE_TRUNC('MONTH', timestamp)

measures:
  # Sensor performance
  - name: avg_temperature       # average temperature across all readings
    expr: AVG(temperature)
  - name: avg_rotation_speed    # average rotation speed of devices
    expr: AVG(rotation_speed)
  - name: avg_air_pressure      # average air pressure reading
    expr: AVG(air_pressure)

  # Counts
  - name: device_count          # unique devices reporting data
    expr: COUNT(DISTINCT device_id)
  - name: reading_count         # total number of sensor readings
    expr: COUNT(1)
```

Each metric view aggregates over a single `source`, so the inspection metrics go in a second metric view over `inspection_gold`:

**Metric view name:** `iot_inspection_metrics`

```yaml
# IOT Inspection Metrics
# Each inspection_gold row has a defect flag and a `count` of inspections.

version: 0.1

source: my_catalog.my_schema.inspection_gold

joins:
  - name: devices
    source: my_catalog.my_schema.dim_devices
    on: source.device_id = devices.device_id
    joins:
      - name: factories
        source: my_catalog.my_schema.dim_factories
        on: devices.factory_id = factories.factory_id
      - name: models
        source: my_catalog.my_schema.dim_models
        on: devices.model_id = models.model_id

dimensions:
  - name: factory_name
    expr: devices.factories.factory_name
  - name: model_family
    expr: devices.models.model_family

measures:
  - name: defect_count          # inspections that found a defect
    expr: SUM(CASE WHEN defect = 1 THEN count ELSE 0 END)
  - name: total_inspections     # all inspections performed
    expr: SUM(count)
  - name: defect_rate           # % of inspections that found defects (NULL if none)
    expr: ROUND(100.0 * SUM(CASE WHEN defect = 1 THEN count ELSE 0 END) / NULLIF(SUM(count), 0), 2)
```

Anomaly metrics follow the same pattern: `anomaly_count` (`COUNT(1)`) belongs in a metric view whose source is `anomaly_detected`. A ratio across two fact tables, such as `anomaly_rate` (anomalies as a percentage of sensor readings), cannot be a single measure unless the tables are first combined upstream, for example by adding an anomaly flag to each sensor reading.

### How to Use This YAML

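1. **In Databricks Workspace:**
   - In **Catalog Explorer**, open your schema or source table and click **Create** > **Metric view**
   - Paste the YAML above
   - Update catalog/schema names to match your environment
   - Save as `iot_sensor_metrics`
   - Alternatively, create it in SQL with `CREATE VIEW ... WITH METRICS LANGUAGE YAML AS $$ ... $$`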

2. **Query It:**
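   ```sql
   -- Average temperature and device count by factory, last 7 days.
   -- Measures must be wrapped in MEASURE() when querying a metric view.
   SELECT
     factory_name,
     region,
     MEASURE(avg_temperature) AS avg_temperature,
     MEASURE(device_count) AS device_count
   FROM iot_sensor_metrics
   WHERE reading_date >= CURRENT_DATE() - INTERVAL 7 DAYS
   GROUP BY factory_name, region
   ORDER BY avg_temperature DESC;
   ```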

3. **Use in Genie:**
   - "Show me defect rates by model family"
   - "What's the average temperature for Factory A06 this week?"
   - "Which devices have the highest rotation speed?"
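If you prefer SQL over the UI, a metric view can also be created with `CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$ ... $$`. The Python sketch below assembles that statement from a YAML string. The helper name `metric_view_ddl`, the shortened YAML, and the `my_catalog.my_schema` names are placeholders to adapt. In a Databricks notebook you could pass the result to `spark.sql()` on a runtime or SQL warehouse that supports metric views.

```python
import textwrap


def metric_view_ddl(view_name: str, yaml_body: str) -> str:
    """Wrap a metric view YAML definition in SQL DDL
    (CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$ ... $$)."""
    body = textwrap.dedent(yaml_body).strip()
    if "$$" in body:
        # $$ delimits the YAML body, so it must not appear inside it
        raise ValueError("YAML body must not contain the $$ delimiter")
    return (
        f"CREATE OR REPLACE VIEW {view_name}\n"
        "WITH METRICS\n"
        "LANGUAGE YAML\n"
        f"AS $$\n{body}\n$$"
    )


# Shortened YAML for illustration; paste your full definition here.
yaml_body = """
    version: 0.1
    source: my_catalog.my_schema.sensor_bronze
    dimensions:
      - name: reading_date
        expr: DATE(timestamp)
    measures:
      - name: avg_temperature
        expr: AVG(temperature)
"""

ddl = metric_view_ddl("my_catalog.my_schema.iot_sensor_metrics", yaml_body)
print(ddl)

# In a Databricks notebook, `spark` is predefined, so you could then run:
# spark.sql(ddl)
```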


## Why This Matters for Your Week

By creating this semantic layer **NOW**, you've just set yourself up for success:

### ✅ Tuesday - Dashboards Deep Dive
Your dashboards will reference `iot_sensor_metrics` - all charts show consistent numbers.

### ✅ Wednesday - Genie Deep Dive
Your Genie Space will use these trusted metrics - natural language queries resolve to the same definitions as your dashboards.

### ✅ Thursday - Agent Bricks
Your AI agents will pull from the semantic layer - answers are built on governed definitions.

### ✅ Friday - Demo to Leadership
All systems use the same metrics - no discrepancies, full confidence.

---

## Next Steps

1. **Complete the demo walkthrough** using your IOT tables
2. **Create your first metric view** in your Databricks workspace
3. **Test queries** both in SQL and natural language
4. **Document your metrics** for your team

---

## Try This Out (Extended Practice)

If you finish early or want to go deeper:

### 1. Add More Complex Metrics

Create composite metrics that combine multiple measures:

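```yaml
- name: operational_health_score
  # Combined score (100 = healthiest) from temperature, defect rate, and anomaly rate.
  # Assumes all three measures live in the same metric view and that
  # defect_rate and anomaly_rate are percentages (0-100).
  expr: |
    100 - (
      (MEASURE(avg_temperature) / 100) * 30 +
      (MEASURE(defect_rate) / 100) * 50 +
      (MEASURE(anomaly_rate) / 100) * 20
    )
```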
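A quick worked check of the score's arithmetic in Python, assuming `defect_rate` and `anomaly_rate` are percentages (0-100), as in the `defect_rate` definition above, and that temperatures stay at or below 100. Dividing each rate by 100 keeps every term within its weight; without that scaling, a 2% defect rate alone would subtract 2 * 50 = 100 points. The function name and inputs are illustrative only.

```python
def operational_health_score(
    avg_temperature: float, defect_rate_pct: float, anomaly_rate_pct: float
) -> float:
    """Weighted health score, 100 = healthiest.

    Each input is scaled to a 0-1 fraction before weighting (30/50/20),
    so the penalty stays within 0-100 when avg_temperature <= 100 and
    both rates are percentages in 0-100.
    """
    penalty = (
        (avg_temperature / 100) * 30
        + (defect_rate_pct / 100) * 50
        + (anomaly_rate_pct / 100) * 20
    )
    return 100 - penalty


# Illustrative inputs (not real data): 80 degrees, 2% defects, 5% anomalies
print(round(operational_health_score(80.0, 2.0, 5.0), 2))  # 74.0
```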

### 2. Create Device-Level Metrics

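Add device-level attributes. Days since installation is a per-row value rather than an aggregate, so it belongs under `dimensions` (a measure needs an aggregate such as `AVG` or `MAX`):

```yaml
# Assumes dim_devices is joined under the name `devices`
- name: device_age_days
  expr: DATEDIFF(CURRENT_DATE(), devices.installation_date)
```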

### 3. Add Conditional Metrics

Create metrics with business logic:

```yaml
- name: high_temp_alerts   # count of readings exceeding 95 degrees
  expr: COUNT(CASE WHEN temperature > 95 THEN 1 END)
```
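Why `COUNT(CASE WHEN ... THEN 1 END)` works: a `CASE` without `ELSE` returns `NULL` for non-matching rows, and `COUNT(expr)` skips `NULL`s. The Python sketch below checks this with an in-memory SQLite table standing in for `sensor_bronze` (made-up readings). SQLite follows the same standard SQL semantics for `COUNT` and `CASE` here, and the equivalent `SUM(... ELSE 0 END)` form gives the same count.

```python
import sqlite3

# In-memory table standing in for sensor_bronze (made-up readings)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_bronze (device_id TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO sensor_bronze VALUES (?, ?)",
    [("D1", 90.0), ("D1", 97.5), ("D2", 101.0), ("D2", 95.0), ("D3", None)],
)

# Non-matching rows (including 95.0 and the NULL reading) yield NULL in the
# first CASE, which COUNT ignores; the SUM form maps them to 0 instead.
high_temp_alerts, via_sum = conn.execute(
    """
    SELECT
      COUNT(CASE WHEN temperature > 95 THEN 1 END),
      SUM(CASE WHEN temperature > 95 THEN 1 ELSE 0 END)
    FROM sensor_bronze
    """
).fetchone()
print(high_temp_alerts, via_sum)  # 2 2
conn.close()
```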

### 4. Explore Time Intelligence

Add period-over-period comparisons:

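```sql
-- Week-over-week temperature change per factory.
-- Aggregate with MEASURE() first, then apply the window function.
WITH weekly AS (
  SELECT
    factory_name,
    reading_week,
    MEASURE(avg_temperature) AS avg_temperature
  FROM iot_sensor_metrics
  GROUP BY factory_name, reading_week
)
SELECT
  factory_name,
  reading_week,
  avg_temperature,
  avg_temperature - LAG(avg_temperature) OVER (
    PARTITION BY factory_name
    ORDER BY reading_week
  ) AS temp_change
FROM weekly
ORDER BY factory_name, reading_week;
```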
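To sanity-check the `LAG` logic without a workspace, the Python sketch below runs the same window expression in an in-memory SQLite database (window functions need SQLite 3.25 or later). The `weekly` table stands in for the per-factory weekly averages, with made-up values and weeks stored as ISO date strings so they sort chronologically. The first week of each factory has no predecessor, so its `temp_change` is `NULL` (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the weekly aggregates: (factory_name, reading_week, avg_temperature)
conn.execute(
    "CREATE TABLE weekly (factory_name TEXT, reading_week TEXT, avg_temperature REAL)"
)
conn.executemany(
    "INSERT INTO weekly VALUES (?, ?, ?)",
    [
        ("A06", "2024-01-01", 80.0),
        ("A06", "2024-01-08", 83.5),
        ("A06", "2024-01-15", 82.0),
        ("B02", "2024-01-01", 70.0),
        ("B02", "2024-01-08", 71.0),
    ],
)

rows = conn.execute(
    """
    SELECT factory_name, reading_week, avg_temperature,
           avg_temperature - LAG(avg_temperature) OVER (
             PARTITION BY factory_name ORDER BY reading_week
           ) AS temp_change
    FROM weekly
    ORDER BY factory_name, reading_week
    """
).fetchall()
for row in rows:
    print(row)
conn.close()
```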

### 5. Test Different Aggregation Levels

Query the same metrics at different granularities:

```sql
-- Factory level
SELECT factory_name, MEASURE(avg_temperature) AS avg_temperature, MEASURE(device_count) AS device_count
FROM iot_sensor_metrics
GROUP BY factory_name;

-- Model level
SELECT model_family, MEASURE(avg_temperature) AS avg_temperature, MEASURE(device_count) AS device_count
FROM iot_sensor_metrics
GROUP BY model_family;

-- Time level
SELECT reading_date, MEASURE(avg_temperature) AS avg_temperature, MEASURE(device_count) AS device_count
FROM iot_sensor_metrics
GROUP BY reading_date
ORDER BY reading_date DESC;
```
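These queries give correct results at every grain because each measure is stored as an aggregate expression and recomputed from the underlying rows, rather than rolled up from a finer grain. The Python sketch below (made-up readings, hypothetical helper `avg_temperature_by`) shows why that matters: averaging the factory-level averages does not reproduce the overall average.

```python
from __future__ import annotations

from statistics import mean

# Made-up readings: (factory, model_family, temperature)
readings = [
    ("A06", "Jet", 80.0),
    ("A06", "Jet", 82.0),
    ("A06", "Prop", 90.0),
    ("B02", "Jet", 70.0),
]


def avg_temperature_by(key_index: int) -> dict[str, float]:
    """Recompute AVG(temperature) from raw rows at the requested grain."""
    groups: dict[str, list[float]] = {}
    for row in readings:
        groups.setdefault(row[key_index], []).append(row[2])
    return {key: mean(values) for key, values in groups.items()}


by_factory = avg_temperature_by(0)
overall = mean(r[2] for r in readings)            # recomputed from all rows
avg_of_factory_avgs = mean(by_factory.values())   # rolled up from a finer grain

print(by_factory)           # {'A06': 84.0, 'B02': 70.0}
print(overall)              # 80.5
print(avg_of_factory_avgs)  # 77.0
```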


## Key Takeaways

✅ **Semantic layers ensure consistency** - One source of truth for all metrics  
✅ **Metric views are reusable** - Power dashboards, Genie, AI agents, and more  
✅ **YAML is declarative** - Define once, query anywhere  
✅ **Dimensions enable slicing** - Analyze by factory, model, time, etc.  
✅ **Composability is powerful** - Build complex metrics from simple ones  

---

## Resources

- **Demo Center:** [Metric Views with Unity Catalog](https://www.databricks.com/resources/demos/tours/governance/metric-views-with-uc?itm_data=demo_center)
- **Documentation:** [Unity Catalog Metric Views](https://docs.databricks.com/aws/en/metric-views/)
- **Best Practices:** [Semantic Layer Design Patterns](https://docs.databricks.com/aws/en/metric-views/best-practices.html)
- **Video:** [Databricks on YouTube](https://www.youtube.com/databricks)

---

**Next Notebook:** Day 2.2 - Dashboards Deep Dive (where you'll use these metrics!)
