# Semantic Modeling with Unity Catalog Metric Views

**Semantic modeling** enables you to define reusable business metrics and dimensions once, ensuring consistency across all analytics and dashboards. Unity Catalog **Metric Views** provide a declarative way to define your semantic layer using YAML.

## What You'll Learn

✅ Create semantic models with metric views  
✅ Use Databricks Assistant to generate YAML definitions  
✅ Define metrics, dimensions, and measures  
✅ Implement composability and reusability  
✅ Query semantic models with natural language  

---

## Why Semantic Modeling?

**Problem without semantic modeling:**
- Same metrics calculated differently across teams
- Complex SQL repeated in every dashboard
- Inconsistent business definitions
- Hard to maintain and update

**Solution with semantic modeling:**
- Single source of truth for metrics
- Reusable business logic
- Natural language queries
- Automatic joins and aggregations

---

## Use Case: IoT Semantic Layer

We'll create a semantic layer for our IoT dataset that:
- Defines standard metrics (avg temperature, device count, uptime %)
- Creates reusable dimensions (factory, model, time)
- Enables natural language queries
- Ensures metric consistency across dashboards

---

## Table of Contents

1. [Understanding Metric Views](#understanding)
2. [Using Databricks Assistant](#assistant)
3. [Creating Your First Metric View](#first-metric)
4. [Defining Metrics and Measures](#metrics)
5. [Adding Dimensions](#dimensions)
6. [Composability and Relationships](#composability)
7. [Querying Metric Views](#querying)
8. [Best Practices](#best-practices)

---

**References:**
- [Demo: Metric Views with UC](https://www.databricks.com/resources/demos/tours/governance/metric-views-with-uc)
- [Create Metric Views UI](https://docs.databricks.com/aws/en/metric-views/create/ui)
- [Metric Views Documentation](https://docs.databricks.com/aws/en/metric-views/)
- [Data Modeling](https://docs.databricks.com/aws/en/metric-views/data-modeling/)
- [Composability](https://docs.databricks.com/aws/en/metric-views/data-modeling/composability)

## 1. Understanding Metric Views <a id="understanding"></a>

### What are Metric Views?

Metric views are Unity Catalog objects, defined in YAML, that specify:
- **Source**: The table, view, or SQL query the metrics are computed from
- **Dimensions**: Attributes for slicing data (time, location, category)
- **Measures**: Aggregate expressions that produce business KPIs (revenue, count, average)
- **Joins**: How the source connects to related dimension tables

### Key Concepts

**Semantic Layer Benefits:**
- **Consistency**: One definition, used everywhere
- **Discoverability**: Business users can find metrics easily
- **Governance**: Control who can see and use metrics
- **Performance**: Aggregation runs in the SQL engine at query time, so only the grain a query asks for is computed
- **Natural Language**: Query with plain English

**Architecture:**

```
Physical Tables (Unity Catalog)
         ↓
Metric Views (YAML Definitions)
         ↓
Dashboards / Genie / SQL Queries
```

### Metric View Components

**1. Source**: The table, view, or query that supplies the rows
```yaml
source: default.db_crash_course.sensor_enriched
```

**2. Measures**: Aggregate expressions over source columns
```yaml
measures:
  - name: avg_temperature
    expr: AVG(temperature)
```

**3. Dimensions**: Attributes for grouping and filtering
```yaml
dimensions:
  - name: factory_name
    expr: factory_name
```

**4. Joins** (optional): Related tables whose columns can be used in dimensions and measures
```yaml
joins:
  - name: factories
    source: default.db_crash_course.dim_factories
    on: source.factory_id = factories.factory_id
```
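A metric view definition boils down to a source plus named dimension and measure expressions, so it can be useful to assemble the YAML from plain Python data. The sketch below is only an illustration: `render_metric_view_yaml` is a hypothetical helper, not a Databricks API, and it assumes expressions that need no YAML quoting.

```python
def render_metric_view_yaml(source, dimensions, measures, version="0.1"):
    """Render a minimal metric view definition as YAML text.

    dimensions and measures are dicts mapping a name to a SQL expression.
    Hypothetical helper: expressions are written as-is, without YAML quoting.
    """
    lines = [f"version: {version}", f"source: {source}", "", "dimensions:"]
    for name, expr in dimensions.items():
        lines += [f"  - name: {name}", f"    expr: {expr}"]
    lines += ["", "measures:"]
    for name, expr in measures.items():
        lines += [f"  - name: {name}", f"    expr: {expr}"]
    return "\n".join(lines) + "\n"


definition = render_metric_view_yaml(
    source="default.db_crash_course.sensor_enriched",
    dimensions={"factory_name": "factory_name"},
    measures={"avg_temperature": "AVG(temperature)"},
)
print(definition)
```

Keeping the names and expressions in dictionaries makes it easy to review them in one place before pasting the rendered text into the metric view editor.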

## 2. Using Databricks Assistant <a id="assistant"></a>

### AI-Assisted Semantic Modeling

**Databricks Assistant** can automatically generate metric view YAML definitions from natural language descriptions or existing tables.

**This is the recommended approach** - let AI do the heavy lifting instead of manually typing YAML!

### How to Use the Assistant

**Step 1: Open Catalog Explorer**
1. Click **Catalog** in the sidebar
2. Navigate to your schema: `default.db_crash_course`
3. Click **Create** → **Metric View**

**Step 2: Use AI to Generate Definition**

In the metric view editor, you can:

**Option A: Ask Assistant to Generate from Table**
```
Prompt: "Create a metric view for the sensor_enriched table with metrics 
for average temperature, device count, and total readings. Include dimensions 
for factory, model, and date."
```

**Option B: Describe Your Business Metrics**
```
Prompt: "I need a semantic model for IoT sensor analytics. 
Key metrics: average temperature by factory, device uptime percentage, 
anomaly rate. Dimensions: factory location, device model, time (daily/weekly)."
```

**Option C: Generate from Example Query**
```
Prompt: "Convert this SQL to a metric view:

SELECT 
  factory_name,
  AVG(temperature) as avg_temp,
  COUNT(DISTINCT device_id) as device_count
FROM sensor_enriched
GROUP BY factory_name
"
```

### Benefits of AI Generation

✅ **Fast**: Generate complete YAML in seconds  
✅ **Schema-aware**: Drafts are based on actual table schemas, though column names and aggregations should still be reviewed  
✅ **Learning**: See best practices in generated code  
✅ **Iterative**: Refine with follow-up prompts  

### Assistant Best Practices

**Be specific with:**
- Table names and columns
- Aggregation types (SUM, AVG, COUNT)
- Dimension granularity
- Business context

**Example good prompt:**
```
"Create a metric view for factory performance analysis using the 
sensor_enriched table. Include:
- Metrics: average temperature (AVG), max temperature, device count (COUNT DISTINCT)
- Dimensions: factory_name, region, model_category
- Time dimension: by day from timestamp column
- Calculate temperature_celsius = (temperature - 32) * 5/9 as a derived dimension"
```
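If you write many such prompts, the same ingredients (table, aggregations, dimensions, time grain) can be assembled programmatically. This is a minimal sketch; `build_metric_view_prompt` and its parameters are hypothetical names chosen for illustration, and the wording of the prompt is only one possibility.

```python
def build_metric_view_prompt(table, measures, dimensions, time_column=None, grain="day"):
    """Assemble a specific prompt from a table, measures, and dimensions.

    measures is a list of (name, aggregation) pairs. Hypothetical helper.
    """
    lines = [f"Create a metric view using the {table} table. Include:"]
    lines.append("- Measures: " + ", ".join(f"{name} ({agg})" for name, agg in measures))
    lines.append("- Dimensions: " + ", ".join(dimensions))
    if time_column:
        lines.append(f"- Time dimension: by {grain} from the {time_column} column")
    return "\n".join(lines)


prompt = build_metric_view_prompt(
    table="sensor_enriched",
    measures=[("average temperature", "AVG"), ("device count", "COUNT DISTINCT")],
    dimensions=["factory_name", "region", "model_category"],
    time_column="timestamp",
)
print(prompt)
```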

## 3. Creating Your First Metric View <a id="first-metric"></a>

### Example: IoT Sensor Metrics

Here's a complete example of a metric view for our IoT dataset.

**In practice, use Databricks Assistant to generate this** - this is shown for educational purposes.

```python
# Configuration
CATALOG = "default"
SCHEMA = "db_crash_course"

# Example YAML structure for a metric view
# In practice: Use Databricks Assistant in Catalog Explorer to generate this
# The view name (iot_sensor_metrics) is given when the view is created,
# not inside the YAML body.

yaml_example = f"""
version: 0.1
# Semantic model for IoT sensor analytics
source: {CATALOG}.{SCHEMA}.sensor_enriched

dimensions:
  - name: factory_name
    expr: factory_name
  - name: region
    expr: region
  - name: model_category
    expr: model_category
  - name: reading_date
    expr: DATE(timestamp)

measures:
  # Total number of sensor readings
  - name: total_readings
    expr: COUNT(*)
  # Average temperature across all sensors
  - name: avg_temperature
    expr: AVG(temperature)
  - name: max_temperature
    expr: MAX(temperature)
  # Number of distinct active devices
  - name: device_count
    expr: COUNT(DISTINCT device_id)
"""

print("Basic Metric View Structure:")
print(yaml_example)
print("\n✅ Use Databricks Assistant to generate this automatically!")
```
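Besides the Catalog Explorer UI, a metric view can also be registered from SQL by wrapping the YAML body in a `CREATE VIEW ... WITH METRICS LANGUAGE YAML` statement. The sketch below only builds that statement as a string; `create_metric_view_sql` is a hypothetical helper, and the statement shape reflects the documented syntax as best understood here, so check it against the Metric Views documentation for your runtime before running it.

```python
def create_metric_view_sql(full_name, yaml_body):
    """Wrap a YAML definition in a CREATE VIEW ... WITH METRICS statement."""
    if "$$" in yaml_body:
        # The YAML is delimited by $$, so the body must not contain it.
        raise ValueError("YAML body must not contain the $$ delimiter")
    return (
        f"CREATE OR REPLACE VIEW {full_name}\n"
        "WITH METRICS\n"
        "LANGUAGE YAML\n"
        "AS $$\n"
        f"{yaml_body.strip()}\n"
        "$$"
    )


yaml_body = """
version: 0.1
source: default.db_crash_course.sensor_enriched
dimensions:
  - name: factory_name
    expr: factory_name
measures:
  - name: avg_temperature
    expr: AVG(temperature)
"""

ddl = create_metric_view_sql("default.db_crash_course.iot_sensor_metrics", yaml_body)
print(ddl)

# In a Databricks notebook, where `spark` is predefined, this could be run with:
# spark.sql(ddl)
```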


## 4. Defining Metrics and Measures <a id="metrics"></a>

### Types of Metrics

**1. Simple Metrics**: Measures that aggregate source columns directly
```yaml
measures:
  - name: avg_temperature
    expr: AVG(temperature)
```

**2. Derived Metrics**: Measures calculated from other measures, referenced with `MEASURE()`
```yaml
measures:
  - name: anomaly_rate
    # Percentage of readings that are anomalies
    expr: MEASURE(anomaly_count) / MEASURE(total_readings) * 100
```

**3. Ratio Metrics**: Proportions and rates, written as one aggregate divided by another
```yaml
measures:
  - name: critical_reading_rate
    expr: COUNT(CASE WHEN temperature > 85 THEN 1 END) / COUNT(*)
```

### Common Measure Patterns

```yaml
measures:
  # Counts
  - name: row_count
    expr: COUNT(*)
  - name: unique_devices
    expr: COUNT(DISTINCT device_id)
  
  # Aggregations
  - name: avg_temp
    expr: AVG(temperature)
  - name: max_temp
    expr: MAX(temperature)
  
  # Conditional
  - name: critical_count
    expr: COUNT(CASE WHEN temperature > 85 THEN 1 END)
```
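To make the semantics of these measure patterns concrete, the following sketch evaluates them in plain Python over a few made-up readings. The sample rows and the `evaluate_measures` helper are assumptions for illustration only. It also shows why a rate should be defined as a ratio of aggregates: re-evaluating the measures per group gives a different answer than averaging the per-group rates.

```python
# Made-up sample rows standing in for the sensor table.
readings = [
    {"factory": "A", "device_id": "d1", "temperature": 70.0},
    {"factory": "A", "device_id": "d1", "temperature": 90.0},
    {"factory": "A", "device_id": "d2", "temperature": 80.0},
    {"factory": "B", "device_id": "d3", "temperature": 88.0},
]


def evaluate_measures(rows):
    """Evaluate the measure patterns over a non-empty group of rows."""
    temps = [r["temperature"] for r in rows]
    row_count = len(rows)                               # COUNT(*)
    critical_count = sum(1 for t in temps if t > 85)    # COUNT(CASE WHEN ... END)
    return {
        "row_count": row_count,
        "unique_devices": len({r["device_id"] for r in rows}),  # COUNT(DISTINCT)
        "avg_temp": sum(temps) / row_count,             # AVG(temperature)
        "max_temp": max(temps),                         # MAX(temperature)
        "critical_count": critical_count,
        "critical_reading_rate": critical_count / row_count,
    }


overall = evaluate_measures(readings)
by_factory = {
    f: evaluate_measures([r for r in readings if r["factory"] == f])
    for f in sorted({r["factory"] for r in readings})
}

# Averaging per-factory rates weights each factory equally, not each reading.
mean_of_rates = sum(m["critical_reading_rate"] for m in by_factory.values()) / len(by_factory)
print(overall["critical_reading_rate"], mean_of_rates)
```

The overall rate counts every reading once, while the mean of the per-factory rates lets the single-reading factory dominate, which is the kind of inconsistency a shared measure definition avoids.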

## 5. Adding Dimensions <a id="dimensions"></a>

### Types of Dimensions

**Categorical Dimensions:**
```yaml
dimensions:
  - name: factory_name
    expr: factory_name
  - name: temperature_zone
    expr: |
      CASE 
        WHEN temperature > 85 THEN 'Critical'
        WHEN temperature > 75 THEN 'Warning'
        ELSE 'Normal'
      END
```

**Time Dimensions:** (define one dimension per granularity you want to group by)
```yaml
dimensions:
  - name: reading_date
    expr: DATE(timestamp)
  - name: reading_month
    expr: DATE_TRUNC('month', timestamp)
```
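A dimension expression is evaluated once per row, before any aggregation. As a rough illustration, the categorical and time expressions above correspond to the following plain-Python functions; the function names are chosen here to mirror the dimension names and are not part of any API.

```python
from datetime import date, datetime


def temperature_zone(temperature):
    """Mirror of the CASE expression: the first matching branch wins."""
    if temperature > 85:
        return "Critical"
    if temperature > 75:
        return "Warning"
    return "Normal"


def reading_date(ts: datetime) -> date:
    """Equivalent of DATE(timestamp): drop the time of day."""
    return ts.date()


def reading_month(ts: datetime) -> datetime:
    """Equivalent of DATE_TRUNC('month', timestamp): first instant of the month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


ts = datetime(2025, 1, 15, 13, 45, 30)
print(temperature_zone(86.0), reading_date(ts), reading_month(ts))
```

Note that the comparisons are strict, so a reading of exactly 85 falls in the "Warning" zone and exactly 75 in "Normal".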

**Hierarchical Dimensions:** (each level is its own dimension; drill down by grouping on more levels)
```yaml
dimensions:
  # Geographic hierarchy, coarsest to finest
  - name: region
    expr: region
  - name: state
    expr: state
  - name: city
    expr: city
  - name: factory_name
    expr: factory_name
```


## 6. Composability and Relationships <a id="composability"></a>

### Entity Relationships

Define how the source connects to dimension tables so that joins happen automatically:

```yaml
source: default.db_crash_course.sensor_enriched

joins:
  # Many sensor readings map to one factory row
  - name: factories
    source: default.db_crash_course.dim_factories
    on: source.factory_id = factories.factory_id

dimensions:
  # Columns of a joined table are referenced through the join name
  - name: factory_id
    expr: factories.factory_id
```
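Conceptually, a many-to-one relationship means each reading looks up exactly one factory row before measures are grouped. The sketch below imitates that with a dictionary lookup; the sample data, the factory names, and the choice to keep unmatched readings under `None` (left-join style) are assumptions made for illustration.

```python
from collections import defaultdict

# Made-up fact rows: many readings per factory.
sensor_readings = [
    {"device_id": "d1", "factory_id": 1, "temperature": 70.0},
    {"device_id": "d2", "factory_id": 1, "temperature": 80.0},
    {"device_id": "d3", "factory_id": 2, "temperature": 90.0},
]
# Made-up dimension table keyed by its primary key (hypothetical names).
factories = {
    1: {"factory_name": "Plant North"},
    2: {"factory_name": "Plant South"},
}


def avg_temperature_by_factory(readings, factories):
    """Join each reading to its factory, then average per factory name."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum, count]
    for r in readings:
        factory = factories.get(r["factory_id"])
        # Assumption: unmatched readings are kept and grouped under None.
        name = factory["factory_name"] if factory else None
        totals[name][0] += r["temperature"]
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


result = avg_temperature_by_factory(sensor_readings, factories)
print(result)
```

Because the lookup is keyed on the dimension table's primary key, no reading is duplicated by the join, which is what keeps counts and averages correct.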

### Composable Metrics

Build complex metrics from simpler ones:

```yaml
measures:
  # Base measures
  - name: total_readings
    expr: COUNT(*)
  - name: anomaly_count
    # Assumes a boolean is_anomaly column on the source
    expr: COUNT(CASE WHEN is_anomaly THEN 1 END)

  # Composed from base measures
  - name: anomaly_rate
    expr: MEASURE(anomaly_count) / MEASURE(total_readings) * 100
```

**Benefits:**
- Define once, reuse everywhere
- Update in one place propagates to all consumers
- Build sophisticated metrics incrementally
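When metrics are composed from other metrics, a missing or misspelled base name is an easy mistake to make. A small check like the one below can catch it before the definition is registered. This is a sketch under assumptions: it presumes composed measures reference their inputs as `MEASURE(name)` with simple identifier names, and `undefined_measure_refs` is a hypothetical helper.

```python
import re

# Matches MEASURE(name) or MEASURE(`name`) for simple identifier names.
MEASURE_REF = re.compile(r"MEASURE\(\s*`?(\w+)`?\s*\)", re.IGNORECASE)


def undefined_measure_refs(measures):
    """Return {measure_name: [missing references]} for a dict of name -> expr."""
    problems = {}
    for name, expr in measures.items():
        missing = [ref for ref in MEASURE_REF.findall(expr) if ref not in measures]
        if missing:
            problems[name] = missing
    return problems


measures = {
    "total_readings": "COUNT(*)",
    "anomaly_rate": "MEASURE(anomaly_count) / MEASURE(total_readings) * 100",
}
print(undefined_measure_refs(measures))  # anomaly_count is not defined yet

measures["anomaly_count"] = "COUNT(CASE WHEN is_anomaly THEN 1 END)"
print(undefined_measure_refs(measures))
```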

## 7. Querying Metric Views <a id="querying"></a>

### SQL Queries

Once created, query a metric view with SQL. Wrap every measure in `MEASURE()` and group by the dimensions you select; `SELECT *` is not supported:

```python
# Query metric views with SQL
# Measures are referenced through MEASURE(); the view applies the aggregation and any joins

query_examples = """
-- Simple metric query (no dimensions: one row for the whole dataset)
SELECT 
  MEASURE(avg_temperature) AS average_temperature,
  MEASURE(device_count) AS active_devices,
  MEASURE(total_readings) AS total_sensor_readings
FROM default.db_crash_course.iot_sensor_metrics;

-- Slice by dimension
SELECT 
  factory_name,
  MEASURE(avg_temperature) AS average_temperature,
  MEASURE(device_count) AS active_devices
FROM default.db_crash_course.iot_sensor_metrics
WHERE region = 'West'
GROUP BY factory_name
ORDER BY average_temperature DESC;

-- Time series
SELECT 
  reading_date,
  factory_name,
  MEASURE(avg_temperature) AS average_temperature
FROM default.db_crash_course.iot_sensor_metrics
WHERE reading_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY reading_date, factory_name
ORDER BY reading_date, factory_name;
"""

print("SQL Query Examples:")
print(query_examples)
print("\n✅ Aggregations and joins are handled by the semantic layer!")

# Uncomment to run an actual query after creating the metric view
# spark.sql("SELECT factory_name, MEASURE(avg_temperature) AS average_temperature FROM default.db_crash_course.iot_sensor_metrics GROUP BY factory_name LIMIT 10").display()
```
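Queries against a metric view follow a regular shape: dimensions are selected and grouped, and each measure is wrapped in `MEASURE()`. The helper below sketches how such a query could be generated; `build_metric_query` is a hypothetical function, and it interpolates its arguments directly into SQL, so it should only be fed trusted identifiers and conditions.

```python
def build_metric_query(view, measures, dimensions=(), where=None):
    """Build a SELECT over a metric view: dimensions grouped, measures wrapped."""
    select_items = list(dimensions) + [f"MEASURE({m})" for m in measures]
    lines = ["SELECT", "  " + ",\n  ".join(select_items), f"FROM {view}"]
    if where:
        lines.append(f"WHERE {where}")
    if dimensions:
        # Every selected dimension must appear in GROUP BY.
        lines.append("GROUP BY " + ", ".join(dimensions))
    return "\n".join(lines)


query = build_metric_query(
    "default.db_crash_course.iot_sensor_metrics",
    measures=["avg_temperature", "device_count"],
    dimensions=["factory_name"],
    where="region = 'West'",
)
print(query)
```

With no dimensions, the helper omits `GROUP BY` and the query returns a single row of overall measure values.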


## 8. Best Practices <a id="best-practices"></a>

### Naming Conventions

**Good Examples:**
- `average_temperature_fahrenheit`
- `total_sensor_readings`
- `active_device_count`
- `anomaly_detection_rate`

**Avoid:**
- `temp` (too vague)
- `total` (what total?)
- `count` (count of what?)
- `rate` (which rate?)
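Naming rules like these are easy to enforce with a small check in code review or CI. The sketch below flags the vague names listed above and, as an added assumption based on the good examples, also expects snake_case; `name_issues` is a hypothetical helper.

```python
import re

# The vague names called out in the list above.
VAGUE_NAMES = {"temp", "total", "count", "rate"}
# Assumption: names should be snake_case, as in the good examples.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def name_issues(name):
    """Return a list of naming problems for a metric or dimension name."""
    issues = []
    if not SNAKE_CASE.match(name):
        issues.append("not snake_case")
    if name in VAGUE_NAMES:
        issues.append("too vague: say what is being measured")
    return issues


for candidate in ["average_temperature_fahrenheit", "temp", "Total Readings"]:
    print(candidate, name_issues(candidate))
```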

### Performance Optimization

1. **Use Pre-Aggregated Tables**
   - Create gold layer aggregates
   - Point metric views to pre-aggregated data

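2. **Partition or Cluster Large Source Tables**
   - Data layout is a property of the source table, not of the metric view
   - Point `source` at a table that is partitioned or clustered on the columns you filter by most often

3. **Define Filters Early**
   ```yaml
   source: default.db_crash_course.sensor_enriched
   # Applied to every query against the metric view
   filter: DATE(timestamp) >= CURRENT_DATE - INTERVAL 90 DAYS
   ```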

### Governance

1. **Document Everything**
   ```yaml
   measures:
     - name: avg_temperature
       expr: AVG(temperature)
       comment: |
         Average temperature in Fahrenheit.
         Calculation: AVG(temperature)
         Business Owner: Operations Team
         Last Updated: 2025-01-15
   ```

2. **Version Control**
   - Store YAML in Git
   - Use semantic versioning
   - Document changes

3. **Access Control**
   - Grant appropriate permissions via Unity Catalog
   - Limit write access to definitions

## Summary

In this notebook, you learned:

✅ **Semantic modeling** - Create reusable business metrics  
✅ **Metric views** - Unity Catalog objects for consistent definitions  
✅ **Databricks Assistant** - AI-generated YAML definitions  
✅ **Metrics and measures** - Building blocks of semantic layer  
✅ **Dimensions** - Categorical, time-based, hierarchical  
✅ **Composability** - Build complex from simple metrics  
✅ **Querying** - SQL and natural language access  

### Key Takeaways:

1. **Use Databricks Assistant** - Don't manually type YAML
2. **Single source of truth** - Define metrics once, use everywhere
3. **Composable** - Build complex metrics from simple ones
4. **Automatic joins** - Define relationships once
5. **Natural language** - Enable business user self-service

### Next Steps:

- Use Assistant to create your first metric view
- Query it with SQL and Genie
- Build dashboards using metric views
- Implement hierarchical dimensions

---

**Additional Resources:**
- [Metric Views Demo](https://www.databricks.com/resources/demos/tours/governance/metric-views-with-uc)
- [Create Metric Views](https://docs.databricks.com/aws/en/metric-views/create/ui)
- [Data Modeling](https://docs.databricks.com/aws/en/metric-views/data-modeling/)
- [Composability](https://docs.databricks.com/aws/en/metric-views/data-modeling/composability)
