# Genie Deep Dive: Making Your Semantic Model Conversational

Now that you've created a semantic model with metric views, it's time to make it conversational with Genie. Leadership wants you to enable natural language queries so anyone can ask questions about the IoT data without writing SQL.

## What You'll Learn

✅ Create a Genie Space for your semantic model  
✅ Add instructions to guide query generation  
✅ Define filters for common questions  
✅ Test with a benchmark query  

---

## The Scenario

Your semantic model (`sensor_metrics`) is ready, but the operations team doesn't know SQL. They need to ask questions like:
- "What's the average temperature in the West region?"
- "Show me devices with high anomaly rates"
- "Which factory has the most critical readings?"

Genie will translate these natural language questions into queries against your semantic model.

---

## What is Genie?

**Genie** is Databricks' AI-powered natural language interface for querying data. It uses your semantic model and additional context to generate accurate SQL queries.

**Key Components for This Exercise:**
- **Genie Space**: Container pointing to your metric view
- **Instructions**: Business context and terminology
- **Filters**: Pre-defined query shortcuts
- **Benchmark**: Test question to verify accuracy

---

**References:**
- [Genie Setup](https://docs.databricks.com/aws/en/genie/set-up)
- [Best Practices](https://docs.databricks.com/aws/en/genie/best-practices)

## Step 1: Create Your Genie Space  

### What is a Genie Space?

A **Genie Space** is a conversational AI interface built on top of your data. It combines your semantic model (metric views) with business context to enable natural language querying. Think of it as creating a chatbot specifically trained on your IoT sensor data.

### Create a Semantic Model Genie

Let's build a Genie Space that connects to the metric view you created in the previous notebook.

**In the Databricks UI:**

1. **Open Genie**
   - Click **Genie** in the left sidebar

2. **Start a new space**
   - Click **New** to create a new Genie Space

3. **Select your data source**
   - Choose **Metric views** as the data source type
   - Browse to and select: `dwx_airops_insights_platform_dev_working.{username}.sensor_metrics`
     - This is the metric view you created in Notebook 1
     - Replace `{username}` with your actual schema name

4. **Create the space**
   - Give your space a name (e.g., "IoT Sensor Analysis")
   - Click **Create**

### Why Use the Metric View as Your Foundation?

Using your metric view instead of raw tables provides several advantages:

- ✅ **Pre-defined metrics**: Genie already understands "Average Airflow Rate" and "Max Temperature"
- ✅ **Consistent business logic**: Everyone gets the same calculations
- ✅ **Faster queries**: Metric views are optimized for aggregations
- ✅ **Better accuracy**: Clear metric definitions reduce ambiguity

**What's next:** Now that your Genie Space is created, we'll teach it about your business context and terminology.

---

**💡 Tip**: Keep it simple at first. Start with just the metric view - you can always add more tables later as needs evolve.

## Step 2: Add Instructions and Business Context

### Why Instructions Matter

While Genie can understand SQL and your table schemas, it doesn't automatically know your business rules, thresholds, or domain-specific terminology. **Instructions** bridge this gap by teaching Genie about your specific business context.

**Think of instructions as:**
- A training manual for Genie
- Business rules and definitions
- Domain-specific terminology
- Guardrails for query generation

### Add Business Context to Your Genie Space

Let's teach Genie about aircraft sensor thresholds and operating ranges so it can answer questions accurately.

**In your Genie Space:**

1. **Navigate to Instructions**
   - Click the **Instructions** tab at the top of your Genie Space

2. **Add text instructions**
   - In the instructions editor, add the following context:

```
You are helping analyze data from aircraft sensors deployed across models.

### Business Rules
- Critical threshold: Temperature > 85°F
- Normal operating range: 60-75°F
```

3. **Save your instructions**
   - Click **Save** to apply the business context

### What This Accomplishes

With these instructions, Genie now understands:

- ✅ **Context**: This is aircraft sensor data, not generic IoT data
- ✅ **Critical thresholds**: What "critical" or "at-risk" means for temperature
- ✅ **Normal ranges**: What constitutes normal vs. abnormal readings

**Example impact:** When a user asks "Show me critical devices," Genie will automatically filter for temperature > 85°F without the user needing to specify the threshold.
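
To make the business rules concrete, here is a minimal Python sketch of how the two thresholds partition a reading. It is only an illustration of the rules in the instructions text, not how Genie evaluates them internally. The function name `classify_temperature` and the label `"outside normal range"` are assumptions; the instructions define only "critical" and "normal".

```python
# Thresholds copied from the instructions text (degrees Fahrenheit).
CRITICAL_ABOVE_F = 85.0
NORMAL_RANGE_F = (60.0, 75.0)


def classify_temperature(temp_f: float) -> str:
    """Label a reading using the business rules from the instructions.

    The instructions define only "critical" and "normal"; the label
    "outside normal range" for everything else is an assumption.
    """
    if temp_f > CRITICAL_ABOVE_F:
        return "critical"
    low, high = NORMAL_RANGE_F
    if low <= temp_f <= high:
        return "normal"
    return "outside normal range"


for reading in (68.0, 80.0, 85.0, 91.5):
    print(reading, classify_temperature(reading))
```

Note that a reading of exactly 85°F is not critical under a strict `>` comparison. If your team treats 85°F as critical, say so explicitly in the instructions.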

---

**💡 Best Practice**: Keep instructions concise and focused. Too much information can confuse Genie. Add the most important business rules first, then iterate based on actual usage.

## Step 3: Define Example SQL Queries

### Why Provide Example Queries?

Example SQL queries serve as **templates** that teach Genie how to write queries for common business questions. When users ask questions similar to your examples, Genie can adapt the pattern to generate accurate SQL.

**Benefits of example queries:**
- Establishes patterns for complex queries
- Defines business terminology in SQL terms
- Improves accuracy for domain-specific questions
- Creates reusable query templates

### Add a Business-Specific Query Example

Let's teach Genie what "at-risk devices" means in the context of your IoT system. This will enable users to ask questions using business terminology instead of technical SQL filters.

**In your Genie Space:**

1. **Navigate to SQL Query examples**
   - Stay in the **Instructions** tab
   - Select **SQL Queries** from the instruction types

2. **Add the example query**
   - Give it a descriptive name: **"What are our at-risk devices?"**
   - Paste the following SQL:

```sql
SELECT 
  `Device ID`,
  `Model ID`,
  `Factory ID`,
  `Date`,
  MEASURE(`Max Temperature`) AS `Max Temperature`,
  MEASURE(`Record Count`) AS `Record Count`,
  MEASURE(`Total Delay`) AS `Total Delay`,
  MEASURE(`Average Airflow Rate`) AS `Average Airflow Rate`
FROM sensor_metrics 
GROUP BY ALL
HAVING `Max Temperature` > 85
ORDER BY `Max Temperature` DESC;
```

3. **Save the example**
   - Click **Save** to add this query as a trusted example

### How This Example Helps

With this example defined, Genie learns:

- ✅ **Business terminology**: "At-risk devices" = devices with max temperature > 85°F
- ✅ **Metric view syntax**: How to use `MEASURE()` with your metric view
- ✅ **Query pattern**: The structure for filtering and sorting sensor metrics

**Real-world impact:** Now when users ask "Show me at-risk devices" or "Which devices are having issues?", Genie will understand they mean devices exceeding the critical temperature threshold and will generate SQL similar to your example.
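
If the group, filter, and sort pattern in the example query is unfamiliar, the following Python sketch mirrors its logic on a handful of made-up readings. The sample data and the helper name `at_risk_devices` are hypothetical, and the sketch groups by device only, whereas the SQL example also groups by model, factory, and date.

```python
from collections import defaultdict

# Hypothetical sample readings: (device_id, temperature in degrees Fahrenheit).
readings = [
    ("dev-001", 72.0),
    ("dev-001", 88.5),
    ("dev-002", 70.1),
    ("dev-003", 91.2),
    ("dev-003", 86.0),
]

AT_RISK_ABOVE_F = 85.0  # same threshold as the HAVING clause in the example


def at_risk_devices(rows, threshold=AT_RISK_ABOVE_F):
    """Return (device_id, max_temp) pairs above the threshold, hottest first."""
    max_by_device = defaultdict(lambda: float("-inf"))
    for device_id, temp_f in rows:
        # GROUP BY device, keeping the maximum temperature per group.
        max_by_device[device_id] = max(max_by_device[device_id], temp_f)
    # HAVING max temperature > threshold.
    flagged = [(d, t) for d, t in max_by_device.items() if t > threshold]
    # ORDER BY max temperature DESC.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


print(at_risk_devices(readings))
```

A device is flagged as soon as any one of its readings exceeds the threshold, because the filter is applied to the per-device maximum rather than to individual rows.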

---

**💡 Pro Tip**: Add 2-3 example queries that represent your most common analysis patterns. Don't add too many - focus on the questions users ask repeatedly.

## Step 4: Configure Table Relationships

### Why Add Joins?

Your metric view contains sensor measurements, but users will often ask questions that require additional context like "Which model category has the highest temperature?" To answer these questions, Genie needs to understand how to join your metric view to dimension tables.

**What this accomplishes:**
- Enables queries that span multiple tables (e.g., "Show me temperature by model category")
- Teaches Genie the relationships between your data tables
- Allows natural language questions about dimensions not in the metric view

### Add the Model Dimension Table

Let's teach Genie how to join the `dim_models` table so users can ask questions about model categories, families, and other model attributes.

**In the Genie Space UI:**

1. **Navigate to the Data tab**
   - Click the **Data** tab at the top of your Genie Space

2. **Add the dimension table**
   - Click **+ Add** to add a new data source
   - Select the **dim_models** table from the catalog browser
   - Click **Add** to include it in your Genie Space

3. **Configure the join relationship**
   - Click the **Instructions** tab at the top
   - In the left sidebar, click **Joins**
   - Configure the join with these settings:
     - **Left Table**: `sensor_metrics` (your metric view)
     - **Right Table**: `dim_models` (the dimension table you just added)
     - **Join Condition**: `Model ID` = `model_id`
     - **Relationship Type**: Many to One
     - **Instructions**: "Use this join to look up model attributes, such as category and family, for readings in the metric view"

**What you've configured:**
- **Many to One relationship**: Many sensor readings can belong to one model
- **Join condition**: Links the Model ID field in your metric view to the model_id primary key in dim_models
- **Instructions**: Tells Genie when and why to use this join
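
As a rough mental model of a many-to-one join, the Python sketch below attaches one `dim_models` row to each sensor row by key. Only the key names `Model ID` and `model_id` come from the join settings above. The sample rows, the `model_category` column, and the helper `join_many_to_one` are assumptions for illustration, and Genie performs the join in SQL rather than like this.

```python
# Hypothetical rows; only the key names come from the join settings above.
sensor_rows = [
    {"Device ID": "dev-001", "Model ID": "m-10", "Max Temperature": 88.5},
    {"Device ID": "dev-002", "Model ID": "m-10", "Max Temperature": 70.1},
    {"Device ID": "dev-003", "Model ID": "m-20", "Max Temperature": 91.2},
]
dim_models = [
    {"model_id": "m-10", "model_category": "Turbine"},
    {"model_id": "m-20", "model_category": "Compressor"},
]


def join_many_to_one(left_rows, right_rows, left_key, right_key):
    """Inner-join each left row to the single right row with a matching key."""
    lookup = {}
    for row in right_rows:
        key = row[right_key]
        if key in lookup:
            # A duplicate key on the "one" side would break the many-to-one contract.
            raise ValueError(f"duplicate {right_key}: {key!r}")
        lookup[key] = row
    joined = []
    for row in left_rows:
        match = lookup.get(row[left_key])
        if match is not None:
            joined.append({**row, **match})
    return joined


for row in join_many_to_one(sensor_rows, dim_models, "Model ID", "model_id"):
    print(row["Device ID"], row["model_category"])
```

The duplicate-key check is the important part: if `model_id` is not unique in `dim_models`, the join fans out and aggregates such as counts are inflated.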

### Test the Join

Try asking Genie a question that requires this join:
- "What is the average temperature by model category?"
- "Which model family has the highest airflow rate?"

Genie will now automatically join to `dim_models` when needed to answer these questions!

## Step 5: Create and Run Benchmarks

### What are Benchmarks?

**Benchmarks** are test questions with expected answers that validate Genie's accuracy. Think of them as unit tests for your Genie Space - they help you measure quality and catch issues before users encounter them.

**Why benchmarks matter:**
- Verify Genie generates correct SQL for complex questions
- Catch accuracy regressions when you update instructions
- Document expected behavior for common queries
- Build confidence before rolling out to users

### Create Your First Benchmark

Let's create a benchmark that tests whether Genie can correctly combine multiple concepts: joins, business terminology, and aggregations.

**In your Genie Space:**

1. **Navigate to Benchmarks**
   - Click the **Benchmarks** tab at the top of your Genie Space

2. **Add a new benchmark**
   - Click **+ Add Benchmark** (or similar button to create a test)

3. **Configure the benchmark question**
   - **Question**: `What was the most common model category for at-risk devices?`
   - **Expected Answer Type**: One model category and the total number of at-risk devices
   - **Why this tests multiple features**:
     - Uses the metric view to identify temperature thresholds
     - Requires a join to `dim_models` to get model category
     - Tests understanding of your "at-risk" business terminology from the example query
     - Validates aggregation and grouping logic

4. **Save the benchmark**
   - Click **Save** to add this test to your benchmark suite

### Run and Evaluate the Benchmark

Now let's see how Genie performs:

1. **Execute the benchmark**
   - Click **Run Benchmark** to have Genie generate SQL and execute the query

2. **Review the generated SQL**
   - Examine the query Genie created
   - Verify it joins to `dim_models` correctly
   - Check that it filters for temperature > 85°F (your "at-risk" definition)
   - Confirm it groups by model category

3. **Validate the results**
   - Check if the answer makes sense given your data
   - Does it return one model category as expected?
   - Is the count of at-risk devices reasonable?

### If the Benchmark Fails

If Genie doesn't generate the correct query or returns unexpected results:

1. **Analyze the issue**
   - Did it miss the join to dim_models?
   - Did it not understand "at-risk"?
   - Was the aggregation wrong?

2. **Update your instructions**
   - Add more context about model categories
   - Clarify the at-risk definition
   - Provide additional example queries

3. **Re-run the benchmark**
   - Test again after making changes
   - Iterate until the benchmark passes

**Success Criteria**: Genie generates semantically correct SQL and returns a reasonable answer (e.g., "Turbine: 23 devices" or similar).
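
One way to judge whether the answer is "reasonable" is to compute the expected value independently on a small sample and compare it with what Genie returns. The Python sketch below does this for the benchmark question. The rows, column names, and both helper functions are hypothetical, and it assumes the expected answer is the category with the most at-risk devices together with that category's count.

```python
from collections import Counter

# Hypothetical per-device rows, as if already joined to dim_models.
devices = [
    {"device_id": "dev-001", "max_temp_f": 88.5, "model_category": "Turbine"},
    {"device_id": "dev-002", "max_temp_f": 70.1, "model_category": "Turbine"},
    {"device_id": "dev-003", "max_temp_f": 91.2, "model_category": "Compressor"},
    {"device_id": "dev-004", "max_temp_f": 86.4, "model_category": "Turbine"},
]


def most_common_at_risk_category(rows, threshold_f=85.0):
    """Return (category, device_count) for the category with the most at-risk devices."""
    counts = Counter(
        row["model_category"] for row in rows if row["max_temp_f"] > threshold_f
    )
    if not counts:
        return None  # no device exceeds the threshold
    return counts.most_common(1)[0]


def benchmark_passes(actual, expected):
    """Pass only when both the category and the count match."""
    return actual == expected


expected = ("Turbine", 2)  # worked out by hand from the sample rows
actual = most_common_at_risk_category(devices)
print(actual, benchmark_passes(actual, expected))
```

Keeping a hand-checked expected value like this next to each benchmark question makes it easier to tell a wrong query from merely surprising data.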

---

**💡 Best Practice**: Create 3-5 benchmarks covering your most important query patterns. Run them regularly, especially after updating instructions or adding new data sources.

## Summary and Next Steps

### What You Accomplished

✅ **Created a Genie Space** pointing to your semantic model  
✅ **Added instructions** with business context and terminology  
✅ **Defined an example query and a join** for common query patterns  
✅ **Tested with a benchmark** to verify accuracy  

### Your Genie Space is Now Ready!

Users can now ask questions like:
- "What's the average temperature by factory?"
- "What was the most common model_category for at-risk devices?"
- "What are the critical temperatures in the last week?"

No SQL required! Genie translates these to queries against your `sensor_metrics` semantic model.

---

## Try This Out (Optional Extensions)

Want to go further? Try these enhancements:

### 1. Add More Filters
Create filters for:
- Different regions (East, Central, International)
- Time ranges (last 30 days, last month)
- Model categories

### 2. Create Additional Benchmarks
Test more complex scenarios:
- "Which factory has the highest average temperature?"
- "Show me device count trends over time"
- "Compare West vs East regions"

### 3. Add Sample Q&A
In the Instructions, add example questions and expected results to improve accuracy.

### 4. Add SQL Query Examples as Trusted Assets

SQL Query Examples help Genie understand how to write queries for your specific data model. Add them as trusted assets:

**Reference:** [SQL Query Examples Documentation](https://docs.databricks.com/aws/en/genie/trusted-assets)

### 5. Join to More Tables
If you add `dim_factories`, test questions that need factory details:
- "What's the temperature at factories in California?"
- "Show me metrics for factories with more than 100 devices"

---

## Key Takeaways

**Semantic Model + Genie = Self-Service Analytics**
1. **Semantic model** (notebook 1) defines consistent metrics
2. **Genie** (this notebook) makes them conversational
3. **Result**: Non-technical users can explore data

**Start Simple, Iterate**
- Begin with essential context
- Add filters as users need them
- Use benchmarks to measure quality
- Improve based on actual usage


---

**Additional Resources:**
- [Genie Documentation](https://docs.databricks.com/aws/en/genie/)
- [Best Practices](https://docs.databricks.com/aws/en/genie/best-practices)
- [Knowledge Store Guide](https://docs.databricks.com/aws/en/genie/knowledge-store)
