# Use Genie to query data with natural language

## Genie on the Gold Layer

Step 1: Objective

Demonstrate:

- Natural language analytics
- KPI-level reasoning
- Trust in AI-generated SQL

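## Open Genie

1. Go to **SQL** and open **Genie**.
2. Select:
   - the catalog/schema containing `gold.events`
   - a SQL warehouse (the default is fine)

⚠️ Do NOT write SQL manually. Ask your questions in natural language and let Genie generate the SQL.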
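One way to build trust in Genie's AI-generated SQL is to recompute a KPI independently and compare it with Genie's answer. The sketch below is a hedged example, not Genie's own logic: it assumes `gold.events` has `product_id`, `event_type` (with values such as `"view"` and `"purchase"`) and `price` columns, and it uses a tiny hypothetical sample so it runs anywhere. The conversion rate is assumed to be purchases / views × 100. On Databricks you would replace the sample with your real table, e.g. `spark.table(...)` followed by `.toPandas()`.

```python
import pandas as pd

# Hypothetical sample of gold.events; the column names are assumptions.
events = pd.DataFrame({
    "product_id": [1, 1, 1, 1, 2, 2],
    "event_type": ["view", "view", "view", "purchase", "view", "view"],
    "price": [10.0, 10.0, 10.0, 10.0, 5.0, 5.0],
})


def product_kpis(events: pd.DataFrame) -> pd.DataFrame:
    """Recompute per-product KPIs to cross-check Genie's answers."""
    is_view = events["event_type"] == "view"
    is_purchase = events["event_type"] == "purchase"
    kpis = (
        events.assign(
            views=is_view.astype(int),
            purchases=is_purchase.astype(int),
            # Revenue counts only the price of purchase events.
            revenue=events["price"].where(is_purchase, 0.0),
        )
        .groupby("product_id", as_index=False)[["views", "purchases", "revenue"]]
        .sum()
    )
    # Conversion rate in percent; products with zero views get 0.
    views = kpis["views"].where(kpis["views"] > 0)
    kpis["conversion_rate"] = (kpis["purchases"] / views * 100).fillna(0.0).round(2)
    return kpis.sort_values("revenue", ascending=False).reset_index(drop=True)


print(product_kpis(events))
```

If Genie's numbers for the same question differ from this recomputation, inspect the SQL Genie generated (for example, how it defines a view or a purchase) before trusting the result.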

# Explore Mosaic AI Features

Objective

- Understand Mosaic AI and its role in Databricks
- Demonstrate AI-assisted workflow integration using Free Edition-compatible methods


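Mosaic AI is Databricks’ framework for applying AI to structured and unstructured data. It lets analysts extract insights (e.g., text classification, sentiment analysis) with pretrained models instead of training models from scratch, and it integrates with MLflow and lakehouse tables for reproducible analytics. To stay Free Edition compatible, the cells below simulate this workflow with an open-source Hugging Face sentiment model whose results are logged to MLflow.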

Key points to mention:

- Works on top of curated tables
- Enables AI-assisted analysis
- Supports MLflow logging for traceability

# Mosaic AI / NLP Setup for Databricks

```python
# Install Hugging Face transformers and PyTorch
%pip install --quiet transformers torch
```


```python
# Restart Python so the newly installed libraries can be imported
%restart_python
```

```python
import torch  # PyTorch backend used by the transformers pipeline
from transformers import pipeline
import mlflow
```


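```python
# Example product reviews
reviews = [
    "This product is amazing!",
    "Terrible quality, waste of money",
]

# Initialize the sentiment-analysis pipeline. Naming the model explicitly
# (it is the task's default) avoids the "No model was supplied" warning
# and makes the model choice visible.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
results = classifier(reviews)

# Each result is a dict with the predicted label and its confidence score
print("Sentiment Analysis Results:", results)
```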


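```python
# Log results to MLflow
with mlflow.start_run(run_name="mosaic_ai_demo"):
    mlflow.log_param("model", "distilbert/distilbert-base-uncased-finetuned-sst-2-english")
    # Log each review's predicted label (param) and its confidence score (metric)
    for i, r in enumerate(results):
        mlflow.log_param(f"review_{i}_label", r["label"])
        mlflow.log_metric(f"review_{i}_score", r["score"])
```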
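The pipeline's `score` is the confidence of the predicted `label`, not a positivity score, so a confident NEGATIVE review (0.998) and a confident POSITIVE one (0.999) produce almost identical metrics. A minimal sketch of one common workaround, assuming you want a single directional metric per review: map each result to a signed score (+score for POSITIVE, -score for NEGATIVE). The helper name `signed_sentiment` is hypothetical; its return value could be passed to `mlflow.log_metric` in the loop above.

```python
def signed_sentiment(result: dict) -> float:
    """Map a {'label', 'score'} result to a value in [-1, 1].

    POSITIVE keeps its confidence, NEGATIVE is negated, and any other
    label (e.g. from a model with a NEUTRAL class) maps to 0.0.
    """
    label = result["label"].upper()
    if label == "POSITIVE":
        return result["score"]
    if label == "NEGATIVE":
        return -result["score"]
    return 0.0


# Same shape as the pipeline output; values mirror the simulated results.
sample = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.998},
]
print([signed_sentiment(r) for r in sample])  # [0.999, -0.998]
```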


# Create AI-Powered Insights

## Step 4 – AI-Powered Insights & AI-Assisted Analysis

**Objective:** Combine Genie outputs and NLP/Mosaic AI results to generate actionable business insights.

### 1. Revenue & Conversion Analysis (Genie / Gold Table)
- **Business Questions:** Which products generate the most revenue? Which have the highest conversion rate? Which have views but no purchases?
- **AI Tool:** Databricks Genie
- **Insights:**
    - Focus marketing and inventory on high-revenue/high-conversion products
    - Investigate products with high views but no purchases to improve the funnel
- **Key Learning:** Genie accelerates KPI exploration and reduces manual SQL effort.

### 2. Customer Sentiment Analysis (Mosaic AI Simulation / NLP)
- **Business Question:** What is the sentiment of product reviews for November?
- **AI Tool:** Hugging Face sentiment-analysis pipeline (MLflow logged)
- **Insights:**
    - Positive reviews indicate customer satisfaction
    - Negative reviews highlight areas for product or service improvement
- **Key Learning:** A pretrained NLP model quickly turns unstructured review text into structured insights, and MLflow logging makes the results reproducible.

### 3. Final Reflection
- AI-assisted analysis **reduces manual effort** and speeds up insights, so analysts can focus on decision-making.
- Building on curated **lakehouse tables** makes outputs more accurate and trustworthy, because Genie and the NLP model work from cleaned, governed data.
- MLflow integration supports **reproducibility and governance** by recording the model and results of each run.


# Collect Genie Outputs
If you already ran Genie on the Gold layer, you likely have outputs such as top products by revenue, conversion rate, and products with views but no purchases.

For demonstration, we simulate these outputs in a small table:

```python
# Example: Genie Gold table output (simulated)
import pandas as pd

# conversion_rate is in percent: purchases / views * 100 (5 / 323 ≈ 1.55%)
genie_kpis = pd.DataFrame({
    "product_id": [17800248, 28718071, 10503474],
    "revenue": [590.05, 0.0, 0.0],
    "conversion_rate": [1.55, 0.0, 0.0],
    "views": [323, 7, 1],
    "purchases": [5, 0, 0]
})

genie_kpis
```


Insight:

- High revenue + high conversion → focus marketing and inventory
- Views but no purchases → identify funnel drop-offs
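The three business questions can also be answered directly from the KPI table, which is a quick way to sanity-check the Genie outputs you collected. A sketch using the same simulated values as `genie_kpis` (re-declared so the cell runs on its own); treating "views but no purchases" as `views > 0` and `purchases == 0` is one reasonable reading, not the only one.

```python
import pandas as pd

# Same simulated values as genie_kpis; conversion_rate is in percent.
genie_kpis = pd.DataFrame({
    "product_id": [17800248, 28718071, 10503474],
    "revenue": [590.05, 0.0, 0.0],
    "conversion_rate": [1.55, 0.0, 0.0],
    "views": [323, 7, 1],
    "purchases": [5, 0, 0],
})

# Q1: products ranked by revenue (highest first).
top_revenue = genie_kpis.nlargest(3, "revenue")

# Q2: product with the highest conversion rate.
top_conversion = genie_kpis.nlargest(1, "conversion_rate")

# Q3: products that were viewed but never purchased (funnel drop-offs).
no_purchase = genie_kpis[(genie_kpis["views"] > 0) & (genie_kpis["purchases"] == 0)]

print(top_revenue[["product_id", "revenue"]])
print(top_conversion[["product_id", "conversion_rate"]])
print(no_purchase[["product_id", "views"]])
```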

# Collect NLP / Mosaic AI Results

Simulate the results:

```python
# Example: NLP/Mosaic AI Sentiment Results
nlp_results = [
    {"review": "This product is amazing!", "label": "POSITIVE", "score": 0.999},
    {"review": "Terrible quality, waste of money", "label": "NEGATIVE", "score": 0.998}
]

nlp_df = pd.DataFrame(nlp_results)
nlp_df
```


Insight:

- Positive reviews → product satisfaction
- Negative reviews → areas to improve
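Instead of hard-coding the results, you can build the same frame from the sentiment pipeline's output. A minimal sketch, assuming the `reviews` and `results` lists from the Hugging Face cell, where the pipeline returns one `{'label', 'score'}` dict per input in input order. The function name `to_sentiment_frame` is hypothetical, and `sample_reviews`/`sample_results` are stand-ins (deliberately not named `reviews`/`results`, so they don't overwrite the real variables) that mirror the simulated values.

```python
import pandas as pd


def to_sentiment_frame(reviews: list, results: list) -> pd.DataFrame:
    """Pair each review with its predicted label and confidence score."""
    if len(reviews) != len(results):
        raise ValueError("expected exactly one result per review")
    return pd.DataFrame({
        "review": reviews,
        "label": [r["label"] for r in results],
        "score": [r["score"] for r in results],
    })


# Stand-ins with the same shape as the pipeline output.
sample_reviews = ["This product is amazing!", "Terrible quality, waste of money"]
sample_results = [
    {"label": "POSITIVE", "score": 0.999},
    {"label": "NEGATIVE", "score": 0.998},
]
print(to_sentiment_frame(sample_reviews, sample_results))
```

With the real pipeline output, call `to_sentiment_frame(reviews, results)` instead.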

# Combine Insights (Simulated, for Demo Purposes Only)

You can write simple rule-based logic to generate actionable insights automatically:

```python
import pandas as pd

# Step 1: Genie Gold outputs (KPIs per product); conversion_rate is in percent
genie_kpis = pd.DataFrame({
    "product_id": [17800248, 28718071, 10503474],
    "revenue": [590.05, 0.0, 0.0],
    "conversion_rate": [1.55, 0.0, 0.0],
    "views": [323, 7, 1],
    "purchases": [5, 0, 0]
})

# Step 2: NLP / Mosaic AI sentiment per product (simulated)
nlp_reviews = pd.DataFrame({
    "product_id": [17800248, 28718071, 10503474, 10503474],
    "review": [
        "This product is amazing!",
        "Terrible quality, waste of money",
        "Not worth the price",
        "Loved the design and features"
    ],
    "label": ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"],
    "score": [0.999, 0.998, 0.995, 0.990]
})

# Step 3: Aggregate sentiment per product
sentiment_summary = nlp_reviews.groupby("product_id").agg(
    total_reviews=("review", "count"),
    positive_count=("label", lambda x: (x == "POSITIVE").sum()),
    negative_count=("label", lambda x: (x == "NEGATIVE").sum())
).reset_index()

# Merge Genie KPIs with the sentiment summary; products without reviews get zero counts
count_cols = ["total_reviews", "positive_count", "negative_count"]
combined = genie_kpis.merge(sentiment_summary, on="product_id", how="left")
combined[count_cols] = combined[count_cols].fillna(0).astype(int)

# Step 4: Generate actionable insights per product.
# itertuples keeps each column's dtype, so IDs and counts print as integers
# (iterrows would upcast every value in a row to float).
actionable_insights = []

for row in combined.itertuples(index=False):
    insight = (
        f"Product {row.product_id}: revenue={row.revenue}, "
        f"conversion_rate={row.conversion_rate}%, views={row.views}, "
        f"purchases={row.purchases}. "
    )
    if row.conversion_rate > 1.0 and row.negative_count > 0:
        insight += f"High conversion but negative reviews ({row.negative_count}/{row.total_reviews}), investigate product quality."
    elif row.conversion_rate > 1.0:
        insight += "High conversion and no negative reviews, continue promotion."
    elif row.purchases == 0 and row.views > 0:
        insight += "No purchases despite views, investigate funnel drop-off."
    else:
        insight += f"Mixed performance with {row.positive_count} positive and {row.negative_count} negative reviews."

    actionable_insights.append(insight)

# Step 5: Show insights
for i, insight in enumerate(actionable_insights, 1):
    print(f"{i}. {insight}")
```
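The if/elif rules above can also be expressed as a small pure function that returns an action category. This makes the thresholds easy to unit-test and lets you store the result as a column (for example, before saving insights to a table). A sketch only: the function name `action_for` and the category strings are hypothetical, and the 1.0 threshold is the same percent-based conversion cutoff the rules above use.

```python
def action_for(conversion_rate: float, views: int, purchases: int,
               negative_count: int) -> str:
    """Return an action category, checking rules in the same order as the loop."""
    if conversion_rate > 1.0 and negative_count > 0:
        return "investigate_quality"
    if conversion_rate > 1.0:
        return "continue_promotion"
    if purchases == 0 and views > 0:
        return "investigate_funnel"
    return "mixed"


print(action_for(1.55, 323, 5, 0))  # continue_promotion
print(action_for(0.0, 7, 0, 1))     # investigate_funnel
```

Applied to the merged frame, this could look like `combined.apply(lambda r: action_for(r.conversion_rate, r.views, r.purchases, r.negative_count), axis=1)`.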
