# Babblr Learning Analytics Dashboard

This notebook provides an interactive dashboard for monitoring student progress, lesson effectiveness, and learning patterns.

**Dashboard Sections:**
1. Executive Summary KPIs
2. Daily Active Users Trend
3. Language Distribution
4. CEFR Level Funnel
5. Lesson Effectiveness
6. Student Segments
7. Topic Engagement
8. Error Rates by CEFR Level
9. Assessment Performance Trends

---

**Prerequisites:** Run notebooks 01-03 first. They create the `babblr_silver` and `babblr_gold` tables that the queries below read.

## Executive Summary

In [None]:
%%sql
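-- Key Performance Indicators, one row per metric
-- UNION ALL does not guarantee row order; sort in the visualization if order matters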
SELECT
    'Total Users' as metric,
    COUNT(DISTINCT user_id) as value
FROM babblr_silver.user_profiles

UNION ALL

SELECT
    'Total Conversations',
    COUNT(*)
FROM babblr_silver.conversations

UNION ALL

SELECT
    'Completed Lessons',
    COUNT(*)
FROM babblr_silver.lesson_progress
WHERE status = 'completed'

UNION ALL

SELECT
    'Assessment Attempts',
    COUNT(*)
FROM babblr_silver.assessment_attempts

UNION ALL

SELECT
    'Avg Assessment Score',
    ROUND(AVG(score), 1)
FROM babblr_silver.assessment_attempts

## Daily Active Users Trend

*Tip: In Databricks, click the chart icon below the result table to create a visualization.*

In [None]:
%%sql
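-- Daily Active Users (DAU)
-- Note: if daily_metrics holds several rows per date (e.g. one per language),
-- SUM(active_users) counts a user once for every row they appear in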
SELECT
    activity_date,
    SUM(active_users) as daily_active_users,
    SUM(conversations) as daily_conversations
FROM babblr_gold.daily_metrics
GROUP BY activity_date
ORDER BY activity_date

**Visualization Settings:**
- Chart Type: Line
- X-axis: activity_date
- Y-axis: daily_active_users, daily_conversations

## Language Distribution

In [None]:
%%sql
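-- Users and activity by language
-- avg_score is the mean of per-user averages, so each user counts equally
-- regardless of how many assessments they took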
SELECT
    language,
    COUNT(DISTINCT user_id) as users,
    SUM(total_assessments) as assessments,
    ROUND(AVG(avg_score), 1) as avg_score
FROM babblr_silver.user_profiles
GROUP BY language
ORDER BY users DESC

**Visualization Settings:**
- Chart Type: Bar
- X-axis: language
- Y-axis: users

## CEFR Level Funnel

Shows how students are distributed across the six CEFR proficiency levels, from A1 (beginner) to C2 (proficient).

In [None]:
%%sql
-- CEFR level distribution (all languages)
-- avg_score is an unweighted mean of the per-row averages in cefr_funnel
SELECT
    cefr_level,
    SUM(users_at_level) as total_users,
    ROUND(AVG(avg_assessment_score), 1) as avg_score
FROM babblr_gold.cefr_funnel
GROUP BY cefr_level
ORDER BY
    CASE cefr_level
        WHEN 'A1' THEN 1
        WHEN 'A2' THEN 2
        WHEN 'B1' THEN 3
        WHEN 'B2' THEN 4
        WHEN 'C1' THEN 5
        WHEN 'C2' THEN 6
    END

**Visualization Settings:**
- Chart Type: Funnel or Bar
- X-axis: cefr_level
- Y-axis: total_users
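
The `avg_score` column in the funnel query averages values that are already averages. A minimal sketch of why that can differ from a user-weighted mean, assuming `cefr_funnel` has one row per language and level (the rows and numbers below are made up for illustration):

```python
# Hypothetical cefr_funnel rows for a single level:
# (language, users_at_level, avg_assessment_score)
rows = [
    ("spanish", 900, 80.0),
    ("italian", 100, 60.0),
]

# What AVG(avg_assessment_score) computes: every row counts equally.
unweighted = sum(score for _, _, score in rows) / len(rows)

# User-weighted mean: every row counts in proportion to its users.
total_users = sum(users for _, users, _ in rows)
weighted = sum(users * score for _, users, score in rows) / total_users

print(f"unweighted: {unweighted:.1f}, weighted: {weighted:.1f}")
```

If the weighted figure is the one you want, the SQL equivalent would be `SUM(avg_assessment_score * users_at_level) / SUM(users_at_level)`, provided neither column contains NULLs.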

## Lesson Effectiveness Analysis

Which lessons produce the best learning outcomes?

In [None]:
%%sql
-- Top 15 lessons by effectiveness score
-- Rates are multiplied by 100 here, which assumes they are stored as 0-1 fractions
SELECT
    lesson_type,
    subject,
    lesson_difficulty,
    total_attempts,
    ROUND(completion_rate * 100, 1) as completion_rate_pct,
    ROUND(avg_mastery * 100, 1) as avg_mastery_pct,
    ROUND(effectiveness_score * 100, 1) as effectiveness_pct
FROM babblr_gold.lesson_effectiveness
ORDER BY effectiveness_score DESC
LIMIT 15

### Effectiveness by Lesson Type

In [None]:
%%sql
-- Aggregate effectiveness by lesson type
SELECT
    lesson_type,
    COUNT(*) as lesson_count,
    SUM(total_attempts) as total_attempts,
    ROUND(AVG(completion_rate) * 100, 1) as avg_completion_rate,
    ROUND(AVG(avg_mastery) * 100, 1) as avg_mastery,
    ROUND(AVG(effectiveness_score) * 100, 1) as avg_effectiveness
FROM babblr_gold.lesson_effectiveness
GROUP BY lesson_type
ORDER BY avg_effectiveness DESC

**Visualization Settings:**
- Chart Type: Bar (grouped)
- X-axis: lesson_type
- Y-axis: avg_completion_rate, avg_mastery

## Student Segment Analysis

Understanding different learner profiles helps personalize the experience.

In [None]:
%%sql
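-- Student segments from K-Means clustering
-- The segment names are hand-assigned labels. K-Means cluster IDs are arbitrary,
-- so re-check this mapping whenever the clustering is re-run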
SELECT
    uc.cluster,
    CASE uc.cluster
        WHEN 0 THEN 'High Performers'
        WHEN 1 THEN 'Active Learners'
        WHEN 2 THEN 'Struggling Students'
        WHEN 3 THEN 'Casual Users'
        ELSE 'Unknown'
    END as segment_name,
    COUNT(*) as user_count,
    ROUND(AVG(up.avg_score), 1) as avg_score,
    ROUND(AVG(up.total_assessments), 1) as avg_assessments
FROM babblr_gold.user_clusters uc
JOIN babblr_silver.user_profiles up ON uc.user_id = up.user_id
GROUP BY uc.cluster
ORDER BY uc.cluster

**Visualization Settings:**
- Chart Type: Pie or Bar
- Values: user_count
- Labels: segment_name
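
Because the segment names are tied to fixed cluster IDs, a re-run of the clustering can silently attach a name to the wrong group. The sketch below is one possible sanity check, not part of the original pipeline. The `check_labels` helper is hypothetical, and it assumes that "High Performers" should be the cluster with the highest `avg_score` and "Struggling Students" the one with the lowest.

```python
# Labels copied from the CASE expression in the segment query.
SEGMENT_NAMES = {
    0: "High Performers",
    1: "Active Learners",
    2: "Struggling Students",
    3: "Casual Users",
}


def check_labels(names, avg_scores):
    """Return warnings where a label disagrees with the cluster's avg_score.

    Assumption (not from the notebook): 'High Performers' should have the
    highest avg_score and 'Struggling Students' the lowest.
    """
    warnings = []
    best = max(avg_scores, key=avg_scores.get)
    worst = min(avg_scores, key=avg_scores.get)
    if names.get(best) != "High Performers":
        warnings.append(
            f"cluster {best} scores highest but is labeled {names.get(best)!r}"
        )
    if names.get(worst) != "Struggling Students":
        warnings.append(
            f"cluster {worst} scores lowest but is labeled {names.get(worst)!r}"
        )
    return warnings


# Hypothetical avg_score per cluster after a re-run that shuffled the IDs.
shuffled = {0: 62.0, 1: 74.5, 2: 88.1, 3: 70.3}
for message in check_labels(SEGMENT_NAMES, shuffled):
    print(message)
```

In practice the `avg_scores` dictionary would be filled from the result of the segment query rather than typed by hand.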

### Segment Characteristics Deep Dive

In [None]:
%%sql
-- Detailed segment characteristics
WITH segment_stats AS (
    SELECT
        uc.cluster,
        up.user_id,
        up.avg_score,
        up.total_assessments,
        COALESCE(conv.conv_count, 0) as conversations,
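        -- Left as NULL for users with no conversations so AVG() skips them
        -- instead of counting them as a 0% error rate
        conv.avg_error_rate as error_rate,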
        COALESCE(lp.completed, 0) as completed_lessons
    FROM babblr_gold.user_clusters uc
    JOIN babblr_silver.user_profiles up ON uc.user_id = up.user_id
    LEFT JOIN (
        SELECT user_id, COUNT(*) as conv_count, AVG(error_rate) as avg_error_rate
        FROM babblr_silver.conversations GROUP BY user_id
    ) conv ON up.user_id = conv.user_id
    LEFT JOIN (
        SELECT user_id, COUNT(*) as completed
        FROM babblr_silver.lesson_progress WHERE status = 'completed' GROUP BY user_id
    ) lp ON up.user_id = lp.user_id
)
SELECT
    cluster,
    COUNT(*) as users,
    ROUND(AVG(avg_score), 1) as avg_score,
    ROUND(AVG(conversations), 1) as avg_conversations,
    ROUND(AVG(error_rate) * 100, 1) as error_rate_pct,
    ROUND(AVG(completed_lessons), 1) as avg_completed_lessons
FROM segment_stats
GROUP BY cluster
ORDER BY cluster
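
The `error_rate_pct` column depends on how users without any conversations are handled: filling their missing error rate with 0 pulls the segment average down, while leaving it NULL makes `AVG` skip them. The sketch below illustrates the difference using Python's built-in `sqlite3` as a stand-in for Spark SQL (both ignore NULLs in `AVG`). The table contents are made up.

```python
import sqlite3

# Hypothetical per-user rows: (user_id, error_rate); None = no conversations.
rows = [(1, 0.20), (2, 0.40), (3, None), (4, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment_stats (user_id INTEGER, error_rate REAL)")
conn.executemany("INSERT INTO segment_stats VALUES (?, ?)", rows)

zero_filled, nulls_skipped = conn.execute(
    """
    SELECT AVG(COALESCE(error_rate, 0)),  -- inactive users count as 0% errors
           AVG(error_rate)                -- inactive users are ignored
    FROM segment_stats
    """
).fetchone()
conn.close()

print(f"zero-filled: {zero_filled:.2f}, NULLs skipped: {nulls_skipped:.2f}")
```

Which behavior is right depends on the question being asked. For "how error-prone are this segment's conversations?", skipping users with no conversations is usually the more faithful reading.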

## Topic Engagement Heatmap

In [None]:
%%sql
-- Topic engagement across languages
-- (top 50 topic/language pairs by conversation count)
SELECT
    topic_id,
    language,
    conversation_count,
    ROUND(avg_duration_min, 1) as avg_duration
FROM babblr_gold.topic_engagement
ORDER BY conversation_count DESC
LIMIT 50

**Visualization Settings:**
- Chart Type: Heatmap
- X-axis: language
- Y-axis: topic_id
- Values: conversation_count

### Most Engaging Topics

In [None]:
%%sql
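-- Top 10 topics by average session duration
-- total_users sums per-language counts, so a user who practices a topic
-- in several languages is counted more than once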
SELECT
    topic_id,
    SUM(conversation_count) as total_conversations,
    SUM(unique_users) as total_users,
    ROUND(AVG(avg_duration_min), 1) as avg_session_min,
    ROUND(AVG(avg_messages_per_conv), 1) as avg_messages
FROM babblr_gold.topic_engagement
GROUP BY topic_id
ORDER BY avg_session_min DESC
LIMIT 10
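
Distinct-user counts cannot be added across groups without risking double counting. A small sketch of the effect, assuming `topic_engagement` stores one row per topic and language with `unique_users` as a distinct count within that row (the events below are invented for illustration):

```python
# Hypothetical raw events: (user_id, topic_id, language)
events = [
    ("u1", "travel", "spanish"),
    ("u1", "travel", "french"),  # same user, same topic, second language
    ("u2", "travel", "spanish"),
]

# Distinct users per (topic, language), as a pre-aggregated table would store them.
per_group = {}
for user, topic, language in events:
    per_group.setdefault((topic, language), set()).add(user)

# What SUM(unique_users) ... GROUP BY topic_id yields for "travel".
summed = sum(
    len(users) for (topic, _), users in per_group.items() if topic == "travel"
)

# The true number of distinct users for "travel".
distinct = len({user for user, topic, _ in events if topic == "travel"})

print(f"summed: {summed}, distinct: {distinct}")
```

An exact per-topic user count would need `COUNT(DISTINCT user_id)` over a table that still has one row per conversation, such as `babblr_silver.conversations`, assuming that table carries a topic column.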

## Error Rate Analysis

Which CEFR levels have the highest error rates?

In [None]:
%%sql
-- Error rates by CEFR level
SELECT
    difficulty_level as cefr_level,
    COUNT(*) as conversations,
    ROUND(AVG(error_rate) * 100, 2) as avg_error_rate_pct,
    ROUND(AVG(message_count), 1) as avg_messages
FROM babblr_silver.conversations
GROUP BY difficulty_level
ORDER BY
    CASE difficulty_level
        WHEN 'A1' THEN 1
        WHEN 'A2' THEN 2
        WHEN 'B1' THEN 3
        WHEN 'B2' THEN 4
        WHEN 'C1' THEN 5
        WHEN 'C2' THEN 6
    END

## Assessment Performance Trends

In [None]:
%%sql
-- Weekly assessment performance
SELECT
    DATE_TRUNC('week', started_at) as week,
    COUNT(*) as assessments,
    ROUND(AVG(score), 1) as avg_score,
    COUNT(DISTINCT user_id) as unique_users
FROM babblr_silver.assessment_attempts
GROUP BY DATE_TRUNC('week', started_at)
ORDER BY week

**Visualization Settings:**
- Chart Type: Combo (line + bar)
- X-axis: week
- Y-axis (left): assessments (bar)
- Y-axis (right): avg_score (line)
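
In Spark SQL, `DATE_TRUNC('week', ...)` maps each timestamp to the Monday that starts its week. The following sketch reproduces that bucketing in plain Python to show which dates land in the same bar of the chart. The `week_start` helper is a hypothetical name used only here.

```python
from datetime import datetime, timedelta


def week_start(ts: datetime) -> datetime:
    """Return midnight on the Monday of the week containing ts."""
    monday = ts - timedelta(days=ts.weekday())  # weekday(): Monday == 0
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


wednesday = datetime(2024, 1, 10, 14, 30)
sunday = datetime(2024, 1, 14, 23, 59)
next_monday = datetime(2024, 1, 15, 0, 0)

# Wednesday and Sunday share a bucket; the following Monday opens a new one.
print(week_start(wednesday), week_start(sunday), week_start(next_monday))
```

The first and last weeks in the data are often partial, so a low assessment count at either end of the chart may reflect fewer days rather than a real drop.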

## Creating a Databricks Dashboard

To create a dashboard from these visualizations:

1. Click the **chart icon** below any SQL result to create a visualization
2. Configure chart type and axes as noted above
3. Click **Save** on the visualization
4. Go to **Create** > **Dashboard**
5. Click **Add** > **Visualization** and select from this notebook
6. Arrange widgets as desired

**Recommended Dashboard Layout:**
```
+-------------------------------------------------------------+
|  KPI Cards: Users | Conversations | Avg Score | Lessons     |
+-----------------------------+-------------------------------+
|  Daily Active Users Trend   |  Language Distribution        |
+-----------------------------+-------------------------------+
|  CEFR Level Funnel          |  Student Segments Pie         |
+-----------------------------+-------------------------------+
|  Lesson Effectiveness Table                                 |
+-------------------------------------------------------------+
|  Topic Engagement Heatmap                                   |
+-------------------------------------------------------------+
```

## Summary

This dashboard provides insights for:

| Stakeholder | Key Metrics |
|-------------|-------------|
| **Product Manager** | DAU, engagement trends, feature usage |
| **Content Team** | Lesson effectiveness, topic popularity |
| **Learning Designer** | Error patterns, CEFR progression |
| **Growth Team** | User segments, retention indicators |

**Actionable Insights:**
1. Focus content creation on high-engagement topics
2. Improve or retire low-effectiveness lessons
3. Create targeted interventions for the "Struggling Students" segment
4. Optimize CEFR advancement pace based on error rates

---

## Interview Talking Points

When presenting this dashboard:

1. **Business Value**: "This dashboard helps the product team understand what's working and what needs improvement in the learning experience."

2. **Technical Depth**: "The data flows through a medallion architecture: Bronze (raw), Silver (cleaned/joined), Gold (aggregated). This ensures data quality and query performance."

3. **ML Integration**: "I used K-Means clustering to segment users for personalized interventions. (MLflow tracking is available in paid editions for experiment management.)"

4. **Scalability**: "This architecture scales: we could add real-time streaming, more sophisticated ML models, or integrations with external data sources."