# 04 Engagement KPIs

This notebook computes behavioral engagement KPIs from the unified
`listening_events` table. Metrics are derived from normalized Spotify
and Apple Music listening events stored in SQLite.

Focus areas include:
- Listening intensity
- Habit consistency
- Session behavior
- Preference diversity
- Discovery vs repeat listening

Outputs are exported as clean aggregate tables for Power BI.

In [1]:
import sqlite3
import pandas as pd
from pathlib import Path

In [2]:
# Path to the processed SQLite database, relative to this notebook
DatabasePath = Path("../data/processed/MusicPlatformInsights.db")

## Step 1 — Load normalized listening events

We load the unified `listening_events` table from SQLite.
At this stage, all events already share a canonical schema and UTC timestamps.

In [3]:
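connect = sqlite3.connect(DatabasePath)

# Read the full table; the finally block closes the connection even if the query fails
try:
    Events = pd.read_sql_query(
        "SELECT * FROM listening_events",
        connect
    )
finally:
    connect.close()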

print("Total listening events:", len(Events))
Events.head()

Total listening events: 11362


| | event_time | platform | artist | track | duration_minutes | session_id |
|---|---|---|---|---|---|---|
| 0 | 2024-08-10 02:45:00+00:00 | spotify | Miguel | coffee | 3.026717 | 1 |
| 1 | 2025-01-08 11:54:00+00:00 | spotify | Travis Scott | ASTROTHUNDER | 2.382817 | 2 |
| 2 | 2025-01-08 12:00:00+00:00 | spotify | SZA | Another Life | 2.880433 | 2 |
| 3 | 2025-01-08 12:03:00+00:00 | spotify | BigXthaPlug | Change Me | 2.2811 | 2 |
| 4 | 2025-01-08 12:07:00+00:00 | spotify | The Marías | Heavy | 4.220217 | 2 |
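
Before deriving features, it can help to confirm that the loaded frame really matches the canonical schema described above. The following is a minimal sketch, not part of the original notebook: the `find_missing_columns` helper and the `SampleEvents` toy frame are hypothetical, and the expected column list is an assumption taken from the preview output.

```python
import pandas as pd

# Assumption: the canonical schema is exactly the set of columns in the preview above
EXPECTED_COLUMNS = [
    "event_time", "platform", "artist", "track", "duration_minutes", "session_id",
]

def find_missing_columns(frame: pd.DataFrame) -> list:
    """Return the expected columns that are absent from `frame` (hypothetical helper)."""
    return [column for column in EXPECTED_COLUMNS if column not in frame.columns]

# Toy frame standing in for Events; the row is copied from the preview
SampleEvents = pd.DataFrame({
    "event_time": ["2024-08-10 02:45:00+00:00"],
    "platform": ["spotify"],
    "artist": ["Miguel"],
    "track": ["coffee"],
    "duration_minutes": [3.026717],
    "session_id": [1],
})

MissingColumns = find_missing_columns(SampleEvents)
print("Missing columns:", MissingColumns)
```

In the notebook itself, the same check could be run as `find_missing_columns(Events)` right after loading, so that a schema change upstream fails early rather than inside a later aggregation.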


## Step 2 — Create time-based features

We derive date-, week-, and month-level fields from `event_time`.
These features support habit consistency, streak analysis, and
longitudinal engagement metrics.

In [8]:
# Normalize event_time so everything is comparable
#
# event_time comes back from SQLite as ISO strings whose format
# differs by source:
# - Spotify timestamps were originally naive
# - Apple Music timestamps were already timezone-aware
#
# format="mixed" (pandas >= 2.0) infers the format of each value individually
# utc=True converts every value onto the same UTC timeline
Events["event_time"] = pd.to_datetime(
    Events["event_time"],
    format="mixed",
    utc=True
)

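# Add time-based fields for habit and longitudinal analysis
# These are derived here on purpose and not stored in the DB
Events["event_date"] = Events["event_time"].dt.date

# Periods cannot carry a timezone, so drop it explicitly first (values stay
# on UTC wall time); this avoids pandas' warning about lost timezone information
EventTimeNaive = Events["event_time"].dt.tz_localize(None)
Events["event_week"] = EventTimeNaive.dt.to_period("W").astype(str)
Events["event_month"] = EventTimeNaive.dt.to_period("M").astype(str)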

Events.head()


| | event_time | platform | artist | track | duration_minutes | session_id | event_date | event_week | event_month |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-08-10 02:45:00+00:00 | spotify | Miguel | coffee | 3.026717 | 1 | 2024-08-10 | 2024-08-05/2024-08-11 | 2024-08 |
| 1 | 2025-01-08 11:54:00+00:00 | spotify | Travis Scott | ASTROTHUNDER | 2.382817 | 2 | 2025-01-08 | 2025-01-06/2025-01-12 | 2025-01 |
| 2 | 2025-01-08 12:00:00+00:00 | spotify | SZA | Another Life | 2.880433 | 2 | 2025-01-08 | 2025-01-06/2025-01-12 | 2025-01 |
| 3 | 2025-01-08 12:03:00+00:00 | spotify | BigXthaPlug | Change Me | 2.2811 | 2 | 2025-01-08 | 2025-01-06/2025-01-12 | 2025-01 |
| 4 | 2025-01-08 12:07:00+00:00 | spotify | The Marías | Heavy | 4.220217 | 2 | 2025-01-08 | 2025-01-06/2025-01-12 | 2025-01 |
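
To illustrate how `event_date` can feed the streak analysis mentioned above, here is a minimal sketch that finds the longest run of consecutive listening days. It is one possible approach and not necessarily the method used later in this notebook. The `SampleEvents` frame and its dates are made up for illustration, and the names `ActiveDays`, `StreakId`, `StreakLengths`, and `LongestStreak` are hypothetical.

```python
import pandas as pd

# Toy data standing in for Events; the timestamps are invented for illustration
SampleEvents = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2025-01-06 09:00", "2025-01-07 10:00", "2025-01-07 18:00",
        "2025-01-08 08:30", "2025-01-10 12:00", "2025-01-11 12:00",
    ], utc=True),
})
SampleEvents["event_date"] = SampleEvents["event_time"].dt.date

# One entry per distinct listening day, in chronological order
ActiveDays = pd.to_datetime(pd.Series(sorted(SampleEvents["event_date"].unique())))

# A new streak starts whenever the gap to the previous active day is not exactly one day
# (the first day has no predecessor, so it always starts a streak)
StreakId = (ActiveDays.diff() != pd.Timedelta(days=1)).cumsum()

# Number of active days in each streak, then the longest one
StreakLengths = ActiveDays.groupby(StreakId).size()
LongestStreak = int(StreakLengths.max())

print("Longest streak (days):", LongestStreak)  # 3 for the toy data (Jan 6 to Jan 8)
```

Multiple events on the same day collapse into a single active day before the gaps are measured, so heavy listening on one day does not inflate the streak.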


## Step 3 — Listening intensity

Listening intensity measures total time spent listening,
independent of frequency or session structure.

These metrics establish a baseline for overall engagement
and platform usage.

In [None]:
# Total listening time by platform
ListeningByPlatform = (
    Events
    .groupby("platform", as_index=False)["duration_minutes"]
    .sum()
    .rename(columns={"duration_minutes": "total_minutes"})
    .sort_values("total_minutes", ascending=False)
)

ListeningByPlatform.head()
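
The same aggregation pattern can be extended to track intensity over time by grouping on the `event_month` field derived in Step 2. The sketch below is an illustration under stated assumptions rather than part of the original notebook: the `SampleEvents` values are made up, the `apple_music` platform label is assumed (only `spotify` appears in the preview), and `ListeningByMonth` is a hypothetical name.

```python
import pandas as pd

# Toy data standing in for Events; durations and the apple_music label are assumptions
SampleEvents = pd.DataFrame({
    "platform": ["spotify", "spotify", "apple_music", "apple_music"],
    "event_month": ["2025-01", "2025-02", "2025-01", "2025-01"],
    "duration_minutes": [3.0, 2.5, 4.0, 1.5],
})

# Total listening time per month and platform
ListeningByMonth = (
    SampleEvents
    .groupby(["event_month", "platform"], as_index=False)["duration_minutes"]
    .sum()
    .rename(columns={"duration_minutes": "total_minutes"})
    .sort_values(["event_month", "platform"])
    .reset_index(drop=True)
)

print(ListeningByMonth)
```

Keeping `event_month` and `platform` as ordinary columns (`as_index=False`) yields a flat table, which tends to be the easier shape to load into a BI tool.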