# 03 Add Apple Music Artists

This notebook enriches Apple Music listening events with artist metadata by
joining play activity data with Apple Music Library track metadata.

Notes:
- This is a v2 enrichment step, separate from ingestion.
- Spotify data is not modified.
- Artist enrichment is intentionally conservative and partial.

In [1]:
import sqlite3
import pandas as pd
import json
from pathlib import Path

## Step 1 — Load Apple Music play events from SQLite

We load only Apple Music rows from the canonical `listening_events` table.

In [3]:
DatabasePath = "../data/processed/MusicPlatformInsights.db"

connect = sqlite3.connect(DatabasePath)

apple = pd.read_sql_query(
    """
    SELECT *
    FROM listening_events
    WHERE platform = 'apple_music'
    """,
    connect
)

connect.close()

print("Apple Music events:", len(apple))
apple.head()

Apple Music events: 6803


Unnamed: 0,event_time,platform,artist,track,duration_minutes,session_id
0,2025-09-24 14:10:33.680000+00:00,apple_music,,Moon (feat. Bon Iver),1.988433,1
1,2025-09-28 02:49:36.263000+00:00,apple_music,,ocean eyes,2.952467,2
2,2025-09-30 14:58:17.450000+00:00,apple_music,,Nobody Gets Me,3.015033,3
3,2025-09-30 15:01:54.689000+00:00,apple_music,,Always,3.7551,3
4,2025-09-30 15:02:41.553000+00:00,apple_music,,CHIHIRO,0.867517,3


## Step 2 — Load Apple Music Library Tracks metadata

This file contains artist metadata at the track level. It carries no play
timestamps, so it is used only for enrichment.

In [5]:
# Path to the Apple Music Library Tracks export
# Folder name must exactly match the filesystem
LibraryPath = Path("../data/raw/apple music/Apple Music Library Tracks.json")

# Open and load the JSON file
# Apple exports this file as a *list of track objects*, not a dictionary
with open(LibraryPath, "r", encoding="utf-8") as File:
    LibraryJson = json.load(File)

# Convert the list of track objects directly into a DataFrame
# Each row represents one track saved in the user's Apple Music library
LibraryTracks = pd.DataFrame(LibraryJson)

# Basic sanity check
print("Library tracks loaded:", len(LibraryTracks))

# Preview only the fields we care about for enrichment
LibraryTracks[["Artist", "Title"]].head()

Library tracks loaded: 2357


Unnamed: 0,Artist,Title
0,Daniel Caesar,Please Do Not Lean (feat. BADBADNOTGOOD) [Bonus]
1,Billie Eilish,SKINNY
2,Frank Ocean,Nights
3,Daniel Caesar,Get You (feat. Kali Uchis)
4,Billie Eilish,WILDFLOWER
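
The load above assumes the export is a JSON list whose objects carry `Title` and `Artist` fields. A minimal sketch of checking that assumption before building the lookup, using a hypothetical inline JSON string in place of the real file (`raw`, `records`, `tracks`, and `usable` are illustrative names, not part of the notebook):

```python
import json
import pandas as pd

# Hypothetical stand-in for the export: a JSON array of track objects
raw = '[{"Title": "Song A", "Artist": "Artist X"}, {"Title": "Song B"}]'
records = json.loads(raw)

# The notebook expects a list of track objects, not a dictionary
if not isinstance(records, list):
    raise ValueError("Expected a list of track objects")

tracks = pd.DataFrame(records)

# Fail early if either field needed for enrichment is absent
missing = {"Title", "Artist"} - set(tracks.columns)
if missing:
    raise KeyError(f"Missing expected fields: {sorted(missing)}")

# Tracks without a title or artist cannot contribute to enrichment
usable = tracks.dropna(subset=["Title", "Artist"])
print("Usable library tracks:", len(usable))
```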


## Step 3 — Canonicalize Apple Music Library fields

The Apple Music Library export contains extensive metadata, but for artist
enrichment we only need a minimal lookup table.

In this step:
- `Title` is mapped to `track`
- `Artist` is mapped to `artist`
- All other library metadata is intentionally ignored

This produces a clean, two-column dataset (`track`, `artist`) that can be
safely joined with Apple Music play events.

In [7]:
LibraryTracks.head()

Unnamed: 0,Content Type,Track Identifier,Title,Sort Name,Artist,Sort Artist,Composer,Is Part of Compilation,Album,Sort Album,...,Audio File Extension,Track Like Rating,Is Checked,Copyright,Playlist Only Track,Release Date,Purchased Track Identifier,Apple Music Track Identifier,Favorite Status - Track,Favorite Date - Track
0,Song,182885406,Please Do Not Lean (feat. BADBADNOTGOOD) [Bonus],Please Do Not Lean (feat. BADBADNOTGOOD) [Bonus],Daniel Caesar,Daniel Caesar,"Daniel Caesar, Matthew Burnett, Jordan Evans, ...",False,NEVER ENOUGH (Bonus Version),NEVER ENOUGH (Bonus Version),...,m4a,liked,False,"℗ 2023 Hollace Inc., under exclusive license t...",False,2022-04-22T07:00:00Z,1681332245,1681332245,True,2025-09-30T22:31:02Z
1,Song,182885410,SKINNY,SKINNY,Billie Eilish,Billie Eilish,Billie Eilish & FINNEAS,False,HIT ME HARD AND SOFT,HIT ME HARD AND SOFT,...,m4a,liked,False,℗ 2024 Darkroom/Interscope Records,False,2024-05-17T12:00:00Z,1739659137,1739659137,True,2025-09-30T22:33:01Z
2,Song,182885414,Nights,Nights,Frank Ocean,Frank Ocean,,False,Blonde,Blonde,...,m4a,liked,False,℗ Boys Don't Cry,False,2016-08-20T07:00:00Z,1146195720,1146195720,True,2025-09-30T23:06:39Z
3,Song,182885418,Get You (feat. Kali Uchis),Get You (feat. Kali Uchis),Daniel Caesar,Daniel Caesar,,False,Freudian,Freudian,...,m4a,liked,False,℗ 2017 Golden Child Recordings,False,2016-10-21T12:00:00Z,1799080775,1799080775,True,2025-10-02T01:40:29Z
4,Song,182885422,WILDFLOWER,WILDFLOWER,Billie Eilish,Billie Eilish,Billie Eilish & FINNEAS,False,HIT ME HARD AND SOFT,HIT ME HARD AND SOFT,...,m4a,liked,False,℗ 2024 Darkroom/Interscope Records,False,2024-05-17T12:00:00Z,1739659144,1739659144,True,2025-10-02T01:40:31Z


In [9]:
# Rename Apple Music Library columns into canonical join fields
# Title  -> track
# Artist -> artist
LibraryTracks = LibraryTracks.rename(columns={
    "Title": "track",
    "Artist": "artist"
})

# Keep only the columns needed for artist enrichment
LibraryTracks = LibraryTracks[["track", "artist"]]

# Sanity check
LibraryTracks.head()

Unnamed: 0,track,artist
0,Please Do Not Lean (feat. BADBADNOTGOOD) [Bonus],Daniel Caesar
1,SKINNY,Billie Eilish
2,Nights,Frank Ocean
3,Get You (feat. Kali Uchis),Daniel Caesar
4,WILDFLOWER,Billie Eilish


## Step 9 — Enrich Apple Music events with artist names

Apple Music play activity does not reliably include artist names.
We enrich the event-level data by joining against the user's Apple Music
Library Tracks export using the track name as the join key.

Notes:
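- This is a left join, so every listening event is kept even when no library track matches
- Matching is by exact track title, so naming differences between the two exports cause misses
- Unmatched rows retain a null artist
- Track titles are not unique in the library, so one event can match several library rows and appear more than once after the join (in this run, rows 2 and 3 of the output below are the same play of `Nobody Gets Me`)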
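
One way to reduce misses caused by naming differences is to join on a normalized key instead of the raw title. The sketch below is an illustration on toy data, not what this notebook does: `normalize_title`, `track_key`, and the sample rows are hypothetical, and the normalization only handles casing and whitespace.

```python
import pandas as pd

def normalize_title(title: pd.Series) -> pd.Series:
    """Case-fold, trim, and collapse internal whitespace in track titles."""
    return (
        title.astype("string")
        .str.casefold()
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)
    )

# Toy data (hypothetical), not taken from the real exports
events = pd.DataFrame({"track": ["Song  A ", "SONG B"], "artist": [None, None]})
library = pd.DataFrame({"track": ["song a", "Song B"], "artist": ["Artist X", "Artist Y"]})

events["track_key"] = normalize_title(events["track"])
library["track_key"] = normalize_title(library["track"])

# Same left join as in the notebook, but on the normalized key
matched = events.merge(
    library[["track_key", "artist"]],
    on="track_key",
    how="left",
    suffixes=("", "_library"),
)
matched["artist"] = matched["artist"].fillna(matched["artist_library"])
matched = matched.drop(columns=["artist_library", "track_key"])
```

The original `track` text is left untouched; only the temporary key is normalized, so the stored events keep the titles exactly as Apple exported them.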

In [10]:
# Before join: inspect missing artists
print(
    "Missing artist count before join:",
    apple["artist"].isna().sum()
)

# Left join Apple play events with library track metadata
# Join key: track name
apple = apple.merge(
    LibraryTracks,
    on="track",
    how="left",
    suffixes=("", "_library")
)

# Fill artist from library where missing
apple["artist"] = apple["artist"].fillna(
    apple["artist_library"]
)

# Drop helper column
apple = apple.drop(columns=["artist_library"])

# After join: inspect improvement
print(
    "Missing artist count after join:",
    apple["artist"].isna().sum()
)

apple.head()

Missing artist count before join: 6803
Missing artist count after join: 3210


Unnamed: 0,event_time,platform,artist,track,duration_minutes,session_id
0,2025-09-24 14:10:33.680000+00:00,apple_music,Daniel Caesar,Moon (feat. Bon Iver),1.988433,1
1,2025-09-28 02:49:36.263000+00:00,apple_music,,ocean eyes,2.952467,2
2,2025-09-30 14:58:17.450000+00:00,apple_music,SZA,Nobody Gets Me,3.015033,3
3,2025-09-30 14:58:17.450000+00:00,apple_music,SZA,Nobody Gets Me,3.015033,3
4,2025-09-30 15:01:54.689000+00:00,apple_music,Daniel Caesar,Always,3.7551,3
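
A left merge repeats an event once per matching library row, which is why the same play can show up twice above. One conservative option is to reduce the library to a single row per title before joining. The following is a minimal sketch on toy data, assuming that titles mapping to more than one artist should stay unmatched rather than be guessed; `lookup` and the sample rows are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy data (hypothetical): "Song A" is saved twice, "Song D" has two different artists
events = pd.DataFrame({"track": ["Song A", "Song B", "Song C"], "artist": [None, None, None]})
library = pd.DataFrame({
    "track": ["Song A", "Song A", "Song B", "Song D", "Song D"],
    "artist": ["Artist X", "Artist X", "Artist Y", "Artist Z", "Artist W"],
})

# Collapse exact (track, artist) repeats
lookup = library.drop_duplicates(subset=["track", "artist"])
# Drop titles that still map to more than one artist (ambiguous)
lookup = lookup[~lookup["track"].duplicated(keep=False)]

# validate="many_to_one" makes pandas raise if the lookup still has repeated titles
enriched = events.merge(
    lookup,
    on="track",
    how="left",
    suffixes=("", "_library"),
    validate="many_to_one",
)
enriched["artist"] = enriched["artist"].fillna(enriched["artist_library"])
enriched = enriched.drop(columns=["artist_library"])

# The join must not change the number of events
assert len(enriched) == len(events)
```

With a unique lookup the event count is the same before and after the join, at the cost of leaving ambiguous titles without an artist.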


## Step 10 — Write updated Apple Music events back to SQLite

The unified `listening_events` table was first created from Spotify data earlier.
At this stage, we replace only the Apple Music rows with the enriched
version produced in this notebook.

This preserves Spotify data while updating Apple Music events
with recovered artist information.

In [11]:
# Connect to the SQLite database
connect = sqlite3.connect(DatabasePath)

# Remove existing Apple Music rows first,
# so the append below does not add a second copy of them
connect.execute("""
DELETE FROM listening_events
WHERE platform = 'apple_music';
""")

# Append enriched Apple Music events
apple.to_sql(
    "listening_events",
    connect,
    if_exists="append",
    index=False
)

connect.commit()
connect.close()

print("Apple Music events written to SQLite:", len(apple))

Apple Music events written to SQLite: 7273
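
The claim that Spotify rows are untouched can be checked by counting them before and after the replace. A minimal sketch against an in-memory database with hypothetical rows (the table contents, `enriched`, and the `count_rows` helper are illustrative, not the notebook's data):

```python
import sqlite3
import pandas as pd

connect = sqlite3.connect(":memory:")  # in-memory stand-in for the real database

# Toy table (hypothetical rows): one Spotify event and one Apple Music event
pd.DataFrame({
    "platform": ["spotify", "apple_music"],
    "artist": ["Artist S", None],
    "track": ["Song S", "Song A"],
}).to_sql("listening_events", connect, index=False)

# Enriched replacement for the Apple Music row
enriched = pd.DataFrame({
    "platform": ["apple_music"],
    "artist": ["Artist X"],
    "track": ["Song A"],
})

def count_rows(platform: str) -> int:
    """Count rows for one platform using a parameterized query."""
    query = "SELECT COUNT(*) FROM listening_events WHERE platform = ?"
    return connect.execute(query, (platform,)).fetchone()[0]

spotify_before = count_rows("spotify")

try:
    connect.execute("DELETE FROM listening_events WHERE platform = ?", ("apple_music",))
    enriched.to_sql("listening_events", connect, if_exists="append", index=False)
    connect.commit()
except Exception:
    # Defensive: undo the delete if the append fails
    connect.rollback()
    raise

spotify_after = count_rows("spotify")
apple_after = count_rows("apple_music")
connect.close()

assert spotify_before == spotify_after
```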


## Step 11 — Verify Apple Music rows in database

We run a quick SQL sanity check to confirm that Apple Music events
exist in the database and that artist enrichment was applied.

In [12]:
connect = sqlite3.connect(DatabasePath)

Verification = pd.read_sql_query("""
SELECT
    COUNT(*) AS TotalRows,
    COUNT(artist) AS ArtistPresent,
    MIN(event_time) AS MinTime,
    MAX(event_time) AS MaxTime
FROM listening_events
WHERE platform = 'apple_music';
""", connect)

connect.close()

Verification

Unnamed: 0,TotalRows,ArtistPresent,MinTime,MaxTime
0,7273,4063,2025-09-24 14:10:33.680000+00:00,2026-01-07 22:51:18.410000+00:00
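
`TotalRows` is 7273 here, while Step 1 loaded 6803 events, so the table now holds repeated events. A query along these lines could list them. This is a sketch against an in-memory table with hypothetical rows, and it assumes that two rows sharing `event_time` and `track` are the same play:

```python
import sqlite3

connect = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
connect.execute(
    "CREATE TABLE listening_events (event_time TEXT, platform TEXT, artist TEXT, track TEXT)"
)
connect.executemany(
    "INSERT INTO listening_events VALUES (?, ?, ?, ?)",
    [
        ("2025-01-01 10:00:00", "apple_music", "Artist X", "Song A"),
        ("2025-01-01 10:00:00", "apple_music", "Artist X", "Song A"),  # repeated event
        ("2025-01-01 10:05:00", "apple_music", None, "Song B"),
    ],
)

# Events that appear more than once for the same timestamp and track
duplicates = connect.execute("""
    SELECT event_time, track, COUNT(*) AS copies
    FROM listening_events
    WHERE platform = 'apple_music'
    GROUP BY event_time, track
    HAVING COUNT(*) > 1
""").fetchall()
connect.close()

print(duplicates)
```

An empty result would indicate that the join did not multiply any events.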
