# Playlist Quality Control (Anomaly Detection) — Project Framework

## 1) Goal / user story
Editorial or user playlist curators want to keep playlists *coherent* (mood/sonic profile). The system should automatically flag tracks that are likely “off-mood” so a human can review, reorder, move to another playlist, or remove.

**Core output:** For a given playlist → a ranked list of suspect tracks + *why* (which audio attributes differ) + optional recommendations (replace with similar tracks).

---

## 2) Data sources (Spotify API)
**Entities**
- **Playlists**: playlist metadata, track IDs, added_at, position/order.
- **Tracks**: track metadata (name, artist, popularity, explicit, release date).
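- **Audio Features** (Spotify-style): `danceability, energy, valence, tempo, loudness, acousticness, instrumentalness, liveness, speechiness, duration_ms, key, mode, time_signature`.
- (Optional) **Audio Analysis**: segments/bars/timbre (richer but heavier).
- **Access caveat**: Spotify announced in November 2024 that newly registered apps no longer get the Audio Features and Audio Analysis endpoints, so confirm your app's access early; an offline CSV/JSON export of the features is the fallback.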

**Ingestion pattern**
- Batch pull playlist tracks + audio features on a schedule (or on-demand for a curator).
- Cache results in a small DB (SQLite/Postgres) to avoid repeatedly hitting rate limits.
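The caching idea above can be sketched as a single SQLite table keyed by track ID. This is a minimal illustration, not a fixed design: the table name `audio_features`, the helper names, and the `fetch_fn` callback (any function that takes a list of IDs and returns `{track_id: features}`) are all assumptions made for the example.

```python
import json
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_features ("
        "track_id TEXT PRIMARY KEY, payload TEXT NOT NULL, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def get_cached(conn: sqlite3.Connection, track_ids: list[str]) -> dict[str, dict]:
    """Return {track_id: features} for the ids that are already stored."""
    if not track_ids:
        return {}
    placeholders = ",".join("?" for _ in track_ids)
    rows = conn.execute(
        f"SELECT track_id, payload FROM audio_features WHERE track_id IN ({placeholders})",
        track_ids,
    ).fetchall()
    return {tid: json.loads(payload) for tid, payload in rows}


def put_cached(conn: sqlite3.Connection, features: dict[str, dict]) -> None:
    """Insert or overwrite one row per track."""
    conn.executemany(
        "INSERT OR REPLACE INTO audio_features (track_id, payload) VALUES (?, ?)",
        [(tid, json.dumps(feat)) for tid, feat in features.items()],
    )
    conn.commit()


def fetch_with_cache(conn, track_ids, fetch_fn):
    """Check the cache first; call fetch_fn only for the ids that are missing."""
    cached = get_cached(conn, track_ids)
    missing = [t for t in track_ids if t not in cached]
    if missing:
        fresh = fetch_fn(missing)
        put_cached(conn, fresh)
        cached.update(fresh)
    return cached
```

Passing a file path instead of `":memory:"` makes the cache survive kernel restarts, which is what actually saves API calls between sessions.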

---

## 3) Feature engineering (represent each song)
Start with interpretable, Spotify-native features.

**Cleaning & normalization**
- Handle missing audio features (rare but possible) → impute or drop.
- Robust scaling (median/IQR) or standard scaling per feature.
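As a rough sketch of the two cleaning steps above, the helpers below impute missing values with the column median and then scale by median and IQR. They use only the standard library and work on one feature column at a time; the function names are made up for this example, and median imputation is just one of the "impute or drop" choices.

```python
from statistics import median, quantiles
from typing import Optional


def impute_median(values: list[Optional[float]]) -> list[float]:
    """Replace missing entries (None) with the median of the observed values."""
    present = [v for v in values if v is not None]
    med = median(present)
    return [med if v is None else v for v in values]


def robust_scale(values: list[float]) -> list[float]:
    """Scale one feature column as (x - median) / IQR.

    Needs at least two values, because the quartiles are undefined otherwise.
    """
    med = median(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # Constant column: center only, to avoid dividing by zero.
        return [v - med for v in values]
    return [(v - med) / iqr for v in values]
```

Because the median and IQR barely move when one extreme track is present, the off-mood track keeps a large scaled value instead of shrinking the scale for everything else.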

**Playlist-relative features (very useful)**
- `z_score` per feature relative to playlist distribution.
- “Distance to neighbors” in the playlist order (captures local mood shifts).
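One way to read "distance to neighbors" is the mean Euclidean distance between a track and the few tracks on either side of it in playlist order. The sketch below assumes the feature vectors are already scaled; the `window` parameter and the function name are assumptions for illustration.

```python
import math


def neighbor_distance(rows: list[list[float]], window: int = 2) -> list[float]:
    """Mean Euclidean distance from each track to its neighbors in playlist order.

    rows[i] is the scaled feature vector of the track at position i.
    """
    out = []
    for i, row in enumerate(rows):
        lo, hi = max(0, i - window), min(len(rows), i + window + 1)
        neighbors = [rows[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            out.append(0.0)  # single-track playlist: nothing to compare against
            continue
        out.append(sum(math.dist(row, nb) for nb in neighbors) / len(neighbors))
    return out
```

A spike in this series marks an abrupt local shift even when the track is not unusual for the playlist as a whole.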

**Optional enrichments**
- Genre embeddings (artist genres) or text embeddings from track/artist names.
- User feedback signals: skips, saves, completion rate (if available).

---

## 4) Learn the “dominant mood/sonic profile”
A playlist can be:
- **Unimodal** (one coherent vibe)
- **Multi-modal** (two+ vibes: e.g., workout warm-up vs peak)

Modeling options (choose based on complexity & interpretability):

### A) Robust single-cluster model (baseline)
Assume one dominant cluster.
- Estimate a robust center `μ` and scatter `Σ`.
- Use **Mahalanobis distance** (with shrinkage covariance) as anomaly score.
- Robust alternatives: Minimum Covariance Determinant (MCD) / robust covariance.

**Pros:** simple, explainable. **Cons:** struggles if playlist is truly multi-vibe.
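A minimal NumPy sketch of option A follows. It makes two simplifying assumptions that are not part of the plan above: the center is the column-wise median, and the covariance is the ordinary sample covariance shrunk toward a scaled identity with a fixed `shrinkage` weight. A data-driven shrinkage estimate or MCD (for example scikit-learn's `LedoitWolf` or `MinCovDet`) would be the more careful choice.

```python
import numpy as np


def mahalanobis_scores(X: np.ndarray, shrinkage: float = 0.1) -> np.ndarray:
    """Squared Mahalanobis distance of each row of X to the column-wise median.

    The covariance is shrunk toward a scaled identity matrix:
        Sigma = (1 - a) * S + a * (trace(S) / d) * I
    which keeps it invertible for short playlists.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    center = np.median(X, axis=0)
    S = np.cov(X, rowvar=False).reshape(d, d)  # reshape covers the d == 1 case
    target = (np.trace(S) / d) * np.eye(d)
    sigma = (1.0 - shrinkage) * S + shrinkage * target
    diff = X - center
    # Solve the linear system instead of forming the inverse explicitly,
    # then take the row-wise quadratic form diff_i^T Sigma^{-1} diff_i.
    solved = np.linalg.solve(sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, solved)
```

The function returns squared distances, which is the quantity a chi-square cutoff would be applied to.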

### B) Clustering then “main cluster” selection
- Fit clustering (e.g., Gaussian Mixture Model, HDBSCAN, k-means).
- Identify the **largest cluster** as the dominant mood.
- Score each track by distance to *nearest* cluster (or dominant cluster only).

**Pros:** handles multi-modal playlists. **Cons:** model selection (k, stability).
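Option B can be sketched with a small hand-rolled k-means so the example has no dependency beyond NumPy. This is a stand-in for illustration only: the farthest-point initialization, the fixed `k`, and the Euclidean score are assumptions, and a real run would more likely use a library GMM or HDBSCAN with proper model selection.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 50) -> tuple[np.ndarray, np.ndarray]:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from every seed chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def dominant_cluster_scores(X: np.ndarray, k: int = 2) -> tuple[np.ndarray, int]:
    """Score each track by its distance to the center of the largest cluster."""
    labels, centers = kmeans(X, k)
    dominant = int(np.bincount(labels, minlength=k).argmax())
    scores = np.linalg.norm(np.asarray(X, dtype=float) - centers[dominant], axis=1)
    return scores, dominant
```

Scoring against the dominant cluster only treats every secondary vibe as suspect; scoring against the nearest cluster instead would accept a deliberate warm-up section and flag only tracks that fit nowhere.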

### C) Isolation- and density-based anomaly detection
- Isolation Forest (isolation-based) / Local Outlier Factor (LOF, density-based) on feature space.

**Pros:** good for outliers. **Cons:** explanations less intuitive unless augmented.

---

## 5) Anomaly scoring & thresholds
**Per-track outputs**
- `anomaly_score`: continuous (higher = more suspect)
- `percentile_rank` within playlist
- `flag`: boolean based on threshold

**Threshold strategies**
- Fixed: top `N` tracks per playlist.
- Percentile: flag top 5–10%.
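- Statistical: flag tracks whose *squared* Mahalanobis distance exceeds a chi-square quantile with `d` degrees of freedom (`d` = number of features); approximate, since audio features are not exactly Gaussian.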

**Practical approach:** show a ranked list + allow curator to adjust sensitivity.
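The fixed and percentile strategies, plus the `percentile_rank` output, fit in a few lines of plain Python. The function names below are illustrative; `scores` is assumed to be the list of per-track anomaly scores for one playlist, higher meaning more suspect.

```python
def flag_top_n(scores: list[float], n: int) -> list[bool]:
    """Flag the n highest-scoring tracks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:n])
    return [i in top for i in range(len(scores))]


def flag_top_percent(scores: list[float], percent: float) -> list[bool]:
    """Flag the top `percent` of tracks, and at least one track."""
    n = max(1, round(len(scores) * percent / 100))
    return flag_top_n(scores, n)


def percentile_rank(scores: list[float]) -> list[float]:
    """Percentage of tracks in the playlist whose score is <= this track's score."""
    n = len(scores)
    return [100.0 * sum(s <= x for s in scores) / n for x in scores]
```

A sensitivity slider can simply drive `n` or `percent`. For the statistical strategy, something like `scipy.stats.chi2.ppf(0.975, df=d)` would supply the cutoff if SciPy is available.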

---

## 6) Explanations (why is it flagged?)
Curators need *actionable* reasons.

**Simple, effective explanation**
- Show the top contributing features via playlist-relative z-scores:
  - Example: “Low energy (−2.1σ) and high acousticness (+2.6σ) compared to playlist.”
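An explanation string like the example above can be assembled from playlist-relative z-scores. The sketch assumes each track is a plain dict of feature name to value and that the flagged track is part of `playlist`; `explain_track`, `top_k`, and `min_abs_z` are hypothetical names chosen for this illustration.

```python
from statistics import mean, stdev


def explain_track(track: dict[str, float], playlist: list[dict[str, float]],
                  top_k: int = 2, min_abs_z: float = 1.0) -> str:
    """Build a short reason string from the largest playlist-relative z-scores."""
    z = {}
    for feat, value in track.items():
        col = [t[feat] for t in playlist]
        sd = stdev(col)  # sample standard deviation; needs >= 2 tracks
        if sd > 0:
            z[feat] = (value - mean(col)) / sd
    ranked = sorted(z.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    parts = [
        f"{'high' if score > 0 else 'low'} {feat} ({score:+.1f}σ)"
        for feat, score in ranked
        if abs(score) >= min_abs_z
    ]
    if not parts:
        return "No strong deviations."
    text = " and ".join(parts) + " compared to playlist."
    return text[0].upper() + text[1:]
```

Swapping `mean`/`stdev` for median and IQR would keep the explanation consistent with robust scaling, at the cost of the familiar σ units.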

**For Mahalanobis / covariance-based models**
- Provide per-feature standardized residuals.
- Show feature contribution decomposition (approx via whitening transform).

**Visualization**
- Radar chart comparing track vs playlist median.
- 2D embedding (PCA/UMAP) with cluster/center and highlighted outliers.

---

## 7) Curator dashboard (AI + visualization system)
**Views**
1. **Playlist overview**: distribution of features, detected clusters, coherence score.
2. **Ranked suspects table**: track, artist, score, key reasons, preview link.
3. **Interactive scatter**: PCA/UMAP plot; click a point to see track card.
4. **Within-order analysis**: anomaly score over playlist position.

**Actions**
- Mark as “keep” (false positive) / “remove” / “move to another playlist”.
- Sensitivity slider (threshold).
- “Find replacements”: retrieve nearest neighbors from a library based on similarity.
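"Find replacements" could be as simple as a nearest-neighbor lookup in the scaled feature space. The sketch below interprets the target as the playlist's center vector (for example the per-feature median), which is an assumption; `library` is a hypothetical mapping from track ID to feature vector.

```python
import math


def nearest_replacements(target: list[float], library: dict[str, list[float]],
                         k: int = 3) -> list[str]:
    """Return the ids of the k library tracks closest to `target` (Euclidean)."""
    ranked = sorted(library, key=lambda tid: math.dist(target, library[tid]))
    return ranked[:k]
```

A linear scan like this is fine for a few thousand candidates; a larger catalog would want an index structure instead.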

---

## 8) Evaluation plan (how you know it works)
**Offline (if labels exist)**
- If you have historical edits: removed tracks = positives.
- Metrics: Precision@K (for curator workload), ROC-AUC (score quality).
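Both offline metrics are easy to compute by hand, which helps when only a handful of labeled removals exist. In this sketch `scores` maps track ID to anomaly score and `removed` is the set of track IDs that curators historically removed; both names are assumptions for the example.

```python
def precision_at_k(scores: dict[str, float], removed: set[str], k: int) -> float:
    """Fraction of the k highest-scoring tracks that curators actually removed."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    if not top:
        return 0.0
    return sum(t in removed for t in top) / len(top)


def roc_auc(scores: dict[str, float], removed: set[str]) -> float:
    """Probability that a removed track outscores a kept one (ties count half)."""
    pos = [s for t, s in scores.items() if t in removed]
    neg = [s for t, s in scores.items() if t not in removed]
    if not pos or not neg:
        raise ValueError("need at least one removed and one kept track")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Precision@K matches what a curator sees (the top of the ranked list), while ROC-AUC summarizes ranking quality over the whole playlist.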

**Online / product metrics**
- Reduction in skip rate.
- Increase in average listen duration.
- Curator acceptance rate of flagged tracks.

**Human-in-the-loop**
- Curators validate flags; feedback becomes training data for improved thresholds/models.

---

## 9) Deployment / system architecture
**Pipeline**
- Ingestion service (Spotify API → DB)
- Feature computation job
- Model service (per-playlist fit + scoring)
- Frontend dashboard

**Storage**
- Tracks table (track_id, metadata)
- Audio features table (track_id, features, timestamp)
- Playlist snapshots (playlist_id, snapshot_id, ordered track_ids)
- Scoring results (playlist_id, snapshot_id, track_id, score, explanations)

**Operational concerns**
- Spotify API rate limits → caching + batching.
- Snapshotting playlists so results are reproducible.
- Auditability: keep model version + feature version.

---

## 10) Suggested MVP scope (datathon-friendly)
1. Pull one playlist’s tracks + audio features.
2. Build robust single-cluster baseline (Mahalanobis) + top-N anomaly ranking.
3. Build a small interactive dashboard (Plotly) with:
   - PCA scatter + highlighted outliers
   - ranked table with explanation strings
4. Add clustering variant (GMM or k-means) if time permits.

---

## 11) Next info needed (when you’re ready)
- What playlist types are you targeting (editorial vs user-made)?
- Do playlists tend to be multi-vibe? (impacts clustering choice)
- Do you have any ground truth (historical removals, skip data) for evaluation?


## What I need from you to begin (minimum → optional)

### Minimum (so we can build the first working prototype)
1. **How we’re getting Spotify data** (pick one):
   - **A)** You will provide a **Spotify Developer app** and set env vars: `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, (optionally `SPOTIPY_REDIRECT_URI`) 
   - **B)** You will export a **CSV/JSON** of playlist tracks + audio features (offline prototype)
2. **One target playlist to start with**
   - Provide a **playlist URL/URI** (editorial or user-made) and confirm it’s **public** (or you have access).
3. **What “off-mood” means for you (choose one for MVP)**
   - Flag **top N** tracks (e.g., N=5) **or** flag **top X%** (e.g., 10%) **or** use a **statistical threshold**.
4. **Where you want the output**
   - **Notebook only** (tables + plots) **or** **simple dashboard** (Plotly) as an MVP.

### Strongly recommended (improves results quickly)
5. **Playlist type mix**: mostly **unimodal** (one vibe) vs **multi-vibe** (2+ sections). If multi-vibe is common, we’ll prioritize clustering.
6. **Any curator feedback/ground truth** (even small):
   - Tracks historically removed, “keep/remove” labels, skip rate, etc.

### Optional (nice-to-have)
7. **Track preview links / replacement pool**
   - If you have a catalog/library to source replacements from, tell me what it is.
8. **Constraints**
   - Runtime limits, max API calls, must-cache-in-SQLite requirement, etc.

---

### Quick answers I’m asking you for (reply in one message)
1) Data access option (A or B): 
2) Playlist URL/URI: 
3) Threshold choice (top N / top % / statistical): 
4) Output choice (notebook / dashboard):


In [3]:
# This cell validates Spotify credentials via environment variables and parses the playlist id.
# NOTE: We avoid installing spotipy here (PEP 668 externally-managed environment). We'll use raw HTTPS calls instead.

import os
from urllib.parse import urlparse


def parse_playlist_id(playlist_url_or_uri: str) -> str:
    s = playlist_url_or_uri.strip()
    if s.startswith("spotify:playlist:"):
        return s.split(":")[-1]
    if "open.spotify.com" in s:
        path = urlparse(s).path
        parts = [p for p in path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "playlist":
            return parts[1]
    raise ValueError(f"Could not parse playlist id from: {playlist_url_or_uri}")


needed = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = [k for k in needed if not os.getenv(k)]
print("Missing env vars:", missing)

playlist_url = "https://open.spotify.com/playlist/7qF4mOQrz3xPssEeYTc53q?si=379a0431ad75490c"
print("Parsed playlist_id:", parse_playlist_id(playlist_url))

if missing:
    print("\nSet these in your shell (recommended) or in a *new* notebook cell via os.environ[...] and re-run.")

Missing env vars: ['SPOTIPY_CLIENT_ID', 'SPOTIPY_CLIENT_SECRET']
Parsed playlist_id: 7qF4mOQrz3xPssEeYTc53q

Set these in your shell (recommended) or in a *new* notebook cell via os.environ[...] and re-run.


In [None]:
# This cell obtains a Spotify access token (Client Credentials) and downloads playlist tracks + audio features.
# It builds a tidy DataFrame we can use for EDA and clustering.

import os
import time
import base64
import requests
import pandas as pd

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
SPOTIFY_API_BASE = "https://api.spotify.com/v1"


def spotify_get_access_token(client_id: str, client_secret: str) -> str:
    auth_b64 = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("utf-8")
    headers = {"Authorization": f"Basic {auth_b64}"}
    data = {"grant_type": "client_credentials"}
    r = requests.post(SPOTIFY_TOKEN_URL, headers=headers, data=data, timeout=30)
    r.raise_for_status()
    return r.json()["access_token"]


def spotify_get(url: str, token: str, params: dict | None = None):
    headers = {"Authorization": f"Bearer {token}"}
    r = requests.get(url, headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def fetch_playlist_tracks(playlist_id: str, token: str, limit: int = 100) -> list[dict]:
    items = []
    offset = 0
    while True:
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/playlists/{playlist_id}/tracks",
            token,
            params={"limit": limit, "offset": offset, "market": "US"},
        )
        batch = j.get("items", [])
        items.extend(batch)
        if j.get("next") is None:
            break
        offset += limit
        time.sleep(0.05)  # tiny pause to be polite
    return items


def fetch_audio_features(track_ids: list[str], token: str, batch_size: int = 100) -> dict:
    out = {}
    for i in range(0, len(track_ids), batch_size):
        chunk = track_ids[i : i + batch_size]
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/audio-features",
            token,
            params={"ids": ",".join(chunk)},
        )
        for af in j.get("audio_features", []) or []:
            if af and af.get("id"):
                out[af["id"]] = af
        time.sleep(0.05)
    return out


client_id = os.getenv("SPOTIPY_CLIENT_ID")
client_secret = os.getenv("SPOTIPY_CLIENT_SECRET")
if not client_id or not client_secret:
    raise RuntimeError(
        "Missing SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET in environment. "
        "Set them in your shell or in a new cell via os.environ[...] before running."
    )

token = spotify_get_access_token(client_id, client_secret)

playlist_id = parse_playlist_id(playlist_url)
raw_items = fetch_playlist_tracks(playlist_id, token)

# Build track table
rows = []
for pos, it in enumerate(raw_items):
    tr = (it or {}).get("track") or {}
    if not tr or tr.get("id") is None:
        continue
    rows.append(
        {
            "position": pos,
            "added_at": it.get("added_at"),
            "track_id": tr.get("id"),
            "track_name": tr.get("name"),
            "artist_name": (tr.get("artists") or [{}])[0].get("name"),
            "album_name": (tr.get("album") or {}).get("name"),
            "popularity": tr.get("popularity"),
            "explicit": tr.get("explicit"),
            "duration_ms": tr.get("duration_ms"),
        }
    )

df_tracks = pd.DataFrame(rows)
track_ids = df_tracks["track_id"].dropna().unique().tolist()

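af_map = fetch_audio_features(track_ids, token)
df_af = pd.DataFrame(list(af_map.values()))
if df_af.empty:
    # No audio features came back: keep the columns the merge and the checks below rely on.
    df_af = pd.DataFrame(columns=["id", "danceability"])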

# Merge
df_playlist = df_tracks.merge(df_af, left_on="track_id", right_on="id", how="left")

# Basic cleanup
if "added_at" in df_playlist.columns:
    df_playlist["added_at"] = pd.to_datetime(df_playlist["added_at"], errors="coerce", utc=True)

print("Tracks in playlist (after filtering unavailable):", len(df_tracks))
print("Tracks with audio features:", df_playlist["danceability"].notna().sum())

display(df_playlist.head(5))

missing_af = df_playlist[df_playlist["danceability"].isna()][["track_name", "artist_name", "track_id"]].head(10)
if len(missing_af) > 0:
    print("\nExamples of tracks missing audio features (first 10):")
    display(missing_af)


In [None]:
# Set Spotify credentials for this *session* only (avoid committing secrets to git).
# Prefer a local .env file, loaded by the dotenv cell, over typing secrets here.
import os
import getpass

# If you exported these in your shell or created a .env already, you can skip this cell.
# This cell will *prompt* you so secrets don't get stored in notebook outputs.

if not os.getenv("SPOTIPY_CLIENT_ID"):
    os.environ["SPOTIPY_CLIENT_ID"] = input("Enter SPOTIPY_CLIENT_ID: ").strip()

if not os.getenv("SPOTIPY_CLIENT_SECRET"):
    os.environ["SPOTIPY_CLIENT_SECRET"] = getpass.getpass("Enter SPOTIPY_CLIENT_SECRET (input hidden): ").strip()

needed = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = [k for k in needed if not os.getenv(k)]
print("Missing env vars:", missing)
if not missing:
    print("Credentials are present in this session. Next: run Cell 10 to fetch df_playlist.")


In [None]:
# This cell loads Spotify credentials from environment variables (optionally from a local .env file)
# and confirms readiness to fetch playlist data.

import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # loads from a local .env if present
except ImportError:
    # python-dotenv is optional; fall back to variables already set in the environment.
    pass

needed = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = [k for k in needed if not os.getenv(k)]
print("Missing env vars:", missing)

if missing:
    print(
        "\nCreate a local .env file (recommended, do NOT commit) with:\n"
        "  SPOTIPY_CLIENT_ID='...'\n"
        "  SPOTIPY_CLIENT_SECRET='...'\n\n"
        "Then restart kernel or re-run this cell.\n"
        "Alternatively, set them in your shell and restart the kernel."
    )
else:
    print("Credentials found. Next: run Cell 3 to fetch playlist tracks + audio features into df_playlist.")


## Next steps (so we can actually pull data + build clustering-based “moods”)

### 0) Security note (important)
You pasted your **client secret** in chat. Treat it as compromised:
- Go to Spotify Developer Dashboard → your app → **rotate/regenerate the client secret**.
- Use the new secret going forward.

### 1) Put credentials in a local `.env` (recommended)
Create a file named `.env` in the same folder where you run this notebook (do **not** commit it):

```
SPOTIPY_CLIENT_ID="<your_client_id>"
SPOTIPY_CLIENT_SECRET="<your_new_rotated_client_secret>"
```

(The **Client Credentials** flow does not use a redirect URI. It only covers data that needs no user authorization, so if the tracks endpoint rejects the app-only token, the Authorization Code flow is the fallback.)

### 2) Run the notebook cells in this order
1. **Cell 5** (loads `.env` and checks if env vars exist)
2. **Cell 3** (fetches playlist tracks + audio features into `df_playlist`)

If Cell 3 succeeds, you should see:
- Count of tracks fetched
- Count with audio features
- `df_playlist.head()`

### 3) After `df_playlist` exists (what I’ll do next)
We’ll proceed in small, separate steps:
1. EDA + missing-value checks + robust scaling
2. Choose **N moods** via clustering (GMM/KMeans + model selection)
3. Define anomaly score = distance to **nearest mood cluster** (plus per-feature z-score explanation)
4. Output both:
   - **Notebook tables + plots**
   - A **simple Plotly dashboard view** (scatter + ranked table)

### Quick question (1 line answer)
Do you want mood clustering to use:
- **A)** only Spotify audio features (energy/valence/tempo/etc.)
- **B)** audio features + popularity/explicit/duration


In [None]:
# Fetch playlist tracks + audio features into df_playlist (run after creds are available)
import os
import time
import base64
import requests
import pandas as pd

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
SPOTIFY_API_BASE = "https://api.spotify.com/v1"


def spotify_get_access_token(client_id: str, client_secret: str) -> str:
    auth_b64 = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("utf-8")
    headers = {"Authorization": f"Basic {auth_b64}"}
    data = {"grant_type": "client_credentials"}
    r = requests.post(SPOTIFY_TOKEN_URL, headers=headers, data=data, timeout=30)
    r.raise_for_status()
    return r.json()["access_token"]


def spotify_get(url: str, token: str, params: dict | None = None):
    headers = {"Authorization": f"Bearer {token}"}
    r = requests.get(url, headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def fetch_playlist_tracks(playlist_id: str, token: str, limit: int = 100) -> list[dict]:
    items = []
    offset = 0
    while True:
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/playlists/{playlist_id}/tracks",
            token,
            params={"limit": limit, "offset": offset, "market": "US"},
        )
        batch = j.get("items", [])
        items.extend(batch)
        if j.get("next") is None:
            break
        offset += limit
        time.sleep(0.05)
    return items


def fetch_audio_features(track_ids: list[str], token: str, batch_size: int = 100) -> dict:
    out = {}
    for i in range(0, len(track_ids), batch_size):
        chunk = track_ids[i : i + batch_size]
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/audio-features",
            token,
            params={"ids": ",".join(chunk)},
        )
        for af in j.get("audio_features", []) or []:
            if af and af.get("id"):
                out[af["id"]] = af
        time.sleep(0.05)
    return out


client_id = os.getenv("SPOTIPY_CLIENT_ID")
client_secret = os.getenv("SPOTIPY_CLIENT_SECRET")
if not client_id or not client_secret:
    raise RuntimeError(
        "Missing SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET. "
        "Run Cell 5 (loads .env) or set os.environ[...] in Cell 4, then re-run."
    )

token = spotify_get_access_token(client_id, client_secret)
playlist_id = parse_playlist_id(playlist_url)
raw_items = fetch_playlist_tracks(playlist_id, token)

rows = []
for pos, it in enumerate(raw_items):
    tr = (it or {}).get("track") or {}
    if not tr or tr.get("id") is None:
        continue
    rows.append(
        {
            "position": pos,
            "added_at": it.get("added_at"),
            "track_id": tr.get("id"),
            "track_name": tr.get("name"),
            "artist_name": (tr.get("artists") or [{}])[0].get("name"),
            "album_name": (tr.get("album") or {}).get("name"),
            "popularity": tr.get("popularity"),
            "explicit": tr.get("explicit"),
            "duration_ms": tr.get("duration_ms"),
        }
    )

df_tracks = pd.DataFrame(rows)
track_ids = df_tracks["track_id"].dropna().unique().tolist()

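af_map = fetch_audio_features(track_ids, token)
df_af = pd.DataFrame(list(af_map.values()))
if df_af.empty:
    # No audio features came back: keep the columns the merge and the checks below rely on.
    df_af = pd.DataFrame(columns=["id", "danceability"])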

df_playlist = df_tracks.merge(df_af, left_on="track_id", right_on="id", how="left")
if "added_at" in df_playlist.columns:
    df_playlist["added_at"] = pd.to_datetime(df_playlist["added_at"], errors="coerce", utc=True)

print("Tracks in playlist (after filtering unavailable):", len(df_tracks))
print("Tracks with audio features:", df_playlist["danceability"].notna().sum())

display(df_playlist.head(5))

missing_af = df_playlist[df_playlist["danceability"].isna()][["track_name", "artist_name", "track_id"]].head(10)
if len(missing_af) > 0:
    print("\nExamples of tracks missing audio features (first 10):")
    display(missing_af)


## Run this now (to pull data and start modeling)

### 0) **Security first (important)**
You pasted a Spotify **client secret** into chat earlier. Treat it as compromised:
- Go to Spotify Developer Dashboard → your app → **rotate/regenerate** the secret.
- Update your local `.env` with the **new** secret.

### 1) Put credentials into a local `.env` (recommended)
Create a `.env` file in the same folder you run this notebook (don’t commit it):

```
SPOTIPY_CLIENT_ID="..."
SPOTIPY_CLIENT_SECRET="..."
```

### 2) Run cells in this order
1. **Cell 5** → loads `.env` (if present) and checks env vars
2. **Cell 7** → fetches playlist tracks + audio features into `df_playlist`

When Cell 7 succeeds, you should see `df_playlist.head()` and counts of tracks/features.

### 3) Quick choice for clustering features (reply A or B)
- **A)** Use only Spotify audio features (danceability/energy/valence/tempo/loudness/acousticness/etc.)
- **B)** Use audio features + metadata (popularity/explicit/duration)

(For “mood”, I recommend **A** for the first pass; metadata can bias the clusters toward mainstream vs niche rather than sonic similarity.)


In [None]:
# This cell loads credentials (optionally from a local .env) and then fetches the playlist into df_playlist.
# It does NOT print secrets; it only verifies presence.

import os

try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # python-dotenv is optional; fall back to variables already set in the environment.
    pass

needed = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = [k for k in needed if not os.getenv(k)]
print("Missing env vars:", missing)

if missing:
    raise RuntimeError(
        "Credentials not found. Create a local .env with SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET (do not commit), "
        "or set them in your shell, then re-run this cell."
    )

print("Credentials found. Fetching playlist into df_playlist...")

# --- Fetch playlist tracks + audio features (same logic as Cell 7) ---
import time
import base64
import requests
import pandas as pd

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
SPOTIFY_API_BASE = "https://api.spotify.com/v1"


def spotify_get_access_token(client_id: str, client_secret: str) -> str:
    auth_b64 = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("utf-8")
    headers = {"Authorization": f"Basic {auth_b64}"}
    data = {"grant_type": "client_credentials"}
    r = requests.post(SPOTIFY_TOKEN_URL, headers=headers, data=data, timeout=30)
    r.raise_for_status()
    return r.json()["access_token"]


def spotify_get(url: str, token: str, params: dict | None = None):
    headers = {"Authorization": f"Bearer {token}"}
    r = requests.get(url, headers=headers, params=params, timeout=30)
    r.raise_for_status()
    return r.json()


def fetch_playlist_tracks(playlist_id: str, token: str, limit: int = 100) -> list[dict]:
    items = []
    offset = 0
    while True:
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/playlists/{playlist_id}/tracks",
            token,
            params={"limit": limit, "offset": offset, "market": "US"},
        )
        batch = j.get("items", [])
        items.extend(batch)
        if j.get("next") is None:
            break
        offset += limit
        time.sleep(0.05)
    return items


def fetch_audio_features(track_ids: list[str], token: str, batch_size: int = 100) -> dict:
    out = {}
    for i in range(0, len(track_ids), batch_size):
        chunk = track_ids[i : i + batch_size]
        j = spotify_get(
            f"{SPOTIFY_API_BASE}/audio-features",
            token,
            params={"ids": ",".join(chunk)},
        )
        for af in j.get("audio_features", []) or []:
            if af and af.get("id"):
                out[af["id"]] = af
        time.sleep(0.05)
    return out


client_id = os.getenv("SPOTIPY_CLIENT_ID")
client_secret = os.getenv("SPOTIPY_CLIENT_SECRET")

token = spotify_get_access_token(client_id, client_secret)
playlist_id = parse_playlist_id(playlist_url)
raw_items = fetch_playlist_tracks(playlist_id, token)

rows = []
for pos, it in enumerate(raw_items):
    tr = (it or {}).get("track") or {}
    if not tr or tr.get("id") is None:
        continue
    rows.append(
        {
            "position": pos,
            "added_at": it.get("added_at"),
            "track_id": tr.get("id"),
            "track_name": tr.get("name"),
            "artist_name": (tr.get("artists") or [{}])[0].get("name"),
            "album_name": (tr.get("album") or {}).get("name"),
            "popularity": tr.get("popularity"),
            "explicit": tr.get("explicit"),
            "duration_ms": tr.get("duration_ms"),
        }
    )

df_tracks = pd.DataFrame(rows)
track_ids = df_tracks["track_id"].dropna().unique().tolist()

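af_map = fetch_audio_features(track_ids, token)
df_af = pd.DataFrame(list(af_map.values()))
if df_af.empty:
    # No audio features came back: keep the columns the merge and the checks below rely on.
    df_af = pd.DataFrame(columns=["id", "danceability"])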

df_playlist = df_tracks.merge(df_af, left_on="track_id", right_on="id", how="left")
if "added_at" in df_playlist.columns:
    df_playlist["added_at"] = pd.to_datetime(df_playlist["added_at"], errors="coerce", utc=True)

print("Tracks in playlist (after filtering unavailable):", len(df_tracks))
print("Tracks with audio features:", df_playlist["danceability"].notna().sum())

display(df_playlist.head(5))

missing_af = df_playlist[df_playlist["danceability"].isna()][["track_name", "artist_name", "track_id"]].head(10)
if len(missing_af) > 0:
    print("\nExamples of tracks missing audio features (first 10):")
    display(missing_af)


In [9]:
# Fetch playlist into df_playlist (expects SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET in env or a local .env)
# Behavior:
# - If creds are missing: print instructions and set df_playlist=None (no exception)
# - If creds exist: download playlist tracks + audio features into df_playlist

import os
import time
import base64
import requests
import pandas as pd

# Load a local .env if present (recommended; do not commit)
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # python-dotenv is optional; fall back to variables already set in the environment.
    pass

needed = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]
missing = [k for k in needed if not os.getenv(k)]
print("Missing env vars:", missing)

if missing:
    print("\nCurrent working directory:", os.getcwd())
    print(
        "\nCreate a local .env file in the working directory (do NOT commit) with:\n"
        "  SPOTIPY_CLIENT_ID='...'\n"
        "  SPOTIPY_CLIENT_SECRET='...'\n\n"
        "Then re-run this cell.\n\n"
        "Security note: you pasted a client secret into chat earlier—rotate/regenerate it in Spotify Dashboard."
    )
    df_playlist = None
else:
    SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
    SPOTIFY_API_BASE = "https://api.spotify.com/v1"

    def spotify_get_access_token(client_id: str, client_secret: str) -> str:
        auth_b64 = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("utf-8")
        headers = {"Authorization": f"Basic {auth_b64}"}
        data = {"grant_type": "client_credentials"}
        r = requests.post(SPOTIFY_TOKEN_URL, headers=headers, data=data, timeout=30)
        r.raise_for_status()
        return r.json()["access_token"]

    def spotify_get(url: str, token: str, params: dict | None = None):
        """GET wrapper with basic rate-limit handling and better error messages."""
        headers = {"Authorization": f"Bearer {token}"}
        r = requests.get(url, headers=headers, params=params, timeout=30)
        if r.status_code == 429:
            retry_after = float(r.headers.get("Retry-After", "1"))
            time.sleep(retry_after)
            r = requests.get(url, headers=headers, params=params, timeout=30)
        if not r.ok:
            try:
                err = r.json()
            except Exception:
                err = r.text
            raise requests.HTTPError(
                f"{r.status_code} {r.reason} for {r.url} :: {err} :: headers={dict(r.headers)}"
            )
        return r.json()

    def fetch_playlist_tracks(playlist_id: str, token: str, limit: int = 100) -> list[dict]:
        """Paginate via Spotify's `next` URL to avoid offset edge-cases."""
        items: list[dict] = []
        url = f"{SPOTIFY_API_BASE}/playlists/{playlist_id}/tracks"
        # Keep params minimal; `fields` reduces payload and sometimes avoids edge-case failures.
        params: dict | None = {
            "limit": limit,
            "offset": 0,
            "fields": "items(added_at,track(id,name,popularity,explicit,duration_ms,artists(name),album(name))),next",
        }

        while True:
            j = spotify_get(url, token, params=params)
            items.extend(j.get("items", []) or [])
            nxt = j.get("next")
            if not nxt:
                break
            url, params = nxt, None
            time.sleep(0.05)

        return items

    def fetch_audio_features(track_ids: list[str], token: str, batch_size: int = 100) -> dict:
        out: dict = {}
        for i in range(0, len(track_ids), batch_size):
            chunk = track_ids[i : i + batch_size]
            j = spotify_get(
                f"{SPOTIFY_API_BASE}/audio-features",
                token,
                params={"ids": ",".join(chunk)},
            )
            for af in j.get("audio_features", []) or []:
                if af and af.get("id"):
                    out[af["id"]] = af
            time.sleep(0.05)
        return out

    client_id = os.getenv("SPOTIPY_CLIENT_ID")
    client_secret = os.getenv("SPOTIPY_CLIENT_SECRET")

    # playlist_url and parse_playlist_id are expected from earlier cells
    playlist_id = parse_playlist_id(playlist_url)
    token = spotify_get_access_token(client_id, client_secret)

    # Diagnostic: can we read playlist metadata at all?
    playlist_meta = spotify_get(
        f"{SPOTIFY_API_BASE}/playlists/{playlist_id}",
        token,
        params={"fields": "id,name,public,collaborative,owner(id,display_name),tracks.href,tracks.total"},
    )
    print("Playlist meta:", playlist_meta)

    # Fetch track items
    try:
        raw_items = fetch_playlist_tracks(playlist_id, token)
    except requests.HTTPError as e:
        # For this playlist we can read `/playlists/{id}` but `/playlists/{id}/tracks` is returning 403.
        # That usually means Client Credentials tokens are insufficient and we need a USER OAuth token.
        print("\nERROR fetching playlist tracks with Client Credentials.")
        print("Details:", e)
        print(
            "\nNext step: switch to Authorization Code flow (user OAuth) and request scope: playlist-read-private. "
            "Even for public playlists, Spotify sometimes blocks the tracks endpoint for app-only tokens.\n"
            "If you want a quick sanity check: try replacing `playlist_url` with a known Spotify editorial playlist URL; "
            "if that works, it confirms this is an auth-mode issue for this particular playlist."
        )
        df_playlist = None
        raw_items = []

    if not raw_items:
        raise RuntimeError(
            "Could not fetch playlist tracks (empty result). If you continue to see 403 Forbidden for the tracks endpoint, "
            "your playlist likely requires a USER OAuth token (Authorization Code flow) rather than Client Credentials. "
            "As a quick test, try a known public Spotify editorial playlist; if that works, we'll switch auth flow for user playlists."
        )

    # Build track table
    rows = []
    for pos, it in enumerate(raw_items):
        tr = (it or {}).get("track") or {}
        if not tr or tr.get("id") is None:
            continue
        rows.append(
            {
                "position": pos,
                "added_at": it.get("added_at"),
                "track_id": tr.get("id"),
                "track_name": tr.get("name"),
                "artist_name": (tr.get("artists") or [{}])[0].get("name"),
                "album_name": (tr.get("album") or {}).get("name"),
                "popularity": tr.get("popularity"),
                "explicit": tr.get("explicit"),
                "duration_ms": tr.get("duration_ms"),
            }
        )

    df_tracks = pd.DataFrame(rows)
    track_ids = df_tracks["track_id"].dropna().unique().tolist()

    af_map = fetch_audio_features(track_ids, token)
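    df_af = pd.DataFrame(list(af_map.values()))
    if df_af.empty:
        # No audio features came back; keep an "id" column so the merge below does not raise KeyError
        df_af = pd.DataFrame(columns=["id"])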

    df_playlist = df_tracks.merge(df_af, left_on="track_id", right_on="id", how="left")
    if "added_at" in df_playlist.columns:
        df_playlist["added_at"] = pd.to_datetime(df_playlist["added_at"], errors="coerce", utc=True)

    print("Tracks in playlist (after filtering unavailable):", len(df_tracks))
    print("Tracks with audio features:", int(df_playlist["danceability"].notna().sum()) if "danceability" in df_playlist.columns else 0)

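    # Preview only the columns that are actually present (audio features may be missing)
    preview_cols = [
        "position",
        "track_name",
        "artist_name",
        "popularity",
        "danceability",
        "energy",
        "valence",
        "tempo",
        "acousticness",
    ]
    display(df_playlist[[c for c in preview_cols if c in df_playlist.columns]].head(10))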


Missing env vars: []
Playlist meta: {'owner': {'display_name': 'sahith', 'id': 'sahith360'}, 'collaborative': False, 'name': 'sahith songs 1/24/26', 'public': True, 'id': '7qF4mOQrz3xPssEeYTc53q'}


HTTPError: 403 Forbidden for https://api.spotify.com/v1/playlists/7qF4mOQrz3xPssEeYTc53q/tracks?limit=100&offset=0&fields=items%28added_at%2Ctrack%28id%2Cname%2Cpopularity%2Cexplicit%2Cduration_ms%2Cartists%28name%29%2Calbum%28name%29%29%29%2Cnext :: {'error': {'status': 403, 'message': 'Forbidden'}} :: headers={'content-type': 'application/json; charset=utf-8', 'cache-control': 'private, max-age=0', 'access-control-allow-origin': '*', 'access-control-allow-headers': 'Accept, App-Platform, Authorization, Content-Type, Origin, Retry-After, Spotify-App-Version, X-Cloud-Trace-Context, client-token, content-access-token', 'access-control-allow-methods': 'GET, POST, OPTIONS, PUT, DELETE, PATCH', 'access-control-allow-credentials': 'true', 'access-control-max-age': '604800', 'strict-transport-security': 'max-age=31536000', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000, h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'content-encoding': 'gzip', 'vary': 'Accept-Encoding', 'date': 'Sat, 21 Feb 2026 10:05:19 GMT', 'server': 'envoy', 'via': 'HTTP/2 edgeproxy, 1.1 google', 'Transfer-Encoding': 'chunked'}
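
The metadata call succeeded while the tracks call returned 403, so the status code is the main clue for what to try next. A minimal sketch of turning a status code into a suggested next step, assuming the usual HTTP meanings of 401, 403 and 429; `next_step_for_status` is a hypothetical helper and not part of `requests` or any Spotify library.

```python
from typing import Optional


# Hypothetical helper: map an HTTP status from the Web API to a suggested next step.
def next_step_for_status(status: int, retry_after: Optional[str] = None) -> str:
    if 200 <= status < 300:
        return "OK"
    if status == 401:
        return "Token missing, invalid or expired: request a fresh access token."
    if status == 403:
        return "Request understood but refused: check auth flow, scopes and app access."
    if status == 429:
        wait = retry_after or "1"
        return f"Rate limited: wait {wait}s (Retry-After) and retry."
    return f"Unexpected status {status}: inspect the response body."


print(next_step_for_status(403))
```

Keeping this mapping in one place makes it easier to print a short hint instead of the full response headers.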

## Action needed (so we can fetch the playlist and start clustering)

### 0) Security: rotate the leaked secret
Because the **client secret was pasted into chat**, treat it as compromised:
- Spotify Developer Dashboard → your app → **rotate/regenerate** the client secret
- Use the **new** secret below.

### 1) Create a local `.env` file (recommended)
Create a file named **`.env`** in the notebook working directory:

- Working directory (from Cell 10 output): `/Users/sahith/Documents/Projects`

`.env` contents (don’t commit this file):
```
SPOTIPY_CLIENT_ID="6422afd1bde845a2b5fa72150b8270ed"
SPOTIPY_CLIENT_SECRET="<PASTE_YOUR_NEW_ROTATED_SECRET_HERE>"
```
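
Once the `.env` file exists, a quick check can confirm that both variables are visible without echoing the secret. This is a sketch that assumes the two variable names shown above and that the file has already been loaded into the environment (for example with `load_dotenv()`); `missing_env_vars` is a hypothetical helper.

```python
import os

REQUIRED_VARS = ["SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET"]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


# Prints only variable names, never their values.
print("Missing env vars:", missing_env_vars(REQUIRED_VARS))
```

An empty list means both values were found.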

### 2) Run the fetch cell
Re-run **Cell 10** (the “Fetch playlist into df_playlist” cell).

When it works, you should see:
- number of tracks fetched
- `df_playlist.head()` preview

### 3) Confirm one choice (reply A or B)
For clustering “moods”, should we use:
- **A)** only Spotify audio features (recommended for mood)
- **B)** audio features + metadata (popularity/explicit/duration)


In [None]:
# Step 1/3: Prepare modeling matrix X from df_playlist (run after Cell 10 succeeds)
# - Validates df_playlist
# - Selects mood features
# - Drops rows with missing features
# - Robust-scales by median/IQR (outlier-resistant)

import numpy as np
import pandas as pd

if "df_playlist" not in globals() or df_playlist is None or not isinstance(df_playlist, pd.DataFrame) or df_playlist.empty:
    raise RuntimeError("df_playlist is not available. Run Cell 10 to fetch playlist data first.")

# Spotify audio features for 'mood' clustering (interpretable + sonic)
mood_features = [
    "danceability",
    "energy",
    "valence",
    "tempo",
    "loudness",
    "acousticness",
    "instrumentalness",
    "liveness",
    "speechiness",
]

available = [c for c in mood_features if c in df_playlist.columns]
missing = sorted(set(mood_features) - set(available))
print("Mood features available:", available)
if missing:
    print("Warning: missing expected features:", missing)

# Coerce to numeric and drop rows missing any of the required features
_df = df_playlist.copy()
for c in available:
    _df[c] = pd.to_numeric(_df[c], errors="coerce")

n0 = len(_df)
df_model = _df.dropna(subset=available).reset_index(drop=True)
print(f"Rows kept for modeling (non-missing features): {len(df_model)} / {n0}")

# Robust scaling: (x - median) / IQR
med = df_model[available].median()
iqr = (df_model[available].quantile(0.75) - df_model[available].quantile(0.25)).replace(0, np.nan)
X = (df_model[available] - med) / iqr
X = X.fillna(0.0)

# Quick sanity checks
print("X shape:", X.shape)
print("Any NaNs in X:", bool(np.isnan(X.to_numpy()).any()))

# Keep these for next steps
# - df_model: cleaned rows aligned to X
# - X: robust-scaled feature matrix
# - available: final feature list
# - med, iqr: for explanations
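
To see what the median/IQR scaling in this cell does to an outlier, here is a worked example on a single made-up feature. It is a sketch using only the standard library rather than pandas; `robust_scale` is a hypothetical helper, and the `inclusive` quantile method is chosen because it interpolates linearly between data points.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Scale values as (x - median) / IQR; a zero IQR maps every value to 0.0."""
    # quantiles() needs at least two data points on older Python versions.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    if iqr == 0:
        return [0.0 for _ in values]
    return [(x - med) / iqr for x in values]


tempos = [100.0, 104.0, 108.0, 112.0, 180.0]  # made-up tempos with one outlier
print(robust_scale(tempos))  # [-1.0, -0.5, 0.0, 0.5, 9.0]
```

The median is 108 and the IQR is 8, so the four typical tracks land within one unit of zero while the 180 BPM track stands out at 9.0. With mean and standard deviation, the outlier itself would inflate the spread and look less extreme.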


In [None]:
# Step 2/3: Fit a clustering model to discover N "moods" (Gaussian Mixture) and assign each track to a mood
# - Requires: df_model, X from Cell 12
# - Outputs: gmm, labels, cluster_probs

import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

if "df_model" not in globals() or "X" not in globals():
    raise RuntimeError("df_model/X not found. Run Cell 12 after df_playlist is fetched.")

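# Try a small range for K: playlists usually have 1-5 vibes, so 1..7 leaves some headroom
if len(df_model) == 0:
    raise RuntimeError("df_model has no rows after dropping missing features; nothing to cluster.")
k_range = list(range(1, min(7, len(df_model)) + 1))  # K cannot exceed the number of rows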

bics = []
models = []
for k in k_range:
    g = GaussianMixture(
        n_components=k,
        covariance_type="full",
        random_state=42,
        n_init=10,
        reg_covar=1e-4,
    )
    g.fit(X)
    bics.append(g.bic(X))
    models.append(g)

best_idx = int(np.argmin(bics))
best_k = k_range[best_idx]
gmm = models[best_idx]

labels = gmm.predict(X)
cluster_probs = gmm.predict_proba(X)

print("Tried K:", k_range)
print("BICs:", [round(v, 1) for v in bics])
print("Selected K (lowest BIC):", best_k)

# Quick cluster sizes for sanity
sizes = pd.Series(labels).value_counts().sort_index()
print("Cluster sizes:")
print(sizes.to_string())
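
BIC trades goodness of fit against the number of free parameters, which grows quickly with `covariance_type="full"`. The sketch below uses the standard parameter count for a full-covariance Gaussian mixture and the textbook BIC formula; the helper names are hypothetical and the log-likelihood passed to `bic` is assumed to be the total over all samples.

```python
import math


def gmm_full_n_params(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional GMM with full covariances."""
    means = k * d
    covariances = k * d * (d + 1) // 2  # symmetric d x d matrix per component
    weights = k - 1  # mixture weights sum to 1
    return means + covariances + weights


def bic(total_log_likelihood: float, n_params: int, n_samples: int) -> float:
    """BIC = -2 * log-likelihood + n_params * ln(n_samples); lower is better."""
    return -2.0 * total_log_likelihood + n_params * math.log(n_samples)


for k in (1, 3, 7):
    print(k, gmm_full_n_params(k, d=9))  # 54, 164, 384
```

With the nine mood features, each extra component adds 55 parameters (9 means, 45 covariance entries, 1 weight), so on a playlist of a few dozen tracks the penalty term can outweigh modest gains in fit.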


In [28]:
# Spotify *user* OAuth (Authorization Code with PKCE) to fetch playlist tracks when Client Credentials returns 403.
# Two-phase notebook-friendly flow:
#   Phase 1: run this cell -> prints an authorization URL.
#   Phase 2: after approving, set:
#       os.environ['SPOTIFY_REDIRECTED_URL'] = '<FULL_REDIRECT_URL>'
#     then re-run this same cell to exchange code -> access_token.
#
# IMPORTANT redirect URI note:
# - Spotify rejects `localhost` redirect URIs (for example `https://localhost:...`) with: INVALID_CLIENT: Insecure redirect URI
# - Use an HTTP loopback redirect and add it EXACTLY in Spotify Dashboard → your app → Settings.
# - Per your latest update, your redirect URI is:
#     http://127.0.0.1:8888/callback

import os
import base64
import hashlib
import secrets
import time
from urllib.parse import urlencode, urlparse, parse_qs

import requests
import pandas as pd

SPOTIFY_AUTH_URL = "https://accounts.spotify.com/authorize"
SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"
SPOTIFY_API_BASE = "https://api.spotify.com/v1"

# Load .env if present
try:
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    # python-dotenv is optional; fall back to variables already set in the environment
    pass

client_id = os.getenv("SPOTIPY_CLIENT_ID")
if not client_id:
    raise RuntimeError("Missing SPOTIPY_CLIENT_ID. Put it in your .env and re-run.")

# Must EXACTLY match Spotify app setting.
redirect_uri = os.getenv("SPOTIPY_REDIRECT_URI") or "http://127.0.0.1:8888/callback"

# Scopes: include private/collab to avoid playlist edge cases
scope = "playlist-read-private playlist-read-collaborative"


def _b64url(b: bytes) -> str:
    return base64.urlsafe_b64encode(b).decode("utf-8").rstrip("=")


# Persist PKCE verifier/state across reruns so phase 2 works reliably
code_verifier = os.getenv("SPOTIFY_PKCE_VERIFIER")
state = os.getenv("SPOTIFY_OAUTH_STATE")
if not code_verifier or not state:
    code_verifier = _b64url(secrets.token_bytes(32))
    state = _b64url(secrets.token_bytes(16))
    os.environ["SPOTIFY_PKCE_VERIFIER"] = code_verifier
    os.environ["SPOTIFY_OAUTH_STATE"] = state

code_challenge = _b64url(hashlib.sha256(code_verifier.encode("utf-8")).digest())

auth_url = (
    f"{SPOTIFY_AUTH_URL}?"
    + urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": scope,
            "code_challenge_method": "S256",
            "code_challenge": code_challenge,
            "state": state,
        }
    )
)

print("Redirect URI being used (must match app settings exactly):", redirect_uri)
print(
    "\nIf you see INVALID_CLIENT: Insecure redirect URI:\n"
    "- Spotify Developer Dashboard → your app → Settings\n"
    "- Add the redirect URI shown above EXACTLY (http vs https, port, path)\n"
    "- Save, then re-run this cell to generate a fresh auth URL\n"
)

print("PHASE 1: Open this URL in your browser and approve:\n")
print(auth_url)

redirected = (os.getenv("SPOTIFY_REDIRECTED_URL") or "").strip()
if not redirected:
    print(
        "\nPHASE 2: After approval, copy the FULL redirected URL and set in a new cell:\n"
        "  import os\n"
        "  os.environ['SPOTIFY_REDIRECTED_URL'] = '<PASTE_FULL_URL_HERE>'\n"
        "Then re-run this cell."
    )
else:
    # Phase 2: parse and exchange
    u = urlparse(redirected)
    qs = parse_qs(u.query)

    if "error" in qs:
        raise RuntimeError(f"OAuth error: {qs.get('error')}")

    code = qs.get("code", [None])[0]
    returned_state = qs.get("state", [None])[0]

    if not code:
        raise RuntimeError(
            "No `code` found in SPOTIFY_REDIRECTED_URL. Make sure you pasted the full redirected URL."
        )

    if returned_state != state:
        # A mismatched state means this redirect does not belong to the auth URL generated above.
        raise RuntimeError(
            "OAuth state mismatch: SPOTIFY_REDIRECTED_URL is from a different auth attempt. "
            "Clear env vars SPOTIFY_PKCE_VERIFIER, SPOTIFY_OAUTH_STATE and SPOTIFY_REDIRECTED_URL, then retry from Phase 1."
        )

    r = requests.post(
        SPOTIFY_TOKEN_URL,
        data={
            "client_id": client_id,
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": redirect_uri,
            "code_verifier": code_verifier,
        },
        timeout=30,
    )

    if not r.ok:
        raise RuntimeError(f"Token exchange failed: {r.status_code} {r.text}")

    tok = r.json()
    access_token = tok["access_token"]
    print("\nUser access token acquired. Expires_in (sec):", tok.get("expires_in"))

    # Save token into env for downstream cells if desired
    os.environ["SPOTIFY_USER_ACCESS_TOKEN"] = access_token

    # Minimal verification call
    me = requests.get(
        f"{SPOTIFY_API_BASE}/me",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    if me.ok:
        print("Authenticated as:", me.json().get("id"))
    else:
        print("NOTE: /me check failed (this can happen if token has limited scopes):", me.status_code, me.text)

    print("\nNext step: use SPOTIFY_USER_ACCESS_TOKEN to fetch playlist tracks + audio features.")


Redirect URI being used (must match app settings exactly): http://127.0.0.1:8888/callback

If you see INVALID_CLIENT: Insecure redirect URI:
- Spotify Developer Dashboard → your app → Settings
- Add the redirect URI shown above EXACTLY (http vs https, port, path)
- Save, then re-run this cell to generate a fresh auth URL

PHASE 1: Open this URL in your browser and approve:

https://accounts.spotify.com/authorize?client_id=6422afd1bde845a2b5fa72150b8270ed&response_type=code&redirect_uri=http%3A%2F%2F127.0.0.1%3A8888%2Fcallback&scope=playlist-read-private+playlist-read-collaborative&code_challenge_method=S256&code_challenge=DmJUwaz3g1QsmzidVdfRiPnEOk8TY53dAyIXxwqjgG8&state=savCJtOxdeXvMRp93j7IMg

User access token acquired. Expires_in (sec): 3600
Authenticated as: sahith360

Next step: use SPOTIFY_USER_ACCESS_TOKEN to fetch playlist tracks + audio features.
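
The two PKCE-specific pieces of this flow are small enough to isolate: deriving the S256 challenge from the verifier, and pulling `code` out of the pasted redirect URL. A minimal standard-library sketch follows; `make_pkce_pair` and `extract_code` are hypothetical helpers, the redirect URL is made up, and the sketch treats a `state` mismatch as an error.

```python
import base64
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlparse


def b64url(raw: bytes) -> str:
    """URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def make_pkce_pair():
    """Return (code_verifier, code_challenge) for the S256 method."""
    verifier = b64url(secrets.token_bytes(32))
    challenge = b64url(hashlib.sha256(verifier.encode("ascii")).digest())
    return verifier, challenge


def extract_code(redirected_url: str, expected_state: str) -> str:
    """Pull `code` out of the redirect URL, refusing a mismatched `state`."""
    qs = parse_qs(urlparse(redirected_url).query)
    if "error" in qs:
        raise ValueError(f"OAuth error: {qs['error'][0]}")
    returned_state = qs.get("state", [""])[0]
    # Constant-time comparison of the state values.
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: redirect is from a different auth attempt")
    code = qs.get("code", [""])[0]
    if not code:
        raise ValueError("no code in redirect URL")
    return code


verifier, challenge = make_pkce_pair()
# Made-up redirect URL for illustration only.
url = "http://127.0.0.1:8888/callback?code=abc123&state=xyz"
print(extract_code(url, expected_state="xyz"))
```

The verifier never leaves the notebook until the token exchange; only the challenge appears in the authorization URL.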


In [30]:
# Fetch playlist tracks + audio features using a USER OAuth access token (PKCE)
# Creates: df_playlist (tidy table for modeling)

import os
import time
import requests
import pandas as pd

SPOTIFY_API_BASE = "https://api.spotify.com/v1"

playlist_id = parse_playlist_id(playlist_url)
user_token = (os.getenv("SPOTIFY_USER_ACCESS_TOKEN") or "").strip()

if not user_token:
    raise RuntimeError(
        "Missing SPOTIFY_USER_ACCESS_TOKEN. Run the PKCE auth cell (Cell 14) through Phase 2 so it exchanges the code and sets the token."
    )


def spotify_get_json(url: str, token: str, params: dict | None = None) -> dict:
    headers = {"Authorization": f"Bearer {token}"}
    r = requests.get(url, headers=headers, params=params, timeout=30)
    if r.status_code == 429:
        retry_after = float(r.headers.get("Retry-After", "1"))
        time.sleep(retry_after)
        r = requests.get(url, headers=headers, params=params, timeout=30)
    if not r.ok:
        try:
            err = r.json()
        except Exception:
            err = r.text
        raise requests.HTTPError(f"{r.status_code} {r.reason} for {r.url} :: {err}")
    return r.json()


# Quick token sanity check
me = spotify_get_json(f"{SPOTIFY_API_BASE}/me", user_token)
print("Authenticated as:", me.get("id"))

# Playlist meta sanity check
meta = spotify_get_json(
    f"{SPOTIFY_API_BASE}/playlists/{playlist_id}",
    user_token,
    params={"fields": "id,name,public,collaborative,owner(id,display_name),tracks(total,href)"},
)
print("Playlist:", meta.get("name"), "| owner:", (meta.get("owner") or {}).get("id"), "| total:", (meta.get("tracks") or {}).get("total"))


def fetch_playlist_track_items(playlist_id: str, token: str, limit: int = 100) -> list[dict]:
    items: list[dict] = []
    url = f"{SPOTIFY_API_BASE}/playlists/{playlist_id}/tracks"
    params: dict | None = {
        "limit": limit,
        "offset": 0,
        "additional_types": "track",
        "fields": "items(added_at,track(id,name,popularity,explicit,duration_ms,artists(name),album(name))),next",
    }
    while True:
        j = spotify_get_json(url, token, params=params)
        items.extend(j.get("items", []) or [])
        nxt = j.get("next")
        if not nxt:
            break
        url, params = nxt, None
        time.sleep(0.05)
    return items


def fetch_audio_features(track_ids: list[str], token: str, batch_size: int = 100) -> dict:
    out: dict = {}
    for i in range(0, len(track_ids), batch_size):
        chunk = track_ids[i : i + batch_size]
        j = spotify_get_json(
            f"{SPOTIFY_API_BASE}/audio-features",
            token,
            params={"ids": ",".join(chunk)},
        )
        for af in j.get("audio_features", []) or []:
            if af and af.get("id"):
                out[af["id"]] = af
        time.sleep(0.05)
    return out


raw_items = fetch_playlist_track_items(playlist_id, user_token)

rows = []
for pos, it in enumerate(raw_items):
    tr = (it or {}).get("track") or {}
    if not tr or tr.get("id") is None:
        continue
    rows.append(
        {
            "position": pos,
            "added_at": it.get("added_at"),
            "track_id": tr.get("id"),
            "track_name": tr.get("name"),
            "artist_name": (tr.get("artists") or [{}])[0].get("name"),
            "album_name": (tr.get("album") or {}).get("name"),
            "popularity": tr.get("popularity"),
            "explicit": tr.get("explicit"),
            "duration_ms": tr.get("duration_ms"),
        }
    )

df_tracks = pd.DataFrame(rows)
track_ids = df_tracks["track_id"].dropna().unique().tolist()

af_map = fetch_audio_features(track_ids, user_token)
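df_af = pd.DataFrame(list(af_map.values()))
if df_af.empty:
    # No audio features came back; keep an "id" column so the merge below does not raise KeyError
    df_af = pd.DataFrame(columns=["id"])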

df_playlist = df_tracks.merge(df_af, left_on="track_id", right_on="id", how="left")
df_playlist["added_at"] = pd.to_datetime(df_playlist["added_at"], errors="coerce", utc=True)

print("Tracks fetched:", len(df_tracks))
print("Tracks with audio features:", int(df_playlist["danceability"].notna().sum()) if "danceability" in df_playlist.columns else 0)

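# Preview only the columns that are actually present (audio features may be missing)
preview_cols = ["position", "track_name", "artist_name", "popularity", "danceability", "energy", "valence", "tempo", "acousticness"]
display(df_playlist[[c for c in preview_cols if c in df_playlist.columns]].head(10))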


Authenticated as: sahith360
Playlist: sahith songs 1/24/26 | owner: sahith360 | total: None


HTTPError: 403 Forbidden for https://api.spotify.com/v1/playlists/7qF4mOQrz3xPssEeYTc53q/tracks?limit=100&offset=0&additional_types=track&fields=items%28added_at%2Ctrack%28id%2Cname%2Cpopularity%2Cexplicit%2Cduration_ms%2Cartists%28name%29%2Calbum%28name%29%29%29%2Cnext :: {'error': {'status': 403, 'message': 'Forbidden'}}
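
The `spotify_get_json` helper in this cell retries once on a 429. A sketch of a bounded retry loop that honors `Retry-After` is shown below; `call_with_retry` and its `send` callable are hypothetical, and the responses are faked so the example runs without network access. Any other status, including a 403, is returned to the caller rather than retried.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, headers, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Honor the server's Retry-After (seconds); fall back to 1 second.
        try:
            wait = float(headers.get("Retry-After", "1"))
        except ValueError:
            wait = 1.0
        sleep(wait)


# Fake responses: one rate-limit reply, then success.
responses = iter([(429, {"Retry-After": "2"}, None), (200, {}, {"ok": True})])
waits = []
print(call_with_retry(lambda: next(responses), sleep=waits.append), waits)
```

Passing `sleep` as a parameter keeps the loop testable: the example records the requested wait instead of pausing.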