# Text Classification: Genius Song Lyrics (1%)

**Dataset:** 34,049 Songs · 26,408 Artists · 6 Genres
**Genres:** Rap / Hip-Hop · Rock · Pop · R&B · Country · Miscellaneous

**Purpose:**
Use the best-performing model from `model-evaluation.ipynb` to classify new song lyrics by genre. This notebook is a prototype for an interactive text-classification demo and supports predicting either a single lyric or a batch of lyrics.

**Selected Model:**
SentenceTransformer (**all-MiniLM-L6-v2**) + **LinearSVC**

---

# 1. Imports and Setup
## 1.1 Import Libraries

In [None]:
import joblib
import numpy as np
from sentence_transformers import SentenceTransformer

## 1.2 Load Trained Model and Label Encoder

In [None]:
# Load classifier and label encoder
clf_st_svc = joblib.load("models/clf_st_svc.joblib")
label_encoder = joblib.load("models/label_encoder.joblib")

# Load SentenceTransformer model
st_model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

print("Model and label encoder loaded.")
print("Genres:", list(label_encoder.classes_))
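
Because the classifier and the embedding model are loaded from separate sources, it can help to confirm that their dimensions agree before predicting. The following is a minimal sketch, not part of the original notebook: `check_embedding_dim` is a hypothetical helper, and it assumes the classifier exposes scikit-learn's `n_features_in_` attribute (set when an estimator is fitted with a recent scikit-learn version).

```python
def check_embedding_dim(st_model, clf):
    """Raise if the encoder's output size differs from what the classifier expects.

    Hypothetical helper: `st_model` is a SentenceTransformer and `clf` a fitted
    scikit-learn estimator. If `n_features_in_` is missing, the check is skipped.
    """
    emb_dim = st_model.get_sentence_embedding_dimension()
    clf_dim = getattr(clf, "n_features_in_", None)
    if clf_dim is not None and emb_dim != clf_dim:
        raise ValueError(
            f"Encoder produces {emb_dim}-dim embeddings, "
            f"but the classifier was trained on {clf_dim} features."
        )
    return emb_dim
```

In this notebook it would be called as `check_embedding_dim(st_model, clf_st_svc)`; for `all-MiniLM-L6-v2` the embedding size is 384.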

# 2. Classification
## 2.1 Classifying a Single Lyric

In [None]:
lyrics = """
Yeah I'm driving through the city late at night,
lights low, bass loud, trouble on my mind...
"""

In [None]:
lyrics_clean = lyrics.strip()

In [None]:
# encode() returns a NumPy array of shape (1, embedding_dim) by default,
# which scikit-learn accepts directly, so no tensor/list conversion is needed
embedding = st_model.encode(
    [lyrics_clean],
    show_progress_bar=False,
)

In [None]:
pred_idx = clf_st_svc.predict(embedding)[0]
pred_genre = label_encoder.inverse_transform([pred_idx])[0]

print("Predicted genre:", pred_genre)

## 2.2 Classifying Multiple Lyrics

In [None]:
texts = [
    "Yeah, I'm riding through the city with my homies late at night...",
    "Baby, I miss you every single day, I can't get you off my mind...",
    "Whiskey on the dashboard, small town lights and dusty roads...",
    "The crowd is roaring, the drums are loud, the stage is burning..."
]

In [None]:
# encode() returns a NumPy array of shape (len(texts), embedding_dim) by default
emb = st_model.encode(
    [t.strip() for t in texts],
    show_progress_bar=False,
)

In [None]:
pred_idx = clf_st_svc.predict(emb)
pred_genres = label_encoder.inverse_transform(pred_idx)

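for t, g in zip(texts, pred_genres):
    # Truncate long lyrics for display; append an ellipsis only when text was cut
    print(t if len(t) <= 80 else t[:80] + "...")
    print("→", g)
    print("-" * 50)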
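
The single and batch cases above follow the same three steps (strip, embed, predict), so for the interactive demo they could be wrapped in one function. This is a sketch under assumptions: `predict_genres` is a hypothetical name, and the models are passed in as arguments rather than read from the notebook's globals.

```python
def predict_genres(texts, st_model, clf, label_encoder):
    """Return one predicted genre name per input text.

    Hypothetical wrapper around the steps shown above: strip whitespace,
    embed with the SentenceTransformer, classify, and decode the label.
    """
    cleaned = [t.strip() for t in texts]
    if not cleaned:
        return []  # nothing to embed or classify
    embeddings = st_model.encode(cleaned, show_progress_bar=False)
    pred_idx = clf.predict(embeddings)
    return list(label_encoder.inverse_transform(pred_idx))
```

With the objects loaded in section 1.2, a call would look like `predict_genres(texts, st_model, clf_st_svc, label_encoder)`; a single lyric is passed as a one-element list.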

## 2.3 Interpretation
The predictions look quite intuitive:

- City + homies + late night → rock (could also fit rap, but the overall vibe leans more toward “rebellious/rock-ish”)
- “I miss you every single day” → country (classic heartbreak theme)
- Whiskey + dusty roads + small town → country (so country it’s almost a stereotype)
- Crowd, drums, stage is burning → pop (clear stadium/performance energy)
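
For borderline cases such as the first example, which could also fit rap, the runner-up genre can be inspected through the classifier's decision scores. The sketch below is an assumption-laden illustration, not part of the original notebook: `top_k_genres` is a hypothetical helper, it assumes a multi-class classifier whose `decision_function` returns one score per class (as `LinearSVC` does with more than two classes), and the scores are uncalibrated margins, not probabilities.

```python
def top_k_genres(embeddings, clf, label_encoder, k=2):
    """For each embedding, return the k genres with the highest decision scores.

    Hypothetical helper. Assumes a multi-class classifier, so that
    decision_function returns scores of shape (n_samples, n_classes).
    """
    scores = clf.decision_function(embeddings)
    results = []
    for row in scores:
        # Column indices sorted by score, highest first
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        # Columns follow clf.classes_, which holds the encoded integer labels
        encoded = [clf.classes_[j] for j in ranked]
        names = label_encoder.inverse_transform(encoded)
        results.append([(str(name), float(row[j])) for name, j in zip(names, ranked)])
    return results
```

Calling `top_k_genres(emb, clf_st_svc, label_encoder)` on the batch embeddings would list the top two genres per lyric; a small gap between the two scores indicates a close call.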

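**Summary:**
For these four short inputs, the classifier assigns genres that align well with typical lyrical themes, which suggests it picks up stylistic cues even from a single line. This is a qualitative check only; the model comparison itself is in `model-evaluation.ipynb`.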