## Emotion Classification with PyTorch (lyrics embeddings)

This notebook builds a small neural network using **PyTorch** to classify a song's **emotion** starting from **clean text lyrics**.

We assume we start from a `pandas` DataFrame with at least two columns:

- `Lyrics`: preprocessed song lyrics as text
- `Emotion`: categorical label (e.g. `"Happy"`, `"Sad"`, ...)

We then call a **mocked sentence transformer encoder** (which you will later replace with a real model) to turn each lyric into a fixed-size embedding vector. The mock returns random vectors of dimension 768, the output size of base-sized sentence encoders such as `all-mpnet-base-v2`.

The steps are:

1. Generate a synthetic dataset of **lyrics → emotion** pairs.
2. Pass the lyrics through a **mock sentence transformer** to obtain embeddings.
3. Split the embeddings into **training** and **test** sets.
4. Build a small **feedforward neural network** in PyTorch on top of the embeddings.
5. Train the network using **k-fold cross-validation** on the training set.
6. Evaluate the model using **precision**, **recall**, and **F1-score**.

Later, you can plug in your real lyrics dataset and real sentence-transformer encoder while keeping the rest of the pipeline unchanged.


In [1]:
# Imports and configuration

import numpy as np
import pandas as pd

from torch.utils.tensorboard import SummaryWriter
from datetime import datetime

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import precision_recall_fscore_support, classification_report

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Device selection
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


Using device: cpu


## Load or create lyrics–emotion data

In this cell we either:
- Load your real lyrics dataset into a `pandas` DataFrame with columns `Lyrics` and `Emotion`, **or**
- Create a small synthetic dataset of lyrics–emotion pairs for demo purposes.

We then call a **mocked sentence transformer encoder** to turn each lyric into a dense 768-dimensional embedding. Both the synthetic labels and the mock embeddings are drawn at random, so this data carries no learnable signal; it only exercises the pipeline end to end. Later, you can replace the synthetic data and the mock encoder with your real dataset and real sentence-transformer model.


In [2]:
# TODO: Replace this with your real lyrics + emotion data loading
# For real data you should provide:
#   - A DataFrame `df` with at least:
#       - "Lyrics": cleaned lyrics text
#       - "Emotion": string label per track
#   - An encoder function that maps a list of texts to a 2D array of embeddings.

TEXT_COL = "Lyrics"
TARGET_COL = "Emotion"

# Embedding size produced by the mock encoder. 768 matches base-sized
# sentence-transformer models; set this to your real encoder's output dimension.
EMBEDDING_DIM = 768

# Synthetic example data (for demo). Remove when using real data.
num_samples = 100000  # demo size; lower this (e.g. 10_000) if the notebook runs too slowly
random_seed = 41
rng = np.random.default_rng(random_seed)

EMOTION_CLASSES = [
    "anger", "surprise", "sadness", "joy", "love", "fear"
]

# Very small synthetic vocabulary to build fake lyrics
MOCK_WORDS = [
    "love", "hate", "dance", "cry", "night", "day", "heart", "alone",
    "party", "rain", "sun", "fire", "cold", "fear", "smile", "tears",
]


def make_fake_lyric(min_len: int = 5, max_len: int = 30, rng=None) -> str:
    """Generate a random lyric-like sentence from a tiny mock vocabulary."""
    if rng is None:
        rng = np.random.default_rng()
    length = int(rng.integers(min_len, max_len + 1))
    words = rng.choice(MOCK_WORDS, size=length)
    return " ".join(words)


# Synthetic lyrics + labels
lyrics = [make_fake_lyric(rng=rng) for _ in range(num_samples)]
emotion_labels = rng.choice(EMOTION_CLASSES, size=num_samples)

df = pd.DataFrame({TEXT_COL: lyrics, TARGET_COL: emotion_labels})


def mock_sentence_transformer_encode(
    texts,
    embedding_dim: int = EMBEDDING_DIM,
    random_state: int | None = None,
) -> np.ndarray:
    """Mocked sentence-transformer encoder.

    Replace this implementation with a real sentence-transformer `encode` call.
    It should take a sequence of strings and return a 2D float32 array of
    shape (num_samples, embedding_dim).

    Note: the mock ignores the text content and returns Gaussian noise, so
    the embeddings carry no information about the lyrics or their labels.
    """
    rng_local = np.random.default_rng(random_state)
    return rng_local.normal(size=(len(texts), embedding_dim)).astype(np.float32)


# Compute synthetic embeddings from lyrics
X_embeddings = mock_sentence_transformer_encode(df[TEXT_COL].tolist(), EMBEDDING_DIM, random_state=random_seed)

print(df.head())
print("\nEmbeddings shape:", X_embeddings.shape)
print("\nClass distribution:\n", df[TARGET_COL].value_counts())


                                              Lyrics Emotion
0  tears fear cold fire dance night fear day fear...     joy
1  heart day love heart cry rain sun tears party ...     joy
2  dance fear dance smile fear cry heart dance ra...   anger
3  tears smile hate night rain dance day fear fir...     joy
4  smile cry dance tears smile dance day dance he...    fear

Embeddings shape: (100000, 768)

Class distribution:
 Emotion
sadness     16800
surprise    16758
love        16688
fear        16613
joy         16587
anger       16554
Name: count, dtype: int64
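
The mock encoder above ignores its input, so two identical lyrics in different rows receive unrelated vectors. A minimal sketch of a text-dependent mock follows, assuming it is acceptable to seed each vector from a hash of the lyric. The name `hashed_mock_encode` is hypothetical and not part of any library. The vectors are still unrelated to the emotion labels, so they remain noise for classification purposes.

```python
import hashlib

import numpy as np


def hashed_mock_encode(texts, embedding_dim: int = 768) -> np.ndarray:
    """Hypothetical mock encoder: each vector is seeded by a hash of its text."""
    rows = []
    for text in texts:
        # Derive a stable 64-bit seed from the text. Python's built-in hash()
        # is salted per process for strings, so hashlib is used instead.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        seed = int.from_bytes(digest[:8], "little")
        rows.append(np.random.default_rng(seed).normal(size=embedding_dim))
    # reshape keeps the result 2D even when `texts` is empty
    return np.asarray(rows, dtype=np.float32).reshape(len(rows), embedding_dim)


emb = hashed_mock_encode(["love rain", "cold fire", "love rain"])
print(emb.shape)                       # (3, 768)
print(np.array_equal(emb[0], emb[2]))  # True: same text, same vector
```

This makes the mock behave like a real encoder in one respect only: encoding the same lyric twice yields the same embedding.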


## Preprocessing and train/test split

Here we:

- Encode the `Emotion` labels to integer class IDs.
- Split the **embeddings** into **train** and **test** sets (hold-out test set).
- Optionally standardize the embedding dimensions with `StandardScaler` fitted on the training data only.


In [4]:
# Encode labels
label_encoder = LabelEncoder()
df["Emotion_encoded"] = label_encoder.fit_transform(df[TARGET_COL])

y = df["Emotion_encoded"].values.astype(np.int64)
X = X_embeddings.astype(np.float32)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=random_seed, stratify=y
)

# Scale embedding dimensions (optional but often helpful)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

n_features = X_train.shape[1]
n_classes = len(np.unique(y))

print(f"Train shape: {X_train.shape}, Test shape: {X_test.shape}")
print(f"Embedding dimension: {n_features}")
print(f"Number of classes: {n_classes}")


Train shape: (80000, 768), Test shape: (20000, 768)
Embedding dimension: 768
Number of classes: 6
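
As a small illustration of the two preprocessing steps, the sketch below uses toy data (all values here are made up for the example). It shows that `LabelEncoder` assigns integer IDs in sorted label order, and that a `StandardScaler` fitted on the training rows is reused unchanged on the test rows.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

labels = np.array(["joy", "anger", "fear", "joy", "sadness"])

encoder = LabelEncoder()
ids = encoder.fit_transform(labels)

print(encoder.classes_.tolist())                   # ['anger', 'fear', 'joy', 'sadness']
print(ids.tolist())                                # [2, 0, 1, 2, 3]
print(encoder.inverse_transform([2, 0]).tolist())  # ['joy', 'anger']

# Scaler statistics come from the training rows only; the test rows are
# transformed with those same statistics, so no test information leaks in.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))
test = rng.normal(loc=5.0, scale=2.0, size=(200, 4))

scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

print(np.allclose(train_scaled.mean(axis=0), 0.0))  # True
```

The test columns end up close to, but not exactly at, zero mean, which is expected because their statistics were never used for fitting.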


## Dataset and model definitions

This section defines:

- A `Dataset` wrapper (`EmotionDataset`) for PyTorch.
- A small feedforward neural network (`SimpleEmotionNet`) for emotion classification.


In [5]:
class EmotionDataset(Dataset):
    def __init__(self, X, y):
        # X is a matrix of sentence-transformer embeddings: shape (num_samples, embedding_dim)
        self.X = torch.from_numpy(X).float()
        self.y = torch.from_numpy(y).long()

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


class SimpleEmotionNet(nn.Module):
    """Simple emotion classifier on top of fixed embeddings.

    Input: batch of embedding vectors of shape (batch_size, embedding_dim).
    """

    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.net(x)
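
A quick sanity check of the architecture can be sketched as follows. The network is rebuilt with the same layer sizes as `SimpleEmotionNet` (768 inputs, 6 classes) so the snippet runs on its own. It counts the trainable parameters and confirms the output shape for a batch of fake embeddings.

```python
import torch
import torch.nn as nn

# Same layer sizes as SimpleEmotionNet, rebuilt here so the snippet runs alone.
net = nn.Sequential(
    nn.Linear(768, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 6),
)

n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 51494 = (768*64 + 64) + (64*32 + 32) + (32*6 + 6)

batch = torch.randn(8, 768)  # 8 fake embedding vectors
with torch.no_grad():
    logits = net(batch)
print(tuple(logits.shape))   # (8, 6): one raw score per class

# nn.CrossEntropyLoss consumes these raw logits directly; softmax is only
# needed when readable probabilities are wanted.
probs = torch.softmax(logits, dim=1)
```

The model ends in a plain `nn.Linear` layer with no softmax because `nn.CrossEntropyLoss` applies log-softmax internally.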


## Training and evaluation helpers

We define helper functions to:

- Train the model for one epoch.
- Evaluate the model and compute **precision**, **recall**, and **F1-score** (macro-averaged).
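
To make "macro-averaged" concrete, the toy example below (labels invented for illustration) computes per-class scores with `average=None` and checks that the macro values are their unweighted means. Each class counts equally regardless of how many samples it has.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 2, 2, 2])

# Per-class scores: average=None returns one value per class.
p_cls, r_cls, f_cls, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)

# Macro average: unweighted mean of the per-class scores.
p_macro, r_macro, f_macro, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(support.tolist())                   # [4, 2, 2]
print(np.isclose(f_macro, f_cls.mean()))  # True
```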


In [6]:
def train_one_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0

    for X_batch, y_batch in dataloader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * X_batch.size(0)

    return running_loss / len(dataloader.dataset)


def evaluate_model(model, dataloader, device):
    model.eval()
    all_preds = []
    all_targets = []

    with torch.no_grad():
        for X_batch, y_batch in dataloader:
            X_batch = X_batch.to(device)
            outputs = model(X_batch)
            preds = torch.argmax(outputs, dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_targets.extend(y_batch.numpy())

    all_preds = np.array(all_preds)
    all_targets = np.array(all_targets)

    precision, recall, f1, _ = precision_recall_fscore_support(
        all_targets, all_preds, average="macro", zero_division=0
    )

    return precision, recall, f1, all_targets, all_preds


## K-fold cross-validation on training set

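We perform stratified k-fold cross-validation on the **training** data only, to estimate generalization performance before evaluating once on the held-out test set. Because the mock embeddings are random noise that is independent of the labels, expect validation metrics near chance level while the training loss keeps falling as the network memorizes the training folds.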
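
As a rough illustration of what `StratifiedKFold` does, the sketch below splits toy labels (6 balanced classes, sizes chosen for the example). It checks that every validation fold keeps the class balance and that each sample is used for validation exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels: 6 classes x 50 samples each (features are irrelevant to the split).
y_toy = np.repeat(np.arange(6), 50)
X_toy = np.zeros((len(y_toy), 1))

skf_toy = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

val_counts = []  # per-fold class counts in the validation split
seen = []        # every index that appeared in some validation split
for train_idx, val_idx in skf_toy.split(X_toy, y_toy):
    val_counts.append(np.bincount(y_toy[val_idx], minlength=6).tolist())
    seen.extend(val_idx.tolist())

print(val_counts[0])                     # [10, 10, 10, 10, 10, 10]
print(sorted(seen) == list(range(300)))  # True: each sample validated once
```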


In [10]:
k_folds = 5
batch_size = 32
num_epochs = 20
learning_rate = 1e-3

epochs_between_reports = 5

skf = StratifiedKFold(n_splits=k_folds, shuffle=True, random_state=42)

fold_results = []

# Tensorboard logging
log_dir = f"runs/emotion_cv_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
writer = SummaryWriter(log_dir)
hparams = {
    "learning_rate": learning_rate,
    "batch_size": batch_size,
    "k_folds": k_folds,
    "num_epochs": num_epochs,
    "epochs_between_reports": epochs_between_reports,
}
writer.add_text(
    "hparams",
    "\n".join(f"{k}: {v}" for k, v in hparams.items()),
)

for fold, (train_idx, val_idx) in enumerate(skf.split(X_train, y_train), 1):
    print(f"\n==== Fold {fold}/{k_folds} ====")

    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]

    train_dataset = EmotionDataset(X_tr, y_tr)
    val_dataset = EmotionDataset(X_val, y_val)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    # Model that consumes sentence-transformer embeddings as input features
    model = SimpleEmotionNet(n_features, n_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train
    for epoch in range(1, num_epochs + 1):
        loss = train_one_epoch(model, train_loader, criterion, optimizer, device)
        writer.add_scalar(f"Loss/kcv_train/fold_{fold}", loss, epoch)
        if epoch % epochs_between_reports == 0 or epoch == 1 or epoch == num_epochs:
            print(f"Epoch {epoch}/{num_epochs} - Loss: {loss:.4f}")
            precision, recall, f1, y_true_val, y_pred_val = evaluate_model(model, val_loader, device)
            writer.add_scalar(f"F1_Score/kcv_validation/fold_{fold}", f1, epoch)

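    # Final evaluation on the validation split, done explicitly so the fold
    # metrics do not depend on the reporting schedule inside the training loop
    precision, recall, f1, y_true_val, y_pred_val = evaluate_model(model, val_loader, device)
    print(f"Fold {fold} - Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}")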

    fold_results.append({
        "fold": fold,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    })

print("\n==== Cross-Validation Summary (Macro-Averaged) ====")
for r in fold_results:
    print(
        f"Fold {r['fold']}: "
        f"Precision={r['precision']:.4f}, "
        f"Recall={r['recall']:.4f}, "
        f"F1={r['f1']:.4f}"
    )

mean_precision = np.mean([r["precision"] for r in fold_results])
mean_recall = np.mean([r["recall"] for r in fold_results])
mean_f1 = np.mean([r["f1"] for r in fold_results])
print(
    f"\nMean over {k_folds} folds - "
    f"Precision={mean_precision:.4f}, Recall={mean_recall:.4f}, F1={mean_f1:.4f}"
)

# Flush pending events and close the TensorBoard writer
writer.close()



==== Fold 1/5 ====
Epoch 1/20 - Loss: 1.7937
Epoch 5/20 - Loss: 1.5950
Epoch 10/20 - Loss: 1.3435
Epoch 15/20 - Loss: 1.2062
Epoch 20/20 - Loss: 1.1150
Fold 1 - Precision: 0.1706 | Recall: 0.1711 | F1: 0.1683

==== Fold 2/5 ====
Epoch 1/20 - Loss: 1.7935
Epoch 5/20 - Loss: 1.6020
Epoch 10/20 - Loss: 1.3490
Epoch 15/20 - Loss: 1.2075
Epoch 20/20 - Loss: 1.1144
Fold 2 - Precision: 0.1699 | Recall: 0.1702 | F1: 0.1659

==== Fold 3/5 ====
Epoch 1/20 - Loss: 1.7944
Epoch 5/20 - Loss: 1.6028
Epoch 10/20 - Loss: 1.3464
Epoch 15/20 - Loss: 1.2062
Epoch 20/20 - Loss: 1.1150
Fold 3 - Precision: 0.1639 | Recall: 0.1635 | F1: 0.1627

==== Fold 4/5 ====
Epoch 1/20 - Loss: 1.7940
Epoch 5/20 - Loss: 1.5979
Epoch 10/20 - Loss: 1.3445
Epoch 15/20 - Loss: 1.2036
Epoch 20/20 - Loss: 1.1060
Fold 4 - Precision: 0.1640 | Recall: 0.1636 | F1: 0.1631

==== Fold 5/5 ====
Epoch 1/20 - Loss: 1.7935
Epoch 5/20 - Loss: 1.6039
Epoch 10/20 - Loss: 1.3539
Epoch 15/20 - Loss: 1.2159
Epoch 20/20 - Loss: 1.1246
Fold 5 
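
The logged numbers above are consistent with chance-level behaviour on six balanced classes: the epoch-1 losses sit near 1.79 and the validation scores near 0.17. A short worked check of those two reference values follows. The empirical part uses random guesses invented for this example.

```python
import math

import numpy as np

n_classes = 6

# Cross-entropy of a uniform prediction over n_classes is ln(n_classes).
chance_loss = math.log(n_classes)
print(f"{chance_loss:.4f}")  # 1.7918

# Expected accuracy of label-independent guessing on balanced classes.
chance_acc = 1.0 / n_classes
print(f"{chance_acc:.4f}")   # 0.1667

# Empirical check: uniform random guesses against uniform random labels.
rng = np.random.default_rng(0)
y_true = rng.integers(0, n_classes, size=100_000)
y_guess = rng.integers(0, n_classes, size=100_000)
empirical_acc = float((y_true == y_guess).mean())
```

A training loss that drops well below 1.79 while validation scores stay near 1/6 indicates memorization of the training folds rather than learning.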

## Final training on full training set and evaluation on test set

Here we train a fresh model on the **entire training set** and then evaluate once on the held-out **test set**, printing macro-averaged metrics and a detailed classification report.


In [8]:
train_dataset_full = EmotionDataset(X_train, y_train)
test_dataset = EmotionDataset(X_test, y_test)

train_loader_full = DataLoader(train_dataset_full, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

final_model = SimpleEmotionNet(n_features, n_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(final_model.parameters(), lr=learning_rate)

for epoch in range(1, num_epochs + 1):
    loss = train_one_epoch(final_model, train_loader_full, criterion, optimizer, device)
    if epoch % epochs_between_reports == 0 or epoch == 1 or epoch == num_epochs:
        print(f"[Final Model] Epoch {epoch}/{num_epochs} - Loss: {loss:.4f}")

precision, recall, f1, y_true_test, y_pred_test = evaluate_model(final_model, test_loader, device)
print("\n==== Test Set Metrics (Macro-Averaged) ====")
print(f"Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}")

print("\n==== Detailed Classification Report (Test Set) ====")
print(classification_report(
    y_true_test,
    y_pred_test,
    target_names=label_encoder.classes_,
    zero_division=0,
))


[Final Model] Epoch 1/20 - Loss: 1.7934
[Final Model] Epoch 5/20 - Loss: 1.6453
[Final Model] Epoch 10/20 - Loss: 1.4431


KeyboardInterrupt: 
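
Once a model has been trained, predicting emotions for new lyrics reuses the same fitted pieces in order: encoder, scaler, network, label decoder. The sketch below shows that flow under stated assumptions. The `scaler`, `model`, and `label_encoder` here are freshly built stand-ins rather than the notebook's fitted objects, and `predict_emotions` is a hypothetical helper name.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder, StandardScaler

EMBEDDING_DIM = 768
rng = np.random.default_rng(0)

# Stand-ins for the fitted objects from the notebook (hypothetical here).
label_encoder = LabelEncoder().fit(
    ["anger", "surprise", "sadness", "joy", "love", "fear"]
)
scaler = StandardScaler().fit(rng.normal(size=(100, EMBEDDING_DIM)))
model = nn.Sequential(nn.Linear(EMBEDDING_DIM, 64), nn.ReLU(), nn.Linear(64, 6))


def predict_emotions(embeddings: np.ndarray) -> list[str]:
    """Scale embeddings, run the classifier, and decode class IDs to labels."""
    X = scaler.transform(embeddings).astype(np.float32)
    model.eval()
    with torch.no_grad():
        logits = model(torch.from_numpy(X))
    class_ids = torch.argmax(logits, dim=1).numpy()
    return label_encoder.inverse_transform(class_ids).tolist()


# In the real pipeline these rows would come from the sentence-transformer encoder.
new_embeddings = rng.normal(size=(3, EMBEDDING_DIM)).astype(np.float32)
predictions = predict_emotions(new_embeddings)
print(predictions)  # three labels from an untrained stand-in model
```

The key point is that new data must pass through the scaler fitted on the training set, not a newly fitted one.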