## Emotion Classification with PyTorch

This notebook builds a small neural network using **PyTorch** to classify a song's **emotion** from audio-related features.

The notebook assumes a `pandas` DataFrame `df` with the following feature columns:

- `Popularity`
- `Energy`
- `Danceability`
- `Positiveness`
- `Speechiness`
- `Liveness`
- `Acousticness`
- `Instrumentalness`

and a target column:

- `Emotion` (categorical label)

The steps are:

1. Split the data into **training** and **test** sets.
2. Build a small **neural network** in PyTorch.
3. Train the network using **k-fold cross-validation** on the training set.
4. Evaluate the model using **precision**, **recall**, and **F1-score**.

You can later replace the data-loading cell with your real dataset.
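
Before swapping in a real dataset, it can help to confirm that the DataFrame has the columns listed above. The following is a minimal sketch; the helper name `missing_columns` and the tiny example frame are illustrative assumptions and not part of the notebook.

```python
import pandas as pd

# Column names taken from the lists above.
EXPECTED_FEATURES = [
    "Popularity", "Energy", "Danceability", "Positiveness",
    "Speechiness", "Liveness", "Acousticness", "Instrumentalness",
]
EXPECTED_TARGET = "Emotion"


def missing_columns(frame):
    """Return the expected columns that are absent from `frame` (hypothetical helper)."""
    expected = EXPECTED_FEATURES + [EXPECTED_TARGET]
    return [col for col in expected if col not in frame.columns]


# Hand-made one-row frame that deliberately lacks two feature columns.
example = pd.DataFrame({
    "Popularity": [0.5], "Energy": [0.7], "Danceability": [0.2],
    "Positiveness": [0.9], "Speechiness": [0.1], "Liveness": [0.3],
    "Emotion": ["Happy"],
})

absent = missing_columns(example)
print(absent)  # ['Acousticness', 'Instrumentalness']
```

An empty list means every expected column is present, so the rest of the notebook can index `df` by these names.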


In [24]:
# Imports and configuration

import numpy as np
import pandas as pd

from torch.utils.tensorboard import SummaryWriter
from datetime import datetime

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import precision_recall_fscore_support, classification_report

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Device selection
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


Using device: cpu


## Load or create data

In this cell we either:
- Load your real dataset into a `pandas` DataFrame `df`, **or**
- Create a small synthetic dataset for demo purposes.

Replace the synthetic-data block with your actual data loading when you have it.
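
One possible shape for the real data loading, assuming the dataset is a CSV file whose header matches the column names used in this notebook. The in-memory buffer, the sample rows, and the name `df_real` are stand-ins chosen for illustration; with an actual file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Stand-in for a real file: CSV text with the expected header and two made-up rows.
csv_text = io.StringIO(
    "Popularity,Energy,Danceability,Positiveness,Speechiness,Liveness,"
    "Acousticness,Instrumentalness,Emotion\n"
    "0.61,0.80,0.72,0.90,0.05,0.12,0.10,0.00,Happy\n"
    "0.35,0.22,0.31,0.15,0.04,0.09,0.85,0.40,Sad\n"
)

# With a real dataset, replace the buffer with the path to your CSV file.
df_real = pd.read_csv(csv_text)

# Drop rows with missing values so the later float conversion cannot produce NaNs.
df_real = df_real.dropna().reset_index(drop=True)

print(df_real.shape)  # (2, 9)
```

Whether dropping incomplete rows is appropriate depends on the dataset; imputing missing values is an alternative.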


In [34]:
# TODO: Replace this with your real data loading

FEATURE_COLS = [
    "Popularity", "Energy", "Danceability", "Positiveness",
    "Speechiness", "Liveness", "Acousticness", "Instrumentalness",
]
TARGET_COL = "Emotion"

# Synthetic example data (for demo). Remove when using real data.
num_samples = 100000
random_seed = 41
np.random.seed(random_seed)

synthetic_X = np.random.rand(num_samples, len(FEATURE_COLS))
synthetic_emotions = np.random.choice(["Happy", "Sad", "Angry", "Relaxed", "Disgusted", "Scared", "Surprised", "Neutral", "Bored", "Tense", "Nervous", "Anxious", "Depressed"], size=num_samples)

df = pd.DataFrame(synthetic_X, columns=FEATURE_COLS)
df[TARGET_COL] = synthetic_emotions

print(df.head())
print("\nClass distribution:\n", df[TARGET_COL].value_counts())


   Popularity    Energy  Danceability  Positiveness  Speechiness  Liveness  \
0    0.250924  0.046096      0.676816      0.043469     0.116424  0.603866   
1    0.917448  0.418780      0.332260      0.283034     0.186282  0.317110   
2    0.704983  0.314677      0.745282      0.398213     0.608226  0.728456   
3    0.232223  0.441665      0.373021      0.583606     0.100031  0.741352   
4    0.322892  0.642927      0.999472      0.281002     0.582225  0.872601   

   Acousticness  Instrumentalness    Emotion  
0      0.190931          0.668516  Disgusted  
1      0.481169          0.069520     Scared  
2      0.421758          0.393908  Surprised  
3      0.083198          0.126224    Nervous  
4      0.789339          0.218088      Bored  

Class distribution:
 Emotion
Tense        7844
Happy        7836
Relaxed      7806
Angry        7781
Bored        7764
Sad          7736
Anxious      7637
Neutral      7629
Scared       7621
Disgusted    7617
Depressed    7590
Surprised    7576
Ner

## Preprocessing and train/test split

Here we:

- Encode the `Emotion` labels to integers.
- Split the data into **train** and **test** sets (hold-out test set).
- Scale the features using `StandardScaler` fitted on the training data only.
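
To make the encoding and scaling steps concrete, here is a small sketch on toy values (the labels and numbers are made up for illustration). It shows that `LabelEncoder` assigns integer codes in sorted label order, and that a scaler fitted on training data reuses the training mean and standard deviation when transforming test data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Label encoding: classes are stored sorted, and codes index into that order.
demo_encoder = LabelEncoder()
codes = demo_encoder.fit_transform(["Sad", "Happy", "Angry", "Happy"])
print(list(demo_encoder.classes_))  # ['Angry', 'Happy', 'Sad']
print(codes.tolist())               # [2, 1, 0, 1]

# inverse_transform maps integer predictions back to label strings.
decoded = demo_encoder.inverse_transform(codes)

# Scaling: fit on the training column only, then apply the same shift and scale to test.
train_col = np.array([[0.0], [2.0], [4.0]])
test_col = np.array([[6.0]])
demo_scaler = StandardScaler().fit(train_col)
scaled_test = demo_scaler.transform(test_col)

# The test value is expressed in units of the *training* standard deviation.
print(float(demo_scaler.mean_[0]))  # 2.0
```

Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing.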


In [35]:
# Encode labels
label_encoder = LabelEncoder()
df["Emotion_encoded"] = label_encoder.fit_transform(df[TARGET_COL])

X = df[FEATURE_COLS].values.astype(np.float32)
y = df["Emotion_encoded"].values.astype(np.int64)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=random_seed, stratify=y
)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

n_features = X_train.shape[1]
n_classes = len(np.unique(y))

print(f"Train shape: {X_train.shape}, Test shape: {X_test.shape}")
print(f"Number of classes: {n_classes}")


Train shape: (80000, 8), Test shape: (20000, 8)
Number of classes: 13


## Dataset and model definitions

This section defines:

- A `Dataset` wrapper (`EmotionDataset`) for PyTorch.
- A small feedforward neural network (`SimpleEmotionNet`) for emotion classification.
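
As a quick sanity check on the architecture, the sketch below builds a stand-alone copy of the same layer stack (8 inputs, hidden sizes 32 and 16, 13 outputs, matching the shapes printed earlier) and inspects the output shape and parameter count. The names `demo_net` and `batch` are illustrative and not used elsewhere in the notebook.

```python
import torch
import torch.nn as nn

# Same layer sizes as the network in this section, for 8 features and 13 classes.
demo_net = nn.Sequential(
    nn.Linear(8, 32),
    nn.ReLU(),
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 13),
)

# A random batch of 4 samples with 8 features each.
batch = torch.randn(4, 8)

# The network outputs raw logits: one score per class for each sample.
logits = demo_net(batch)

# Softmax turns logits into class probabilities that sum to 1 per row.
probs = torch.softmax(logits, dim=1)

# Weights plus biases: (8*32 + 32) + (32*16 + 16) + (16*13 + 13) = 1037.
n_params = sum(p.numel() for p in demo_net.parameters())
print(tuple(logits.shape), n_params)  # (4, 13) 1037
```

The model returns logits instead of probabilities because `nn.CrossEntropyLoss` applies the log-softmax internally.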


In [36]:
class EmotionDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.from_numpy(X).float()
        self.y = torch.from_numpy(y).long()

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]


class SimpleEmotionNet(nn.Module):
    def __init__(self, input_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):
        return self.net(x)


## Training and evaluation helpers

We define helper functions to:

- Train the model for one epoch.
- Evaluate the model and compute **precision**, **recall**, and **F1-score** (macro-averaged).
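
For reference, macro averaging computes each metric per class and then takes the unweighted mean, so every class counts equally regardless of its size. The sketch below checks this on a hand-made three-class example (the label arrays are invented for illustration).

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Per-class scores: one value per class, in label order 0, 1, 2.
per_class_p, per_class_r, per_class_f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, zero_division=0
)

# Macro average computed by hand: the plain mean of the per-class F1 scores.
manual_macro_f1 = per_class_f1.mean()

# The same quantity as returned directly with average="macro".
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)

print(per_class_f1.round(3).tolist())  # [0.5, 0.8, 0.667]
print(round(float(macro_f1), 4))       # 0.6556
```

With `zero_division=0`, a class that is never predicted contributes a precision of 0 to the mean instead of raising a warning.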


In [37]:
def train_one_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0

    for X_batch, y_batch in dataloader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)

        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * X_batch.size(0)

    return running_loss / len(dataloader.dataset)


def evaluate_model(model, dataloader, device):
    model.eval()
    all_preds = []
    all_targets = []

    with torch.no_grad():
        for X_batch, y_batch in dataloader:
            X_batch = X_batch.to(device)
            outputs = model(X_batch)
            preds = torch.argmax(outputs, dim=1).cpu().numpy()
            all_preds.extend(preds)
            all_targets.extend(y_batch.numpy())

    all_preds = np.array(all_preds)
    all_targets = np.array(all_targets)

    precision, recall, f1, _ = precision_recall_fscore_support(
        all_targets, all_preds, average="macro", zero_division=0
    )

    return precision, recall, f1, all_targets, all_preds


## K-fold cross-validation on training set

We perform k-fold cross-validation on the **training** data only, to estimate generalization performance before evaluating once on the held-out test set.
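
The splitter used for this is `StratifiedKFold`, which keeps the class proportions roughly the same in every fold. A minimal sketch on toy labels (10 samples of one class and 5 of another, chosen so the counts divide evenly) illustrates the effect:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 10 samples of class 0 and 5 of class 1; the features are placeholders.
toy_labels = np.array([0] * 10 + [1] * 5)
toy_features = np.zeros((15, 1))

splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

val_counts = []
overlaps = []
for train_idx, val_idx in splitter.split(toy_features, toy_labels):
    # Count how many samples of each class land in this validation fold.
    val_counts.append(np.bincount(toy_labels[val_idx], minlength=2).tolist())
    # Train and validation indices of a fold should never overlap.
    overlaps.append(len(set(train_idx) & set(val_idx)))

print(val_counts)  # [[2, 1], [2, 1], [2, 1], [2, 1], [2, 1]]
```

Each validation fold keeps the 2:1 class ratio of the full toy set, which is what makes per-fold macro metrics comparable.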


In [38]:
k_folds = 5
batch_size = 32
num_epochs = 20
learning_rate = 1e-3

epochs_between_reports = 5

skf = StratifiedKFold(n_splits=k_folds, shuffle=True, random_state=42)

fold_results = []

# Tensorboard logging
log_dir = f"runs/emotion_cv_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
writer = SummaryWriter(log_dir)
hparams = {
    "learning_rate": learning_rate,
    "batch_size": batch_size,
    "k_folds": k_folds,
    "num_epochs": num_epochs,
    "epochs_between_reports": epochs_between_reports,
}
writer.add_text(
    "hparams",
    "\n".join(f"{k}: {v}" for k, v in hparams.items()),
)

for fold, (train_idx, val_idx) in enumerate(skf.split(X_train, y_train), 1):
    print(f"\n==== Fold {fold}/{k_folds} ====")

    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]

    train_dataset = EmotionDataset(X_tr, y_tr)
    val_dataset = EmotionDataset(X_val, y_val)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    model = SimpleEmotionNet(n_features, n_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    # Train. The loss is logged every epoch; progress and validation F1 are
    # reported on the first epoch, every `epochs_between_reports` epochs, and the last epoch.
    for epoch in range(1, num_epochs + 1):
        loss = train_one_epoch(model, train_loader, criterion, optimizer, device)
        writer.add_scalar(f"Loss/kcv_train/fold_{fold}", loss, epoch)
        if epoch % epochs_between_reports == 0 or epoch == 1 or epoch == num_epochs:
            print(f"Epoch {epoch}/{num_epochs} - Loss: {loss:.4f}")
            _, _, val_f1, _, _ = evaluate_model(model, val_loader, device)
            writer.add_scalar(f"F1_Score/kcv_validation/fold_{fold}", val_f1, epoch)

    # Evaluate the fully trained model on this fold's validation split
    precision, recall, f1, _, _ = evaluate_model(model, val_loader, device)
    print(f"Fold {fold} - Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}")

    fold_results.append({
        "fold": fold,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    })

print("\n==== Cross-Validation Summary (Macro-Averaged) ====")
for r in fold_results:
    print(
        f"Fold {r['fold']}: "
        f"Precision={r['precision']:.4f}, "
        f"Recall={r['recall']:.4f}, "
        f"F1={r['f1']:.4f}"
    )

mean_precision = np.mean([r["precision"] for r in fold_results])
mean_recall = np.mean([r["recall"] for r in fold_results])
mean_f1 = np.mean([r["f1"] for r in fold_results])
print(
    f"\nMean over {k_folds} folds - "
    f"Precision={mean_precision:.4f}, Recall={mean_recall:.4f}, F1={mean_f1:.4f}"
)

# Flush pending TensorBoard events and release the log file
writer.close()



==== Fold 1/5 ====
Epoch 1/20 - Loss: 2.5670
Epoch 5/20 - Loss: 2.5639
Epoch 10/20 - Loss: 2.5618
Epoch 15/20 - Loss: 2.5604
Epoch 20/20 - Loss: 2.5591
Fold 1 - Precision: 0.0790 | Recall: 0.0767 | F1: 0.0585

==== Fold 2/5 ====
Epoch 1/20 - Loss: 2.5672
Epoch 5/20 - Loss: 2.5637
Epoch 10/20 - Loss: 2.5618
Epoch 15/20 - Loss: 2.5605
Epoch 20/20 - Loss: 2.5588
Fold 2 - Precision: 0.0742 | Recall: 0.0762 | F1: 0.0672

==== Fold 3/5 ====
Epoch 1/20 - Loss: 2.5667
Epoch 5/20 - Loss: 2.5634
Epoch 10/20 - Loss: 2.5615
Epoch 15/20 - Loss: 2.5603
Epoch 20/20 - Loss: 2.5591
Fold 3 - Precision: 0.0746 | Recall: 0.0768 | F1: 0.0680

==== Fold 4/5 ====
Epoch 1/20 - Loss: 2.5671
Epoch 5/20 - Loss: 2.5636
Epoch 10/20 - Loss: 2.5618
Epoch 15/20 - Loss: 2.5602
Epoch 20/20 - Loss: 2.5592
Fold 4 - Precision: 0.0704 | Recall: 0.0731 | F1: 0.0657

==== Fold 5/5 ====
Epoch 1/20 - Loss: 2.5670
Epoch 5/20 - Loss: 2.5641
Epoch 10/20 - Loss: 2.5625
Epoch 15/20 - Loss: 2.5614
Epoch 20/20 - Loss: 2.5602
Fold 5 

## Final training on full training set and evaluation on test set

Here we train a fresh model on the **entire training set** and then evaluate once on the held-out **test set**, printing macro-averaged metrics and a detailed classification report.
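
When reading the metrics for the synthetic data, a chance-level baseline is a useful reference: the synthetic labels are drawn independently of the features, so the network has nothing to learn from. The sketch below computes that baseline, assuming 13 roughly balanced classes as in the synthetic dataset.

```python
import math

num_classes = 13  # number of emotion labels in the synthetic dataset

# Cross-entropy of a model that predicts a uniform distribution over the classes.
chance_loss = math.log(num_classes)

# Expected per-class recall (and accuracy) of uniform random guessing on balanced classes.
chance_recall = 1.0 / num_classes

print(f"{chance_loss:.4f} {chance_recall:.4f}")  # 2.5649 0.0769
```

Training losses near 2.56 and macro scores around 0.07 to 0.08 are therefore what random guessing would produce; on a real dataset with informative features, both should move clearly away from these values.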


In [21]:
train_dataset_full = EmotionDataset(X_train, y_train)
test_dataset = EmotionDataset(X_test, y_test)

train_loader_full = DataLoader(train_dataset_full, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

final_model = SimpleEmotionNet(n_features, n_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(final_model.parameters(), lr=learning_rate)

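for epoch in range(1, num_epochs + 1):
    loss = train_one_epoch(final_model, train_loader_full, criterion, optimizer, device)
    # Same reporting schedule as in cross-validation: first epoch, every
    # `epochs_between_reports` epochs, and the last epoch
    if epoch % epochs_between_reports == 0 or epoch == 1 or epoch == num_epochs:
        print(f"[Final Model] Epoch {epoch}/{num_epochs} - Loss: {loss:.4f}")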

precision, recall, f1, y_true_test, y_pred_test = evaluate_model(final_model, test_loader, device)
print("\n==== Test Set Metrics (Macro-Averaged) ====")
print(f"Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}")

print("\n==== Detailed Classification Report (Test Set) ====")
print(classification_report(
    y_true_test,
    y_pred_test,
    target_names=label_encoder.classes_,
    zero_division=0,
))


[Final Model] Epoch 1/20 - Loss: 2.5681
[Final Model] Epoch 5/20 - Loss: 2.5611
[Final Model] Epoch 10/20 - Loss: 2.5563
[Final Model] Epoch 15/20 - Loss: 2.5518
[Final Model] Epoch 20/20 - Loss: 2.5479

==== Test Set Metrics (Macro-Averaged) ====
Precision: 0.0770 | Recall: 0.0756 | F1: 0.0678

==== Detailed Classification Report (Test Set) ====
              precision    recall  f1-score   support

       Angry       0.06      0.03      0.04       300
     Anxious       0.08      0.07      0.08       321
       Bored       0.12      0.05      0.08       313
   Depressed       0.06      0.04      0.05       315
   Disgusted       0.09      0.20      0.12       307
       Happy       0.07      0.04      0.05       305
     Nervous       0.09      0.04      0.05       305
     Neutral       0.08      0.09      0.08       305
     Relaxed       0.08      0.14      0.10       297
         Sad       0.06      0.17      0.09       318
      Scared       0.06      0.03      0.04       304
  