# Phase 4.6: Control Experiment - The Impact of Data Balancing

**Objective:** This notebook is a scientific control experiment (an ablation study). Our "Ultimate Generalist" model (v5), which was trained on a balanced dataset, gave us our best performance so far. Here we test the hypothesis that data balancing was a key factor in that result.

To do this, we run the exact same advanced training regimen on the original, **unbalanced** dataset.

In [1]:
# Step 1: Connect to Google Drive to access your files
from google.colab import drive
drive.mount('/content/drive')

# Step 2: Install all the necessary special libraries for our project
# This single line installs everything we need.
!pip install librosa audiomentations pandas seaborn matplotlib tqdm

Mounted at /content/drive
Collecting audiomentations
  Downloading audiomentations-0.42.0-py3-none-any.whl.metadata (11 kB)
Collecting numpy-minmax<1,>=0.3.0 (from audiomentations)
  Downloading numpy_minmax-0.5.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting numpy-rms<1,>=0.4.2 (from audiomentations)
  Downloading numpy_rms-0.6.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.5 kB)
Collecting python-stretch<1,>=0.3.1 (from audiomentations)
  Downloading python_stretch-0.3.1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Downloading audiomentations-0.42.0-py3-none-any.whl (86 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.5/86.5 kB 4.5 MB/s eta 0:00:00
Downloading numpy_minmax-0.5.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux201

In [2]:
import os

# The path we expect the files to be in
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"

print(f"Checking for folder at: {SPECTROGRAM_PATH}")

try:
    # Get a list of all files in that directory
    all_npy_files = os.listdir(SPECTROGRAM_PATH)

    print(f"\n✅ Success! Found the folder.")
    print(f"Total files found in the folder: {len(all_npy_files)}")

    if len(all_npy_files) > 0:
        print("\nHere are the first 10 filenames found:")
        # Print the first 10 filenames for us to inspect
        for filename in sorted(all_npy_files)[:10]:
            print(filename)
    else:
        print("\nWARNING: The folder is empty!")

except FileNotFoundError:
    print(f"\n❌ ERROR: The folder '{SPECTROGRAM_PATH}' does not exist.")
    print("Please double-check that you uploaded the folder and that the name is spelled exactly correct.")

Checking for folder at: /content/drive/MyDrive/ser_project/processed_spectrograms_final/

✅ Success! Found the folder.
Total files found in the folder: 8882

Here are the first 10 filenames found:
03-01-01-01-01-01-01.npy
03-01-01-01-01-01-02.npy
03-01-01-01-01-01-03.npy
03-01-01-01-01-01-04.npy
03-01-01-01-01-01-05.npy
03-01-01-01-01-01-06.npy
03-01-01-01-01-01-07.npy
03-01-01-01-01-01-08.npy
03-01-01-01-01-01-09.npy
03-01-01-01-01-01-10.npy


## Part 1: Training on an Unbalanced Dataset

The core of this experiment is a single, deliberate change from our champion v5 model:

* **Data Strategy:** Instead of loading our balanced data splits, we are loading the `_unbalanced.pkl` file lists. This means the model will see a natural distribution of data, which is heavily skewed towards the larger CREMA-D dataset.
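
To see how skewed the unbalanced lists are before training, one can tally files per source corpus. The following is a minimal sketch, assuming (as the evaluation cell later in this notebook does) that each path contains a `ravdess_data` or `crema_d_data` folder name. The helper names `corpus_of` and `count_by_corpus` and the sample paths are hypothetical and only serve as illustration.

```python
from collections import Counter

def corpus_of(path):
    """Guess the source corpus from a folder name in the path (assumed layout)."""
    p = path.replace("\\", "/").lower()
    if "ravdess_data" in p:
        return "RAVDESS"
    if "crema_d_data" in p:
        return "CREMA-D"
    return "unknown"

def count_by_corpus(paths):
    """Return a Counter mapping corpus name -> number of files."""
    return Counter(corpus_of(p) for p in paths)

# Hypothetical paths for illustration only. In the notebook, the list would
# come from pickle.load() on train_files_unbalanced.pkl.
sample_paths = [
    "data\\ravdess_data\\Actor_01\\03-01-05-01-01-01-01.wav",
    "data/crema_d_data/1001_DFA_ANG_XX.wav",
    "data/crema_d_data/1001_DFA_SAD_XX.wav",
]

counts = count_by_corpus(sample_paths)
total = sum(counts.values())
for corpus, n in counts.most_common():
    print(f"{corpus}: {n} files ({100 * n / total:.1f}%)")
```

Running the same tally on the balanced and unbalanced lists side by side would show how much of the training signal comes from each corpus in each experiment.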

All other advanced techniques (`SpecAugment`, `CosineAnnealingLR` scheduler, `ResNet18` architecture) are kept identical to ensure a fair, apples-to-apples comparison.
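
For reference, the learning-rate curve produced by cosine annealing can be sketched in plain Python. This assumes the standard closed form with a minimum learning rate of 0 (the scheduler's default) and uses the `LEARNING_RATE = 0.001` and `EPOCHS = 40` values from the training cell below. The function name `cosine_annealing_lr` is hypothetical.

```python
import math

def cosine_annealing_lr(epoch, base_lr=0.001, t_max=40, eta_min=0.0):
    """Closed-form cosine annealing: decays from base_lr at epoch 0 towards eta_min at t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# `epoch` is the zero-based count of completed scheduler steps.
for epoch in (0, 10, 20, 30, 39):
    print(f"epoch {epoch + 1:2d}: lr = {cosine_annealing_lr(epoch):.6f}")
```

Because the schedule depends only on the epoch index, it is identical in the balanced and unbalanced runs, which is what keeps the comparison fair.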

In [3]:
# ===================================================================
# ULTIMATE COLAB SCRIPT v8: The Advanced Generalist Trainer (FINAL PATH FIX 3)
# ===================================================================
import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import os, numpy as np, pickle
from sklearn.metrics import accuracy_score, classification_report
from tqdm import tqdm
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import models
from torchvision import transforms

# --- Configuration ---
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"
FILE_LIST_PATH = "/content/drive/MyDrive/ser_project/"
LEARNING_RATE = 0.001
BATCH_SIZE = 64
EPOCHS = 40
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_advanced_best.pth"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# --- Mappings ---
unified_emotion_map = { "neutral": 0, "happy": 1, "sad": 2, "angry": 3, "fearful": 4, "disgust": 5 }
unified_emotion_labels = ["neutral", "happy", "sad", "angry", "fearful", "disgust"]

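# --- SpecAugment Transformation Pipeline ---
# Approximates SpecAugment-style masking with torchvision's RandomErasing (zero-filled rectangles).
# Note: when applied to a whole batch tensor (B, C, H, W), as in the training loop below,
# each RandomErasing call draws one rectangle and erases it from every sample in the batch.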
spec_augment_transform = transforms.Compose([
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.05), ratio=(0.2, 5.0), value=0),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.08), ratio=(0.01, 0.2), value=0),
])

# --- Helper function to get filename robustly ---
def get_basename(path):
    return path.replace('\\', '/').split('/')[-1]

# --- Dataset Class ---
class SpectrogramDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        # Use our new robust function to get the base filename
        filename = get_basename(self.file_paths[idx]).replace('.wav', '.npy')
        file_path = os.path.join(SPECTROGRAM_PATH, filename)
        label = self.labels[idx]
        spectrogram = np.load(file_path)
        current_width = spectrogram.shape[1]
        if current_width < self.target_width: spectrogram = np.pad(spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: spectrogram = spectrogram[:, :self.target_width]
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min: spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

# --- Prepare Data ---
print("Loading pre-defined unbalanced data splits...")
with open(os.path.join(FILE_LIST_PATH, 'train_files_unbalanced.pkl'), 'rb') as f: train_files_raw = pickle.load(f)
with open(os.path.join(FILE_LIST_PATH, 'val_files_unbalanced.pkl'), 'rb') as f: val_files_raw = pickle.load(f)
with open(os.path.join(FILE_LIST_PATH, 'test_files_unbalanced.pkl'), 'rb') as f: test_files_raw = pickle.load(f)

print("Verifying that all spectrogram files exist...")
def verify_and_filter_files(file_list_raw):
    verified_files = []
    for f_path in file_list_raw:
        # Use our new robust function here as well
        npy_filename = get_basename(f_path).replace('.wav', '.npy')
        full_npy_path = os.path.join(SPECTROGRAM_PATH, npy_filename)
        if os.path.exists(full_npy_path):
            # We keep the original path in the list for the label getter
            verified_files.append(f_path)
    skipped_count = len(file_list_raw) - len(verified_files)
    return verified_files, skipped_count

train_files, train_skipped = verify_and_filter_files(train_files_raw)
val_files, val_skipped = verify_and_filter_files(val_files_raw)
test_files, test_skipped = verify_and_filter_files(test_files_raw)

print(f"Train set: {len(train_files)} files found, {train_skipped} skipped.")
print(f"Validation set: {len(val_files)} files found, {val_skipped} skipped.")
print(f"Test set: {len(test_files)} files found, {test_skipped} skipped.")

# Create label lists
ravdess_map = { "01": "neutral", "03": "happy", "04": "sad", "05": "angry", "06": "fearful", "07": "disgust" }
crema_d_map = { "NEU": "neutral", "HAP": "happy", "SAD": "sad", "ANG": "angry", "FEA": "fearful", "DIS": "disgust" }
def get_label_from_path(filepath):
    filename = get_basename(filepath)
    try:
        if '03-01' in filename: return unified_emotion_map[ravdess_map[filename.split("-")[2]]]
        else: return unified_emotion_map[crema_d_map[filename.split("_")[2]]]
    except (IndexError, KeyError): return None

train_labels = [get_label_from_path(f) for f in train_files]
val_labels = [get_label_from_path(f) for f in val_files]
test_labels = [get_label_from_path(f) for f in test_files]

train_dataset = SpectrogramDataset(train_files, train_labels)
val_dataset = SpectrogramDataset(val_files, val_labels)
test_dataset = SpectrogramDataset(test_files, test_labels)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

# --- Train the Model ---
model = models.resnet18(weights='IMAGENET1K_V1')
model.fc = nn.Linear(model.fc.in_features, len(unified_emotion_labels))
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
best_val_acc = 0.0
print("Starting advanced training with SpecAugment...")
for epoch in range(EPOCHS):
    # Training pass
    model.train()
    running_loss = 0.0
    for inputs, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Train]"):
        inputs, labels = inputs.to(device), labels.to(device)
        inputs = spec_augment_transform(inputs)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    train_loss = running_loss / len(train_dataset)

    # Validation pass (no augmentation, no gradients)
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in tqdm(val_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Val]"):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_accuracy = 100 * correct / total
    val_loss /= len(val_dataset)
    print(f"Epoch {epoch+1}/{EPOCHS} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Acc: {val_accuracy:.2f}%")
    if val_accuracy > best_val_acc:
        best_val_acc = val_accuracy
        print(f"🎉 New best validation accuracy: {best_val_acc:.2f}%. Saving model...")
        torch.save({'model_state_dict': model.state_dict()}, CHECKPOINT_BEST_PATH)
    scheduler.step()

# --- Final Evaluation ---
print("\n--- FINAL EVALUATION OF ADVANCED GENERALIST MODEL ---")
print(f"Loading best model (from epoch with {best_val_acc:.2f}% validation accuracy) for final testing...")
# map_location lets the checkpoint load on whichever device is available (GPU or CPU)
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH, map_location=device)
model.load_state_dict(best_checkpoint['model_state_dict'])
model.eval()
all_preds, all_true = [], []
with torch.no_grad():
    for inputs, labels in tqdm(test_loader, desc="Final Evaluation"):
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_true.extend(labels.cpu().numpy())
accuracy = accuracy_score(all_true, all_preds)
print(f"\nFinal Advanced Generalist Model Accuracy on the Test Set: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(all_true, all_preds, target_names=unified_emotion_labels, zero_division=0))

Using device: cuda
Loading pre-defined unbalanced data splits...
Verifying that all spectrogram files exist...
Train set: 6500 files found, 0 skipped.
Validation set: 723 files found, 0 skipped.
Test set: 1275 files found, 0 skipped.
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


100%|██████████| 44.7M/44.7M [00:00<00:00, 189MB/s]


Starting advanced training with SpecAugment...


Epoch 1/40 [Train]: 100%|██████████| 102/102 [03:23<00:00,  1.99s/it]
Epoch 1/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.01it/s]


Epoch 1/40 | Train Loss: 1.3750 | Val Loss: 1.7702 | Val Acc: 34.30%
🎉 New best validation accuracy: 34.30%. Saving model...


Epoch 2/40 [Train]: 100%|██████████| 102/102 [00:25<00:00,  3.93it/s]
Epoch 2/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.23it/s]


Epoch 2/40 | Train Loss: 1.1496 | Val Loss: 4.8484 | Val Acc: 20.89%


Epoch 3/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.43it/s]
Epoch 3/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.26it/s]


Epoch 3/40 | Train Loss: 1.0465 | Val Loss: 2.1026 | Val Acc: 36.65%
🎉 New best validation accuracy: 36.65%. Saving model...


Epoch 4/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.18it/s]
Epoch 4/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.31it/s]


Epoch 4/40 | Train Loss: 0.9581 | Val Loss: 1.3923 | Val Acc: 52.84%
🎉 New best validation accuracy: 52.84%. Saving model...


Epoch 5/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.15it/s]
Epoch 5/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.60it/s]


Epoch 5/40 | Train Loss: 0.8403 | Val Loss: 1.3017 | Val Acc: 52.84%


Epoch 6/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.48it/s]
Epoch 6/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.17it/s]


Epoch 6/40 | Train Loss: 0.7088 | Val Loss: 1.7608 | Val Acc: 48.41%


Epoch 7/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.53it/s]
Epoch 7/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.08it/s]


Epoch 7/40 | Train Loss: 0.6122 | Val Loss: 2.5896 | Val Acc: 37.21%


Epoch 8/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.53it/s]
Epoch 8/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.01it/s]


Epoch 8/40 | Train Loss: 0.5002 | Val Loss: 1.8277 | Val Acc: 47.30%


Epoch 9/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.35it/s]
Epoch 9/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.03it/s]


Epoch 9/40 | Train Loss: 0.3765 | Val Loss: 2.0300 | Val Acc: 48.82%


Epoch 10/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.33it/s]
Epoch 10/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.10it/s]


Epoch 10/40 | Train Loss: 0.2830 | Val Loss: 2.7965 | Val Acc: 43.71%


Epoch 11/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.34it/s]
Epoch 11/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.02it/s]


Epoch 11/40 | Train Loss: 0.1993 | Val Loss: 3.1540 | Val Acc: 44.26%


Epoch 12/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.35it/s]
Epoch 12/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.16it/s]


Epoch 12/40 | Train Loss: 0.2137 | Val Loss: 2.9890 | Val Acc: 46.89%


Epoch 13/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.31it/s]
Epoch 13/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.14it/s]


Epoch 13/40 | Train Loss: 0.1449 | Val Loss: 2.0428 | Val Acc: 57.68%
🎉 New best validation accuracy: 57.68%. Saving model...


Epoch 14/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.10it/s]
Epoch 14/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.01it/s]


Epoch 14/40 | Train Loss: 0.0870 | Val Loss: 1.7184 | Val Acc: 59.61%
🎉 New best validation accuracy: 59.61%. Saving model...


Epoch 15/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.10it/s]
Epoch 15/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.21it/s]


Epoch 15/40 | Train Loss: 0.1001 | Val Loss: 2.1175 | Val Acc: 55.74%


Epoch 16/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.30it/s]
Epoch 16/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.11it/s]


Epoch 16/40 | Train Loss: 0.0706 | Val Loss: 1.7621 | Val Acc: 61.55%
🎉 New best validation accuracy: 61.55%. Saving model...


Epoch 17/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.09it/s]
Epoch 17/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.06it/s]


Epoch 17/40 | Train Loss: 0.0491 | Val Loss: 2.1024 | Val Acc: 55.60%


Epoch 18/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.35it/s]
Epoch 18/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.17it/s]


Epoch 18/40 | Train Loss: 0.0430 | Val Loss: 1.9732 | Val Acc: 60.44%


Epoch 19/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.44it/s]
Epoch 19/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.43it/s]


Epoch 19/40 | Train Loss: 0.0512 | Val Loss: 1.6761 | Val Acc: 62.52%
🎉 New best validation accuracy: 62.52%. Saving model...


Epoch 20/40 [Train]: 100%|██████████| 102/102 [00:25<00:00,  4.08it/s]
Epoch 20/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.81it/s]


Epoch 20/40 | Train Loss: 0.0531 | Val Loss: 1.9814 | Val Acc: 57.81%


Epoch 21/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.49it/s]
Epoch 21/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.21it/s]


Epoch 21/40 | Train Loss: 0.0342 | Val Loss: 1.8638 | Val Acc: 60.30%


Epoch 22/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.56it/s]
Epoch 22/40 [Val]: 100%|██████████| 12/12 [00:03<00:00,  3.95it/s]


Epoch 22/40 | Train Loss: 0.0211 | Val Loss: 1.8742 | Val Acc: 61.41%


Epoch 23/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.54it/s]
Epoch 23/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.38it/s]


Epoch 23/40 | Train Loss: 0.0144 | Val Loss: 1.7527 | Val Acc: 62.79%
🎉 New best validation accuracy: 62.79%. Saving model...


Epoch 24/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.19it/s]
Epoch 24/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.77it/s]


Epoch 24/40 | Train Loss: 0.0183 | Val Loss: 1.7898 | Val Acc: 63.35%
🎉 New best validation accuracy: 63.35%. Saving model...


Epoch 25/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.15it/s]
Epoch 25/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.38it/s]


Epoch 25/40 | Train Loss: 0.0127 | Val Loss: 1.6808 | Val Acc: 65.01%
🎉 New best validation accuracy: 65.01%. Saving model...


Epoch 26/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.15it/s]
Epoch 26/40 [Val]: 100%|██████████| 12/12 [00:03<00:00,  3.96it/s]


Epoch 26/40 | Train Loss: 0.0114 | Val Loss: 1.7095 | Val Acc: 61.55%


Epoch 27/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.46it/s]
Epoch 27/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.71it/s]


Epoch 27/40 | Train Loss: 0.0103 | Val Loss: 1.7124 | Val Acc: 63.62%


Epoch 28/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.39it/s]
Epoch 28/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.06it/s]


Epoch 28/40 | Train Loss: 0.0089 | Val Loss: 1.7997 | Val Acc: 63.21%


Epoch 29/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.32it/s]
Epoch 29/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.16it/s]


Epoch 29/40 | Train Loss: 0.0069 | Val Loss: 1.8133 | Val Acc: 63.07%


Epoch 30/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.36it/s]
Epoch 30/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.15it/s]


Epoch 30/40 | Train Loss: 0.0059 | Val Loss: 1.7689 | Val Acc: 64.45%


Epoch 31/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.34it/s]
Epoch 31/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.25it/s]


Epoch 31/40 | Train Loss: 0.0057 | Val Loss: 1.7713 | Val Acc: 62.66%


Epoch 32/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.37it/s]
Epoch 32/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.21it/s]


Epoch 32/40 | Train Loss: 0.0076 | Val Loss: 1.7794 | Val Acc: 64.45%


Epoch 33/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.31it/s]
Epoch 33/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.22it/s]


Epoch 33/40 | Train Loss: 0.0063 | Val Loss: 1.7481 | Val Acc: 64.45%


Epoch 34/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.38it/s]
Epoch 34/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  5.27it/s]


Epoch 34/40 | Train Loss: 0.0051 | Val Loss: 1.7796 | Val Acc: 63.49%


Epoch 35/40 [Train]: 100%|██████████| 102/102 [00:23<00:00,  4.41it/s]
Epoch 35/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.83it/s]


Epoch 35/40 | Train Loss: 0.0071 | Val Loss: 1.7725 | Val Acc: 64.32%


Epoch 36/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.55it/s]
Epoch 36/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.26it/s]


Epoch 36/40 | Train Loss: 0.0046 | Val Loss: 1.7269 | Val Acc: 64.45%


Epoch 37/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.56it/s]
Epoch 37/40 [Val]: 100%|██████████| 12/12 [00:03<00:00,  3.88it/s]


Epoch 37/40 | Train Loss: 0.0070 | Val Loss: 1.7555 | Val Acc: 65.15%
🎉 New best validation accuracy: 65.15%. Saving model...


Epoch 38/40 [Train]: 100%|██████████| 102/102 [00:24<00:00,  4.22it/s]
Epoch 38/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.18it/s]


Epoch 38/40 | Train Loss: 0.0051 | Val Loss: 1.7656 | Val Acc: 64.59%


Epoch 39/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.52it/s]
Epoch 39/40 [Val]: 100%|██████████| 12/12 [00:03<00:00,  3.83it/s]


Epoch 39/40 | Train Loss: 0.0043 | Val Loss: 1.7094 | Val Acc: 64.87%


Epoch 40/40 [Train]: 100%|██████████| 102/102 [00:22<00:00,  4.51it/s]
Epoch 40/40 [Val]: 100%|██████████| 12/12 [00:02<00:00,  4.08it/s]


Epoch 40/40 | Train Loss: 0.0051 | Val Loss: 1.7014 | Val Acc: 65.15%

--- FINAL EVALUATION OF ADVANCED GENERALIST MODEL ---
Loading best model (from epoch with 65.15% validation accuracy) for final testing...


Final Evaluation: 100%|██████████| 20/20 [00:04<00:00,  4.67it/s]


Final Advanced Generalist Model Accuracy on the Test Set: 66.75%

Classification Report:
              precision    recall  f1-score   support

     neutral       0.71      0.77      0.74       177
       happy       0.67      0.69      0.68       214
         sad       0.65      0.58      0.61       236
       angry       0.72      0.76      0.74       217
     fearful       0.65      0.60      0.62       215
     disgust       0.61      0.63      0.62       216

    accuracy                           0.67      1275
   macro avg       0.67      0.67      0.67      1275
weighted avg       0.67      0.67      0.67      1275






## Part 2: The Verdict - Comparing Balanced vs. Unbalanced

Here we evaluate the model trained on unbalanced data, testing it separately on the RAVDESS and CREMA-D portions of the test set. The goal is to compare these per-corpus results directly against the champion v5 model's scores to quantify the impact of data balancing.
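
The per-corpus split has to keep each file paired with its label. One way to sketch this is to filter the two lists together rather than separately. This is a minimal illustration, not the notebook's own code: the helper `split_by_corpus`, the sample paths, and the sample labels are hypothetical, and the folder-name markers are assumed to match those used in the evaluation cell below.

```python
def split_by_corpus(files, labels, marker):
    """Return (files, labels) for entries whose path contains `marker`, case-insensitively."""
    pairs = [(f, lbl) for f, lbl in zip(files, labels) if marker in f.lower()]
    if not pairs:
        return [], []
    kept_files, kept_labels = zip(*pairs)
    return list(kept_files), list(kept_labels)

# Hypothetical paths and labels for illustration only.
files = [
    "x/ravdess_data/Actor_02/03-01-03-01-01-01-02.wav",
    "x/crema_d_data/1001_DFA_HAP_XX.wav",
    "x/crema_d_data/1002_IEO_SAD_HI.wav",
]
labels = [1, 1, 2]

rav_files, rav_labels = split_by_corpus(files, labels, "ravdess_data")
cre_files, cre_labels = split_by_corpus(files, labels, "crema_d_data")
print(f"RAVDESS: {len(rav_files)} files, CREMA-D: {len(cre_files)} files")
```

Filtering pairs in one pass means a file and its label cannot drift out of alignment, which matters when the two subsets are scored independently.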

In [5]:
# ===================================================================
# FINAL SCRIPT v2: Evaluating the Generalist Model (with Path Fix)
# ===================================================================
import torch, torch.nn as nn, os, numpy as np, pickle
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import accuracy_score, classification_report
from tqdm import tqdm
from torchvision import models

# --- Configuration ---
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"
FILE_LIST_PATH = "/content/drive/MyDrive/ser_project/"
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_advanced_best.pth"
BATCH_SIZE = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# --- Mappings and Dataset Class ---
unified_emotion_map = { "neutral": 0, "happy": 1, "sad": 2, "angry": 3, "fearful": 4, "disgust": 5 }
unified_emotion_labels = ["neutral", "happy", "sad", "angry", "fearful", "disgust"]

def get_basename(path): # Robust way to get filename
    return path.replace('\\', '/').split('/')[-1]

class SpectrogramDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        filename = get_basename(self.file_paths[idx]).replace('.wav', '.npy')
        file_path = os.path.join(SPECTROGRAM_PATH, filename)
        label = self.labels[idx]
        spectrogram = np.load(file_path)
        current_width = spectrogram.shape[1]
        if current_width < self.target_width: spectrogram = np.pad(spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: spectrogram = spectrogram[:, :self.target_width]
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min: spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

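# --- Load the Best Trained Model ---
# Note: CHECKPOINT_BEST_PATH is the checkpoint written by the unbalanced training run in Part 1.
print("Loading the best 'Ultimate Generalist' model...")
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, len(unified_emotion_labels))
# map_location lets the checkpoint load on whichever device is available (GPU or CPU)
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH, map_location=device)
model.load_state_dict(best_checkpoint['model_state_dict'])
model = model.to(device)
model.eval()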

# --- Load the test set file list ---
print("Loading the test set data split...")
with open(os.path.join(FILE_LIST_PATH, 'test_files_unbalanced.pkl'), 'rb') as f: test_files_raw = pickle.load(f)

# --- THIS IS THE CRUCIAL FIX ---
# Normalize Windows paths ('\') to Linux paths ('/')
test_files = [p.replace('\\', '/') for p in test_files_raw]
print("File paths normalized for Linux environment.")

# --- Create the label list for the test set ---
ravdess_map = { "01": "neutral", "03": "happy", "04": "sad", "05": "angry", "06": "fearful", "07": "disgust" }
crema_d_map = { "NEU": "neutral", "HAP": "happy", "SAD": "sad", "ANG": "angry", "FEA": "fearful", "DIS": "disgust" }
def get_label(filepath):
    filename = get_basename(filepath)
    try:
        if '03-01' in filename: return unified_emotion_map[ravdess_map[filename.split("-")[2]]]
        else: return unified_emotion_map[crema_d_map[filename.split("_")[2]]]
    except (IndexError, KeyError): return None
test_labels = [get_label(f) for f in test_files]

# Filter out any files that might have failed label parsing
valid_indices = [i for i, lbl in enumerate(test_labels) if lbl is not None]
test_files = [test_files[i] for i in valid_indices]
test_labels = [test_labels[i] for i in valid_indices]

# --- Filter the test set for each dataset ---
ravdess_test_files = [f for f in test_files if 'ravdess_data' in f.lower()]
ravdess_test_labels = [l for i, l in enumerate(test_labels) if 'ravdess_data' in test_files[i].lower()]

crema_d_test_files = [f for f in test_files if 'crema_d_data' in f.lower()]
crema_d_test_labels = [l for i, l in enumerate(test_labels) if 'crema_d_data' in test_files[i].lower()]

# --- Evaluation Function ---
def evaluate(files, labels, name):
    dataset = SpectrogramDataset(files, labels)
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=False)
    all_preds, all_true = [], []
    with torch.no_grad():
        for inputs, labs in tqdm(loader, desc=f"Evaluating on {name}"):
            inputs, labs = inputs.to(device), labs.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_true.extend(labs.cpu().numpy())
    accuracy = accuracy_score(all_true, all_preds)
    print(f"\n>>> Accuracy on {name}: {accuracy * 100:.2f}%")
    print(f"Classification Report for {name}:")
    print(classification_report(all_true, all_preds, target_names=unified_emotion_labels, zero_division=0))

# --- Run the Final Evaluations ---
if ravdess_test_files:
    evaluate(ravdess_test_files, ravdess_test_labels, "RAVDESS Test Set")
if crema_d_test_files:
    evaluate(crema_d_test_files, crema_d_test_labels, "CREMA-D Test Set")

Using device: cuda
Loading the best 'Ultimate Generalist' model...
Loading the test set data split...
File paths normalized for Linux environment.


Evaluating on RAVDESS Test Set: 100%|██████████| 3/3 [00:00<00:00,  3.66it/s]



>>> Accuracy on RAVDESS Test Set: 82.48%
Classification Report for RAVDESS Test Set:
              precision    recall  f1-score   support

     neutral       0.84      0.94      0.89        17
       happy       0.83      0.89      0.86        27
         sad       0.79      0.73      0.76        26
       angry       0.87      0.72      0.79        18
     fearful       0.83      0.86      0.84        22
     disgust       0.81      0.81      0.81        27

    accuracy                           0.82       137
   macro avg       0.83      0.83      0.83       137
weighted avg       0.82      0.82      0.82       137



Evaluating on CREMA-D Test Set: 100%|██████████| 18/18 [00:04<00:00,  3.84it/s]


>>> Accuracy on CREMA-D Test Set: 64.85%
Classification Report for CREMA-D Test Set:
              precision    recall  f1-score   support

     neutral       0.69      0.76      0.72       160
       happy       0.64      0.66      0.65       187
         sad       0.63      0.56      0.59       210
       angry       0.71      0.77      0.74       199
     fearful       0.63      0.56      0.59       193
     disgust       0.58      0.61      0.60       189

    accuracy                           0.65      1138
   macro avg       0.65      0.65      0.65      1138
weighted avg       0.65      0.65      0.65      1138
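
As a sanity check, the two per-corpus accuracies above should recombine into the overall test accuracy reported at the end of Part 1 (66.75% on 1275 files). The sketch below recovers the number of correct predictions from each rounded accuracy and its support, then pools them. The helper name `correct_count` is hypothetical; all numbers are copied from the outputs in this notebook.

```python
def correct_count(accuracy_pct, support):
    """Recover the number of correct predictions from a rounded accuracy percentage."""
    return round(accuracy_pct / 100 * support)

ravdess_support, crema_d_support = 137, 1138

ravdess_correct = correct_count(82.48, ravdess_support)
crema_d_correct = correct_count(64.85, crema_d_support)

total = ravdess_support + crema_d_support
overall = (ravdess_correct + crema_d_correct) / total
print(f"{ravdess_correct} + {crema_d_correct} correct out of {total}: {overall * 100:.2f}%")
```

The pooled figure matches the Part 1 result, and the supports show that CREMA-D makes up roughly nine tenths of the unbalanced test set, so the overall accuracy is dominated by the CREMA-D score.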




