# Phase 4.5: The Ultimate Generalist - Achieving Peak Performance

**Objective:** Building on our first Generalist model, this phase aims to create the "Ultimate Generalist": a ResNet-18 classifier trained with balanced data splits, SpecAugment-style masking, and a cosine-annealed learning rate, in order to improve accuracy and robustness across both domains (RAVDESS and CREMA-D).

In [1]:
# Step 1: Connect to Google Drive to access your files
from google.colab import drive
drive.mount('/content/drive')

# Step 2: Install all the necessary special libraries for our project
# This single line installs everything we need.
!pip install librosa audiomentations pandas seaborn matplotlib tqdm

Mounted at /content/drive
Collecting audiomentations
  Downloading audiomentations-0.42.0-py3-none-any.whl.metadata (11 kB)
Collecting numpy-minmax<1,>=0.3.0 (from audiomentations)
  Downloading numpy_minmax-0.5.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting numpy-rms<1,>=0.4.2 (from audiomentations)
  Downloading numpy_rms-0.6.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.5 kB)
Collecting python-stretch<1,>=0.3.1 (from audiomentations)
  Downloading python_stretch-0.3.1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Downloading audiomentations-0.42.0-py3-none-any.whl (86 kB)
Downloading numpy_minmax-0.5.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux201

In [2]:
import os

# The path we expect the files to be in
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"

print(f"Checking for folder at: {SPECTROGRAM_PATH}")

try:
    # Get a list of all files in that directory
    all_npy_files = os.listdir(SPECTROGRAM_PATH)

    print(f"\n✅ Success! Found the folder.")
    print(f"Total files found in the folder: {len(all_npy_files)}")

    if len(all_npy_files) > 0:
        print("\nHere are the first 10 filenames found:")
        # Print the first 10 filenames for us to inspect
        for filename in sorted(all_npy_files)[:10]:
            print(filename)
    else:
        print("\nWARNING: The folder is empty!")

except FileNotFoundError:
    print(f"\n❌ ERROR: The folder '{SPECTROGRAM_PATH}' does not exist.")
    print("Please double-check that you uploaded the folder and that the name is spelled exactly correct.")

Checking for folder at: /content/drive/MyDrive/ser_project/processed_spectrograms_final/

✅ Success! Found the folder.
Total files found in the folder: 8882

Here are the first 10 filenames found:
03-01-01-01-01-01-01.npy
03-01-01-01-01-01-02.npy
03-01-01-01-01-01-03.npy
03-01-01-01-01-01-04.npy
03-01-01-01-01-01-05.npy
03-01-01-01-01-01-06.npy
03-01-01-01-01-01-07.npy
03-01-01-01-01-01-08.npy
03-01-01-01-01-01-09.npy
03-01-01-01-01-01-10.npy
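
The names listed above follow the RAVDESS pattern of dash-separated fields. The training script in Part 1 reads the third field as the emotion code and looks it up in its `ravdess_map`. The sketch below shows that lookup in isolation; the helper name `ravdess_emotion` is hypothetical and the mapping is copied from the script.

```python
# Mapping copied from the training script; codes that are not listed
# here are outside the six unified classes and yield None.
RAVDESS_MAP = {"01": "neutral", "03": "happy", "04": "sad",
               "05": "angry", "06": "fearful", "07": "disgust"}

def ravdess_emotion(filename):
    """Return the emotion name for a RAVDESS-style filename, or None."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension (.npy or .wav)
    fields = stem.split("-")
    if len(fields) < 3:
        return None
    return RAVDESS_MAP.get(fields[2])   # third field is the emotion code

print(ravdess_emotion("03-01-01-01-01-01-01.npy"))  # prints "neutral"
```

All ten files in the listing carry the code `01` in the third field, so they map to `neutral`.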


## Part 1: The Advanced Training Regimen

This script implements our most powerful training strategy. We introduce three key upgrades over the previous generalist model:

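1.  **Balanced Dataset:** We now load our data from pre-defined, balanced file lists (`train_files.pkl`, `val_files.pkl`, `test_files.pkl`). The aim is for the model to see a comparable number of examples for each emotion, which reduces bias toward majority classes and should improve performance on under-represented classes.
2.  **SpecAugment:** We apply SpecAugment-style masking, which blanks out random sections of time and frequency in our spectrograms so the model cannot rely on any single region and has to learn more robust features. In this script the masks are approximated with two `torchvision.transforms.RandomErasing` passes rather than dedicated time and frequency masks.
3.  **Advanced Scheduler:** We use a `CosineAnnealingLR` scheduler, which decays the learning rate along a cosine curve from its initial value (0.001) toward zero over the 40 training epochs, so later epochs make progressively smaller updates and the model can settle into a better final solution.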
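
To make the masking idea in item 2 concrete, the sketch below applies one frequency mask and one time mask to a single spectrogram with NumPy. It illustrates the general SpecAugment-style technique, not the training script's own pipeline, which uses torchvision's `RandomErasing`. The function name `spec_augment`, the mask widths, and the 128 mel bins are assumptions chosen for illustration; only the width of 300 frames matches the script's `target_width`.

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Zero one random frequency band and one random time span.

    spec has shape (n_mels, n_frames). The mask widths are illustrative
    defaults, not values taken from the training script.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()                      # leave the input untouched
    n_mels, n_frames = out.shape

    # Frequency mask: f consecutive mel bins across all frames.
    f = int(rng.integers(0, min(max_freq_mask, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: t consecutive frames across all mel bins.
    t = int(rng.integers(0, min(max_time_mask, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out

rng = np.random.default_rng(0)
spec = np.ones((128, 300), dtype=np.float32)   # 128 mel bins is an assumption
masked = spec_augment(spec, rng=rng)
print(masked.shape, int((masked == 0).sum()), "cells masked")
```

Because the function works on one spectrogram at a time, each training example would receive its own mask if it were called inside a `Dataset.__getitem__`.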

In [3]:
# ===================================================================
# ULTIMATE COLAB SCRIPT v8: The Advanced Generalist Trainer (FINAL PATH FIX 3)
# ===================================================================
import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import os, numpy as np, pickle
from sklearn.metrics import accuracy_score, classification_report
from tqdm import tqdm
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision import models
from torchvision import transforms

# --- Configuration ---
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"
FILE_LIST_PATH = "/content/drive/MyDrive/ser_project/"
LEARNING_RATE = 0.001
BATCH_SIZE = 64
EPOCHS = 40
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_advanced_best.pth"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# --- Mappings ---
unified_emotion_map = { "neutral": 0, "happy": 1, "sad": 2, "angry": 3, "fearful": 4, "disgust": 5 }
unified_emotion_labels = ["neutral", "happy", "sad", "angry", "fearful", "disgust"]

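# --- SpecAugment-style Transformation Pipeline ---
# RandomErasing stands in for SpecAugment's time/frequency masks: with
# probability p, each pass zeroes one random rectangle of the input.
spec_augment_transform = transforms.Compose([
    # General-purpose patch (aspect ratio between 0.2 and 5.0)
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.05), ratio=(0.2, 5.0), value=0),
    # Strongly elongated strip (aspect ratio between 0.01 and 0.2)
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.08), ratio=(0.01, 0.2), value=0),
])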

# --- Helper function to get filename robustly ---
def get_basename(path):
    return path.replace('\\', '/').split('/')[-1]

# --- Dataset Class ---
class SpectrogramDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        # Map the original audio path to its spectrogram file (.wav -> .npy)
        filename = get_basename(self.file_paths[idx]).replace('.wav', '.npy')
        file_path = os.path.join(SPECTROGRAM_PATH, filename)
        label = self.labels[idx]
        spectrogram = np.load(file_path)
        # Zero-pad or crop along the time axis to a fixed width
        current_width = spectrogram.shape[1]
        if current_width < self.target_width:
            pad = self.target_width - current_width
            spectrogram = np.pad(spectrogram, ((0, 0), (0, pad)), mode='constant')
        elif current_width > self.target_width:
            spectrogram = spectrogram[:, :self.target_width]
        # Min-max scale to [0, 1]; a constant spectrogram is left unchanged
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min:
            spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        # Repeat the single channel three times to match ResNet's 3-channel input
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

# --- Prepare Data ---
print("Loading pre-defined and balanced data splits...")
with open(os.path.join(FILE_LIST_PATH, 'train_files.pkl'), 'rb') as f: train_files_raw = pickle.load(f)
with open(os.path.join(FILE_LIST_PATH, 'val_files.pkl'), 'rb') as f: val_files_raw = pickle.load(f)
with open(os.path.join(FILE_LIST_PATH, 'test_files.pkl'), 'rb') as f: test_files_raw = pickle.load(f)

print("Verifying that all spectrogram files exist...")
def verify_and_filter_files(file_list_raw):
    verified_files = []
    for f_path in file_list_raw:
        # Use our new robust function here as well
        npy_filename = get_basename(f_path).replace('.wav', '.npy')
        full_npy_path = os.path.join(SPECTROGRAM_PATH, npy_filename)
        if os.path.exists(full_npy_path):
            # We keep the original path in the list for the label getter
            verified_files.append(f_path)
    skipped_count = len(file_list_raw) - len(verified_files)
    return verified_files, skipped_count

train_files, train_skipped = verify_and_filter_files(train_files_raw)
val_files, val_skipped = verify_and_filter_files(val_files_raw)
test_files, test_skipped = verify_and_filter_files(test_files_raw)

print(f"Train set: {len(train_files)} files found, {train_skipped} skipped.")
print(f"Validation set: {len(val_files)} files found, {val_skipped} skipped.")
print(f"Test set: {len(test_files)} files found, {test_skipped} skipped.")

# Create label lists
ravdess_map = { "01": "neutral", "03": "happy", "04": "sad", "05": "angry", "06": "fearful", "07": "disgust" }
crema_d_map = { "NEU": "neutral", "HAP": "happy", "SAD": "sad", "ANG": "angry", "FEA": "fearful", "DIS": "disgust" }
def get_label_from_path(filepath):
    filename = get_basename(filepath)
    try:
        if '03-01' in filename: return unified_emotion_map[ravdess_map[filename.split("-")[2]]]
        else: return unified_emotion_map[crema_d_map[filename.split("_")[2]]]
    except (IndexError, KeyError): return None

train_labels = [get_label_from_path(f) for f in train_files]
val_labels = [get_label_from_path(f) for f in val_files]
test_labels = [get_label_from_path(f) for f in test_files]
# Fail early if any filename could not be mapped to an emotion label
assert None not in train_labels + val_labels + test_labels, "Unparseable filename in file lists"

train_dataset = SpectrogramDataset(train_files, train_labels)
val_dataset = SpectrogramDataset(val_files, val_labels)
test_dataset = SpectrogramDataset(test_files, test_labels)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)

# --- Train the Model ---
model = models.resnet18(weights='IMAGENET1K_V1')
model.fc = nn.Linear(model.fc.in_features, len(unified_emotion_labels))
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
best_val_acc = 0.0
print("Starting advanced training with SpecAugment...")
for epoch in range(EPOCHS):
    model.train()
    running_loss = 0.0
    for inputs, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Train]"):
        inputs, labels = inputs.to(device), labels.to(device)
        # Applied to the whole batch: each RandomErasing pass draws one
        # rectangle per call, so all samples in a batch share the same mask.
        inputs = spec_augment_transform(inputs)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    train_loss = running_loss / len(train_dataset)

    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in tqdm(val_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Val]"):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_accuracy = 100 * correct / total
    val_loss /= len(val_dataset)
    print(f"Epoch {epoch+1}/{EPOCHS} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Acc: {val_accuracy:.2f}%")
    # Keep only the checkpoint with the highest validation accuracy so far
    if val_accuracy > best_val_acc:
        best_val_acc = val_accuracy
        print(f"🎉 New best validation accuracy: {best_val_acc:.2f}%. Saving model...")
        torch.save({'model_state_dict': model.state_dict()}, CHECKPOINT_BEST_PATH)
    scheduler.step()  # advance the cosine schedule once per epoch

# --- Final Evaluation ---
print("\n--- FINAL EVALUATION OF ADVANCED GENERALIST MODEL ---")
print(f"Loading best model (from epoch with {best_val_acc:.2f}% validation accuracy) for final testing...")
# map_location lets the checkpoint load even when no GPU is available
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH, map_location=device)
model.load_state_dict(best_checkpoint['model_state_dict'])
model.eval()
all_preds, all_true = [], []
with torch.no_grad():
    for inputs, labels in tqdm(test_loader, desc="Final Evaluation"):
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_true.extend(labels.cpu().numpy())
accuracy = accuracy_score(all_true, all_preds)
print(f"\nFinal Advanced Generalist Model Accuracy on the Test Set: {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(all_true, all_preds, target_names=unified_emotion_labels, zero_division=0))

Using device: cuda
Loading pre-defined and balanced data splits...
Verifying that all spectrogram files exist...
Train set: 11385 files found, 0 skipped.
Validation set: 1266 files found, 0 skipped.
Test set: 2233 files found, 0 skipped.
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


100%|██████████| 44.7M/44.7M [00:00<00:00, 68.0MB/s]


Starting advanced training with SpecAugment...


Epoch 1/40 [Train]: 100%|██████████| 178/178 [04:47<00:00,  1.61s/it]
Epoch 1/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.41it/s]


Epoch 1/40 | Train Loss: 1.0303 | Val Loss: 1.4354 | Val Acc: 57.35%
🎉 New best validation accuracy: 57.35%. Saving model...


Epoch 2/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.27it/s]
Epoch 2/40 [Val]: 100%|██████████| 20/20 [00:05<00:00,  3.93it/s]


Epoch 2/40 | Train Loss: 0.6802 | Val Loss: 0.8935 | Val Acc: 67.77%
🎉 New best validation accuracy: 67.77%. Saving model...


Epoch 3/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.15it/s]
Epoch 3/40 [Val]: 100%|██████████| 20/20 [00:03<00:00,  5.05it/s]


Epoch 3/40 | Train Loss: 0.5857 | Val Loss: 1.3444 | Val Acc: 60.74%


Epoch 4/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.32it/s]
Epoch 4/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.08it/s]


Epoch 4/40 | Train Loss: 0.4942 | Val Loss: 1.5582 | Val Acc: 55.13%


Epoch 5/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.32it/s]
Epoch 5/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.68it/s]


Epoch 5/40 | Train Loss: 0.4041 | Val Loss: 0.6286 | Val Acc: 77.80%
🎉 New best validation accuracy: 77.80%. Saving model...


Epoch 6/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.11it/s]
Epoch 6/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.91it/s]


Epoch 6/40 | Train Loss: 0.3423 | Val Loss: 0.7170 | Val Acc: 74.72%


Epoch 7/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.24it/s]
Epoch 7/40 [Val]: 100%|██████████| 20/20 [00:05<00:00,  3.96it/s]


Epoch 7/40 | Train Loss: 0.2975 | Val Loss: 1.5647 | Val Acc: 63.27%


Epoch 8/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.25it/s]
Epoch 8/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.95it/s]


Epoch 8/40 | Train Loss: 0.2288 | Val Loss: 0.9403 | Val Acc: 74.01%


Epoch 9/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.26it/s]
Epoch 9/40 [Val]: 100%|██████████| 20/20 [00:03<00:00,  5.01it/s]


Epoch 9/40 | Train Loss: 0.1965 | Val Loss: 0.8618 | Val Acc: 72.91%


Epoch 10/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.28it/s]
Epoch 10/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.01it/s]


Epoch 10/40 | Train Loss: 0.1335 | Val Loss: 0.9757 | Val Acc: 75.59%


Epoch 11/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.34it/s]
Epoch 11/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.94it/s]


Epoch 11/40 | Train Loss: 0.1175 | Val Loss: 1.0396 | Val Acc: 76.30%


Epoch 12/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.31it/s]
Epoch 12/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.90it/s]


Epoch 12/40 | Train Loss: 0.1045 | Val Loss: 0.7833 | Val Acc: 80.09%
🎉 New best validation accuracy: 80.09%. Saving model...


Epoch 13/40 [Train]: 100%|██████████| 178/178 [00:44<00:00,  4.04it/s]
Epoch 13/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.57it/s]


Epoch 13/40 | Train Loss: 0.0845 | Val Loss: 0.7901 | Val Acc: 80.96%
🎉 New best validation accuracy: 80.96%. Saving model...


Epoch 14/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.09it/s]
Epoch 14/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.77it/s]


Epoch 14/40 | Train Loss: 0.0625 | Val Loss: 0.9422 | Val Acc: 79.86%


Epoch 15/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.23it/s]
Epoch 15/40 [Val]: 100%|██████████| 20/20 [00:05<00:00,  3.97it/s]


Epoch 15/40 | Train Loss: 0.0521 | Val Loss: 0.9721 | Val Acc: 80.17%


Epoch 16/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.28it/s]
Epoch 16/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.70it/s]


Epoch 16/40 | Train Loss: 0.0356 | Val Loss: 0.8286 | Val Acc: 81.28%
🎉 New best validation accuracy: 81.28%. Saving model...


Epoch 17/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.12it/s]
Epoch 17/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.15it/s]


Epoch 17/40 | Train Loss: 0.0295 | Val Loss: 1.1224 | Val Acc: 79.70%


Epoch 18/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.21it/s]
Epoch 18/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.77it/s]


Epoch 18/40 | Train Loss: 0.0427 | Val Loss: 0.8477 | Val Acc: 81.67%
🎉 New best validation accuracy: 81.67%. Saving model...


Epoch 19/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.08it/s]
Epoch 19/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.83it/s]


Epoch 19/40 | Train Loss: 0.0351 | Val Loss: 0.9897 | Val Acc: 80.65%


Epoch 20/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.24it/s]
Epoch 20/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.26it/s]


Epoch 20/40 | Train Loss: 0.0147 | Val Loss: 0.8868 | Val Acc: 82.07%
🎉 New best validation accuracy: 82.07%. Saving model...


Epoch 21/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.14it/s]
Epoch 21/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.99it/s]


Epoch 21/40 | Train Loss: 0.0123 | Val Loss: 0.9049 | Val Acc: 82.07%


Epoch 22/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.33it/s]
Epoch 22/40 [Val]: 100%|██████████| 20/20 [00:05<00:00,  3.88it/s]


Epoch 22/40 | Train Loss: 0.0182 | Val Loss: 0.9695 | Val Acc: 81.04%


Epoch 23/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.30it/s]
Epoch 23/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.92it/s]


Epoch 23/40 | Train Loss: 0.0138 | Val Loss: 1.0451 | Val Acc: 80.02%


Epoch 24/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.28it/s]
Epoch 24/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.24it/s]


Epoch 24/40 | Train Loss: 0.0116 | Val Loss: 0.9176 | Val Acc: 81.28%


Epoch 25/40 [Train]: 100%|██████████| 178/178 [00:40<00:00,  4.34it/s]
Epoch 25/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.78it/s]


Epoch 25/40 | Train Loss: 0.0137 | Val Loss: 0.9317 | Val Acc: 82.31%
🎉 New best validation accuracy: 82.31%. Saving model...


Epoch 26/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.21it/s]
Epoch 26/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.89it/s]


Epoch 26/40 | Train Loss: 0.0106 | Val Loss: 0.9337 | Val Acc: 81.67%


Epoch 27/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.33it/s]
Epoch 27/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.22it/s]


Epoch 27/40 | Train Loss: 0.0095 | Val Loss: 0.9309 | Val Acc: 81.99%


Epoch 28/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.30it/s]
Epoch 28/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.95it/s]


Epoch 28/40 | Train Loss: 0.0062 | Val Loss: 1.0236 | Val Acc: 81.28%


Epoch 29/40 [Train]: 100%|██████████| 178/178 [00:40<00:00,  4.35it/s]
Epoch 29/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.52it/s]


Epoch 29/40 | Train Loss: 0.0048 | Val Loss: 0.9519 | Val Acc: 82.78%
🎉 New best validation accuracy: 82.78%. Saving model...


Epoch 30/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.13it/s]
Epoch 30/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.98it/s]


Epoch 30/40 | Train Loss: 0.0044 | Val Loss: 0.9349 | Val Acc: 83.02%
🎉 New best validation accuracy: 83.02%. Saving model...


Epoch 31/40 [Train]: 100%|██████████| 178/178 [00:43<00:00,  4.14it/s]
Epoch 31/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.29it/s]


Epoch 31/40 | Train Loss: 0.0035 | Val Loss: 0.9416 | Val Acc: 82.70%


Epoch 32/40 [Train]: 100%|██████████| 178/178 [00:42<00:00,  4.20it/s]
Epoch 32/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.75it/s]


Epoch 32/40 | Train Loss: 0.0040 | Val Loss: 0.9845 | Val Acc: 81.91%


Epoch 33/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.27it/s]
Epoch 33/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.80it/s]


Epoch 33/40 | Train Loss: 0.0026 | Val Loss: 0.9538 | Val Acc: 82.15%


Epoch 34/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.27it/s]
Epoch 34/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.22it/s]


Epoch 34/40 | Train Loss: 0.0020 | Val Loss: 0.9455 | Val Acc: 82.86%


Epoch 35/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.31it/s]
Epoch 35/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.92it/s]


Epoch 35/40 | Train Loss: 0.0020 | Val Loss: 0.9849 | Val Acc: 82.54%


Epoch 36/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.33it/s]
Epoch 36/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.30it/s]


Epoch 36/40 | Train Loss: 0.0028 | Val Loss: 0.9630 | Val Acc: 83.02%


Epoch 37/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.28it/s]
Epoch 37/40 [Val]: 100%|██████████| 20/20 [00:03<00:00,  5.02it/s]


Epoch 37/40 | Train Loss: 0.0018 | Val Loss: 0.9677 | Val Acc: 82.54%


Epoch 38/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.32it/s]
Epoch 38/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.95it/s]


Epoch 38/40 | Train Loss: 0.0012 | Val Loss: 0.9515 | Val Acc: 82.94%


Epoch 39/40 [Train]: 100%|██████████| 178/178 [00:40<00:00,  4.35it/s]
Epoch 39/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.08it/s]


Epoch 39/40 | Train Loss: 0.0035 | Val Loss: 0.9822 | Val Acc: 82.94%


Epoch 40/40 [Train]: 100%|██████████| 178/178 [00:41<00:00,  4.33it/s]
Epoch 40/40 [Val]: 100%|██████████| 20/20 [00:04<00:00,  4.96it/s]


Epoch 40/40 | Train Loss: 0.0015 | Val Loss: 0.9657 | Val Acc: 81.91%

--- FINAL EVALUATION OF ADVANCED GENERALIST MODEL ---
Loading best model (from epoch with 83.02% validation accuracy) for final testing...


Final Evaluation: 100%|██████████| 35/35 [00:07<00:00,  4.44it/s]


Final Advanced Generalist Model Accuracy on the Test Set: 81.01%

Classification Report:
              precision    recall  f1-score   support

     neutral       0.79      0.82      0.81       266
       happy       0.83      0.81      0.82       374
         sad       0.76      0.83      0.80       393
       angry       0.84      0.86      0.85       383
     fearful       0.83      0.74      0.78       390
     disgust       0.82      0.80      0.81       427

    accuracy                           0.81      2233
   macro avg       0.81      0.81      0.81      2233
weighted avg       0.81      0.81      0.81      2233
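
The log above does not print the learning rate. Assuming the scheduler follows the standard closed-form cosine annealing expression with the script's settings (initial rate 0.001, `T_max=40`, minimum rate left at its default of 0) and is stepped once per epoch, the rate used in each epoch can be reconstructed roughly as follows. The helper name `cosine_lr` is hypothetical.

```python
import math

def cosine_lr(epoch, base_lr=0.001, t_max=40, eta_min=0.0):
    """Closed-form cosine annealing value for a 0-based epoch index."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)

# Print the rate at a few points of the 40-epoch run.
for epoch in (0, 10, 20, 30, 39):
    print(f"epoch {epoch + 1:2d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 0.001, reaches half of that around epoch 21, and is close to zero in the final epoch.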






## Part 2: The Final Verdict

This is the final evaluation of our best CNN-based model. After training with the advanced regimen, we load the checkpoint with the highest validation accuracy and evaluate it separately on the RAVDESS and CREMA-D portions of the test set. The per-dataset accuracy shows how well the model handles the clean domain (RAVDESS) and how well it generalizes to the more challenging one (CREMA-D).
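
One sanity check is worth running before reading the per-dataset numbers. The folder check earlier reported 8,882 spectrogram files, while the three split lists together hold 11,385 + 1,266 + 2,233 = 14,884 entries, so (if the folder was unchanged between cells) some files must appear more than once. Whether any repeats cross the train/test boundary is not shown in this notebook. The sketch below is one way to count that; `split_report` is a hypothetical helper and the toy paths are made up only to show the output format.

```python
def basename(path):
    """Filename without directories, tolerant of Windows separators."""
    return path.replace("\\", "/").split("/")[-1]

def split_report(train_files, val_files, test_files):
    """Count unique files per split and files shared between splits."""
    names = {
        "train": {basename(p) for p in train_files},
        "val": {basename(p) for p in val_files},
        "test": {basename(p) for p in test_files},
    }
    unique = {split: len(files) for split, files in names.items()}
    shared = {
        "train&val": len(names["train"] & names["val"]),
        "train&test": len(names["train"] & names["test"]),
        "val&test": len(names["val"] & names["test"]),
    }
    return unique, shared

# Toy lists with made-up paths; replace with the unpickled file lists.
train = ["ravdess_data\\03-01-03-01-01-01-01.wav",
         "crema_d_data/1001_DFA_ANG_XX.wav",
         "ravdess_data\\03-01-03-01-01-01-01.wav"]
val = ["crema_d_data/1002_IEO_SAD_HI.wav"]
test = ["ravdess_data/03-01-03-01-01-01-01.wav"]

unique, shared = split_report(train, val, test)
print(unique)
print(shared)
```

A non-zero `train&test` count on the real lists would mean the test accuracy is partly measured on clips the model has already seen.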

In [5]:
# ===================================================================
# FINAL SCRIPT v2: Evaluating the Generalist Model (with Path Fix)
# ===================================================================
import torch, torch.nn as nn, os, numpy as np, pickle
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import accuracy_score, classification_report
from tqdm import tqdm
from torchvision import models

# --- Configuration ---
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/processed_spectrograms_final/"
FILE_LIST_PATH = "/content/drive/MyDrive/ser_project/"
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_advanced_best.pth"
BATCH_SIZE = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# --- Mappings and Dataset Class ---
unified_emotion_map = { "neutral": 0, "happy": 1, "sad": 2, "angry": 3, "fearful": 4, "disgust": 5 }
unified_emotion_labels = ["neutral", "happy", "sad", "angry", "fearful", "disgust"]

def get_basename(path): # Robust way to get filename
    return path.replace('\\', '/').split('/')[-1]

class SpectrogramDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        filename = get_basename(self.file_paths[idx]).replace('.wav', '.npy')
        file_path = os.path.join(SPECTROGRAM_PATH, filename)
        label = self.labels[idx]
        spectrogram = np.load(file_path)
        current_width = spectrogram.shape[1]
        if current_width < self.target_width: spectrogram = np.pad(spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: spectrogram = spectrogram[:, :self.target_width]
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min: spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

# --- Load the Best Trained Model ---
print("Loading the best 'Ultimate Generalist' model...")
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, len(unified_emotion_labels))
# map_location lets the checkpoint load even when no GPU is available
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH, map_location=device)
model.load_state_dict(best_checkpoint['model_state_dict'])
model = model.to(device)
model.eval()

# --- Load the test set file list ---
print("Loading the test set data split...")
with open(os.path.join(FILE_LIST_PATH, 'test_files.pkl'), 'rb') as f: test_files_raw = pickle.load(f)

# --- THIS IS THE CRUCIAL FIX ---
# Normalize Windows paths ('\') to Linux paths ('/')
test_files = [p.replace('\\', '/') for p in test_files_raw]
print("File paths normalized for Linux environment.")

# --- Create the label list for the test set ---
ravdess_map = { "01": "neutral", "03": "happy", "04": "sad", "05": "angry", "06": "fearful", "07": "disgust" }
crema_d_map = { "NEU": "neutral", "HAP": "happy", "SAD": "sad", "ANG": "angry", "FEA": "fearful", "DIS": "disgust" }
def get_label(filepath):
    filename = get_basename(filepath)
    try:
        if '03-01' in filename: return unified_emotion_map[ravdess_map[filename.split("-")[2]]]
        else: return unified_emotion_map[crema_d_map[filename.split("_")[2]]]
    except (IndexError, KeyError): return None
test_labels = [get_label(f) for f in test_files]

# Filter out any files that might have failed label parsing
valid_indices = [i for i, lbl in enumerate(test_labels) if lbl is not None]
test_files = [test_files[i] for i in valid_indices]
test_labels = [test_labels[i] for i in valid_indices]

# --- Filter the test set for each dataset ---
ravdess_test_files = [f for f in test_files if 'ravdess_data' in f.lower()]
ravdess_test_labels = [l for i, l in enumerate(test_labels) if 'ravdess_data' in test_files[i].lower()]

crema_d_test_files = [f for f in test_files if 'crema_d_data' in f.lower()]
crema_d_test_labels = [l for i, l in enumerate(test_labels) if 'crema_d_data' in test_files[i].lower()]

# --- Evaluation Function ---
def evaluate(files, labels, name):
    dataset = SpectrogramDataset(files, labels)
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=False)
    all_preds, all_true = [], []
    with torch.no_grad():
        for inputs, labs in tqdm(loader, desc=f"Evaluating on {name}"):
            inputs, labs = inputs.to(device), labs.to(device)
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_true.extend(labs.cpu().numpy())
    accuracy = accuracy_score(all_true, all_preds)
    print(f"\n>>> Accuracy on {name}: {accuracy * 100:.2f}%")
    print(f"Classification Report for {name}:")
    print(classification_report(all_true, all_preds, target_names=unified_emotion_labels, zero_division=0))

# --- Run the Final Evaluations ---
if ravdess_test_files:
    evaluate(ravdess_test_files, ravdess_test_labels, "RAVDESS Test Set")
if crema_d_test_files:
    evaluate(crema_d_test_files, crema_d_test_labels, "CREMA-D Test Set")

Using device: cuda
Loading the best 'Ultimate Generalist' model...
Loading the test set data split...
File paths normalized for Linux environment.


Evaluating on RAVDESS Test Set: 100%|██████████| 18/18 [00:04<00:00,  3.80it/s]



>>> Accuracy on RAVDESS Test Set: 99.73%
Classification Report for RAVDESS Test Set:
              precision    recall  f1-score   support

     neutral       0.97      1.00      0.98        97
       happy       1.00      0.99      0.99       202
         sad       1.00      1.00      1.00       198
       angry       1.00      1.00      1.00       188
     fearful       1.00      1.00      1.00       203
     disgust       1.00      1.00      1.00       216

    accuracy                           1.00      1104
   macro avg       0.99      1.00      1.00      1104
weighted avg       1.00      1.00      1.00      1104



Evaluating on CREMA-D Test Set: 100%|██████████| 18/18 [00:04<00:00,  3.83it/s]


>>> Accuracy on CREMA-D Test Set: 62.71%
Classification Report for CREMA-D Test Set:
              precision    recall  f1-score   support

     neutral       0.69      0.71      0.70       169
       happy       0.62      0.61      0.62       172
         sad       0.56      0.67      0.61       195
       angry       0.69      0.72      0.70       195
     fearful       0.59      0.47      0.52       187
     disgust       0.62      0.59      0.61       211

    accuracy                           0.63      1129
   macro avg       0.63      0.63      0.63      1129
weighted avg       0.63      0.63      0.62      1129
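
The two reports above can be tied back to the combined test accuracy from Part 1. The short calculation below weights each dataset's accuracy by its number of test clips; all figures are copied from the printed reports, and the rounding to whole clips is an assumption needed because the accuracies are printed to two decimals.

```python
# Supports and accuracies copied from the two reports above.
ravdess_n, ravdess_acc = 1104, 0.9973
crema_n, crema_acc = 1129, 0.6271

# Recover the number of correct predictions per dataset.
ravdess_correct = round(ravdess_n * ravdess_acc)   # 1101 clips
crema_correct = round(crema_n * crema_acc)         # 708 clips

combined = (ravdess_correct + crema_correct) / (ravdess_n + crema_n)
print(f"combined accuracy: {combined * 100:.2f}%")  # prints 81.01%
```

The result matches the 81.01% reported on the full test set, which shows that the headline figure is a blend of near-perfect RAVDESS accuracy and much lower CREMA-D accuracy.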




