# Phase 2: Quantifying the Domain Gap and Exploring Initial Solutions

**Objective:** With a strong baseline established in Phase 1, this phase measures the core research problem directly: how much accuracy a model trained on one emotional-speech corpus loses when it is evaluated on another.

Our workplan is as follows:
1.  **Create a More Robust Specialist:** We will retrain our Specialist model with on-the-fly data augmentation and early stopping (keeping the checkpoint with the best validation accuracy) to reduce overfitting.
2.  **Conduct a Cross-Domain Evaluation:** We will take this robust model and test it on a different, more challenging dataset (CREMA-D) to precisely measure its failure to generalize.
3.  **Test a First Solution:** We will explore Knowledge Distillation as an initial, simple method to see if it can solve the domain generalization problem.

In [1]:
# --- Mount Google Drive ---
from google.colab import drive
drive.mount('/content/drive')

# --- Install All Necessary Packages ---
!pip install librosa resampy praat-parselmouth audiomentations

Mounted at /content/drive
Collecting resampy
  Downloading resampy-0.4.3-py3-none-any.whl.metadata (3.0 kB)
Collecting praat-parselmouth
  Downloading praat_parselmouth-0.4.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.9 kB)
Collecting audiomentations
  Downloading audiomentations-0.42.0-py3-none-any.whl.metadata (11 kB)
Collecting numpy-minmax<1,>=0.3.0 (from audiomentations)
  Downloading numpy_minmax-0.5.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)
Collecting numpy-rms<1,>=0.4.2 (from audiomentations)
  Downloading numpy_rms-0.6.0-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.5 kB)
Collecting python-stretch<1,>=0.3.1 (from audiomentations)
  Downloading python_stretch-0.3.1-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.7 kB)
Downloading resampy-0.4.3-py3-none-any.whl (3.1 MB)
[2K   [90m━━━━━━━━━━

In [2]:
# --- This script creates the pre-computed spectrograms for RAVDESS ---
import os
import librosa
import numpy as np
from tqdm import tqdm

# --- Configuration ---
AUDIO_PATH = "/content/drive/MyDrive/ser_project/ravdess_data/"
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/ravdess_spectrograms/"
os.makedirs(SPECTROGRAM_PATH, exist_ok=True)

print("Starting audio to spectrogram conversion for RAVDESS...")
actor_folders = [f for f in os.listdir(AUDIO_PATH) if os.path.isdir(os.path.join(AUDIO_PATH, f))]
for actor_folder in tqdm(actor_folders, desc="Processing RAVDESS Actors"):
    actor_path = os.path.join(AUDIO_PATH, actor_folder)
    for file_name in os.listdir(actor_path):
        try:
            file_path = os.path.join(actor_path, file_name)
            audio, sr = librosa.load(file_path, res_type='kaiser_fast', duration=3, sr=22050*2, offset=0.5)
            spectrogram = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128, fmax=8000)
            db_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)
            output_filename = os.path.join(SPECTROGRAM_PATH, f"{os.path.splitext(file_name)[0]}.npy")
            np.save(output_filename, db_spectrogram)
        except Exception as e:
            print(f"\nError processing {file_path}: {e}")

print("\nRAVDESS spectrogram conversion complete.")

Starting audio to spectrogram conversion for RAVDESS...


Processing RAVDESS Actors: 100%|██████████| 24/24 [20:22<00:00, 50.93s/it]


RAVDESS spectrogram conversion complete.





## Part 1: Creating a More Robust Specialist Model

To ensure our test is as fair as possible, we first create a stronger Specialist model. We enhance the training process from Phase 1 in two key ways:
* **On-the-Fly Augmentation:** During training, each raw audio clip is randomly perturbed with Gaussian noise and a pitch shift of up to ±4 semitones (each applied with probability 0.4). This discourages the model from memorizing the training data and pushes it toward more fundamental features of emotional speech.
* **Early Stopping:** The model's state is saved only when it achieves a new best accuracy on the validation set. Training still runs for all 30 epochs, but the final evaluation uses the saved best checkpoint, which limits the effect of overfitting in later epochs.
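
The checkpointing rule described above can be extended so that training also halts once validation accuracy stops improving. The following is a minimal sketch of that idea, not the code used in this notebook: the `EarlyStopper` class and its `patience` and `min_delta` parameters are hypothetical names introduced for illustration, and the accuracy values are made up.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when to stop training."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum gain that counts as improvement
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch. Returns (is_new_best, should_stop)."""
        if val_acc > self.best_acc + self.min_delta:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


# Illustrative validation accuracies only (not results from this notebook).
history = [50.0, 48.0, 55.0, 54.0, 53.0, 52.0, 56.0]
stopper = EarlyStopper(patience=3)
for epoch, acc in enumerate(history, start=1):
    is_best, should_stop = stopper.step(epoch, acc)
    if is_best:
        pass  # a real loop would save the model checkpoint here
    if should_stop:
        print(f"Stopping at epoch {epoch}; best was epoch {stopper.best_epoch}")
        break
```

A small `patience` saves compute but can stop before a late improvement, so the value is a trade-off rather than a fixed rule.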

In [3]:
# --- This script trains the main ResNet18 model with augmentation and early stopping ---
import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import os, numpy as np, librosa
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from tqdm import tqdm
from torch.optim.lr_scheduler import StepLR
from torchvision import models
from audiomentations import Compose, AddGaussianNoise, PitchShift

# --- Configuration ---
AUDIO_PATH = "/content/drive/MyDrive/ser_project/ravdess_data/"
LEARNING_RATE = 0.001; BATCH_SIZE = 32; EPOCHS = 30
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_best_augmented.pth"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# --- Mappings and Augmenter ---
emotion_map = { "01": 0, "02": 1, "03": 2, "04": 3, "05": 4, "06": 5, "07": 6, "08": 7 }
emotion_labels_list = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprise"]
augmenter = Compose([AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.4), PitchShift(min_semitones=-4, max_semitones=4, p=0.4)])

# --- Dataset for On-the-Fly Augmentation ---
class AudioDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300, augment=False):
        self.file_paths = file_paths; self.labels = labels
        self.target_width = target_width; self.augment = augment
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        file_path = self.file_paths[idx]; label = self.labels[idx]
        audio, sr = librosa.load(file_path, res_type='kaiser_fast', duration=3, sr=22050*2, offset=0.5)
        if self.augment: audio = augmenter(samples=audio, sample_rate=sr)
        spectrogram = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128, fmax=8000)
        db_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)
        current_width = db_spectrogram.shape[1]
        if current_width < self.target_width: db_spectrogram = np.pad(db_spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: db_spectrogram = db_spectrogram[:, :self.target_width]
        spec_min, spec_max = db_spectrogram.min(), db_spectrogram.max()
        if spec_max > spec_min: db_spectrogram = (db_spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([db_spectrogram, db_spectrogram, db_spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

# --- Prepare Data ---
all_files = []; all_labels = []
for root, dirs, files in os.walk(AUDIO_PATH):
    for file in files:
        if file.endswith('.wav'): all_files.append(os.path.join(root, file))
all_labels = [emotion_map[os.path.basename(f).split("-")[2]] for f in all_files]
train_files, temp_files, train_labels, temp_labels = train_test_split(all_files, all_labels, test_size=0.2, random_state=42, stratify=all_labels)
val_files, test_files, val_labels, test_labels = train_test_split(temp_files, temp_labels, test_size=0.5, random_state=42, stratify=temp_labels)
train_dataset = AudioDataset(train_files, train_labels, augment=True)
val_dataset = AudioDataset(val_files, val_labels, augment=False)
test_dataset = AudioDataset(test_files, test_labels, augment=False)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# --- Initialize Model and Optimizer ---
model = models.resnet18(weights='IMAGENET1K_V1')
model.fc = nn.Linear(model.fc.in_features, len(emotion_labels_list))  # replace the ImageNet head with an 8-class emotion head
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)  # multiply the learning rate by 0.1 every 7 epochs

# --- Training Loop with Early Stopping ---
best_val_acc = 0.0
print("Starting main training with augmentation and early stopping...")
for epoch in range(EPOCHS):
    model.train(); running_loss = 0.0
    for inputs, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Train]"):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad(); outputs = model(inputs); loss = criterion(outputs, labels)
        loss.backward(); optimizer.step(); running_loss += loss.item() * inputs.size(0)
    train_loss = running_loss / len(train_dataset)

    model.eval(); val_loss = 0.0; correct = 0; total = 0
    with torch.no_grad():
        for inputs, labels in tqdm(val_loader, desc=f"Epoch {epoch+1}/{EPOCHS} [Val]"):
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs); loss = criterion(outputs, labels); val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs.data, 1); total += labels.size(0); correct += (predicted == labels).sum().item()
    val_accuracy = 100 * correct / total; val_loss /= len(val_dataset)
    print(f"Epoch {epoch+1}/{EPOCHS} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f} | Val Acc: {val_accuracy:.2f}%")

    if val_accuracy > best_val_acc:
        best_val_acc = val_accuracy
        print(f"🎉 New best validation accuracy: {best_val_acc:.2f}%. Saving model...")
        torch.save({'epoch': epoch + 1, 'model_state_dict': model.state_dict()}, CHECKPOINT_BEST_PATH)
    scheduler.step()

# --- Final Evaluation on Test Set ---
print("\n--- FINAL EVALUATION ON RAVDESS TEST SET ---")
print(f"Loading best model from epoch with accuracy: {best_val_acc:.2f}%")
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH); model.load_state_dict(best_checkpoint['model_state_dict']); model.eval()
all_preds = []; all_true = []
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs); _, preds = torch.max(outputs, 1); all_preds.extend(preds.cpu().numpy()); all_true.extend(labels.cpu().numpy())
accuracy = accuracy_score(all_true, all_preds)
print(f"Final Model Accuracy on Test Set: {accuracy * 100:.2f}%")
print("\nClassification Report:"); print(classification_report(all_true, all_preds, target_names=emotion_labels_list, zero_division=0))

Using device: cuda
Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


100%|██████████| 44.7M/44.7M [00:00<00:00, 147MB/s]


Starting main training with augmentation and early stopping...


Epoch 1/30 [Train]: 100%|██████████| 36/36 [01:58<00:00,  3.29s/it]
Epoch 1/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.21s/it]


Epoch 1/30 | Train Loss: 1.6597 | Val Loss: 1.6046 | Val Acc: 52.08%
🎉 New best validation accuracy: 52.08%. Saving model...


Epoch 2/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.39s/it]
Epoch 2/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.20s/it]


Epoch 2/30 | Train Loss: 1.2291 | Val Loss: 2.2854 | Val Acc: 37.50%


Epoch 3/30 [Train]: 100%|██████████| 36/36 [01:59<00:00,  3.31s/it]
Epoch 3/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.15s/it]


Epoch 3/30 | Train Loss: 0.9685 | Val Loss: 3.3568 | Val Acc: 34.72%


Epoch 4/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.38s/it]
Epoch 4/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.23s/it]


Epoch 4/30 | Train Loss: 0.8515 | Val Loss: 1.8425 | Val Acc: 42.36%


Epoch 5/30 [Train]: 100%|██████████| 36/36 [02:03<00:00,  3.42s/it]
Epoch 5/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.80s/it]


Epoch 5/30 | Train Loss: 0.7350 | Val Loss: 1.8061 | Val Acc: 52.78%
🎉 New best validation accuracy: 52.78%. Saving model...


Epoch 6/30 [Train]: 100%|██████████| 36/36 [02:04<00:00,  3.46s/it]
Epoch 6/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.07s/it]


Epoch 6/30 | Train Loss: 0.6054 | Val Loss: 1.4610 | Val Acc: 56.94%
🎉 New best validation accuracy: 56.94%. Saving model...


Epoch 7/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.36s/it]
Epoch 7/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.22s/it]


Epoch 7/30 | Train Loss: 0.6307 | Val Loss: 4.8132 | Val Acc: 26.39%


Epoch 8/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.35s/it]
Epoch 8/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.84s/it]


Epoch 8/30 | Train Loss: 0.3575 | Val Loss: 0.7941 | Val Acc: 72.92%
🎉 New best validation accuracy: 72.92%. Saving model...


Epoch 9/30 [Train]: 100%|██████████| 36/36 [02:02<00:00,  3.40s/it]
Epoch 9/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.18s/it]


Epoch 9/30 | Train Loss: 0.2264 | Val Loss: 0.7483 | Val Acc: 75.00%
🎉 New best validation accuracy: 75.00%. Saving model...


Epoch 10/30 [Train]: 100%|██████████| 36/36 [02:04<00:00,  3.46s/it]
Epoch 10/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.20s/it]


Epoch 10/30 | Train Loss: 0.2257 | Val Loss: 0.7471 | Val Acc: 75.69%
🎉 New best validation accuracy: 75.69%. Saving model...


Epoch 11/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.36s/it]
Epoch 11/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.80s/it]


Epoch 11/30 | Train Loss: 0.1959 | Val Loss: 0.7935 | Val Acc: 75.69%


Epoch 12/30 [Train]: 100%|██████████| 36/36 [02:02<00:00,  3.39s/it]
Epoch 12/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.22s/it]


Epoch 12/30 | Train Loss: 0.1515 | Val Loss: 0.7745 | Val Acc: 73.61%


Epoch 13/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.35s/it]
Epoch 13/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]


Epoch 13/30 | Train Loss: 0.1477 | Val Loss: 0.7753 | Val Acc: 73.61%


Epoch 14/30 [Train]: 100%|██████████| 36/36 [02:02<00:00,  3.41s/it]
Epoch 14/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.03s/it]


Epoch 14/30 | Train Loss: 0.1374 | Val Loss: 0.7800 | Val Acc: 74.31%


Epoch 15/30 [Train]: 100%|██████████| 36/36 [02:02<00:00,  3.39s/it]
Epoch 15/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.20s/it]


Epoch 15/30 | Train Loss: 0.1252 | Val Loss: 0.7725 | Val Acc: 74.31%


Epoch 16/30 [Train]: 100%|██████████| 36/36 [02:02<00:00,  3.39s/it]
Epoch 16/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.94s/it]


Epoch 16/30 | Train Loss: 0.1105 | Val Loss: 0.7564 | Val Acc: 75.69%


Epoch 17/30 [Train]: 100%|██████████| 36/36 [02:03<00:00,  3.42s/it]
Epoch 17/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.08s/it]


Epoch 17/30 | Train Loss: 0.1151 | Val Loss: 0.7733 | Val Acc: 74.31%


Epoch 18/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.34s/it]
Epoch 18/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.27s/it]


Epoch 18/30 | Train Loss: 0.1131 | Val Loss: 0.7818 | Val Acc: 75.00%


Epoch 19/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.38s/it]
Epoch 19/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.87s/it]


Epoch 19/30 | Train Loss: 0.1165 | Val Loss: 0.7920 | Val Acc: 75.00%


Epoch 20/30 [Train]: 100%|██████████| 36/36 [01:58<00:00,  3.30s/it]
Epoch 20/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.18s/it]


Epoch 20/30 | Train Loss: 0.1083 | Val Loss: 0.7854 | Val Acc: 76.39%
🎉 New best validation accuracy: 76.39%. Saving model...


Epoch 21/30 [Train]: 100%|██████████| 36/36 [01:59<00:00,  3.33s/it]
Epoch 21/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.07s/it]


Epoch 21/30 | Train Loss: 0.1018 | Val Loss: 0.7763 | Val Acc: 75.69%


Epoch 22/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.38s/it]
Epoch 22/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.17s/it]


Epoch 22/30 | Train Loss: 0.1201 | Val Loss: 0.7904 | Val Acc: 75.00%


Epoch 23/30 [Train]: 100%|██████████| 36/36 [02:01<00:00,  3.38s/it]
Epoch 23/30 [Val]: 100%|██████████| 5/5 [00:08<00:00,  1.77s/it]


Epoch 23/30 | Train Loss: 0.0931 | Val Loss: 0.7856 | Val Acc: 75.00%


Epoch 24/30 [Train]: 100%|██████████| 36/36 [01:59<00:00,  3.32s/it]
Epoch 24/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.17s/it]


Epoch 24/30 | Train Loss: 0.1056 | Val Loss: 0.7773 | Val Acc: 75.00%


Epoch 25/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.34s/it]
Epoch 25/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.85s/it]


Epoch 25/30 | Train Loss: 0.1054 | Val Loss: 0.7848 | Val Acc: 74.31%


Epoch 26/30 [Train]: 100%|██████████| 36/36 [02:00<00:00,  3.36s/it]
Epoch 26/30 [Val]: 100%|██████████| 5/5 [00:10<00:00,  2.19s/it]


Epoch 26/30 | Train Loss: 0.1037 | Val Loss: 0.7892 | Val Acc: 75.69%


Epoch 27/30 [Train]: 100%|██████████| 36/36 [01:59<00:00,  3.33s/it]
Epoch 27/30 [Val]: 100%|██████████| 5/5 [00:09<00:00,  1.91s/it]


Epoch 27/30 | Train Loss: 0.1078 | Val Loss: 0.8031 | Val Acc: 74.31%


Epoch 28/30 [Train]: 100%|██████████| 36/36 [02:03<00:00,  3.43s/it]
Epoch 28/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.28s/it]


Epoch 28/30 | Train Loss: 0.1021 | Val Loss: 0.7752 | Val Acc: 75.00%


Epoch 29/30 [Train]: 100%|██████████| 36/36 [02:04<00:00,  3.45s/it]
Epoch 29/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.26s/it]


Epoch 29/30 | Train Loss: 0.1042 | Val Loss: 0.7836 | Val Acc: 75.00%


Epoch 30/30 [Train]: 100%|██████████| 36/36 [02:03<00:00,  3.44s/it]
Epoch 30/30 [Val]: 100%|██████████| 5/5 [00:11<00:00,  2.29s/it]


Epoch 30/30 | Train Loss: 0.1001 | Val Loss: 0.7893 | Val Acc: 74.31%

--- FINAL EVALUATION ON RAVDESS TEST SET ---
Loading best model from epoch with accuracy: 76.39%
Final Model Accuracy on Test Set: 80.56%

Classification Report:
              precision    recall  f1-score   support

     neutral       0.71      0.50      0.59        10
        calm       0.83      0.79      0.81        19
       happy       0.67      0.74      0.70        19
         sad       0.78      0.74      0.76        19
       angry       0.84      0.84      0.84        19
     fearful       0.76      0.80      0.78        20
     disgust       0.94      0.89      0.92        19
    surprise       0.86      1.00      0.93        19

    accuracy                           0.81       144
   macro avg       0.80      0.79      0.79       144
weighted avg       0.81      0.81      0.80       144



## Part 2: The Moment of Truth - Quantifying the Domain Gap

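This is the most critical experiment so far. We take our best Specialist model, which was trained *only* on the acted speech of the 24 RAVDESS actors, and evaluate it on the unseen CREMA-D dataset. CREMA-D is also acted speech, but it comes from a different set of speakers, its emotion labels were validated by crowd-sourced raters, and it covers only six of the eight RAVDESS emotion classes (it has no "calm" or "surprise").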

This test will give us a hard number that quantifies the "Domain Gap"—the performance drop when a model is faced with data different from what it was trained on.
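
Because CREMA-D uses only six of the eight emotion classes the model was trained on, a cross-corpus evaluation has to decide what to do when the model predicts a class that cannot occur in the target data. The sketch below shows one option, restricting the argmax to the shared class indices. It is an illustrative alternative rather than what the evaluation cell does; `SHARED_CLASSES`, `predict_unrestricted`, and `predict_shared` are hypothetical names, and the logits are made up. The index values follow the notebook's own label mappings.

```python
# Class indices used by the RAVDESS-trained model (from the notebook's mappings).
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprise"]
CREMA_D_MAP = {"NEU": 0, "HAP": 2, "SAD": 3, "ANG": 4, "FEA": 5, "DIS": 6}
SHARED_CLASSES = sorted(CREMA_D_MAP.values())  # [0, 2, 3, 4, 5, 6]


def predict_unrestricted(logits):
    """Plain argmax over all eight classes."""
    return max(range(len(logits)), key=lambda i: logits[i])


def predict_shared(logits, allowed=SHARED_CLASSES):
    """Argmax restricted to classes that exist in the target corpus."""
    return max(allowed, key=lambda i: logits[i])


# Made-up logits where the top class ("calm") does not exist in CREMA-D.
example_logits = [0.2, 3.1, 0.5, 2.4, 0.1, 0.3, 0.0, 1.0]
print(EMOTIONS[predict_unrestricted(example_logits)])  # calm
print(EMOTIONS[predict_shared(example_logits)])        # sad
```

Reporting both variants would separate errors caused by impossible predictions from genuine confusions among the shared emotions.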

In [4]:
# --- This cell tests the best model on the CREMA-D dataset ---
# Make sure you have uploaded and unzipped CREMA-D to this path
CREMA_D_AUDIO_PATH = "/content/drive/MyDrive/ser_project/crema_d_data/AudioWAV/"
CREMA_D_SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/crema_d_spectrograms/"
CHECKPOINT_BEST_PATH = "/content/drive/MyDrive/ser_project/resnet_best_augmented.pth"

# === PART A: PREPROCESS CREMA-D ===
os.makedirs(CREMA_D_SPECTROGRAM_PATH, exist_ok=True)
print("\nStarting audio to spectrogram conversion for CREMA-D...")
crema_d_files_to_process = [f for f in os.listdir(CREMA_D_AUDIO_PATH) if f.endswith('.wav')]
for file_name in tqdm(crema_d_files_to_process, desc="Processing CREMA-D"):
    try:
        file_path = os.path.join(CREMA_D_AUDIO_PATH, file_name)
        audio, sr = librosa.load(file_path, res_type='kaiser_fast', duration=3, sr=22050*2, offset=0.5)
        spectrogram = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128, fmax=8000)
        db_spectrogram = librosa.power_to_db(spectrogram, ref=np.max)
        output_filename = os.path.join(CREMA_D_SPECTROGRAM_PATH, f"{os.path.splitext(file_name)[0]}.npy")
        np.save(output_filename, db_spectrogram)
    except Exception as e:
        print(f"\nError processing {file_path}: {e}")
print("\nCREMA-D spectrogram conversion complete.")

# === PART B: EVALUATE ON CREMA-D ===
# --- Mappings and Dataset ---
crema_d_emotion_map = { "NEU": 0, "HAP": 2, "SAD": 3, "ANG": 4, "FEA": 5, "DIS": 6 }
# Note: We are mapping to the original 8-class indices that the model was trained on
class CremaDDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        spectrogram = np.load(self.file_paths[idx]); label = self.labels[idx]
        current_width = spectrogram.shape[1]
        if current_width < self.target_width: spectrogram = np.pad(spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: spectrogram = spectrogram[:, :self.target_width]
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min: spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

crema_d_files = [os.path.join(CREMA_D_SPECTROGRAM_PATH, f) for f in os.listdir(CREMA_D_SPECTROGRAM_PATH) if f.endswith('.npy')]
crema_d_labels = [crema_d_emotion_map[os.path.basename(f).split("_")[2]] for f in crema_d_files]
test_dataset = CremaDDataset(crema_d_files, crema_d_labels)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# --- Load Model and Evaluate ---
print("\nLoading best model for cross-corpus evaluation...")
model = models.resnet18(); model.fc = nn.Linear(model.fc.in_features, 8);
best_checkpoint = torch.load(CHECKPOINT_BEST_PATH); model.load_state_dict(best_checkpoint['model_state_dict']); model = model.to(device); model.eval()
print("\n--- EVALUATION ON CREMA-D DATASET ---")
all_preds, all_true = [], []
with torch.no_grad():
    for inputs, labels in tqdm(test_loader, desc="Evaluating on CREMA-D"):
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs); _, preds = torch.max(outputs, 1); all_preds.extend(preds.cpu().numpy()); all_true.extend(labels.cpu().numpy())
accuracy = accuracy_score(all_true, all_preds)
print(f"Final Model Accuracy on CREMA-D: {accuracy * 100:.2f}%")
report_labels_indices = sorted(list(set(crema_d_labels))); report_labels_names = [emotion_labels_list[i] for i in report_labels_indices]
print("\nClassification Report:"); print(classification_report(all_true, all_preds, labels=report_labels_indices, target_names=report_labels_names, zero_division=0))


Starting audio to spectrogram conversion for CREMA-D...


Processing CREMA-D: 100%|██████████| 7442/7442 [11:33<00:00, 10.73it/s]



CREMA-D spectrogram conversion complete.

Loading best model for cross-corpus evaluation...

--- EVALUATION ON CREMA-D DATASET ---


Evaluating on CREMA-D: 100%|██████████| 233/233 [01:44<00:00,  2.23it/s]

Final Model Accuracy on CREMA-D: 22.39%

Classification Report:
              precision    recall  f1-score   support

     neutral       0.26      0.03      0.05      1087
       happy       0.30      0.02      0.04      1271
         sad       0.23      0.64      0.34      1271
       angry       0.84      0.08      0.15      1271
     fearful       0.24      0.36      0.28      1271
     disgust       0.42      0.19      0.26      1271

   micro avg       0.26      0.22      0.24      7442
   macro avg       0.38      0.22      0.19      7442
weighted avg       0.38      0.22      0.19      7442
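
The two accuracy figures reported so far can be combined into explicit gap measures. The short calculation below uses only numbers printed in the outputs above (80.56% on the RAVDESS test set, 22.39% on CREMA-D, and the per-class support counts). The variable names and the choice of a majority-class baseline as a reference point are assumptions made for illustration.

```python
# Accuracies printed by the evaluation cells above (in percent).
in_domain_acc = 80.56      # RAVDESS held-out test set
cross_domain_acc = 22.39   # CREMA-D, never seen during training

# Per-class support from the CREMA-D classification report.
support = {"neutral": 1087, "happy": 1271, "sad": 1271,
           "angry": 1271, "fearful": 1271, "disgust": 1271}
total = sum(support.values())

absolute_gap = in_domain_acc - cross_domain_acc          # percentage points
relative_drop = 100 * absolute_gap / in_domain_acc       # share of in-domain accuracy lost
majority_baseline = 100 * max(support.values()) / total  # always predict the largest class

print(f"Absolute gap:      {absolute_gap:.2f} points")
print(f"Relative drop:     {relative_drop:.1f}%")
print(f"Majority baseline: {majority_baseline:.2f}%")
```

On these figures the cross-corpus accuracy sits only about five points above the majority-class baseline.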






## Part 3: A Failed First Attempt - Exploring Knowledge Distillation

With the domain gap clearly measured, we explore a potential solution. Knowledge Distillation is a technique where a smaller "student" model (our simple AudioCNN) is trained to mimic the predictions of a larger "teacher" model (our robust ResNet18). The hypothesis is that the student might learn a more compact and generalizable representation of the teacher's knowledge.

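In the training log below, the distillation loss falls from 11.82 to about 7.8 by the second epoch and then flattens near 7.73, so the student learns very little. We treat this as a valuable failure: it suggests that distilling knowledge from a teacher that does not itself generalize is not a viable solution to the domain gap.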
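
The combined objective used in distillation can be written out for a single example. This is a minimal pure-Python sketch of the standard temperature-scaled formulation: a weighted sum of the hard-label cross-entropy and a KL term between the softened teacher and student distributions, scaled by the squared temperature. The function names and example logits are hypothetical, and the default `temperature` and `alpha` simply mirror the values set in the configuration of the cell that follows.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.3):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label cross-entropy on the unscaled student logits.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


# Made-up logits for a 3-class problem, for illustration only.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.2]
print(distillation_loss(student, teacher, label=0))
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this loss.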

In [5]:
# --- This script trains a small 'student' CNN using the large 'teacher' ResNet18 ---
import torch, torch.nn as nn, torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import os, numpy as np
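from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report  # needed for the final evaluation in this cell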
from tqdm import tqdm
from torchvision import models

# --- Configuration ---
SPECTROGRAM_PATH = "/content/drive/MyDrive/ser_project/ravdess_spectrograms/"
TEACHER_CHECKPOINT_PATH = "/content/drive/MyDrive/ser_project/resnet_best_augmented.pth"
STUDENT_CHECKPOINT_PATH = "/content/drive/MyDrive/ser_project/student_cnn_distilled.pth"
LEARNING_RATE = 0.005; BATCH_SIZE = 32; EPOCHS = 50
TEMPERATURE = 4.0; ALPHA = 0.3
device = torch.device("cuda" if torch.cuda.is_available() else "cpu"); print(f"Using device: {device}")

# --- Mappings and Dataset (using pre-computed spectrograms for speed) ---
emotion_map = { "01": 0, "02": 1, "03": 2, "04": 3, "05": 4, "06": 5, "07": 6, "08": 7 }
emotion_labels_list = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprise"]
class SpectrogramDataset(Dataset):
    def __init__(self, file_paths, labels, target_width=300):
        self.file_paths, self.labels, self.target_width = file_paths, labels, target_width
    def __len__(self): return len(self.file_paths)
    def __getitem__(self, idx):
        spectrogram = np.load(self.file_paths[idx]); label = self.labels[idx]
        current_width = spectrogram.shape[1]
        if current_width < self.target_width: spectrogram = np.pad(spectrogram, ((0, 0), (0, self.target_width - current_width)), mode='constant')
        elif current_width > self.target_width: spectrogram = spectrogram[:, :self.target_width]
        spec_min, spec_max = spectrogram.min(), spectrogram.max()
        if spec_max > spec_min: spectrogram = (spectrogram - spec_min) / (spec_max - spec_min)
        spectrogram_3ch = np.stack([spectrogram, spectrogram, spectrogram], axis=0)
        return torch.tensor(spectrogram_3ch, dtype=torch.float32), torch.tensor(label, dtype=torch.long)

# --- Student Model Definition (AudioCNN Modified for 3-Channel Input) ---
class AudioCNN(nn.Module):
    def __init__(self, num_classes=8, flattened_size=9216):
        super(AudioCNN, self).__init__(); self.conv1 = nn.Conv2d(3, 16, 3, 1, 1); self.bn1 = nn.BatchNorm2d(16); self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(16, 32, 3, 1, 1); self.bn2 = nn.BatchNorm2d(32); self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(32, 64, 3, 1, 1); self.bn3 = nn.BatchNorm2d(64); self.pool3 = nn.MaxPool2d(4, 4)
        self.flatten = nn.Flatten(); self.fc1 = nn.Linear(flattened_size, 128); self.dropout = nn.Dropout(0.5); self.fc2 = nn.Linear(128, num_classes)
    def forward(self, x):
        x = self.pool1(F.relu(self.bn1(self.conv1(x)))); x = self.pool2(F.relu(self.bn2(self.conv2(x)))); x = self.pool3(F.relu(self.bn3(self.conv3(x))))
        x = self.flatten(x); x = F.relu(self.fc1(x)); x = self.dropout(x); x = self.fc2(x); return x

# --- Prepare Data ---
all_files = [os.path.join(SPECTROGRAM_PATH, f) for f in os.listdir(SPECTROGRAM_PATH) if f.endswith('.npy')]
all_labels = [emotion_map[os.path.basename(f).split("-")[2]] for f in all_files]
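# Caution: the file order here (os.listdir on .npy files) may differ from the teacher's os.walk order,
# so this split is not guaranteed to match the teacher's; some test files may have been in the teacher's training set.
train_files, test_files, train_labels, test_labels = train_test_split(all_files, all_labels, test_size=0.2, random_state=42, stratify=all_labels)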
train_dataset = SpectrogramDataset(train_files, train_labels); test_dataset = SpectrogramDataset(test_files, test_labels)
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True); test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# --- Load Teacher Model ---
print("Loading Teacher model (ResNet18)...")
teacher_model = models.resnet18(); teacher_model.fc = nn.Linear(teacher_model.fc.in_features, len(emotion_labels_list))
teacher_checkpoint = torch.load(TEACHER_CHECKPOINT_PATH); teacher_model.load_state_dict(teacher_checkpoint['model_state_dict'])
teacher_model = teacher_model.to(device); teacher_model.eval()

# --- Initialize Student Model ---
print("Initializing Student model (AudioCNN)...")
student_model = AudioCNN(num_classes=8, flattened_size=9216).to(device)

# --- Optimizer and Loss Functions ---
optimizer = torch.optim.Adam(student_model.parameters(), lr=LEARNING_RATE)
criterion_ce = nn.CrossEntropyLoss(); criterion_kd = nn.KLDivLoss(reduction='batchmean')

# --- Distillation Training Loop ---
print("Starting distillation training...")
for epoch in range(EPOCHS):
    student_model.train(); running_loss = 0.0
    for inputs, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{EPOCHS}"):
        inputs, labels = inputs.to(device), labels.to(device)
        with torch.no_grad(): teacher_outputs = teacher_model(inputs)
        student_outputs = student_model(inputs)
        loss_ce = criterion_ce(student_outputs, labels)
        soft_teacher = F.softmax(teacher_outputs / TEMPERATURE, dim=1)
        soft_student = F.log_softmax(student_outputs / TEMPERATURE, dim=1)
        loss_kd = criterion_kd(soft_student, soft_teacher)
        loss = ALPHA * loss_ce + (1.0 - ALPHA) * (TEMPERATURE**2) * loss_kd
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    epoch_loss = running_loss / len(train_dataset); print(f"Epoch {epoch+1}/{EPOCHS} - Distillation Loss: {epoch_loss:.4f}")

# --- Final Evaluation of Student Model ---
student_model.eval(); all_preds, all_true = [], []
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = student_model(inputs); _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy()); all_true.extend(labels.cpu().numpy())
accuracy = accuracy_score(all_true, all_preds)
print(f"\n--- FINAL EVALUATION OF DISTILLED STUDENT MODEL ---")
print(f"Student Model Accuracy: {accuracy * 100:.2f}%")
print("\nClassification Report:"); print(classification_report(all_true, all_preds, target_names=emotion_labels_list, zero_division=0))

Using device: cuda
Loading Teacher model (ResNet18)...
Initializing Student model (AudioCNN)...
Starting distillation training...


Epoch 1/50: 100%|██████████| 36/36 [00:32<00:00,  1.11it/s]
Epoch 1/50 - Distillation Loss: 11.8160
Epoch 2/50: 100%|██████████| 36/36 [00:06<00:00,  5.76it/s]
Epoch 2/50 - Distillation Loss: 7.8196
Epoch 3/50: 100%|██████████| 36/36 [00:05<00:00,  6.04it/s]
Epoch 3/50 - Distillation Loss: 7.7996
Epoch 4/50: 100%|██████████| 36/36 [00:06<00:00,  5.93it/s]
Epoch 4/50 - Distillation Loss: 7.7833
Epoch 5/50: 100%|██████████| 36/36 [00:06<00:00,  5.88it/s]
Epoch 5/50 - Distillation Loss: 7.7713
Epoch 6/50: 100%|██████████| 36/36 [00:06<00:00,  5.99it/s]
Epoch 6/50 - Distillation Loss: 7.7612
Epoch 7/50: 100%|██████████| 36/36 [00:05<00:00,  6.08it/s]
Epoch 7/50 - Distillation Loss: 7.7532
Epoch 8/50: 100%|██████████| 36/36 [00:05<00:00,  6.03it/s]
Epoch 8/50 - Distillation Loss: 7.7477
Epoch 9/50: 100%|██████████| 36/36 [00:05<00:00,  6.03it/s]
Epoch 9/50 - Distillation Loss: 7.7430
Epoch 10/50: 100%|██████████| 36/36 [00:05<00:00,  6.03it/s]
Epoch 10/50 - Distillation Loss: 7.7399
Epoch 11/50: 100%|██████████| 36/36 [00:05<00:00,  6.23it/s]
Epoch 11/50 - Distillation Loss: 7.7370
Epoch 12/50: 100%|██████████| 36/36 [00:05<00:00,  6.00it/s]
Epoch 12/50 - Distillation Loss: 7.7351
Epoch 13/50: 100%|██████████| 36/36 [00:05<00:00,  6.21it/s]
Epoch 13/50 - Distillation Loss: 7.7330
Epoch 14/50: 100%|██████████| 36/36 [00:06<00:00,  5.67it/s]
Epoch 14/50 - Distillation Loss: 7.7333
Epoch 15/50: 100%|██████████| 36/36 [00:06<00:00,  5.92it/s]
Epoch 15/50 - Distillation Loss: 7.7316
Epoch 16/50: 100%|██████████| 36/36 [00:06<00:00,  5.45it/s]
Epoch 16/50 - Distillation Loss: 7.7313
Epoch 17/50: 100%|██████████| 36/36 [00:06<00:00,  5.49it/s]
Epoch 17/50 - Distillation Loss: 7.7308
Epoch 18/50: 100%|██████████| 36/36 [00:06<00:00,  5.82it/s]
Epoch 18/50 - Distillation Loss: 7.7301
Epoch 19/50: 100%|██████████| 36/36 [00:05<00:00,  6.14it/s]
Epoch 19/50 - Distillation Loss: 7.7298
Epoch 20/50: 100%|██████████| 36/36 [00:05<00:00,  6.15it/s]
Epoch 20/50 - Distillation Loss: 7.7289
Epoch 21/50: 100%|██████████| 36/36 [00:06<00:00,  5.33it/s]
Epoch 21/50 - Distillation Loss: 7.7292
Epoch 22/50: 100%|██████████| 36/36 [00:05<00:00,  6.12it/s]
Epoch 22/50 - Distillation Loss: 7.7291
Epoch 23/50: 100%|██████████| 36/36 [00:06<00:00,  5.94it/s]
Epoch 23/50 - Distillation Loss: 7.7288
Epoch 24/50: 100%|██████████| 36/36 [00:06<00:00,  5.96it/s]
Epoch 24/50 - Distillation Loss: 7.7292
Epoch 25/50: 100%|██████████| 36/36 [00:06<00:00,  5.74it/s]
Epoch 25/50 - Distillation Loss: 7.7290
Epoch 26/50: 100%|██████████| 36/36 [00:06<00:00,  5.97it/s]
Epoch 26/50 - Distillation Loss: 7.7293
Epoch 27/50: 100%|██████████| 36/36 [00:06<00:00,  5.86it/s]
Epoch 27/50 - Distillation Loss: 7.7284
Epoch 28/50: 100%|██████████| 36/36 [00:06<00:00,  5.84it/s]
Epoch 28/50 - Distillation Loss: 7.7292
Epoch 29/50: 100%|██████████| 36/36 [00:06<00:00,  5.91it/s]
Epoch 29/50 - Distillation Loss: 7.7286
Epoch 30/50: 100%|██████████| 36/36 [00:05<00:00,  6.09it/s]
Epoch 30/50 - Distillation Loss: 7.7295
Epoch 31/50: 100%|██████████| 36/36 [00:05<00:00,  6.01it/s]
Epoch 31/50 - Distillation Loss: 7.7290
Epoch 32/50: 100%|██████████| 36/36 [00:06<00:00,  5.87it/s]
Epoch 32/50 - Distillation Loss: 7.7285
Epoch 33/50: 100%|██████████| 36/36 [00:06<00:00,  5.68it/s]
Epoch 33/50 - Distillation Loss: 7.7287
Epoch 34/50: 100%|██████████| 36/36 [00:06<00:00,  5.86it/s]
Epoch 34/50 - Distillation Loss: 7.7289
Epoch 35/50: 100%|██████████| 36/36 [00:06<00:00,  5.72it/s]
Epoch 35/50 - Distillation Loss: 7.7288
Epoch 36/50: 100%|██████████| 36/36 [00:06<00:00,  5.53it/s]
Epoch 36/50 - Distillation Loss: 7.7288
Epoch 37/50: 100%|██████████| 36/36 [00:06<00:00,  5.70it/s]
Epoch 37/50 - Distillation Loss: 7.7291
Epoch 38/50: 100%|██████████| 36/36 [00:06<00:00,  5.82it/s]
Epoch 38/50 - Distillation Loss: 7.7290
Epoch 39/50: 100%|██████████| 36/36 [00:05<00:00,  6.20it/s]
Epoch 39/50 - Distillation Loss: 7.7289
Epoch 40/50: 100%|██████████| 36/36 [00:06<00:00,  5.89it/s]
Epoch 40/50 - Distillation Loss: 7.7283
Epoch 41/50: 100%|██████████| 36/36 [00:05<00:00,  6.24it/s]
Epoch 41/50 - Distillation Loss: 7.7287
Epoch 42/50: 100%|██████████| 36/36 [00:05<00:00,  6.02it/s]
Epoch 42/50 - Distillation Loss: 7.7286
Epoch 43/50: 100%|██████████| 36/36 [00:05<00:00,  6.21it/s]
Epoch 43/50 - Distillation Loss: 7.7293
Epoch 44/50: 100%|██████████| 36/36 [00:06<00:00,  5.95it/s]
Epoch 44/50 - Distillation Loss: 7.7284
Epoch 45/50: 100%|██████████| 36/36 [00:06<00:00,  5.96it/s]
Epoch 45/50 - Distillation Loss: 7.7294
Epoch 46/50: 100%|██████████| 36/36 [00:06<00:00,  5.85it/s]
Epoch 46/50 - Distillation Loss: 7.7289
Epoch 47/50: 100%|██████████| 36/36 [00:06<00:00,  5.90it/s]
Epoch 47/50 - Distillation Loss: 7.7284
Epoch 48/50: 100%|██████████| 36/36 [00:05<00:00,  6.21it/s]
Epoch 48/50 - Distillation Loss: 7.7289
Epoch 49/50: 100%|██████████| 36/36 [00:05<00:00,  6.07it/s]
Epoch 49/50 - Distillation Loss: 7.7289
Epoch 50/50: 100%|██████████| 36/36 [00:05<00:00,  6.14it/s]
Epoch 50/50 - Distillation Loss: 7.7288

--- FINAL EVALUATION OF DISTILLED STUDENT MODEL ---
Student Model Accuracy: 13.19%

Classification Report:
              precision    recall  f1-score   support

     neutral       0.00      0.00      0.00        19
        calm       0.00      0.00      0.00        38
       happy       0.00      0.00      0.00        38
         sad       0.13      1.00      0.23        38
       angry       0.00      0.00      0.00        39
     fearful       0.00      0.00      0.00        39
     disgust       0.00      0.00      0.00        38
    surprise       0.00      0.00      0.00        39

    accuracy                           0.13       288
   macro avg       0.02      0.12      0.03       288
weighted avg       0.02      0.13      0.03       288
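
The report above is consistent with prediction collapse: `sad` has recall 1.00 and precision 0.13 (38/288), and every other class has recall 0.00, so all 288 test samples appear to have been assigned to `sad`. The distillation loss also stays at roughly 7.73 from about epoch 13 onward. A quick way to confirm collapse before reading a full report is to count the predicted classes directly. The following is a minimal sketch, not part of the original notebook: the helper names `prediction_distribution` and `is_collapsed` and the 0.95 threshold are hypothetical, and the label order is assumed to match the row order of the report with `sad` at index 3.

```python
from collections import Counter

def prediction_distribution(preds, label_names):
    """Map each label name to the number of times it was predicted."""
    counts = Counter(int(p) for p in preds)
    return {name: counts.get(i, 0) for i, name in enumerate(label_names)}

def is_collapsed(preds, threshold=0.95):
    """Return True if a single class makes up at least `threshold` of all predictions."""
    counts = Counter(int(p) for p in preds)
    if not counts:
        return False
    return max(counts.values()) / sum(counts.values()) >= threshold

# Assumption: label order follows the rows of the classification report.
emotion_labels_list = ["neutral", "calm", "happy", "sad",
                       "angry", "fearful", "disgust", "surprise"]

# Illustrative input reconstructed from the report: 288 samples, all predicted as "sad".
example_preds = [3] * 288

dist = prediction_distribution(example_preds, emotion_labels_list)
print(dist)
print("Collapsed:", is_collapsed(example_preds))
```

In the notebook itself, the same helpers could be called on `all_preds` right after the evaluation loop. If the check confirms collapse, the per-class precision and recall figures say little beyond that fact, and the training setup (for example the loss weighting or the learning rate) is probably the more useful place to look next.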

