# Audio Emotion Recognition with EmoDB and ResNet-50

This notebook implements a complete pipeline for emotion recognition from audio in the EmoDB dataset. The steps are:
1. Download and prepare the EmoDB dataset.
2. Audio preprocessing: resampling and padding.
3. Feature extraction: conversion to mel-spectrograms.
4. Normalization of the mel-spectrograms.
5. Configuration and training of a pretrained ResNet-50 model.
6. Model evaluation.

## 1. Installing Dependencies

In [1]:
!pip install librosa gdown soundfile scikit-learn torch torchvision


## 2. Imports and Initial Configuration

In [2]:
import os
import gdown
import zipfile
import librosa
import soundfile as sf
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split
import torchvision.models as models
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

# Configuration for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Select the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


## 3. Loading and Preparing the EmoDB Dataset

In [3]:
# Step 1: Load the data
# Download the EmoDB dataset and organize the files
if not os.path.exists('emodb/wav'): # Check whether the final folder exists
    print("Downloading EmoDB dataset...")
    url = 'http://emodb.bilderbar.info/download/download.zip'
    output_zip = 'emodb.zip'
    gdown.download(url, output_zip, quiet=False)
    
    # Create the base 'emodb' directory if it does not exist
    if not os.path.exists('emodb'):
        os.makedirs('emodb')
        
    # Unzip the archive
    with zipfile.ZipFile(output_zip, 'r') as zip_ref:
        zip_ref.extractall('emodb_extracted') # Extract into a temporary folder
    
    # The EmoDB archive extracts its audio into a 'wav' subfolder; move it to match the expected structure
    source_wav_dir = os.path.join('emodb_extracted', 'wav')
    target_wav_dir = os.path.join('emodb', 'wav')
    
    if os.path.exists(source_wav_dir):
        if not os.path.exists(target_wav_dir):
            os.makedirs(os.path.dirname(target_wav_dir), exist_ok=True) # Ensure 'emodb' exists
        os.rename(source_wav_dir, target_wav_dir)
        print(f"Moved '{source_wav_dir}' to '{target_wav_dir}'")
    else:
        print(f"Error: Source WAV directory '{source_wav_dir}' not found after extraction.")
        
    # Clean up the intermediate folders and the zip
    if os.path.exists('emodb_extracted'):
        # Check for unexpected leftover files/folders before deleting
        remaining_items = os.listdir('emodb_extracted')
        if not remaining_items or (len(remaining_items) == 1 and remaining_items[0] == 'wav' and not os.path.exists(source_wav_dir)):
            os.rmdir('emodb_extracted') # Remove if empty or if 'wav' has been moved
        elif os.path.isdir(os.path.join('emodb_extracted', 'lab')) and len(remaining_items) <=2 : # Typical case with 'lab'
            import shutil
            shutil.rmtree('emodb_extracted') # Remove the folder and its contents (such as 'lab')
            print("Cleaned up 'emodb_extracted' directory.")
        else:
            print(f"Warning: Unexpected files/folders in emodb_extracted: {remaining_items}. Manual cleanup might be needed.")

    if os.path.exists(output_zip):
        os.remove(output_zip)
else:
    print("EmoDB dataset (emodb/wav) already found.")

# Path to the WAV files
wav_dir = 'emodb/wav'
if not os.path.exists(wav_dir) or not os.listdir(wav_dir):
    print(f"Error: WAV directory '{wav_dir}' is empty or does not exist. Please check the download and extraction steps.")
    # exit() # In Colab, avoid exit() so the kernel is not killed; just report the error
else:
    wav_files = [f for f in os.listdir(wav_dir) if f.endswith('.wav')]
    print(f"Number of WAV files: {len(wav_files)}")

Downloading EmoDB dataset...


Downloading...
From: http://emodb.bilderbar.info/download/download.zip
To: /kaggle/working/emodb.zip
100%|██████████| 40.6M/40.6M [01:07<00:00, 598kB/s]


Moved 'emodb_extracted/wav' to 'emodb/wav'
Number of WAV files: 535


## 4. Audio Preprocessing: Resampling

In [4]:
new_sr = 22050 # Standard target sampling rate
resampled_dir = 'emodb/resampled'
os.makedirs(resampled_dir, exist_ok=True)

print("Resampling files...")
if 'wav_files' in locals(): # Make sure wav_files is defined
    for wav_file in wav_files:
        file_path = os.path.join(wav_dir, wav_file)
        try:
            audio, sr = librosa.load(file_path, sr=None)
            audio_resampled = librosa.resample(y=audio, orig_sr=sr, target_sr=new_sr)
            sf.write(os.path.join(resampled_dir, wav_file), audio_resampled, new_sr)
        except Exception as e:
            print(f"Error processing {wav_file} during resampling: {e}")
    print("Resampling complete.")
else:
    print("wav_files not defined. Skipping resampling. Check previous cell for errors.")

Resampling files...
Resampling complete.


## 5. Audio Preprocessing: Padding

In [5]:
target_duration = 10  # seconds
target_samples = target_duration * new_sr
padded_dir = 'emodb/padded'
os.makedirs(padded_dir, exist_ok=True)

print("Padding files...")
if 'wav_files' in locals(): # Make sure wav_files is defined
    for wav_file in wav_files: # wav_files must contain the base file names
        file_path = os.path.join(resampled_dir, wav_file)
        try:
            audio, sr = librosa.load(file_path, sr=new_sr)
            current_samples = len(audio)
            if current_samples < target_samples:
                padding_needed = target_samples - current_samples
                padding = np.zeros(padding_needed)
                audio_padded = np.concatenate((audio, padding))
            else:
                audio_padded = audio[:target_samples]
            sf.write(os.path.join(padded_dir, wav_file), audio_padded, new_sr)
        except Exception as e:
            print(f"Error processing {wav_file} during padding: {e}")
    print("Padding complete.")
else:
    print("wav_files not defined. Skipping padding. Check previous cells for errors.")

Padding files...
Padding complete.
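
The padding cell above brings every clip to the same length: shorter clips get trailing zeros, longer ones are cut at `target_samples`. A minimal sketch of that logic as a standalone function is shown below. The name `pad_or_truncate` is hypothetical and does not appear in the notebook, and the toy arrays are made up for illustration.

```python
import numpy as np

def pad_or_truncate(audio: np.ndarray, target_samples: int) -> np.ndarray:
    """Return a copy of `audio` with exactly `target_samples` samples."""
    if len(audio) < target_samples:
        # Append trailing zeros (silence) up to the target length.
        return np.pad(audio, (0, target_samples - len(audio)), mode="constant")
    # Keep only the first `target_samples` samples.
    return audio[:target_samples].copy()

# Toy inputs (not EmoDB data): one clip that is too short, one that is too long.
short = pad_or_truncate(np.ones(3, dtype=np.float32), 5)
long = pad_or_truncate(np.arange(8, dtype=np.float32), 5)
print(short.tolist(), long.tolist())
```

Unlike concatenating with `np.zeros(...)`, which yields a float64 array, `np.pad` keeps the input dtype, so a float32 signal stays float32.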


## 6. Feature Extraction: Mel-Spectrograms

In [6]:
hop_length = 256
win_length = 1024
n_fft = win_length 
n_mels = 80
mel_dir = 'emodb/mel_spectrograms'
os.makedirs(mel_dir, exist_ok=True)

print("Converting to mel-spectrograms...")
if 'wav_files' in locals(): # Make sure wav_files is defined
    for wav_file in wav_files:
        file_path = os.path.join(padded_dir, wav_file)
        try:
            audio, sr = librosa.load(file_path, sr=new_sr)
            mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, win_length=win_length, n_mels=n_mels)
            mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)
            np.save(os.path.join(mel_dir, wav_file.replace('.wav', '.npy')), mel_spec_db)
        except Exception as e:
            print(f"Error processing {wav_file} during mel-spectrogram conversion: {e}")
    print("Mel-spectrogram conversion complete.")
else:
    print("wav_files not defined. Skipping mel-spectrogram conversion. Check previous cells for errors.")

Converting to mel-spectrograms...
Mel-spectrogram conversion complete.
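
Because every clip has the same number of samples after padding, every mel-spectrogram has the same width. With librosa's default centered framing (`center=True`, which the cell above does not override), the frame count is `1 + n_samples // hop_length`. The short sketch below works through that arithmetic with the notebook's values; the helper name `n_frames` is hypothetical.

```python
def n_frames(n_samples: int, hop_length: int) -> int:
    """Frame count for a centered STFT (librosa's default, center=True)."""
    return 1 + n_samples // hop_length

target_samples = 10 * 22050            # 10 s at 22,050 Hz
frames = n_frames(target_samples, 256)  # hop_length = 256
print(target_samples, frames)           # 220500 862
```

With `n_mels = 80`, each spectrogram therefore has shape `(80, 862)`, which matches the shape printed in the verification step of this notebook.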


## 7. Normalizing the Mel-Spectrograms (Z-score)

In [7]:
normalized_dir = 'emodb/normalized_mel'
os.makedirs(normalized_dir, exist_ok=True)

print("Normalizing mel-spectrograms...")
if os.path.exists(mel_dir):
    mel_npy_files = [f for f in os.listdir(mel_dir) if f.endswith('.npy')]
    if mel_npy_files:
        for mel_file in mel_npy_files:
            file_path = os.path.join(mel_dir, mel_file)
            try:
                mel_spec = np.load(file_path)
                mean = np.mean(mel_spec)
                std = np.std(mel_spec)
                if std == 0: # Avoid division by zero
                    mel_spec_normalized = mel_spec - mean
                else:
                    mel_spec_normalized = (mel_spec - mean) / std
                np.save(os.path.join(normalized_dir, mel_file), mel_spec_normalized)
            except Exception as e:
                print(f"Error processing {mel_file} during normalization: {e}")
        print("Normalization complete.")
    else:
        print(f"No .npy files found in {mel_dir}. Skipping normalization.")
else:
    print(f"Mel directory {mel_dir} not found. Skipping normalization.")

Normalizing mel-spectrograms...
Normalization complete.
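
The normalization above is applied per file: each spectrogram is shifted and scaled by its own mean and standard deviation, not by statistics of the whole dataset. A minimal sketch of that step on a toy array is shown below. The function name `zscore` is hypothetical and the values are made up.

```python
import numpy as np

def zscore(spec: np.ndarray) -> np.ndarray:
    """Z-score one spectrogram using its own mean and standard deviation."""
    mean = spec.mean()
    std = spec.std()
    if std == 0:
        # Constant input: only center it, to avoid dividing by zero.
        return spec - mean
    return (spec - mean) / std

# Toy dB-scaled values (not real EmoDB data).
spec = np.array([[-80.0, -40.0], [-20.0, 0.0]])
normalized = zscore(spec)
flat = zscore(np.full((2, 3), -80.0))
print(normalized.mean(), normalized.std(), flat.tolist())
```

After this step each spectrogram has zero mean and unit variance, except for constant inputs, which become all zeros.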


## 8. Checking and Preparing the Data for the Model

In [8]:
if os.path.exists(normalized_dir) and os.listdir(normalized_dir):
    sample_file = os.listdir(normalized_dir)[0]
    sample_mel = np.load(os.path.join(normalized_dir, sample_file))
    print(f"Shape of the normalized mel-spectrogram ({sample_file}): {sample_mel.shape}")
else:
    print(f"Warning: No normalized mel-spectrograms found in {normalized_dir}.")

# Define the emotions and the mapping for EmoDB
# EmoDB codes: W(Wut/Anger), L(Langeweile/Boredom), E(Ekel/Disgust), A(Angst/Fear), F(Freude/Happiness), T(Trauer/Sadness), N(Neutral)
emotions_map = {
    'W': 'angry',
    'L': 'boredom',
    'E': 'disgust',
    'A': 'fear',
    'F': 'happy',
    'T': 'sad',
    'N': 'neutral'
}
# Emotion names used to fit the label encoder and to size the model output.
# Note: LabelEncoder sorts classes alphabetically, so the order of this list does not set the class indices.
emotion_list = ['neutral', 'happy', 'sad', 'angry', 'fear', 'disgust', 'boredom']

Shape of the normalized mel-spectrogram (03a02Ta.npy): (80, 862)


## 9. Loading the Data and Encoding the Labels

In [9]:
data = []
labels_str = [] # Store the text labels
print("Loading data for model...")
if os.path.exists(normalized_dir):
    normalized_npy_files = [f for f in os.listdir(normalized_dir) if f.endswith('.npy')]
    if normalized_npy_files:
        for file in normalized_npy_files:
            # EmoDB file names are structured, for example: 03a01Fa.wav
            # The sixth character (index 5, 'F' in '03a01Fa.npy') is the emotion code
            if len(file) > 5: # Basic check on the file name length
                emotion_code = file[5]
                emotion = emotions_map.get(emotion_code)

                if emotion:
                    mel_spec = np.load(os.path.join(normalized_dir, file))
                    data.append(mel_spec)
                    labels_str.append(emotion)
                else:
                    print(f"Skipped file: {file} - Unknown emotion code: {emotion_code}")
            else:
                print(f"Skipped file: {file} - Filename too short to extract emotion code.")
        print(f"Total number of loaded files for model: {len(data)}")
    else:
        print(f"No .npy files found in {normalized_dir} for model loading.")
else:
    print(f"Normalized directory {normalized_dir} not found. Cannot load data for model.")

if not data:
    print("Error: No data loaded for the model. Further steps might fail.")
    # exit() # In Colab, avoid exit()
else:
    # Encode the labels
    label_encoder = LabelEncoder()
    label_encoder.fit(emotion_list) # Fit the encoder on the full list of emotions
    labels_encoded = label_encoder.transform(labels_str)

Loading data for model...
Total number of loaded files for model: 535
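
The loading cell relies on the emotion code sitting at index 5 of the file name. As far as I understand the EmoDB naming convention, positions 1-2 hold the speaker number, positions 3-5 the text code, position 6 the emotion letter, and position 7 the version; treat that layout as an assumption to check against the dataset documentation. The sketch below parses a name under that assumption. `parse_emodb_name` and `EMOTION_CODES` are hypothetical names that mirror the notebook's `emotions_map`.

```python
EMOTION_CODES = {
    "W": "angry", "L": "boredom", "E": "disgust", "A": "fear",
    "F": "happy", "T": "sad", "N": "neutral",
}

def parse_emodb_name(filename: str) -> dict:
    """Split an EmoDB file name such as '03a02Ta.npy' into its fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    if len(stem) != 7:
        raise ValueError(f"Unexpected EmoDB file name: {filename!r}")
    code = stem[5]
    if code not in EMOTION_CODES:
        raise ValueError(f"Unknown emotion code {code!r} in {filename!r}")
    return {
        "speaker": stem[0:2],      # assumed: speaker number
        "text": stem[2:5],         # assumed: text (sentence) code
        "emotion": EMOTION_CODES[code],
        "version": stem[6],        # assumed: recording version
    }

info = parse_emodb_name("03a02Ta.npy")
print(info)
```

Checking the stem length explicitly is stricter than the notebook's `len(file) > 5` test and surfaces misnamed files as errors instead of mislabeled samples.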


## 10. Defining the Custom PyTorch Dataset

In [10]:
class EmoDBDataset(Dataset):
    def __init__(self, data, labels_encoded):
        self.data = data
        self.labels = labels_encoded # Store the numerically encoded labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        mel_spec = torch.tensor(self.data[idx], dtype=torch.float32).unsqueeze(0)  # Add a channel dimension
        label = torch.tensor(self.labels[idx], dtype=torch.long) # Return the class index
        return mel_spec, label

## 11. Creating the Datasets and DataLoaders

In [11]:
if 'data' in locals() and len(data) > 0:
    dataset = EmoDBDataset(data, labels_encoded)

    # Split the data into training and validation sets
    train_size = int(0.8 * len(dataset))
    val_size = len(dataset) - train_size
    
    if train_size == 0 or val_size == 0:
        print(f"Error: Not enough data to split. Train size: {train_size}, Val size: {val_size}. Loaded: {len(dataset)}")
        # exit() # In Colab, avoid exit()
    else:
        train_dataset, val_dataset = random_split(dataset, [train_size, val_size], generator=torch.Generator().manual_seed(42))
        # Create the DataLoaders
        train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
        print(f"Dataset split: Train {len(train_dataset)}, Validation {len(val_dataset)}")
else:
    print("Data not available for creating datasets. Check previous cells.")

Dataset split: Train 428, Validation 107
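
`random_split` draws the validation set uniformly at random, so with only 535 files the class proportions in the two subsets can differ. One alternative, not used in this notebook, is a stratified split that samples the validation fraction separately within each class. The sketch below uses only the standard library; `stratified_split` is a hypothetical helper and the toy labels are made up.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) with val_fraction sampled per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one validation sample per class.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels: 10 samples of class 0 and 5 samples of class 1.
labels = [0] * 10 + [1] * 5
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))
```

The returned index lists could then be wrapped with `torch.utils.data.Subset(dataset, train_idx)` and `Subset(dataset, val_idx)` in place of the `random_split` call.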


## 12. Configuring the ResNet-50 Model

In [12]:
# Use weights=models.ResNet50_Weights.IMAGENET1K_V1 with recent torchvision versions
try:
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    print("Using ResNet50 with ResNet50_Weights.IMAGENET1K_V1")
except (AttributeError, TypeError):  # Older torchvision has no ResNet50_Weights attribute or no 'weights' argument
    print("Using deprecated 'pretrained=True' for ResNet50.")
    model = models.resnet50(pretrained=True)

# Adapt for 1 channel (mel-spectrograms); this new layer is randomly initialized, so the pretrained conv1 weights are not reused
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, len(emotion_list))  # Adapt for the number of emotions
model = model.to(device) # Move the model to the device

print("Model configured:")
print(f"  Input channels for conv1: {model.conv1.in_channels}")
print(f"  Output features for fc: {model.fc.out_features}")

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 131MB/s]


Using ResNet50 with ResNet50_Weights.IMAGENET1K_V1
Model configured:
  Input channels for conv1: 1
  Output features for fc: 7
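
Replacing `conv1` with a fresh `nn.Conv2d` discards the pretrained first-layer filters. A common alternative, which this notebook does not use, is to collapse the three RGB kernels into one by summing them over the input-channel axis; the single-channel layer then responds to a spectrogram exactly as the original layer would to the same spectrogram copied into all three channels. The sketch below illustrates this. `to_single_channel` is a hypothetical helper, and a randomly initialized layer stands in for the pretrained `model.conv1` so that no weights need to be downloaded.

```python
import torch
import torch.nn as nn

def to_single_channel(conv_rgb: nn.Conv2d) -> nn.Conv2d:
    """Build a 1-channel conv whose kernel is the sum of the RGB kernels."""
    conv_gray = nn.Conv2d(
        1, conv_rgb.out_channels,
        kernel_size=conv_rgb.kernel_size,
        stride=conv_rgb.stride,
        padding=conv_rgb.padding,
        bias=conv_rgb.bias is not None,
    )
    with torch.no_grad():
        # (out, 3, k, k) -> (out, 1, k, k)
        conv_gray.weight.copy_(conv_rgb.weight.sum(dim=1, keepdim=True))
        if conv_rgb.bias is not None:
            conv_gray.bias.copy_(conv_rgb.bias)
    return conv_gray

torch.manual_seed(0)
# Stand-in for the pretrained model.conv1 (same shape as ResNet-50's first layer).
conv_rgb = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv_gray = to_single_channel(conv_rgb)

x = torch.randn(2, 1, 80, 862)  # batch of single-channel spectrograms
with torch.no_grad():
    same = torch.allclose(conv_gray(x), conv_rgb(x.repeat(1, 3, 1, 1)), atol=1e-4)
print(tuple(conv_gray.weight.shape), same)
```

In the notebook this would amount to calling `model.conv1 = to_single_channel(model.conv1)` before the model is moved to the device. Whether it improves accuracy on EmoDB is not something this notebook measures.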


## 13. Defining the Loss Function and Optimizer

In [13]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
print("Loss function and optimizer defined.")

Loss function and optimizer defined.


## 14. Model Training Function

In [14]:
def train_model_fn(model, train_loader, val_loader, criterion, optimizer, epochs=10):
    print(f"\nStarting training for {epochs} epochs on {device}...")
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for mel_specs, labels in train_loader:
            mel_specs, labels = mel_specs.to(device), labels.to(device) # Move the data to the device
            
            optimizer.zero_grad()
            outputs = model(mel_specs)
            loss = criterion(outputs, labels) # labels are class indices
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * mel_specs.size(0)

        # Validation
        model.eval()
        val_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for mel_specs, labels in val_loader:
                mel_specs, labels = mel_specs.to(device), labels.to(device)
                outputs = model(mel_specs)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * mel_specs.size(0)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()
        
        avg_train_loss = train_loss / len(train_loader.dataset)
        avg_val_loss = val_loss / len(val_loader.dataset)
        val_accuracy = 100 * correct / total

        print(f"Epoch {epoch+1}/{epochs}, Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}, Val Accuracy: {val_accuracy:.2f}%")
    print("Training finished.")

## 15. Running the Training

In [15]:
if 'train_loader' in locals() and 'val_loader' in locals():
    train_model_fn(model, train_loader, val_loader, criterion, optimizer, epochs=800) # 800 epochs were used for this run; adjust as needed
else:
    print("train_loader or val_loader not defined. Skipping training. Check previous cells for errors.")


Starting training for 800 epochs on cuda...
Epoch 1/800, Train Loss: 1.7092, Val Loss: 1.8468, Val Accuracy: 26.17%
Epoch 2/800, Train Loss: 1.1117, Val Loss: 1.6072, Val Accuracy: 32.71%
Epoch 3/800, Train Loss: 0.5636, Val Loss: 1.7898, Val Accuracy: 38.32%
Epoch 4/800, Train Loss: 0.2531, Val Loss: 1.1773, Val Accuracy: 61.68%
Epoch 5/800, Train Loss: 0.1245, Val Loss: 0.9882, Val Accuracy: 64.49%
Epoch 6/800, Train Loss: 0.1287, Val Loss: 0.6434, Val Accuracy: 79.44%
Epoch 7/800, Train Loss: 0.0880, Val Loss: 0.9480, Val Accuracy: 72.90%
Epoch 8/800, Train Loss: 0.1043, Val Loss: 1.1125, Val Accuracy: 67.29%
Epoch 9/800, Train Loss: 0.0761, Val Loss: 1.6932, Val Accuracy: 53.27%
Epoch 10/800, Train Loss: 0.0896, Val Loss: 0.9366, Val Accuracy: 73.83%
Epoch 11/800, Train Loss: 0.1076, Val Loss: 1.2620, Val Accuracy: 65.42%
Epoch 12/800, Train Loss: 0.1468, Val Loss: 1.2042, Val Accuracy: 64.49%
Epoch 13/800, Train Loss: 0.0855, Val Loss: 0.8908, Val Accuracy: 74.77%
Epoch 14/800, T
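
The log above shows validation accuracy moving up and down from epoch to epoch while training loss is already low, and the training function keeps only the weights from the final epoch. One possible refinement, not part of this notebook, is to track the best validation accuracy, save a checkpoint whenever it improves, and stop after a fixed number of epochs without improvement. The sketch below shows only the bookkeeping, replayed on the thirteen complete epochs from the log. `BestEpochTracker` and the patience of 5 are illustrative assumptions.

```python
class BestEpochTracker:
    """Track the best validation accuracy and count epochs without improvement."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_accuracy = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch: int, val_accuracy: float) -> bool:
        """Record one epoch; return True if it is the new best."""
        if val_accuracy > self.best_accuracy:
            self.best_accuracy = val_accuracy
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.epochs_without_improvement >= self.patience

# Validation accuracies for epochs 1-13, copied from the log above.
val_accuracies = [26.17, 32.71, 38.32, 61.68, 64.49, 79.44, 72.90,
                  67.29, 53.27, 73.83, 65.42, 64.49, 74.77]

tracker = BestEpochTracker(patience=5)  # patience value is an arbitrary example
for epoch, acc in enumerate(val_accuracies, start=1):
    if tracker.update(epoch, acc):
        pass  # in a real loop: torch.save(model.state_dict(), "best_model.pt")
    if tracker.should_stop:
        break

print(tracker.best_epoch, tracker.best_accuracy, epoch)
```

Selecting a checkpoint on the validation set makes the validation accuracy an optimistic estimate, so a separate held-out test set would be needed for an unbiased figure.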

## 16. Model Evaluation Function

In [16]:
def evaluate_model_fn(model, val_loader, label_encoder):
    print("\nEvaluating model...")
    model.eval()
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for mel_specs, labels in val_loader:
            mel_specs, labels = mel_specs.to(device), labels.to(device)
            outputs = model(mel_specs)
            preds = torch.argmax(outputs, dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    if not all_labels or not all_preds:
        print("No labels or predictions to evaluate.")
        return
        
    accuracy = accuracy_score(all_labels, all_preds)
    print(f"Final Validation Accuracy: {accuracy * 100:.2f}%")
    
    # Show the predictions for a few examples (optional)
    if len(all_labels) > 0 and len(all_preds) > 0 and label_encoder is not None:
        print("\nSample predictions (true vs predicted):")
        for i in range(min(10, len(all_labels))):
            try:
                true_label_str = label_encoder.inverse_transform([all_labels[i]])[0]
                pred_label_str = label_encoder.inverse_transform([all_preds[i]])[0]
                print(f"Sample {i+1}: True='{true_label_str}', Predicted='{pred_label_str}'")
            except IndexError as e:
                print(f"Error decoding labels for sample {i+1}: {e}. Raw labels: True={all_labels[i]}, Pred={all_preds[i]}")
            except Exception as e:
                print(f"An unexpected error occurred while decoding labels: {e}")

## 17. Running the Evaluation

In [17]:
if 'val_loader' in locals() and 'label_encoder' in locals():
    evaluate_model_fn(model, val_loader, label_encoder)
else:
    print("val_loader or label_encoder not defined. Skipping evaluation.")

print("\nNotebook execution finished.")


Evaluating model...
Final Validation Accuracy: 76.64%

Sample predictions (true vs predicted):
Sample 1: True='angry', Predicted='angry'
Sample 2: True='fear', Predicted='fear'
Sample 3: True='boredom', Predicted='boredom'
Sample 4: True='fear', Predicted='fear'
Sample 5: True='angry', Predicted='angry'
Sample 6: True='angry', Predicted='angry'
Sample 7: True='neutral', Predicted='neutral'
Sample 8: True='happy', Predicted='happy'
Sample 9: True='angry', Predicted='angry'
Sample 10: True='sad', Predicted='sad'

Notebook execution finished.
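
Overall accuracy can hide weak classes when the classes are not equally represented. A per-class recall breakdown is a simple complement to the figure above. The sketch below uses only the standard library and works on decoded label strings; `per_class_recall` is a hypothetical helper, and the toy labels are made up rather than taken from the model's predictions.

```python
from collections import Counter

def per_class_recall(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}

# Toy example (not the notebook's results).
true_labels = ["angry", "angry", "fear", "sad", "sad", "sad"]
predicted = ["angry", "fear", "fear", "sad", "sad", "angry"]
recall = per_class_recall(true_labels, predicted)
print(recall)
```

In the notebook, the inputs would be the decoded `all_labels` and `all_preds` collected inside `evaluate_model_fn`, for example via `label_encoder.inverse_transform`.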
