# Notebook 1: Audio to Spectrogram
This notebook converts raw audio files from the RAVDESS dataset into mel spectrogram images, which an image-based model can consume more readily than raw waveforms.
The spectrograms capture both the time and frequency content of speech and form the input to the CNN used for emotion classification.

## 1. 📦 Imports and Setup
We start by importing the required libraries. These include:
- `os` and `pandas` for file and data handling
- `librosa` for audio processing and feature extraction
- `librosa.display` and `matplotlib` for rendering and saving spectrograms
- `numpy` for array operations
- `warnings` and `tqdm` to suppress warning noise and track progress

In [6]:
# Cell 1: Imports
import os
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")


## 2. 📁 Dataset Directory Configuration
Define the input and output folders:
- `INPUT_DIR` should point to the base folder containing subfolders like `Actor_01`, `Actor_02`, etc.
- `OUTPUT_DIR` is where the generated mel spectrogram images will be saved.

Keeping raw audio and processed images in separate folders makes the outputs easier to organize and reuse.

In [7]:
# Cell 2: Directory Setup
INPUT_DIR = "dataset"
OUTPUT_DIR = "spectrograms"

os.makedirs(OUTPUT_DIR, exist_ok=True)
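
Before processing, it can help to confirm that the input folder has the expected layout. The sketch below is an illustration rather than part of the original notebook: the helper name `count_wavs_per_actor` is hypothetical, and it assumes the actor folders sit directly under the base folder.

```python
import os

def count_wavs_per_actor(input_dir):
    """Return {folder_name: number_of_wav_files} for folders directly under input_dir."""
    counts = {}
    for entry in sorted(os.listdir(input_dir)):
        folder = os.path.join(input_dir, entry)
        if not os.path.isdir(folder):
            continue  # skip stray files such as .DS_Store
        wavs = [f for f in os.listdir(folder) if f.lower().endswith(".wav")]
        counts[entry] = len(wavs)
    return counts
```

Calling `count_wavs_per_actor(INPUT_DIR)` should list one entry per `Actor_XX` folder; an empty result suggests `INPUT_DIR` points at the wrong location.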


## 3. 😃 Emotion Labels
Each RAVDESS filename encodes metadata as hyphen-separated two-digit codes; the third code is the emotion class.
We use a mapping dictionary (`emotion_map`) to translate these codes into human-readable labels such as 'happy' and 'sad'.
These labels serve as the targets (Y) when training the model.

In [8]:
# Cell 3: Emotion Label Mapping (RAVDESS standard)
emotion_map = {
    '01': 'neutral',
    '02': 'calm',
    '03': 'happy',
    '04': 'sad',
    '05': 'angry',
    '06': 'fearful',
    '07': 'disgust',
    '08': 'surprised'
}
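
To make the filename encoding concrete, here is a minimal sketch that splits a RAVDESS filename into its seven coded fields. The helper name `parse_ravdess_filename` and the field names in `FIELDS` are illustrative choices, not part of the notebook; the field order follows the published RAVDESS naming convention.

```python
# Field order of the hyphen-separated codes in a RAVDESS filename.
FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor")

def parse_ravdess_filename(filename):
    """Split a name like '03-01-05-01-02-01-16.wav' into its seven coded fields."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("-")
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected RAVDESS filename: {filename!r}")
    return dict(zip(FIELDS, parts))

info = parse_ravdess_filename("03-01-05-01-02-01-16.wav")
print(info["emotion"], info["actor"])  # 05 16
```

With the mapping above, `emotion_map[info["emotion"]]` gives `'angry'` for this file. Raising on malformed names, rather than indexing blindly, makes stray files easy to spot.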


## 4. 🔊 Function to Generate Mel Spectrograms
We define a reusable function to:
- Load audio using `librosa`
- Generate a mel spectrogram (a perceptually motivated transformation)
- Convert it to decibels and save it as a clean PNG image using `matplotlib`

The figure is saved without axes, ticks, or titles so that every image has a consistent layout for the model.
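
The decibel conversion can be sketched in plain Python. This is an approximation of what `librosa.power_to_db` does with `ref=np.max`, assuming its default floor (`amin=1e-10`) and dynamic range (`top_db=80`); the function name `power_to_db_sketch` is hypothetical, and the notebook itself uses the librosa call.

```python
import math

def power_to_db_sketch(power, amin=1e-10, top_db=80.0):
    """Convert a 2-D list of power values to dB relative to the largest value."""
    ref = max(max(row) for row in power)
    ref_db = 10.0 * math.log10(max(amin, ref))
    # dB relative to the reference; amin guards against log10(0)
    db = [[10.0 * math.log10(max(amin, v)) - ref_db for v in row] for row in power]
    # Clip everything more than top_db below the peak
    floor = max(max(row) for row in db) - top_db
    return [[max(v, floor) for v in row] for row in db]

db = power_to_db_sketch([[1.0, 0.1], [0.01, 0.0]])
print(db)
```

The loudest bin maps to 0 dB, a bin with one tenth of that power maps to about -10 dB, and silent bins are clipped to -80 dB, which keeps the color scale of the saved images bounded.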

In [9]:
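# Cell 4: Helper Function to Create and Save Mel Spectrograms
def save_mel_spectrogram(audio_path, save_path):
    """Render the mel spectrogram of one audio file and save it as a PNG."""
    # Resample to a fixed rate so all spectrograms share the same frequency axis
    y, sr = librosa.load(audio_path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    # Convert power to decibels relative to the loudest bin
    mel_db = librosa.power_to_db(mel, ref=np.max)

    # 2.24 in x 100 dpi targets roughly 224 x 224 pixels; bbox_inches='tight'
    # crops to the drawn content, so the saved size may differ slightly
    plt.figure(figsize=(2.24, 2.24), dpi=100)
    librosa.display.specshow(mel_db, sr=sr, hop_length=512)
    plt.axis('off')
    plt.tight_layout()
    plt.savefig(save_path, bbox_inches='tight', pad_inches=0)
    plt.close()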
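
As a rough check on what the function computes, the size of the mel spectrogram array can be estimated from the clip length. The sketch below assumes librosa's default framing (`hop_length=512`, centered frames), under which the frame count is `1 + n_samples // hop_length`; `mel_shape` is a hypothetical helper, not part of the notebook.

```python
def mel_shape(duration_s, sr=22050, hop_length=512, n_mels=128):
    """Estimate the (n_mels, n_frames) shape of a mel spectrogram for a clip."""
    n_samples = int(round(duration_s * sr))
    n_frames = 1 + n_samples // hop_length  # centered framing pads both ends
    return n_mels, n_frames

print(mel_shape(3.0))  # (128, 130)
```

Clips of different lengths yield arrays of different widths; rendering each one into a figure of fixed size is what brings them to a common image size.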


## 5. 🔄 Process and Save Spectrogram Metadata
We walk through each actor folder recursively and for each `.wav` file:
- Generate a mel spectrogram image
- Save it with a descriptive filename
- Record the image path, emotion label, and actor ID in a DataFrame

This metadata links each image to its label and speaker, which later training and evaluation steps rely on, and it makes the preprocessing run reproducible.

In [10]:
# Cell 5: Process All Audio Files from Subdirectories and Save Metadata
data = []

for root, _, files in os.walk(INPUT_DIR):
    for filename in files:
        # Logged for every file, so skipped non-audio files (e.g. .DS_Store) appear too
        print(f"Processing {filename} in {root}")
        if filename.endswith(".wav"):
            # The third hyphen-separated field is the emotion code, e.g. '03'
            emotion_id = filename.split("-")[2]
            emotion_label = emotion_map.get(emotion_id, "unknown")

            audio_path = os.path.join(root, filename)

            # Create a flat image filename, prefixed with the actor folder, for easier access
            relative_actor = os.path.basename(root)
            image_filename = f"{relative_actor}_{os.path.splitext(filename)[0]}.png"
            save_path = os.path.join(OUTPUT_DIR, image_filename)

            save_mel_spectrogram(audio_path, save_path)

            data.append({
                "file": filename,
                "image_path": save_path,
                "label": emotion_label,
                "actor": relative_actor
            })

df = pd.DataFrame(data)
df.to_csv("spectrogram_metadata.csv", index=False)
df.head()

Processing .DS_Store in dataset
Processing 03-01-05-01-02-01-16.wav in dataset/Actor_16
Processing 03-01-06-01-02-02-16.wav in dataset/Actor_16
Processing 03-01-06-02-01-02-16.wav in dataset/Actor_16
Processing 03-01-05-02-01-01-16.wav in dataset/Actor_16
Processing 03-01-07-01-01-01-16.wav in dataset/Actor_16
Processing 03-01-04-01-01-02-16.wav in dataset/Actor_16
Processing 03-01-04-02-02-02-16.wav in dataset/Actor_16
Processing 03-01-07-02-02-01-16.wav in dataset/Actor_16
Processing 03-01-08-02-02-01-16.wav in dataset/Actor_16
Processing 03-01-08-01-01-01-16.wav in dataset/Actor_16
Processing 03-01-03-02-02-02-16.wav in dataset/Actor_16
Processing 03-01-03-01-01-02-16.wav in dataset/Actor_16
Processing 03-01-02-02-01-01-16.wav in dataset/Actor_16
Processing 03-01-01-01-02-02-16.wav in dataset/Actor_16
Processing 03-01-02-01-02-01-16.wav in dataset/Actor_16
Processing 03-01-03-02-01-01-16.wav in dataset/Actor_16
Processing 03-01-03-01-02-01-16.wav in dataset/Actor_16
Processing 03-01

| | file | image_path | label | actor |
|---|---|---|---|---|
| 0 | 03-01-05-01-02-01-16.wav | spectrograms/Actor_16_03-01-05-01-02-01-16.png | angry | Actor_16 |
| 1 | 03-01-06-01-02-02-16.wav | spectrograms/Actor_16_03-01-06-01-02-02-16.png | fearful | Actor_16 |
| 2 | 03-01-06-02-01-02-16.wav | spectrograms/Actor_16_03-01-06-02-01-02-16.png | fearful | Actor_16 |
| 3 | 03-01-05-02-01-01-16.wav | spectrograms/Actor_16_03-01-05-02-01-01-16.png | angry | Actor_16 |
| 4 | 03-01-07-01-01-01-16.wav | spectrograms/Actor_16_03-01-07-01-01-01-16.png | disgust | Actor_16 |
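
A quick sanity check on the metadata is to count rows per label. The sketch below is illustrative: `label_counts` is a hypothetical helper, and it works on a list of dictionaries shaped like the rows collected in `data`. The three sample rows are copied from the preview above.

```python
from collections import Counter

def label_counts(records):
    """Count how many metadata rows carry each emotion label."""
    return Counter(row["label"] for row in records)

# Rows copied from the metadata preview.
sample = [
    {"file": "03-01-05-01-02-01-16.wav", "label": "angry", "actor": "Actor_16"},
    {"file": "03-01-06-01-02-02-16.wav", "label": "fearful", "actor": "Actor_16"},
    {"file": "03-01-06-02-01-02-16.wav", "label": "fearful", "actor": "Actor_16"},
]
print(label_counts(sample))  # Counter({'fearful': 2, 'angry': 1})
```

Running `label_counts(data)` on the full list would show the class balance; an `'unknown'` key would flag filenames whose emotion code is missing from `emotion_map`.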
