<a href="https://colab.research.google.com/github/elijkon/DeepLearning_MiniHackathon/blob/main/spectrograms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
Each of the 1440 files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 03-01-06-01-02-01-12.wav). These identifiers define the stimulus characteristics:

Filename identifiers

1) Modality (01 = full-AV, 02 = video-only, 03 = audio-only).

2) Vocal channel (01 = speech, 02 = song).

3) Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).

4) Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.

5) Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").

6) Repetition (01 = 1st repetition, 02 = 2nd repetition).

7) Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Filename example: 03-01-06-01-02-01-12.wav

Audio-only (03)
Speech (01)
Fearful (06)
Normal intensity (01)
Statement "dogs" (02)
1st Repetition (01)
12th Actor (12)
Female, as the actor ID number is even.

How to cite the RAVDESS

Academic citation
If you use the RAVDESS in an academic publication, please use the following citation: Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE 13(5): e0196391. https://doi.org/10.1371/journal.pone.0196391.

All other attributions
If you use the RAVDESS in a form other than an academic publication, such as in a blog post, school project, or non-commercial product, please use the following attribution: "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)" by Livingstone & Russo is licensed under CC BY-NC-SA 4.0.
"""

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:

# STEP 2: Import libraries
import os
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
from sklearn.model_selection import train_test_split

# STEP 3: Configure paths
input_dir = "/content/drive/MyDrive/school classes/deep learning/data/audio_speech_actors_01-24"  # unzipped RAVDESS folder (contains Actor_01, Actor_02, ...)
output_dir = "/content/drive/MyDrive/school classes/deep learning/data/spectrogram_dataset"  # where to save spectrograms

# STEP 4: Define emotion mapping
emotion_map = {
    1: "neutral",
    2: "calm",
    3: "happy",
    4: "sad",
    5: "angry",
    6: "fearful",
    7: "disgust",
    8: "surprised"
}


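# STEP 5: Convert one WAV file into a mel-spectrogram image
def wav_to_melspectrogram(wav_path, save_path):
    """Load a WAV file, compute a 128-band mel spectrogram in dB, and save it as an axis-free PNG."""
    y, sr = librosa.load(wav_path, sr=None)  # load audio at its native sampling rate
    mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # mel power spectrogram
    mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)  # convert to dB relative to the loudest bin

    # Save as image, using an explicit figure handle
    fig, ax = plt.subplots(figsize=(2.5, 2.5))
    librosa.display.specshow(mel_spec_db, sr=sr, cmap="magma", ax=ax)
    ax.axis("off")
    fig.savefig(save_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)  # release the figure so memory stays flat across many files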

# STEP 6: Collect all WAV files grouped by emotion
data_by_emotion = {emotion: [] for emotion in emotion_map.values()}

# os.walk → goes into Actor_01, Actor_02, etc.
for root, dirs, files in os.walk(input_dir):
    for file in files:
        if file.endswith(".wav"):
            parts = file.split("-")                 # e.g. 03-01-06-01-02-01-12.wav
            emotion_code = int(parts[2])            # 3rd number = emotion
            emotion_label = emotion_map[emotion_code]
            wav_path = os.path.join(root, file)     # full path including actor folder
            data_by_emotion[emotion_label].append((wav_path, file))


# STEP 7: Split into train/val/test and save
split_ratios = {"train": 0.7, "val": 0.15, "test": 0.15}

for emotion, files in data_by_emotion.items():
    # Hold out 15% of this emotion's files as the test set
    train_files, test_files = train_test_split(files, test_size=split_ratios["test"], random_state=42)
    # Split the remaining 85% so that val is 15% of the original total (0.15 / 0.85 ≈ 0.176)
    train_files, val_files = train_test_split(
        train_files,
        test_size=split_ratios["val"] / (1 - split_ratios["test"]),
        random_state=42
    )

    # Save splits into folders
    for split_name, split_files in zip(["train", "val", "test"], [train_files, val_files, test_files]):
        split_folder = os.path.join(output_dir, split_name, emotion)
        os.makedirs(split_folder, exist_ok=True)

        for wav_path, filename in tqdm(split_files, desc=f"{emotion} -> {split_name}"):
            save_path = os.path.join(split_folder, filename.replace(".wav", ".png"))
            wav_to_melspectrogram(wav_path, save_path)


print("✅ Done! Spectrograms saved in train/val/test structure at:", output_dir)


neutral -> train: 100%|██████████| 66/66 [00:55<00:00,  1.19it/s]
neutral -> val: 100%|██████████| 15/15 [00:08<00:00,  1.71it/s]
neutral -> test: 100%|██████████| 15/15 [00:09<00:00,  1.65it/s]
calm -> train: 100%|██████████| 134/134 [01:23<00:00,  1.60it/s]
calm -> val: 100%|██████████| 29/29 [00:18<00:00,  1.53it/s]
calm -> test: 100%|██████████| 29/29 [00:17<00:00,  1.64it/s]
happy -> train: 100%|██████████| 134/134 [01:23<00:00,  1.61it/s]
happy -> val: 100%|██████████| 29/29 [00:18<00:00,  1.53it/s]
happy -> test: 100%|██████████| 29/29 [00:18<00:00,  1.57it/s]
sad -> train: 100%|██████████| 134/134 [01:20<00:00,  1.66it/s]
sad -> val: 100%|██████████| 29/29 [00:18<00:00,  1.59it/s]
sad -> test: 100%|██████████| 29/29 [00:18<00:00,  1.55it/s]
angry -> train: 100%|██████████| 134/134 [01:25<00:00,  1.58it/s]
angry -> val: 100%|██████████| 29/29 [00:18<00:00,  1.57it/s]
angry -> test: 100%|██████████| 29/29 [00:18<00:00,  1.56it/s]
fearful -> train: 100%|██████████| 134/134 [01:21<

✅ Done! Spectrograms saved in train/val/test structure at: /content/drive/MyDrive/school classes/deep learning/data/spectrogram_dataset
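The per-emotion counts in the progress bars follow from how `train_test_split` rounds a fractional `test_size`: scikit-learn takes the number of test samples as the ceiling of `test_size * n_samples` and gives the remainder to the training side. The sketch below (a hypothetical `split_counts` helper in plain Python, no scikit-learn needed) reproduces that arithmetic for the two-stage split above, assuming that rounding rule. RAVDESS speech has 96 neutral files (there is no strong intensity for neutral) and 192 files for each of the other seven emotions, for 96 + 7 × 192 = 1440 in total.

```python
import math


def split_counts(n_files, test=0.15, val=0.15):
    """Reproduce the train/val/test sizes of the two-stage split.

    Assumes scikit-learn's rule for a float test_size:
    n_test = ceil(test_size * n_samples), n_train = n_samples - n_test.
    """
    n_test = math.ceil(test * n_files)             # first split: hold out the test set
    n_rest = n_files - n_test
    n_val = math.ceil(val / (1 - test) * n_rest)   # second split on the remaining files
    n_train = n_rest - n_val
    return n_train, n_val, n_test


for emotion, n in [("neutral", 96), ("calm", 192)]:
    print(emotion, split_counts(n))
```

This gives (66, 15, 15) for neutral and (134, 29, 29) for calm, matching the log. Because both splits round the held-out side up, the realized fractions are close to, but not exactly, 70/15/15 (66/96 is about 69% for neutral).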


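For reference, the `librosa.power_to_db(mel_spec, ref=np.max)` call in `wav_to_melspectrogram` rescales power to decibels relative to the loudest time-frequency bin, so the brightest pixel is 0 dB and every other value is negative. The NumPy sketch below reimplements that formula with librosa's default `amin=1e-10` and `top_db=80.0`; the name `power_to_db_sketch` is hypothetical, and it is meant as an illustration of what the images encode, not as a replacement for the library call.

```python
import numpy as np


def power_to_db_sketch(S, amin=1e-10, top_db=80.0):
    """Mimic librosa.power_to_db(S, ref=np.max) using librosa's default amin and top_db."""
    S = np.asarray(S, dtype=float)
    ref_value = np.max(S)                                     # ref=np.max: loudest bin is the reference
    log_spec = 10.0 * np.log10(np.maximum(amin, S))           # power -> dB, floored at amin
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref_value))  # shift so the reference sits at 0 dB
    return np.maximum(log_spec, log_spec.max() - top_db)      # clip the dynamic range to top_db


# Toy "spectrogram" whose powers span ten orders of magnitude
S = np.array([[1.0, 0.1], [1e-3, 1e-10]])
print(power_to_db_sketch(S))
```

Here the bins map to about 0, -10, and -30 dB, while the 1e-10 bin (nominally -100 dB) is clipped to -80 dB. The `magma` colormap in `specshow` then spreads this roughly 80 dB range across its colors.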


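To confirm that the `output_dir/<split>/<emotion>/` layout from STEP 7 was written completely, one option is to tally the PNG files in each folder and compare the totals against the progress bars. The helper `count_spectrograms` below is hypothetical; it only assumes the folder structure created in STEP 7 and uses the standard library.

```python
import os
from collections import defaultdict


def count_spectrograms(output_dir):
    """Return {split: {emotion: n_png}} for an output_dir/<split>/<emotion>/*.png layout."""
    counts = defaultdict(dict)
    for split in ("train", "val", "test"):
        split_dir = os.path.join(output_dir, split)
        if not os.path.isdir(split_dir):
            continue  # this split folder has not been created
        for emotion in sorted(os.listdir(split_dir)):
            emotion_dir = os.path.join(split_dir, emotion)
            if os.path.isdir(emotion_dir):
                # Count only the saved spectrogram images
                counts[split][emotion] = sum(
                    1 for f in os.listdir(emotion_dir) if f.endswith(".png")
                )
    return dict(counts)


# Usage in the notebook (output_dir is defined in STEP 3):
# for split, per_emotion in count_spectrograms(output_dir).items():
#     print(split, per_emotion, "total:", sum(per_emotion.values()))
```

After a complete run, the train counts should match the progress bars (for example 66 for neutral and 134 for calm), and the three splits together should add up to 1440.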