<a href="https://colab.research.google.com/github/niko-vaas/tutorial-notebooks/blob/main/Untitled2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's start by allowing access to our Google Drive, where I put the [RAVDESS dataset](https://zenodo.org/records/1188976#.XvD_1mj0nD4). You may get a popup asking for permission to modify your Google Drive.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Now, let's import all the libraries we need.

In [3]:
import os
import librosa
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.utils import to_categorical

Next, we point to the dataset and define a loader function that reads an audio file from my Google Drive and extracts 40 MFCC features per frame from it.

In [4]:
data_dir = '/content/drive/MyDrive/RAVDESS'

def load_audio_file(file_path):
  # Load the audio file
  audio, sample_rate = librosa.load(file_path, sr=None)

  # Extract features
  features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
  return features
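One thing worth noting about this loader: the MFCC matrix has shape `(n_mfcc, n_frames)`, and `n_frames` grows with the length of the clip. Below is a rough sketch of the arithmetic, assuming librosa's defaults of `hop_length=512` and `center=True` (neither is set explicitly in the loader). `expected_mfcc_shape` is a hypothetical helper, and 48 kHz is only an illustrative sampling rate.

```python
def expected_mfcc_shape(num_samples, n_mfcc=40, hop_length=512):
  """Estimate the (n_mfcc, n_frames) shape of an MFCC matrix.

  Assumes center=True (librosa's default), where the frame count is
  1 + num_samples // hop_length.
  """
  n_frames = 1 + num_samples // hop_length
  return (n_mfcc, n_frames)

# Two clips at an assumed 48 kHz with slightly different durations
short_clip = expected_mfcc_shape(int(3.0 * 48000))
long_clip = expected_mfcc_shape(int(3.5 * 48000))
print(short_clip)  # (40, 282)
print(long_clip)   # (40, 329)
```

Because the clips differ in length, the feature matrices returned by `load_audio_file` will generally not share the same second dimension.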

I'm going to generate a CSV of the file structure of the dataset. It's going to look like this:

```
| File Path | Emotion |
| --------- | ------- |
|audio1.wav | Happy   |
...etc.
```

The following code builds that CSV from the RAVDESS filenames and then loads it back so it's easy for us to use.

In [8]:
def generate_ravdess_csv(dir_list):
  # Map the numeric codes in each filename to readable labels
  emotion_map = {
    1: "neutral",
    2: "calm",
    3: "happy",
    4: "sad",
    5: "angry",
    6: "fearful",
    7: "disgust",
    8: "surprised"
  }
  intensity_map = {
    1: "",
    2: "strongly "
  }

  data = []
  for parent_dir in dir_list:
    for root, dirs, files in os.walk(parent_dir):
      for file in files:
        if not file.endswith(".wav"):
          continue  # Skip anything that is not an audio file

        filename = file[:-4]  # Remove .wav extension
        parts = filename.split('-')
        modality, vocal_channel, emotion, intensity, statement, repetition, actor = map(int, parts)

        # Build the label, e.g. "happy" or "strongly happy"
        emotion_label = intensity_map[intensity] + emotion_map[emotion]

        data.append({
            "file_path": os.path.join(root, file),
            "emotion": emotion_label
        })

  # Create DataFrame and return
  df = pd.DataFrame(data)
  return df

if __name__ == "__main__":
  # Note: both entries point to Actor_10, so every file is listed twice in the CSV
  dir_list = ["/content/drive/My Drive/Corpora/RAVDESS/Actor_10", "/content/drive/My Drive/Corpora/RAVDESS/Actor_10"]
  df = generate_ravdess_csv(dir_list)
  df.to_csv("ravdess_data.csv", index=False)

data = pd.read_csv('ravdess_data.csv')
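To make the filename convention used above concrete, here is a minimal sketch that decodes a single name. The helper `describe_ravdess_name` and the sample filename are illustrative and not part of the original notebook. The seven dash-separated fields follow the order unpacked in `generate_ravdess_csv`.

```python
import os

EMOTIONS = {
  1: "neutral", 2: "calm", 3: "happy", 4: "sad",
  5: "angry", 6: "fearful", 7: "disgust", 8: "surprised",
}

FIELD_NAMES = ["modality", "vocal_channel", "emotion", "intensity",
               "statement", "repetition", "actor"]

def describe_ravdess_name(file_name):
  """Split a RAVDESS-style filename into its seven numeric fields."""
  stem = os.path.splitext(os.path.basename(file_name))[0]
  fields = [int(part) for part in stem.split('-')]
  if len(fields) != len(FIELD_NAMES):
    raise ValueError(f"Expected 7 fields, got {len(fields)}: {file_name}")
  info = dict(zip(FIELD_NAMES, fields))
  # Intensity code 2 adds the "strongly " prefix, as in the CSV generator
  prefix = "strongly " if info["intensity"] == 2 else ""
  info["label"] = prefix + EMOTIONS[info["emotion"]]
  return info

info = describe_ravdess_name("03-01-06-02-02-01-10.wav")
print(info["label"])  # strongly fearful
print(info["actor"])  # 10
```

Raising on an unexpected field count makes stray `.wav` files with other naming schemes fail loudly instead of producing wrong labels.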

Now, let's create the datasets to train our model. This step will take a while, as we are loading every audio file in the >500 MB folder.

In [9]:
X = []
y = []

for index, row in data.iterrows():
  file_path = row['file_path']
  emotion = row['emotion']
  features = load_audio_file(file_path)
  X.append(features)
  y.append(emotion)

Now, we convert the features and labels to arrays and split them into training and testing sets.

In [13]:
X = np.array(X)
y = np.array(y)

# Convert emotion strings to integer ids, then to one-hot labels
classes, y_ids = np.unique(y, return_inverse=True)
num_classes = len(classes)
y = to_categorical(y_ids, num_classes)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
 random_state=42)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (120, 40) + inhomogeneous part.
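
The error arises because each MFCC matrix has a different number of frames, so NumPy cannot stack them into one regular array. One possible way around it, sketched here with NumPy only, is to pad or truncate every matrix to a fixed number of frames before stacking. `pad_or_truncate` and `max_frames=300` are assumptions for illustration, not values taken from the dataset, and the random matrices stand in for real MFCC features.

```python
import numpy as np

def pad_or_truncate(features, max_frames=300):
  """Force a (n_mfcc, n_frames) matrix to exactly max_frames columns."""
  n_frames = features.shape[1]
  if n_frames >= max_frames:
    return features[:, :max_frames]
  # Zero-pad on the right, along the time axis only
  return np.pad(features, ((0, 0), (0, max_frames - n_frames)), mode="constant")

# Stand-ins for MFCC matrices of clips with different lengths
rng = np.random.default_rng(42)
X_demo = [rng.normal(size=(40, n)) for n in (282, 329, 301)]

X_fixed = np.array([pad_or_truncate(f) for f in X_demo])
print(X_fixed.shape)  # (3, 40, 300)
```

In the notebook, the same helper could be applied inside the loading loop (`X.append(pad_or_truncate(features))`) so that `np.array(X)` yields a regular 3-D array. Truncation discards the tail of longer clips, so the choice of `max_frames` is a trade-off.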