<a href="https://colab.research.google.com/github/niko-vaas/tutorial-notebooks/blob/main/Untitled2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's start by allowing access to our Google Drive, where I put the [RAVDESS dataset](https://zenodo.org/records/1188976#.XvD_1mj0nD4). You may get a popup asking for permission to modify your Google Drive.

[Here](https://drive.google.com/drive/folders/1_2Wkg9Vpk34rjJbWV9IioalhBOHu1uG1?usp=sharing) is the modified edition I will uses on Google Drive.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Now, let's import all libaries we need.

In [2]:
import os
import librosa
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf

Now, we add our dataset and create a loader function that will load audio from my Google Drive and extract data from it.

In [3]:
data_dir = '/content/drive/MyDrive/RAVDESS'

def load_audio_file(file_path):
  # Load the audio file
  audio, sample_rate = librosa.load(file_path, sr=None)

  # Extract features
  features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
  return features

I'm going to generate a CSV of the file structure of the dataset. It's going to look like this:

```
| File Path | Emotion |
| --------- | ------- |
|audio1.wav | Happy   |
...etc.
```

The following code loads it and makes it easy for us to use.

In [4]:
def generate_ravdess_csv(dir_list):
  data = []
  for parent_dir in dir_list:
    for root, dirs, files in os.walk(parent_dir):
      for file in files:
        if file.endswith(".wav"):
          filename = file[:-4]  # Remove .wav extension
          parts = filename.split('-')
          modality, vocal_channel, emotion, intensity, statement, repetition, actor = map(int, parts)

        # Determine emotion and intensity label
        emotion_map = {
          1: "neutral",
          2: "calm",
          3: "happy",
          4: "sad",
          5: "angry",
          6: "fearful",
          7: "disgust",
          8: "surprised"
        }
        intensity_map = {
          1: "",
          2: "strongly "
        }
        emotion_label = intensity_map[intensity] + emotion_map[emotion]

        data.append({
            "file_path": os.path.join(root, file),
            "emotion": emotion_label
        })

  # Create DataFrame and return
  df = pd.DataFrame(data)
  return df

if __name__ == "__main__":
  dir_list = ["/content/drive/My Drive/Corpora/RAVDESS/Actor_10", "/content/drive/My Drive/Corpora/RAVDESS/Actor_10"]
  df = generate_ravdess_csv(dir_list)
  df.to_csv("ravdess_data.csv", index=False)

data = pd.read_csv('ravdess_data.csv')

Now, let's create the datasets to train our model. This step will take a while, as we are quite literally loading every single audio file in the >500 Mb folder.

In [26]:
X = []
y = []

for index, row in data.iterrows():
  file_path = row['file_path']
  emotion = row['emotion']
  features = load_audio_file(file_path)
  X.append(features)
  y.append(emotion)

X_train_padded = np.concatenate([np.expand_dims(x, axis=0) for x in X_train], axis=0)

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 2, the array at index 0 has size 363 and the array at index 1 has size 338

Now, we split these sets into training and testing sets.

In [23]:
X_train = np.stack(X_train, axis=0)  # Adjust axis if needed
y = np.array(y) # we don't need to do this for X because it is already a numpy array.

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
 random_state=42)

ValueError: all input arrays must have the same shape

Now begins the easy part: building the model. Let's import some specific keras libraries.

In [8]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

NameError: name 'X_train' is not defined

Let's create our model and compile it.

In [13]:
num_classes = len(np.unique(y))
model = Sequential([
  Flatten(input_shape=X_train[0].shape),  # Flatten the input
  Dense(128, activation='relu'),
  Dropout(0.2),
  Dense(num_classes, activation='softmax')  # Output layer
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

  super().__init__(**kwargs)


Now, we start training on the datasets we spent so long creating. This step will take a while.

In [14]:
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

ValueError: Data cardinality is ambiguous. Make sure all arrays contain the same number of samples.'x' sizes: 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40
'y' sizes: 96


In [24]:
for i in range(len(X_train)):
    print(f"Sample {i} shape: {X_train[i].shape}")

Sample 0 shape: (40, 363)
Sample 1 shape: (40, 338)
Sample 2 shape: (40, 326)
Sample 3 shape: (40, 329)
Sample 4 shape: (40, 313)
Sample 5 shape: (40, 388)
Sample 6 shape: (40, 335)
Sample 7 shape: (40, 332)
Sample 8 shape: (40, 310)
Sample 9 shape: (40, 320)
Sample 10 shape: (40, 304)
Sample 11 shape: (40, 357)
Sample 12 shape: (40, 420)
Sample 13 shape: (40, 341)
Sample 14 shape: (40, 457)
Sample 15 shape: (40, 351)
Sample 16 shape: (40, 366)
Sample 17 shape: (40, 335)
Sample 18 shape: (40, 351)
Sample 19 shape: (40, 310)
Sample 20 shape: (40, 392)
Sample 21 shape: (40, 335)
Sample 22 shape: (40, 348)
Sample 23 shape: (40, 420)
Sample 24 shape: (40, 326)
Sample 25 shape: (40, 388)
Sample 26 shape: (40, 363)
Sample 27 shape: (40, 323)
Sample 28 shape: (40, 313)
Sample 29 shape: (40, 357)
Sample 30 shape: (40, 332)
Sample 31 shape: (40, 341)
Sample 32 shape: (40, 351)
Sample 33 shape: (40, 348)
Sample 34 shape: (40, 338)
Sample 35 shape: (40, 351)
Sample 36 shape: (40, 310)
Sample 37 s

In [2]:
import numpy as np

# Determine the maximum length
max_len = max([x.shape[1] for x in X_train])

# Create a padded array with zeros
X_padded = np.zeros((len(X_train), max_len))

# Fill the padded array with data
for i, x in enumerate(X_train):
    X_padded[i, :x.shape[1]] = x

# Now X_padded is a NumPy array with consistent shape
X_train = X_padded

NameError: name 'X_train' is not defined