<a href="https://colab.research.google.com/github/emilstahl97/Scalable-Machine-Learning-and-Deep-Learning-ID2223/blob/notebooks/SpeechRecognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Speech Emotion Detection

#### RAVDESS Dataset:

- The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) is licensed under CC BY-NC-SA 4.0 and can be downloaded free of charge at https://zenodo.org/record/1188976.
- The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7356 files (total size: 24.8 GB). 
- The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. 
- Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. 
- Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression.
- All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound).

This analysis uses the following file types:
- Audio-only speech files.
- Speech extracted from the video files by converting the MP4 files to WAV format.

File naming convention: Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). These identifiers define the stimulus characteristics: 

Filename identifiers 
- Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
- Vocal channel (01 = speech, 02 = song).
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
- Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
- Repetition (01 = 1st repetition, 02 = 2nd repetition).
- Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).
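
The naming convention above can be decoded mechanically. The following is a minimal sketch, not part of the original notebook: the names `RavdessInfo` and `parse_ravdess_filename` are hypothetical helpers introduced here for illustration, and the lookup tables simply restate the identifier list.

```python
from typing import NamedTuple

# Lookup tables restating the filename identifier list above
MODALITY = {1: "full-AV", 2: "video-only", 3: "audio-only"}
VOCAL_CHANNEL = {1: "speech", 2: "song"}
EMOTION = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
           5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}
INTENSITY = {1: "normal", 2: "strong"}


class RavdessInfo(NamedTuple):
    """Decoded fields of a RAVDESS filename (hypothetical helper type)."""
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: int
    repetition: int
    actor: int
    gender: str


def parse_ravdess_filename(filename):
    """Decode a 7-part RAVDESS filename such as '02-01-06-01-02-01-12.mp4'."""
    stem = filename.rsplit(".", 1)[0]          # drop the extension
    parts = stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"Expected 7 identifier fields, got {len(parts)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = (
        int(p) for p in parts
    )
    return RavdessInfo(
        modality=MODALITY[modality],
        vocal_channel=VOCAL_CHANNEL[channel],
        emotion=EMOTION[emotion],
        intensity=INTENSITY[intensity],
        statement=statement,
        repetition=repetition,
        actor=actor,
        # Odd-numbered actors are male, even-numbered actors are female
        gender="male" if actor % 2 == 1 else "female",
    )


info = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(info)
```

For the example filename, the decoded fields are video-only, speech, fearful, normal intensity, statement 2, first repetition, actor 12 (female). The feature-extraction code later in the notebook only needs the emotion field, which it shifts to a zero-based label.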

In [34]:
from google.colab import drive
import os

# README - Execute this cell to mount your Google Drive in the notebook.
# Follow the link to sign in, then paste the given key into the text box. The credentials are only available to you.

drive.mount('/content/drive', force_remount=True)

if not os.path.exists("/content/drive/MyDrive/audio-dataset"):
  print("Pulling dataset")
  os.makedirs("/content/drive/MyDrive/audio-dataset")
  !git clone https://github.com/emilstahl97/Audio-dataset.git
else:
  print("Dataset already exists")

os.chdir("/content/drive/MyDrive/audio-dataset/Audio-dataset/Audio-dataset")
#os.chdir("/content/drive/MyDrive/audio-dataset/Audio-dataset/Rawdata")

!git pull
!ls

RAVDESS_PATH = "./RAVDESS"
SAVEE_PATH = "./SAVEE"
SAVED_MODELS_PATH = "../saved_models"

if not os.path.exists("/content/drive/MyDrive/ID2223/project/mfcc"):
  os.makedirs("/content/drive/MyDrive/ID2223/project/mfcc/")
  
if not os.path.exists("/content/drive/MyDrive/ID2223/project/emotion"):
  os.makedirs("/content/drive/MyDrive/ID2223/project/emotion/")

mfcc_file_path = "/content/drive/MyDrive/ID2223/project/mfcc/mfcc.npy"
emotion_file_path = "/content/drive/MyDrive/ID2223/project/emotion/emotion.npy"

Mounted at /content/drive
Dataset already exists
Your configuration specifies to merge with the ref 'refs/heads/main'
from the remote, but no such ref was fetched.
'"'   RAVDESS   SAVEE


In [5]:
!pip install --upgrade tensorflow_hub
!pip install --upgrade tensorflow
!pip install --upgrade keras
!pip install --upgrade numpy
!pip install --upgrade matplotlib
!pip install --upgrade librosa
!pip install --upgrade scipy
!pip install --upgrade scikit-learn
!pip install --upgrade pandas

Collecting numpy
  Downloading numpy-1.21.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.5 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
Successfully installed numpy-1.21.5


Collecting matplotlib
  Downloading matplotlib-3.5.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Collecting fonttools>=4.22.0
  Downloading fonttools-4.28.5-py3-none-any.whl (890 kB)
Installing collected packages: fonttools, matplotlib
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.2.2
    Uninstalling matplotlib-3.2.2:
      Successfully uninstalled matplotlib-3.2.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.5 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
Successfully installed fonttools-4.28.5 matplotlib-3.5.1


Collecting scipy
  Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
Installing collected packages: scipy
  Attempting uninstall: scipy
    Found existing installation: scipy 1.4.1
    Uninstalling scipy-1.4.1:
      Successfully uninstalled scipy-1.4.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.21.5 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
Successfully installed scipy-1.7.3
Collecting pandas
  Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Installing collected packages: pandas
  Attempt

In [6]:
# Import libraries 
import librosa
import librosa.display  # the display submodule must be imported explicitly for waveplot/specshow
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from matplotlib.pyplot import specgram
import pandas as pd
import glob 
from sklearn.metrics import confusion_matrix
import IPython.display as ipd  
import os
import sys
import warnings
import subprocess

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report,confusion_matrix, accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input, Flatten, Dropout, Activation, BatchNormalization
from keras.layers import Conv1D, MaxPooling1D
from keras.models import Model

if not sys.warnoptions:
    warnings.simplefilter("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 


In [7]:
# Define a function to convert the video files (MP4 format) to audio files (WAV format)

def fn_ConvertMP4ToWAV(v_path_VideoFiles, v_path_ConvertedVideoFiles):
    """ Convert MP4 video files to WAV audio files. """
    # ffmpeg does not create the output directory, so make sure it exists
    os.makedirs(v_path_ConvertedVideoFiles, exist_ok=True)
    i = 0
    for root, dirs, files in os.walk(v_path_VideoFiles, topdown=False):
        for name in files:
            i += 1
            # Skip video-only files (modality 02): they have no audio track
            if not name.startswith('02'):
                print(f"Converting file {i}: {name}")
                # os.path.join picks the right separator (the original '\\' only works on Windows)
                src = os.path.join(root, name)
                dst = os.path.join(v_path_ConvertedVideoFiles, name[:-3] + 'wav')
                # Passing the arguments as a list avoids shell quoting problems
                subprocess.call(['ffmpeg', '-i', src, '-ab', '160k', '-ac', '2',
                                 '-ar', '44100', '-vn', dst])

In [8]:
# Define the path variables

path_AudioFiles = './RAVDESS'
path_VideoFiles = './RAVDESS'
path_ConvertedVideoFiles = './converted'

In [9]:
# Call fn_ConvertMP4ToWAV to convert MP4 video files to WAV audio files

%time fn_ConvertMP4ToWAV(path_VideoFiles, path_ConvertedVideoFiles)

Converting file 1:  03-01-01-01-01-01-01.wav
Converting file 2:  03-01-01-01-01-02-01.wav
Converting file 3:  03-01-01-01-02-01-01.wav
Converting file 4:  03-01-01-01-02-02-01.wav
Converting file 5:  03-01-02-01-01-01-01.wav
Converting file 6:  03-01-02-01-01-02-01.wav
Converting file 7:  03-01-02-01-02-01-01.wav
Converting file 8:  03-01-02-01-02-02-01.wav
Converting file 9:  03-01-02-02-01-01-01.wav
Converting file 10:  03-01-02-02-01-02-01.wav
Converting file 11:  03-01-02-02-02-01-01.wav
Converting file 12:  03-01-02-02-02-02-01.wav
Converting file 13:  03-01-03-01-01-01-01.wav
Converting file 14:  03-01-03-01-01-02-01.wav
Converting file 15:  03-01-03-01-02-01-01.wav
Converting file 16:  03-01-03-01-02-02-01.wav
Converting file 17:  03-01-03-02-01-01-01.wav
Converting file 18:  03-01-03-02-01-02-01.wav
Converting file 19:  03-01-03-02-02-01-01.wav
Converting file 20:  03-01-03-02-02-02-01.wav
Converting file 21:  03-01-04-01-01-01-01.wav
Converting file 22:  03-01-04-01-01-02-01.w

### Explore the Data

In [10]:
# Define a function to play the audio track and plot the audio wave and MFCC.

def fn_PlayAudio_PlotAudioWave_PlotMFCC(v_file):
    """Play and plot the audio wave and MFCC for a given audio track"""

    # Load the audio once and reuse it for both plots
    X, sample_rate = librosa.load(v_file, res_type='kaiser_fast')

    # Plot the audio wave
    # (waveplot was removed in librosa 0.10; use librosa.display.waveshow there)
    plt.figure(figsize=(15, 5))
    librosa.display.waveplot(X, sr=sample_rate)

    mfcc = librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40)

    # Plot the MFCC
    plt.figure(figsize=(20, 15))
    plt.subplot(3,1,1)
    librosa.display.specshow(mfcc, x_axis='time')
    plt.ylabel('MFCC')
    plt.colorbar()

    # Returning the Audio object as the cell's last value makes the notebook render the player
    return ipd.Audio(v_file)

In [11]:
# Define a function to compare two audio tracks by plotting their mean MFCC per frame

def fn_CompareAudio_Plot(v_Path1, v_Path2, v_label1, v_label2):
    """Compare 2 audio tracks by plotting their frame-wise mean MFCC"""

    mfcc = {}
    for i, path in enumerate([v_Path1, v_Path2]):
        X, sample_rate = librosa.load(path, res_type='kaiser_fast')
        # axis=0 averages over the 10 coefficients, leaving one value per time frame
        mfcc[i] = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=10), axis=0)

    # Plot both mean-MFCC curves on the same axes
    plt.figure(figsize=(20, 15))
    plt.subplot(3,1,1)
    plt.plot(mfcc[0], label=v_label1)
    plt.plot(mfcc[1], label=v_label2)
    plt.legend()

In [12]:
# Define a function to generate MFCC features and emotion labels from the audio files

def fn_MFCC_Emotion(v_path_AudioFiles, v_path_ConvertedVideoFiles):
    """Feature Generation: MFCC and Emotion"""
    # Use local lists so the function does not depend on globals defined in a later cell
    lst_mfcc = []
    lst_emotion = []
    i = 0
    for path in [v_path_AudioFiles, v_path_ConvertedVideoFiles]:
        for root, dirs, files in os.walk(path):
            for name in files:
                i = i + 1
                print(name, i)

                X, sample_rate = librosa.load(os.path.join(root, name), res_type='kaiser_fast')
                # Average each of the 40 coefficients over time: one 40-dimensional vector per file
                v_mfcc = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)

                # The third filename field is the emotion code (01-08); shift it to 0-7
                v_emotion = int(name[6:8]) - 1

                lst_mfcc.append(v_mfcc)
                lst_emotion.append(v_emotion)
    return lst_mfcc, lst_emotion

In [37]:
import os

lst_mfcc = []
lst_emotion = []
mfcc = []
emotion = []

if not os.path.exists(mfcc_file_path) or not os.path.exists(emotion_file_path):
    print("Running fn_MFCC_Emotion...")
    %time mfcc, emotion =  fn_MFCC_Emotion(path_AudioFiles, path_ConvertedVideoFiles)

    print(mfcc[0:3], emotion[0:3])

    np.save(mfcc_file_path, mfcc)
    np.save(emotion_file_path, emotion)
  
  
elif os.path.exists(mfcc_file_path) and os.path.exists(emotion_file_path):
  print("Loading mfcc and emotion from local filesystem...")
  mfcc = np.load(mfcc_file_path, allow_pickle=True)
  emotion = np.load(emotion_file_path, allow_pickle=True)

X = np.array(mfcc)
y = np.array(emotion)

print(X.shape, y.shape)

Loading mfcc and emotion from local filesystem...
(2880, 40) (2880,)


In [38]:
# Split the data into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=9)

### Model 1: Decision Tree

In [39]:
dt = DecisionTreeClassifier(criterion='entropy', max_depth=10, min_samples_leaf = 3, 
                                 random_state= 9)

%time dt.fit(X_train, y_train)

CPU times: user 212 ms, sys: 0 ns, total: 212 ms
Wall time: 209 ms


DecisionTreeClassifier(criterion='entropy', max_depth=10, min_samples_leaf=3,
                       random_state=9)

In [40]:
# dt was already fitted in the previous cell, so it is not refitted here
y_pred = dt.predict(X_test)

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.68      0.64      0.66        53
           1       0.74      0.72      0.73       128
           2       0.55      0.52      0.53       104
           3       0.52      0.45      0.48       123
           4       0.66      0.65      0.65       114
           5       0.69      0.67      0.68       107
           6       0.66      0.45      0.53       111
           7       0.44      0.69      0.53       124

    accuracy                           0.60       864
   macro avg       0.62      0.60      0.60       864
weighted avg       0.61      0.60      0.60       864
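
The `macro avg` and `weighted avg` rows of the report can be checked by hand. The sketch below is an illustration added here, not part of the original notebook: it copies the rounded per-class precision and support values from the Decision Tree report above, so it reproduces the summary rows only approximately.

```python
# Per-class precision and support copied from the Decision Tree report above
# (values are already rounded to two decimals, so results are approximate)
precision = [0.68, 0.74, 0.55, 0.52, 0.66, 0.69, 0.66, 0.44]
support = [53, 128, 104, 123, 114, 107, 111, 124]

total = sum(support)

# Macro average: every class counts equally
macro_precision = sum(precision) / len(precision)

# Weighted average: every class counts in proportion to its support
weighted_precision = sum(p * s for p, s in zip(precision, support)) / total

print(f"total support:      {total}")
print(f"macro precision:    {macro_precision:.4f}")
print(f"weighted precision: {weighted_precision:.4f}")
```

The two averages differ because the classes are not equally represented: class 0 (neutral) has roughly half the support of the others, since neutral has no strong-intensity recordings.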



### Model 2: Random Forest

In [None]:
rf = RandomForestClassifier(criterion="gini", max_depth=10, max_features="sqrt", 
                                 max_leaf_nodes = 100, min_samples_leaf = 3, min_samples_split = 20, 
                                 n_estimators= 20000, random_state= 9)

%time rf.fit(X_train, y_train)

In [None]:
y_pred = rf.predict(X_test)

print(classification_report(y_test,y_pred))

### Model 3: XGBoost

In [None]:
XGB = XGBClassifier(n_estimators=2000, gamma=0.5,learning_rate=0.1, max_depth = 10)

%time XGB.fit(X_train, y_train)

In [None]:
y_pred = XGB.predict(X_test)

print(classification_report(y_test,y_pred))

### Model 4: CNN (Convolutional Neural Network)

In [None]:
X_train = np.expand_dims(X_train, axis=2)
X_test = np.expand_dims(X_test, axis=2)

X_train.shape, X_test.shape

In [None]:

from tensorflow.keras import optimizers

model = Sequential()
model.add(Conv1D(256, 8, padding='same',input_shape=(X_train.shape[1],1))) 
model.add(Activation('relu'))
model.add(Conv1D(256, 8, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(MaxPooling1D(pool_size=(4)))
model.add(Conv1D(128, 8, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 8, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 8, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 8, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(MaxPooling1D(pool_size=(4)))
model.add(Conv1D(64, 8, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(64, 8, padding='same'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(8)) 
model.add(Activation('softmax'))
opt = optimizers.RMSprop(learning_rate=0.00001, decay=1e-6)

In [None]:
model.summary()
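
Since the summary output is not shown, the tensor shapes can be traced by hand. The following is a rough sketch under stated assumptions: it relies on Keras defaults (stride 1 for `Conv1D`, and `padding='valid'` with `strides = pool_size` for `MaxPooling1D`), and the helper names `conv1d_same` and `max_pool1d` are hypothetical, introduced only for this illustration. Activation, BatchNormalization and Dropout layers are omitted because they do not change the shape.

```python
def conv1d_same(length, filters):
    """Conv1D with padding='same' and stride 1 keeps the length; channels become `filters`."""
    return length, filters


def max_pool1d(length, channels, pool_size):
    """MaxPooling1D with 'valid' padding and strides equal to pool_size."""
    return (length - pool_size) // pool_size + 1, channels


# Shape-changing layers of the CNN above, in order
layers = [
    ("conv", 256), ("conv", 256), ("pool", 4),
    ("conv", 128), ("conv", 128), ("conv", 128), ("conv", 128), ("pool", 4),
    ("conv", 64), ("conv", 64),
]

length, channels = 40, 1  # 40 MFCC values per sample, one input channel
for kind, value in layers:
    if kind == "conv":
        length, channels = conv1d_same(length, value)
    else:
        length, channels = max_pool1d(length, channels, value)
    print(f"{kind:>4}({value:>3}) -> (length={length}, channels={channels})")

flat_features = length * channels           # size after Flatten()
dense_params = flat_features * 8 + 8        # weights plus biases of Dense(8)
print("flattened features:", flat_features)
print("Dense(8) parameters:", dense_params)
```

Under these assumptions, the two pooling layers reduce the 40-step sequence to 10 and then 2 steps, so `Flatten()` feeds 2 x 64 = 128 features into the final 8-way softmax layer.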

In [None]:
import os
if os.path.exists("/content/drive/MyDrive/savedmodels/cnn.h5"):
  print("Loading pre-trained model")
  model = keras.models.load_model("/content/drive/MyDrive/savedmodels/cnn.h5")
else:
  model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
  %time cnn=model.fit(X_train, y_train, batch_size=16, epochs=1000, validation_data=(X_test, y_test))

In [None]:
import os
model_name = 'cnn.h5'
# The path is absolute, so joining it with os.getcwd() had no effect
save_dir = "/content/drive/MyDrive/savedmodels/"
# Save model and weights
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
model_path = os.path.join(save_dir, model_name)
model.save(model_path)
print('Saved trained model at %s ' % model_path)

In [None]:
# Plot the model loss and accuracy 

plt.plot(cnn.history['loss'])
plt.plot(cnn.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()

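# With metrics=['accuracy'], TF 2.x / Keras 2.3+ store the history under 'accuracy' and 'val_accuracy'
plt.plot(cnn.history['accuracy'])
plt.plot(cnn.history['val_accuracy'])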
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

In [None]:
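# predict_classes was removed in TensorFlow 2.6; take the argmax of the softmax output instead
y_pred = np.argmax(model.predict(X_test), axis=-1)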

In [None]:
print(classification_report(y_test, y_pred))

In [None]:
print(confusion_matrix(y_test, y_pred))

# 0 = neutral, 1 = calm, 2 = happy, 3 = sad, 4 = angry, 5 = fearful, 6 = disgust, 7 = surprised
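
The raw matrix printed above is easier to read with the emotion names attached. The following is a minimal sketch, not part of the original notebook: `format_confusion_matrix` is a hypothetical helper, and the `toy` matrix is made-up placeholder data, not a result of any model here. It assumes the scikit-learn convention that rows are true labels and columns are predicted labels.

```python
# Label order follows the mapping in the comment above (index 0 = neutral, ..., 7 = surprised)
EMOTION_NAMES = ["neutral", "calm", "happy", "sad",
                 "angry", "fearful", "disgust", "surprised"]


def format_confusion_matrix(matrix, names=EMOTION_NAMES):
    """Return a text table with rows = true labels and columns = predicted labels."""
    width = max(len(n) for n in names) + 2
    header = " " * width + "".join(n.rjust(width) for n in names)
    lines = [header]
    for name, row in zip(names, matrix):
        lines.append(name.rjust(width) + "".join(str(v).rjust(width) for v in row))
    return "\n".join(lines)


# Made-up placeholder matrix for illustration only (not model output)
toy = [[5 if i == j else 0 for j in range(8)] for i in range(8)]
print(format_confusion_matrix(toy))
```

In the notebook, the same helper could be called as `format_confusion_matrix(confusion_matrix(y_test, y_pred))`, since iterating over a NumPy array yields its rows.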

#### THANK YOU !!