# SVM Classifier for SER (Speech Emotion Recognition)

Support Vector Machines (SVMs) with non-linear kernels are among the most widely and successfully applied classifiers for speech emotion recognition.

An SVM with a non-linear kernel implicitly maps the input feature vectors into a higher-dimensional feature space through a kernel function. With an appropriate non-linear kernel, a decision boundary that is non-linear in the original input space becomes linear (a hyperplane) in the feature space.
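A small NumPy sketch of this idea, using a degree-2 polynomial kernel chosen purely for illustration (scikit-learn's `SVC` defaults to the RBF kernel, and this notebook does not use the polynomial kernel). The helper names `phi` and `poly2_kernel` are hypothetical. The sketch checks that the kernel value computed in the 2-D input space equals the inner product of the explicitly mapped 3-D vectors, and that a circular boundary in input space becomes a plane in feature space.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: k(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2


a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # inner product computed in the 3-D feature space
implicit = poly2_kernel(a, b)      # same value, computed directly in the 2-D input space
print(np.isclose(explicit, implicit))  # True

# The circle x1^2 + x2^2 = 1 (non-linear in 2-D) is the plane z1 + z3 = 1 in feature space,
# so a linear classifier on phi(x) can separate points inside and outside the circle.
inside, outside = np.array([0.3, 0.4]), np.array([1.2, -0.9])
print(phi(inside)[0] + phi(inside)[2] < 1, phi(outside)[0] + phi(outside)[2] < 1)  # True False
```

In practice the SVM never builds `phi(x)` explicitly; it only evaluates the kernel, which is what makes very high-dimensional (or, for RBF, infinite-dimensional) feature spaces affordable.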

## Dataset

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) is used here; it is free to download. The full dataset has 7356 files, each rated 10 times on emotional validity, intensity, and genuineness by a pool of 247 raters. Here only the speech files of all actors (01-24) are used; they are available under the path ../datasets/RAVDESS. This subset contains 1440 files: 60 trials per actor × 24 actors = 1440.
Each file name consists of seven two-digit identifiers separated by hyphens (e.g. 03-01-01-01-01-01-01.wav), in this order:
<ol>
<li>Modality (01 = full-AV, 02 = video-only, 03 = audio-only).</li>
<li>Vocal channel (01 = speech, 02 = song).</li>
<li>Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).</li>
<li>Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.</li>
<li>Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").</li>
<li>Repetition (01 = 1st repetition, 02 = 2nd repetition).</li>
<li>Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).</li>
</ol>
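The identifiers above fully determine the size of the speech subset: neutral contributes 1 intensity × 2 statements × 2 repetitions = 4 files per actor, and each of the other seven emotions contributes 2 × 2 × 2 = 8, giving 4 + 56 = 60 files per actor and 60 × 24 = 1440 in total. A minimal sketch, assuming file names follow exactly the 7-field pattern listed above; `parse_ravdess_name` and `speech_file_names` are hypothetical helpers, not part of any library.

```python
from itertools import product

# Lookup tables transcribed from the identifier list above
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
           "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {"01": "Kids are talking by the door", "02": "Dogs are sitting by the door"}


def parse_ravdess_name(file_name):
    """Decode a RAVDESS file name such as '03-01-05-02-01-02-12.wav' into its 7 fields."""
    stem = file_name.rsplit(".", 1)[0]
    modality, channel, emotion, intensity, statement, repetition, actor = stem.split("-")
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        "sex": "male" if actor_id % 2 == 1 else "female",  # odd = male, even = female
    }


def speech_file_names(actor):
    """Enumerate the audio-only speech file names expected for one actor."""
    names = []
    for emotion, intensity, statement, repetition in product(EMOTION, INTENSITY, STATEMENT, ("01", "02")):
        if emotion == "01" and intensity == "02":
            continue  # 'neutral' has no strong intensity
        names.append(f"03-01-{emotion}-{intensity}-{statement}-{repetition}-{actor:02d}.wav")
    return names


print(parse_ravdess_name("03-01-05-02-01-02-12.wav")["emotion"])  # angry
print(len(speech_file_names(1)))  # 60
```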

## Installing Libraries

In [1]:
#!pip install librosa soundfile numpy scikit-learn

In [2]:
#!pip install soundfile

In [3]:
#!pip install seaborn

## Importing Libraries

In [4]:
# Import our libraries (duplicate imports removed)
import os
import glob
import time

import librosa
import librosa.display
import soundfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

## Feature Extraction

The function below loads an audio file from the given path, resamples it to librosa's default rate of 22050 Hz, computes 40 MFCCs per frame, and averages each coefficient over time, returning a 40-dimensional feature vector.


In [5]:
# Extract features from a given sound file (full path).
# Only MFCCs are computed: 40 coefficients per frame, averaged over all frames.
def extract_feature(file_name, mfcc):
    X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    result = np.array([])
    if mfcc:
        # librosa returns shape (n_mfcc, n_frames); transpose and average over frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    return result
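`librosa.feature.mfcc` returns a matrix of shape (n_mfcc, n_frames), where the number of frames grows with clip length. Averaging over the frame axis turns clips of any length into a fixed-length 40-dimensional vector, which is what the SVM needs. The sketch below uses random matrices as stand-ins for real MFCCs (an assumption made so it runs without audio files); `pool_over_time` is a hypothetical helper that mirrors the `np.mean(... .T, axis=0)` line above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the MFCC matrices of two clips with different durations: (n_mfcc, n_frames)
short_clip = rng.normal(size=(40, 120))
long_clip = rng.normal(size=(40, 310))


def pool_over_time(mfcc_matrix):
    """Average each coefficient over all frames -> fixed-length vector of size n_mfcc."""
    return np.mean(mfcc_matrix.T, axis=0)  # equivalent to np.mean(mfcc_matrix, axis=1)


print(pool_over_time(short_clip).shape, pool_over_time(long_clip).shape)  # (40,) (40,)
```

Mean pooling discards temporal dynamics; it is a simple, common choice that keeps the feature vector small.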

In [6]:
# Emotion codes used in RAVDESS file names (third field) mapped to labels
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fear',
  '07':'disgust',
  '08':'surprise'
}
# Emotions to observe: all eight emotions, including neutral.
# Remove an entry from this list to exclude that emotion from modeling.
observed_emotions=['neutral','calm', 'happy', 'sad','angry','fear', 'disgust','surprise']

## Retrieve RAVDESS dataset from File System

In [7]:
# dataset paths in the file system
RAV = "../datasets/RAVDESS/"
CSV = "../datasets/CSV/"
# sanity check: list the first five files of actor 01
dir_list = os.listdir(RAV+"Actor_01/")
dir_list[0:5]

['03-01-01-01-01-01-01.wav',
 '03-01-01-01-01-02-01.wav',
 '03-01-01-01-02-01-01.wav',
 '03-01-01-01-02-02-01.wav',
 '03-01-02-01-01-01-01.wav']

In [8]:
# Load the data and extract features for each sound file.
# test_size is currently unused: the train/test split is done later with train_test_split.
def load_data(test_size=0.2):
    x, y = [], []

    # feature to extract
    mfcc = True
    path = RAV + "Actor_*/*.wav"
    for file in glob.glob(path):
        file_name = os.path.basename(file)
        # the third field of the file name is the emotion code (see the emotions dictionary above)
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:  # skip emotions that are not being modeled
            continue
        feature = extract_feature(file, mfcc)
        x.append(feature)
        y.append(emotion)
    return {"X": x, "y": y}

In [9]:
# load data into memory (extracts MFCC features for every file)
Trial_dict = load_data()

In [10]:
X = pd.DataFrame(Trial_dict["X"])
y = pd.DataFrame(Trial_dict["y"])
X.shape, y.shape

((1440, 40), (1440, 1))

In [11]:
# rename the label column to 'emotion' and combine features and labels into one DataFrame
y=y.rename(columns= {0: 'emotion'})
data = pd.concat([X, y], axis =1)
# view the first 10 records
data.head(10)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -707.226318 | 68.469788 | -11.61132 | 22.716902 | -0.303072 | 5.822211 | -6.0815 | -2.655389 | -9.960321 | -5.103168 | ... | -2.744597 | -2.285304 | -2.167415 | -2.929328 | -1.793007 | -0.758816 | -2.190662 | -2.976669 | -2.087532 | neutral |
| 1 | -703.38324 | 70.197769 | -15.213277 | 27.412649 | -0.247905 | 5.837488 | -4.831208 | -4.569034 | -10.60791 | -3.820005 | ... | -3.116829 | -1.922442 | -2.564388 | -2.96968 | -0.872745 | -1.239757 | -3.162743 | -2.940819 | -2.609301 | neutral |
| 2 | -700.794006 | 70.959595 | -11.694939 | 23.595743 | -2.463483 | 6.388802 | -5.021149 | -4.631995 | -9.482592 | -5.633955 | ... | -2.547276 | -2.147709 | -1.945596 | -2.771029 | -1.836579 | -1.192164 | -2.6774 | -3.442389 | -2.3996 | neutral |
| 3 | -694.82605 | 69.669205 | -9.815083 | 23.888597 | -1.381263 | 8.620013 | -4.805181 | -6.055672 | -9.54414 | -5.380029 | ... | -2.928294 | -2.366574 | -3.042785 | -3.062823 | -1.447675 | -0.979923 | -2.325146 | -3.415248 | -3.38712 | neutral |
| 4 | -737.437988 | 77.273209 | -11.190391 | 26.755884 | -1.537418 | 8.085284 | -7.015995 | -3.00623 | -8.947398 | -7.313802 | ... | -2.158252 | -1.256279 | -2.996467 | -1.348934 | -0.796627 | -1.648554 | -1.981015 | -3.455514 | -3.098281 | calm |
| 5 | -716.575806 | 79.555 | -9.764153 | 21.266054 | 1.637321 | 6.441747 | -7.460036 | -3.939013 | -7.493039 | -6.710302 | ... | -3.157654 | -2.751998 | -3.946696 | -3.424494 | -1.561885 | -1.81281 | -3.330558 | -3.927409 | -4.398661 | calm |
| 6 | -708.541504 | 81.566536 | -14.513219 | 26.434237 | -4.558772 | 9.789379 | -5.627738 | -3.479167 | -8.197019 | -6.570935 | ... | -2.898625 | -1.729264 | -3.63845 | -3.722741 | -2.391207 | -1.85956 | -2.648129 | -4.284513 | -3.400892 | calm |
| 7 | -709.393677 | 84.234299 | -13.381046 | 25.719385 | -1.773599 | 8.669509 | -5.068448 | -4.345946 | -7.930424 | -6.252995 | ... | -2.574771 | -1.703177 | -2.932248 | -3.322625 | -1.913512 | -1.099499 | -2.460561 | -3.341412 | -3.574461 | calm |
| 8 | -746.07428 | 87.865776 | -12.613696 | 27.847467 | -2.255568 | 9.403696 | -8.469919 | -1.561309 | -5.440102 | -6.947087 | ... | -4.273163 | -2.579099 | -3.169269 | -4.250158 | -0.516558 | -2.046174 | -3.282803 | -4.088738 | -3.995429 | calm |
| 9 | -709.888733 | 83.872131 | -16.015291 | 24.925694 | -3.676042 | 7.734766 | -8.870955 | -4.643577 | -7.436958 | -6.956243 | ... | -3.525422 | -3.342165 | -3.688268 | -4.542849 | -1.813079 | -1.500344 | -2.933184 | -3.79641 | -4.100177 | calm |


In [12]:
#to shuffle the data
data = data.reindex(np.random.permutation(data.index))
# view the first 10 records after shuffling
data.head(10)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | -609.004395 | 76.845642 | -14.888309 | 28.807346 | -2.702205 | 5.189587 | -8.343067 | -7.548768 | -10.870005 | -4.570981 | ... | -2.347911 | -1.864466 | -1.747489 | -3.931832 | -1.779631 | -0.355756 | -0.414589 | -0.848949 | 0.057137 | fear |
| 1025 | -712.631409 | 69.29776 | -15.960979 | 25.363178 | -10.164749 | 3.10094 | -3.779842 | -11.178854 | -4.597929 | -4.668557 | ... | 0.975181 | 2.828439 | 2.157255 | 1.036434 | 1.004172 | 0.304499 | 1.125913 | 2.925911 | 4.209477 | calm |
| 397 | -646.509827 | 73.816666 | -25.309353 | 22.967854 | -6.723145 | -2.12034 | -18.04644 | -11.309408 | -9.757098 | -3.757479 | ... | 7.06753 | 6.445099 | 9.290625 | 9.819832 | 6.880746 | 5.275755 | 5.206872 | 7.118198 | 7.838283 | fear |
| 1428 | -620.201843 | 70.002907 | -24.914093 | 10.987709 | -27.066187 | -2.287763 | -14.195181 | -16.437902 | -7.25544 | -3.984179 | ... | -1.190491 | -1.268249 | -0.391729 | -0.547336 | -0.354244 | -0.1265 | 0.031642 | -1.225806 | -0.873205 | disgust |
| 316 | -441.296875 | 34.237392 | -68.47316 | 8.243364 | -12.826751 | -14.301535 | -24.665035 | -12.593336 | -9.847479 | 4.736321 | ... | 2.994164 | 3.51914 | 5.466554 | 4.801404 | 3.088497 | 3.574539 | 4.135554 | 3.659224 | 5.5307 | happy |
| 292 | -648.514465 | 75.21479 | -12.750978 | 26.681688 | -7.513033 | 4.592214 | -8.354985 | -0.440473 | -2.600385 | -4.912567 | ... | 1.671791 | 2.728211 | 2.625202 | 2.602403 | 1.571128 | 0.163951 | 0.451711 | 0.545858 | 2.732122 | surprise |
| 1100 | -731.447388 | 85.366699 | 0.675145 | 33.064617 | -2.090637 | 21.219025 | 2.120228 | 0.309854 | -0.382266 | -0.911686 | ... | -1.606565 | -1.065857 | -0.865149 | -0.837827 | 0.689016 | -1.072334 | -2.168283 | -2.170758 | -1.807704 | sad |
| 1173 | -377.792755 | 43.342861 | -47.384853 | 8.843512 | -34.239628 | -9.231688 | -13.71347 | -12.492782 | -5.270347 | -6.70583 | ... | 0.485321 | 1.437978 | 0.701364 | 0.637728 | 1.875052 | 0.642962 | 0.172344 | 1.035005 | 1.483239 | angry |
| 843 | -644.878601 | 85.622337 | -10.219476 | 26.023291 | 2.271103 | 5.725358 | 0.657351 | -0.294617 | -2.395996 | -1.02121 | ... | -0.732088 | -0.236803 | -0.522722 | -0.043064 | 0.284817 | -0.606395 | 0.652074 | -0.96646 | 0.103948 | neutral |
| 660 | -694.867737 | 75.670403 | -15.830236 | 22.82078 | -7.800847 | 1.406281 | -4.516475 | -9.234518 | -8.677084 | -7.546496 | ... | 2.193969 | 3.387321 | 5.181444 | 4.672852 | 4.29501 | 3.764876 | 2.233306 | -0.355897 | -2.060589 | neutral |
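`np.random.permutation` produces a different row order on every run, so the shuffled output above will not be reproduced exactly. If a repeatable shuffle is wanted, one option (an assumption about the desired workflow, not what this notebook does) is pandas' `DataFrame.sample` with `frac=1` and a fixed `random_state`. The toy DataFrame below is purely illustrative.

```python
import pandas as pd

# Toy DataFrame standing in for the feature table (values are arbitrary)
toy = pd.DataFrame({"mfcc_0": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "emotion": ["calm", "sad", "angry", "happy", "fear"]})

# frac=1 returns every row in random order; a fixed random_state makes the order repeatable
shuffled_a = toy.sample(frac=1, random_state=42)
shuffled_b = toy.sample(frac=1, random_state=42)

print(shuffled_a.index.equals(shuffled_b.index))      # True: same seed, same order
print(sorted(shuffled_a.index) == sorted(toy.index))  # True: no rows lost or duplicated
```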


In [13]:
# distribution of MFCC coefficient 15 for each emotion (seaborn is imported as sn)
sn.boxplot(x='emotion', y=15, data=data)
plt.show()

In [None]:
# Save the shuffled data to a CSV file for future use.
os.makedirs(CSV, exist_ok=True)  # create the output folder if it does not exist yet
if os.path.isfile(CSV+"mfcc_feature.csv"):
    print("File already exists! Delete it from the file system first if you want to replace the old data.")
else:
    data.to_csv(CSV+"mfcc_feature.csv")
# view the data dimensions
data.shape
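`to_csv` writes the DataFrame index as an unnamed first column, so reading the file back naively yields an extra `Unnamed: 0` column. A minimal sketch with a toy DataFrame and an in-memory buffer (so no file is written) shows that passing `index_col=0` to `pd.read_csv` restores the original index; note that integer column names come back as strings.

```python
import io

import pandas as pd

# Toy frame with a shuffled index, like the saved feature table (values are arbitrary)
df = pd.DataFrame({0: [1.5, -2.0], 1: [0.25, 3.0], "emotion": ["calm", "sad"]}, index=[7, 3])

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column
buffer.seek(0)
naive = pd.read_csv(buffer)

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # use the first column as the index

print(list(naive.columns))     # ['Unnamed: 0', '0', '1', 'emotion']
print(list(restored.columns))  # ['0', '1', 'emotion']
print(list(restored.index))    # [7, 3]
```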

In [None]:
# check all the columns in the data
data.columns

In [None]:
# Separate features and target outputs:
# X contains the MFCC features, y the corresponding emotion labels.
X = data.drop('emotion', axis = 1).values
y = data['emotion'].values
# check the shapes of X and y
X.shape, y.shape

## SVM (Support Vector Machine) Classifier and Fit/Train the Model
Implementing SVM with Scikit-Learn:

The `train_test_split` function from Scikit-Learn's `model_selection` module splits the data into training and test sets (here 80% / 20%).

Scikit-Learn's `svm` module contains built-in classes for several SVM algorithms; an `SVC` with a linear kernel is trained on the training data.
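A minimal, self-contained sketch of the same two steps on synthetic data (generated with `make_classification`, not RAVDESS features). It additionally passes `stratify` and `random_state` to `train_test_split`; both are optional and not used in this notebook, but they keep the class proportions equal across the splits and make the split repeatable.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the MFCC table: 400 samples, 40 features, 4 classes
X_demo, y_demo = make_classification(n_samples=400, n_features=40, n_informative=10,
                                     n_classes=4, random_state=0)

# stratify keeps class proportions equal in both splits; random_state makes the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20,
                                          stratify=y_demo, random_state=42)

clf = SVC(kernel="linear").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
print("train class counts:", Counter(y_tr))
print("test class counts:", Counter(y_te))
```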

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
svclassifier = SVC(kernel = 'linear')

In [None]:
svclassifier.fit(X_train, y_train)

In [None]:
y_pred = svclassifier.predict(X_test)

In [None]:
accuracy=accuracy_score(y_test, y_pred)
#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))
#Print the report
print("Statistics:")
print(classification_report(y_test,y_pred))

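print("Confusion Matrix:")
# order rows/columns by the classifier's class labels so the heatmap axes can be labelled
labels = svclassifier.classes_
matrix = confusion_matrix(y_test, y_pred, labels=labels)
df_matrix = pd.DataFrame(matrix, index=labels, columns=labels)
sn.heatmap(df_matrix, annot=True, fmt='d')
plt.xlabel("Predicted label")
plt.ylabel("True label")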
plt.show()

In [None]:
#calculate and display the train and test accuracy
train_accuracy = float(svclassifier.score(X_train, y_train)*100)
print("Train Accuracy: {:.2f}%".format(train_accuracy))
test_accuracy = float(svclassifier.score(X_test, y_test)*100)
print("Test Accuracy: {:.2f}%".format(test_accuracy))

## Cross-Validation
Comparing the training accuracy with the test accuracy gives a first indication of overfitting. For a more reliable estimate, 5-fold cross-validation is run over the whole dataset with the same SVC classifier.
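With an integer `cv`, `cross_val_score` uses stratified, unshuffled folds for classifiers, which is acceptable here because the data were shuffled earlier. A sketch on synthetic data, assuming explicitly shuffled folds are preferred, passes a `StratifiedKFold` object instead and summarises the fold scores by their mean and standard deviation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (not RAVDESS): 300 samples, 40 features, 4 classes
X_demo, y_demo = make_classification(n_samples=300, n_features=40, n_informative=10,
                                     n_classes=4, random_state=0)

# 5 stratified folds, shuffled once with a fixed seed
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVC(kernel="linear"), X_demo, y_demo, cv=cv)

print(scores)  # one accuracy per fold
print("mean accuracy = {:.3f} (+/- {:.3f})".format(scores.mean(), scores.std()))
```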

In [None]:
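# 5-fold cross-validation (cv = 5)
cv_results = cross_val_score(svclassifier, X, y, cv = 5)
print(cv_results)
print("Mean CV accuracy: {:.2f}% (+/- {:.2f}%)".format(cv_results.mean()*100, cv_results.std()*100))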

## Scaling

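SVMs are sensitive to the scale of the input features, and the MFCC coefficients differ widely in range (coefficient 0 is around -700, while the higher coefficients are close to 0, as the tables above show). It is therefore usually helpful to standardize the features with the mean and standard deviation of the training data and to apply the same transform to the test data.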
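Per feature, `StandardScaler` computes z = (x - mean_train) / std_train. A NumPy sketch on toy data (two features on very different scales, loosely mimicking MFCC 0 versus the higher coefficients) computes this by hand from the training split only and checks it against `StandardScaler`; both use the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: column 0 around -700 (like MFCC 0), column 1 around 0 (like higher coefficients)
X_train_demo = np.column_stack([rng.normal(-700, 50, 100), rng.normal(0, 3, 100)])
X_test_demo = np.column_stack([rng.normal(-690, 50, 20), rng.normal(0, 3, 20)])

# Mean and standard deviation are estimated on the training split only ...
mu = X_train_demo.mean(axis=0)
sigma = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# ... and the same transform is applied to both splits
X_train_std = (X_train_demo - mu) / sigma
X_test_std = (X_test_demo - mu) / sigma

scaler = StandardScaler().fit(X_train_demo)
print(np.allclose(scaler.transform(X_test_demo), X_test_std))  # True
```

Fitting the scaler on the test data as well would leak information about the test set into training; placing the scaler inside a `Pipeline` avoids this automatically, including within cross-validation folds.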

In [None]:
#splitting dataset into train/ test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

In [None]:
# Set up the pipeline steps, create the pipeline and fit it to the training set: svc_scaled
# (kernel='rbf' is SVC's default; it is stated explicitly here for clarity)
steps = [('scaler', StandardScaler()),
        ('SVM', SVC(kernel='rbf'))]
pipeline = Pipeline(steps)
svc_scaled = pipeline.fit(X_train, y_train)

In [None]:
# Fit the same (RBF) SVC to the unscaled data so that only scaling differs, then compute and print metrics
svc_unscaled = SVC(kernel = 'rbf').fit(X_train, y_train)
Scaling_accuracy = float(svc_scaled.score(X_test, y_test)*100)
Non_Scaling_accuracy = float(svc_unscaled.score(X_test, y_test)*100)
print('Accuracy with Scaling: {:.2f}%'.format(Scaling_accuracy))
print('Accuracy without Scaling: {:.2f}%'.format(Non_Scaling_accuracy))

## Generalization check
Compare the training and test scores of the scaled model to check for overfitting or underfitting.

In [None]:
train_accuracy = float(svc_scaled.score(X_train, y_train)*100)
print("Train Accuracy: {:.2f}%".format(train_accuracy))
test_accuracy = float(svc_scaled.score(X_test, y_test)*100)
print("Test Accuracy: {:.2f}%".format(test_accuracy))

In [None]:
scaled_predictions = svc_scaled.predict(X_test)

In [None]:
accuracy=accuracy_score(y_test, scaled_predictions)

#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))

#Print the report
print("Statistics:")
print(classification_report(y_test,scaled_predictions))

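print("Confusion Matrix:")
# order rows/columns by the pipeline's class labels so the heatmap axes can be labelled
labels = svc_scaled.classes_
cm = confusion_matrix(y_test, scaled_predictions, labels=labels)
df_cm = pd.DataFrame(cm, index=labels, columns=labels)
sn.heatmap(df_cm, annot=True, fmt='d')
plt.xlabel("Predicted label")
plt.ylabel("True label")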
plt.show()

In [None]:
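# the scaler inside the pipeline is re-fitted on the training part of each fold
cv_results = cross_val_score(svc_scaled, X, y, cv = 5)
print(cv_results)
print("Mean CV accuracy: {:.2f}% (+/- {:.2f}%)".format(cv_results.mean()*100, cv_results.std()*100))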