<a href="https://colab.research.google.com/github/spriyam095/LeuronN/blob/master/Speech%20Emotion%20Recogniser/EmotionRecognizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **SPEECH EMOTION RECOGNITION**

Speech emotion recognition is a simple Python mini-project that we are going to practice here. Before going further, we need to install a library (***soundfile***) to read our audio files.

In [3]:
pip install soundfile 

Collecting soundfile
  Downloading https://files.pythonhosted.org/packages/eb/f2/3cbbbf3b96fb9fa91582c438b574cff3f45b29c772f94c400e2c99ef5db9/SoundFile-0.10.3.post1-py2.py3-none-any.whl
Installing collected packages: soundfile
Successfully installed soundfile-0.10.3.post1


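At the start of this project I ran into problems with scikit-learn 0.20 and later, so I removed it and continued with scikit-learn 0.19.1. Two things are worth noticing in the output below. First, the `sklearn` package removed first is only a placeholder (version 0.0); the real scikit-learn 0.21.3 is replaced only when pip installs 0.19.1. Second, pip warns that yellowbrick and imbalanced-learn require scikit-learn >= 0.20, so pinning 0.19.1 leaves those packages with an incompatible version.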

In [4]:
pip uninstall sklearn

Uninstalling sklearn-0.0:
  Would remove:
    /usr/local/lib/python3.6/dist-packages/sklearn-0.0.dist-info/*
Proceed (y/n)? y
  Successfully uninstalled sklearn-0.0


In [5]:
pip install scikit-learn==0.19.1

Collecting scikit-learn==0.19.1
  Downloading https://files.pythonhosted.org/packages/3d/2d/9fbc7baa5f44bc9e88ffb7ed32721b879bfa416573e85031e16f52569bc9/scikit_learn-0.19.1-cp36-cp36m-manylinux1_x86_64.whl (12.4MB)
     |████████████████████████████████| 12.4MB 178kB/s
ERROR: yellowbrick 0.9.1 has requirement scikit-learn>=0.20, but you'll have scikit-learn 0.19.1 which is incompatible.
ERROR: imbalanced-learn 0.4.3 has requirement scikit-learn>=0.20, but you'll have scikit-learn 0.19.1 which is incompatible.
Installing collected packages: scikit-learn
  Found existing installation: scikit-learn 0.21.3
    Uninstalling scikit-learn-0.21.3:
      Successfully uninstalled scikit-learn-0.21.3
Successfully installed scikit-learn-0.19.1


Now we import the necessary libraries. Two of them were new to me when making this project: **glob** and **soundfile**.

In [1]:
import librosa
import soundfile
import glob, os, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
import sklearn
print(sklearn.__version__)

0.19.1


Next, I mount my Google Drive to access the dataset used for training.

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


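The function below **extracts features from an audio file**: MFCCs, chroma and the mel spectrogram, each averaged over time. It is only applied to recordings whose emotion is listed in observed_emotions, which is defined further below in this notebook.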

In [0]:
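#Extract features (mfcc, chroma, mel) from a sound file

def extract_feature(file_name, mfcc, chroma, mel):
    """Return a 1-D vector of time-averaged MFCC, chroma and mel features."""
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate

    # Downmix multi-channel recordings to mono so every feature sees a 1-D signal
    if X.ndim > 1:
        X = X.mean(axis=1)

    result = np.array([])
    if mfcc:
        # 40 MFCCs, each averaged over all time frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        # Chroma is computed from the magnitude of the short-time Fourier transform
        stft = np.abs(librosa.stft(X))
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        # Pass the signal as y=...; newer librosa versions reject it positionally
        mel_feat = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_feat))
    return result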
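Averaging over time is what turns clips of different lengths into vectors of the same size: librosa returns a (coefficients × frames) matrix, and `np.mean(M.T, axis=0)` averages each coefficient across its frames. A minimal pure-Python sketch of that pooling step, where `mean_over_frames` is a hypothetical helper used only for illustration:

```python
def mean_over_frames(frames_by_coeff):
    """Average each coefficient (row) over its time frames (columns).

    Same result as np.mean(M.T, axis=0) for a (coefficients x frames) matrix M.
    """
    return [sum(row) / len(row) for row in frames_by_coeff]

# 2 coefficients x 3 frames -> 2 values; a longer clip would only add columns
print(mean_over_frames([[1.0, 2.0, 3.0], [0.0, 4.0, 8.0]]))  # [2.0, 4.0]
```

However long the recording, the output length equals the number of coefficients, which is why every file ends up with the same number of features.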

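Here we initialize the emotion mappings. **Dict:** *emotions* maps each RAVDESS emotion code to a label, and the list *observed_emotions* holds the labels the model is trained on. You can add any emotion from *emotions* to *observed_emotions*.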

In [0]:
#Emotions in the RAVDESS dataset

emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}
#Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']
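Because load_data() silently skips every file whose label is not in observed_emotions, a misspelled label would simply match nothing. The sketch below (an optional check, not part of the original notebook) validates the list and shows which emotion codes, the third field of the file names, will be kept:

```python
emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']

# A misspelled label would silently match no files, so check the list first
unknown = set(observed_emotions) - set(emotions.values())
if unknown:
    raise ValueError(f"labels not in the RAVDESS mapping: {sorted(unknown)}")

# Emotion codes (third field of the file name) that load_data() will keep
observed_codes = sorted(code for code, label in emotions.items() if label in observed_emotions)
print(observed_codes)  # ['02', '03', '06', '07']
```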

Now, let’s load the data with a function, load_data(), which takes the relative size of the test set as a parameter. x and y start as empty lists, and we use the glob() function from the glob module to get the pathnames of all the sound files in our dataset. Because the dataset is stored on my Google Drive, the pattern is “/content/gdrive/My Drive/Colab Notebooks/Emotionaudio/Actor_*/*.wav”. **Our dataset looks like this:**

![alt text](https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sites/2/2019/09/dataset-simple-python-project.png)
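To see what such a pattern matches, here is a self-contained sketch that builds a small, hypothetical copy of the folder layout in a temporary directory (the file names are made up) and runs the same kind of `Actor_*/*.wav` glob. glob.glob() does not return results in any guaranteed order, so the sketch sorts them:

```python
import glob
import os
import tempfile

# Hypothetical file names, mimicking the Emotionaudio/Actor_*/ layout
sample_files = [
    "Actor_01/03-01-02-01-01-01-01.wav",
    "Actor_02/03-01-06-02-02-01-02.wav",
    "Actor_02/notes.txt",               # not a .wav file: skipped
    "Other/03-01-03-01-01-01-03.wav",   # not inside an Actor_* folder: skipped
]

with tempfile.TemporaryDirectory() as root:
    for rel_path in sample_files:
        path = os.path.join(root, *rel_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty placeholder file

    # Same shape of pattern as in load_data(): any Actor_* folder, any .wav file
    pattern = os.path.join(root, "Actor_*", "*.wav")
    # glob.glob() returns matches in arbitrary order, so sort for a stable result
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)  # ['03-01-02-01-01-01-01.wav', '03-01-06-02-02-01-02.wav']
```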

For each such path, we take the basename of the file and get the emotion by splitting the name on ‘-’ and taking the third field:

![alt text](https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sites/2/2019/09/dataset-2-interesting-python-projects.png)
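The whole file name can be decoded the same way. In this sketch `parse_ravdess_name` is a hypothetical helper, and the field names follow the seven-part RAVDESS naming convention (modality, vocal channel, emotion, intensity, statement, repetition, actor); load_data() itself only uses the third field:

```python
import os

# Seven dash-separated fields, in the order of the RAVDESS naming convention
RAVDESS_FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
                  "statement", "repetition", "actor")

emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}

def parse_ravdess_name(path):
    """Hypothetical helper: split a RAVDESS file name into its coded fields."""
    stem = os.path.splitext(os.path.basename(path))[0]  # drop folder and ".wav"
    parts = stem.split("-")
    if len(parts) != len(RAVDESS_FIELDS):
        raise ValueError(f"unexpected RAVDESS file name: {path!r}")
    info = dict(zip(RAVDESS_FIELDS, parts))
    info["emotion_label"] = emotions[info["emotion"]]  # same lookup as load_data()
    return info

info = parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav")
print(info["emotion"], info["emotion_label"], info["actor"])  # 06 fearful 12
```

Raising an error on unexpected names is a design choice: it surfaces stray files early instead of failing later with a confusing KeyError.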

Using our emotions dictionary, this code is turned into an emotion label, and the function checks whether the label is in observed_emotions; if not, it skips to the next file. Otherwise, it calls extract_feature, stores the returned vector in feature, and appends the feature to x and the emotion to y. So x holds the features and y holds the emotion labels. Finally, the function calls train_test_split with x, y, the test size and a fixed random state, and returns the result (x_train, x_test, y_train, y_test).

In [0]:
#Load the data and extract features for each sound file

def load_data(test_size=0.2):

    x,y=[],[]

    for file in glob.glob("/content/gdrive/My Drive/Colab Notebooks/Emotionaudio/Actor_*/*.wav"):
      
        file_name=os.path.basename(file)
        emotion=emotions[file_name.split("-")[2]]

        if emotion not in observed_emotions:
            continue
        feature=extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)

    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)

Time to split the dataset into training and testing sets! Let’s keep 20% of the data for the test set and use the load_data function for this.

In [0]:
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=load_data(test_size=0.20)

Next, we check the number of samples in the training and test sets.

In [7]:
#Get the shape of the training and testing datasets
print((x_train.shape[0], x_test.shape[0]))

(614, 154)
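These numbers follow from the dataset size. Assuming the full RAVDESS speech set is on the Drive (24 actors; each emotion other than neutral recorded at 2 intensities × 2 statements × 2 repetitions per actor), the four observed emotions give 768 files, and for a float test_size scikit-learn rounds the test count up. A quick check of that arithmetic, with `split_sizes` as a hypothetical helper:

```python
import math

def split_sizes(n_samples, test_size):
    """Train/test sizes for a float test_size: the test count is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# Assumes the full RAVDESS speech set: 24 actors, and for each non-neutral emotion
# 2 intensities x 2 statements x 2 repetitions per actor
files_per_emotion = 24 * 2 * 2 * 2   # 192
n_samples = files_per_emotion * 4    # calm, happy, fearful, disgust -> 768
print(n_samples, split_sizes(n_samples, 0.2))  # 768 (614, 154)
```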


Now we check how many features were extracted for each audio file.

In [8]:
#Get the number of features extracted
print(f'Features extracted: {x_train.shape[1]}')

Features extracted: 180
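The 180 comes from stacking the three feature groups in extract_feature: 40 MFCCs (n_mfcc=40), 12 chroma bins and 128 mel bands (librosa's defaults for chroma_stft and melspectrogram). The sketch below, using a hypothetical `feature_layout` helper, records which columns hold which group, which is handy if you later want to experiment with a subset of features:

```python
def feature_layout(mfcc=True, chroma=True, mel=True, n_mfcc=40, n_chroma=12, n_mels=128):
    """Return {group: (start, end)} column ranges, in the hstack order used by extract_feature."""
    layout = {}
    start = 0
    for name, enabled, size in (("mfcc", mfcc, n_mfcc),
                                ("chroma", chroma, n_chroma),
                                ("mel", mel, n_mels)):
        if enabled:
            layout[name] = (start, start + size)
            start += size
    return layout

layout = feature_layout()
print(layout)  # {'mfcc': (0, 40), 'chroma': (40, 52), 'mel': (52, 180)}
```

With the ranges known, `x_train[:, 0:40]` would select only the MFCC columns, for example.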


Now, let’s initialize an MLPClassifier. This is a Multi-layer Perceptron Classifier: it optimizes the log-loss function using L-BFGS, stochastic gradient descent, or Adam (the default solver, as the output after training shows). Unlike SVM or Naive Bayes, the MLPClassifier classifies with an internal neural network, specifically a feedforward artificial neural network (ANN).
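To make “feedforward” concrete, here is a tiny pure-Python forward pass with one ReLU hidden layer and a softmax output, the structure MLPClassifier uses for a multi-class problem with activation='relu'. The weights are made-up numbers for illustration; in the real model they are learned during fit() and the hidden layer has 280 units:

```python
import math

def dense(x, W, b):
    """Fully connected layer; W has shape (n_in, n_out), like sklearn's coefs_."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(x, W1, b1, W2, b2):
    """Input -> ReLU hidden layer -> softmax class probabilities."""
    hidden = relu(dense(x, W1, b1))
    return softmax(dense(hidden, W2, b2))

# Made-up weights: 3 input features, 2 hidden units, 3 classes
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0, 0.5], [0.3, 0.2, -0.4]]
b2 = [0.0, 0.0, 0.0]

probs = mlp_forward([1.0, -2.0, 0.5], W1, b1, W2, b2)
print(probs.index(max(probs)))  # 0: the first class gets the highest probability
```

Information only flows forward, from inputs through the hidden layer to the outputs; the predicted emotion is the class with the highest probability.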

In [0]:
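#Initialize the Multi Layer Perceptron Classifier
# Note: learning_rate='adaptive' only takes effect with solver='sgd'; with the default
# solver ('adam') it is ignored. random_state is not set, so results vary between runs.
model=MLPClassifier(alpha=0.01, batch_size=230, epsilon=1e-08,
                    hidden_layer_sizes=(280,), learning_rate='adaptive', max_iter=1000)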
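The size of this network follows from the layer widths: 180 input features, 280 hidden units and 4 output classes (one per observed emotion). A sketch that counts the weights and biases of such a fully connected network, with `mlp_param_count` as a hypothetical helper:

```python
def mlp_param_count(layer_sizes):
    """Count weights plus biases in a fully connected feedforward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + one bias per output unit
    return total

# 180 input features -> 280 hidden units -> 4 emotion classes
print(mlp_param_count([180, 280, 4]))  # 51804
```

After fitting, `sum(w.size for w in model.coefs_) + sum(b.size for b in model.intercepts_)` should give the same number.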

Next, we train it on the training data.

In [50]:
#Training the model
model.fit(x_train,y_train)

MLPClassifier(activation='relu', alpha=0.01, batch_size=230, beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(280,), learning_rate='adaptive',
       learning_rate_init=0.001, max_iter=1000, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)

Next, we use the model to predict the emotions in the test set.

In [51]:
#Predict for the test set
y_pred=model.predict(x_test)
print(y_pred)

['calm' 'disgust' 'happy' 'happy' 'disgust' 'calm' 'happy' 'happy' 'calm'
 'happy' 'happy' 'happy' 'happy' 'happy' 'calm' 'fearful' 'calm' 'happy'
 'disgust' 'calm' 'calm' 'fearful' 'calm' 'calm' 'happy' 'calm' 'disgust'
 'disgust' 'calm' 'disgust' 'happy' 'disgust' 'happy' 'disgust' 'fearful'
 'calm' 'calm' 'fearful' 'calm' 'calm' 'happy' 'disgust' 'calm' 'calm'
 'happy' 'calm' 'disgust' 'happy' 'calm' 'fearful' 'happy' 'fearful'
 'calm' 'fearful' 'calm' 'calm' 'calm' 'calm' 'calm' 'calm' 'calm' 'calm'
 'happy' 'fearful' 'disgust' 'calm' 'calm' 'calm' 'calm' 'calm' 'fearful'
 'fearful' 'happy' 'fearful' 'fearful' 'disgust' 'calm' 'happy' 'disgust'
 'fearful' 'fearful' 'disgust' 'happy' 'calm' 'fearful' 'calm' 'calm'
 'disgust' 'disgust' 'disgust' 'fearful' 'calm' 'calm' 'fearful' 'happy'
 'calm' 'calm' 'calm' 'calm' 'disgust' 'fearful' 'calm' 'disgust' 'calm'
 'fearful' 'calm' 'happy' 'happy' 'calm' 'fearful' 'fearful' 'fearful'
 'calm' 'fearful' 'fearful' 'calm' 'calm' 'fearful' 'cal

Now we check its accuracy on our test data.

In [52]:
#Calculate the accuracy of our model
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)

#Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 66.23%
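Accuracy is the fraction of test samples whose predicted label matches the true one. A minimal pure-Python equivalent of accuracy_score for plain label lists (a sketch, not scikit-learn's implementation), plus a check of what 66.23% means on the 154 test samples:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(['calm', 'happy', 'fearful'], ['calm', 'fearful', 'fearful']))  # 2 of 3 correct
# 66.23% on the 154 test samples corresponds to 102 correct predictions
print("Accuracy: {:.2f}%".format(102 / 154 * 100))  # Accuracy: 66.23%
```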


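For now, the model reaches 66.23% accuracy on the test set. Because random_state is not fixed in the classifier, this figure will vary somewhat between runs, and it can likely be improved with a few refinements.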

# THE END