# Speech Emotion Recognition

Voice often reflects underlying emotion through tone and pitch. Speech Emotion Recognition (SER) builds on this observation: it is the task of recognizing human emotion and affective states from speech. In this Python project I build a model that recognizes emotion from sound files.

### Installing the necessary libraries

In [1]:
!pip install librosa



In [2]:
!pip install pyaudio



In [4]:
!pip install soundfile



### Importing the required libraries

In [5]:
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

### Extracting features from a sound file

In [8]:
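def extract_feature(file_name, mfcc, chroma, mel):
    """Return a 1-D vector of time-averaged features for one sound file.

    mfcc, chroma and mel are booleans that select which features to include.
    """
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
    result = np.array([])
    if mfcc:
        # 40 MFCCs per frame, averaged over all frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        # Chroma is computed from the magnitude spectrogram (STFT)
        stft = np.abs(librosa.stft(X))
        chroma_mean = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_mean))
    if mel:
        # Pass the signal as y=...; recent librosa versions reject it as a positional argument
        mel_mean = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_mean))
    return result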
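
Each librosa feature call returns a matrix with one row per coefficient and one column per audio frame. Transposing it and averaging over axis 0 collapses it to one value per coefficient, so clips of different lengths yield vectors of the same size. The sketch below illustrates this pooling step on random stand-in matrices; `pool_over_time` is a hypothetical helper, and the frame counts 87 and 130 are arbitrary assumptions rather than values from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_over_time(feature_matrix):
    """Average a (n_coefficients, n_frames) matrix over its frames."""
    return np.mean(feature_matrix.T, axis=0)

# Random stand-ins for the MFCC matrices of two clips of different duration.
short_clip = rng.normal(size=(40, 87))
long_clip = rng.normal(size=(40, 130))

print(pool_over_time(short_clip).shape)  # (40,)
print(pool_over_time(long_clip).shape)   # (40,)
```

Averaging discards how the features evolve over time, which keeps the input to the classifier small at the cost of temporal detail.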

### Given and required emotions

In [9]:
# Emotions in the RAVDESS dataset
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

# Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']

### Loading the dataset and extracting features for each sound file

In [16]:
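def load_data(test_size=0.2):
    """Extract features for every observed-emotion file and return a train/test split."""
    x,y=[],[]
    for file in glob.glob("/Users/anweashasaha/Downloads/speech-emotion-recognition-ravdess-data/Actor_*/*.wav"):
        file_name=os.path.basename(file)
        # The third hyphen-separated field of a RAVDESS file name is the emotion code
        emotion=emotions[file_name.split("-")[2]]
        # Skip files whose emotion is not being observed
        if emotion not in observed_emotions:
            continue
        feature=extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    # A fixed random_state keeps the split reproducible across runs
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)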
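
RAVDESS file names consist of seven two-digit fields separated by hyphens, and the third field is the emotion code, which is why `load_data` reads `file_name.split("-")[2]`. A minimal sketch of that lookup follows; `emotion_from_filename` is a hypothetical helper that is not part of the notebook, and the file name is only an illustrative example.

```python
import os

# Same mapping as the notebook's `emotions` dictionary.
emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}

def emotion_from_filename(path):
    """Map a RAVDESS file path to its emotion label via the third name field."""
    file_name = os.path.basename(path)
    emotion_code = file_name.split("-")[2]
    return emotions[emotion_code]

print(emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))  # fearful
```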

### Splitting the dataset

In [17]:
x_train,x_test,y_train,y_test=load_data(test_size=0.25)

In [18]:
print((x_train.shape[0], x_test.shape[0]))

(576, 192)
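
These counts follow from the 25% test fraction: 576 + 192 = 768 files matched the four observed emotions, and `train_test_split` rounds the test share up to a whole sample. A quick check of the arithmetic, taking the total of 768 from the output above:

```python
import math

n_samples = 576 + 192   # files kept after filtering, from the output above
test_size = 0.25

# train_test_split rounds the test count up and gives the rest to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print((n_train, n_test))  # (576, 192)
```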


In [20]:
print(f'Features extracted: {x_train.shape[1]}')

Features extracted: 180
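
The figure of 180 is the sum of the three feature groups concatenated in `extract_feature`. The breakdown below assumes librosa's default sizes for the chroma and mel features, since the notebook sets only `n_mfcc` explicitly:

```python
n_mfcc = 40     # set explicitly via n_mfcc=40 in extract_feature
n_chroma = 12   # librosa's default number of chroma bins
n_mels = 128    # librosa's default number of mel bands

total_features = n_mfcc + n_chroma + n_mels
print(total_features)  # 180
```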


### Initializing the Multi-Layer Perceptron Classifier

In [22]:
model=MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, hidden_layer_sizes=(300,), learning_rate='adaptive', max_iter=500)
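
To give a sense of the model's size, `hidden_layer_sizes=(300,)` means a single hidden layer of 300 units between the 180 input features and the 4 emotion classes. The rough parameter count below assumes one output unit per observed emotion and one bias per unit:

```python
n_inputs = 180    # features per sample
n_hidden = 300    # hidden_layer_sizes=(300,)
n_outputs = 4     # one output per observed emotion (assumed)

# Each layer has a weight per input-unit pair plus one bias per unit.
hidden_params = n_inputs * n_hidden + n_hidden
output_params = n_hidden * n_outputs + n_outputs

total_params = hidden_params + output_params
print(total_params)  # 55504
```

With only 576 training samples, this is far more parameters than examples, which is one reason the L2 penalty `alpha=0.01` is worth keeping.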

### Training the model

In [23]:
model.fit(x_train,y_train)

MLPClassifier(alpha=0.01, batch_size=256, hidden_layer_sizes=(300,),
              learning_rate='adaptive', max_iter=500)

### Predicting for the test set

In [24]:
y_pred=model.predict(x_test)

### Calculating the accuracy

In [25]:
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)

print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 75.00%


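In this run, the model classifies 75% of the 192 test samples correctly. Because `MLPClassifier` is created without a `random_state`, the exact figure can vary between runs.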
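
A single accuracy figure hides how the model does on each emotion. The sketch below computes per-class recall with the standard library only; `per_class_recall` is a hypothetical helper, and the labels are made up for illustration, so they are not the notebook's predictions.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Made-up labels for illustration only.
toy_true = ['calm', 'calm', 'happy', 'happy', 'fearful', 'fearful', 'disgust', 'disgust']
toy_pred = ['calm', 'calm', 'happy', 'fearful', 'fearful', 'fearful', 'disgust', 'happy']

print(per_class_recall(toy_true, toy_pred))
# {'calm': 1.0, 'happy': 0.5, 'fearful': 1.0, 'disgust': 0.5}
```

With the notebook's own arrays, the call would be `per_class_recall(y_test, y_pred)`; scikit-learn's `classification_report` reports the same quantity alongside precision and F1.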