# Problem statement

Speech emotion detection is a good Python mini project. The best real-world example of it can be seen in call centers. If you have ever noticed, call center employees never talk to every customer in the same manner: their way of pitching and talking changes with the customer. Ordinary people do this too, so how is it relevant to call centers? The answer is that the employees recognize customers' emotions from their speech, which lets them improve their service and convert more customers. In this way, they are performing speech emotion detection. In this notebook, a classifier learns to do the same thing from recorded speech in the RAVDESS dataset, predicting the speaker's emotion together with their gender.

In [2]:
import pandas as pd
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from tqdm import tqdm
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')

In [3]:
import os
# List every file under the dataset directory to check that the download is complete
for dirname, _, filenames in os.walk('/ml_project/ravdess-emotional-speech-audio'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [4]:
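def extract_feature(file_name):
    # Load the clip; librosa resamples to 22,050 Hz mono by default
    X, sample_rate = librosa.load(file_name)
    # Magnitude spectrogram, reused for the chroma features
    stft = np.abs(librosa.stft(X))
    result = np.array([])
    # 40 MFCCs averaged over time -> 40 values
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    result = np.hstack((result, mfccs))
    # 12 chroma bins averaged over time -> 12 values
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    result = np.hstack((result, chroma))
    # 128 mel bands (librosa's default n_mels) averaged over time -> 128 values.
    # The signal must be passed as y=...: recent librosa versions reject it positionally.
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
    result = np.hstack((result, mel))
    return result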
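Each librosa feature function returns a matrix of shape `(n_coefficients, n_frames)`. Transposing it and averaging over axis 0 collapses the time axis, so every clip yields a fixed-length vector regardless of its duration. The sketch below reproduces that pooling with random NumPy arrays standing in for the real MFCC, chroma and mel matrices; the frame count of 150 is an arbitrary assumption, and the 12 chroma bins and 128 mel bands assume librosa's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 150  # arbitrary clip length in frames (stand-in value)

# Stand-ins shaped like librosa outputs: (n_coefficients, n_frames)
mfcc = rng.normal(size=(40, n_frames))
chroma = rng.random(size=(12, n_frames))
mel = rng.random(size=(128, n_frames))

def pool(feature_matrix):
    """Average over time, as np.mean(M.T, axis=0) does in extract_feature."""
    return np.mean(feature_matrix.T, axis=0)

vector = np.hstack([pool(mfcc), pool(chroma), pool(mel)])
print(vector.shape)  # (180,)

# Averaging the transpose over axis 0 equals averaging the original over axis 1
assert np.allclose(pool(mfcc), mfcc.mean(axis=1))
```

The 40 + 12 + 128 = 180 values match the feature count the notebook reports after scaling.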

In [5]:
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

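def gender(g):
    # g is the last file-name field, e.g. "12.wav"; its first two digits are the actor ID.
    # RAVDESS actors with even IDs are female, actors with odd IDs are male.
    if int(g[0:2]) % 2 == 0:
        return 'female'
    else:
        return 'male'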
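Both the emotion code and the gender come from the RAVDESS file name, a seven-part numeric identifier: modality, vocal channel, emotion, emotional intensity, statement, repetition and actor, for example `03-01-06-01-02-01-12.wav`. The sketch below splits such a name into its fields and derives the same composite label the notebook uses; the helper `parse_ravdess_name` and the field names are hypothetical, not part of the original code.

```python
import os

EMOTIONS = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}
FIELDS = ['modality', 'vocal_channel', 'emotion', 'intensity',
          'statement', 'repetition', 'actor']

def parse_ravdess_name(path):
    """Hypothetical helper: split a RAVDESS file name into its seven fields."""
    stem, _ = os.path.splitext(os.path.basename(path))
    parts = stem.split('-')
    if len(parts) != len(FIELDS):
        raise ValueError(f'unexpected file name: {path}')
    info = dict(zip(FIELDS, parts))
    # Even actor IDs are female, odd actor IDs are male
    info['gender'] = 'female' if int(info['actor']) % 2 == 0 else 'male'
    info['label'] = EMOTIONS[info['emotion']] + '_' + info['gender']
    return info

info = parse_ravdess_name('Actor_12/03-01-06-01-02-01-12.wav')
print(info['label'])  # fearful_female
```

Stripping the extension with `os.path.splitext` avoids relying on slicing `g[0:2]`, which works only because actor IDs always have two digits. With 8 emotions and 2 genders, the labels span 16 classes.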

In [6]:
def load_data(test_size=0.2):
    x, y = [], []
    for file in tqdm(glob.glob("../ml_project/ravdess-emotional-speech-audio/Actor_*/*.wav")):
        file_name = os.path.basename(file)
        parts = file_name.split("-")
        # Third field = emotion code; last field = actor ID, which gives the gender
        emotion = emotions[parts[2]] + '_' + gender(parts[-1])
        feature = extract_feature(file)
        x.append(feature)
        y.append(emotion)
    # A fixed random_state makes the split reproducible
    return train_test_split(np.array(x), y, test_size=test_size, random_state=1)
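Extracting features from all 1,440 clips takes several minutes (07:48 in the run that follows), and `pickle` is already imported, so one option is to cache the extracted features on disk and reload them on later runs. This is an assumption rather than part of the original notebook: `cached_features` and the cache file name are hypothetical, and `build` stands for any function returning the `(x, y)` lists, such as the loop inside `load_data`.

```python
import os
import pickle
import tempfile

def cached_features(cache_path, build):
    """Hypothetical helper: load (x, y) from cache_path, or build and save them."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    x, y = build()
    with open(cache_path, 'wb') as f:
        pickle.dump((x, y), f)
    return x, y

# Usage with a toy builder that records how often it runs
calls = []
def fake_build():
    calls.append(1)
    return [[0.1, 0.2]], ['calm_male']

path = os.path.join(tempfile.mkdtemp(), 'ravdess_features.pkl')
x1, y1 = cached_features(path, fake_build)  # builds and writes the cache
x2, y2 = cached_features(path, fake_build)  # reads the cache instead
print(len(calls))  # 1
```

Only unpickle cache files you created yourself, since loading a pickle can execute arbitrary code.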

In [7]:
X_train, X_val, y_train, y_val = load_data()

100%|██████████████████████████████████████████████████████████████████████████████| 1440/1440 [07:48<00:00,  3.07it/s]


In [8]:
print((X_train.shape[0], X_val.shape[0]))

(1152, 288)
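These sizes follow from the dataset and the 20% test split. The RAVDESS speech set has 24 actors with 60 clips each, 1,440 files in total, which matches the progress bar above. scikit-learn's `train_test_split` computes the test size as `ceil(test_size * n_samples)` and gives the remainder to training. A quick check of that arithmetic:

```python
import math

n_actors, clips_per_actor = 24, 60
n_samples = n_actors * clips_per_actor   # 1440 clips
n_val = math.ceil(0.2 * n_samples)       # test_size=0.2, rounded up
n_train = n_samples - n_val
print((n_train, n_val))  # (1152, 288)
```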


In [9]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
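`StandardScaler` rescales each feature to zero mean and unit variance using statistics computed on the training split only; the validation split is transformed with those same statistics, so no information leaks from it into training. A minimal NumPy sketch of the same computation on small made-up arrays:

```python
import numpy as np

X_tr = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])  # toy training data
X_va = np.array([[2.0, 40.0]])                             # toy validation data

# Fit: per-feature mean and population std (ddof=0), as StandardScaler uses.
# StandardScaler also guards against zero-variance features; this sketch does not.
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)

# Transform both splits with the *training* statistics
Z_tr = (X_tr - mu) / sigma
Z_va = (X_va - mu) / sigma
print(Z_va)
```

Note that the validation row can land outside the training range (here the second feature is well above any training value), which is expected and harmless.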

In [10]:
print(f'Features extracted: {X_train.shape[1]}')

Features extracted: 180


In [11]:
from sklearn.neural_network import MLPClassifier

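model = MLPClassifier(alpha=0.01,                 # L2 regularization strength
                      batch_size=256,
                      epsilon=1e-08,              # used by the default 'adam' solver
                      hidden_layer_sizes=(300,),  # one hidden layer of 300 units
                      learning_rate='adaptive',   # only used by the 'sgd' solver, so no effect here
                      max_iter=500)
model.fit(X_train, y_train)
# Accuracy on the training data
print(model.score(X_train, y_train))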

1.0
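The classifier has one hidden layer of 300 units. With scikit-learn's defaults (`activation='relu'`, and a softmax output for multiclass problems), a forward pass maps each 180-dimensional feature vector to a probability for each of the 16 emotion/gender classes. The NumPy sketch below uses random weights as stand-ins for the trained `model.coefs_` and `model.intercepts_` and only illustrates the shapes involved; `alpha` penalizes these weights during training and does not appear in the forward pass. With 180 × 300 + 300 × 16 = 58,800 weights and only 1,152 training clips, a training accuracy of 1.0 is not surprising.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 180, 300, 16

# Random stand-ins for model.coefs_ / model.intercepts_ (not trained weights)
W1, b1 = rng.normal(scale=0.05, size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(scale=0.05, size=(n_hidden, n_classes)), np.zeros(n_classes)

def forward(X):
    """One hidden ReLU layer followed by a softmax over the classes."""
    h = np.maximum(0.0, X @ W1 + b1)
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

X_batch = rng.normal(size=(4, n_features))  # four standardized clips (random)
probs = forward(X_batch)
print(probs.shape)  # (4, 16)
```

`model.predict` returns the class with the highest probability for each row.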


In [12]:
y_pred=model.predict(X_val)
print(model.score(X_val, y_val))

0.6875
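For a classifier, `model.score` returns plain accuracy: the fraction of clips whose predicted label equals the true label. On 288 validation clips, 0.6875 corresponds to 198 correct predictions. Set against the training accuracy of 1.0, this gap suggests the network is overfitting the training set. A small sketch of the computation on toy labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    if len(y_true) != len(y_pred):
        raise ValueError('length mismatch')
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

toy_true = ['calm_male', 'sad_female', 'angry_female', 'happy_male']
toy_pred = ['calm_male', 'sad_female', 'angry_female', 'sad_male']
print(accuracy(toy_true, toy_pred))  # 0.75

# The reported validation score expressed as a count of correct clips
print(0.6875 * 288)  # 198.0
```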


In [13]:
df=pd.DataFrame({'Actual': y_val, 'Predicted':y_pred})
df

|     | Actual | Predicted |
|-----|--------|-----------|
| 0 | disgust_female | sad_female |
| 1 | angry_female | angry_female |
| 2 | fearful_male | fearful_male |
| 3 | surprised_male | surprised_male |
| 4 | surprised_female | surprised_female |
| ... | ... | ... |
| 283 | calm_male | calm_male |
| 284 | calm_female | calm_female |
| 285 | sad_male | sad_male |
| 286 | happy_male | happy_male |
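The overall score hides which classes get confused; the first row, for instance, shows a disgusted female voice predicted as sad. Per-class recall, the share of each true label that was predicted correctly, can be computed from the `Actual` and `Predicted` columns. The standard-library sketch below runs on the nine rows visible in the table; on the full results you would pass `df['Actual']` and `df['Predicted']`, and scikit-learn's `classification_report(y_val, y_pred)` reports the same recall along with precision and F1.

```python
from collections import Counter

def per_class_recall(actual, predicted):
    """Share of each true label that was predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / n for label, n in sorted(totals.items())}

# The nine rows visible in the table above
actual = ['disgust_female', 'angry_female', 'fearful_male', 'surprised_male',
          'surprised_female', 'calm_male', 'calm_female', 'sad_male', 'happy_male']
predicted = ['sad_female', 'angry_female', 'fearful_male', 'surprised_male',
             'surprised_female', 'calm_male', 'calm_female', 'sad_male', 'happy_male']

for label, recall in per_class_recall(actual, predicted).items():
    print(f'{label:18s} {recall:.2f}')
```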
