
### Emotion Classification in Audio Using an LSTM Model and the TESS Dataset

Emotion classification in speech has applications ranging from sentiment analysis of customer service calls to human-computer interaction. This project uses the Toronto Emotional Speech Set [(TESS)](https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess) and a Long Short-Term Memory (LSTM) neural network to classify emotions in audio. TESS contains audio recordings covering seven emotions, and the goal is to build a deep learning model that classifies them accurately from the audio alone. The notebook performs the following steps:
- Data Preparation
- Feature Extraction
- Data Splitting and Encoding
- Building the LSTM Model
- Model Training
- Evaluation
- Visualizing Results
- Making Predictions

In [None]:
import pandas as pd
import numpy as np
import os
import seaborn as sns
import matplotlib.pyplot as plt
import librosa
import librosa.display
from IPython.display import Audio
import warnings
warnings.filterwarnings('ignore')

In [None]:
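# Collect audio file paths and emotion labels from the TESS dataset.
# The emotion is the last underscore-separated token of each file name.
data = []
labels = []
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        if not filename.lower().endswith('.wav'):
            continue  # skip anything that is not an audio clip
        data.append(os.path.join(dirname, filename))
        label = filename.split('_')[-1]
        label = label.split('.')[0]
        labels.append(label.lower())
print('Dataset is Loaded')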
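
The label parsing in the loop above can be checked in isolation. The following is a minimal sketch; the helper name `parse_label` and the two example paths are hypothetical and only illustrate the `<speaker>_<word>_<emotion>.wav` naming pattern the loop relies on.

```python
import os

def parse_label(path):
    """Return the lower-cased emotion token from a TESS-style file name."""
    filename = os.path.basename(path)
    stem = os.path.splitext(filename)[0]   # drop the extension
    return stem.split('_')[-1].lower()     # last underscore-separated token

# Hypothetical paths, used only for illustration
examples = [
    '/kaggle/input/tess/OAF_angry/OAF_back_angry.wav',
    '/kaggle/input/tess/YAF_happy/YAF_dog_happy.wav',
]
parsed = [parse_label(p) for p in examples]
print(parsed)  # ['angry', 'happy']
```

Printing `set(labels)` after loading is a quick way to confirm that exactly seven distinct labels were found.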

In [None]:
len(data)

In [None]:
data[:3]

In [None]:
len(labels)

In [None]:
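# Extract a fixed-length feature vector from one audio file:
# load up to 3 s of audio starting 0.5 s in, compute 40 MFCCs per frame,
# then average over time to get a single 40-dimensional vector.
def extractMfcc(filename):
    y, sr = librosa.load(filename, duration=3, offset=0.5)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).T, axis=0)
    return mfcc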
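
To see what the averaging step in `extractMfcc` does, here is a small sketch that uses a random matrix in place of real MFCCs, so it runs without any audio files. The frame count of 130 is an arbitrary stand-in; the `(n_mfcc, frames)` layout matches what `librosa.feature.mfcc` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mfcc, n_frames = 40, 130                       # frame count is an arbitrary stand-in
fake_mfcc = rng.normal(size=(n_mfcc, n_frames))  # layout: (n_mfcc, frames)

pooled = np.mean(fake_mfcc.T, axis=0)   # the expression used in extractMfcc
direct = fake_mfcc.mean(axis=1)         # equivalent: average each coefficient over time

print(pooled.shape)                     # (40,)
print(np.allclose(pooled, direct))      # True
```

Averaging over frames gives every clip the same feature length regardless of duration, at the cost of discarding how the coefficients change over time.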

In [None]:
# Replace each file path in `data` with its MFCC feature vector (in place)
for i, file_path in enumerate(data):
    mfccFeatures = extractMfcc(file_path)
    data[i] = mfccFeatures

In [None]:
# Stack the MFCC feature vectors into a 2D NumPy array of shape (n_samples, 40)
dataArray = np.array(data)

In [None]:
# Add a trailing axis to get the 3D input (n_samples, 40, 1) that the LSTM layer expects
xData = np.expand_dims(dataArray, -1)
xData.shape
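
The effect of `np.expand_dims` on the feature matrix can be sketched with a small placeholder array. The five zero-filled rows below are hypothetical stand-ins for real clips.

```python
import numpy as np

features = np.zeros((5, 40))        # 5 hypothetical clips, 40 averaged MFCCs each
x = np.expand_dims(features, -1)    # add a trailing feature axis
print(x.shape)                      # (5, 40, 1)
```

With this shape, a Keras LSTM layer reads each clip as a sequence of 40 steps with one feature per step, so the model iterates over the MFCC coefficients rather than over time frames.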

In [None]:
# Import necessary libraries for data splitting and encoding
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

In [None]:
# Encode labels using One-Hot Encoding
encoder = OneHotEncoder()
yData = encoder.fit_transform(np.array(labels).reshape(-1,1)).toarray()
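
As a quick illustration of what the encoder produces, the sketch below fits a separate `OneHotEncoder` on four toy labels. The names `toy_labels` and `toy_encoder` are hypothetical and do not affect the notebook's own `encoder`.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

toy_labels = np.array(['happy', 'angry', 'sad', 'angry']).reshape(-1, 1)
toy_encoder = OneHotEncoder()
one_hot = toy_encoder.fit_transform(toy_labels).toarray()

print(toy_encoder.categories_[0])   # classes in sorted order: angry, happy, sad
print(one_hot[0])                   # 'happy' is the second column: [0. 1. 0.]

# inverse_transform maps the rows back to the original strings
decoded = toy_encoder.inverse_transform(one_hot).ravel()
print(decoded)
```

The column order follows `categories_`, which is useful later when mapping model outputs back to emotion names.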

In [None]:
yData.shape

In [None]:
# Split data into training and testing sets
xTrain, xTest, yTrain, yTest = train_test_split(xData, yData, random_state=10, test_size=0.2)
xTrain.shape, yTrain.shape, xTest.shape, yTest.shape
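
The split above is purely random. As an optional variation that this notebook does not use, `train_test_split` also accepts a `stratify` argument that keeps the class proportions equal in both parts. The sketch below shows this on toy data with 7 classes of 10 samples each; all names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

toy_x = np.arange(70).reshape(70, 1)
toy_labels = np.repeat(np.arange(7), 10)   # 7 classes, 10 samples each

x_tr, x_te, y_tr, y_te = train_test_split(
    toy_x, toy_labels, test_size=0.2, random_state=10, stratify=toy_labels
)
print(np.bincount(y_te))   # [2 2 2 2 2 2 2]
print(np.bincount(y_tr))   # [8 8 8 8 8 8 8]
```

To apply the same idea here, one would presumably pass the string labels (for example `stratify=labels`) rather than the one-hot matrix.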

### LSTM Model
The core of our emotion classification model is an LSTM neural network. The architecture consists of two LSTM layers, each followed by a dropout layer to reduce overfitting, and a final fully connected layer with softmax activation that outputs one probability per emotion.

In [None]:
# Import TensorFlow and the Keras layers used by the model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

In [None]:
# Create an LSTM model
model = Sequential()

# Add an LSTM layer
model.add(LSTM(units=128, input_shape=(xTrain.shape[1], xTrain.shape[2]), return_sequences=True))
model.add(Dropout(0.2))

# Add another LSTM layer
model.add(LSTM(units=128))
model.add(Dropout(0.2))

# Add a fully connected layer with softmax activation for classification
model.add(Dense(units=7, activation='softmax'))  # Change units to match the number of classes

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
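
The parameter counts reported by `model.summary()` can be reproduced by hand. The sketch below assumes the standard LSTM formulation with one bias vector per gate and an input of shape `(40, 1)`; the helper names are hypothetical.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of a standard LSTM layer with bias:
    4 gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units

layer1 = lstm_params(128, 1)      # first LSTM sees 1 feature per step
layer2 = lstm_params(128, 128)    # second LSTM sees the 128 outputs of the first
output = dense_params(7, 128)     # softmax layer over 7 emotions
total = layer1 + layer2 + output
print(layer1, layer2, output, total)   # 66560 131584 903 199047
```

Dropout layers have no trainable parameters, so they do not contribute to the total.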

In [None]:
# Train the model
history = model.fit(xTrain, yTrain, epochs=30, batch_size=32, validation_split=0.2)
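
Because `validation_split=0.2` is applied on top of the earlier 80/20 train/test split, only part of the data is used for weight updates. Keras takes the validation samples from the end of the training arrays, before shuffling. The sketch below works out the resulting counts. The figure of 2800 recordings is an assumption and should be replaced by the value of `len(data)`; the helper `split_counts` is hypothetical.

```python
import math

def split_counts(n_samples, test_size=0.2, validation_split=0.2, batch_size=32):
    n_test = math.ceil(n_samples * test_size)                   # held out by train_test_split
    n_train_full = n_samples - n_test
    n_fit = math.floor(n_train_full * (1 - validation_split))   # used for weight updates
    n_val = n_train_full - n_fit                                # last slice, held out by Keras
    steps = math.ceil(n_fit / batch_size)                       # batches per epoch
    return n_fit, n_val, n_test, steps

# Assumed dataset size of 2800 recordings
print(split_counts(2800))   # (1792, 448, 560, 56)
```

Under that assumption, 64% of the recordings train the model, 16% monitor validation metrics, and 20% are reserved for the final test.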

In [None]:
# Evaluate the model on the test set
testLoss, testAcc = model.evaluate(xTest, yTest)
print(f'Test accuracy: {testAcc*100:.2f}%')

In [None]:
# Plot training and validation accuracy and loss
plt.figure(figsize=(10, 3))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
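# Predict on the test data and map the outputs back to string labels.
# Note: yTest is overwritten with decoded labels here, so run this cell only once.
predTest = model.predict(xTest)
yPred = encoder.inverse_transform(predTest)
yTest = encoder.inverse_transform(yTest)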
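
Here `inverse_transform` receives softmax probabilities rather than exact one-hot rows. As far as we can tell this works because the encoder picks the largest entry in each row, which is the same as an explicit argmax. The sketch below shows that decoding step with NumPy only; the class names and probabilities are hypothetical.

```python
import numpy as np

classes = np.array(['angry', 'happy', 'sad'])   # hypothetical, in sorted order
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])

# Pick the most probable class per row and look up its name
pred_labels = classes[np.argmax(probs, axis=1)]
print(pred_labels)   # ['happy' 'angry' 'sad']
```

In the notebook, the equivalent explicit form would be `encoder.categories_[0][np.argmax(predTest, axis=1)]`.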

In [None]:
# Create a DataFrame to compare predicted and actual labels
df = pd.DataFrame(columns=['Predicted Labels', 'Actual Labels'])
df['Predicted Labels'] = yPred.flatten()
df['Actual Labels'] = yTest.flatten()

df.head(10)

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
# Print classification report
print(classification_report(yTest, yPred))
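
The classification report summarizes per-class scores, while a confusion matrix shows which emotions are mistaken for which. The sketch below computes one on a handful of hypothetical labels so that it runs on its own; rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only
y_true = np.array(['angry', 'angry', 'happy', 'sad', 'sad', 'sad'])
y_hat = np.array(['angry', 'happy', 'happy', 'sad', 'sad', 'angry'])
class_names = ['angry', 'happy', 'sad']

cm = confusion_matrix(y_true, y_hat, labels=class_names)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]
```

For the model's results, the same call could take `yTest.ravel()` and `yPred.ravel()` with `labels=encoder.categories_[0]`, and `sns.heatmap(cm, annot=True, fmt='d')` would render the matrix as a plot.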

In [None]:
# Save the model for future use (uncomment to write the file)
# model.save('emotionRecognitionModel.h5')