<a href="https://www.kaggle.com/code/mervetas/eeg-emotions-classification?scriptVersionId=144646627" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Classifying Emotions via EEG Features

**The aim of this notebook is to classify emotion labels correctly from extracted EEG features.**

The data was collected from two people (1 male, 1 female) for 3 minutes per state: **positive, neutral, negative.**\
Dry EEG electrodes were placed at the TP9, AF7, AF8 and TP10 positions. Six minutes of resting neutral data were also recorded.

The input dataset consists of features extracted from EEG brainwaves; a static dataset was created with a sliding-window approach.
Overlapping windows are passed over the wave data, and for each window many mathematical attributes are generated to describe the wave.

More information on EEG feature extraction can be found here:\
https://github.com/jordan-bird/eeg-feature-generation
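
To make the sliding-window idea concrete, here is a minimal sketch that cuts a one-dimensional signal into overlapping windows and computes a few summary statistics per window. The function name, the window length, the step size, and the choice of statistics (mean, standard deviation, min, max) are illustrative assumptions, not the settings used to build this dataset; the linked repository documents the actual feature set.

```python
import numpy as np

def sliding_window_features(signal, window_size, step):
    """Return one row of summary statistics per overlapping window.

    window_size and step are hypothetical parameters chosen for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - window_size + 1, step):
        window = signal[start:start + window_size]
        rows.append([window.mean(), window.std(), window.min(), window.max()])
    return np.array(rows)

# Synthetic stand-in for one EEG channel: a noisy sine wave
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
channel = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

# 50% overlap: each window starts halfway through the previous one
features = sliding_window_features(channel, window_size=64, step=32)
print(features.shape)  # (7, 4): 7 windows, 4 statistics each
```

Each row of `features` describes one window, so a time series becomes a static table that a standard classifier can consume.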


In [None]:
import warnings
from warnings import filterwarnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC, LinearSVC

### Import dataset and view data info

In [None]:
df = pd.read_csv("/kaggle/input/eeg-brainwave-dataset-feeling-emotions/emotions.csv")
df.head()

In [None]:
palette = sns.color_palette("pastel")
plt.figure(figsize=(8, 6))
bars = df["label"].value_counts().plot(kind='bar', color=palette)
plt.ylabel('Value Counts')
plt.xticks(rotation=0)

for i, v in enumerate(df["label"].value_counts()):
    plt.text(i, v, str(v), ha='center', va='bottom', fontsize=12)

#### The sample counts are relatively balanced, so no further adjustment of the samples is needed

In [None]:
# df.info() prints its summary itself; wrapping it in print() would also print "None"
df.info()

print("\n")
df.describe()

## Preprocessing

In [None]:
scaler = StandardScaler()
df_2 = df.drop(["label"], axis=1)

X = pd.DataFrame(scaler.fit_transform(df_2))

# Encode the string labels as integers
label_e = LabelEncoder()
df['label'] = label_e.fit_transform(df['label'])
# LabelEncoder sorts the classes alphabetically: negative = 0, neutral = 1, positive = 2

# 80% of data used for training, 20% for testing
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=48)
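
Note that the cell above fits `StandardScaler` on the full dataset before splitting, so the test rows influence the scaling statistics. A common way to avoid this is to split first and wrap the scaler and the classifier in a `Pipeline`, which fits the scaler on the training portion only. The sketch below uses synthetic data from `make_classification` as a stand-in for the EEG features; the `_demo` names are hypothetical, and `stratify=y_demo` is an additional assumption that keeps the class proportions equal across the two splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the EEG feature table
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=48,
)

# Split first, so the test rows never influence the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.8, stratify=y_demo, random_state=48
)

# The pipeline fits the scaler on X_tr only and reuses it to transform X_te
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", SVC()),
])
pipe.fit(X_tr, y_tr)
print(f"Held-out accuracy: {pipe.score(X_te, y_te):.3f}")
```

A pipeline like this can also be passed to `GridSearchCV`, with parameter names prefixed by the step name (for example `clf__C`), so the scaler is refit inside every cross-validation fold.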

## Hyperparameter Tuning to Get the Optimal Parameter Values

In [None]:
import time
from tqdm.notebook import tqdm
from time import sleep
filterwarnings('ignore')

estimators = [
    ('svm', SVC()),
    ('lsvm', LinearSVC()),
    ('mlp', MLPClassifier())
]

param_grids = {
    'svm': {
        'C': [0.1, 1, 10, 100],
        'gamma': [0.01, 0.001, 0.0001],
        'kernel': ['rbf', 'linear', 'poly', 'sigmoid']
    },
    'lsvm': {
        'C': [0.1, 1, 10, 100],
        'max_iter': [1800, 2000, 2500]
    },
    'mlp': {
        'hidden_layer_sizes': [(100,), (100, 50), (100, 100)],
        'solver': ["lbfgs", "sgd", "adam"],
        'alpha': [0.001, 0.0001],
        'max_iter': [400, 600]
    }
}

best_params = {}

# Perform grid search for each estimator
for name, estimator in tqdm(estimators):
    grid = GridSearchCV(estimator, param_grids[name], refit=True, verbose=0, n_jobs=-1)

    # Suppress convergence warnings raised while fitting the candidates
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        grid.fit(X_train, y_train)

    best_params[name] = grid.best_params_

# Print the best parameters found for each estimator
# (the loop variable is named `params` so it does not overwrite the `best_params` dict)
for name, params in best_params.items():
    print(f"Best Parameters for {name}:")
    print(params)

Best Parameters for svm:\
{'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}

Best Parameters for lsvm:\
{'C': 0.1, 'max_iter': 1800}

Best Parameters for mlp:\
{'alpha': 0.0001, 'hidden_layer_sizes': (100, 50), 'max_iter': 600, 'solver': 'lbfgs'}
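
To get a feel for the cost of the search above, the number of candidate settings per estimator can be counted directly from the grids. The sketch below repeats the grids from the tuning cell under the hypothetical name `param_grids_demo` and assumes scikit-learn's default of 5-fold cross-validation, since no `cv` argument was passed to `GridSearchCV`.

```python
from itertools import product

# Same grids as in the tuning cell above
param_grids_demo = {
    'svm': {
        'C': [0.1, 1, 10, 100],
        'gamma': [0.01, 0.001, 0.0001],
        'kernel': ['rbf', 'linear', 'poly', 'sigmoid'],
    },
    'lsvm': {
        'C': [0.1, 1, 10, 100],
        'max_iter': [1800, 2000, 2500],
    },
    'mlp': {
        'hidden_layer_sizes': [(100,), (100, 50), (100, 100)],
        'solver': ["lbfgs", "sgd", "adam"],
        'alpha': [0.001, 0.0001],
        'max_iter': [400, 600],
    },
}

CV_FOLDS = 5  # assumed: the GridSearchCV default when cv is not given

counts = {}
for name, grid in param_grids_demo.items():
    # Every combination of the listed values is one candidate setting
    counts[name] = len(list(product(*grid.values())))
    print(f"{name}: {counts[name]} candidates, {counts[name] * CV_FOLDS} fits")

total_fits = sum(counts.values()) * CV_FOLDS
print(f"Total cross-validation fits: {total_fits}")
```

For the SVC grid, `gamma` has no effect when `kernel='linear'`, so 8 of its 48 candidates repeat a setting that is already covered.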

## Let's implement the best parameters in our models

In [None]:
# Models configured with the best parameters found by the grid search
models = {
    "SVM": SVC(C=100, gamma=0.0001, kernel='rbf'),
    "linearSVC": LinearSVC(C=0.1, max_iter=1800),
    "MLP": MLPClassifier(alpha=0.0001, hidden_layer_sizes=(100, 50), solver='lbfgs', max_iter=600),
}


In [None]:
def model_clf(clf, name, xtrain, ytrain, xtest, ytest):
    clf.fit(xtrain, ytrain)
    accuracy = clf.score(xtest, ytest)
    print(f"Accuracy for {name} model: {accuracy}")
    return clf


In [None]:
clf_list = []
for name, clf in models.items():
    model = model_clf(clf, name, X_train, y_train, X_test, y_test)
    clf_list.append(model)

### Our parameter-tuned results show:

* SVM accuracy: **97%**
* Linear SVC accuracy: **97%**
* MLP accuracy: **98%**

## Our most accurate model is the MLP model (accuracy = 98%)
Let's create a confusion matrix for it.

In [None]:
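# clf_list follows the order of `models`, so index 2 is the MLP classifier
y_pred = clf_list[2].predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Label both axes with the original class names, in encoded order
class_names = list(label_e.classes_)
sns.heatmap(cm, annot=True, fmt="d", cmap="Greens",
            xticklabels=class_names, yticklabels=class_names)
plt.title("Confusion Matrix for MLP Classifier")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()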
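
The heatmap shows raw counts; dividing each diagonal entry by its row sum gives the per-class recall, which is easier to compare across classes. The matrix below is made up purely for illustration and is not this notebook's result.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix (rows = true class, columns = predicted class)
cm_demo = np.array([
    [50, 2, 3],
    [1, 57, 2],
    [4, 1, 45],
])

# Recall per class: correct predictions divided by the number of true samples
recall = cm_demo.diagonal() / cm_demo.sum(axis=1)
# Precision per class: correct predictions divided by the number of predictions
precision = cm_demo.diagonal() / cm_demo.sum(axis=0)
# Overall accuracy: all correct predictions divided by all samples
accuracy = cm_demo.trace() / cm_demo.sum()

print("Recall:   ", np.round(recall, 3))
print("Precision:", np.round(precision, 3))
print("Accuracy: ", round(float(accuracy), 3))
```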

In [None]:
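# target_names must be an ordered sequence matching the encoded labels (0, 1, 2);
# a set has no guaranteed order, so the names are taken from the fitted LabelEncoder
clr = classification_report(y_test, y_pred, target_names=list(label_e.classes_))
print(" Classification Report ".center(60, "*"))
print(clr)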
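
The names passed as `target_names` have to follow the order of the encoded labels. `LabelEncoder` assigns integers in the sorted order of the class strings and exposes that order through its `classes_` attribute. A small check, using illustrative lowercase label strings (the spelling in the real dataset may differ):

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels; the spelling in the real dataset may differ
labels = ["positive", "neutral", "negative", "neutral", "positive"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, and index i in classes_ is the string encoded as i
mapping = {cls: idx for idx, cls in enumerate(encoder.classes_)}
print(mapping)
print(list(encoded))
```

Reading the mapping from `classes_` rather than writing it by hand keeps reports and plot labels in step with the encoder.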

*Possible improvements:
The hyperparameter tuning step takes quite a long time; the search for the best parameters could be optimised.*
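
One possible way to shorten the search is to sample a fixed number of settings with `RandomizedSearchCV` instead of trying every combination. The sketch below runs on synthetic data; the parameter distributions, `n_iter=8`, and `cv=3` are illustrative assumptions rather than tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic three-class data standing in for the scaled EEG features
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=3, random_state=0,
)

# Sample C and gamma from log-uniform ranges instead of fixed lists
param_distributions = {
    "C": loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-4, 1e-2),
    "kernel": ["rbf", "poly", "sigmoid"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=8,        # number of sampled settings, versus 48 in the full SVC grid
    cv=3,
    random_state=0,  # makes the sampled settings reproducible
)
search.fit(X_demo, y_demo)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

The cost is fixed by `n_iter` times `cv` regardless of how many parameters are searched, at the price of possibly missing the best grid point.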

Reference: https://www.researchgate.net/publication/329403546_Mental_Emotional_Sentiment_Classification_with_an_EEG-based_Brain-machine_Interface