# Random Forest Classifier for Facial Emotion Recognition

This notebook trains a Random Forest classifier to recognize facial emotions. It assigns each facial expression to one of seven emotion categories using an ensemble of decision trees.


In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [2]:
data = pd.read_csv('fer2013.csv')


In [3]:
# Splitting into train, val, test sets
train_data = data[data.Usage == 'Training']
test_data = data[data.Usage == 'PrivateTest']
val_data = data[data.Usage == 'PublicTest']
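
A quick check of the split can catch a mistyped `Usage` value before any training time is spent. The sketch below uses a small made-up frame with the same three columns as `fer2013.csv`; the rows and the `toy` and `splits` names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for fer2013.csv with the same three columns (values are made up).
toy = pd.DataFrame({
    "emotion": [0, 3, 3, 6, 4, 1],
    "pixels": ["0 0 0 0"] * 6,
    "Usage": ["Training", "Training", "Training", "PublicTest", "PrivateTest", "Training"],
})

# Same filters as the cell above, collected in a dict for easy inspection.
splits = {
    "train": toy[toy.Usage == "Training"],
    "val": toy[toy.Usage == "PublicTest"],
    "test": toy[toy.Usage == "PrivateTest"],
}

# Every row should land in exactly one split.
split_sizes = {name: len(part) for name, part in splits.items()}
assert sum(split_sizes.values()) == len(toy)

# Class balance of the training split; imbalanced classes are worth knowing about early.
train_class_counts = splits["train"]["emotion"].value_counts().sort_index()
print(split_sizes)
print(train_class_counts)
```

On the real data, the same two printouts would show the split sizes and how unevenly the emotion labels are distributed.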

In [4]:
import numpy as np

# Preprocessing images
def preprocess(data):
    """Turn the space-separated pixel strings into a scaled feature matrix."""
    # Parse each row's pixel string into a 1-D float32 vector, then stack the rows.
    X = np.stack([np.array(pixels.split(' '), dtype='float32')
                  for pixels in data['pixels']])
    X /= 255.0  # scale pixel intensities to [0, 1]
    y = data['emotion'].to_numpy()  # integer emotion labels
    return X, y

X_train, y_train = preprocess(train_data)
X_val, y_val = preprocess(val_data)
X_test, y_test = preprocess(test_data)

In [5]:
X_train = X_train.reshape(X_train.shape[0], -1)
X_val = X_val.reshape(X_val.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
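
Because `preprocess` already returns one flat vector per image, the reshape above leaves the arrays unchanged; it only matters if the images are stored as 2-D grids. The sketch below illustrates this on random stand-in data, assuming the 48x48 image size of FER2013 (48 * 48 = 2304 pixels). The `X_demo` array is hypothetical.

```python
import numpy as np

# Stand-in for X_train: 3 fake samples of 48 * 48 = 2304 pixels each.
X_demo = np.random.default_rng(0).random((3, 48 * 48)).astype("float32")

# Flattening an already-flat array leaves its shape unchanged.
X_flat = X_demo.reshape(X_demo.shape[0], -1)
same_shape = X_flat.shape == X_demo.shape

# Going the other way recovers the image grid, e.g. for plotting a sample.
images = X_demo.reshape(-1, 48, 48)
print(X_flat.shape, images.shape)
```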



# Without Regularization


The model is trained on the training split.
1. At initialization, the number of estimators (decision trees), the maximum tree depth, and the random state are set.
2. The model is fit to the training features and labels.

After training, the model's performance is assessed on the test split.
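
The training log in this section reports a single worker (`n_jobs=1`). `RandomForestClassifier` accepts an `n_jobs` parameter, and `n_jobs=-1` builds trees on all available cores. The sketch below shows the same initialize, fit, and evaluate steps on synthetic 7-class data; the data, the smaller forest size, and the `rf_demo` name are assumptions made so the example runs quickly, and any speedup depends on the machine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 7-class data standing in for the flattened FER2013 pixels.
X, y = make_classification(n_samples=700, n_features=40, n_informative=12,
                           n_classes=7, n_clusters_per_class=1, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Same depth and seed as the notebook, but a smaller forest and parallel tree building.
rf_demo = RandomForestClassifier(n_estimators=50, max_depth=20,
                                 random_state=42, n_jobs=-1)
rf_demo.fit(X_tr, y_tr)

# Report both accuracies so that a train/test gap is visible immediately.
train_acc = accuracy_score(y_tr, rf_demo.predict(X_tr))
test_acc = accuracy_score(y_te, rf_demo.predict(X_te))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```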


In [6]:
rf = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=42, verbose=1)


In [7]:
rf.fit(X_train, y_train)


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed: 13.2min finished


RandomForestClassifier(max_depth=20, n_estimators=200, random_state=42,
                       verbose=1)

In [8]:
y_pred = rf.predict(X_test)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    0.9s finished


In [10]:
y_train_pred = rf.predict(X_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    6.0s finished


In [11]:
from sklearn.metrics import accuracy_score, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.47617720813597103
              precision    recall  f1-score   support

           0       0.54      0.18      0.27       491
           1       1.00      0.33      0.49        55
           2       0.53      0.29      0.38       528
           3       0.45      0.77      0.57       879
           4       0.37      0.38      0.38       594
           5       0.70      0.62      0.66       416
           6       0.46      0.45      0.45       626

    accuracy                           0.48      3589
   macro avg       0.58      0.43      0.46      3589
weighted avg       0.50      0.48      0.46      3589



In [12]:
from sklearn.metrics import classification_report

target_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
print('Training Classification')
print(classification_report(y_train, y_train_pred, target_names=target_names, labels=[0, 1, 2, 3, 4, 5, 6]))


Training Classification
              precision    recall  f1-score   support

       Angry       1.00      1.00      1.00      3995
     Disgust       1.00      1.00      1.00       436
        Fear       1.00      1.00      1.00      4097
       Happy       1.00      1.00      1.00      7215
         Sad       1.00      1.00      1.00      4830
    Surprise       1.00      1.00      1.00      3171
     Neutral       1.00      1.00      1.00      4965

    accuracy                           1.00     28709
   macro avg       1.00      1.00      1.00     28709
weighted avg       1.00      1.00      1.00     28709



# Regularization using hyperparameters


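The model is again trained on the training split.
At initialization, the number of estimators (decision trees), maximum depth, maximum number of features, minimum samples per split, minimum samples per leaf, random state, and verbosity level are all set explicitly. Note that `max_features='sqrt'` and `min_samples_leaf=1` are equivalent to the scikit-learn defaults for `RandomForestClassifier`, so the only setting that effectively differs from the first run is `min_samples_split=5` (the default is 2).

The model is then fit to the training features and labels.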
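
The validation split (`X_val`, `y_val`) is created earlier but not used. One common way to use it is to compare a few hyperparameter settings on validation accuracy before touching the test set. The sketch below does this on synthetic data; the candidate values are illustrative assumptions, not settings tuned for FER2013, and the variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 7-class data standing in for the training and validation splits.
X, y = make_classification(n_samples=700, n_features=40, n_informative=12,
                           n_classes=7, n_clusters_per_class=1, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Candidate settings, from weakest to strongest regularization (illustrative values).
candidates = [
    {"max_depth": 20, "min_samples_leaf": 1},
    {"max_depth": 10, "min_samples_leaf": 5},
    {"max_depth": 5, "min_samples_leaf": 10},
]

results = []
for params in candidates:
    model = RandomForestClassifier(n_estimators=50, random_state=42, **params)
    model.fit(X_tr, y_tr)
    results.append({
        **params,
        "train_acc": accuracy_score(y_tr, model.predict(X_tr)),
        "val_acc": accuracy_score(y_va, model.predict(X_va)),
    })

# Pick the setting with the best validation accuracy; the test set stays untouched.
best = max(results, key=lambda r: r["val_acc"])
for r in results:
    print(r)
print("best by validation accuracy:", best)
```

Shallower trees and larger leaves usually lower training accuracy; whether they also help validation accuracy has to be checked on the actual data.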


In [13]:
rf = RandomForestClassifier(n_estimators=200, max_depth=20, max_features='sqrt', 
                             min_samples_split=5, min_samples_leaf=1, 
                             random_state=42, verbose=1)


In [14]:
rf.fit(X_train, y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:  6.2min finished


RandomForestClassifier(max_depth=20, max_features='sqrt', min_samples_split=5,
                       n_estimators=200, random_state=42, verbose=1)

In [15]:
y_pred = rf.predict(X_test)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    0.9s finished


In [16]:
y_train_pred = rf.predict(X_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    5.2s finished


In [17]:
from sklearn.metrics import classification_report

target_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
print('Training Classification')
print(classification_report(y_train, y_train_pred, target_names=target_names, labels=[0, 1, 2, 3, 4, 5, 6]))


Training Classification
              precision    recall  f1-score   support

       Angry       1.00      1.00      1.00      3995
     Disgust       1.00      0.99      1.00       436
        Fear       1.00      1.00      1.00      4097
       Happy       1.00      1.00      1.00      7215
         Sad       1.00      1.00      1.00      4830
    Surprise       1.00      1.00      1.00      3171
     Neutral       1.00      1.00      1.00      4965

    accuracy                           1.00     28709
   macro avg       1.00      1.00      1.00     28709
weighted avg       1.00      1.00      1.00     28709



In [18]:
from sklearn.metrics import accuracy_score, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.47506269155753694
              precision    recall  f1-score   support

           0       0.56      0.19      0.28       491
           1       1.00      0.31      0.47        55
           2       0.53      0.27      0.36       528
           3       0.45      0.79      0.58       879
           4       0.37      0.39      0.38       594
           5       0.71      0.62      0.66       416
           6       0.44      0.43      0.44       626

    accuracy                           0.48      3589
   macro avg       0.58      0.43      0.45      3589
weighted avg       0.50      0.48      0.45      3589



### Analysis
Accuracy: Both versions reach roughly 48% accuracy on the test set (0.4762 without regularization vs. 0.4751 with it). The regularization settings had no discernible effect on overall accuracy. Both models also score about 100% on the training set, so the gap to the test set points to substantial overfitting that these settings did not reduce.

Precision: Precision varies by emotion category, but the values are nearly identical across the two models. Category 1 (Disgust) has 100% precision in both runs, and the largest changes are two points: Category 0 (Angry) rises from 54% to 56% and Category 6 (Neutral) falls from 46% to 44%.

Recall: Recall also varies by emotion category, and regularization does not improve it meaningfully. For Category 1 (Disgust), recall drops slightly from 33% to 31%, while Category 3 (Happy) rises from 77% to 79%. Category 0 (Angry) has the lowest recall in both runs (18% and 19%).

F1-score: F1-scores, which balance precision and recall, follow the same pattern. No category changes by more than two points, and Category 1 (Disgust) drops from 0.49 to 0.47.

Overall, regularization through these hyperparameter settings did not produce a meaningful improvement.
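
To make the comparison less error-prone, the per-class differences can be computed rather than read off by eye. The sketch below copies the test-set metrics from the two classification reports above into data frames and subtracts them; the class names follow the `target_names` list used for the training reports, and the variable names are assumptions.

```python
import pandas as pd

labels = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

# Test-set metrics copied from the report without regularization.
baseline = pd.DataFrame({
    "precision": [0.54, 1.00, 0.53, 0.45, 0.37, 0.70, 0.46],
    "recall":    [0.18, 0.33, 0.29, 0.77, 0.38, 0.62, 0.45],
    "f1":        [0.27, 0.49, 0.38, 0.57, 0.38, 0.66, 0.45],
}, index=labels)

# Test-set metrics copied from the report with regularization.
regularized = pd.DataFrame({
    "precision": [0.56, 1.00, 0.53, 0.45, 0.37, 0.71, 0.44],
    "recall":    [0.19, 0.31, 0.27, 0.79, 0.39, 0.62, 0.43],
    "f1":        [0.28, 0.47, 0.36, 0.58, 0.38, 0.66, 0.44],
}, index=labels)

# Positive values mean the regularized model scored higher for that class.
delta = (regularized - baseline).round(2)
max_abs_change = delta.abs().to_numpy().max()
print(delta)
print("largest absolute change:", max_abs_change)
```

When the fitted models are still in memory, `classification_report(..., output_dict=True)` returns the same numbers as a dictionary, which avoids copying values by hand.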

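In [1]:
import joblib

# Save the trained model to disk. This cell must run in the same kernel session
# as the training cells; the NameError below comes from a fresh kernel (note the
# In [1] counter) in which `rf` no longer exists.
joblib.dump(rf, "random_forest_model.pkl")


NameError: name 'rf' is not defined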
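
A minimal sketch of the save-and-reload round trip, run in a single session so that the model object exists when `joblib.dump` is called. The small synthetic model, the temporary directory, and the `rf_demo` and `restored` names are assumptions for illustration; in the notebook, the fitted `rf` would take the place of `rf_demo`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the trained `rf`.
X, y = make_classification(n_samples=200, n_features=20, n_informative=8,
                           n_classes=3, random_state=42)
rf_demo = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Write the model to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "random_forest_model.pkl")
    joblib.dump(rf_demo, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
predictions_match = np.array_equal(rf_demo.predict(X), restored.predict(X))
print("predictions match after reload:", predictions_match)
```

Models saved this way are generally best loaded with the same scikit-learn version that produced them.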