# Code

**Standards:** 

We use Python version 3.7.10 and scikit-learn for model training and prediction. Specifically, we use the BernoulliNB, MultinomialNB, GaussianNB, and KNeighborsClassifier classes and the svm module from scikit-learn for our baseline models, and the MLPClassifier class for our experimental model (referred to as our CNN model, although MLPClassifier is a fully connected multilayer perceptron rather than a convolutional network). Our accuracy measurements come from the scikit-learn accuracy_score and classification_report functions, and we find optimal parameters for our models using GridSearchCV.

We also use the pandas and numpy libraries, as well as the train_test_split function and the StandardScaler class from scikit-learn, for data preprocessing and management.

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import time
!python --version

### Data Import 

We import the data and draw a random sample of 10,000 rows for model training and assessment. The entire dataset has 35,887 rows, and some of our models took over 9 hours to train on that much data, so we work with a sample to keep training time manageable.

In [4]:
# import data
# all_data = pd.read_csv('icml_face_data.csv')
all_data = pd.read_csv('../input/facial-expression-recognition-challenge/icml_face_data.csv/icml_face_data.csv')
# pd.read_csv('/kaggle/input/challenges-in-representation-learning-facial-expression-recognition-challenge/train.csv')
# all_data = all_data[0:700] # just for dev... remove for actual training
print(all_data.shape)
all_data = all_data.sample(n=10000, random_state=1)

accuracy = {}
params = {}

### Data Preprocessing

After renaming the Usage and pixels columns to remove their leading spaces, we define the function `pixels_to_arr`, which converts a space-separated string of pixel values into a numpy array, and apply it to the pixels column of our dataframe. This produces a one-dimensional array of 2,304 pixel values for each row of data.

To make those arrays more usable, we then define the function `image_reshape`, which reshapes the pixel arrays into 48x48 single-channel images (an array of shape (n, 48, 48, 1)). Finally, we define the X and y values we will feed into our models using the reshaped pixel arrays and their corresponding labels. We also read a `y_group` Series from the `emotion_group` column, which contains four labels that group the emotions as follows: Angry/Sad, Fear/Surprise, Happy, and Neutral. Disgust is excluded because it has very little support (27 samples) in the test set.

In [17]:
all_data.rename({' Usage': 'Usage', ' pixels': 'pixels'}, axis=1, inplace=True)

In [18]:
def pixels_to_arr(pixels):
    array = np.array(pixels.split(),'float64')
    return array

all_data['pixels_arr'] = all_data['pixels'].apply(pixels_to_arr)
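
As a quick check of the conversion described above, the following sketch runs `pixels_to_arr` on a synthetic pixel string and reshapes the result into a 48x48 image. The `toy_pixels` string is made up for illustration and is not taken from the dataset.

```python
import numpy as np

def pixels_to_arr(pixels):
    # Split the space-separated string and convert each token to float64
    return np.array(pixels.split(), dtype="float64")

# Synthetic stand-in for one row of the pixels column: 48 * 48 = 2304 values
toy_pixels = " ".join(str(i % 256) for i in range(48 * 48))

flat = pixels_to_arr(toy_pixels)   # one-dimensional array of 2,304 values
image = flat.reshape(48, 48)       # row-major reshape into a 48x48 matrix

print(flat.shape, image.shape)
```

Because the reshape is row-major, the first 48 values of the string become the top row of the image.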

In [19]:
def image_reshape(data):
    image = np.reshape(data['pixels_arr'].to_list(),(data.shape[0],48,48,1))
    return image

X = image_reshape(all_data)
y = all_data['emotion']
y_group = all_data['emotion_group']
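
The cells shown here read `all_data['emotion_group']` but do not show how that column is built. The sketch below is one possible way to derive it, assuming the label indices follow the order of the `emotions` list used for plotting later in this notebook (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The group names, the `GROUP_OF` mapping, the `add_emotion_group` helper, and the choice to drop Disgust rows are all hypothetical.

```python
import pandas as pd

# Hypothetical mapping from the 7 original labels to the 4 groups.
# Label 1 (Disgust) is deliberately absent, so it maps to a missing value.
GROUP_OF = {
    0: "Angry/Sad", 4: "Angry/Sad",
    2: "Fear/Surprise", 5: "Fear/Surprise",
    3: "Happy",
    6: "Neutral",
}

def add_emotion_group(df):
    # Return a copy with an emotion_group column; rows without a group are dropped
    out = df.copy()
    out["emotion_group"] = out["emotion"].map(GROUP_OF)
    return out.dropna(subset=["emotion_group"])

toy = pd.DataFrame({"emotion": [0, 1, 2, 3, 4, 5, 6]})
grouped = add_emotion_group(toy)
print(grouped["emotion_group"].tolist())
```

Dropping rows changes the number of samples, so if this approach were used, the row counts quoted for the 4-emotion splits would differ from those of the 7-emotion splits.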

#### Prepare train and test sets

Here we create train and test sets from our 10,000-row dataframe. The training set has 8,000 rows and the test set has 2,000 rows, each with 2,304 pixel values per row. The unraveled versions store each pixel value in its own column of a dataframe, while the standard versions store each row's values as a single array. We train on the unraveled format because scikit-learn estimators expect a two-dimensional input of shape (n_samples, n_features), which a Series of arrays does not provide.

In [23]:
x_unraveled = pd.DataFrame(list(map(np.ravel, all_data['pixels_arr'])))
X_train_unrav, X_test_unrav, y_train_unrav, y_test_unrav = train_test_split(x_unraveled, y, test_size=0.2, random_state=12345)
print("Pixels as columns")
print("Training data shape: ", X_train_unrav.shape)
print("Test data shape", X_test_unrav.shape)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(all_data['pixels_arr'], y, test_size=0.2, random_state=12345)
print("Pixels as Single array")
print("Training data shape: ", X_train.shape)
print("Test data shape", X_test.shape)
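
The difference between the two layouts above can be seen on a small synthetic example. This is a sketch only: the `toy` dataframe holds 10 random rows rather than real images, and `series_layout` and `column_layout` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for all_data: 10 rows, each with a 2,304-value pixel array
toy = pd.DataFrame({
    "emotion": [0, 1, 2, 3, 4, 5, 6, 0, 3, 3],
    "pixels_arr": [rng.integers(0, 256, size=2304).astype("float64") for _ in range(10)],
})

# "Single array" layout: a 1-D Series whose elements are arrays (dtype object)
series_layout = toy["pixels_arr"]

# "Unraveled" layout: one column per pixel, the 2-D shape estimators expect
column_layout = pd.DataFrame(np.stack(toy["pixels_arr"].to_list()))

X_tr, X_te, y_tr, y_te = train_test_split(
    column_layout, toy["emotion"], test_size=0.2, random_state=12345
)
print(series_layout.shape, column_layout.shape, X_tr.shape, X_te.shape)
```

With `test_size=0.2`, the 10 toy rows split 8/2, mirroring the 8,000/2,000 split of the sampled dataframe.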

Using additional emotion groupings (`y_group`):

In [None]:
X_train, X_test, y_train_g, y_test_g = train_test_split(all_data['pixels_arr'], y_group, test_size=0.2, random_state=12345)
X_train_unrav, X_test_unrav, y_train_unrav_g, y_test_unrav_g = train_test_split(x_unraveled, y_group, test_size=0.2, random_state=12345)
X_train_im, X_test_im, y_train_im_g, y_test_im_g = train_test_split(X, y_group, test_size=0.2, random_state=12345)

### Data Examination

This is a preliminary examination of our data. The first code block produces a bar chart showing the proportion of each labeled emotion within the 10,000-row dataframe. "Happy" samples make up a notably higher proportion than any other emotion, and very few samples are labeled "disgust". Each of the remaining labels accounts for between 10% and 20% of the sampled dataframe.

The next block displays five sample images for each label.

In [15]:
# Proportion of each label, ordered by label index 0..6
emotion_prop = all_data['emotion'].value_counts(normalize=True).sort_index()

emotions = ['Angry','Disgust','Fear','Happy','Sad','Surprise','Neutral']
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']

plt.figure(figsize=[12,6])

# Use .values so the bar heights do not depend on the pandas column naming
plt.bar(x=emotions, height=emotion_prop.values, color=palette, edgecolor='black')
    
plt.xlabel('Emotion')
plt.ylabel('Proportion')
plt.title('Emotion Label Proportions')
plt.show()

In [20]:
row = 0
for emotion in list(range(7)):

    all_emotion_images = all_data[all_data['emotion'] == emotion]
    for i in range(5):
        
        img = all_emotion_images.iloc[i,].pixels_arr.reshape(48,48)
        lab = emotions[emotion]
        
        plt.subplot(7,5,row+i+1)
        plt.imshow(img, cmap='binary_r')
        plt.axis('off')
    plt.text(-600, 27, s = str(lab), fontsize=10)
    row += 5

plt.show()

# Baseline Models

We train five baseline models and measure their prediction accuracy for later comparison with our experimental model. Each model is trained on the unraveled training data and tested on the unraveled test data. We use GridSearchCV to perform a 5-fold cross-validated grid search for the parameters that maximize accuracy for each model. During the search, the `accuracy` scoring option scores each candidate on the held-out fold of the training data, using the same measure as scikit-learn's `accuracy_score` function: the fraction of correctly classified samples. The accuracy we report for each model is `accuracy_score` computed on the test set with the best parameters found.
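
The workflow described above can be sketched on synthetic data. The two Gaussian blobs, the three-value grid, and the variable names below are illustrative assumptions, not the notebook's data or settings. The sketch separates the cross-validated accuracy that drives parameter selection from the test-set accuracy that is reported.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the image data: two well-separated Gaussian blobs
X_toy = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 4)),
    rng.normal(4.0, 1.0, size=(50, 4)),
])
y_toy = np.array([0] * 50 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=12345, stratify=y_toy
)

# 5-fold cross-validated search; candidates are ranked by mean fold accuracy
search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5]}, cv=5, scoring="accuracy"
)
search.fit(X_tr, y_tr)

cv_accuracy = search.best_score_                             # mean accuracy over held-out folds
test_accuracy = accuracy_score(y_te, search.predict(X_te))   # accuracy on unseen test rows

print(search.best_params_, cv_accuracy, test_accuracy)
```

After fitting, `search.predict` uses the best candidate refit on the full training set, which is why the notebook can call `predict` on the GridSearchCV object directly.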

## K Nearest Neighbors Model

The grid search for KNN model parameters covers every value of n_neighbors from 1 to 15 for the 7-emotion model and from 1 to 19 for the 4-emotion model. It finds that the optimal number of neighbors for the 7-emotion model is 15; a model trained with this value is 28.15% accurate. For the 4-emotion model, the optimal number of neighbors is 2 and the model is 36.95% accurate.
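
To illustrate what the n_neighbors parameter controls, here is a minimal sketch of k-nearest-neighbors classification written directly in numpy. It is not scikit-learn's implementation, which uses optimized neighbor searches; the `knn_predict` helper and the toy points are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Predict each query label by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        dists = np.linalg.norm(X_train - q, axis=1)          # Euclidean distances
        nearest = np.argsort(dists, kind="stable")[:k]       # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])              # ties go to the smallest label
    return np.array(preds)

# Two small clusters, one per class
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 4.8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.05, 0.05], [4.9, 5.1]])

pred_k3 = knn_predict(X_toy, y_toy, queries, k=3)
print(pred_k3)
```

Larger values of k average the vote over more neighbors, which smooths the decision at the cost of blurring small local clusters.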

In [None]:
knn = KNeighborsClassifier(algorithm='auto')
param_grid = dict(n_neighbors=list(range(1, 16)))
model_KNN = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy',verbose=1)

model_KNN.fit(X_train_unrav, y_train)
y_pred_KNN = model_KNN.predict(X_test_unrav)

print(model_KNN.best_params_)

accuracy['KNN'] = accuracy_score(y_test, y_pred_KNN)  # (y_true, y_pred) order
params['KNN'] = model_KNN.best_params_
print(f"The model is {accuracy['KNN']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test,y_pred_KNN))

In [None]:
print(model_KNN.best_params_)

Additional groupings:

In [None]:
knn = KNeighborsClassifier(algorithm='auto')
param_grid = dict(n_neighbors=list(range(1, 20)))
model_KNN = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy',verbose=1)

model_KNN.fit(X_train_unrav, y_train_g)
y_pred_KNN = model_KNN.predict(X_test_unrav)

print(model_KNN.best_params_)

accuracy['KNN_group'] = accuracy_score(y_test_g, y_pred_KNN)  # (y_true, y_pred) order
params['KNN_group'] = model_KNN.best_params_
print(f"The model is {accuracy['KNN_group']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test_g,y_pred_KNN))

## SVM Model

The grid search for SVM model parameters includes five possible values for the regularization parameter ('C'), four possible values for the kernel coefficient ('gamma'), and three possible kernel types, for 60 candidate combinations in total.

The best parameters from these options are a regularization parameter of 0.1, a gamma value of 0.0001, and a linear kernel. Because a linear kernel does not use gamma, the reported gamma value has no effect on this model. An SVC model from the scikit-learn svm module trained with these parameters is 25% accurate.
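
Two properties of this search can be checked with a short sketch: the size of the grid, and the fact that gamma does not influence a linear-kernel SVC. The synthetic blobs and variable names are illustrative assumptions. `ParameterGrid` is used only to count candidates.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.1, 1],
    'kernel': ['rbf', 'poly', 'linear'],
}

# Number of parameter combinations the search evaluates (5 * 4 * 3)
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)

# Synthetic two-class data, for illustration only
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, (30, 3)), rng.normal(3.0, 1.0, (30, 3))])
y_toy = np.array([0] * 30 + [1] * 30)

# Same linear-kernel model fitted with two very different gamma values
pred_small_gamma = svm.SVC(kernel='linear', C=0.1, gamma=0.0001).fit(X_toy, y_toy).predict(X_toy)
pred_large_gamma = svm.SVC(kernel='linear', C=0.1, gamma=1).fit(X_toy, y_toy).predict(X_toy)
print(np.array_equal(pred_small_gamma, pred_large_gamma))
```

Assuming 5-fold cross-validation, each candidate is fitted five times, so the search performs 300 fits before the final refit. This helps explain the long training time on 8,000 images.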

In [None]:
param_grid = {'C':[0.01,0.1,1,10,100],'gamma':[0.0001,0.001,0.1,1],'kernel':['rbf','poly', 'linear']}
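# probability=True enables predict_proba but adds an internal cross-validation to every fit;
# it is not required for the predict() calls used below
svc = svm.SVC(probability=True)
print("Training has started; the grid search may take a long time to complete.")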

model_SVM = GridSearchCV(svc,param_grid)

start = time.time()
model_SVM.fit(X_train_unrav,y_train)
end = time.time()

print(f"Train time {end-start}")

In [None]:
print('Grid search complete. Best parameters:')
print(model_SVM.best_params_)
# {'C': 0.1, 'gamma': 0.0001, 'kernel': 'linear'}
# The model is 25.00% accurate 

In [None]:
y_pred_SVM = model_SVM.predict(X_test_unrav)
accuracy['SVM'] = accuracy_score(y_test, y_pred_SVM)  # (y_true, y_pred) order
params['SVM'] = model_SVM.best_params_

print(f"The model is {accuracy['SVM']*100:.2f}% accurate")

## Bernoulli Naive Bayes Model

A grid search for Bernoulli Naive Bayes parameters selects the best smoothing parameter ('alpha') from a list of 9 candidates ranging from 1e-10 to 10. The model is given a binarization threshold of 0.1, meaning that any pixel value less than or equal to 0.1 is replaced by 0 and any value above 0.1 is replaced by 1.

In this configuration, the 7-emotion model has an optimal alpha value of 2.0. The trained 7-emotion model with this alpha value is 17.20% accurate. The 4-emotion model has an optimal alpha value of 1.0 which yields a 29.65% accurate model.
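
The effect of the binarization threshold depends on the scale of the pixel values. The sketch below applies the same rule (values above the threshold become 1, all others 0) to a few made-up pixel values, once on a raw 0 to 255 scale and once rescaled to 0 to 1. The `binarize` helper and both arrays are hypothetical, and whether the notebook's pixels are rescaled before this step is an assumption left open here.

```python
import numpy as np

def binarize(values, threshold):
    # Same rule as the binarize option: 1 where value > threshold, else 0
    return (np.asarray(values, dtype=float) > threshold).astype(int)

raw_pixels = np.array([0.0, 1.0, 37.0, 128.0, 255.0])   # made-up values on a 0-255 scale
scaled_pixels = raw_pixels / 255.0                       # the same values on a 0-1 scale

raw_bits = binarize(raw_pixels, 0.1)
scaled_bits = binarize(scaled_pixels, 0.1)
print(raw_bits, scaled_bits)
```

On the raw scale, a threshold of 0.1 only separates exactly-zero pixels from everything else, whereas on the 0 to 1 scale it separates the darkest tenth of the range from the rest.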


In [None]:
alphas = {'alpha': [1.0e-10, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}

bnb = BernoulliNB(binarize=0.1)
model_BNB = GridSearchCV(bnb, alphas, scoring='accuracy')
model_BNB.fit(X_train_unrav, y_train) 

y_pred_BNB = model_BNB.predict(X_test_unrav) 
accuracy['BNB'] = accuracy_score(y_test, y_pred_BNB)  # (y_true, y_pred) order
params['BNB'] = model_BNB.best_params_

print(f"The model is {accuracy['BNB']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test,y_pred_BNB))

In [None]:
print(model_BNB.best_params_)

Additional groupings:

In [None]:
alphas = {'alpha': [1.0e-10, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}

bnb = BernoulliNB(binarize=0.1)
model_BNB = GridSearchCV(bnb, alphas, scoring='accuracy')
model_BNB.fit(X_train_unrav, y_train_g) 

y_pred_BNB = model_BNB.predict(X_test_unrav) 
accuracy['BNB_group'] = accuracy_score(y_test_g, y_pred_BNB)  # (y_true, y_pred) order
params['BNB_group'] = model_BNB.best_params_

print(f"The model is {accuracy['BNB_group']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test_g,y_pred_BNB))

## Multinomial Naive Bayes Model

A grid search for Multinomial Naive Bayes parameters also selects the best smoothing parameter ('alpha') from the same list of 9 candidates that was used for the Bernoulli Naive Bayes grid search. Data fed into the model is trinarized with thresholds of 0.25 and 0.75: any pixel value less than or equal to 0.25 is replaced by 0, any value strictly between 0.25 and 0.75 is replaced by 1, and any value of 0.75 or above is replaced by 2.

In this configuration, the 7-emotion optimal alpha value is 10, and the resulting model is 22.65% accurate. The 4-emotion model has an optimal alpha value of 10, and is 33.70% accurate.
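
The boundary behavior of the trinarization rule is easiest to see on a handful of values. The sketch below expresses the same rule with nested `np.where` calls; the `trinarize_where` name and the sample values are illustrative and are not part of the notebook.

```python
import numpy as np

def trinarize_where(data, lower, upper):
    # 2 where value >= upper, 1 where lower < value < upper, 0 where value <= lower
    values = np.asarray(data, dtype=float)
    return np.where(values >= upper, 2, np.where(values > lower, 1, 0))

samples = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
levels = trinarize_where(samples, 0.25, 0.75)
print(levels)
```

A value exactly at the lower threshold falls in the 0 bin, while a value exactly at the upper threshold falls in the 2 bin.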

In [None]:
# GRID SEARCH
def trinarize(data, lower, upper):
    # Map values to 0 (<= lower), 1 (between the thresholds), or 2 (>= upper)
    values = np.asarray(data)
    trinarized_data = np.zeros(values.shape)
    trinarized_data[values > lower] = 1
    trinarized_data[values >= upper] = 2
    return trinarized_data

alphas = {'alpha': [1.0e-10, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}

mnb = MultinomialNB()
model_MNB = GridSearchCV(mnb, alphas, scoring='accuracy')
model_MNB.fit(trinarize(X_train_unrav,0.25,0.75), y_train)
y_pred_MNB = model_MNB.predict(trinarize(X_test_unrav,0.25,0.75))
accuracy['MNB'] = accuracy_score(y_test, y_pred_MNB)  # (y_true, y_pred) order
params['MNB'] = model_MNB.best_params_

print(f"The Naive Bayes model is {accuracy['MNB']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test,y_pred_MNB))

In [None]:
print(model_MNB.best_params_)

Additional groupings:

In [None]:
# GRID SEARCH
def trinarize(data, lower, upper):
    # Map values to 0 (<= lower), 1 (between the thresholds), or 2 (>= upper)
    values = np.asarray(data)
    trinarized_data = np.zeros(values.shape)
    trinarized_data[values > lower] = 1
    trinarized_data[values >= upper] = 2
    return trinarized_data

alphas = {'alpha': [1.0e-10, 0.0001, 0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 10.0]}

mnb = MultinomialNB()
model_MNB = GridSearchCV(mnb, alphas, scoring='accuracy')
model_MNB.fit(trinarize(X_train_unrav,0.25,0.75), y_train_g)
y_pred_MNB = model_MNB.predict(trinarize(X_test_unrav,0.25,0.75))
accuracy['MNB_group'] = accuracy_score(y_test_g, y_pred_MNB)  # (y_true, y_pred) order
params['MNB_group'] = model_MNB.best_params_

print(f"The Naive Bayes model is {accuracy['MNB_group']*100:.2f}% accurate")

print('Classification Report:',classification_report(y_test_g,y_pred_MNB))

## Gaussian Naive Bayes Model

A grid search for Gaussian Naive Bayes parameters selects the best smoothing variance (`var_smoothing`), the portion of the largest feature variance that is added to every feature variance for numerical stability. The value is chosen from 100 logarithmically spaced candidates between 1 (10^0) and 1e-9 (10^-9).

The optimal smoothing variance for the 7-emotion model is ~0.0023, and the resulting model is 21.75% accurate. The optimal smoothing variance for the 4-emotion model is ~0.0035, and the resulting model is 36.15% accurate.
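
The candidate grid and the way a chosen value is applied can be sketched in numpy. The grid below is the same `np.logspace` call used in the search; the synthetic `X_toy` data and the index picked from the grid are illustrative assumptions. For brevity the variances are computed over all rows, whereas the estimator keeps them per class.

```python
import numpy as np

# Candidate values: 100 points evenly spaced in the exponent, from 10^0 down to 10^-9
grid = np.logspace(0, -9, num=100)
print(grid[0], grid[-1], grid.size)

# Synthetic data with three features of very different spread
rng = np.random.default_rng(0)
X_toy = rng.normal(0.0, [1.0, 5.0, 0.1], size=(200, 3))

# var_smoothing scales the largest feature variance into one additive term...
var_smoothing = grid[29]
feature_var = X_toy.var(axis=0)
epsilon = var_smoothing * feature_var.max()

# ...which is added to every feature's variance
smoothed_var = feature_var + epsilon
print(var_smoothing, epsilon, smoothed_var)
```

Because the same term is added to every feature, low-variance features are affected proportionally more than high-variance ones.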

In [34]:
# GRID SEARCH over var_smoothing: 100 log-spaced values from 1 down to 1e-9
param_grid = {'var_smoothing': np.logspace(0,-9, num=100)}

gnb2 = GaussianNB()
model_GNB2 = GridSearchCV(gnb2, param_grid, scoring='accuracy')
model_GNB2.fit(X_train_unrav, y_train)
y_pred_GNB2 = model_GNB2.predict(X_test_unrav)
accuracy['GNB'] = accuracy_score(y_test, y_pred_GNB2)  # (y_true, y_pred) order
params['GNB'] = model_GNB2.best_params_

print('Classification Report:',classification_report(y_test,y_pred_GNB2))

In [35]:
print(model_GNB2.best_params_)

Additional groupings:

In [None]:
# GRID SEARCH over var_smoothing: 100 log-spaced values from 1 down to 1e-9
param_grid = {'var_smoothing': np.logspace(0,-9, num=100)}

gnb2 = GaussianNB()
model_GNB2 = GridSearchCV(gnb2, param_grid, scoring='accuracy')
model_GNB2.fit(X_train_unrav, y_train_g)
y_pred_GNB2 = model_GNB2.predict(X_test_unrav)
accuracy['GNB_group'] = accuracy_score(y_test_g, y_pred_GNB2)  # (y_true, y_pred) order
params['GNB_group'] = model_GNB2.best_params_

# Report the var_smoothing value actually selected by the grid search
best_vs = params['GNB_group']['var_smoothing']
print(f"Gaussian Naive Bayes model with var_smoothing set to {best_vs:.4g} had accuracy {accuracy['GNB_group']*100:.2f}%")

print('Classification Report:',classification_report(y_test_g,y_pred_GNB2))