# **Facial Expression Recognition**
## **Week 1 Training Notebook**
### Alejandro Alemany, Sara Manrriquez, and Benjamin Zaretzky

In this notebook we build an image classification model that identifies the emotion expressed in images of human faces.

## Import Packages

We import all necessary packages. 

In [None]:
import os
os.environ["KMP_SETTINGS"] = "false"
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pickle

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

## Load Training DataFrame

We load the training data and view the first 5 rows. 

In [None]:
train = pd.read_csv('/kaggle/input/challenges-in-representation-learning-facial-expression-recognition-challenge/train.csv')
print(train.shape)

In [None]:
train.head()

## Preprocess Data

Each image in this training set is stored as a string of space-separated pixel values. To train the model and visualize the images, we parse these strings and stack them into a 4D array of pixel values with shape (number of images, 48, 48, 1).

In [None]:
# Parse each space-separated pixel string into a (1, 48, 48, 1) integer array
train['pixels'] = [np.fromstring(x, dtype=int, sep=' ').reshape(-1,48,48,1) for x in train['pixels']]

In [None]:
pixels = np.concatenate(train['pixels'])
labels = train.emotion.values

print(pixels.shape)
print(labels.shape)
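
To make the string-to-array step concrete, here is a minimal sketch on a made-up 2x2 image. The string `sample` and the helper `parse_pixels` are hypothetical and not part of the notebook, and the sketch parses with `str.split` rather than `np.fromstring`, which should yield the same values for space-separated integers.

```python
import numpy as np

def parse_pixels(pixel_string, height, width):
    """Turn a space-separated pixel string into a (1, height, width, 1) array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.int64)
    return values.reshape(-1, height, width, 1)

# Hypothetical 2x2 "image"; each real row holds 48 * 48 = 2304 values.
sample = "0 255 128 64"
img = parse_pixels(sample, 2, 2)
print(img.shape)        # (1, 2, 2, 1)
print(img[0, :, :, 0])  # the 2x2 grid of pixel values

# Concatenating several parsed rows gives the 4D batch (N, height, width, 1).
batch = np.concatenate([img, img, img])
print(batch.shape)      # (3, 2, 2, 1)
```

The leading axis of size 1 is what lets `np.concatenate` stack the per-image arrays into a single batch.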

## Label Distribution

Let's view the distribution of labels. 

In [None]:
emotion_prop = (train.emotion.value_counts() / len(train)).to_frame().sort_index(ascending=True)

emotion_prop

In [None]:
emotions = ['Angry','Disgust','Fear','Happy','Sad','Surprise','Neutral']

In [None]:
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']

plt.figure(figsize=[12,6])

# Select the single column by position; its name differs across pandas versions
plt.bar(x=emotions, height=emotion_prop.iloc[:, 0], color=palette, edgecolor='black')
    
plt.xlabel('Emotion')
plt.ylabel('Proportion')
plt.title('Emotion Label Proportions')
plt.show()

The distribution of labels shows a class imbalance in this training set: the emotion happy alone accounts for about 25% of the data.
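
One common way to compensate for an imbalance like this is to weight classes inversely to their frequency. The notebook does not do this; the following is only a sketch on toy labels (the array `toy_labels` is an assumption for illustration) using scikit-learn's `compute_class_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (hypothetical): class 0 is three times as common as class 1.
toy_labels = np.array([0, 0, 0, 1])

classes = np.unique(toy_labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=toy_labels)

# "balanced" computes n_samples / (n_classes * count_per_class),
# so the rare class receives the larger weight.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

A dictionary of this form could be passed to Keras through the `class_weight` argument of `fit`; whether that helps on this dataset would need to be checked against the validation set.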

## View Sample of Images

We view a sample of images for each emotion: angry, disgust, fear, happy, sad, surprise, and neutral.

In [None]:
plt.close()
plt.rcParams["figure.figsize"] = [16,16]

row = 0
for emotion in np.unique(labels):

    all_emotion_images = train[train['emotion'] == emotion]
    for i in range(5):
        
        img = all_emotion_images.iloc[i].pixels.reshape(48,48)
        lab = emotions[emotion]

        plt.subplot(7,5,row+i+1)
        plt.imshow(img, cmap='binary_r')
        plt.text(-30, 5, s = str(lab), fontsize=10, color='b')
        plt.axis('off')
    row += 5

plt.show()

## Split, Reshape, and Scale Datasets

We split the data into training and validation sets using stratified sampling, so that both sets keep the same label proportions, and scale the pixel values to the range [0, 1].

In [None]:
X_train, X_valid, y_train, y_valid = train_test_split(
    pixels, labels, test_size=0.2, stratify=labels, random_state=1
)


print('X_train Shape:', X_train.shape)
print('y_train Shape:', y_train.shape)
print()
print('X_valid Shape:', X_valid.shape)
print('y_valid Shape:', y_valid.shape)

In [None]:
Xs_train = X_train / 255
Xs_valid = X_valid / 255
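
The effect of `stratify` and of dividing by 255 can be checked on a small example. This is a sketch on synthetic data; `toy_X` and `toy_y` are hypothetical stand-ins for `pixels` and `labels`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): 100 samples of 4 "pixels", 20% in class 1.
rng = np.random.default_rng(0)
toy_X = rng.integers(0, 256, size=(100, 4))
toy_y = np.array([0] * 80 + [1] * 20)

X_tr, X_va, y_tr, y_va = train_test_split(
    toy_X, toy_y, test_size=0.2, stratify=toy_y, random_state=1
)

# With stratification, the share of class 1 is the same in both splits.
print(y_tr.mean(), y_va.mean())

# Dividing 8-bit pixel values by 255 maps them into [0, 1].
Xs_tr = X_tr / 255
print(Xs_tr.min(), Xs_tr.max())
```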

## Build Network

We set the random seeds to make training runs as reproducible as possible (some GPU operations can still introduce small run-to-run differences). We build the convolutional neural network from two blocks of 2D convolution layers followed by densely-connected layers. Each block also uses max pooling, dropout, and batch normalization.

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

cnn = Sequential([
    Conv2D(64, (3,3), activation = 'relu', padding = 'same', input_shape=(48,48,1)),
    Conv2D(64, (5,5), activation = 'relu', padding = 'same'),
    MaxPooling2D(2,2),
    Dropout(0.5),
    BatchNormalization(),
    
    Conv2D(128, (3,3), activation = 'relu', padding = 'same'),
    Conv2D(128, (3,3), activation = 'relu', padding = 'same'),
    MaxPooling2D(2,2),
    Dropout(0.5),
    BatchNormalization(),

    Flatten(),
    
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(7, activation='softmax')
])

cnn.summary()
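
The shapes and parameter counts reported by `cnn.summary()` can be cross-checked by hand. The sketch below assumes standard convolution and dense layers with biases; the helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_features, units):
    """Weight matrix plus one bias per unit."""
    return in_features * units + units

# 'same' padding keeps the 48x48 size; each MaxPooling2D(2,2) halves it.
size = 48
size //= 2  # after the first pooling layer: 24
size //= 2  # after the second pooling layer: 12
flat_features = size * size * 128  # Flatten() output with 128 channels

print(size, flat_features)               # 12 18432
print(conv2d_params(3, 3, 1, 64))        # 640
print(conv2d_params(5, 5, 64, 64))       # 102464
print(dense_params(flat_features, 128))  # 2359424
```

Under these assumptions, the first dense layer holds far more parameters than any convolution layer, which is worth keeping in mind when looking for the source of overfitting.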

## Train Network

We train the model using the Adam optimizer, a learning rate of 0.001, and sparse categorical crossentropy loss. 

In [None]:
opt = tf.keras.optimizers.Adam(0.001)
cnn.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
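
Sparse categorical crossentropy takes integer labels (as in `y_train`) and averages the negative log of the probability the model assigns to the true class. The following NumPy sketch illustrates the idea on made-up probabilities; it omits the numerical clipping that Keras applies, and the function name is a local helper, not the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

# Hypothetical softmax outputs for two samples and three classes.
toy_prob = np.array([[0.1, 0.7, 0.2],
                     [0.8, 0.1, 0.1]])
toy_true = np.array([1, 0])  # integer labels, no one-hot encoding needed

loss = sparse_categorical_crossentropy(toy_true, toy_prob)
print(round(loss, 4))
```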

### Training Run 1

We train for 20 epochs for the first training run. 

In [None]:
%%time 

h1 = cnn.fit(
    Xs_train, y_train, 
    batch_size=256,
    epochs = 20,
    verbose = 1,
    validation_data = (Xs_valid, y_valid)
)

In [None]:
history = h1.history
print(history.keys())

In [None]:
epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()

From the learning curves we can conclude that the model is training well, reaching an accuracy of 55-60%; however, there is room for improvement. There is slight overfitting, and the model could benefit from additional epochs.
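
Reading the curves can be complemented with a few numbers. The sketch below assumes a dictionary with the same keys as `h1.history`; the values in `toy_history` are made up for illustration and are not results from this notebook, and `summarize_history` is a hypothetical helper.

```python
import numpy as np

def summarize_history(history):
    """Report the epoch with the lowest validation loss and the final accuracy gap."""
    val_loss = np.asarray(history["val_loss"])
    best = int(np.argmin(val_loss))
    gap = history["accuracy"][-1] - history["val_accuracy"][-1]
    return {
        "best_epoch": best + 1,  # epochs are numbered from 1 in the plots
        "best_val_loss": float(val_loss[best]),
        "final_accuracy_gap": float(gap),
    }

# Made-up history with the same keys Keras records.
toy_history = {
    "loss": [1.6, 1.3, 1.1, 0.9],
    "val_loss": [1.5, 1.3, 1.25, 1.3],
    "accuracy": [0.35, 0.48, 0.57, 0.65],
    "val_accuracy": [0.40, 0.50, 0.55, 0.56],
}

summary = summarize_history(toy_history)
print(summary)
```

A growing gap between training and validation accuracy, together with a validation loss that has stopped falling, is the usual sign of overfitting.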

### Training Run 2

To improve the performance of the model, we decrease the learning rate from 0.001 to 0.0001.

In [None]:
tf.keras.backend.set_value(cnn.optimizer.learning_rate, 0.0001)

We train for another 20 epochs for the second training run. 

In [None]:
%%time 

h2 = cnn.fit(
    Xs_train, y_train, 
    batch_size=256,
    epochs = 20,
    verbose = 1,
    validation_data = (Xs_valid, y_valid)
)

In [None]:
for k in history.keys():
    history[k] += h2.history[k]

epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()

From the learning curves we can conclude that the second run, trained at a learning rate of 0.0001, overfit significantly: the training accuracy is ~70%, while the validation accuracy is ~57%. To improve generalization, this model may benefit from image augmentation.
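
As a minimal illustration of image augmentation, the sketch below adds horizontally mirrored copies of each image with NumPy. It is not what the notebook does, and the helper `add_horizontal_flips` and the toy arrays are hypothetical. A mirrored face plausibly keeps its emotion label, which is the assumption that makes this transform label-preserving.

```python
import numpy as np

def add_horizontal_flips(images, labels):
    """Append a left-right mirrored copy of every image in an (N, H, W, C) batch."""
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])

# Toy batch: two 4x4 single-channel "images" with made-up labels.
toy_images = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4, 1)
toy_labels = np.array([3, 5])

aug_images, aug_labels = add_horizontal_flips(toy_images, toy_labels)
print(aug_images.shape, aug_labels.shape)  # (4, 4, 4, 1) (4,)
```

Augmentation of this kind would be applied to the training split only, so that the validation set still measures performance on unmodified images.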

## Save Model and History

We save the model and training history for future reference. 

In [None]:
cnn.save('fer_model_v02.h5')
# Use a context manager so the file is closed after writing
with open('fer_v02.pkl', 'wb') as f:
    pickle.dump(history, f)
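
Reading the history back later follows the same pattern in reverse. This sketch round-trips a made-up dictionary through a temporary file; `toy_history` and the file name `toy_history.pkl` are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Made-up history standing in for the real training history.
toy_history = {"loss": [1.6, 1.3], "val_loss": [1.5, 1.4]}

# Write to a temporary directory so the example leaves the notebook's files alone.
path = os.path.join(tempfile.mkdtemp(), "toy_history.pkl")
with open(path, "wb") as f:
    pickle.dump(toy_history, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == toy_history)  # True
```

The saved model itself can typically be restored with `tf.keras.models.load_model('fer_model_v02.h5')`.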