# **Facial Expression Recognition Training Notebook**
## **Week 6**
### Sara Manrriquez

In this notebook we explore how well the pretrained ResNet50 model performs on the facial expression recognition data. ResNet50 is a 50-layer residual network pretrained on ImageNet, a large dataset of annotated photographs, and is commonly used as a base model for transfer learning. Its benefits include faster training, because the pretrained weights are reused, and residual (skip) connections that mitigate the vanishing gradient problem.

## Import Packages

We import all necessary packages.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pickle

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout, Flatten
from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras.models import Model

## Load Training DataFrame

We load the training data and view the first 5 rows.

In [None]:
train = pd.read_csv('/kaggle/input/challenges-in-representation-learning-facial-expression-recognition-challenge/train.csv')
print(train.shape)

In [None]:
train.head()

## Preprocess Data

The images in this training set are stored as strings of space-separated pixel values. To train the model and visualize the images, we parse these strings into a 3D array of pixel values with shape (number of images, 48, 48).

In [None]:
train['pixels'] = [np.fromstring(x, dtype=int, sep=' ').reshape(-1,48,48) for x in train['pixels']]

In [None]:
pixels = np.concatenate(train['pixels'])
labels = train.emotion.values

print(pixels.shape)
print(labels.shape)
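
The parsing step can be illustrated on a toy input. This is a minimal sketch, assuming two hypothetical 2x2 "images" in place of the real 48x48 ones; the names `pixel_strings` and `side` are made up for the illustration.

```python
import numpy as np

# Hypothetical 2x2 "images" stored the same way as the `pixels` column:
# one string of space-separated intensities per image.
pixel_strings = ["0 255 128 64", "10 20 30 40"]
side = 2  # the real images use side = 48

# Parse each string into a (side, side) array, then stack into one 3D array.
images = np.stack([
    np.array([int(v) for v in s.split()]).reshape(side, side)
    for s in pixel_strings
])

print(images.shape)  # (number of images, height, width)
print(images[0])
```

The result has one axis for the image index and two for the pixel grid, which is the layout the later cells rely on.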

## Label Distribution

Let's view the distribution of labels.

In [None]:
# Name the column explicitly so that emotion_prop['emotion'] works across pandas versions
emotion_prop = (train.emotion.value_counts() / len(train)).to_frame('emotion').sort_index(ascending=True)

emotion_prop

In [None]:
emotions = ['Angry','Disgust','Fear','Happy','Sad','Surprise','Neutral']

In [None]:
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']

plt.figure(figsize=[12,6])

plt.bar(x=emotions, height=emotion_prop['emotion'], color=palette, edgecolor='black')
    
plt.xlabel('Emotion')
plt.ylabel('Proportion')
plt.title('Emotion Label Proportions')
plt.show()

As we can see from the distribution of labels, there is a class imbalance within this training set: the emotion happy accounts for about 25% of the data.
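
To quantify an imbalance like this, the class proportions can be turned into inverse-frequency weights. The sketch below uses hypothetical labels, not the notebook's data, and the "balanced" weighting heuristic is an assumption for illustration; this notebook does not apply class weights.

```python
import numpy as np

# Hypothetical integer labels with a deliberately uneven class distribution.
y = np.array([0, 0, 0, 0, 0, 0, 1, 2, 2, 2])

counts = np.bincount(y)                    # samples per class
proportions = counts / len(y)              # share of the data per class
weights = len(y) / (len(counts) * counts)  # "balanced" heuristic: rare classes weigh more

class_weight = {int(c): float(w) for c, w in enumerate(weights)}
print(proportions)
print(class_weight)
```

A dictionary of this form could, in principle, be passed to Keras `fit` through its `class_weight` argument so that errors on rare classes count for more in the loss.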

## View Sample of Images

We view a sample of images for each emotion: angry, disgust, fear, happy, sad, surprise, and neutral.

In [None]:
plt.close()
plt.rcParams["figure.figsize"] = [16,16]

row = 0
for emotion in np.unique(labels):

    all_emotion_images = train[train['emotion'] == emotion]
    for i in range(5):
        
        img = all_emotion_images.iloc[i,].pixels.reshape(48,48)
        lab = emotions[emotion]

        plt.subplot(7,5,row+i+1)
        plt.imshow(img, cmap='binary_r')
        plt.text(-30, 5, s = str(lab), fontsize=10, color='b')
        plt.axis('off')
    row += 5

plt.show()

## Split and Reshape Datasets

We split the data into training and validation sets, stratifying by label so that both sets keep the same class proportions.

In [None]:
X_train, X_valid, y_train, y_valid = train_test_split(
    pixels, labels, test_size=0.2, stratify=labels, random_state=1
)


print('X_train Shape:', X_train.shape)
print('y_train Shape:', y_train.shape)
print()
print('X_valid Shape:', X_valid.shape)
print('y_valid Shape:', y_valid.shape)

ResNet50 was trained with RGB images, and our data is in grayscale. In order to use the pretrained weights of the ResNet50 model we need to convert the single grayscale channel of our images into 3 channels (RGB).

In [None]:
rgb_X_train = np.repeat(X_train[..., np.newaxis], 3, -1)
print(rgb_X_train.shape)

rgb_X_valid = np.repeat(X_valid[..., np.newaxis], 3, -1)
print(rgb_X_valid.shape)
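
The effect of `np.repeat` here is easiest to see on a small array. A minimal sketch with two hypothetical 4x4 grayscale images:

```python
import numpy as np

gray = np.arange(2 * 4 * 4).reshape(2, 4, 4)  # 2 hypothetical 4x4 grayscale images

# Add a trailing channel axis, then copy that single channel three times.
rgb = np.repeat(gray[..., np.newaxis], 3, axis=-1)
print(gray.shape, "->", rgb.shape)

# All three channels hold the same values, so no colour information is added.
channels_identical = (
    np.array_equal(rgb[..., 0], rgb[..., 1])
    and np.array_equal(rgb[..., 1], rgb[..., 2])
)
print(channels_identical)
```

The conversion only changes the shape to what the pretrained network accepts; the images remain grayscale in content.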

## Image Augmentation

In an effort to prevent overfitting, we use image augmentation, which applies random rotations, shifts, zooms, and horizontal flips to the training images as each batch is generated.

In [None]:
train_datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.3, 
    height_shift_range = 0.3, 
    zoom_range = 0.3, 
    horizontal_flip = True, 
    fill_mode = 'reflect'
)

train_loader = train_datagen.flow(rgb_X_train, y_train, batch_size=64)
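
As an illustration of one of these transformations, the sketch below mimics `horizontal_flip=True` in plain NumPy. The helper `random_horizontal_flip` is hypothetical and written for this example; it is not how `ImageDataGenerator` is implemented internally.

```python
import numpy as np

def random_horizontal_flip(image, rng):
    """Mirror an (H, W, C) image left to right with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)  # hypothetical 2x3 single-channel image

flipped = image[:, ::-1, :]                      # the deterministic mirror image
augmented = random_horizontal_flip(image, rng)   # either `image` or `flipped`

print(image[..., 0])
print(flipped[..., 0])
```

The flip only rearranges existing pixels, so the label stays valid; the model simply sees a different variant of the same face from one epoch to the next.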

## Transfer Learning with ResNet50

We load the pretrained ResNet50 model without its classification head (`include_top=False`) and freeze its weights by setting `trainable` to `False`.

In [None]:
resnet_model = tf.keras.applications.resnet50.ResNet50(
    include_top=False, weights='imagenet', input_shape=(48,48,3))

resnet_model.trainable = False

We view the model summary and plot the model's architecture. 

In [None]:
resnet_model.summary()

In [None]:
tf.keras.utils.plot_model(resnet_model, show_shapes=True)

## Configure Model

To the ResNet50 model we add densely-connected layers, incorporating dropout and batch normalization.

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

cnn = Sequential([
    resnet_model,
    BatchNormalization(),

    Flatten(),
    
    Dense(512, activation='relu'),
    BatchNormalization(),
    Dropout(0.7),
    
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.7),
    
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    
    Dense(7, activation='softmax')
])

cnn.summary()
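
The size of the flattened feature vector reported in the summary can be checked by hand. The arithmetic below is a sketch that assumes the standard Keras ResNet50 layout: a stride-2 stem convolution, a stride-2 max pooling layer, and three later stages that each halve the feature map, ending with 2048 channels. The helper `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride, pad):
    """Output size of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

size = 48
size = conv_out(size, kernel=7, stride=2, pad=3)  # stem convolution: 48 -> 24
size = conv_out(size, kernel=3, stride=2, pad=1)  # max pooling: 24 -> 12
for _ in range(3):                                # three stride-2 stages: 12 -> 6 -> 3 -> 2
    size = conv_out(size, kernel=1, stride=2, pad=0)

flat = size * size * 2048           # length of the vector produced by Flatten()
dense1_params = flat * 512 + 512    # weights plus biases of Dense(512)

print(size, flat, dense1_params)
```

Under these assumptions a 48x48 input leaves only a 2x2 spatial map at the end of the base model, and most of the trainable parameters in the head sit in the first dense layer.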


## Train Model

We train the model using the Adam optimizer, a learning rate of 0.0001, and sparse categorical crossentropy loss.

In [None]:
opt = tf.keras.optimizers.Adam(0.0001)
cnn.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
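
Sparse categorical crossentropy takes integer class labels directly, rather than one-hot vectors, and averages the negative log-probability assigned to each true class. A minimal NumPy sketch with hypothetical predictions for three classes:

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])  # integer class ids, as in y_train

# Pick the predicted probability of the true class for each sample.
true_class_probs = probs[np.arange(len(labels)), labels]
per_sample_loss = -np.log(true_class_probs)
loss = per_sample_loss.mean()

print(per_sample_loss, loss)
```

This is why the labels can stay as the integers 0 through 6 from the `emotion` column without being one-hot encoded first.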

We train the model for 30 epochs. 

In [None]:
%%time 

h1 = cnn.fit(
    train_loader,  # batches of 64 come from the generator, so batch_size is not passed here
    epochs = 30,
    verbose = 1,
    validation_data = (rgb_X_valid, y_valid)
)

In [None]:
history = h1.history
print(history.keys())

In [None]:
epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()

From the learning curves we can see there is slight overfitting, and the model could benefit from additional epochs.

## Fine-Tune Model

In an effort to enhance our model's performance, we unfreeze the ResNet50 base so that its weights are updated during training, and we lower the learning rate to 0.00001.

In [None]:
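# Unfreeze the base model and lower the learning rate, then recompile:
# a change to `trainable` only takes effect once compile() is called again.
resnet_model.trainable = True
tf.keras.backend.set_value(cnn.optimizer.learning_rate, 0.00001)
cnn.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])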

In [None]:
cnn.summary()

We train the model for another 30 epochs.

In [None]:
%%time 

h2 = cnn.fit(
    train_loader,  # batches of 64 come from the generator, so batch_size is not passed here
    epochs = 30,
    verbose = 1,
    validation_data = (rgb_X_valid, y_valid)
)

In [None]:
for k in history.keys():
    history[k] += h2.history[k]

epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()
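
The cell above extends the first history in place, so running it twice would append the second run again. A non-mutating alternative is sketched below; the dictionaries hold hypothetical values and stand in for `h1.history` and `h2.history`.

```python
# Hypothetical stand-ins for h1.history and h2.history.
h1_history = {'loss': [1.9, 1.7], 'accuracy': [0.25, 0.30]}
h2_history = {'loss': [1.6, 1.5], 'accuracy': [0.33, 0.36]}

# Build a new dictionary instead of extending h1_history with +=,
# so the cell can be re-run without duplicating epochs.
combined = {k: h1_history[k] + h2_history[k] for k in h1_history}

print(len(combined['loss']))
print(h1_history['loss'])
```

Either approach yields the same curves on a single pass; the difference only shows up when cells are re-executed.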

From the learning curves we can see that retraining the model increased the accuracy from ~30% to ~44%; however, there is still underfitting present.

## Train Final Convolutional Layers

In an effort to enhance the model's performance, we start again from the pretrained ResNet50 weights and retrain only its last ten layers, keeping the earlier layers frozen.

In [None]:
resnet_model = tf.keras.applications.resnet50.ResNet50(
    include_top=False, weights='imagenet', input_shape=(48,48,3))

resnet_model.summary()

In [None]:
print('Number of layers in base model:', len(resnet_model.layers), '\n')

print('Names of last ten layers:')
for layer in resnet_model.layers[-10:]:
    print(layer.name)

In [None]:
resnet_model.trainable = True

for layer in resnet_model.layers[:-10]:
    layer.trainable = False

To the ResNet50 model we add densely-connected layers, incorporating dropout and batch normalization.

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

cnn = Sequential([
    resnet_model,
    BatchNormalization(),

    Flatten(),
    
    Dense(512, activation='relu'),
    BatchNormalization(),
    Dropout(0.7),
    
    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.7),
    
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    
    Dense(7, activation='softmax')
])

cnn.summary()

### Training Run 1

We train the model using the Adam optimizer, a learning rate of 0.0001, and sparse categorical crossentropy loss.

In [None]:
opt = tf.keras.optimizers.Adam(0.0001)
cnn.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

We train for 30 epochs for the first training run.

In [None]:
%%time 

h1 = cnn.fit(
    train_loader,  # batches of 64 come from the generator, so batch_size is not passed here
    epochs = 30,
    verbose = 1,
    validation_data = (rgb_X_valid, y_valid)
)

In [None]:
history = h1.history
print(history.keys())

In [None]:
epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()

From the learning curves we can see there is slight overfitting, and the model could benefit from additional epochs.

### Training Run 2

In order to enhance the performance of the model, we will decrease the learning rate from 0.0001 to 0.00001.

In [None]:
tf.keras.backend.set_value(cnn.optimizer.learning_rate, 0.00001)

We train for another 30 epochs for the second training run.

In [None]:
%%time 

h2 = cnn.fit(
    train_loader,  # batches of 64 come from the generator, so batch_size is not passed here
    epochs = 30,
    verbose = 1,
    validation_data = (rgb_X_valid, y_valid)
)

In [None]:
for k in history.keys():
    history[k] += h2.history[k]

epoch_range = range(1, len(history['loss'])+1)

plt.figure(figsize=[14,4])
plt.subplot(1,2,1)
plt.plot(epoch_range, history['loss'], label='Training')
plt.plot(epoch_range, history['val_loss'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.title('Loss')
plt.legend()
plt.subplot(1,2,2)
plt.plot(epoch_range, history['accuracy'], label='Training')
plt.plot(epoch_range, history['val_accuracy'], label='Validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.title('Accuracy')
plt.legend()
plt.tight_layout()
plt.show()

From the learning curves we can see the accuracy improved from ~35% to ~37%; however, there is still underfitting present.

## Save Model and History

We save the model and training history for future reference.

In [None]:
cnn.save('fer_model_v06.h5')
with open('fer_v06.pkl', 'wb') as f:
    pickle.dump(history, f)
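
To reuse the saved history later, it can be read back with `pickle.load`. The sketch below round-trips a hypothetical history dictionary through a temporary file, so it runs without the notebook's outputs; the file name `fer_history_demo.pkl` is made up for the example.

```python
import os
import pickle
import tempfile

history = {'loss': [1.8, 1.6], 'val_loss': [1.9, 1.7]}  # hypothetical values

# Write to a temporary location so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), 'fer_history_demo.pkl')
with open(path, 'wb') as f:
    pickle.dump(history, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == history)
```

The saved model itself would typically be reloaded with `tf.keras.models.load_model('fer_model_v06.h5')`.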

## Summary

Overall, ResNet50 did not perform well on the facial expression recognition data: the top accuracy was ~44%, reached after unfreezing the entire ResNet50 base. This dataset may benefit from a different pretrained model, or from a simpler architecture combined with image augmentation.

## Resources

[A Comparison of 4 Popular Transfer Learning Models](https://analyticsindiamag.com/a-comparison-of-4-popular-transfer-learning-models/)<br/>
[Facial Expression Detection 2](https://www.kaggle.com/haneenabdelmaguid/facial-expression-detection-2)<br/>
[How can I use a pre-trained neural network with grayscale images?](https://stackoverflow.com/questions/51995977/how-can-i-use-a-pre-trained-neural-network-with-grayscale-images)<br/>
[Transfer learning & fine-tuning](https://keras.io/guides/transfer_learning/#do-a-round-of-finetuning-of-the-entire-model)<br/>
[Transfer Learning Tutorial (CIFAR 10)](https://www.kaggle.com/drbeane/transfer-learning-tutorial-cifar-10)