# **Facial Expression Recognition**
## **Exploratory Data Analysis**
### Alejandro Alemany, Sara Manrriquez, and Ben Zaretzky
<br/>
In this notebook we explore the facial expression recognition dataset.

## Import Packages
We begin by importing the packages needed for this notebook.

In [None]:
# Import packages for EDA notebook
import os
os.environ["KMP_SETTINGS"] = "false"
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model

## Load Training Data
We will now load the training data, which consists of 28,709 facial images. The pixel values for each image are stored as a single string.

In [None]:
# Load the training data and then view the shape
train = pd.read_csv('/kaggle/input/challenges-in-representation-learning-facial-expression-recognition-challenge/train.csv')
print(train.shape)

In [None]:
# View the first five rows of the training data
train.head()

## Preprocess Data
The pixel values for each image are converted from a string to an array. Then the numerical labels are replaced with the corresponding emotion names to make the data easier to interpret.

In [None]:
# Convert the pixel values from a string to a NumPy array of shape (1, 48, 48, 1)
train['pixels'] = [np.fromstring(x, dtype=int, sep=' ').reshape(-1,48,48,1) for x in train['pixels']]
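
To make the conversion concrete, here is a minimal sketch on a hypothetical 2x2 image rather than the real 48x48 data. It parses the string with plain Python instead of `np.fromstring`, which is an illustrative substitution and not the approach used in the cell above; the names `pixel_string` and `side` are assumptions made for this example.

```python
import numpy as np

# Hypothetical 2x2 "image" stored the same way as the CSV: space-separated intensities.
pixel_string = "0 64 128 255"
side = 2  # the real images use side = 48

# Split on whitespace and convert each token to an integer.
flat = np.array([int(token) for token in pixel_string.split()])

# Reshape to (batch, height, width, channels), the layout used throughout this notebook.
image = flat.reshape(-1, side, side, 1)

print(image.shape)
print(image[0, :, :, 0])
```

The leading `-1` lets NumPy infer the batch dimension, so each row of the DataFrame ends up holding a batch of one image.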

In [None]:
# Assign the emotions to the corresponding number and apply them to the DataFrame
emotion_cat = {0:'Anger', 1:'Disgust', 2:'Fear', 3:'Happiness', 4: 'Sadness', 5: 'Surprise', 6: 'Neutral'}
train['emotion'] = train['emotion'].apply(lambda x: emotion_cat[x])

In [None]:
# Create variables for pixels and labels
pixels = np.concatenate(train['pixels'])
labels = train.emotion.values

## Label Distribution
The distribution of the labels is provided below, first as a table and then as a bar chart.

In [None]:
# Calculate the proportions for each emotion
emotion_prop = (train.emotion.value_counts() / len(train)).to_frame().sort_index(ascending=True)
emotion_prop
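
As a side note, pandas can compute proportions directly with `value_counts(normalize=True)`. The sketch below uses a small hypothetical label column (not the real dataset) to show the idea; `toy` and `toy_prop` are names invented for this example.

```python
import pandas as pd

# Hypothetical labels, for illustration only (not the real dataset).
toy = pd.DataFrame({"emotion": ["Happiness", "Happiness", "Sadness", "Disgust"]})

# normalize=True returns proportions instead of raw counts.
toy_prop = toy["emotion"].value_counts(normalize=True).sort_index()

print(toy_prop)
```

The proportions sum to one, which makes them easy to compare across classes regardless of the dataset size.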

In [None]:
# Create a bar chart for the labels
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']

plt.figure(figsize=[12,6])

# Select the single column by position; its name differs between pandas versions
plt.bar(x=emotion_prop.index, height=emotion_prop.iloc[:, 0], color=palette, edgecolor='black')
    
plt.xlabel('Emotion')
plt.ylabel('Proportion')
plt.title('Emotion Label Proportions')
plt.show()

From the table and bar chart, we can see that the proportion of images labeled as disgust is much lower than that of the other labels, while the proportion of images labeled as happiness is noticeably higher. Both observations show that the classes are not balanced. Imbalanced classes may lead to poor performance from CNN models, especially for the disgust label. Our intuition suggests that images displaying disgust will most often be confused with anger. A confusion matrix for our final model will show whether performance on the disgust class is worse than on the other classes.
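
To illustrate how a confusion matrix exposes a weak class, the following sketch builds one with NumPy and derives per-class recall. The true and predicted labels are made up for illustration and are not output from our model; only three of the seven classes are used to keep the example short.

```python
import numpy as np

classes = ["Anger", "Disgust", "Happiness"]  # subset of the labels, for brevity

# Hypothetical true and predicted class indices (not real model output).
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2, 2, 2, 1])

# Rows index the true class, columns the predicted class.
cm = np.zeros((len(classes), len(classes)), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

# Per-class recall: correct predictions divided by the number of true examples.
recall = cm.diagonal() / cm.sum(axis=1)

for name, r in zip(classes, recall):
    print(f"{name}: recall = {r:.2f}")
```

In this toy case the off-diagonal entry in the Disgust row and Anger column is what reveals that disgust is being mistaken for anger.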

## View Sample of Images
In this section, we will view five sample images from each emotion.

In [None]:
plt.close()
plt.rcParams["figure.figsize"] = [16,16]

row = 0
for emotion in np.unique(labels):

    all_emotion_images = train[train['emotion'] == emotion]
    for i in range(5):
        
        img = all_emotion_images.iloc[i,].pixels.reshape(48,48)
        lab = emotion

        plt.subplot(7,5,row+i+1)
        plt.imshow(img, cmap='binary_r')
        plt.text(-30, 5, s = str(lab), fontsize=10, color='b')
        plt.axis('off')
    row += 5

plt.show()

We can use these sample images to get an idea of the distinguishing features of each emotion. The goal of our CNN model is to identify these features through training and use them to correctly predict the emotion in images the model has not seen. We suspect the area around the mouth will play an important role in differentiating the emotions. For example, a wide circular mouth could indicate fear or surprise, while lips held close together could indicate sadness or a neutral expression. However, no single feature will distinguish one emotion from another; a combination of features will aid in the classification of each emotion. Class activation maps can help highlight the key areas of an image that determine the predicted emotion. These will be discussed in further detail in another notebook.

# Image Augmentation

Image augmentation is a technique for artificially creating data from a preexisting data set. New images are most commonly created by flipping, rotating, shifting, zooming, or blurring existing ones. The newly created images are combined with the original training set in the hope of increasing the robustness and performance of a deep learning model.

We experimented with image augmentation as a method of reducing the overfitting seen in previously created models. While this was not the only technique used to address overfitting, image augmentation approaches the problem at its root: the training set. Below is a sample of the image augmentation used in building our final CNN model. We allowed images to be flipped, shifted, rotated, and zoomed in or out.

In [None]:
# Create image generator
train_datagen = ImageDataGenerator(
    rotation_range = 30,
    width_shift_range = 0.2, 
    height_shift_range = 0.2, 
    zoom_range = 0.2, 
    horizontal_flip = True, 
    fill_mode = 'nearest'
)

# Use only one image for illustrative example
im_aug_example = train_datagen.flow(train['pixels'][0], batch_size=1)

In [None]:
# Create plot for image augmentation
plt.close()
plt.rcParams["figure.figsize"] = [16,16]

for i in range(12):
    # preparing the subplot
    plt.subplot(3,4,i+1)
    
    # generating images in batches
    batch = next(im_aug_example)
    
    # Remember to convert these images to unsigned integers for viewing 
    image = batch[0].astype('uint8')
    
    # Plotting the data
    plt.imshow(image,cmap='binary_r')
    
# Displaying the figure
plt.show()

We can see from the plot above that all twelve images contain the same face and therefore the same emotion. However, the images are all slightly different due to the augmentation. This helps inhibit our model's ability to memorize training instances. It also increases the size of our training set, which is helpful for smaller data sets.
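
To show what two of these transformations do at the pixel level, here is a small NumPy sketch on a hypothetical 3x3 image. It is not how `ImageDataGenerator` is implemented; it only mimics a horizontal flip and a one-pixel shift whose gap is filled by repeating the edge column, which is the idea behind `fill_mode='nearest'`.

```python
import numpy as np

# Hypothetical 3x3 grayscale image.
img = np.arange(9).reshape(3, 3)

# Horizontal flip: reverse the column order.
flipped = img[:, ::-1]

# Shift right by one pixel: pad one edge-valued column on the left,
# then drop the last column so the shape is unchanged.
shifted = np.pad(img, ((0, 0), (1, 0)), mode="edge")[:, :-1]

print(flipped)
print(shifted)
```

Both results keep the original shape, so they can be fed to the same network as the unmodified image.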

# Filter Visualization
Filter visualization can give us an idea of how each filter in our model works. The following functions produce one image per filter; each image is generated by gradient ascent so that the filter responds to it as strongly as possible. Below we generate images for the first two convolutional layers in our final model.

In [None]:
# Load CNN model for filter visualization
cnn = load_model('../input/dsci-598-fa21/team_01_model_05.h5')
cnn.summary()

In [None]:
#Create functions for filter visualization
def compute_loss(input_image, layer, filter_index):
    feature_extractor = tf.keras.Model(inputs=cnn.inputs, outputs=layer.output)
    activation = feature_extractor(input_image)
    # We avoid border artifacts by only involving non-border pixels in the loss.
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]
    return tf.reduce_mean(filter_activation)

def gradient_ascent_step(img, layer, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        tape.watch(img)
        loss = compute_loss(img, layer, filter_index)
    # Compute gradients.
    grads = tape.gradient(loss, img)
    # Normalize gradients.
    grads = tf.math.l2_normalize(grads)
    img += learning_rate * grads
    return loss, img

def initialize_image():
    img = tf.random.uniform((1, 48, 48, 1))
    # Start from low-amplitude random noise: uniform [0, 1) values are
    # shifted and scaled to the range [-0.125, +0.125).
    return (img - 0.5) * 0.25


def visualize_filter(layer, filter_index, steps, learning_rate):
    img = initialize_image()
    for iteration in range(steps):
        loss, img = gradient_ascent_step(img, layer, filter_index, learning_rate)

    # Decode the resulting input image
    img = deprocess_image(img[0].numpy())
    return loss, img


def deprocess_image(img):
    # Normalize array: center on 0., ensure standard deviation is 0.15
    img -= img.mean()
    img /= img.std() + 1e-5
    img *= 0.15

    # Clip to [0, 1]
    img += 0.5
    img = np.clip(img, 0, 1)

    # Convert to 8-bit grayscale values in [0, 255]
    img *= 255
    img = np.clip(img, 0, 255).astype("uint8")
    return img
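
The gradient ascent idea behind these functions can be sketched without TensorFlow. The example below is an illustration under simplifying assumptions, not the procedure applied to our model: it uses a single hypothetical 3x3 linear filter, for which the gradient of the response with respect to the input is the filter itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 filter that responds to a vertical dark-to-light edge.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

# Start from low-amplitude noise, as initialize_image does.
patch = (rng.uniform(size=(3, 3)) - 0.5) * 0.25

def response(p):
    # Filter activation for a single patch: elementwise product, then sum.
    return float(np.sum(kernel * p))

start = response(patch)
for _ in range(30):
    grad = kernel  # for this linear filter, d(response)/d(patch) equals the kernel
    grad = grad / (np.linalg.norm(grad) + 1e-12)  # L2-normalize the gradient
    patch = patch + 1.0 * grad  # step uphill with learning rate 1

print(start, response(patch))
```

After a few steps the patch looks like the filter's own pattern, which is why the generated images can be read as the pattern each filter is looking for.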

In [None]:
def display_layer_filters(layer_name, steps=60, learning_rate=1):
    layer = cnn.get_layer(name=layer_name)
            
    n_filters = layer.filters
    n_cols = 8
    n_rows = -(-n_filters // n_cols)  # ceiling division so a partial last row still fits
    
    print(f'{layer_name} - {n_filters} filters')
    
    plt.figure(figsize=[2*n_cols, 2*n_rows])
    for i in range(n_filters):
        plt.subplot(n_rows, n_cols, i+1)
        loss, img = visualize_filter(layer, i, steps, learning_rate)
        plt.imshow(img, cmap='binary_r')
        plt.axis('off')
    plt.show()

In [None]:
display_layer_filters('conv2d', steps=200)

These filter visualizations can be hard to interpret. We notice that some of the images contain light or dark vertical and horizontal lines at the edges. This could indicate that these filters look for patterns near the edge of an image.

In [None]:
display_layer_filters('conv2d_1', steps=200)

Compared to the first group of images, the images for the second convolutional layer are more complex. Some are almost entirely filled with a single color, while others contain a zebra-like pattern.

# Visualize Filter Activations
Instead of creating an image that each filter is responsive to, we can visualize how each filter responds to an image from our training set. We run the sample through our network and record where each filter is activated on that image and how strong the activation is. This gives us a sense of where our filters are looking for patterns. Our intuition suggests that the mouth and eyes should play a vital role in determining emotion. Below we select two images from the training set and view the activations of the first and second convolutional layers.

In [None]:
# Create functions to view filter activations
def display_layer(layer_index, activations, cmap):
    layer_activations = activations[layer_index]
    n_filters = layer_activations.shape[-1]
       
    n_cols = 8
    n_rows = -(-n_filters // n_cols)  # ceiling division so a partial last row still fits
    
    print(f'{cnn.layers[layer_index].name} - {n_filters} Filters')
    plt.figure(figsize=[2*n_cols, 2*n_rows])
    
    for i in range(n_filters):
        img = layer_activations[0,:,:,i]
        plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(img, cmap=cmap)
        plt.axis('off')
    plt.show() 


def display_activations(img_tensor, layer_indices=(), cmap='viridis'):
    layer_outputs = [layer.output for layer in cnn.layers]
    activation_model = tf.keras.models.Model(inputs=cnn.inputs, outputs=layer_outputs)
    activations = activation_model(img_tensor)
    
    for i in layer_indices:
        display_layer(i, activations, cmap)
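
To make "where a filter is activated" concrete, here is a minimal NumPy sketch that computes the activation map of one filter on a tiny image. The image, the filter, and the use of a ReLU activation are all assumptions chosen for illustration; they are not taken from our trained model.

```python
import numpy as np

def activation_map(image, kernel):
    """Valid cross-correlation followed by ReLU, for a single filter."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU keeps only positive responses

# Hypothetical 4x4 image: dark left half, bright right half.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)

# Hypothetical 2x2 filter that responds to dark-to-light vertical edges.
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])

act = activation_map(image, kernel)
print(act)
```

The map is nonzero only in the column where the dark and bright halves meet, which is the kind of localized response the plots below display for real filters.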

In [None]:
# Select two images and plot them with their labels
row0 = train.iloc[1000, :]
# Take the image from the same row as the label so the two always match
img0 = row0['pixels'].reshape(48, 48)

row1 = train.iloc[100, :]
img1 = row1['pixels'].reshape(48, 48)

plt.subplot(1,2,1)
plt.imshow(img0, cmap='binary_r')
plt.text(0, -2, row0['emotion'], color='k')
plt.axis('off')

plt.subplot(1,2,2)
plt.imshow(img1, cmap='binary_r')
plt.text(0, -2, row1['emotion'], color='k')
plt.axis('off')

plt.show()

In [None]:
tensor0 = img0.reshape(1,48,48,1)/255
tensor1 = img1.reshape(1,48,48,1)/255

## Filter Activations for the First Image

In [None]:
display_activations(tensor0, [1,4], cmap='viridis')

## Filter Activations for the Second Image

In [None]:
display_activations(tensor1, [1,4], cmap='viridis')

We can see that for some of the filters the eyes and mouth are important areas, while other filters seem to focus on the outline of the face or on the background. This is not a complete overview of how these filters work, but it does provide some evidence that our model recognizes these facial features as helpful in determining emotion.