# Final Report Notebook

## Team Members:

#### Alejandro Alemany, Benjamin Zaretzky, Sara Manrriquez

# Table of Contents

* [Project Description](#Project_Description)
* [Preparing Data](#Preparing_Data)
* [Visualizations](#Visualizations)
* [Technical Challenges](#Technical_Challenges)
* [Weekly Progress](#Week_Progress)
    - [Week 1](#Week_1)
    - [Week 2](#Week_2)
    - [Week 3](#Week_3)
    - [Week 4](#Week_4)
* [Summary of Model Architectures](#Summary_of_Model_Architectures)
    - [VGG Architecture](#VGG_Architecture)
    - [ResNet50 Architecture](#ResNet50_Architecture)
    - [Xception Architecture](#Xception_Architecture)
* [Summary of Final Results](#Summary_of_final_results)
    - [Classification Report](#CR)
    - [Confusion Matrix](#Confussion_Matrix)
    - [Top K Accuracy Score](#Top_K_Accuracy_score)
    

<a id="Project_Description"></a>
# Project Description:

Artificial intelligence is increasingly present in every area of human life. Technologies adapt to human needs, and artificial intelligence helps make that adaptation possible.

AI techniques are widely used in algorithms that recognize human expressions. When people communicate, a large share of the message is non-verbal, and many studies link facial expressions to human emotions. Our ability to detect and identify these emotions helps us understand each other. The goal of this branch of artificial intelligence is to use learning techniques so that a machine can identify these emotions as well.

Facial expression recognition aims to recognize the primary forms of affective expression on people's faces. But can machine learning and deep learning techniques identify these facial expressions accurately enough?

In this project, we apply these techniques to recognize feelings or emotions in people. To do this, we first load and process a dataset that stores the pixel values of images of people showing different emotions.

#### Before moving forward in the project, we will explore and get familiar with the dataset.

In [None]:
%%capture

!pip install --upgrade scikit-learn

In [None]:
# Loading libraries needed in the project

%config Completer.use_jedi = False
import os
os.environ["KMP_SETTINGS"] = "false"

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pickle
import scikitplot

from sklearn.model_selection import train_test_split 
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report

import tensorflow as tf
from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import top_k_accuracy_score

<a id="Preparing_Data"></a>
# Preparing Data

The dataset provides a training set and a test set. We use both sets in this notebook for demonstration purposes.

In [None]:
# Load the training data
train = pd.read_csv('../input/challenges-in-representation-learning-facial-expression-recognition-challenge/train.csv')

In [None]:
# View the first few rows of the data set
train.head()

After exploring the dataset, we found our first challenge: we do not have image files to train our model. Instead, each row stores the pixel values of an image as a string. We need to convert those strings into arrays that we can visualize and feed to our models.

In [None]:
# Convert the pixel values from a string to a numpy array
train['pixels'] = [np.fromstring(x, dtype=int, sep=' ').reshape(-1,48,48,1) for x in train['pixels']]
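
To make the conversion concrete, the sketch below parses one space-separated pixel string into a 2D array. The 2x2 toy string is a made-up stand-in (a real row holds 48 * 48 = 2304 values), and `parse_pixels` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
import numpy as np

def parse_pixels(pixel_string, height, width):
    """Turn a space-separated string of intensities into a (height, width) array."""
    values = np.array(pixel_string.split(), dtype=int)
    return values.reshape(height, width)

# Toy 2x2 "image"; a real row of the dataset would use height=48, width=48
toy = parse_pixels("0 255 128 64", height=2, width=2)
print(toy.shape)
```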

<a id="Visualizations"></a>
# Visualizations

After reviewing the label distribution, we can see that the proportions are similar for most labels. However, "disgust" accounts for only a small percentage of the images.

In [None]:
# Assign the emotions to the corresponding number and apply them to the DataFrame
emotion_cat = {0:'Anger', 1:'Disgust', 2:'Fear', 3:'Happiness', 4: 'Sadness', 5: 'Surprise', 6: 'Neutral'}
train['emotion'] = train['emotion'].apply(lambda x: emotion_cat[x])

# Create variables for pixels and labels
pixels = np.concatenate(train['pixels'])
labels = train.emotion.values

# Proportion of each label, sorted by label name
emotion_prop = train['emotion'].value_counts(normalize=True).sort_index(ascending=True)

# Create a bar chart for the labels
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']

plt.figure(figsize=[12,6])

# Use the Series values directly so the code does not depend on the column name pandas assigns
plt.bar(x=emotion_prop.index, height=emotion_prop.values, color=palette, edgecolor='black')
    
plt.xlabel('Emotion')
plt.ylabel('Proportion')
plt.title('Emotion Label Proportions')
plt.show()

In [None]:
# View sample images for each emotion label
plt.close()
plt.rcParams["figure.figsize"] = [16,16]

row = 0
for emotion in np.unique(labels):

    all_emotion_images = train[train['emotion'] == emotion]
    for i in range(5):
        
        img = all_emotion_images.iloc[i,].pixels.reshape(48,48)
        lab = emotion

        plt.subplot(7,5,row+i+1)
        plt.imshow(img, cmap='binary_r')
        plt.text(-30, 5, s = str(lab), fontsize=10, color='b')
        plt.axis('off')
    row += 5

plt.show()

For more visualizations and analysis go to our EDA notebook:

https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-eda

<a id="Technical_Challenges"></a>
# Technical Challenges

As stated above, the first technical challenge we encountered was the pixel format of our data. We had to extract the values from a column in the dataset and convert each string into an array.

The second challenge involved transfer learning, which was crucial to improving the performance of our model. When we tried the Xception architecture provided by the TensorFlow library, we found that it does not accept images smaller than 71x71. We tried to rescale the images without success. As a last resort, we found the source code and implemented the architecture from scratch (https://colab.research.google.com/github/mavenzer/Autism-Detection-Using_YOLO/blob/master/Tutorial_implementing_Xception_in_TensorFlow_2_0_using_the_Functional_API.ipynb#scrollTo=uy3q-iLm3VV2).

The model architecture is shown below. For more info, refer to the training notebook:  https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-xception

In [None]:
# Xception Model Architecture

def entry_flow(inputs):

  x = layers.Conv2D(32, 3, strides=2, padding='same')(inputs)
  x = layers.BatchNormalization()(x)
  x = layers.Activation('relu')(x)

  x = layers.Conv2D(64, 3, padding='same')(x)
  x = layers.BatchNormalization()(x)
  x = layers.Activation('relu')(x)

  previous_block_activation = x  # Set aside residual
  
  # Blocks 1, 2, 3 are identical apart from the feature depth.
  for size in [128, 256, 728]:
    x = layers.Activation('relu')(x)
    x = layers.SeparableConv2D(size, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)

    x = layers.Activation('relu')(x)
    x = layers.SeparableConv2D(size, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)

    x = layers.MaxPooling2D(3, strides=2, padding='same')(x)
    
    residual = layers.Conv2D(  # Project residual
        size, 1, strides=2, padding='same')(previous_block_activation)           
    x = layers.add([x, residual])  # Add back residual
    previous_block_activation = x  # Set aside next residual

  return x


def middle_flow(x, num_blocks=8):
  
  previous_block_activation = x

  for _ in range(num_blocks):
    x = layers.Activation('relu')(x)
    x = layers.SeparableConv2D(728, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)

    x = layers.Activation('relu')(x)
    x = layers.SeparableConv2D(728, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    
    x = layers.Activation('relu')(x)
    x = layers.SeparableConv2D(728, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)

    x = layers.add([x, previous_block_activation])  # Add back residual
    previous_block_activation = x  # Set aside next residual
    
  return x


def exit_flow(x, num_classes=7):
  
  previous_block_activation = x

  x = layers.Activation('relu')(x)
  x = layers.SeparableConv2D(728, 3, padding='same')(x)
  x = layers.BatchNormalization()(x)

  x = layers.Activation('relu')(x)
  x = layers.SeparableConv2D(1024, 3, padding='same')(x)
  x = layers.BatchNormalization()(x)
  
  x = layers.MaxPooling2D(3, strides=2, padding='same')(x)

  residual = layers.Conv2D(  # Project residual
      1024, 1, strides=2, padding='same')(previous_block_activation)
  x = layers.add([x, residual])  # Add back residual
  
  x = layers.SeparableConv2D(1536, 3, padding='same')(x)
  x = layers.BatchNormalization()(x)
  x = layers.Activation('relu')(x)
  
  x = layers.SeparableConv2D(2048, 3, padding='same')(x)
  x = layers.BatchNormalization()(x)
  x = layers.Activation('relu')(x)
  
  x = layers.GlobalAveragePooling2D()(x)
  if num_classes == 1:
    activation = 'sigmoid'
  else:
    activation = 'softmax'
  return layers.Dense(num_classes, activation=activation)(x)

inputs = keras.Input(shape=(48, 48, 1))  # Fixed-size 48x48 grayscale image inputs.
outputs = exit_flow(middle_flow(entry_flow(inputs)))
xception = keras.Model(inputs, outputs)
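
As a rough check of how far this network shrinks a 48x48 input, the sketch below traces the spatial size through the five stride-2 layers defined above (one strided convolution and three pooling layers in the entry flow, one pooling layer in the exit flow). It assumes the usual rule for `padding='same'`, where the output size is the input size divided by the stride, rounded up. `trace_sizes` is a hypothetical helper written only for this illustration.

```python
import math

def trace_sizes(size, strides):
    """Return the spatial size after each 'same'-padded layer with the given strides."""
    sizes = [size]
    for stride in strides:
        size = math.ceil(size / stride)
        sizes.append(size)
    return sizes

# Five stride-2 layers: 1 strided conv + 3 max-pools (entry flow), 1 max-pool (exit flow)
sizes = trace_sizes(48, [2, 2, 2, 2, 2])
print(sizes)
```

Under this assumption every layer still produces a non-empty feature map, which may be why the from-scratch version accepts 48x48 images.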

<a id="Week_Progress"></a>
# Weekly Progress

Every week, we made progress in working with the dataset. Below we describe the most significant development of each week. For more information, follow the link to each weekly notebook.

<a id="Week_1"></a>
### Week 1 

During week one, we did not use image augmentation or any other additional technique. Instead, we implemented a convolutional neural network (CNN) to see how well it worked on the dataset. The CNN is shown below. Our accuracy score for the first week was ~60%.

https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-week-1

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

cnn = Sequential([
    Conv2D(64, (3,3), activation = 'relu', padding = 'same', input_shape=(48,48,1)),
    Conv2D(64, (5,5), activation = 'relu', padding = 'same'),
    MaxPooling2D(2,2),
    Dropout(0.5),
    BatchNormalization(),
    
    Conv2D(128, (3,3), activation = 'relu', padding = 'same'),
    Conv2D(128, (3,3), activation = 'relu', padding = 'same'),
    MaxPooling2D(2,2),
    Dropout(0.5),
    BatchNormalization(),

    Flatten(),
    
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(256, activation='relu'),
    Dropout(0.5),
    BatchNormalization(),
    Dense(7, activation='softmax')
])

cnn.summary()

<a id="Week_2"></a>
### Week 2 

We continued experimenting with the CNN to improve our score. We revised the architecture, shown below, and gained accuracy.

https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-week-2

In [None]:
cnn = Sequential([
    Conv2D(64, (3,3), padding='same', input_shape=(48,48,1)),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),

    Conv2D(64, (5,5), padding='same'),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),

    Conv2D(128, (3,3), padding='same'),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),

    Conv2D(128, (3,3), padding='same'),
    BatchNormalization(),
    Activation('relu'),
    MaxPooling2D(pool_size=(2,2)),
    Dropout(0.25),

    Flatten(),

    Dense(128),
    BatchNormalization(),
    Activation('relu'),
    Dropout(0.25),

    Dense(256),
    BatchNormalization(),
    Activation('relu'),
    Dropout(0.25),
    Dense(7, activation='softmax')
])

cnn.summary()
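
One visible difference between the week 1 and week 2 networks is how many pooling layers precede `Flatten()`. The sketch below works out the flattened feature count for each, assuming Keras's default `'valid'` pooling, where a 2x2 pool halves each side and rounds down. `flattened_features` is a hypothetical helper, not part of either notebook.

```python
def flattened_features(side, num_pools, channels):
    """Feature count after `num_pools` 2x2 max-pools on a square input."""
    for _ in range(num_pools):
        side //= 2  # 'valid' 2x2 pooling with stride 2 floors the division
    return side * side * channels

week1 = flattened_features(48, num_pools=2, channels=128)  # 12 * 12 * 128
week2 = flattened_features(48, num_pools=4, channels=128)  # 3 * 3 * 128
print(week1, week2)
```

The smaller flattened vector means the first dense layer of the week 2 network has far fewer weights.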

<a id="Week_3"></a>
### Week 3 

We concluded that further changes to the CNN would not provide improvements, so we started implementing new techniques: image augmentation and transfer learning. To explore our transfer learning architectures, follow the links below:

- [VGG Architecture](#VGG_Architecture)
- [ResNet50 Architecture](#ResNet50_Architecture)
- [Xception Architecture](#Xception_Architecture)

#### Image augmentation used in our project. A link to the notebook can be found in the Transfer Learning section.

In [None]:
train_datagen = ImageDataGenerator(
    rotation_range = 30,
    width_shift_range = 0.2, 
    height_shift_range = 0.2, 
    zoom_range = 0.2, 
    horizontal_flip = True, 
    fill_mode = 'nearest'
)
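
For intuition, two of these settings can be imitated on a plain array. This is only an illustration with NumPy, not how `ImageDataGenerator` is implemented: the flip mirrors the columns, and `max_shift` is the largest horizontal shift, in pixels, that `width_shift_range = 0.2` allows on a 48-pixel-wide image. The tiny 2x3 array is made up.

```python
import numpy as np

# Made-up 2x3 "image"
image = np.array([[1, 2, 3],
                  [4, 5, 6]])

# horizontal_flip=True may mirror the image left to right
flipped = np.fliplr(image)

# width_shift_range=0.2 is a fraction of the width: up to 0.2 * 48 pixels
max_shift = 0.2 * 48

print(flipped)
print(max_shift)
```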

<a id="Week_4"></a>
### Week 4 

In our last week of experimentation, we implemented ensemble learning, which improved accuracy on the dataset by one point. To explore the ensemble learning design, visit the following link:

https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-ensemble
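
The ensemble notebook holds the actual design. The sketch below only illustrates one common form of ensembling, averaging the class probabilities of several models (soft voting). It is an assumption about the general idea and is not taken from that notebook. The probability arrays are made-up values for two samples and three classes.

```python
import numpy as np

# Made-up predicted probabilities from two models (2 samples, 3 classes)
probs_model_a = np.array([[0.6, 0.3, 0.1],
                          [0.2, 0.5, 0.3]])
probs_model_b = np.array([[0.2, 0.7, 0.1],
                          [0.1, 0.3, 0.6]])

# Soft voting: average the probabilities, then pick the most likely class
ensemble_probs = np.mean([probs_model_a, probs_model_b], axis=0)
ensemble_pred = np.argmax(ensemble_probs, axis=1)
print(ensemble_pred)
```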

<a id="Summary_of_Model_Architectures"></a>
# Summary of Model Architectures

<a id="VGG_Architecture"></a>
### VGG Architecture

This architecture performed poorly on the dataset; our own convolutional neural network performed better than this transfer learning architecture. The best score we got was 36%. The model can be reviewed here: https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-vgg16

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

base_model = tf.keras.applications.VGG16(input_shape=(48,48,1),include_top=False, weights=None)

base_model.trainable = False
tf.keras.utils.plot_model(base_model, show_shapes=True)

<a id="ResNet50_Architecture"></a>
### ResNet50 Architecture

The ResNet50 architecture performed better than VGG; however, its overall performance was still poor compared to our convolutional neural network. The highest accuracy we achieved with this model was ~44%. The model can be reviewed here: https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-resnet50

In [None]:
base_model = tf.keras.applications.resnet.ResNet50(include_top=False, weights=None, input_shape=(48,48,1))

base_model.trainable = False
tf.keras.utils.plot_model(base_model, show_shapes=True)

<a id="Xception_Architecture"></a>
### Xception Architecture

This architecture was our rock star. The highest accuracy we achieved with our CNN was 62%, but we reached 66% accuracy with this architecture. To review the model, see the following notebook: https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-xception

In [None]:
np.random.seed(1)
tf.random.set_seed(1)

cnn = xception
cnn.summary()
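
Much of this architecture is built from `SeparableConv2D` layers. As a back-of-the-envelope comparison, the sketch below counts the weights of one 3x3 separable convolution with 728 input and output channels (the middle-flow setting above) against a regular 3x3 convolution of the same shape. It assumes one depthwise filter per input channel followed by a 1x1 pointwise convolution with a bias. Both helper functions are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Regular convolution: one kernel per (input, output) channel pair, plus biases."""
    return kernel * kernel * c_in * c_out + c_out

def separable_conv_params(kernel, c_in, c_out):
    """Depthwise kernel per input channel, then a 1x1 pointwise convolution, plus biases."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + c_out

regular = conv_params(3, 728, 728)
separable = separable_conv_params(3, 728, 728)
print(regular, separable)
```

Under these assumptions the separable layer needs roughly a ninth of the weights of the regular one.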

<a id="Summary_of_final_results"></a>
# Summary of Final Results

In the end, we selected the CNN model using the Xception architecture, which achieved 66% accuracy on the test set. The results are shown in the classification report, confusion matrix, and top-k accuracy scores below:

In [None]:
# Read in full data set
data = pd.read_csv('../input/challenges-in-representation-learning-facial-expression-recognition-challenge/icml_face_data.csv')
data.columns = ['emotion', 'Usage', 'pixels']
print(data.shape)

In [None]:
# Select only rows that are in the public or private test set
test = data.loc[data["Usage"] != 'Training',['emotion','pixels']]
#test.drop(columns='Usage', inplace=True)
test.head()

In [None]:
# Reshape the pixels
test['pixels'] = [np.fromstring(x, dtype=int, sep=' ').reshape(-1,48,48,1) for x in test['pixels']]

In [None]:
# Combine pixels into single array
pixels = np.concatenate(test['pixels'].values)

print(pixels.shape)

In [None]:
# Scale the pixel values to the range 0 to 1
pixels = pixels / 255

In [None]:
# Load model and generate predictions
model = load_model('../input/dsci-598-fa21/team_01_model_05.h5')
test_probs = model.predict(pixels)
test_pred = np.argmax(test_probs, axis=1)

In [None]:
test['predictions'] = test_pred
test.head()

<a id="CR"></a>
### Classification Report

In [None]:
# Assign labels to each value
emotion_cat = {0:'Anger', 1:'Disgust', 2:'Fear', 3:'Happiness', 4: 'Sadness', 5: 'Surprise', 6: 'Neutral'}
test['emotion'] = test['emotion'].apply(lambda x: emotion_cat[x])
test['predictions'] = test['predictions'].apply(lambda x: emotion_cat[x])

In [None]:
# Generate the classification report
my_classification_report = classification_report(test['emotion'], test['predictions'])
print(my_classification_report)
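
For readers unfamiliar with the columns of the report, the sketch below computes precision, recall, and F1 for a single class by hand. The labels are a made-up toy example, not our results, and `class_metrics` is a hypothetical helper.

```python
def class_metrics(y_true, y_pred, target):
    """Precision, recall, and F1 for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels for five images
y_true = ['Fear', 'Fear', 'Anger', 'Fear', 'Anger']
y_pred = ['Fear', 'Anger', 'Fear', 'Fear', 'Anger']
precision, recall, f1 = class_metrics(y_true, y_pred, 'Fear')
print(precision, recall, f1)
```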

<a id="Confussion_Matrix"></a>
### Confusion Matrix

In [None]:
# Total number of incorrect predictions
print('Total Wrong Predictions:', np.sum(test['emotion'] != test['predictions']))

In [None]:
# Confusion Matrix
scikitplot.metrics.plot_confusion_matrix(test['emotion'], test['predictions'], figsize=(7,7))    
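
A confusion matrix is a table of counts. As a minimal sketch of that table, the code below counts (true, predicted) pairs with NumPy for made-up integer labels; rows are true classes and columns are predicted classes.

```python
import numpy as np

# Made-up integer labels for three classes
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 0, 2])

num_classes = 3
matrix = np.zeros((num_classes, num_classes), dtype=int)
# Add 1 to cell (true, predicted) for every sample
np.add.at(matrix, (y_true, y_pred), 1)
print(matrix)
```

The diagonal holds the correct predictions, so the sum of the off-diagonal cells equals the number of wrong predictions.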

<a id="Top_K_Accuracy_score"></a>
### Top K Accuracy Score

In [None]:
# View shape of probabilities
test_probs.shape

In [None]:
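# Compute top-k accuracy for k = 1..6 (k = 0 is invalid; k = 7 is trivially 1.0)
# Map label names back to the integer codes that index the columns of test_probs
name_to_code = {name: code for code, name in emotion_cat.items()}
y_true = test['emotion'].map(name_to_code)
for k in range(1, 7):
    print(f"Top-{k} accuracy: {round(top_k_accuracy_score(y_true, test_probs, k=k, labels=list(range(7))), 2)}")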
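
Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The sketch below computes it by hand with NumPy on made-up probabilities. `top_k_accuracy` is a hypothetical helper, separate from scikit-learn's `top_k_accuracy_score`.

```python
import numpy as np

def top_k_accuracy(y_true, probs, k):
    """Fraction of samples whose true class is among the k largest probabilities."""
    # Indices of the k highest-probability classes for each row
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(y_true, top_k)]
    return float(np.mean(hits))

# Made-up probabilities for 3 samples and 3 classes
probs = np.array([[0.70, 0.20, 0.10],
                  [0.25, 0.45, 0.30],
                  [0.50, 0.10, 0.40]])
y_true = np.array([0, 2, 2])

top1 = top_k_accuracy(y_true, probs, k=1)
top2 = top_k_accuracy(y_true, probs, k=2)
print(top1, top2)
```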

### Links to notebooks used in the project and report.

* EDA Notebook - https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-eda
* Final Model Training Notebook -  https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-xception
* Final Model Evaluation Notebook - https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-evaluation
* Class Activation Maps Notebook - https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-cam
* Transfer Learning Notebook - https://www.kaggle.com/robbiebeane/dsci-598-fa21-team-1-xception