In [1]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout,Activation,Flatten,BatchNormalization
from tensorflow.keras.layers import Conv2D,MaxPooling2D
import os

In [2]:
num_classes=7

# defines the size of the image array that we will be feeding to our neural network
img_rows,img_cols=48,48

# number of samples processed before the model is updated
batch_size=32

# I. Loading the Data

In [3]:
# raw strings keep the Windows backslashes from being read as escape sequences
train_data_dir=r'Z:\MIT FutureMakers\Week 4\FER-2013/train'
validation_data_dir=r'Z:\MIT FutureMakers\Week 4\FER-2013/test'

# II. Dataset Modifications

* **Image Augmentation**: a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset
* Keras provides the capability to fit models using image data augmentation via the **ImageDataGenerator** class.


**rescale**: Factor that every pixel value is multiplied by; 1./255 maps 8-bit values into [0, 1].<br />
**rotation_range**: Degree range for random rotations.<br />
**shear_range**: Shear intensity (shear angle in the counter-clockwise direction, in degrees).<br />
**zoom_range**: Range for random zoom. A float z gives the range [1-z, 1+z].<br />
**width_shift_range**: Randomly shifts the image horizontally; a float below 1 is a fraction of the total width.<br />
**height_shift_range**: Randomly shifts the image vertically; a float below 1 is a fraction of the total height.<br />
**horizontal_flip**: Randomly flips images horizontally.<br />
**fill_mode**: How to fill the pixels left empty after one of the transformations above. Here, we use 'nearest', which fills each missing pixel with the value of the nearest existing pixel.
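
To make two of these options concrete, here is a minimal pure-Python sketch of what `rescale=1./255` and `horizontal_flip` do to pixel values. It is only an illustration on a made-up 2x3 "image"; the helper names `rescale` and `horizontal_flip` are hypothetical and are not part of Keras, which performs these steps internally on arrays.

```python
# Illustrative only: a tiny 2x3 "image" of 8-bit grayscale values (made-up data).
image = [
    [0, 128, 255],
    [64, 32, 16],
]

def rescale(img, factor=1.0 / 255):
    """Multiply every pixel by `factor`, mirroring the arithmetic of rescale=1./255."""
    return [[pixel * factor for pixel in row] for row in img]

def horizontal_flip(img):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in img]

scaled = rescale(image)            # values now lie in [0, 1]
flipped = horizontal_flip(image)   # columns appear in reverse order

print(scaled[0])
print(flipped)
```

In the real generator the flip is applied at random, so each image may or may not be mirrored every time it is drawn.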

In [4]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=30,
    shear_range=0.3,
    zoom_range=0.3,
    width_shift_range=0.4,
    height_shift_range=0.4,
    horizontal_flip=True,
    fill_mode='nearest')
validation_datagen = ImageDataGenerator(rescale=1./255)

In [5]:
# build the batch generators for the training and validation data

train_generator = train_datagen.flow_from_directory(
 train_data_dir,
 color_mode='grayscale',
 target_size=(img_rows,img_cols),
 batch_size=batch_size,
 class_mode='categorical',
 shuffle=True)

validation_generator = validation_datagen.flow_from_directory(
 validation_data_dir,
 color_mode='grayscale',
 target_size=(img_rows,img_cols),
 batch_size=batch_size,
 class_mode='categorical',
 shuffle=True)

Found 28709 images belonging to 7 classes.
Found 7178 images belonging to 7 classes.


* In the above code, we used the **flow_from_directory()** method to load our dataset from the directories where the original images are stored. Augmentation is applied on the fly as each batch is generated; no augmented images are written to disk.
* flow_from_directory() takes the path to a directory and generates batches of image data, transformed according to the generator's settings.
    * Here, we pass options that resize every image to the same dimensions and infer each image's class from its sub-directory, so that the batches can be fed directly to the model.

**directory**: Path to the dataset directory, with one sub-directory per class<br />
**color_mode**: 'grayscale' loads each image with a single channel<br />
**target_size**: Resizes all images to a uniform size (i.e. 48x48)<br />
**batch_size**: Number of images in each batch<br />
**class_mode**: 'categorical' yields one-hot encoded labels, since we are classifying the images into 7 classes.<br />
**shuffle**: Whether to shuffle the order of the images; shuffling the training data generally helps training<br />
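
As a rough illustration of what `class_mode='categorical'` produces, the sketch below builds a class-index map and a one-hot label by hand. The sub-directory names are an assumption (the usual FER-2013 emotion folders, which this notebook does not list), and `one_hot` is a hypothetical helper; flow_from_directory assigns indices in sorted order of the sub-directory names.

```python
# Assumed FER-2013 sub-directory names; they are not listed in this notebook.
class_dirs = ["happy", "angry", "sad", "fear", "surprise", "neutral", "disgust"]

# Indices follow the sorted (alphanumeric) order of the sub-directory names.
class_indices = {name: index for index, name in enumerate(sorted(class_dirs))}

def one_hot(label, num_classes):
    """Return a list of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

happy_label = one_hot(class_indices["happy"], len(class_dirs))

print(class_indices)
print(happy_label)
```

On the real generator, the same mapping can be inspected through its `class_indices` attribute.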

# III. Building the Model

We will use a **Sequential model**, in which the layers are stacked one after another, and store it in the variable `model`.<br />
* The model consists of 7 blocks.

In [6]:
model = Sequential()

### Block 1

* **Conv2D layer** - Creates a convolutional layer. Here each layer has 32 filters of size (3,3), uses padding='same' so that the output keeps the input's height and width, and uses the *he_normal* kernel initializer. The block contains 2 such convolutional layers, each followed by an activation layer and a batch normalization layer.
* **Activation layer** - elu activation.
* **BatchNormalization** - Normalizes the activations of the previous layer at each batch, i.e. applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1.
* **MaxPooling2D layer** - Downsamples the input by taking the maximum value over each window of size pool_size along the spatial dimensions (height and width). Here, the pool_size is (2,2), which halves both dimensions.
* **Dropout** - A technique where randomly selected units are ignored during training. Here the dropout rate is 0.2, so about 20% of the units are dropped at each training step.
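
The dropout step can be sketched in a few lines of plain Python. This is a simplified illustration of "inverted" dropout (zero a unit with probability `rate`, scale the survivors by 1/(1-rate) so the expected activation is unchanged), not the Keras implementation; the `dropout` function and the all-ones activations are made up for the example.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)        # fixed seed so the run is repeatable
activations = [1.0] * 10000   # made-up activations, all equal to 1
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to the rate, 0.2
mean_after = sum(dropped) / len(dropped)            # close to the original mean, 1.0

print(round(zero_fraction, 2), round(mean_after, 2))
```

At inference time dropout is switched off, which is why the scaling is applied during training instead.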

### Block 2

* Same layers as block-1 but the convolutional layers have 64 filters

### Block 3

* Same layers as block-1 but the convolutional layers have 128 filters

### Block 4

* Same layers as block-1 but the convolutional layers have 256 filters

### Block 5

* **Flatten layer** - Flattens the output of the previous layers into a vector (i.e. a 1D layer).
* **Dense layer** - A densely connected layer in which each neuron is connected to every output of the previous layer. Here, we use 64 units (neurons) with the he_normal kernel initializer.
* These layers are followed by an activation layer with elu activation, batch normalization, and finally a dropout layer with a 50% dropout rate.

### Block 6

* Same layers as block 5 but without flatten layer as the input for this block is already flattened.

### Block 7

* **Dense layer** — In the final block of the network, we are using num_classes to create a dense layer having units=number of classes with a he_normal initializer.
* **Activation layer** — Here, we are using a softmax layer which is typically used for multi-class classifications.
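
To see how the feature maps shrink through the four convolutional blocks, the following sketch traces the output shape after each block's pooling layer. It assumes, as in the model code of this notebook, that every convolution uses padding='same' (so only the (2,2) max pooling changes height and width) and that the blocks use 32, 64, 128 and 256 filters; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(height, width, filters_per_block):
    """Return the (height, width, channels) shape after each conv block."""
    shapes = []
    for filters in filters_per_block:
        # 'same' convolutions keep height and width; (2,2) pooling halves them.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes

shapes = trace_shapes(48, 48, [32, 64, 128, 256])
last_h, last_w, last_c = shapes[-1]
flattened = last_h * last_w * last_c   # size of the vector produced by Flatten

for block, shape in enumerate(shapes, start=1):
    print(f"Block {block}: {shape}")
print("Flatten:", flattened)
```

Under these assumptions the Flatten layer in block 5 receives a 3x3x256 volume, i.e. a vector of 2304 values.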

In [7]:
#Block-1
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal',
                 input_shape=(img_rows,img_cols,1)))
model.add(Activation('elu'))
model.add(BatchNormalization())
# input_shape is only needed on the first layer of the model
model.add(Conv2D(32,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

#Block-2
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(64,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

#Block-3
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(128,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

#Block-4
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Conv2D(256,(3,3),padding='same',kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.2))

#Block-5
model.add(Flatten())
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

#Block-6
model.add(Dense(64,kernel_initializer='he_normal'))
model.add(Activation('elu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

#Block-7
model.add(Dense(num_classes,kernel_initializer='he_normal'))
model.add(Activation('softmax'))

In [8]:
# the overall structure of the model

# summary() prints the table itself and returns None, so no print() is needed
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 48, 48, 32)        320       
_________________________________________________________________
activation (Activation)      (None, 48, 48, 32)        0         
_________________________________________________________________
batch_normalization (BatchNo (None, 48, 48, 32)        128       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 48, 48, 32)        9248      
_________________________________________________________________
activation_1 (Activation)    (None, 48, 48, 32)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 48, 48, 32)        128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 24, 24, 32)        0
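
The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas: a Conv2D layer has (kernel_height x kernel_width x input_channels + 1) x filters parameters (the +1 is the bias), and a BatchNormalization layer has 4 parameters per channel (gamma, beta, moving mean, moving variance). The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

first_conv = conv2d_params(3, 3, in_channels=1, filters=32)    # grayscale input
second_conv = conv2d_params(3, 3, in_channels=32, filters=32)  # 32 input channels
batch_norm = batchnorm_params(32)

print(first_conv, second_conv, batch_norm)
```

These match the 320, 9248 and 128 shown for conv2d, conv2d_1 and batch_normalization in the summary.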

# IV. Training the Model

In [9]:
from tensorflow.keras.optimizers import RMSprop, SGD, Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

Before compiling, we create three callbacks from the keras.callbacks module:

1) **Checkpoint** (function: *ModelCheckpoint()*) - monitors the validation loss and, with mode='min', treats a lower value as better. Whenever the validation loss improves at the end of an epoch, the model is saved to disk.
* **filepath**: Path to save the model file. Here, we are saving the model file with the name EmotionDetectionModel.h5
* **monitor**: Quantity to monitor. Here, we are monitoring the validation loss.
* **mode**: One of {auto, min, max}.
* **save_best_only**: If save_best_only=True, the saved file is overwritten only when the monitored quantity improves, so it always holds the best model seen so far.
* **verbose**: int. 0: quiet, 1: update messages.

2) **Early Stopping** (function: *EarlyStopping()*) - stops training early once the monitored quantity has stopped improving, controlled by the following properties.
* **monitor**: Quantity to monitor. Here, we are monitoring the validation loss.
* **min_delta**: Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
* **patience**: Number of epochs with no improvement after which training will be stopped. 
* **restore_best_weights**: Whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
* **verbose**: int. 0: quiet, 1: update messages.

3) **Reduce Learning Rate** (function: *ReduceLROnPlateau()*) - Models often benefit from reducing the learning rate by a factor of 2–10 once learning stagnates. This callback monitors a quantity and, if no improvement is seen for a 'patience' number of epochs, reduces the learning rate.
* **monitor**: Quantity to monitor. Here, we are monitoring the validation loss.
* **factor**: Factor by which the learning rate will be reduced. new_lr = lr * factor. Here, we are using 0.2 as factor.
* **patience**: Number of epochs with no improvement after which learning rate will be reduced.
* **min_delta**: Threshold for measuring the new optimum, to only focus on significant changes.
* **verbose**: int. 0: quiet, 1: update messages.
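
The patience logic shared by the last two callbacks can be sketched in plain Python. This is a simplified model of the behaviour described above, not the Keras source; `early_stopping_epoch` and `reduced_lr` are hypothetical helpers, and the validation losses are made-up numbers rather than results from this notebook.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

def reduced_lr(lr, factor):
    """new_lr = lr * factor, as applied when a plateau is detected."""
    return lr * factor

# Hypothetical validation losses: the best value is reached at epoch 3.
hypothetical_losses = [1.80, 1.70, 1.60, 1.65, 1.62, 1.61, 1.50]
stop_epoch = early_stopping_epoch(hypothetical_losses, patience=3)
new_lr = reduced_lr(0.001, 0.2)

print(stop_epoch, new_lr)
```

With patience=3, three epochs in a row without improvement (epochs 4 to 6 here) end the run, so the better loss at epoch 7 is never reached. When both callbacks use the same patience, as in this notebook, they tend to trigger at around the same epoch, so the reduced learning rate may get little chance to take effect.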

In [10]:
checkpoint = ModelCheckpoint('EmotionDetectionModel.h5',
                             monitor='val_loss',
                             mode='min',
                             save_best_only=True,
                             verbose=1)
earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          patience=3,
                          verbose=1,
                          restore_best_weights=True
                          )
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.2,
                              patience=3,
                              verbose=1,
                              min_delta=0.0001)
callbacks = [earlystop,checkpoint,reduce_lr]

Now it's time to compile the model using **model.compile()** and train it on the dataset using **model.fit()**. (Older Keras code used **model.fit_generator()** for generators; in TensorFlow 2.x, **model.fit()** accepts generators directly, which is what the code below uses.)

**model.compile()** has the following parameters: 
* **loss**: The loss function to minimize. Here, the labels are one-hot encoded over 7 classes, so we use the 'categorical_crossentropy' loss.
* **optimizer**: The optimizer used to update the weights. Here, we use the Adam optimizer with a learning rate of 0.001, a common default choice for this kind of classifier.
* **metrics**: A list of metrics to be evaluated by the model during training and testing; a model can have any number of them. Here we use accuracy. Metrics are only reported and do not influence the weight updates, which are driven by the loss.

**model.fit()** can fit the model on data yielded batch-by-batch by a Python generator. The arguments used here are:
* **x** (first positional argument): The train_generator object that we created earlier.
* **steps_per_epoch**: The number of batches to draw from the training data in one epoch.
* **epochs**: The total number of epochs (one epoch is one pass through the whole dataset).
* **callbacks**: The list containing all the callbacks that we created earlier.
* **validation_data**: The validation_generator object that we created earlier.
* **validation_steps**: The steps to take on the validation data in one epoch.
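
The step counts follow directly from the dataset sizes reported by flow_from_directory and the batch size of 32. The sketch below works through the arithmetic for floor division (the approach used in this notebook) and, for comparison, for rounding up; the variable names ending in `_ceil` and `leftover_` are introduced here only for illustration.

```python
import math

nb_train_samples = 28709
nb_validation_samples = 7178
batch_size = 32

# Floor division: only full batches are counted.
train_steps = nb_train_samples // batch_size
validation_steps = nb_validation_samples // batch_size

# Images that do not fit into a full batch and are left out of the count.
leftover_train = nb_train_samples - train_steps * batch_size
leftover_validation = nb_validation_samples - validation_steps * batch_size

# Rounding up instead would add one final, smaller batch.
train_steps_ceil = math.ceil(nb_train_samples / batch_size)
validation_steps_ceil = math.ceil(nb_validation_samples / batch_size)

print(train_steps, leftover_train, train_steps_ceil)
print(validation_steps, leftover_validation, validation_steps_ceil)
```

With floor division each epoch runs 897 training steps and 224 validation steps, which is 5 training images and 10 validation images short of a full pass.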

In [11]:
model.compile(loss='categorical_crossentropy',
 optimizer = Adam(learning_rate=0.001),
 metrics=['accuracy'])
nb_train_samples = 28709
nb_validation_samples = 7178
epochs=25

history=model.fit(
 train_generator,
 steps_per_epoch=nb_train_samples//batch_size,
 epochs=epochs,
 callbacks=callbacks,
 validation_data=validation_generator,
 validation_steps=nb_validation_samples//batch_size)

Epoch 1/25

Epoch 00001: val_loss improved from inf to 1.78354, saving model to EmotionDetectionModel.h5
Epoch 2/25

Epoch 00002: val_loss improved from 1.78354 to 1.76447, saving model to EmotionDetectionModel.h5
Epoch 3/25

Epoch 00003: val_loss improved from 1.76447 to 1.75025, saving model to EmotionDetectionModel.h5
Epoch 4/25

Epoch 00004: val_loss improved from 1.75025 to 1.68322, saving model to EmotionDetectionModel.h5
Epoch 5/25

Epoch 00005: val_loss improved from 1.68322 to 1.54824, saving model to EmotionDetectionModel.h5
Epoch 6/25

Epoch 00006: val_loss improved from 1.54824 to 1.38361, saving model to EmotionDetectionModel.h5
Epoch 7/25

Epoch 00007: val_loss improved from 1.38361 to 1.34065, saving model to EmotionDetectionModel.h5
Epoch 8/25

Epoch 00008: val_loss did not improve from 1.34065
Epoch 9/25

Epoch 00009: val_loss improved from 1.34065 to 1.21880, saving model to EmotionDetectionModel.h5
Epoch 10/25

Epoch 00010: val_loss did not improve from 1.21880
Epoch