# Machine Learning Engineer Nanodegree

## Capstone Project - Smile Detector


---

In this project, convolutional neural networks (CNNs) are used to build models that detect whether the person in an image is smiling.
The CelebA dataset is used for this purpose - http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
The data for this project was downloaded from Kaggle - https://www.kaggle.com/jessicali9530/celeba-dataset

Context (from Kaggle)
A popular component of computer vision and deep learning revolves around identifying faces for various applications from logging into your phone with your face or searching through surveillance images for a particular suspect. This dataset is great for training and testing models for face detection, particularly for recognising facial attributes such as finding people with brown hair, are smiling, or wearing glasses. Images cover large pose variations, background clutter, diverse people, supported by a large quantity of images and rich annotations. This data was originally collected by researchers at MMLAB, The Chinese University of Hong Kong.

Content

Overall

- 202,599 face images of various celebrities
- 10,177 unique identities, but names of identities are not given
- 40 binary attribute annotations per image
- 5 landmark locations

Data Files

- img_align_celeba.zip: All the face images, cropped and aligned
- list_eval_partition.csv: Recommended partitioning of images into training, validation, and testing sets. Images 1-162770 are training, 162771-182637 are validation, 182638-202599 are testing
- list_bbox_celeba.csv: Bounding box information for each image. "x_1" and "y_1" represent the upper left point coordinate of the bounding box. "width" and "height" represent the width and height of the bounding box
- list_landmarks_align_celeba.csv: Image landmarks and their respective coordinates. There are 5 landmarks: left eye, right eye, nose, left mouth, right mouth
- list_attr_celeba.csv: Attribute labels for each image. There are 40 attributes. "1" represents positive while "-1" represents negative.
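
The attribute and partition files described above can be combined to get a smile label and a split for each image. The sketch below is only an illustration: the inline CSV excerpts are made-up rows, and the column names (`image_id`, `Smiling`, `partition`) and the 0/1/2 partition codes are assumptions about the Kaggle files, not something shown in this notebook.

```python
import csv
import io

# Hypothetical excerpts standing in for list_attr_celeba.csv and
# list_eval_partition.csv; column names and values are assumptions.
ATTR_CSV = """image_id,Smiling,Eyeglasses
000001.jpg,1,-1
000002.jpg,-1,-1
000003.jpg,1,1
"""
PARTITION_CSV = """image_id,partition
000001.jpg,0
000002.jpg,1
000003.jpg,2
"""

# Assumed meaning of the partition codes.
SPLIT_NAMES = {0: "train", 1: "validate", 2: "test"}


def read_smile_labels(attr_text):
    """Map each image id to 1 (smiling) or 0 (not smiling)."""
    reader = csv.DictReader(io.StringIO(attr_text))
    # The attribute file stores 1 / -1, so -1 is converted to 0.
    return {row["image_id"]: 1 if int(row["Smiling"]) == 1 else 0
            for row in reader}


def read_splits(partition_text):
    """Map each image id to the name of its recommended split."""
    reader = csv.DictReader(io.StringIO(partition_text))
    return {row["image_id"]: SPLIT_NAMES[int(row["partition"])]
            for row in reader}


labels = read_smile_labels(ATTR_CSV)
splits = read_splits(PARTITION_CSV)
for image_id in sorted(labels):
    print(image_id, splits[image_id], labels[image_id])
```

With real data, the two `io.StringIO(...)` calls would be replaced by open file handles for the downloaded CSV files.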

Since the training is done on a CPU, it is not practical to train on the entire dataset. Hence, a reasonable random subset of the dataset is considered in this project.
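
The notebook does not show how the random subset was drawn. One possible approach, sketched below under the assumption that an equal number of smiling and non-smiling images is wanted (the later cells build labels as two equal halves), is to sample a fixed number of image ids per class. The function name `balanced_subset` and the toy label dictionary are hypothetical.

```python
import random


def balanced_subset(labels, per_class, seed=0):
    """Pick `per_class` image ids at random from each class (0 and 1)."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    chosen = []
    for cls in (0, 1):
        ids = sorted(i for i, y in labels.items() if y == cls)
        if len(ids) < per_class:
            raise ValueError("class %d has only %d images" % (cls, len(ids)))
        chosen.extend(rng.sample(ids, per_class))
    return chosen


# Toy labels: 20 hypothetical image ids, alternating between the two classes.
toy_labels = {"img_%03d.jpg" % i: i % 2 for i in range(20)}
subset = balanced_subset(toy_labels, per_class=3)
print(len(subset), sum(toy_labels[i] for i in subset))
```

The selected files would then be copied into one folder per class under the train, validate, and test directories that the later cells read from.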


In [60]:
from keras.callbacks import ModelCheckpoint, EarlyStopping, CSVLogger
from keras.optimizers import SGD

In [1]:
IMG_H=218
IMG_W=178
IMG_D=3

NUM_IMAGES = 10000
BATCH_SIZE=32

In the code cell below, we populate a few variables through the use of the load_files function from the scikit-learn library:

- train_files, valid_files, test_files - numpy arrays containing file paths to images
- train_targets, valid_targets, test_targets - numpy arrays containing one-hot encoded classification labels
- smile_names - list of string-valued smile categories for translating labels

In [3]:
from sklearn.datasets import load_files       
from keras.utils import np_utils
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path, load_content=False)
    smile_files = np.array(data['filenames'])
    smile_targets = np_utils.to_categorical(np.array(data['target']), 2)
    return smile_files, smile_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('input/dataset/train')
valid_files, valid_targets = load_dataset('input/dataset/validate')
test_files, test_targets = load_dataset('input/dataset/test')

smile_names = [item[:-1] for item in sorted(glob("input/dataset/train/*/"))]

# print statistics about the dataset
print('There are %d total smile categories.' % len(smile_names))
print('There are %s total smile images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training smile images.' % len(train_files))
print('There are %d validation smile images.' % len(valid_files))
print('There are %d test smile images.'% len(test_files))

There are 2 total smile categories.
There are 15000 total smile images.

There are 10000 training smile images.
There are 2500 validation smile images.
There are 2500 test smile images.
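
To make the one-hot encoded targets concrete, here is a small numpy-only sketch of what a two-class one-hot encoding looks like and how `np.argmax` recovers the class index. The helper `one_hot` is a hypothetical stand-in for illustration; it is not the Keras implementation.

```python
import numpy as np


def one_hot(class_indices, num_classes):
    """Return an (n, num_classes) float array with a single 1.0 per row."""
    class_indices = np.asarray(class_indices, dtype=int)
    encoded = np.zeros((class_indices.size, num_classes), dtype="float32")
    # Set column `class_indices[i]` of row i to 1.0.
    encoded[np.arange(class_indices.size), class_indices] = 1.0
    return encoded


targets = one_hot([0, 1, 1, 0], num_classes=2)
print(targets)
# argmax along axis 1 turns each one-hot row back into its class index
print(np.argmax(targets, axis=1))
```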


When using TensorFlow as backend, Keras CNNs require a 4D array as input, with shape

(nb_samples,rows,columns,channels), 
where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to 218x178 pixels (height x width). Next, the image is converted to a 3D array, which is then expanded to a 4D tensor by adding a leading batch dimension. Since we are working with color images, each image has three channels, and since we are processing a single image (or sample), the returned tensor will always have shape

(1,218,178,3).
 
The paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape

(nb_samples,218,178,3).
 
Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. Because the pre-trained VGG19 model is used, each image is also passed through the corresponding preprocess_input function, which applies the input preprocessing that VGG19 expects.

In [4]:
from tqdm import tqdm
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(IMG_H, IMG_W))
    # convert PIL.Image.Image type to 3D tensor with shape (IMG_H, IMG_W, IMG_D)
    x = image.img_to_array(img)
    # apply the VGG19-specific input preprocessing
    x = preprocess_input(x)
    # convert 3D tensor to 4D tensor with shape (1, IMG_H, IMG_W, IMG_D) and return 4D tensor
    return np.expand_dims(x, axis=0)
    
def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
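
The shape bookkeeping in these two functions can be checked without Keras. The numpy-only sketch below uses solid gray arrays in place of real images, and `vgg_style_preprocess` is a hypothetical stand-in for preprocess_input: it assumes the usual VGG convention of converting RGB to BGR and subtracting the ImageNet channel means, which is our understanding of the Keras behavior rather than something shown in this notebook.

```python
import numpy as np

IMG_H, IMG_W, IMG_D = 218, 178, 3

# ImageNet channel means in BGR order (assumed VGG preprocessing constants).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype="float32")


def vgg_style_preprocess(rgb):
    """Reverse the channel order (RGB -> BGR), then subtract the channel means."""
    bgr = rgb[..., ::-1].astype("float32")
    return bgr - IMAGENET_BGR_MEAN


# Four fake gray "images" standing in for files loaded from disk.
fake_images = [np.full((IMG_H, IMG_W, IMG_D), 128, dtype="uint8")
               for _ in range(4)]

# One image: add a leading batch axis to go from 3D to 4D.
single = np.expand_dims(vgg_style_preprocess(fake_images[0]), axis=0)
print(single.shape)  # (1, 218, 178, 3)

# Many images: stack the (1, H, W, D) tensors along the first axis.
batch = np.vstack([np.expand_dims(vgg_style_preprocess(im), axis=0)
                   for im in fake_images])
print(batch.shape)   # (4, 218, 178, 3)
```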


In [6]:
from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True                 

# pre-process the data for Keras
#train_tensors = paths_to_tensor(train_files)
#valid_tensors = paths_to_tensor(valid_files)
test_tensors = paths_to_tensor(test_files)

100%|██████████| 2500/2500 [00:04<00:00, 614.08it/s]


Let us now look at the architecture of the pre-trained VGG19 model.

In [7]:
orig_model = VGG19(weights='imagenet',
                  include_top=False,
                    input_shape=(IMG_H, IMG_W, IMG_D))
print("number of layers:", len(orig_model.layers))
orig_model.summary()

number of layers: 22
Model: "vgg19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 218, 178, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 218, 178, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 218, 178, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 109, 89, 64)       0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 109, 89, 128)      73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 109, 89, 128)      147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 54, 

Now we have to extract the bottleneck features for the training and validation sets by running them on VGG-19.

In [8]:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

# image dimensions (IMG_H, IMG_W, IMG_D) are defined in an earlier cell


top_model_weights_path = 'input/saved_models/bottleneck_fc_model.h5'
train_data_dir = 'input/dataset/train'
validation_data_dir = 'input/dataset/validate'
nb_train_samples = 10000
nb_validation_samples = 2500
epochs = 50

def save_bottlebeck_features():
    datagen = ImageDataGenerator(
        preprocessing_function = preprocess_input)

    # build the VGG19 network (convolutional base only, without the top layers)
    model = applications.VGG19(include_top=False, 
                               weights='imagenet', 
                               input_shape=(IMG_H, IMG_W, IMG_D))

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(IMG_H, IMG_W),
        batch_size=BATCH_SIZE,
        class_mode=None,
        shuffle=False)
    
    bottleneck_features_train = model.predict_generator(
        generator, nb_train_samples // BATCH_SIZE)
    
    np.save(open('vgg19_bottleneck_features/bottleneck_features_train.npy', 'wb'),
            bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(IMG_H, IMG_W),
        batch_size=BATCH_SIZE,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // BATCH_SIZE)
    
    np.save(open('vgg19_bottleneck_features/bottleneck_features_validation.npy', 'wb'),
            bottleneck_features_validation)

In [10]:
save_bottlebeck_features()

Found 10000 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.


Build and train the 'top model', a small multilayer perceptron (MLP) classifier, on the bottleneck features.

In [65]:
checkpointer = ModelCheckpoint(filepath=top_model_weights_path,
                               verbose=1,save_best_only=True)


csv_logger = CSVLogger('logs/training_topmodel_vgg19_sigmoid.log')

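# note: this callback is defined here but is not included in the callbacks lists used in this notebook
earlyStopping = EarlyStopping(verbose = 1, min_delta = 0.01, patience = 5)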

def train_top_model():
    train_data = np.load(open('vgg19_bottleneck_features/bottleneck_features_train.npy', 'rb'))
    train_labels = np.array(
        [0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))
    train_labels = np_utils.to_categorical(train_labels, 2)
    
    print ("Train data:" , len(train_data) )
    print ("Train labels:" , len(train_labels) )
    
    train_labels = train_labels[:len(train_data)]
    print ("Train labels:" , len(train_labels) )
    
    
    validation_data = np.load(open('vgg19_bottleneck_features/bottleneck_features_validation.npy','rb'))
    validation_labels = np.array(
        [0] * (nb_validation_samples // 2) + [ 1] * (nb_validation_samples // 2))
    validation_labels = np_utils.to_categorical(validation_labels, 2)
    
    print ("validation data:" , len(validation_data) )
    print ("Validation  labels:" , len(validation_labels) )
    
    validation_labels = validation_labels[:len(validation_data)]
    print ("Validation  labels (new):" , len(validation_labels) )

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2,activation='softmax'))
    #model.add(Dense(1,activation='sigmoid'))

    model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    model.summary()
    
    model.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=BATCH_SIZE,
              validation_data=(validation_data, validation_labels),
              callbacks = [checkpointer, csv_logger])

In [66]:
train_top_model()

Train data: 9984
Train labels: 10000
Train labels: 9984
validation data: 2496
Validation  labels: 2500
Validation  labels (new): 2496
Model: "sequential_24"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_24 (Flatten)         (None, 15360)             0         
_________________________________________________________________
dense_47 (Dense)             (None, 256)               3932416   
_________________________________________________________________
dropout_24 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_48 (Dense)             (None, 2)                 514       
Total params: 3,932,930
Trainable params: 3,932,930
Non-trainable params: 0
_________________________________________________________________
Train on 9984 samples, validate on 2496 samples
Epoch 1/50

Epoch 00001: val_loss improved from inf to 0.55
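
The counts of 9984 and 2496 in the output above follow from the integer division used when extracting the bottleneck features: only whole batches are predicted, so the remainder is dropped and the label arrays are truncated to match. The short sketch below reproduces that arithmetic; the helper names are hypothetical, and the `math.ceil` variant is shown only as one possible way to cover every image, not as what the notebook does.

```python
import math


def samples_covered(num_samples, batch_size):
    """Steps and samples processed when using num_samples // batch_size steps."""
    steps = num_samples // batch_size
    return steps, steps * batch_size


def steps_to_cover_all(num_samples, batch_size):
    """Number of steps needed so that the last, partial batch is included."""
    return math.ceil(num_samples / batch_size)


for n in (10000, 2500):
    steps, covered = samples_covered(n, 32)
    print(n, steps, covered, n - covered, steps_to_cover_all(n, 32))
```

Because the generators are created with shuffle=False, the images arrive one class folder after the other, so truncating the label array only removes entries at the end of the second class. This relies on both classes having the same number of images, which is what the label construction in train_top_model assumes.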

The composite model is built by stacking the newly trained top model for classification on top of the VGG19 convolutional base (without its original top layers).

In [67]:
from keras import Model
base_model =  applications.VGG19(include_top=False, 
                                      weights='imagenet',
                                      input_shape=(IMG_H, IMG_W, IMG_D))

print ("Base Model layers #: ", len(base_model.layers))

top_model = Sequential()
top_model.add(Flatten(input_shape=base_model.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2,activation='softmax'))

top_model.load_weights(top_model_weights_path)

# add the model on top of the convolutional base
composite_model = Model(inputs= base_model.input, outputs= top_model(base_model.output))

print ("Composite Model layers #: ", len(composite_model.layers))

#composite_model.summary()


Base Model layers #:  22
Composite Model layers #:  23
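
The input size of the top model follows from the VGG19 output shape. As a sanity check on the numbers in the summaries above, the sketch below assumes that each of the five pooling layers halves the height and width (rounding down) and that the last block has 512 channels; it then recomputes the flattened size and the Dense layer parameter counts. The helper names are hypothetical.

```python
def vgg_feature_shape(height, width, num_pools=5, channels=512):
    """Spatial size after `num_pools` 2x2 poolings that round down."""
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width, channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


h, w, c = vgg_feature_shape(218, 178)
flat = h * w * c
print((h, w, c), flat)          # (6, 5, 512) 15360
print(dense_params(flat, 256))  # 3932416
print(dense_params(256, 2))     # 514
```

These values match the Flatten output of 15360 and the 3,932,930 total parameters reported for the top model.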


Now, let us test the composite model against the test dataset

In [73]:
# evaluate and print the test accuracy
# get index of predicted smile detection for each image in test set
smile_prediction = [np.argmax(composite_model.predict(np.expand_dims(test_data, axis=0))) for test_data in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(smile_prediction)==np.argmax(test_targets, axis=1))/len(smile_prediction)
print('Test accuracy: %.4f%%' % test_accuracy)

Test accuracy: 81.0400%
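
The accuracy computation above compares the argmax of each prediction with the argmax of the one-hot target. The numpy-only sketch below shows the same comparison on a made-up batch of probabilities; `accuracy_percent` is a hypothetical helper, and the arrays are toy values rather than model output.

```python
import numpy as np


def accuracy_percent(probabilities, one_hot_targets):
    """Percentage of rows whose most probable class matches the target class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)


# Toy values: four predictions over two classes, three of them correct.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
truth = np.array([[1, 0], [0, 1], [0, 1], [0, 1]])
print(accuracy_percent(probs, truth))  # 75.0
```

The same helper could be applied to the output of a single batched predict call on all test tensors, which would likely be faster than predicting one image at a time.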




Here we see that the composite model (the ImageNet-weighted VGG19 base plus the trained top model) achieves an accuracy of 81.04% on the test set, using only a subset of the CelebA dataset. Next, we unfreeze the last convolution block of VGG-19 so that it can be fine-tuned.

In [78]:
from keras.layers import GlobalAveragePooling2D, Dense, Dropout
from keras.models import Model
from keras.optimizers import SGD

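# Freeze every layer before the 5th convolution block of the VGG19 model (indices 0-16).
for layer in composite_model.layers[:17]:
    layer.trainable = False

# Report the trainable flag of each layer.
for i, layer in enumerate(composite_model.layers):
    print("Layer: %d:%s Trainable: %s" % (i, layer.name, layer.trainable))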

Layer: 0:input_5 Trainable: False
Layer: 1:block1_conv1 Trainable: False
Layer: 2:block1_conv2 Trainable: False
Layer: 3:block1_pool Trainable: False
Layer: 4:block2_conv1 Trainable: False
Layer: 5:block2_conv2 Trainable: False
Layer: 6:block2_pool Trainable: False
Layer: 7:block3_conv1 Trainable: False
Layer: 8:block3_conv2 Trainable: False
Layer: 9:block3_conv3 Trainable: False
Layer: 10:block3_conv4 Trainable: False
Layer: 11:block3_pool Trainable: False
Layer: 12:block4_conv1 Trainable: False
Layer: 13:block4_conv2 Trainable: False
Layer: 14:block4_conv3 Trainable: False
Layer: 15:block4_conv4 Trainable: False
Layer: 16:block4_pool Trainable: False
Layer: 17:block5_conv1 Trainable: True
Layer: 18:block5_conv2 Trainable: True
Layer: 19:block5_conv3 Trainable: True
Layer: 20:block5_conv4 Trainable: True
Layer: 21:block5_pool Trainable: True
Layer: 22:sequential_25 Trainable: True
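
The cutoff index of 17 used for freezing can be derived from the layer names instead of being hard-coded. The sketch below works on a plain list of names copied from the listing above (with the input and top model entries given generic names), so it runs without Keras; `first_index_with_prefix` is a hypothetical helper.

```python
# Layer names in order, following the listing above.
LAYER_NAMES = (
    ["input"]
    + ["block1_conv1", "block1_conv2", "block1_pool"]
    + ["block2_conv1", "block2_conv2", "block2_pool"]
    + ["block3_conv%d" % i for i in range(1, 5)] + ["block3_pool"]
    + ["block4_conv%d" % i for i in range(1, 5)] + ["block4_pool"]
    + ["block5_conv%d" % i for i in range(1, 5)] + ["block5_pool"]
    + ["top_model"]
)


def first_index_with_prefix(names, prefix):
    """Index of the first layer whose name starts with `prefix`."""
    for index, name in enumerate(names):
        if name.startswith(prefix):
            return index
    raise ValueError("no layer name starts with %r" % prefix)


cutoff = first_index_with_prefix(LAYER_NAMES, "block5")
# Everything before the cutoff is frozen; the rest stays trainable.
trainable = [index >= cutoff for index in range(len(LAYER_NAMES))]
print(cutoff, sum(trainable), len(LAYER_NAMES))
```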


Now we compile the composite model with the SGD optimizer, using a small learning rate for fine-tuning.

In [81]:
# compile the model (should be done *after* setting layers to non-trainable)
composite_model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), 
                 loss='categorical_crossentropy', 
                 metrics=['accuracy'])

In [82]:
composite_model.summary()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, 218, 178, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 218, 178, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 218, 178, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 109, 89, 64)       0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 109, 89, 128)      73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 109, 89, 128)      147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 54, 44, 128)       0   

Next, we try to improve on this by fine-tuning the model with data augmentation.

In [86]:
USE_AUGMENTATION = True

if (not USE_AUGMENTATION):
    savedFileName = 'input/saved_models/transfer_models.weights.best.hdf5'
else:
    savedFileName = 'input/saved_models/transfer_models_btnk_aug.weights.best.hdf5'
    
checkpointer = ModelCheckpoint(filepath=savedFileName,
                               verbose=1,save_best_only=True)

In [87]:
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    zoom_range=0.2,
    horizontal_flip = True)

train_generator = train_datagen.flow_from_directory(
        'input/dataset/train',
        target_size=(IMG_H,IMG_W),
        batch_size=BATCH_SIZE,
        class_mode='categorical')

val_datagen = ImageDataGenerator(
    preprocessing_function = preprocess_input)

val_generator = val_datagen.flow_from_directory(
        'input/dataset/validate',
        target_size=(IMG_H,IMG_W),
        batch_size=BATCH_SIZE,
        class_mode='categorical')


Found 10000 images belonging to 2 classes.
Found 2500 images belonging to 2 classes.
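
To give a feel for what the augmentation options do to an image, the numpy-only sketch below applies a horizontal flip and a fixed horizontal shift to a tiny array. It is a simplified illustration, not the Keras implementation: the real generator picks the shift and zoom amounts at random for every image, and the edge-repeating fill used here is an assumption about its default behavior.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]


def shift_right(img, pixels):
    """Shift the content right by `pixels`, repeating the left edge column."""
    if pixels == 0:
        return img.copy()
    shifted = np.empty_like(img)
    shifted[:, pixels:, :] = img[:, :-pixels, :]
    shifted[:, :pixels, :] = img[:, :1, :]
    return shifted


# A tiny 2x4 single-channel "image" so the effect is easy to read.
img = np.arange(8).reshape(2, 4, 1)
print(img[0, :, 0])                   # [0 1 2 3]
print(horizontal_flip(img)[0, :, 0])  # [3 2 1 0]
print(shift_right(img, 1)[0, :, 0])   # [0 0 1 2]
```

Neither transformation changes whether the face is smiling, so the label of an augmented image stays the same as that of the original.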


In [88]:
NUM_EPOCHS = 5
csv_logger = CSVLogger('logs/composite_aug_finetune.log', append = True)

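if (not USE_AUGMENTATION):    
    # requires train_tensors and valid_tensors, which are commented out in the pre-processing cell above
    composite_model.fit(train_tensors, 
                 train_targets, 
                 validation_data=(valid_tensors, valid_targets),
                 epochs=NUM_EPOCHS, 
                 batch_size=BATCH_SIZE, 
                 callbacks=[checkpointer], 
                 verbose=1)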
else:
    composite_model.fit_generator(train_generator,
                          validation_data=val_generator,
                          steps_per_epoch = NUM_IMAGES // BATCH_SIZE,
                          epochs=NUM_EPOCHS, 
                          callbacks=[checkpointer, csv_logger], 
                          verbose=1)

Epoch 1/5

Epoch 00001: val_loss improved from inf to 0.06307, saving model to input/saved_models/transfer_models_btnk_aug.weights.best.hdf5
Epoch 2/5

Epoch 00002: val_loss did not improve from 0.06307
Epoch 3/5

Epoch 00003: val_loss did not improve from 0.06307
Epoch 4/5

Epoch 00004: val_loss improved from 0.06307 to 0.00493, saving model to input/saved_models/transfer_models_btnk_aug.weights.best.hdf5
Epoch 5/5

Epoch 00005: val_loss did not improve from 0.00493
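
The log above reflects the save_best_only behavior of the checkpoint callback: weights are written only when the monitored validation loss beats the best value seen so far. The pure-Python sketch below mimics that rule. The losses for epochs 1 and 4 are taken from the log; the other three values are made-up placeholders chosen only so that they do not improve on the best value, since the log does not report them.

```python
def epochs_that_save(val_losses):
    """Return the best loss and the 1-based epochs at which weights are saved."""
    best = float("inf")
    saved_epochs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly lower loss counts as an improvement
            best = loss
            saved_epochs.append(epoch)
    return best, saved_epochs


# Epochs 1 and 4 come from the log; epochs 2, 3 and 5 are placeholders.
val_losses = [0.06307, 0.09, 0.07, 0.00493, 0.02]
best_loss, saved = epochs_that_save(val_losses)
print(best_loss, saved)  # 0.00493 [1, 4]
```

Under this rule, the file loaded in the next cell holds the weights from epoch 4, not from the final epoch.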


In [89]:
# Load the weights with the best (lowest) validation loss
composite_model.load_weights(savedFileName)

In [93]:
# evaluate and print the test accuracy
# get index of predicted smile detection for each image in test set
smile_prediction = [np.argmax(composite_model.predict(np.expand_dims(test_data, axis=0))) for test_data in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(smile_prediction)==np.argmax(test_targets, axis=1))/len(smile_prediction)
print('Test accuracy: %.4f%%' % test_accuracy)

Test accuracy: 90.6400%



In summary, the following steps were carried out:
1. Store the bottleneck features from the VGG-19 model.
2. Train the top-level model using the bottleneck features.
3. Unfreeze the final convolution block in VGG-19.
4. Retrain the last convolution block together with the smile top model, starting from the weights of the previous run.

With the above steps, and with data augmentation applied during fine-tuning, a test accuracy of 90.64% is achieved.
