## Facial Key Point Detection

This notebook uses the Kaggle facial key point dataset. Each training image is labeled with 15 (x, y) coordinates on an individual's face, giving 30 label values in total. These key points mark the landmarks (eye centers and corners, eyebrow ends, nose tip, and mouth points) that help identify an individual's face.

The goal of this exercise is to build a deep neural network with Keras and make predictions on the validation and test datasets. The validation dataset is a 20% hold-out from the cleaned training data, that is, from the samples that have all 30 labels. The test dataset is the original test set from the Kaggle website; it has no labels, so predictions on it cannot be scored here. We can still predict on the test set and plot the predicted key points on the images to check the model's performance visually.

We will try tuning several hyper-parameters to identify what gives the best results. The hyper-parameters tuned include the optimizer, the number of filters, the kernel size, and the number of convolutional layers. Further improvements may be possible with image-processing techniques such as blurring and contrast enhancement.
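
As a rough illustration of the tuning described above, the sketch below enumerates a small grid of candidate settings with `itertools.product`. The specific values and the `iter_configs` name are assumptions made for this example, not the settings used in this notebook; each resulting dictionary would be handed to whatever function builds and trains the Keras model.

```python
from itertools import product

# Hypothetical search space: these values are placeholders for illustration only.
SEARCH_SPACE = {
    "optimizer": ["sgd", "adam"],
    "filters": [32, 64],
    "kernel_size": [3, 5],
    "n_conv_layers": [2, 3],
}

def iter_configs(space):
    """Yield one dict per combination of hyper-parameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_configs(SEARCH_SPACE))
print(len(configs))   # 2 * 2 * 2 * 2 = 16 combinations
print(configs[0])
```

A full grid grows multiplicatively with every added option, so in practice it may be worth varying one or two hyper-parameters at a time.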

In [4]:
import cv2
from skimage import exposure

In [5]:
import matplotlib.pyplot as plt
%matplotlib inline

In [6]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, Activation, BatchNormalization
from keras import optimizers
from keras import backend as K

Using TensorFlow backend.


In [18]:
import os
import numpy as np
from pandas.io.parsers import read_csv
from sklearn.utils import shuffle
import pandas as pd
from sklearn.model_selection import train_test_split

### Load Dataset

The following function loads the dataset and drops the samples that have missing labels. It returns the image data, the labels, and the column names to the caller.


In [19]:
def load(test=False):
    """ Function to load the dataset into np arrays
    
        Argument: 
        test - boolean value to indicate 'test' if True and 'training' if False 

        Returns: 
        X: np array holding training / test data, scaled to [0, 1]
        y: np array holding labels (None for the test set)
        cols: column names, i.e. the 30 label columns plus 'Image' (None for the test set)
    """
    
    # files for training and test datasets
    FTRAIN = 'training/training.csv'
    FTEST = 'test/test.csv'
    
    filename = FTEST if test else FTRAIN

    df = read_csv(os.path.expanduser(filename))
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    cols = df.columns
    
    # scale pixel values to the range [0, 1]
    X = np.vstack(df['Image'].values)/255.

    # training samples with any missing label are dropped
    if not test:
        y = df[df.columns[0:30]].values
        complete = ~np.isnan(y).any(axis=1)  # True for rows with all 30 labels
        X, y = X[complete], y[complete]
        X, y = shuffle(X, y, random_state=42)
    else:
        y = None
        cols = None

    X = X.astype(np.float32)
    
    # return X (data), y (labels) and cols (column names)
    return X, y, cols

def load_2D(test=False):
    """ Load the dataset and reshape each image to (96, 96, 1)
    
        Argument: 
        test - boolean value to indicate 'test' if True and 'training' if False 

        Returns: 
        X: np array of shape (n, 96, 96, 1) holding training / test data
        y: np array holding labels (None for the test set)
        cols: column names, including 'Image' (None for the test set)
    """
    X, y, cols = load(test)
    X = X.reshape(-1, 96, 96, 1)
    
    return X, y, cols
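
The clean-up step inside `load` can be seen in isolation on a toy array. This is a minimal sketch with made-up values (three samples, two labels each) rather than the Kaggle data; it shows how the boolean mask keeps only the rows whose labels are all present.

```python
import numpy as np

# Toy stand-ins for the real arrays: 3 samples, 4 "pixels" and 2 labels each.
X = np.arange(12, dtype=np.float32).reshape(3, 4)
y = np.array([[10.0, 20.0],
              [np.nan, 21.0],   # this sample has a missing label
              [12.0, 22.0]])

# True for rows where no label is NaN
complete = ~np.isnan(y).any(axis=1)

# apply the same mask to data and labels so they stay aligned
X_clean, y_clean = X[complete], y[complete]
print(complete.tolist())   # [True, False, True]
print(X_clean.shape)       # (2, 4)
```

Applying one mask to both arrays keeps each image paired with its own labels.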

In [20]:
# Load training data
X_train, y_train, label_cols = load_2D(test=False)


In [21]:
# split the cleaned training data into 'train' (80%) and 'validation' (20%) datasets (test_size=0.2)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

In [25]:
X_valid.shape

(428, 96, 96, 1)
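
The 428 validation samples follow from the split ratio: the cleaned training set holds 1712 + 428 = 2140 samples, and 20% of 2140 is 428. The sketch below reproduces that arithmetic with a plain NumPy index split. It only illustrates the idea and is not scikit-learn's implementation; `split_indices` is a hypothetical helper, and it will not select the same samples as `train_test_split`.

```python
import numpy as np

def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle sample indices and split them into train / validation parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_valid = int(round(n_samples * test_size))   # size of the hold-out part
    return indices[n_valid:], indices[:n_valid]

train_idx, valid_idx = split_indices(2140)
print(len(train_idx), len(valid_idx))   # 1712 428
```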

In [26]:
# flatten each 96x96x1 image to a 9216-value row and save it as CSV
df_train = pd.DataFrame(X_train.reshape(len(X_train), -1))
df_train.to_csv('new_training.csv', index=False)

In [27]:
df_valid = pd.DataFrame(X_valid.reshape(len(X_valid), -1))
df_valid.to_csv('new_validation.csv', index=False)


In [28]:
df_train_labels = pd.DataFrame(y_train)
df_train_labels.to_csv('new_training_labels.csv', index=False)


In [29]:
df_valid_labels = pd.DataFrame(y_valid)
df_valid_labels.to_csv('new_validation_labels.csv', index=False)


In [24]:
y_train

array([[62.76970213, 35.51509787, 30.51574468, ..., 68.07285106,
        48.16408511, 79.02706383],
       [65.39765227, 36.96986932, 31.02671096, ..., 72.11070393,
        51.36959342, 77.76892257],
       [64.76246617, 34.93338947, 31.85972932, ..., 69.44914286,
        52.18213534, 83.32006015],
       ...,
       [66.51187302, 30.94238549, 29.68014512, ..., 75.56190476,
        62.43809524, 82.5170068 ],
       [67.64129032, 37.1708129 , 31.65754839, ..., 72.11605161,
        49.99587097, 84.22606452],
       [64.69317293, 35.07406917, 28.56396992, ..., 63.46105263,
        48.88637594, 75.39681203]])

In [15]:
df_new = pd.read_csv('new_training.csv')

In [16]:
df_new.shape

(1712, 9216)

In [13]:
# convert the DataFrame to an array first; a DataFrame has no reshape method
df_new1 = np.array(df_new)
df_new1 = df_new1.reshape(1712, 96, 96, 1)
df_new1.shape

(1712, 96, 96, 1)
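
Flattening each image to a 9216-value row and reshaping it back to 96 x 96 x 1 does not reorder any pixels, because both steps use the same row-major layout. A minimal check on random stand-in data (not the Kaggle images):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((5, 96, 96, 1), dtype=np.float32)   # 5 fake grayscale images

flat = images.reshape(len(images), -1)     # shape (5, 9216), one row per image
restored = flat.reshape(-1, 96, 96, 1)     # back to (5, 96, 96, 1)

print(flat.shape)                          # (5, 9216)
print(np.array_equal(images, restored))    # True
```

This check covers only the reshape itself; it says nothing about how values are formatted when they are written to and read from a CSV file.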

In [8]:
print("Train data shape: ", X_train.shape, " and Train label shape: ", y_train.shape)
print("Labels: ", label_cols[:-1])

Train data shape:  (1712, 96, 96, 1)  and Train label shape:  (1712, 30)
Labels:  Index(['left_eye_center_x', 'left_eye_center_y', 'right_eye_center_x',
       'right_eye_center_y', 'left_eye_inner_corner_x',
       'left_eye_inner_corner_y', 'left_eye_outer_corner_x',
       'left_eye_outer_corner_y', 'right_eye_inner_corner_x',
       'right_eye_inner_corner_y', 'right_eye_outer_corner_x',
       'right_eye_outer_corner_y', 'left_eyebrow_inner_end_x',
       'left_eyebrow_inner_end_y', 'left_eyebrow_outer_end_x',
       'left_eyebrow_outer_end_y', 'right_eyebrow_inner_end_x',
       'right_eyebrow_inner_end_y', 'right_eyebrow_outer_end_x',
       'right_eyebrow_outer_end_y', 'nose_tip_x', 'nose_tip_y',
       'mouth_left_corner_x', 'mouth_left_corner_y', 'mouth_right_corner_x',
       'mouth_right_corner_y', 'mouth_center_top_lip_x',
       'mouth_center_top_lip_y', 'mouth_center_bottom_lip_x',
       'mouth_center_bottom_lip_y'],
      dtype='object')
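
Because the label columns alternate between `_x` and `_y`, a 30-value label row can be viewed as 15 (x, y) points by reshaping it. The sketch below uses a made-up label row; `to_points` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def to_points(label_row):
    """Turn a flat [x0, y0, x1, y1, ...] row into an array of (x, y) pairs."""
    return np.asarray(label_row).reshape(-1, 2)

# Made-up label row: x values 0..14 and y values 100..114, interleaved.
row = np.empty(30)
row[0::2] = np.arange(15)          # x coordinates
row[1::2] = 100 + np.arange(15)    # y coordinates

points = to_points(row)
print(points.shape)          # (15, 2)
print(points[0].tolist())    # [0.0, 100.0]
```

The same reshape applies to a predicted row, which makes it convenient to pass `points[:, 0]` and `points[:, 1]` to a scatter plot over the image.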


In [9]:
print("Validation dataset shape: ", X_valid.shape, " and Validation label shape: ", y_valid.shape)

Validation dataset shape:  (428, 96, 96, 1)  and Validation label shape:  (428, 30)


In [10]:
# Load test data (no labels, so y_test and label_cols are None)
X_test, y_test, label_cols = load_2D(test=True)


In [11]:
print("Test data shape: ", X_test.shape, " and Test label shape: ", y_test)
print("Labels: ", label_cols)

Test data shape:  (1783, 96, 96, 1)  and Test label shape:  None
Labels:  None
