### Hyper-Parameter Tuning Methodology in Task A1 (Model 1)

This Jupyter Notebook shows the methodology used in task A1 to pick the best parameters for model 1. This model uses face landmarks (provided in lab 2) as features for a Support Vector Machine (SVM).

To observe the impact of the model's hyper-parameters, Grid Search Cross-Validation was performed over a range of candidate values. This method runs an exhaustive search over the given parameter settings, scoring every combination with cross-validation to find the one that performs best.
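
As an illustration of what an exhaustive search involves, the loop below evaluates every combination of a small grid by hand. It is a minimal sketch on synthetic data from `make_classification`; the grid values and dataset are assumptions chosen for speed, not the ones used in this notebook.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: the notebook uses facial landmarks instead)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Small illustrative grid, not the one searched in this notebook
grid = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10]}

results = []
for kernel, C in product(grid['kernel'], grid['C']):
    # 5-fold cross-validation for this single combination
    scores = cross_val_score(SVC(kernel=kernel, C=C), X, y, cv=5)
    results.append(({'kernel': kernel, 'C': C}, scores.mean()))

# The best candidate is the one with the highest mean validation score
best_params, best_score = max(results, key=lambda item: item[1])
print(len(results), 'candidates evaluated; best:', best_params, round(best_score, 3))
```

`GridSearchCV` automates this loop and additionally records per-fold statistics and rankings.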

In [1]:
# Import statements
import time, os, sys
import numpy as np
import pandas as pd

from matplotlib import image
import matplotlib.pyplot as plt 

from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_validate, GridSearchCV, train_test_split

sys.path.append('../HelperFunctions/')
import landmarksA1 as landmarks

### Importing & pre-processing data

The steps taken when importing & pre-processing the data are the same as the ones performed in the final model in A1.py, and described in the report.

In [2]:
def mainA1Landmarks():
    '''
    Extracts facial landmarks for each picture
    Performs train/test splitting (90% train, 10% test)
    Implements dimensionality reduction by scaling and performing PCA
    
    Returns:
        - pca_train : Train dataset of facial landmarks after PCA
        - pca_test : Test dataset of facial landmarks after PCA
        - lbs_train : Labels of training dataset
        - lbs_test : Labels of testing dataset
    '''
    
    # Extracting facial landmarks
    imgs, lbs = landmarks.extract_features_labels('../Datasets/dataset/A/original/')

    # Splitting data into 90% train and 10% test
    tr_data, te_data, lbs_train, lbs_test = train_test_split(imgs, lbs, test_size=0.1)
    data_train = tr_data.reshape(tr_data.shape[0], tr_data.shape[1]*tr_data.shape[2])
    data_test = te_data.reshape(te_data.shape[0], te_data.shape[1]*te_data.shape[2])

    # Applying dimensionality reduction
    pca_train, pca_test = dimensionality_reduction(data_train, data_test)

    return pca_train, pca_test, lbs_train, lbs_test
 

def dimensionality_reduction(train_data, test_data):
    '''
    Scales train and test datasets
    Implements Principal Component Analysis (PCA) on both datasets

    Keyword arguments:
        - train_data : Raw train dataset of facial landmarks
        - test_data : Raw test dataset of facial landmarks

    Returns:
        - train_pca : Train dataset of facial landmarks after PCA
        - test_pca : Test dataset of facial landmarks after PCA
    '''

    # Scaling both datasets
    scaler = StandardScaler()
    scaler.fit(train_data)
    train_data = scaler.transform(train_data)
    test_data = scaler.transform(test_data)

    # Applying PCA to both datasets
    pca = PCA(n_components = 'mle', svd_solver = 'full')
    pca.fit(train_data)
    train_pca = pca.transform(train_data)
    test_pca = pca.transform(test_data)

    return train_pca, test_pca

In [3]:
pca_train, pca_test, lbs_train, lbs_test = mainA1Landmarks()
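
Because the scaler and PCA above are fitted on the whole training split before the grid search, every validation fold has already contributed to the preprocessing statistics. The effect is often small, but one possible alternative (not what this notebook does) is to wrap the steps in a `Pipeline` so they are refitted inside each fold. The sketch below uses synthetic data, and `n_components=0.95` is an assumption that replaces the notebook's `'mle'` setting.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flattened landmark features (assumption)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Scaling and PCA are refitted on the training part of every CV fold
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=0.95, svd_solver='full')),
    ('svc', SVC()),
])

# Parameters of a pipeline step are addressed as '<step name>__<parameter>'
param_grid = {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```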

### Grid Search Cross-Validation with PCA

In [4]:
# Parameter distribution to perform the search on
param_dist = { 
    # Kernel type to be used in the algorithm
    'kernel': ('linear', 'rbf'),   

    # Regularization parameter
    'C': [0.1,0.3,1,3,10,30],

    # Kernel coefficient for the 'rbf' kernel (ignored when kernel is 'linear')
    'gamma': ['scale',0.001,0.01,0.1,0.3,1],

    # Seed for SVC's pseudo-random number generator (only used when probability=True)
    'random_state': [42]
}
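
Since `gamma` has no effect on the linear kernel, the grid above evaluates each linear candidate six times. A possible refinement, shown here only as a sketch, is to pass a list of dictionaries so that `gamma` is varied for `'rbf'` alone. `ParameterGrid` is used to count the candidates in each layout; the names `full_grid` and `split_grid` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

C_values = [0.1, 0.3, 1, 3, 10, 30]
gamma_values = ['scale', 0.001, 0.01, 0.1, 0.3, 1]

# Same layout as the grid used in this notebook
full_grid = {
    'kernel': ['linear', 'rbf'],
    'C': C_values,
    'gamma': gamma_values,
    'random_state': [42],
}

# Alternative layout: gamma only varies where it matters
split_grid = [
    {'kernel': ['linear'], 'C': C_values, 'random_state': [42]},
    {'kernel': ['rbf'], 'C': C_values, 'gamma': gamma_values, 'random_state': [42]},
]

n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))
print(n_full, 'candidates in the full grid,', n_split, 'in the split grid')
```

A list of dictionaries can be passed to `GridSearchCV` through `param_grid` in the same way as a single dictionary.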

In [5]:
def report(results, n_top=3):
    '''
    Helper function to report best scores for model
    '''
    
    for i in range(1, n_top + 1): 
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                results['mean_test_score'][candidate],
                results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")
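
When several candidates tie, scikit-learn gives them all the lowest shared rank and skips the ranks that follow, so a loop over ranks 1 to `n_top` may print only rank 1. The sketch below reproduces that ranking rule with NumPy on made-up scores; the score values are hypothetical.

```python
import numpy as np

# Hypothetical mean validation scores: six tied candidates plus two weaker ones
scores = np.array([0.921] * 6 + [0.915, 0.910])

# "min" ranking: a candidate's rank is 1 + the number of strictly better scores
ranks = np.array([1 + int(np.sum(scores > s)) for s in scores])

# Ranks that a loop over range(1, n_top + 1) would actually find
n_top = 3
printed_ranks = [r for r in range(1, n_top + 1) if np.any(ranks == r)]

print('ranks:', ranks.tolist())
print('ranks reported for n_top=3:', printed_ranks)
```

With six candidates tied for first place, the next available rank is 7, so ranks 2 and 3 match nothing.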

In [6]:
# Running Grid Search

clf = SVC()
grid_search = GridSearchCV(clf, param_grid=param_dist, cv=5)
start = time.time()
grid_search.fit(pca_train, lbs_train)

print("GridSearchCV took %.2f minutes for %d candidate parameter settings."
    % (round((time.time() - start)/60,2), len(grid_search.cv_results_['params'])))
print("")

report(grid_search.cv_results_)

GridSearchCV took 31.54 minutes for 72 candidate parameter settings.

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 'scale', 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 0.001, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 0.01, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 0.1, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 0.3, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.009)
Parameters: {'C': 3, 'gamma': 1, 'kernel': 'linear', 'random_state': 42}
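
The six tied candidates differ only in `gamma`, which the linear kernel does not use. The sketch below checks this on synthetic data (an assumption, standing in for the landmark features) by comparing the decision values of linear SVMs fitted with different `gamma` settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

decisions = []
for gamma in ['scale', 0.001, 1]:
    # Only gamma changes between the fitted models
    clf = SVC(kernel='linear', C=3, gamma=gamma, random_state=42).fit(X, y)
    decisions.append(clf.decision_function(X))

# All models should produce the same decision values
same = all(np.allclose(decisions[0], d) for d in decisions[1:])
print('Identical decision values across gamma settings:', same)
```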



### Grid Search Cross-Validation without PCA

In [7]:
def mainA1LandmarksSansPCA():
    '''
    Extracts facial landmarks for each picture
    Performs train/test splitting (90% train, 10% test)
    
    Returns:
        - data_train : Train dataset of facial landmarks
        - data_test : Test dataset of facial landmarks
        - lbs_train : Labels of training dataset
        - lbs_test : Labels of testing dataset
    '''
    
    # Extracting facial landmarks
    imgs, lbs = landmarks.extract_features_labels()

    # Splitting data into 90% train and 10% test
    tr_data, te_data, lbs_train, lbs_test = train_test_split(imgs, lbs, test_size=0.1)
    data_train = tr_data.reshape(tr_data.shape[0], tr_data.shape[1]*tr_data.shape[2])
    data_test = te_data.reshape(te_data.shape[0], te_data.shape[1]*te_data.shape[2])

    return data_train, data_test, lbs_train, lbs_test

In [8]:
data_train, data_test, lbs_train, lbs_test = mainA1LandmarksSansPCA()
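
Both loading functions call `train_test_split` without a seed, so the PCA and sans-PCA experiments are run on different random splits. If identical splits are wanted for a like-for-like comparison, one option is to fix `random_state`, as sketched below. The array shape `(100, 68, 2)` and the helper name `split` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the landmark array: samples x landmarks x (x, y)
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 68, 2))
labels = rng.integers(0, 2, size=100)


def split(features, labels, seed=42):
    '''Seeded, stratified 90/10 split (illustrative helper).'''
    return train_test_split(features, labels, test_size=0.1,
                            random_state=seed, stratify=labels)


# Two independent calls with the same seed give the same partition
tr_a, te_a, y_tr_a, y_te_a = split(features, labels)
tr_b, te_b, y_tr_b, y_te_b = split(features, labels)

same_split = np.array_equal(te_a, te_b) and np.array_equal(y_te_a, y_te_b)
print('Same test split on both calls:', same_split)
```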

In [9]:
# Running Grid Search

clf = SVC()
grid_search = GridSearchCV(clf, param_grid=param_dist, cv=5)
start = time.time()
grid_search.fit(data_train, lbs_train)

print("GridSearchCV took %.2f minutes for %d candidate parameter settings."
    % (round((time.time() - start)/60,2), len(grid_search.cv_results_['params'])))
print("")

report(grid_search.cv_results_)

GridSearchCV took 719.39 minutes for 72 candidate parameter settings.

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 'scale', 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 0.001, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 0.01, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 0.1, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 0.3, 'kernel': 'linear', 'random_state': 42}

Model with rank: 1
Mean validation score: 0.921 (std: 0.003)
Parameters: {'C': 1, 'gamma': 1, 'kernel': 'linear', 'random_state': 42}



### Conclusions

Comparing the Grid Search Cross-Validation results with and without PCA, the SVM model performs (and generalizes) similarly on both datasets, reaching a mean validation accuracy of 92.1 % in each case. The sans-PCA model has a smaller standard deviation across folds (0.003 versus 0.009), which is beneficial.
However, the sans-PCA search took 719.39 minutes compared with 31.54 minutes for the PCA search, so the PCA dataset is the one chosen for this SVM model.
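
To put the two running times in perspective, the arithmetic below converts the reported totals into an approximate cost per cross-validation fit. The figures are rough averages: the totals also include the final refit on the full training set, which is ignored here.

```python
# Values reported by the two grid searches in this notebook
n_candidates = 72
n_folds = 5
minutes_pca = 31.54
minutes_raw = 719.39

# Each candidate is fitted once per fold
n_cv_fits = n_candidates * n_folds

# Approximate average time per fit, in seconds
sec_per_fit_pca = minutes_pca * 60 / n_cv_fits
sec_per_fit_raw = minutes_raw * 60 / n_cv_fits

# How much slower the sans-PCA search was overall
slowdown = minutes_raw / minutes_pca

print(f'{n_cv_fits} CV fits; ~{sec_per_fit_pca:.1f} s vs ~{sec_per_fit_raw:.1f} s per fit; '
      f'ratio ~{slowdown:.1f}x')
```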

Furthermore, the parameters of the highest-ranked candidate in the PCA search will be used to get the best possible performance. They are:
* Regularization parameter (C) : 3
* Gamma : 'scale' (ignored by the linear kernel, so this value has no effect)
* Kernel Function : Linear
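
A final model with these parameters could then be trained and scored on the held-out test split, for example as follows. This is a minimal sketch on synthetic data, not the code in A1.py; `n_components=0.95` is an assumption that replaces the notebook's `'mle'` setting.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flattened landmark features (assumption)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Fit the preprocessing on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95, svd_solver='full').fit(scaler.transform(X_train))
train_pca = pca.transform(scaler.transform(X_train))
test_pca = pca.transform(scaler.transform(X_test))

# Parameters selected by the grid search with PCA
final_model = SVC(kernel='linear', C=3, random_state=42)
final_model.fit(train_pca, y_train)

test_accuracy = accuracy_score(y_test, final_model.predict(test_pca))
print('Held-out test accuracy: {0:.3f}'.format(test_accuracy))
```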