# **Kinship Prediction - Submission Notebook**

### Kerry Cook, Chris Wilkerson

We include three sections covering the main methods we tried in this competition. Our best submission is in the first section, which uses ensemble methods with gradient boosting.

A very close second used a pretrained network (Facenet) to compute image embeddings and classified each pair by the L2-normalized Euclidean distance between the two embeddings.

The last section covers our efforts to improve the Facenet predictions with transfer learning.

1. Gradient Boosting and Ensemble Prediction
2. Pretrained Facenet Network
3. Transfer Learning 


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Install Libraries

In [5]:
%%capture
!pip install deepface

In [3]:

from collections import defaultdict
from glob import glob
from random import choice, sample

import tensorflow as tf
import keras
import cv2
import numpy as np
import pandas as pd
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Input, Dense, GlobalMaxPool2D, GlobalAvgPool2D, Concatenate, Multiply, Dropout, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from keras.models import load_model


In [None]:
print(tf.__version__)

2.5.0


Helper function to normalize images before prediction

In [4]:
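def normalize(x):
    """Standardize each image to zero mean and unit variance (with a floor on the standard deviation)."""
    if x.ndim == 4:
        # Batch of images: compute statistics per image
        axis = (1, 2, 3)
        size = x[0].size
    elif x.ndim == 3:
        # Single image
        axis = (0, 1, 2)
        size = x.size
    else:
        raise ValueError('Dimension should be 3 or 4')

    mean = np.mean(x, axis=axis, keepdims=True)
    std = np.std(x, axis=axis, keepdims=True)
    # Floor the standard deviation so a constant image does not divide by zero
    std_adj = np.maximum(std, 1.0/np.sqrt(size))
    y = (x - mean) / std_adj
    return y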
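As a quick sanity check on the helper above, the sketch below restates the same per-image standardization for a 4-D batch and confirms the expected statistics. The name `standardize_batch` and the random input are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def standardize_batch(x):
    # Compact restatement of normalize() for a 4-D batch (N, H, W, C).
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    std = x.std(axis=(1, 2, 3), keepdims=True)
    # Floor the std so a constant image does not divide by zero.
    std_adj = np.maximum(std, 1.0 / np.sqrt(x[0].size))
    return (x - mean) / std_adj

rng = np.random.default_rng(0)
# Synthetic stand-in for two 160x160 RGB face crops (assumption).
batch = rng.uniform(0, 255, size=(2, 160, 160, 3))
out = standardize_batch(batch)

per_image_mean = out.mean(axis=(1, 2, 3))  # close to 0 for each image
per_image_std = out.std(axis=(1, 2, 3))    # close to 1 for each image

# A constant image has zero std; the floor keeps the result finite.
flat = standardize_batch(np.full((1, 160, 160, 3), 7.0))
```

Each image is scaled by its own statistics, so brightness and contrast differences between photos are largely removed before the embeddings are computed.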

# 1. Gradient Boost + Ensemble Methods - Best Score 

# 2. Pretrained Facenet embedding with L2 Norm - No transfer learning

The cell below uses a pretrained Facenet model that was trained on the MS-Celeb-1M dataset (https://drive.google.com/drive/folders/12aMYASGCKvDdkygSv1yQq8ns03AStDO_).

We calculated the embedding for each image, then computed the cosine distance, Euclidean distance, and L2-normalized Euclidean distance between each pair to test which performed best. The L2-normalized distance performed best on both the submission set and the training data. In addition, we tried converting the distances to probabilities, which boosted accuracy by 2%.
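The three measures can be sketched in plain NumPy as follows. The helper names (`cosine_dist`, `euclidean_dist`, `unit_norm`) are local stand-ins rather than the deepface functions used in the notebook, and the random 128-dimensional vectors are an assumed placeholder for real Facenet embeddings.

```python
import numpy as np

def cosine_dist(a, b):
    # 1 - cosine similarity; 0 means the vectors point in the same direction.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_dist(a, b):
    return np.linalg.norm(a - b)

def unit_norm(v):
    # Scale a vector to unit L2 length.
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
emb1 = rng.normal(size=128)  # hypothetical embedding of image 1
emb2 = rng.normal(size=128)  # hypothetical embedding of image 2

d_cos = cosine_dist(emb1, emb2)
d_euc = euclidean_dist(emb1, emb2)
d_l2 = euclidean_dist(unit_norm(emb1), unit_norm(emb2))
```

For unit-length vectors the squared Euclidean distance equals twice the cosine distance, so `d_l2` always lies in [0, 2] and is independent of embedding magnitude, while the raw Euclidean distance is not.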

Before the images are fed to the network, they are preprocessed with deepface's preprocessing functions, which detect faces and resize the image. This submission received a score of 69.866% on Kaggle, and we were not able to improve on it with transfer learning, even when expanding the training pairs and adding data augmentation.

In [None]:
from deepface.commons import functions
from deepface.commons.distance import findCosineDistance, findEuclideanDistance, l2_normalize

model_path = '/content/drive/MyDrive/kinship_test/models/facenet_keras.h5'
model = load_model(model_path)

test_path = "/content/drive/MyDrive/Kinship Recognition Starter/test/"
submission = pd.read_csv('/content/drive/MyDrive/Kinship Recognition Starter/test_ds.csv')

cos_predictions, euc_pred, l2_pred = [], [], []
cos, euc, l2 = [], [], []
for i in range(0, len(submission)):
  
    X1 = submission.p1[i]
    X1 = test_path + X1
    

    X2 = submission.p2[i]
    X2 = test_path + X2 

    # Preprocess each image and detect faces
    img1 = normalize(functions.preprocess_face(X1, target_size=(160, 160), enforce_detection=False))
    img2 = normalize(functions.preprocess_face(X2, target_size=(160, 160), enforce_detection=False))

    # Calculate image embeddings from the pretrained network
    img1_emb = model.predict(img1)[0]
    img2_emb = model.predict(img2)[0]

    # Cosine distance (1 - cosine similarity) and threshold prediction
    distance = findCosineDistance(img1_emb, img2_emb)
    pred = 1 if distance >= .68 else 0
    cos_predictions.append(pred)
    cos.append(distance)

    # Euclidean distance and threshold prediction
    distance = findEuclideanDistance(img1_emb, img2_emb)
    pred = 1 if distance <= 6.14 else 0
    euc_pred.append(pred)
    euc.append(distance)

    # Euclidean distance between L2-normalized embeddings and threshold prediction
    distance = findEuclideanDistance(l2_normalize(img1_emb), l2_normalize(img2_emb))
    pred = 1 if distance <= 1.35 else 0
    l2_pred.append(pred)
    l2.append(distance)



In [26]:
#Convert distances into probabilities for prediction 
l2_sum = sum(l2)
l2 = np.array(l2)

l2_prob = []
for d in l2:
  prob = np.sum(l2[np.where(l2 <= d)[0]])/l2_sum
  l2_prob.append(1 - prob)
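The loop above scores each pair as one minus the share of total distance contributed by all pairs at or below its distance. A vectorized sketch of the same computation, using a sort and a cumulative sum, is shown here; the function name `distance_to_score` and the sample distances are illustrative assumptions.

```python
import numpy as np

def distance_to_score(dist):
    dist = np.asarray(dist, dtype=float)
    order = np.sort(dist)
    cum = np.cumsum(order)
    # For each d, index of the last sorted entry that is <= d (ties included).
    idx = np.searchsorted(order, dist, side="right") - 1
    return 1.0 - cum[idx] / cum[-1]

dist = np.array([0.9, 1.2, 1.2, 1.5])  # made-up distances
scores = distance_to_score(dist)
```

The smallest distances receive scores near 1 and the largest distance always receives 0, so the values behave as relative rankings over the whole test set rather than calibrated probabilities.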

In [None]:
#Save all the threshold prediction files 
d = {'index': np.arange(0, 3000, 1), 'label':cos_predictions}
submissionfile = pd.DataFrame(data=d)
submissionfile.astype('int64').to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/facenet_cos.csv", index=False)

d = {'index': np.arange(0, 3000, 1), 'label':euc_pred}
submissionfile = pd.DataFrame(data=d)
submissionfile.astype('int64').to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/facenet_euc.csv", index=False)


d = {'index': np.arange(0, 3000, 1), 'label':l2_pred}
submissionfile = pd.DataFrame(data=d)
submissionfile.astype('int64').to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/facenet_l2.csv", index=False)

In [None]:
#Save distance calculation files 
d = {'index': np.arange(0, 3000, 1), 'label':cos}
submissionfile = pd.DataFrame(data=d)
submissionfile.to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/dist_cos.csv", index=False)

d = {'index': np.arange(0, 3000, 1), 'label':euc}
submissionfile = pd.DataFrame(data=d)
submissionfile.to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/dist_euc.csv", index=False)


d = {'index': np.arange(0, 3000, 1), 'label':l2}
submissionfile = pd.DataFrame(data=d)
submissionfile.to_csv("/content/drive/MyDrive/kinship_test/transfer_learning/dist_l2.csv", index=False)

The probability file is the one used for submission, with a threshold of 0.55.

In [28]:
#Save L2 dist probability file - this is used for submission 
d = {'index': np.arange(0, 3000, 1), 'label':l2_prob}
submissionfile = pd.DataFrame(data=d)
submissionfile.to_csv("/content/drive/MyDrive/kinship_test/prob_l2.csv", index=False)
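One reading of the 0.55 threshold is sketched below: scores at or above the threshold are labeled as kin (1), the rest as non-kin (0). Whether the comparison is inclusive and whether the authors applied it before uploading are assumptions, and the scores here are made up.

```python
import numpy as np

scores = np.array([0.81, 0.55, 0.31, 0.0])  # made-up probability scores
threshold = 0.55                            # value stated in the notebook

# Assumption: a score at or above the threshold is labeled as kin.
labels = (scores >= threshold).astype("int64")

# Index column sized from the data rather than hard-coded.
index = np.arange(len(scores))
```

Deriving the index from `len(scores)` keeps the output aligned with the input if the number of test pairs changes.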

# 3. Transfer Learning 