# **Large-Scale Kinship Recognition Data Challenge: Kinship Verification STARTER NOTEBOOK**

We provide framework code to get you started on the competition. The notebook is broken up into three main sections. 
1. Data Loading & Visualizing
2. Data Generator & Model Building
3. Training & Testing Model

We have done most of the heavy lifting by making the data readily accessible through Google Drive. We have also created a data loader and a fully trained end-to-end model that predicts a binary label (0 or 1) denoting whether two faces share a kinship relation.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**WARNING: IF YOU HAVE NOT DONE SO ALREADY**

1. Change the runtime to GPU: Runtime --> Change Runtime Type --> GPU
2. Mount Google Drive
3. Install the libraries

In [2]:
%%capture
!pip install keras_vggface
!pip install keras_applications
!pip install arcface
!pip install deepface

In [3]:
from collections import defaultdict
from glob import glob
from random import choice, sample

import tensorflow as tf
import keras
import cv2
import numpy as np
import pandas as pd
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Input, Dense, GlobalMaxPool2D, GlobalAvgPool2D, Concatenate, Multiply, Dropout, Subtract
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from keras_vggface.utils import preprocess_input
from keras_vggface.vggface import VGGFace


In [None]:
print(tf.__version__)

2.5.0


train_ds.csv contains pairs of image paths (`p1`, `p2`) with a `relationship` label: 1 if the two faces are related (positive sample) and 0 otherwise.

train-faces contains the training images, organized as `<family>/<member>/<image>.jpg`.
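The notebook identifies a person by the `family/member` prefix of an image path (e.g. `F0330/MID1` in the `all_images` listing below). A minimal sketch of that key extraction, assuming the three-level `<family>/<member>/<image>.jpg` layout; `person_key` is a hypothetical helper name. Splitting on `/` rather than slicing a fixed number of characters also handles two-digit member IDs such as `MID10`:

```python
def person_key(path: str) -> str:
    """Return the 'FAMILY/MEMBER' key from a path ending in FAMILY/MEMBER/IMAGE.jpg."""
    parts = path.split("/")
    return parts[-3] + "/" + parts[-2]


# Absolute paths (from glob) and relative CSV entries give the same kind of key
print(person_key("/content/drive/MyDrive/train-faces/F0330/MID1/P03496_face0.jpg"))  # F0330/MID1
print(person_key("F0990/MID10/P10431_face3.jpg"))  # F0990/MID10

# A fixed-width slice silently drops the last digit of a two-digit member ID
print("F0990/MID10/P10431_face3.jpg"[:10])  # F0990/MID1
```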

In [4]:
# Modify paths as per your method of saving them
train_file_path = "/content/drive/MyDrive/Kinship Recognition Starter/train_ds.csv"
train_folders_path = "/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/"
# All images belonging to families F09** will be used to create the validation set while training the model
# For final submission, you can add these to the training data as well
val_famillies = "F09"

In [5]:
all_images = glob(train_folders_path + "*/*/*.jpg")

train_images = [x for x in all_images if val_famillies not in x]
val_images = [x for x in all_images if val_famillies in x]

train_person_to_images_map = defaultdict(list)

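# Set of every "family/member" key with images on disk (a set makes the membership checks below fast)
ppl = {x.split("/")[-3] + "/" + x.split("/")[-2] for x in all_images}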

for x in train_images:
    train_person_to_images_map[x.split("/")[-3] + "/" + x.split("/")[-2]].append(x)

val_person_to_images_map = defaultdict(list)

for x in val_images:
    val_person_to_images_map[x.split("/")[-3] + "/" + x.split("/")[-2]].append(x)

In [None]:
all_images

['/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03496_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03500_face2.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03497_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03501_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03492_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03499_face5.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03494_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03493_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03495_face0.jpg',
 '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0330/MID1/P03498_fa

In [6]:
relationships = pd.read_csv(train_file_path)
relationships = list(zip(relationships.p1.values, relationships.p2.values, relationships.relationship.values))
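# Keep only pairs whose two people both have images on disk. The "family/member" prefix is
# taken by splitting on "/" because a fixed [:10] slice truncates two-digit IDs such as MID10
def person_of(path):
    return "/".join(path.split("/")[:2])

relationships = [(p1, p2, label) for p1, p2, label in relationships if person_of(p1) in ppl and person_of(p2) in ppl]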

train = [x for x in relationships if val_famillies not in x[0]]
val = [x for x in relationships if val_famillies in x[0]]

In [7]:
from keras.preprocessing import image
def read_img(path):
    img = image.load_img(path, target_size=(224, 224))
    # np.float is deprecated since NumPy 1.20 and removed in 1.24; use an explicit dtype
    img = np.array(img).astype(np.float32)
    return preprocess_input(img, version=2)

Define a data generator. It yields batches for training: for each sampled pair it loads and preprocesses both face images and returns them together with the pair's label, as `([X1, X2], labels)`.
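To make the batch layout concrete, here is a self-contained sketch of the same sampling logic. It assumes a hypothetical `fake_read_img` in place of `read_img` and made-up pair tuples, so it runs without the dataset; each batch is two stacked image arrays of shape `(batch_size, 224, 224, 3)` plus a label vector:

```python
from random import sample

import numpy as np


def fake_read_img(path):
    """Hypothetical stand-in for read_img: returns a blank 224x224 RGB array."""
    return np.zeros((224, 224, 3), dtype=np.float32)


def toy_gen(list_tuples, batch_size=4, loader=fake_read_img):
    """Yield ([X1, X2], labels) batches of distinct (p1, p2, label) tuples."""
    while True:
        batch = sample(list_tuples, batch_size)
        X1 = np.array([loader(p1) for p1, _, _ in batch])
        X2 = np.array([loader(p2) for _, p2, _ in batch])
        labels = np.array([label for _, _, label in batch])
        yield [X1, X2], labels


# Made-up pairs with alternating labels, only to exercise the generator
toy_pairs = [(f"fam{i}/p1.jpg", f"fam{i}/p2.jpg", i % 2) for i in range(10)]
(X1, X2), y = next(toy_gen(toy_pairs, batch_size=4))
print(X1.shape, X2.shape, y.shape)  # (4, 224, 224, 3) (4, 224, 224, 3) (4,)
```

Because `random.sample` draws without replacement, a batch never repeats a pair, but the same pair can reappear in later batches.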

In [None]:
def gen(list_tuples, person_to_images_map, batch_size=16):
    # person_to_images_map is not used here: every pair and its label come
    # directly from the rows of train_ds.csv (list_tuples)
    while True:
        batch_tuples = sample(list_tuples, batch_size)

        # Labels are the third element of each (p1, p2, relationship) tuple
        labels = [tup[2] for tup in batch_tuples]

        # Load and preprocess the first and second face of every pair
        # (validation pairs live under the same train-faces folder)
        X1 = np.array([read_img(train_folders_path + x[0]) for x in batch_tuples])
        X2 = np.array([read_img(train_folders_path + x[1]) for x in batch_tuples])

        yield [X1, X2], np.array(labels)

In [None]:
from arcface import ArcFace
from arcface.lib.models import ArcFaceModel

face_rec = ArcFace.ArcFace()
test = gen(train, train_person_to_images_map, batch_size=16)
img = next(test)

af_model = ArcFaceModel(size=224, channels=3, num_classes=None, name='arcface_model',
                 margin=0.5, logist_scale=64, embd_shape=512,
                 head_type='ArcHead', backbone_type='ResNet50',
                 w_decay=5e-4, use_pretrain=True, training=False)

#emb2 = face_rec.calc_emb([img[0][0][0],img[0][0][1]])
#print(emb1)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5


In [None]:
type(af_model)


tensorflow.python.keras.engine.functional.Functional

In [None]:
len(img[0][0][0])

224

In [None]:
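# Embed the two faces of the first training pair with the ArcFace model
p1, p2, label = train[0]
emb1 = face_rec.calc_emb(train_folders_path + p1)
emb2 = face_rec.calc_emb(train_folders_path + p2)

dist = np.linalg.norm(emb1 - emb2)  # Euclidean distance
sim = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))  # cosine similarity
print(dist)
print(sim)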

1.1992311
0.2809223


In [11]:
val

[('F0917/MID3/P09684_face0.jpg', 'F0290/MID6/P03086_face2.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0199/MID1/P02146_face1.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0167/MID4/P01797_face1.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0939/MID3/P09902_face5.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0360/MID1/P12313_face0.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0665/MID4/P06953_face3.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0119/MID1/P01238_face1.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0841/MID1/P08886_face0.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0358/MID5/P10928_face5.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0421/MID3/P04429_face0.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0930/MID2/P09812_face4.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0457/MID1/P04834_face2.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0736/MID1/P07716_face2.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0573/MID1/P06024_face3.jpg', 0),
 ('F0917/MID3/P09684_face0.jpg', 'F0511/MID2/P05

Here is a Siamese model built on a single pre-trained ResNet-50 backbone (the ArcFace model; the original VGGFace ResNet-50 line is left commented out), which is shared between both input faces and on which we can apply transfer learning. This model achieves the baseline, and the goal is to expand on this work. Papers have explored different architectures as well as techniques such as BatchNormalization, among many others, to improve how well the model recognizes kinship between two faces.
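The head of the model combines the two embeddings with an element-wise product and a squared difference before the dense layers. A NumPy sketch of that feature construction, assuming 512-dimensional embeddings as produced by the ArcFace backbone (`embd_shape=512`) and random vectors in place of real embeddings:

```python
import numpy as np


def pair_features(e1, e2):
    """Mirror the Multiply / Subtract / Concatenate layers of baseline_model in NumPy."""
    prod = e1 * e2             # Multiply()([x1, x2])
    sq_diff = (e1 - e2) ** 2   # Subtract()([x1, x2]) followed by Multiply() with itself
    return np.concatenate([prod, sq_diff], axis=-1)


rng = np.random.default_rng(0)
e1 = rng.normal(size=(2, 512))  # a batch of 2 embeddings for the first face
e2 = rng.normal(size=(2, 512))  # and for the second face
feats = pair_features(e1, e2)
print(feats.shape)  # (2, 1024)
```

Both features are symmetric in the two inputs, so swapping the order of a pair gives the same feature vector; this suits kinship, where the label does not depend on which face comes first.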

In [None]:

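def baseline_model():
    input_1 = Input(shape=(224, 224, 3))
    input_2 = Input(shape=(224, 224, 3))

    # Original starter backbone, kept for reference:
    #base_model = VGGFace(model='resnet50', include_top=False)

    # Pre-trained ArcFace ResNet-50 that maps a 224x224 face to a 512-d embedding
    base_model = ArcFaceModel(size=224, channels=3, num_classes=None, name='arcface_model',
                              margin=0.5, logist_scale=64, embd_shape=512,
                              head_type='ArcHead', backbone_type='ResNet50',
                              w_decay=5e-4, use_pretrain=True, training=False)
    print(type(base_model))

    # Freeze the backbone. While base_model.trainable is False, Keras reports no trainable
    # weights for it, so re-enabling individual layers in the loop below does not unfreeze
    # anything; set base_model.trainable = True instead to fine-tune those layers.
    base_model.trainable = False
    for x in base_model.layers[:-2]:
        x.trainable = True

    # The same backbone (shared weights) embeds both faces
    x1 = base_model(input_1)
    x2 = base_model(input_2)

    # Pooling is only needed for a convolutional feature map such as VGGFace's output;
    # ArcFace already returns a flat embedding
    #x1 = Concatenate(axis=-1)([GlobalMaxPool2D()(x1), GlobalAvgPool2D()(x1)])
    #x2 = Concatenate(axis=-1)([GlobalMaxPool2D()(x2), GlobalAvgPool2D()(x2)])

    # Squared difference of the two embeddings
    x3 = Subtract()([x1, x2])
    x3 = Multiply()([x3, x3])

    # Element-wise product of the two embeddings
    x = Multiply()([x1, x2])

    x = Concatenate(axis=-1)([x, x3])

    x = Dense(100, activation="relu")(x)
    x = Dropout(0.05)(x)
    out = Dense(1, activation="sigmoid")(x)

    model = Model([input_1, input_2], out)

    model.compile(loss="binary_crossentropy", metrics=['acc'], optimizer=Adam(learning_rate=1e-5))

    model.summary()

    return model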

# ArcFace Embedding with Cosine Distance (No Transfer Learning)

ModelCheckpoint saves the best model weights (by validation accuracy) to your Drive after each training epoch, so that you can come back to them. ReduceLROnPlateau reduces the learning rate when a monitored metric, here the validation accuracy, has stopped improving.
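A simplified pure-Python sketch of the plateau rule in `mode="max"`: after `patience` epochs without an improvement in the monitored value, the learning rate is multiplied by `factor`. It ignores Keras options such as `min_delta`, `cooldown` and `min_lr`, so treat it as an illustration of the idea rather than the exact Keras implementation; `plateau_schedule` is a hypothetical name and the accuracy curve is made up:

```python
def plateau_schedule(val_accs, lr=1e-5, factor=0.1, patience=20):
    """Return the learning rate in effect after each epoch (simplified ReduceLROnPlateau, mode='max')."""
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_accs:
        if acc > best:
            # Improvement: remember it and reset the counter
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau reached: shrink the learning rate and start counting again
                lr *= factor
                wait = 0
        lrs.append(lr)
    return lrs


# Toy curve that stalls after 3 epochs; patience shortened to 2 for illustration
lrs = plateau_schedule([0.40, 0.45, 0.50, 0.50, 0.49, 0.50, 0.48], lr=1e-5, patience=2)
print(lrs)
```

With `patience=20` and `steps_per_epoch=100` as in the training cell, the learning rate only drops after 20 consecutive epochs without a new best validation accuracy.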

In [None]:
submission

Unnamed: 0,index,p1,p2
0,0,face1116.jpg,face3426.jpg
1,1,face762.jpg,face3128.jpg
2,2,face1499.jpg,face3480.jpg
3,3,face1027.jpg,face1733.jpg
4,4,face158.jpg,face2620.jpg
...,...,...,...
2995,2995,face2104.jpg,face4163.jpg
2996,2996,face207.jpg,face2441.jpg
2997,2997,face2024.jpg,face3753.jpg
2998,2998,face1064.jpg,face3385.jpg


In [None]:
from arcface import ArcFace

# Load the ArcFace model once rather than once per image pair
face_rec = ArcFace.ArcFace()


def cos_sim(x1, x2):
    """
    Cosine similarity between two 1-D numpy arrays (higher means more similar)
    """
    return np.dot(x1, x2) / (np.sqrt(np.dot(x1, x1)) * np.sqrt(np.dot(x2, x2)))


def get_cosine_similarity(img1, img2):
    """
    Embed two face images (file paths) with ArcFace and return their cosine similarity
    """
    emb1 = face_rec.calc_emb(img1)
    emb2 = face_rec.calc_emb(img2)
    return cos_sim(emb1, emb2)


test_path = "/content/drive/MyDrive/Kinship Recognition Starter/test/"

submission = pd.read_csv('/content/drive/MyDrive/Kinship Recognition Starter/test_ds.csv')

predictions = []
scores = []

for i in range(len(submission)):
    X1 = test_path + submission.p1[i]
    X2 = test_path + submission.p2[i]

    similarity = get_cosine_similarity(X1, X2)
    scores.append(similarity)
    # Predict kin (1) when the similarity reaches the hand-picked 0.65 threshold
    predictions.append(1 if similarity >= 0.65 else 0)
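The cell above thresholds cosine *similarity* (`similarity >= 0.65`, higher means more alike), whereas DeepFace's `findCosineDistance` in the next cell returns cosine *distance*, i.e. 1 minus the similarity (lower means more alike), so the direction of the comparison flips. For L2-normalized embeddings the Euclidean distance is also a function of the similarity: $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$. A quick NumPy check of both relations on random vectors (not real embeddings):

```python
import numpy as np


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    # Defined as 1 - cosine similarity
    return 1.0 - cosine_similarity(a, b)


def l2_distance_normalized(a, b):
    # Euclidean distance after scaling both vectors to unit length
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.linalg.norm(a - b))


rng = np.random.default_rng(42)
a, b = rng.normal(size=512), rng.normal(size=512)

sim = cosine_similarity(a, b)
print(np.isclose(l2_distance_normalized(a, b) ** 2, 2 - 2 * sim))  # True
print(np.isclose(cosine_distance(a, b), 1 - sim))  # True
```

Under these relations, the rule `similarity >= 0.65` is the same as cosine distance `<= 0.35`, or normalized Euclidean distance `<= sqrt(0.7)` (about 0.84).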


In [38]:
from deepface import DeepFace
from deepface.commons.distance import findCosineDistance, findEuclideanDistance, l2_normalize
from deepface.commons import functions
from deepface.basemodels import ArcFace

model = ArcFace.loadModel()
model.load_weights("/content/drive/MyDrive/arcface_weights.h5")

test_path = "/content/drive/MyDrive/Kinship Recognition Starter/test/"
submission = pd.read_csv('/content/drive/MyDrive/Kinship Recognition Starter/test_ds.csv')

cos_predictions, euc_pred, l2_pred = [], [], []
cos, euc, l2 = [], [], []
for i in range(len(submission)):
    X1 = test_path + submission.p1[i]
    X2 = test_path + submission.p2[i]

    # Detect, align and resize both faces to ArcFace's 112x112 input;
    # enforce_detection=False avoids an error when no face is detected
    img1 = functions.preprocess_face(X1, target_size=(112, 112), enforce_detection=False)
    img2 = functions.preprocess_face(X2, target_size=(112, 112), enforce_detection=False)

    img1_emb = model.predict(img1)[0]
    img2_emb = model.predict(img2)[0]

    # Cosine distance = 1 - cosine similarity, so related faces have SMALL distances
    distance = findCosineDistance(img1_emb, img2_emb)
    pred = 1 if distance <= .68 else 0
    cos_predictions.append(pred)
    # The distance is appended twice per pair, so `cos` ends up with
    # 2 * len(submission) entries; the later cos[::2] keeps one per pair
    cos.append(distance)
    cos.append(distance)

    distance = findEuclideanDistance(img1_emb, img2_emb)
    pred = 1 if distance <= 6.14 else 0
    euc_pred.append(pred)
    euc.append(distance)

    # Euclidean distance between L2-normalized embeddings
    distance = findEuclideanDistance(l2_normalize(img1_emb), l2_normalize(img2_emb))
    pred = 1 if distance <= 1.5 else 0
    l2_pred.append(pred)
    l2.append(distance)




In [None]:
X1= '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0990/MID10/P10431_face3.jpg'
X2 = '/content/drive/MyDrive/Kinship Recognition Starter/train/train-faces/F0990/MID9/P10437_face5.jpg'

img1 = functions.preprocess_face(X1, target_size = (112, 112),enforce_detection=False)
img2 = functions.preprocess_face(X2, target_size = (112, 112),enforce_detection=False)

img1 = model.predict(img1)[0]
img2= model.predict(img2)[0]

distance = findCosineDistance(img1, img2)
print("Cos ", distance)
distance = findEuclideanDistance(img1, img2)
print("eucl ", distance)
distance = findEuclideanDistance(l2_normalize(img1), l2_normalize(img2))
print("L2 ", distance)
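The thresholds in the DeepFace loop (0.68 for cosine, 6.14 for Euclidean, 1.5 for L2-normalized Euclidean) are fixed by hand. One option, sketched here under the assumption that you have distances and 0/1 labels for labeled pairs (for example the F09 validation pairs), is to scan candidate thresholds and keep the one with the highest accuracy. `best_threshold` is a hypothetical helper and the data below is made up:

```python
import numpy as np


def best_threshold(distances, labels):
    """Pick the threshold maximizing accuracy when predicting kin (1) for distance <= threshold."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_acc = None, -1.0
    # Every observed distance is a candidate cut point
    for t in np.unique(distances):
        preds = (distances <= t).astype(int)
        acc = float((preds == labels).mean())
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc


# Toy data: related pairs (label 1) mostly have smaller distances
t, acc = best_threshold([0.2, 0.3, 0.5, 0.7, 0.8, 0.9], [1, 1, 1, 0, 0, 1])
print(t, round(acc, 3))  # 0.5 0.833
```

Tune the threshold on validation pairs only; choosing it on the same pairs you report accuracy on will overstate performance. For similarity scores, flip the comparison to `>=`.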

In [30]:
len(submission)

3000

In [None]:
file_path = "/content/drive/MyDrive/af_model.h5"

checkpoint = ModelCheckpoint(file_path, monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=True, mode='max')

reduce_on_plateau = ReduceLROnPlateau(monitor="val_acc", mode="max", factor=0.1, patience=20, verbose=1)

callbacks_list = [checkpoint, reduce_on_plateau]

model = baseline_model()

<class 'tensorflow.python.keras.engine.functional.Functional'>
Model: "model_4"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_27 (InputLayer)           [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
input_28 (InputLayer)           [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
arcface_model (Functional)      (None, 512)          74978688    input_27[0][0]                   
                                                                 input_28[0][0]                   
__________________________________________________________________________________________________
subtract_2 (Subtract)        

In [None]:
model.fit(gen(train, train_person_to_images_map, batch_size=16), use_multiprocessing=False,
                validation_data=gen(val, val_person_to_images_map, batch_size=64), epochs=200, verbose=1,
                workers=1, callbacks=callbacks_list, steps_per_epoch=100, validation_steps=50)

Epoch 1/200

Epoch 00001: val_acc improved from -inf to 0.46125, saving model to /content/drive/MyDrive/af_model.h5
Epoch 2/200

Epoch 00002: val_acc improved from 0.46125 to 0.48875, saving model to /content/drive/MyDrive/af_model.h5
Epoch 3/200

Epoch 00003: val_acc improved from 0.48875 to 0.49781, saving model to /content/drive/MyDrive/af_model.h5
Epoch 4/200

Epoch 00004: val_acc improved from 0.49781 to 0.51187, saving model to /content/drive/MyDrive/af_model.h5
Epoch 5/200

Epoch 00005: val_acc improved from 0.51187 to 0.51594, saving model to /content/drive/MyDrive/af_model.h5
Epoch 6/200

Epoch 00006: val_acc improved from 0.51594 to 0.53031, saving model to /content/drive/MyDrive/af_model.h5
Epoch 7/200

Epoch 00007: val_acc improved from 0.53031 to 0.53938, saving model to /content/drive/MyDrive/af_model.h5
Epoch 8/200

Epoch 00008: val_acc did not improve from 0.53938
Epoch 9/200

Epoch 00009: val_acc did not improve from 0.53938
Epoch 10/200

Epoch 00010: val_acc did not i

In [None]:
# Modify paths as per your need
test_path = "/content/drive/MyDrive/Kinship Recognition Starter/test/"

model = baseline_model()
# Load the best weights saved by the ModelCheckpoint callback during training
model.load_weights(file_path)

submission = pd.read_csv('/content/drive/MyDrive/Kinship Recognition Starter/test_ds.csv')
predictions = []
scores = []
for i in range(0, len(submission.p1.values), 32):
    X1 = submission.p1.values[i:i+32]
    X1 = np.array([read_img(test_path + x) for x in X1])

    X2 = submission.p2.values[i:i+32]
    X2 = np.array([read_img(test_path + x) for x in X2])

    pred = model.predict([X1, X2]).ravel().tolist()
    predictions += pred

The following Variables were used a Lambda layer's call (tf.nn.convolution_212), but
are not present in its tracked objects:
  <tf.Variable 'conv1/7x7_s2/kernel:0' shape=(7, 7, 3, 64) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Variables were used a Lambda layer's call (tf.compat.v1.nn.fused_batch_norm_212), but
are not present in its tracked objects:
  <tf.Variable 'conv1/7x7_s2/bn/gamma:0' shape=(64,) dtype=float32>
  <tf.Variable 'conv1/7x7_s2/bn/beta:0' shape=(64,) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
The following Variables were used a Lambda layer's call (tf.nn.convolution_213), but
are not present in its tracked objects:
  <tf.V

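The baseline model's predictions are sigmoid probabilities and need to be rounded to a 0/1 label, e.g. 0.01 rounds to 0 and 0.78 rounds to 1; pandas' simple `.round()` is sufficient. The DeepFace `cos_predictions` used below are already 0/1, which is why the rounding line is commented out.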
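A minimal sketch of the rounding step with pandas, using made-up probabilities in place of `predictions`. `.round()` maps values below 0.5 to 0 and above 0.5 to 1, and `astype('int64')` stores integer labels as in the `deepface_cos.csv` export. Note that pandas rounds half to even, so an exact 0.5 becomes 0:

```python
import numpy as np
import pandas as pd

# Made-up sigmoid outputs standing in for `predictions`
probs = [0.01, 0.78, 0.5, 0.49, 0.93]

submission_df = pd.DataFrame({"index": np.arange(len(probs)), "label": probs})
submission_df["label"] = submission_df["label"].round().astype("int64")
print(submission_df["label"].tolist())  # [0, 1, 0, 0, 1]
```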

In [55]:
print(len(cos_predictions))
d = {'index': np.arange(len(submission)), 'label': cos_predictions}
submissionfile = pd.DataFrame(data=d)
#submissionfile = submissionfile.round()

3000


In [59]:
print(len(cos), len(euc), len(l2))
# `cos` holds each distance twice (see the DeepFace loop), so keep every other entry
d = {'index': np.arange(len(submission)), 'cos': cos[::2]}
f = pd.DataFrame(data=d)  
f.to_csv("/content/drive/MyDrive/kinship_test/deepface_cosdistances.csv", index=False) 

6000 3000 3000


In [None]:
submissionfile.to_csv("/content/drive/MyDrive/kinship_test/c.wilkerson_ksc2138.csv", index=False)

In [None]:
import pandas as pd 
df = pd.read_csv("/content/drive/MyDrive/kinship_test/cw3329_ksc2138.csv")

In [56]:
submissionfile.astype('int64').to_csv("/content/drive/MyDrive/kinship_test/deepface_cos.csv", index=False)

In [57]:
new = cos[::2]

In [58]:
print(len(new))

3000


At this point, download the CSV and submit it on Kaggle to score your predictions.
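Before uploading, a quick sanity check can catch problems such as the doubled `cos` list above. This sketch assumes the submission has the `index` and `label` columns written by the cells above and one row per test pair (3,000 here, per `len(submission)`); `check_submission` is a hypothetical helper:

```python
import pandas as pd


def check_submission(df: pd.DataFrame, n_pairs: int) -> None:
    """Raise ValueError if df does not look like a binary kinship submission."""
    if list(df.columns) != ["index", "label"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if len(df) != n_pairs:
        raise ValueError(f"expected {n_pairs} rows, got {len(df)}")
    if df["index"].tolist() != list(range(n_pairs)):
        raise ValueError("index column must run from 0 to n_pairs - 1 in order")
    if not set(df["label"].unique()) <= {0, 1}:
        raise ValueError("labels must be 0 or 1")


ok = pd.DataFrame({"index": range(3), "label": [0, 1, 1]})
check_submission(ok, n_pairs=3)  # passes silently
# On a saved file (path illustrative):
# check_submission(pd.read_csv("your_submission.csv"), n_pairs=3000)
```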
