## Implementation of the Face Recognition Model from FaceNet

I will be using stochastic gradient descent, a departure from the mini-batch gradient descent used in the FaceNet paper. At the start of each epoch, I will build a list of (anchor, positive, negative) triplets: every anchor-positive pair, each matched with a semi-hard negative chosen by the distance between encoding vectors.
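
As a rough illustration of the triplet selection described above, here is a minimal NumPy sketch. The names `pairwise_sq_dists` and `build_triplets` are hypothetical, and the random fallback when no semi-hard negative exists is my own assumption rather than something taken from the FaceNet paper. A negative counts as semi-hard when it is farther from the anchor than the positive but still inside the margin.

```python
import numpy as np

def pairwise_sq_dists(emb):
    # Squared Euclidean distance between every pair of rows in emb.
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def build_triplets(emb, labels, alpha=0.2, seed=0):
    """Return a list of (anchor, positive, negative) index triplets."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    dists = pairwise_sq_dists(np.asarray(emb, dtype=np.float64))
    triplets = []
    for a in range(len(labels)):
        same = labels == labels[a]
        negatives = np.flatnonzero(~same)
        if negatives.size == 0:
            continue  # no other identity to draw a negative from
        for p in np.flatnonzero(same):
            if p == a:
                continue
            d_ap = dists[a, p]
            d_an = dists[a, negatives]
            # Semi-hard: d(a,p) < d(a,n) < d(a,p) + alpha
            semi_hard = negatives[(d_an > d_ap) & (d_an < d_ap + alpha)]
            if semi_hard.size > 0:
                # Take the closest of the semi-hard candidates.
                n = semi_hard[np.argmin(dists[a, semi_hard])]
            else:
                # Assumption: fall back to a random negative.
                n = rng.choice(negatives)
            triplets.append((a, int(p), int(n)))
    return triplets
```

Because embeddings change as the network trains, a list like this would need to be rebuilt from fresh encodings at the start of every epoch.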

In [1]:
import numpy as np
import tensorflow as tf
import pandas as pd
from keras import backend as K
import matplotlib.pyplot as plt
import latex
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

Using TensorFlow backend.


In [2]:
# Loading data
names = np.load("../../data/facenet/names.npy")
images = np.load("../../data/facenet/images.npy")
print(names.shape)
print(images.shape)

(426,)
(426, 220, 220, 3)


### TensorFlow Parameters

Note - the placeholder function returns three placeholders: one for the anchor image, one for the positive image, and one for the negative image.

In [3]:
# Placeholder values for input image data
def get_placeholders(x_h,x_w,x_c):
    """
    x_h: Height for image input 
    x_w: Width for image input
    x_c: Channels for image input
    """
    anchor = tf.placeholder(tf.float32, name="anchor", shape=(None,x_h,x_w,x_c))
    pos = tf.placeholder(tf.float32, name="pos_ex", shape=(None,x_h,x_w,x_c))
    neg = tf.placeholder(tf.float32, name="neg_ex", shape=(None,x_h,x_w,x_c))
    return [anchor,pos,neg]

In [4]:
# Testing placeholders function
tf.reset_default_graph()
with tf.Session() as sess:
    anch,pos,neg = get_placeholders(220,220,3)
    print("Anchor shape:",anch.shape)
    print("Pos. example shape:",pos.shape)
    print("Neg. example shape:",neg.shape)

Anchor shape: (?, 220, 220, 3)
Pos. example shape: (?, 220, 220, 3)
Neg. example shape: (?, 220, 220, 3)


### TensorFlow forward prop

I use the NN1 architecture outlined in the FaceNet paper. Note - I adjust this model slightly: I add batch normalization after each convolution to speed up training and to provide some regularization. I also slightly alter the fully connected layers.
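
To make the batch normalization step concrete, here is a small NumPy sketch of what the layer computes at training time on an NHWC batch. The function name `batch_norm_train` is hypothetical, and `gamma`/`beta` stand in for the learned scale and shift. One caveat worth keeping in mind: `tf.layers.batch_normalization` only normalizes with batch statistics when called with `training=True`; with the default `training=False` it uses the stored moving averages instead.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize an NHWC batch per channel using the batch statistics."""
    # Mean and variance are taken over batch, height, and width,
    # leaving one statistic per channel.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are the learned scale and shift.
    return gamma * x_hat + beta
```

After this step each channel has roughly zero mean and unit variance, which keeps the inputs to the following ReLU in a well-behaved range.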

In [5]:
def forward_pass(input_images):
    img1,img2,img3 = input_images
    img_enc1 = conv_network(img1)
    img_enc2 = conv_network(img2)
    img_enc3 = conv_network(img3)
    return [img_enc1,img_enc2,img_enc3]

In [6]:
# Defining a reusable block: 2d convolution, batch norm, and ReLU activation
def conv(the_input,layer,f,ks,s):
    """
    the_input: the layer which will be used as input in conv layer
    layer: specifies the layer number for naming sections of graph
    f (filters): the number of filters to be used for conv layer
    ks (kernel_size): kernel size for conv2d layer
    s: stride for conv2d layer
    """
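    layer = str(layer)
    # reuse=tf.AUTO_REUSE shares the weights across the anchor, positive, and negative passes
    Z = tf.layers.conv2d(the_input,filters=f,kernel_size=[ks,ks],strides=(s,s),padding="same",
                         name="Z"+layer,kernel_initializer=tf.contrib.layers.xavier_initializer(seed=0),
                         reuse=tf.AUTO_REUSE)
    # Note: training defaults to False, so moving averages are used unless training=True is passed
    Bn = tf.layers.batch_normalization(Z,name="Bn"+layer,reuse=tf.AUTO_REUSE)
    A = tf.nn.relu(Bn,name="A"+layer)
    return A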

In [7]:
# NN1 architecture outlined in FaceNet paper with slight adjustments
def conv_network(X):
    input_layer = tf.reshape(X,[-1,220,220,3]) # Input shape of images
    S1 = conv(input_layer,1,64,7,2) # 110x110x64 
    P1 = tf.layers.max_pooling2d(S1,pool_size=[3,3],strides=2,padding="same",name="P1")
    S2 = conv(P1,2,64,1,1)
    S3 = conv(S2,3,192,3,1)
    P3 = tf.layers.max_pooling2d(S3,pool_size=[3,3],strides=2,padding="same",name="P3")
    S4 = conv(P3,4,192,1,1)
    S5 = conv(S4,5,384,3,1)
    P5 = tf.layers.max_pooling2d(S5,pool_size=[3,3],strides=2,padding="same",name="P5")
    S6 = conv(P5,6,384,1,1)
    S7 = conv(S6,7,256,3,1)
    S8 = conv(S7,8,256,1,1)
    S9 = conv(S8,9,256,3,1)
    S10 = conv(S9,10,256,1,1)
    S11 = conv(S10,11,256,3,1)
    P11 = tf.layers.max_pooling2d(S11,pool_size=[3,3],strides=2,padding="same",name="P11")
    # Maxout over channels, then flatten and apply the fully connected layers
    Mo1 = tf.contrib.layers.maxout(P11,128) # 7x7x128
    F = tf.layers.flatten(Mo1,name="Flatten") # 7*7*128 = 6272 features
    Fc1 = tf.layers.dense(F,4096,activation=tf.nn.relu,name="Fc1",reuse=tf.AUTO_REUSE)
    # Note: tf.layers.dropout is a no-op unless training=True is passed
    Do = tf.layers.dropout(Fc1,rate=0.2,name="Dropout")
    Fc2 = tf.layers.dense(Do,128,activation=None,name="Fc2",reuse=tf.AUTO_REUSE)
    return Fc2

In [8]:
# Testing forward prop
tf.reset_default_graph()
img1,img2,img3 = images[0],images[1],images[2]
img1.shape = (1,220,220,3)
img2.shape = (1,220,220,3)
img3.shape = (1,220,220,3)
with tf.Session() as sess:
    aimg1,aimg2,aimg3 = get_placeholders(220,220,3)
    embeddings = forward_pass([aimg1,aimg2,aimg3])
    init = tf.global_variables_initializer()
    sess.run(init)
    aembedding = sess.run(embeddings,feed_dict={aimg1:img1,aimg2:img2,aimg3:img3})
    print("Anchor embedding shape:", str(aembedding[0].shape))
    print("Positive embedding shape:", str(aembedding[1].shape))
    print("Negative embedding shape:", str(aembedding[2].shape))

Anchor embedding shape: (1, 128)
Positive embedding shape: (1, 128)
Negative embedding shape: (1, 128)
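
As a sanity check on the shapes inside `conv_network`, the sketch below traces the spatial size through the five stride-2 layers and mimics the maxout step in NumPy. The helper names `same_out` and `maxout` are hypothetical. The grouping of consecutive channels reflects my understanding of `tf.contrib.layers.maxout` and should be treated as an assumption.

```python
import math
import numpy as np

def same_out(n, stride):
    # With padding="same", the output size is ceil(input / stride).
    return math.ceil(n / stride)

# Stride-2 layers in order: first conv, then P1, P3, P5, P11.
size = 220
trace = [size]
for stride in (2, 2, 2, 2, 2):
    size = same_out(size, stride)
    trace.append(size)
print(trace)  # [220, 110, 55, 28, 14, 7]

def maxout(x, num_units):
    # Split the channel axis into num_units groups of consecutive
    # channels and keep the maximum of each group.
    c = x.shape[-1]
    assert c % num_units == 0, "channels must divide evenly into num_units"
    grouped = x.reshape(x.shape[:-1] + (num_units, c // num_units))
    return grouped.max(axis=-1)

features = np.zeros((1, 7, 7, 256))
print(maxout(features, 128).shape)  # (1, 7, 7, 128)
```

Flattening the 7x7x128 maxout output gives 6272 features, which is the input width of the first fully connected layer.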


### TensorFlow Triplet Loss

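$$ J = \sum_{i=1}^{m} \bigg[ \| f(x_i^a) - f(x_i^p) \|_2^2 - \| f(x_i^a) - f(x_i^n) \|_2^2 + \alpha \bigg]_+ $$
Here $[z]_+ = \max(z, 0)$, so a triplet that already satisfies the margin contributes nothing to the loss.

Note - unlike the FaceNet paper, which constrains embeddings to unit L2 norm, I am not normalizing the encoding vectors.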

Terms:
- alpha: margin (set to 0.2 here)
- m: number of triplets
- f(x<sub>i</sub><sup>a</sup>): anchor encoding
- f(x<sub>i</sub><sup>p</sup>): positive example encoding
- f(x<sub>i</sub><sup>n</sup>): negative example encoding
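
To check the arithmetic by hand, here is a NumPy sketch of the hinged triplet loss evaluated on two toy triplets. The function name `triplet_loss` and the toy values are illustrative assumptions. The distances are summed over the embedding axis only, so each triplet is clamped at zero individually before the batch is summed.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, alpha=0.2):
    # Squared L2 distance per triplet (sum over the embedding axis).
    pos_dist = np.sum((anchor - pos) ** 2, axis=1)
    neg_dist = np.sum((anchor - neg) ** 2, axis=1)
    # Hinge each triplet at zero, then sum over the batch.
    return np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0))

anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
pos = np.array([[1.0, 0.0], [0.1, 0.0]])
neg = np.array([[0.0, 0.5], [0.0, 2.0]])

# Triplet 1: 1.00 - 0.25 + 0.2 = 0.95 (violates the margin)
# Triplet 2: 0.01 - 4.00 + 0.2 < 0   (clamped to 0)
print(triplet_loss(anchor, pos, neg))  # about 0.95
```

Summing the distances over the whole batch before applying the hinge would instead give max(1.01 - 4.25 + 0.2, 0) = 0, letting the easy second triplet hide the violation in the first.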

In [14]:
# Input embeddings is a list of three (None,128) embeddings
def cost_function(embeddings,alpha=0.2):
    anchor,pos,neg = embeddings
    # Squared L2 distance per triplet: sum over the embedding axis only
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,pos)),axis=1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,neg)),axis=1)
    basic_loss = tf.add(tf.subtract(pos_dist,neg_dist),alpha)
    # Hinge each triplet at zero, then sum over the batch
    loss = tf.reduce_sum(tf.maximum(basic_loss,0.0))
    return loss

In [15]:
# Testing cost function
tf.reset_default_graph()
img1,img2,img3 = images[0],images[1],images[2]
img1.shape,img2.shape,img3.shape = (1,220,220,3),(1,220,220,3),(1,220,220,3)
with tf.Session() as sess:
    aimg1,aimg2,aimg3 = get_placeholders(220,220,3)
    embeddings = forward_pass([aimg1,aimg2,aimg3])
    cost = cost_function(embeddings) 
    init = tf.global_variables_initializer()
    sess.run(init)
    acost = sess.run(cost,feed_dict={aimg1:img1,aimg2:img2,aimg3:img3})
    print("cost:",acost)
    print("cost shape:",cost.shape)

cost: 0.0
cost shape: ()


### TensorFlow Model