# Face Recognition

This notebook, adapted from Deeplearning.ai's Deep Learning course, focuses on building a face recognition system. Many of the ideas presented here are from [FaceNet](https://arxiv.org/pdf/1503.03832.pdf). In the notebook, we will also encounter [DeepFace](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf).

# Face Recognition and Verification

Face recognition problems commonly fall into one of two categories:

**Face Verification:** "Is this the claimed person?" For example, at some airports, customs can be passed by letting a system scan a passport and then verifying that the person carrying the passport is the correct individual. A mobile phone that unlocks using a face also employs face verification. This is a 1:1 matching problem.

**Face Recognition:** "Who is this person?" For instance, the video lecture showcased a [face recognition video](https://www.youtube.com/watch?v=wr4rx0Spihs) of Baidu employees entering the office without needing to otherwise identify themselves. This is a 1:K matching problem.

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, it is possible to determine if two pictures are of the same person.

## Objectives:

* Differentiate between face recognition and face verification
* Implement one-shot learning to solve a face recognition problem
* Apply the triplet loss function to learn a network's parameters in the context of face recognition
* Explain how to pose face recognition as a binary classification problem
* Map face images into 128-dimensional encodings using a pretrained model
* Perform face verification and face recognition with these encodings

## Channels-last notation

A pre-trained model representing ConvNet activations using a "channels last" convention will be used.

In other words, a batch of images will be of shape $(m, n_H, n_W, n_C)$.
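
As a small illustration of this convention, the NumPy sketch below builds a dummy batch in channels-last layout and converts a channels-first batch to it. The array names and sizes are made up for the example and are not taken from the model code.

```python
import numpy as np

# Hypothetical batch: m = 4 RGB images of 160x160 pixels, channels last.
m, n_H, n_W, n_C = 4, 160, 160, 3
batch_last = np.zeros((m, n_H, n_W, n_C), dtype=np.float32)
print(batch_last.shape)   # (4, 160, 160, 3)

# The same data stored channels first would have shape (m, n_C, n_H, n_W).
batch_first = np.zeros((m, n_C, n_H, n_W), dtype=np.float32)

# Move the channel axis to the end to obtain the channels-last layout.
converted = np.transpose(batch_first, (0, 2, 3, 1))
print(converted.shape)    # (4, 160, 160, 3)
```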


In [None]:
# uncomment the following line to install the packages.
# !pip install numpy Pillow pandas tensorflow

In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D
from tensorflow.keras.layers import Concatenate
from tensorflow.keras.layers import Lambda, Flatten, Dense
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K
K.set_image_data_format('channels_last')
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
import PIL

%matplotlib inline
%load_ext autoreload
%autoreload 2

## Naive Face Verification

In Face Verification, two images are given, and it must be determined if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is below a chosen threshold, it may be the same person!

<img src="images/pixel_comparison.png" style="width:380px;height:150px;">
<caption><center> <u> <font> <b>Figure 1</b> </u></center></caption>

Of course, this algorithm performs poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, minor changes in head position, and so on.
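
To make the weakness concrete, here is a toy sketch with synthetic 8x8 grayscale arrays standing in for face images (all values are made up). A uniform brightness change on the same "face" produces a larger raw pixel distance than a completely different "face" under the original lighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grayscale "face" with pixel values in [0.2, 0.6].
face = rng.uniform(0.2, 0.6, size=(8, 8))

# The same face under brighter lighting: every pixel shifted by 0.3.
same_face_brighter = face + 0.3

# A different face photographed under the original lighting.
other_face = rng.uniform(0.2, 0.6, size=(8, 8))

# Raw pixel-space L2 distances.
d_same = np.linalg.norm(face - same_face_brighter)   # 0.3 * sqrt(64) = 2.4
d_other = np.linalg.norm(face - other_face)

# The same person ends up "farther away" than a different person.
print(d_same > d_other)
```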

Rather than using the raw image, an encoding, $f(\text{img})$, can be learned.

By using an encoding for each image, an element-wise comparison produces a more accurate judgment as to whether two pictures are of the same person.


## Encoding Face Images into a 128-Dimensional Vector

### Using a ConvNet to Compute Encodings

The FaceNet model requires a lot of data and significant time to train. Following common practice in applied deep learning, pre-trained weights will be loaded. The network architecture follows the Inception model from [Szegedy *et al*](https://arxiv.org/abs/1409.4842). The pre-trained model can be downloaded [here](https://www.kaggle.com/datasets/rmamun/kerasfaceneth5). An Inception network implementation is provided in the file `inception_blocks_v2.py` for a closer look.

*Hot tip:* Go to "File->Open..." at the top of this notebook to open the file directory containing the `.py` file.

Key points to note:

- This network uses 160x160 RGB images as its input. Specifically, a face image (or batch of $m$ face images) is a tensor of shape $(m, n_H, n_W, n_C) = (m, 160, 160, 3)$.
- The input images are originally of shape 96x96, thus scaling to 160x160 is required. This scaling is performed in the `img_to_encoding()` function.
- The output is a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector.

Run the cell below to create the model for face images.

In [2]:
from tensorflow.keras.models import model_from_json
# put keras-facenet-h5 in the same directory as this file
# load json and create model
json_file = open('keras-facenet-h5/model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
model = model_from_json(loaded_model_json)
model.load_weights('keras-facenet-h5/model.h5')

Now summarize the input and output shapes: 

In [3]:
print(model.inputs)
print(model.outputs)

[<tf.Tensor 'input_1:0' shape=(None, 160, 160, 3) dtype=float32>]
[<tf.Tensor 'Bottleneck_BatchNorm/batchnorm/add_1:0' shape=(None, 128) dtype=float32>]


By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. The encodings are then used to compare two face images as follows:

<img src="images/distance_kiank.png" style="width:680px;height:250px;">
<caption><center> <u> <font> <b>Figure 2:</b> </u> <font><br>By computing the distance between two encodings and thresholding, it can be determined if the two pictures represent the same person.</center></caption>

An encoding is considered effective if:

- The encodings of two images of the same person are quite similar.
- The encodings of two images of different persons are very different.

The triplet loss function formalizes this concept, aiming to "pull" the encodings of two images of the same person (Anchor and Positive) closer together, while "pushing" the encodings of two images of different persons (Anchor and Negative) further apart.

<img src="images/triplet_comparison.png" style="width:280px;height:150px;"><br>
<caption><center> <u> <font> <b>Figure 3: </b> </u> <font><br>In the next section, the pictures from left to right are referred to as Anchor (A), Positive (P), and Negative (N).</center></caption>



### The Triplet Loss

**Important Note**: Since a pretrained model is being used, implementing the triplet loss function is not necessary. However, the triplet loss is the main ingredient of the face recognition algorithm, and understanding how to use it is essential for training a custom FaceNet model or addressing other types of image similarity problems. Therefore, the triplet loss function will be implemented below for educational purposes.

For an image $ x $, its encoding is denoted as $ f(x) $, where $ f $ is the function computed by the neural network.

<img src="images/f_x.png" style="width:380px;height:150px;">

Training utilizes triplets of images $(A, P, N)$:

- $ A $ is an "Anchor" image—a picture of a person.
- $ P $ is a "Positive" image—a picture of the same person as the Anchor image.
- $ N $ is a "Negative" image—a picture of a different person than the Anchor image.

These triplets are selected from the training dataset. $ (A^{(i)}, P^{(i)}, N^{(i)}) $ denotes the $ i $-th training example.

The goal is to ensure that an image $ A^{(i)} $ of an individual is closer to the Positive $ P^{(i)} $ than to the Negative image $ N^{(i)} $ by at least a margin $ \alpha $:

$$
\| f(A^{(i)}) - f(P^{(i)}) \|_{2}^{2} + \alpha < \| f(A^{(i)}) - f(N^{(i)}) \|_{2}^{2}
$$

The objective is to minimize the following "triplet cost":

$$\mathcal{J} = \sum^{m}_{i=1} \left[ \underbrace{\| f(A^{(i)}) - f(P^{(i)}) \|_2^2}_\text{(1)} - \underbrace{\| f(A^{(i)}) - f(N^{(i)}) \|_2^2}_\text{(2)} + \alpha \right]_+ \tag{3}$$

Here, the notation "$[z]_+$" denotes $ \max(z, 0) $.
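
A worked example for a single triplet may help. The 2-D encodings below are made-up values chosen so the arithmetic is easy to follow (real encodings have 128 dimensions); only $\alpha = 0.2$ comes from this notebook.

```python
import numpy as np

alpha = 0.2  # margin used in this notebook

# Hypothetical 2-D encodings.
f_A = np.array([0.0, 0.0])
f_P = np.array([0.3, 0.4])      # squared distance to A: 0.25
f_N_near = np.array([0.6, 0.0]) # squared distance to A: 0.36
f_N_far = np.array([1.0, 0.0])  # squared distance to A: 1.00

pos_dist = np.sum((f_A - f_P) ** 2)

# Near negative: 0.25 + 0.2 is not below 0.36, so the margin is violated
# and the triplet contributes a positive loss of about 0.09.
loss_near = max(pos_dist - np.sum((f_A - f_N_near) ** 2) + alpha, 0.0)

# Far negative: 0.25 + 0.2 < 1.00, so the constraint holds and [z]_+ gives 0.
loss_far = max(pos_dist - np.sum((f_A - f_N_far) ** 2) + alpha, 0.0)

print(loss_near, loss_far)
```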

**Notes**:

- The term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; this should be small.
- The term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet; this should be relatively large. Because of the minus sign in front of it, minimizing the cost drives this distance to be larger.
- $ \alpha $ is called the margin, a hyperparameter manually selected. In this case, $ \alpha = 0.2 $ will be used.

Most implementations also rescale the encoding vectors to have an L2 norm equal to one (i.e., $ \| f(\text{img}) \|_2 = 1 $); this rescaling is not required.
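
The rescaling can be sketched as follows. The helper name `l2_normalize` and the random stand-in encodings are assumptions for illustration only. One useful consequence, checked at the end, is that for unit vectors $\|a - b\|_2^2 = 2 - 2\,a \cdot b$, so squared distances always lie in $[0, 4]$.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Rescale vectors along `axis` to unit L2 norm (eps guards against division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

rng = np.random.default_rng(1)
enc = rng.normal(size=(3, 128))   # hypothetical batch of 3 encodings
unit = l2_normalize(enc)

# Every row now has norm 1 (up to floating-point rounding).
row_norms = np.linalg.norm(unit, axis=-1)

# Identity for unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
a, b = unit[0], unit[1]
lhs = np.sum((a - b) ** 2)
rhs = 2.0 - 2.0 * np.dot(a, b)
print(np.allclose(row_norms, 1.0), np.isclose(lhs, rhs))
```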

### triplet_loss

The triplet loss is defined by formula (3). These are the 4 steps:

1. Compute the distance between the encodings of "anchor" and "positive": $ \| f(A^{(i)}) - f(P^{(i)}) \|_2^2 $.
2. Compute the distance between the encodings of "anchor" and "negative": $ \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 $.
3. Compute the formula per training example: $ \| f(A^{(i)}) - f(P^{(i)}) \|_2^2 - \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 + \alpha $.
4. Compute the full formula by taking the max with zero and summing over the training examples:

$$\mathcal{J} = \sum^{m}_{i=1} \left[ \| f(A^{(i)}) - f(P^{(i)}) \|_2^2 - \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 + \alpha \right]_+ \tag{3}$$

*Key Points*:

- Useful functions: `tf.reduce_sum()`, `tf.square()`, `tf.subtract()`, `tf.add()`, `tf.maximum()`.

- For steps 1 and 2, sum over the entries of $ \| f(A^{(i)}) - f(P^{(i)}) \|_2^2 $ and $ \| f(A^{(i)}) - f(N^{(i)}) \|_2^2 $.

- For step 4, sum over the training examples.

- Recall that the square of the L2 norm is the sum of the squared differences: $ ||x - y||_2^2 = \sum_{i=1}^{N}(x_i - y_i)^2 $.

- Note that the anchor, positive, and negative encodings are of shape $(m, 128)$, where $ m $ is the number of training examples and 128 is the number of elements used to encode a single example.

- For steps 1 and 2, maintain the number of $ m $ training examples and sum along the 128 values of each encoding. `tf.reduce_sum` has an axis parameter to choose along which axis the sums are applied.

- One way to choose the last axis in a tensor is to use negative indexing (axis=-1).

- In step 4, when summing over training examples, the result will be a single scalar value.

- For `tf.reduce_sum` to sum across all axes, keep the default value axis=None.
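
To make the axis handling concrete, the four steps can be sketched in NumPy. This is an illustrative reference only, not the TensorFlow function used in this notebook; the name `triplet_loss_np` is made up for the sketch.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    """NumPy sketch of formula (3) for encodings of shape (m, d)."""
    anchor, positive, negative = (
        np.asarray(t, dtype=float) for t in (anchor, positive, negative)
    )
    # Steps 1 and 2: squared L2 distances, summed over the encoding axis only.
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)   # shape (m,)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)   # shape (m,)
    # Step 3: per-example value before clipping.
    basic_loss = pos_dist - neg_dist + alpha
    # Step 4: clip at zero, then sum over the m training examples.
    return np.sum(np.maximum(basic_loss, 0.0))

A = [[1., 1.], [2., 0.]]
P = [[0., 3.], [1., 1.]]
N = [[1., 0.], [0., 1.]]
print(triplet_loss_np(A, P, N, alpha=1))   # 5.0
```

In this example the per-example values are $5 - 1 + 1 = 5$ and $2 - 5 + 1 = -2$, so only the first triplet contributes. Summing along `axis=0` instead would mix different training examples and give 4.0 on the same input.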

In [4]:
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)
    
    Arguments:
    y_true -- true labels; required by the Keras loss signature but not used in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    
    Returns:
    loss -- real number, value of the loss
    """
    
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,positive)),axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,negative)),axis=-1)
    # Step 3: Subtract the two previous distances and add alpha.
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    
    return loss

In [5]:
# BEGIN UNIT TEST
tf.random.set_seed(1)
y_true = (None, None, None) # It is not used
y_pred = (tf.keras.backend.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
          tf.keras.backend.random_normal([3, 128], mean=1, stddev=1, seed = 1),
          tf.keras.backend.random_normal([3, 128], mean=3, stddev=4, seed = 1))
loss = triplet_loss(y_true, y_pred)

assert isinstance(loss, tf.Tensor), "Use tensorflow functions"
print("loss = " + str(loss))

y_pred_perfect = ([[1., 1.]], [[1., 1.]], [[1., 1.,]])
loss = triplet_loss(y_true, y_pred_perfect, 5)
assert loss == 5, "Wrong value. Did you add the alpha to basic_loss?"
y_pred_perfect = ([[1., 1.]],[[1., 1.]], [[0., 0.,]])
loss = triplet_loss(y_true, y_pred_perfect, 3)
assert loss == 1., "Wrong value. Check that pos_dist = 0 and neg_dist = 2 in this example"
y_pred_perfect = ([[1., 1.]],[[0., 0.]], [[1., 1.,]])
loss = triplet_loss(y_true, y_pred_perfect, 0)
assert loss == 2., "Wrong value. Check that pos_dist = 2 and neg_dist = 0 in this example"
y_pred_perfect = ([[0., 0.]],[[0., 0.]], [[0., 0.,]])
loss = triplet_loss(y_true, y_pred_perfect, -2)
assert loss == 0, "Wrong value. Are you taking the maximum between basic_loss and 0?"
y_pred_perfect = ([[1., 0.], [1., 0.]],[[1., 0.], [1., 0.]], [[0., 1.], [0., 1.]])
loss = triplet_loss(y_true, y_pred_perfect, 3)
assert loss == 2., "Wrong value. Are you applying tf.reduce_sum to get the loss?"
y_pred_perfect = ([[1., 1.], [2., 0.]], [[0., 3.], [1., 1.]], [[1., 0.], [0., 1.,]])
loss = triplet_loss(y_true, y_pred_perfect, 1)
if (loss == 4.):
    raise Exception('Perhaps you are not using axis=-1 in reduce_sum?')
assert loss == 5, "Wrong value. Check your implementation"

# END UNIT TEST

loss = tf.Tensor(527.2598, shape=(), dtype=float32)


**Expected Output**:

<table>
    <tr>
        <td>
            <b>loss:</b>
        </td>
        <td>
           tf.Tensor(527.2598, shape=(), dtype=float32)
        </td>
    </tr>
    </table>

## Loading the Pre-trained Model

FaceNet is trained by minimizing the triplet loss. Because training requires a lot of data and computation, the model is not trained from scratch here. Instead, the following cell reuses the pre-trained model loaded earlier under the name `FRmodel`.

In [6]:
FRmodel = model

Here are some examples of distances between the encodings of three individuals:

<img src="images/distance_matrix.png" style="width:380px;height:200px;"><br>
<caption><center> <u> <font> <b>Figure 4:</b></u> <br>  <font> Example of distance outputs between three individuals' encodings</center></caption>
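
A distance matrix like the one in the figure can be computed with NumPy broadcasting. The sketch below uses three made-up 3-D unit vectors in place of real 128-dimensional encodings, and `pairwise_l2` is a hypothetical helper name.

```python
import numpy as np

def pairwise_l2(encodings):
    """Return the (k, k) matrix of L2 distances between the rows of `encodings`."""
    # Broadcasting: (k, 1, d) - (1, k, d) -> (k, k, d)
    diff = encodings[:, None, :] - encodings[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical unit-norm encodings for three people.
enc = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

D = pairwise_l2(enc)
print(D)   # zeros on the diagonal, sqrt(2) everywhere else
```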

## Applying the Model

A system is being developed for an office building where the building manager aims to implement facial recognition for employees to enter the premises.

The goal is to create a face verification system that grants access to a list of authorized individuals. Upon arrival, each person swipes an identification card at the entrance, and the system then verifies that the face matches the identity on the card.

### Face Verification

A database will be constructed containing one encoding vector for each person authorized to enter the office. To generate the encoding, the function `img_to_encoding(image_path, model)` is used, which runs the model's forward propagation on the specified image.

Execute the following code to build the database (represented as a Python dictionary). This database maps each person's name to a 128-dimensional encoding of their face.

In [7]:
#tf.keras.backend.set_image_data_format('channels_last')
def img_to_encoding(image_path, model):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(160, 160))
    img = np.around(np.array(img) / 255.0, decimals=12)
    x_train = np.expand_dims(img, axis=0)
    embedding = model.predict_on_batch(x_train)
    return embedding / np.linalg.norm(embedding, ord=2)
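
One detail of `img_to_encoding()` is worth checking. `embedding` has shape `(1, 128)`, and for a 2-D array `np.linalg.norm(..., ord=2)` computes the matrix 2-norm (the largest singular value). For a single row this coincides with the ordinary vector L2 norm, so the returned encoding has unit length. The sketch below uses a random array as a stand-in for the model output; `normalize_rows` is a hypothetical helper showing a per-row alternative that would also work for batches of more than one image.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding = rng.normal(size=(1, 128))   # stand-in for the model output

matrix_norm = np.linalg.norm(embedding, ord=2)   # largest singular value
vector_norm = np.linalg.norm(embedding[0])       # L2 norm of the single row
print(np.isclose(matrix_norm, vector_norm))      # True

def normalize_rows(batch):
    """Scale each row of a (m, d) array to unit L2 norm."""
    return batch / np.linalg.norm(batch, axis=1, keepdims=True)

batch = rng.normal(size=(4, 128))
batch_unit = normalize_rows(batch)
```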

In [8]:
database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)

Load the sample images

In [9]:
danielle = tf.keras.preprocessing.image.load_img("images/danielle.png", target_size=(160, 160))
kian = tf.keras.preprocessing.image.load_img("images/kian.jpg", target_size=(160, 160))

In [10]:
np.around(np.array(kian) / 255.0, decimals=12).shape

(160, 160, 3)

In [12]:
np.around(np.array(danielle) / 255.0, decimals=12).shape

(160, 160, 3)

### verify

The `verify()` function checks if the front-door camera picture (`image_path`) matches the identity of the person named "identity". It follows these steps to accomplish this:

- Compute the encoding of the image from `image_path`.
- Compute the distance between this encoding and the encoding of the identity image stored in the database.
- Open the door if the distance is less than 0.7; otherwise, do not open it.

As discussed, use the L2 distance `np.linalg.norm` for this comparison.

**Note**: Compare the L2 distance, not the square of the L2 distance, to the threshold of 0.7.
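
The difference matters in practice. In the toy sketch below (made-up 2-D vectors), the L2 distance is 0.8 and correctly fails the 0.7 threshold, while the squared distance of 0.64 would wrongly pass it. Comparing the squared distance against $0.7^2 = 0.49$ is the equivalent test.

```python
import numpy as np

threshold = 0.7

# Hypothetical encodings that are 0.8 apart.
a = np.array([0.8, 0.0])
b = np.array([0.0, 0.0])

dist = np.linalg.norm(a - b)      # about 0.8
sq_dist = np.sum((a - b) ** 2)    # about 0.64

print(dist < threshold)           # False: door stays closed (correct)
print(sq_dist < threshold)        # True: squared distance vs. 0.7 is the wrong test
print(sq_dist < threshold ** 2)   # False: squared distance vs. 0.49 matches the correct test
```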

In [14]:
def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".
    
    Arguments:
        image_path -- path to an image
        identity -- string, name of the person whose identity is to be verified. Has to be an employee who works in the office.
        database -- python dictionary mapping allowed people's names (strings) to their encodings (vectors).
        model -- your Inception model instance in Keras
    
    Returns:
        dist -- distance between the image_path and the image of "identity" in the database.
        door_open -- True, if the door should open. False otherwise.
    """
    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above.
    encoding = img_to_encoding(image_path,model)
    # Step 2: Compute distance with identity's image
    dist = np.linalg.norm(encoding - database[identity])
    # Step 3: Open the door if dist < 0.7, else don't open
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome in!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False     
    return dist, door_open

Let's run the `verify` function on some pictures:

In [None]:
# BEGIN UNIT TEST
distance, door_open_flag = verify("images/camera_0.jpg", "younes", database, FRmodel)
assert np.isclose(distance, 0.5992949), "Distance not as expected"
assert isinstance(door_open_flag, bool), "Door open flag should be a boolean"
print("(", distance, ",", door_open_flag, ")")
# END UNIT TEST

**Expected Output**:

<table>
    <tr>
        <td>
            <b>It's younes, welcome in!</b>
        </td>
        <td>
           (0.5992949, True)
        </td>
    </tr>
    </table>

In [16]:
verify("images/camera_2.jpg", "kian", database, FRmodel)

It's not kian, please go away


(1.0259346, False)

**Expected Output**:

<table>
    <tr>
        <td>
            <b>It's not kian, please go away</b>
        </td>
        <td>
           (1.0259346, False)
        </td>
    </tr>
    </table>

### Face Recognition

The current face verification system is functioning, but an issue arose when Kian had his ID card stolen. The next day, he was unable to access the building without his card.

To address this problem, the system will be upgraded to face recognition. This change eliminates the need for ID cards, allowing authorized individuals to simply walk up to the building and have the door unlock automatically.

The new face recognition system will identify if an image belongs to one of the authorized persons and determine their identity. Unlike the previous verification system, the input will not include a person's name.

### who_is_it

The `who_is_it()` function has the following steps:

- Compute the encoding for the image provided by `image_path`.
- Identify the encoding in the database that has the smallest distance to the target encoding:
    - Initialize the `min_dist` variable to a sufficiently large number (e.g., 100) to track the closest encoding.
    - Iterate over the database dictionary's names and encodings using `for (name, db_enc) in database.items()`.
    - Calculate the L2 distance between the target encoding and each encoding from the database. If this distance is smaller than `min_dist`, update `min_dist` and set `identity` to the current `name`.
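
The same search can also be written without an explicit loop. The sketch below is an alternative illustration, not the implementation used in this notebook: `closest_identity` and `toy_db` are hypothetical names, the 2-D encodings are made up, and, as a design choice that differs from `who_is_it()`, it returns `None` as the identity when the closest match is farther than the threshold.

```python
import numpy as np

def closest_identity(encoding, database, threshold=0.7):
    """Return (min_dist, name) for the nearest database entry, or (min_dist, None)."""
    names = list(database.keys())
    # Stack the stored encodings into one (K, d) array.
    db = np.vstack([database[n].reshape(1, -1) for n in names])
    # L2 distance from the query to every stored encoding at once.
    dists = np.linalg.norm(db - encoding.reshape(1, -1), axis=1)
    best = int(np.argmin(dists))
    identity = names[best] if dists[best] <= threshold else None
    return float(dists[best]), identity

# Hypothetical database with 2-D encodings of shape (1, d), as in this notebook.
toy_db = {
    "alice": np.array([[1.0, 0.0]]),
    "bob": np.array([[0.0, 1.0]]),
}

print(closest_identity(np.array([[0.9, 0.1]]), toy_db))    # close to "alice"
print(closest_identity(np.array([[-1.0, 0.0]]), toy_db))   # too far from everyone
```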

In [None]:
def who_is_it(image_path, database, model):
    """
    Implements face recognition for the office by finding who is the person on the image_path image.
    
    Arguments:
        image_path -- path to an image
        database -- database containing image encodings along with the name of the person on the image
        model -- your Inception model instance in Keras
    
    Returns:
        min_dist -- the minimum distance between image_path encoding and the encodings from the database
        identity -- string, the name prediction for the person on image_path
    """

    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. 
    encoding = img_to_encoding(image_path,model)
    
    ## Step 2: Find the closest encoding ##
    
    # Initialize "min_dist" to a large value, say 100, and "identity" to None
    # so that it is defined even if the database is empty.
    min_dist = 100
    identity = None
    
    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        
        # Compute L2 distance between the target "encoding" and the current db_enc from the database. 
        dist = np.linalg.norm(encoding - db_enc)

        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. 
        if dist < min_dist:
            min_dist = dist
            identity = name
    
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))
        
    return min_dist, identity

Younes is at the front door and the camera takes a picture of him ("images/camera_0.jpg"). Let's see if your `who_is_it()` algorithm identifies Younes.

In [20]:
# BEGIN UNIT TEST
# Test 1 with Younes pictures 
who_is_it("images/camera_0.jpg", database, FRmodel)

# Test 2 with Younes pictures 
test1 = who_is_it("images/camera_0.jpg", database, FRmodel)
assert np.isclose(test1[0], 0.5992949)
assert test1[1] == 'younes'

# Test 3 with Younes pictures 
test2 = who_is_it("images/younes.jpg", database, FRmodel)
assert np.isclose(test2[0], 0.0)
assert test2[1] == 'younes'
# END UNIT TEST

it's younes, the distance is 0.5992949
it's younes, the distance is 0.5992949
it's younes, the distance is 0.0


**Expected Output**:

<table>
    <tr>
        <td>
            <b>it's younes, the distance is</b> 0.5992949<br>
            <b>it's younes, the distance is</b> 0.5992949<br>
            <b>it's younes, the distance is</b> 0.0<br>
        </td>
    </tr>
    </table>

Change "camera_0.jpg" (picture of Younes) to "camera_1.jpg" (picture of Bertrand) and see the result.

Recap: 

- Posed face recognition as a binary classification problem
- Implemented one-shot learning for a face recognition problem
- Applied the triplet loss function to learn a network's parameters in the context of face recognition
- Mapped face images into 128-dimensional encodings using a pretrained model
- Performed face verification and face recognition with these encodings

**Enhancing the Model**:

Several strategies can be employed to further enhance the performance of the algorithm:

- **Expand the Database**: Include additional images of each person, taken under various lighting conditions and on different days. Comparing new images against a larger set of photos for each individual can significantly improve accuracy.

- **Crop the Images**: Focus the images on the face itself, minimizing the "border" region around the face. This preprocessing step reduces irrelevant pixels and enhances the algorithm's robustness by concentrating on the key features of the face.
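
The first suggestion could look roughly like the sketch below, which stores several encodings per person and scores a query by its smallest distance to any of them. The `gallery` structure, the helper name `min_distance_per_person`, and the 2-D vectors are assumptions made for illustration.

```python
import numpy as np

def min_distance_per_person(encoding, gallery):
    """gallery maps name -> array of shape (n_images, d); returns name -> smallest L2 distance."""
    return {
        name: float(np.min(np.linalg.norm(encs - encoding, axis=1)))
        for name, encs in gallery.items()
    }

# Hypothetical gallery: two stored encodings for "alice", one for "bob".
gallery = {
    "alice": np.array([[1.0, 0.0], [0.8, 0.6]]),
    "bob": np.array([[0.0, 1.0]]),
}

query = np.array([0.8, 0.6])
scores = min_distance_per_person(query, gallery)
best = min(scores, key=scores.get)
print(best, scores)
```

With only the first stored image of "alice", the distance to this query would be $\sqrt{0.4} \approx 0.63$; the second stored image brings it down to 0, which is the effect the larger database is meant to achieve.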


## References
1. Florian Schroff, Dmitry Kalenichenko, James Philbin (2015). [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/pdf/1503.03832.pdf)

2. Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf (2014). [DeepFace: Closing the gap to human-level performance in face verification](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf)

3. This implementation also took a lot of inspiration from the official FaceNet github repository: https://github.com/davidsandberg/facenet

4. Further inspiration was found here: https://machinelearningmastery.com/how-to-develop-a-face-recognition-system-using-facenet-in-keras-and-an-svm-classifier/

5. And here: https://github.com/nyoki-mtl/keras-facenet/blob/master/notebook/tf_to_keras.ipynb