Many of the ideas presented here are from [FaceNet](https://arxiv.org/pdf/1503.03832.pdf).

Face recognition problems commonly fall into one of two categories:

**Face Verification** "Is this the claimed person?" For example, at some airports, a traveler can pass through customs by letting a system scan their passport and then verify that the person carrying the passport is its rightful holder. A mobile phone that unlocks using the owner's face is also using face verification. This is a 1:1 matching problem.

**Face Recognition** "Who is this person?" For example, an employee can enter the office without an ID card, using only a live image of their face. This is a 1:K matching problem.

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, we can then determine if two pictures are of the same person.



# 1. Packages

In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D
from tensorflow.keras.layers import Concatenate
from tensorflow.keras.layers import Lambda, Flatten, Dense
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K
K.set_image_data_format('channels_last')
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
import PIL

%matplotlib inline
%load_ext autoreload
%autoreload 2

# 2. Naive Face Verification

In Face Verification, we're given two images and we have to determine if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is below a chosen threshold, it may be the same person!

This algorithm performs poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, minor changes in head position, and so on.
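
The effect of lighting can be illustrated with a small NumPy sketch. The arrays below are random stand-ins for 160x160 RGB images (an assumption made for illustration, not real face data), and `pixel_distance` is a hypothetical helper name. It suggests that a uniform brightness change alone can move an image further away, pixel-wise, than an unrelated image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a 160x160 RGB face image, values in [0, 1].
face = rng.uniform(0.2, 0.6, size=(160, 160, 3))

# The "same" face under brighter lighting: every pixel shifted by 0.3.
same_face_brighter = np.clip(face + 0.3, 0.0, 1.0)

# An unrelated image drawn from the same value range.
other_face = rng.uniform(0.2, 0.6, size=(160, 160, 3))

def pixel_distance(a, b):
    """Euclidean (L2) distance between two images, compared pixel by pixel."""
    return float(np.linalg.norm(a - b))

dist_same = pixel_distance(face, same_face_brighter)
dist_other = pixel_distance(face, other_face)

print(f"same face, brighter lighting: {dist_same:.1f}")
print(f"unrelated image:              {dist_other:.1f}")
```

In this toy setup the brightened copy ends up further from the original than the unrelated image, so no single threshold on raw pixel distance would separate the two cases.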

To overcome this obstacle, instead of comparing the images pixel by pixel, we can learn an encoding $f(img)$ of the raw image.

By using an encoding for each image, an element-wise comparison produces a more accurate judgement as to whether two pictures are of the same person.


# 3. Encoding Face Images into a 128-Dimensional Vector

# 3.1 Using a ConvNet to Compute Encodings

The FaceNet model takes a lot of data and a long time to train. Training it from scratch is impractical given the amount of data and computing power it requires, so a pre-trained model is used here to compute the encodings.

In [2]:
import zipfile
with zipfile.ZipFile("/content/facenet_keras.h5.zip","r") as zip_ref:
    zip_ref.extractall("/content")

In [3]:
from tensorflow.keras.models import model_from_json

# Rebuild the architecture from its JSON description, then load the pre-trained weights.
with open('/content/model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
model = model_from_json(loaded_model_json)
model.load_weights('/content/facenet_keras.h5')

In [4]:
print(model.inputs)
print(model.outputs)

[<KerasTensor: shape=(None, 160, 160, 3) dtype=float32 (created by layer 'input_1')>]
[<KerasTensor: shape=(None, 128) dtype=float32 (created by layer 'Bottleneck_BatchNorm')>]


In [5]:
model.summary()

Model: "inception_resnet_v1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 160, 160, 3  0           []                               
                                )]                                                                
                                                                                                  
 Conv2d_1a_3x3 (Conv2D)         (None, 79, 79, 32)   864         ['input_1[0][0]']                
                                                                                                  
 Conv2d_1a_3x3_BatchNorm (Batch  (None, 79, 79, 32)  96          ['Conv2d_1a_3x3[0][0]']          
 Normalization)                                                                                   
                                                                                

# 3.2 Triplet loss

By computing the distance between two encodings and thresholding it, we can determine whether the two pictures represent the same person.

So, an encoding is a good one if:

1. The encodings of two images of the same person are quite similar to each other.

2. The encodings of two images of different persons are very different.

The triplet loss function formalizes this, and tries to "push" the encodings of two images of the same person (Anchor and Positive) closer together, while "pulling" the encodings of two images of different persons (Anchor, Negative) further apart.

For an image $x$, its encoding is denoted as $f(x)$ , where $f$ is the function computed by the neural network.


![](https://raw.githubusercontent.com/amanchadha/coursera-deep-learning-specialization/3a623a00267716d1695e0ce57480f9027648ad4e/C4%20-%20Convolutional%20Neural%20Networks/Week%204/Face%20Recognition/images/f_x.png)


**Training will use triplets of images $(A,P,N)$:**

1. A is an "Anchor" image: a picture of a person.

2. P is a "Positive" image: a picture of the same person as the Anchor image.

3. N is a "Negative" image: a picture of a different person than the Anchor image.


These triplets are picked from the training dataset. $(A^{(i)},P^{(i)},N^{(i)})$ is used here to denote the $i$-th training example.

We would like to make sure that an image $A^{(i)}$ of an individual is closer to the Positive $P^{(i)}$ than to the Negative image $N^{(i)}$ by at least a margin $\alpha$:

\begin{align}
\| f(A^{(i)}) - f(P^{(i)}) \|_{2}^{2} + \alpha < \| f(A^{(i)}) - f(N^{(i)}) \|_{2}^{2}
\end{align}


We would like to minimize the triplet loss function $\mathcal{J}$:

\begin{align}
\mathcal{J} = \sum^{m}_{i=1} \left[ \underbrace{\| f(A^{(i)}) - f(P^{(i)}) \|_2^2}_\text{(1)} - \underbrace{\| f(A^{(i)}) - f(N^{(i)}) \|_2^2}_\text{(2)} + \alpha \right]_+ \tag{3}
\end{align}


where $[z]_{+}$ is used to denote $\max(z,0)$.
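
To make formula (3) concrete, here is a minimal NumPy sketch evaluated on two made-up triplets with 2-dimensional encodings. The function name `triplet_loss_np` and the toy values are assumptions for illustration only. The first triplet already satisfies the margin and contributes nothing; the second violates it and contributes a positive amount.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, alpha=0.2):
    """NumPy version of formula (3) for encodings of shape (m, d)."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)  # term (1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)  # term (2)
    # [z]_+ = max(z, 0), summed over the m triplets
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# Two toy triplets (values invented for illustration).
anchor   = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.1, 0.0], [0.5, 0.0]])
negative = np.array([[1.0, 0.0], [0.6, 0.0]])

# Triplet 1: 0.01 - 1.00 + 0.2 < 0, so it contributes 0.
# Triplet 2: 0.25 - 0.36 + 0.2 = 0.09, so it contributes about 0.09.
loss = triplet_loss_np(anchor, positive, negative)
print(loss)
```

The total is therefore about 0.09, coming entirely from the second triplet, whose negative is not yet far enough from the anchor.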







In [6]:
def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)
    
    Arguments:
    y_true -- true labels, required when you define a loss in Keras
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)
    
    Returns:
    loss -- real number, value of the loss
    """
    
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,positive)),axis=-1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,negative)),axis=-1)
    basic_loss = tf.maximum(tf.add(tf.subtract(pos_dist,neg_dist),alpha),0)
    loss = tf.reduce_sum(basic_loss)
    return loss

# 4. Loading the Pre-trained Model

In [7]:
FRmodel = model

# 5. Apply the model

**5.1 Face Verification**:

This is a 1:1 matching problem. We build a database containing one encoding vector for each person who is allowed to enter the office, or any other premises where we would like to have a verification system.

On swiping an ID card, a person is allowed to enter the office or premises only if their image matches the encoding stored in the database under that identity.



In [8]:
def img_to_encoding(image_path, model):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(160, 160))
    img = np.around(np.array(img) / 255.0, decimals=12)
    x_train = np.expand_dims(img, axis=0)
    embedding = model.predict_on_batch(x_train)
    return embedding / np.linalg.norm(embedding, ord=2)
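
Because `img_to_encoding` rescales each embedding to unit length, the distance between any two encodings lies between 0 and 2, and the squared distance equals 2 minus twice their cosine similarity. The sketch below checks this with random stand-in vectors (assumed for illustration, not real model outputs); `l2_normalize` is a hypothetical helper that mirrors the last line of `img_to_encoding`.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_normalize(v):
    """Scale a (1, d) embedding to unit Euclidean length."""
    return v / np.linalg.norm(v, ord=2)

# Random stand-ins for two raw (1, 128) model outputs.
a = l2_normalize(rng.normal(size=(1, 128)))
b = l2_normalize(rng.normal(size=(1, 128)))

dist = float(np.linalg.norm(a - b))   # distance as used for thresholding
cosine = float(np.sum(a * b))         # cosine similarity of unit vectors

print(np.linalg.norm(a), np.linalg.norm(b))  # both approximately 1
print(dist ** 2, 2 - 2 * cosine)             # the two values agree
```

This is one way to read the thresholds used later in the notebook: they are cut-offs on a distance that can never exceed 2.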

In [15]:
# Database of image encodings for some employees
database = {}
database["chris"] = img_to_encoding("/content/Chris-Hemsworth.jpg", FRmodel)
database["andrew"] = img_to_encoding("/content/andrew.jpg", FRmodel)
database["jeffry"] = img_to_encoding("/content/jeffrey.jpg", FRmodel)
database["johnny"] = img_to_encoding("/content/johnny.jpg", FRmodel)
database["kay"] = img_to_encoding("/content/keiv.jpg", FRmodel)
database["kian"] = img_to_encoding("/content/kian.jpg", FRmodel)
database["nicolas"] = img_to_encoding("/content/nicolas.jpg", FRmodel)
database["robert"] = img_to_encoding("/content/robert.jpg", FRmodel)
database["sarwar"] = img_to_encoding("/content/sar.jpg", FRmodel)
database["tom"] = img_to_encoding("/content/tom1.jpeg", FRmodel)

# Verify

Upon entering the office premises, an employee can swipe their ID card or have their picture taken; if it matches an encoding stored in the database, they are allowed to enter.

In [18]:
def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".
    
    Arguments:
        image_path -- path to an image
        identity -- string, name of the person whose identity we'd like to verify. Has to be an employee who works in the office.
        database -- python dictionary mapping allowed people's names (strings) to their encodings (vectors).
        model --  Inception model instance in Keras
    
    Returns:
        dist -- distance between the encoding of the image at image_path and the encoding of "identity" in the database.
        door_open -- True, if the door should open. False otherwise.
    """
    encoding = img_to_encoding(image_path,model) # encoding of a new image
    dist = np.linalg.norm(encoding - database[identity])
    if dist < 0.75:
        print("It's " + str(identity) + ", welcome in!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False
    return dist, door_open

In [19]:
verify("/content/sar.jpg", "sarwar", database, FRmodel)
verify("/content/tom.jpeg", "tom", database, FRmodel)

It's sarwar, welcome in!
It's tom, welcome in!


(0.71993434, True)

# 5.2 Face Recognition
This is a 1:K matching problem: given an image, the system searches through the database, and if any encoding matches within a threshold, the person is let in.


In [20]:
def who_is_it(image_path, database, model):
    """
    Implements face recognition for the office by finding who is the person on the image_path image.
    
    Arguments:
        image_path -- path to an image
        database -- database containing image encodings along with the name of the person on the image
        model -- Inception model instance in Keras
    
    Returns:
        min_dist -- the minimum distance between image_path encoding and the encodings from the database
        identity -- string, the name prediction for the person on image_path
    """
    encoding = img_to_encoding(image_path,model)
    min_dist = 100
    identity = None  # stays None if the database is empty
    
    for (name, db_enc) in database.items():
        
        dist = np.linalg.norm(encoding - db_enc)

        if dist < min_dist:
            min_dist = dist
            identity = name
    
    
    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))
        
    return min_dist, identity
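
The same 1:K search can also be written without an explicit loop. The following is a sketch under stated assumptions, not the notebook's implementation: `nearest_identity` is a hypothetical name, the toy database uses 3-dimensional unit vectors instead of real 128-dimensional encodings, and, unlike `who_is_it`, it returns `None` as the name when no entry is within the threshold.

```python
import numpy as np

def nearest_identity(encoding, database, threshold=0.7):
    """Return (min_dist, name); name is None if no entry is within threshold."""
    names = list(database.keys())
    # Stack the (1, d) encodings into a single (K, d) matrix.
    db_matrix = np.concatenate([database[n] for n in names], axis=0)
    # One distance per database entry, computed via broadcasting.
    dists = np.linalg.norm(db_matrix - encoding, axis=1)
    best = int(np.argmin(dists))
    min_dist = float(dists[best])
    name = names[best] if min_dist <= threshold else None
    return min_dist, name

# Toy database of unit-length encodings (values invented for illustration).
toy_db = {
    "alice": np.array([[1.0, 0.0, 0.0]]),
    "bob":   np.array([[0.0, 1.0, 0.0]]),
}

query = np.array([[0.96, 0.28, 0.0]])    # close to "alice"
stranger = np.array([[0.0, 0.0, 1.0]])   # far from every entry

print(nearest_identity(query, toy_db))
print(nearest_identity(stranger, toy_db))
```

The query is matched to "alice" at a distance of about 0.28, while the stranger is about 1.41 away from both entries and is rejected.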

In [21]:
who_is_it("/content/tom1.jpeg", database, FRmodel)

it's tom, the distance is 0.0


(0.0, 'tom')

In [22]:
who_is_it("/content/sr1.jpg",database,FRmodel)

it's robert, the distance is 0.6777545


(0.6777545, 'robert')