# Face Recognition Prototype

The purpose of this notebook is to experiment with face recognition. In particular, the method used is the one described in the FaceNet paper by Florian Schroff, Dmitry Kalenichenko, and James Philbin (https://arxiv.org/abs/1503.03832).

I build on work by David Sandberg, who has shared an implementation of facenet (https://github.com/davidsandberg/facenet). To run this notebook, clone that repository and adjust the system path to the location of facenet/src.

To perform the database search, I use the FAISS library from Facebook (https://github.com/facebookresearch/faiss). This repository should also be cloned. It is a C++ library with a Python wrapper, and both must be built before it can be used.

This project also depends on Tensorflow. Installation instructions are here: https://www.tensorflow.org/install/

In [1]:
import os
import sys
import cv2
import time
import numpy as np
import pandas as pd

import tensorflow as tf
print(tf.__version__)

# Check out facenet into the same parent directory as this repository.
sys.path.append("../facenet/src")
import facenet
import align.detect_face

1.8.0


# Import Faiss

1. Clone https://github.com/facebookresearch/faiss
2. Run "brew install llvm".
3. Copy one of the makefiles in faiss/example_makefiles (I chose makefile.inc.Mac.brew) and rename it to makefile.inc.
4. In IndexScalarQuantizer.cpp add the following:
```c++
  #ifndef __clang__
  #include <malloc.h>
  #endif
```

5. Then "make" to build the C++ library.
6. Then "make py" to build the Python wrapper.

In [2]:
sys.path.append("../faiss")
import faiss 

# Load Embedding database
The Labeled Faces in the Wild data set is currently used for experimentation. It comprises about 13,000 photos of roughly 5,700 different identities. Generating the embeddings for it is very time consuming (approx. 16 hours on a MacBook Pro), so this has been done in advance and the results saved in a CSV file for easy reloading.

To generate embeddings for a different data set, run the "Build Face Database" notebook.

The code below reads the embeddings from the CSV file and adds them to the Faiss index. A search of the Faiss index returns the position of the matching embedding. To look up the name, use the same position in the face_identities array.

In [3]:
df = pd.read_csv("faces_cv2.csv")

face_identities = []
face_index = faiss.IndexFlatL2(128)

for _, row in df.iterrows():
    identity = row['id']
    embedding = row.iloc[1:129].values.astype('float32')
    embedding = np.ascontiguousarray(embedding.reshape(1, 128))
    face_index.add(embedding)
    face_identities.append(identity)
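
To make the position-to-name lookup concrete, here is a minimal NumPy sketch of what a flat L2 search does: it compares a query with every stored vector and returns the position of the closest one, which is then used to index the identities list. The toy vectors, the names, and the `nearest_identity` helper are hypothetical. This is a stand-in for illustration, not the Faiss implementation.

```python
import numpy as np

# Toy "database": three 4-dimensional embeddings (the real ones have 128 values).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
], dtype=np.float32)
toy_identities = ["Alice", "Bob", "Carol"]  # row i belongs to toy_identities[i]

def nearest_identity(query, embeddings, identities):
    """Brute-force nearest neighbour using squared Euclidean distance."""
    diffs = embeddings - query             # broadcast the query over every row
    sq_dists = np.sum(diffs * diffs, axis=1)
    best = int(np.argmin(sq_dists))        # position of the closest embedding
    return identities[best], float(sq_dists[best])

query = np.array([0.1, 0.9, 0.0, 0.0], dtype=np.float32)
name, sq_dist = nearest_identity(query, toy_embeddings, toy_identities)
print(name, sq_dist)
```

The two lists must stay in the same order: whenever an embedding is added to the index, its name has to be appended to the identities list at the same time.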

# Load Face Detection and Embedding Neural Networks.

1. Face Detection:
Face detection is done by feeding an image through an MTCNN (Multi-task Cascaded Convolutional Networks). This process is described in the paper by Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao (https://arxiv.org/abs/1604.02878).

This implementation comes from David Sandberg's facenet repository.


2. Face Embedding:
Once a face has been detected, it is fed through another neural network to generate the face embedding. An embedding is a vector of 128 floats that can be compared with embeddings from other faces. Embeddings that are similar (i.e. close together in Euclidean space) are taken to represent the same person.

This model is an Inception ResNet v1 architecture. This Neural Network has been trained on the MS-Celeb-1M data set using triplet loss.
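
As a small illustration of comparing embeddings, the sketch below computes the squared Euclidean distance between two vectors. It assumes the embeddings are L2-normalized (an assumption here, worth checking for the model in use), in which case the squared distance equals 2 - 2 * cosine similarity. The random vectors and the helper names are made up for the example.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    d = a - b
    return float(np.dot(d, d))

rng = np.random.RandomState(0)
a = l2_normalize(rng.randn(128))   # stand-in for one face embedding
b = l2_normalize(rng.randn(128))   # stand-in for another face embedding

cosine = float(np.dot(a, b))
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b), so the value lies in [0, 4].
print(squared_l2(a, b), 2.0 - 2.0 * cosine)
```

Under that assumption, a small distance means the two vectors point in nearly the same direction, which is the sense in which two embeddings are "similar".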

Before running this cell, download the model at https://drive.google.com/open?id=1uFRZtwdJbu1ND_y1Abj6okCtdjhLig6M and unzip it in a models subdirectory. MODEL_FILE in the cell below must point to the extracted .pb file.

Implementation Detail:
Both neural networks are loaded into the same TensorFlow graph and run in a single session, which greatly improves performance.

In [4]:
MODEL_FILE = "./20170512-110547/20170512-110547.pb"

facenet_graph = tf.Graph()
with facenet_graph.as_default():
    facenet_graph_def = tf.GraphDef()
    with tf.gfile.GFile(MODEL_FILE, 'rb') as fid:
        serialized_graph = fid.read()
        facenet_graph_def.ParseFromString(serialized_graph)            
        tf.import_graph_def(facenet_graph_def, name='enet')
        
        sess = tf.Session()
        with sess.as_default():
            enet = lambda img : sess.run(('enet/embeddings:0'), feed_dict={'enet/input:0':img, 'enet/phase_train:0':False})
            pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)       
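
For intuition about what the detector does with its minimum face size and scale factor, the sketch below reproduces the scale-pyramid arithmetic as I understand it from the facenet `detect_face` code: the image is shrunk repeatedly until its shorter side, mapped to the 12-pixel input of the first-stage network, drops below 12. The helper name `pyramid_scales` and the 480x640 frame size are hypothetical; 20 and 0.709 are the values used in the Helper Functions cell.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709):
    """Return the list of scales at which the image would be scanned."""
    net_input = 12.0                   # side length of the first-stage network's input
    m = net_input / min_face_size      # scale at which the smallest face fills that input
    min_side = min(height, width) * m
    scales = []
    while min_side >= net_input:
        scales.append(m * factor ** len(scales))
        min_side *= factor
    return scales

scales = pyramid_scales(480, 640)
print(len(scales), [round(s, 3) for s in scales[:3]])
```

Raising `min_face_size` shortens the pyramid, which reduces detection work at the cost of missing small faces.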

# Helper Functions

In [5]:
# Face Detection constants.
MIN_FACE_SIZE = 20                     # minimum size of the face for the MTCNN
DETECT_THRESHOLDS = [ 0.6, 0.7, 0.7 ]  # threshold values for the three stages of the MTCNN
SCALE_FACTOR = 0.709                   # MTCNN scale factor

# Face Embedding constants.
INPUT_IMAGE_SIZE = 160

# This function normalizes the image before generating the embedding.
def run_facenet(image):
    image_data = np.around(image/255.0, decimals=12)
    image_data = np.expand_dims(image_data, axis=0)
    return enet(image_data)

def register_face(image, name):
    # Remove alpha.
    image = image[:,:,0:3]
    
    # get image dimensions for later calculations.
    height, width = image.shape[0:2]
    
    # Find bounding boxes for all faces.  Ignore facial landmarks for now.
    bb, landmarks = align.detect_face.detect_face(image, MIN_FACE_SIZE, pnet, rnet, onet, DETECT_THRESHOLDS, SCALE_FACTOR)

    faces = bb.shape[0]
    boxes = np.zeros((faces, 4), dtype=np.int32)
    
    if (faces == 1):
        boxes[0, 0] = np.maximum(bb[0, 0], 0)
        boxes[0, 1] = np.maximum(bb[0, 1], 0)
        boxes[0, 2] = np.minimum(bb[0, 2], width)
        boxes[0, 3] = np.minimum(bb[0, 3], height)

        # Crop and scale for input into embedding neural network (160x160)
        cropped = image[boxes[0, 1]:boxes[0, 3],boxes[0, 0]:boxes[0, 2],:]
        scaled = cv2.resize(cropped, (INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE), interpolation=cv2.INTER_LINEAR) 
        
        embedding = run_facenet(scaled)
        
        face_index.add(embedding)
        face_identities.append(name)
        return "Registered.", scaled 
    else:
        return "Expected exactly one face, but detected: " + str(faces), None

def identify_faces_in_image(image):   
    # Remove alpha.
    image = image[:,:,0:3]
    
    # get image dimensions for later calculations.
    height, width = image.shape[0:2]
    
    # Find bounding boxes for all faces.
    # Note: detect_face expects an RGB image, so callers convert from BGR beforehand.
    bb, landmarks = align.detect_face.detect_face(image, MIN_FACE_SIZE, pnet, rnet, onet, DETECT_THRESHOLDS, SCALE_FACTOR)

    faces = bb.shape[0]
    
    # Allocate the bounding boxes and embeddings for each face.
    boxes = np.zeros((faces, 4), dtype=np.int32)
    embeddings = np.zeros((faces, 128), dtype=np.float32)
    
    for faceIx in range(0, faces):    
        # Make sure box is in size of image.
        boxes[faceIx, 0] = np.maximum(bb[faceIx, 0], 0)
        boxes[faceIx, 1] = np.maximum(bb[faceIx, 1], 0)
        boxes[faceIx, 2] = np.minimum(bb[faceIx, 2], width)
        boxes[faceIx, 3] = np.minimum(bb[faceIx, 3], height)

        # Crop and scale for input into embedding neural network (160x160)
        cropped = image[boxes[faceIx, 1]:boxes[faceIx, 3],boxes[faceIx, 0]:boxes[faceIx, 2],:]
        scaled = cv2.resize(cropped, (INPUT_IMAGE_SIZE, INPUT_IMAGE_SIZE), interpolation=cv2.INTER_LINEAR) 
        
        # Generate the embedding.
        embeddings[faceIx,:] = run_facenet(scaled)

    # Search through face_index to find the indices into the identities array, and the distances.
    distances, indicies = face_index.search(embeddings, 1)
    
    return boxes, landmarks, indicies, distances, embeddings
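
The clamp-and-crop steps appear in both `register_face` and `identify_faces_in_image`. One way to share them is sketched below. `clamp_box` and `crop_face` are hypothetical helper names that are not part of this notebook, the box values are made up, and resizing is left to the caller.

```python
import numpy as np

def clamp_box(box, width, height):
    """Clip an (xmin, ymin, xmax, ymax, ...) box to the image and return ints."""
    xmin = int(max(box[0], 0))
    ymin = int(max(box[1], 0))
    xmax = int(min(box[2], width))
    ymax = int(min(box[3], height))
    return xmin, ymin, xmax, ymax

def crop_face(image, box):
    """Crop the clamped box out of an H x W x C image."""
    height, width = image.shape[0:2]
    xmin, ymin, xmax, ymax = clamp_box(box, width, height)
    return image[ymin:ymax, xmin:xmax, :]

# Toy example: a 100x200 RGB image and a box that spills past the left and right edges.
image = np.zeros((100, 200, 3), dtype=np.uint8)
detected = np.array([-5.2, 10.7, 250.0, 90.3, 0.99])  # last value stands in for a score
face = crop_face(image, detected)
print(face.shape)
```

The crop can then be passed to `cv2.resize` exactly as in the functions above.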

# Register User

In [11]:
name = "Jeff Watts"

vc = cv2.VideoCapture(0)
register_count = 10

while register_count>0:
    time.sleep(.5)
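    is_capturing, frame = vc.read()
    if not is_capturing:
        # No frame was returned; stop rather than pass None to cvtColor.
        print("Failed to read from camera.")
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)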
    message, registered_face = register_face(image, name)
    print(message)
    if (registered_face is not None):
        registered_face = cv2.cvtColor(registered_face, cv2.COLOR_RGB2BGR)
        cv2.imwrite("register" + str(register_count) + ".png", registered_face)
    register_count -= 1
    
vc.release()
print("Done")

Registered.
Registered.
Registered.
Registered.
Registered.
Registered.
Registered.
Registered.
Registered.
Registered.
Done


# Identify Faces

In [13]:
vc = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    is_capturing, frame = vc.read()
    if not is_capturing:
        # No frame was returned; stop rather than pass None to cvtColor.
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    boxes, landmarks, indicies, distances, embeddings = identify_faces_in_image(image)

    faces = boxes.shape[0]

    for faceIx in range(0, faces):
        identity = face_identities[indicies[faceIx,0]]
        distance = distances[faceIx, 0]
        # IndexFlatL2 reports squared L2 distances; 0.6 is the cut-off used here.
        if (distance > 0.6):
            identity = "Unknown"

        xmin = boxes[faceIx,0]
        ymin = boxes[faceIx,1]
        xmax = boxes[faceIx,2]
        ymax = boxes[faceIx,3]

        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (255, 165, 20), 4)
        
        text = identity + ": " + str(distance) 
        cv2.putText(frame, text, (xmin, ymin-15),cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 165, 20), 2)
        
        faceIx_landmarks = landmarks[:,faceIx]
        
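        # The first half of the column holds x coordinates, the second half y coordinates.
        num_landmarks = faceIx_landmarks.shape[0] // 2
        for landmarkIx in range(0, num_landmarks):
            # cv2.circle needs integer pixel coordinates.
            x = int(faceIx_landmarks[landmarkIx])
            y = int(faceIx_landmarks[landmarkIx + num_landmarks])
            cv2.circle(frame, (x,y), 2, (255, 165, 20), thickness=2) 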
        
    # Display the resulting frame
    cv2.imshow('Video', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
vc.release()
cv2.destroyAllWindows()
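
The landmark loop relies on the layout of the array returned by the detector: as the loop above uses it, each column holds one face, with the five x coordinates in rows 0 to 4 and the five y coordinates in rows 5 to 9. The sketch below converts one column into integer pixel pairs. `landmark_points` is a hypothetical helper and the coordinates are made up.

```python
import numpy as np

def landmark_points(landmarks, face_ix):
    """Return [(x, y), ...] integer points for one face."""
    column = landmarks[:, face_ix]
    half = column.shape[0] // 2        # x values come first, then y values
    return [(int(round(column[i])), int(round(column[i + half])))
            for i in range(half)]

# Toy landmarks for a single face: shape (10, 1).
toy = np.array([[30.2], [60.8], [45.0], [33.4], [58.1],    # x: eyes, nose, mouth corners
                [40.0], [41.3], [55.9], [70.2], [69.6]])   # y for the same five points
points = landmark_points(toy, 0)
print(points)
```

Each pair can be passed directly to a drawing call such as `cv2.circle(frame, point, 2, color)`.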
