# Computer Vision: Face Recognition

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext watermark
%watermark -v -m -p numpy,pandas,sklearn,cv2 -g

import os
import sys
import pickle
import argparse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import watermark
from tqdm import tqdm, tqdm_notebook

CPython 3.7.3
IPython 7.8.0

numpy 1.17.3
pandas 0.25.1
sklearn 0.21.3
cv2 4.1.1

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 19.0.0
machine    : x86_64
processor  : i386
CPU cores  : 16
interpreter: 64bit
Git hash   : bb97e071ca243a5b74f45496aa291524e912720d


## Face Recognition

Object recognition offers a range of libraries and algorithms. The main ones include:
- ImageAI (library)
- OpenCV (library)

- Single Shot Detectors
- YOLO (You only look once)
- Region-based Convolutional Neural Networks

The algorithms fall into two main categories:
- Two-step detectors: typically slower but more accurate.
- Single-shot detectors: faster, but less accurate than two-step detectors.


Main topics:
1. **Deep metric learning**
2. Facial embeddings
3. Application to still images as well as video streams

We will use Python, OpenCV and deep learning to implement face recognition. Deep learning-based facial embeddings are both highly accurate and fast enough to run in real time.


### How does it work?

The big difference is **deep metric learning**. If you have used deep learning before, you have seen that a neural network is typically trained to do the following:
1. Accepting a single input image.
2. Generating an output classification or label for that image.

In deep metric learning the network instead outputs a real-valued **feature vector**. For example, in the dlib facial recognition network the output feature vector is 128-d (a list of 128 real numbers) and it quantifies the face. The network is trained on **triplets** of images.

Each triplet contains two images of the same person and a third image of a different (random) person. The idea is that the weights of the model are adjusted so that the 128-d embeddings of the two images of the same person end up closer to each other than to the embedding of the other person.
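
The triplet idea can be sketched in a few lines of plain Python. This is only an illustration of one common hinge-style formulation, not dlib's actual training code: the function names, the `margin` value and the toy 3-d vectors are all assumptions chosen for readability (real embeddings are 128-d).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (assumed formulation): zero once the negative
    is at least `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 3-d "embeddings" (hypothetical values)
anchor = [0.0, 0.0, 0.0]
positive = [0.1, 0.0, 0.0]   # same person as the anchor
negative = [1.0, 0.0, 0.0]   # different person

# Same-person pair is already much closer: no penalty
well_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles simulates a badly ordered triplet: positive penalty
violating = triplet_loss(anchor, negative, positive)
```

A loss of zero means the triplet is already ordered correctly, so training would only adjust the weights for triplets like the second one.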

Our network architecture for face recognition is based on **ResNet-34** from the Deep Residual Learning for Image Recognition paper by [He, Zhang, Ren and Sun](https://arxiv.org/abs/1512.03385), but with fewer layers and the number of filters reduced by half.

The network itself was trained by Davis King on a dataset of ~3 million images. On the [Labeled Faces in the Wild (LFW)](http://vis-www.cs.umass.edu/lfw/) dataset the network is comparable to other state-of-the-art methods, reaching 99.38% accuracy.

Both Davis King (the creator of dlib) and Adam Geitgey (the author of the face_recognition module we’ll be using shortly) have written detailed articles on how deep learning-based facial recognition works:

- [High Quality Face Recognition with Deep Metric Learning](http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html) (Davis)
- [Modern Face Recognition with Deep Learning](https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78) (Adam)


In [2]:
# The extra packages that we require include: dlib, face_recognition and imutils
# !{sys.executable} -m pip install --quiet dlib
# !{sys.executable} -m pip install --quiet face_recognition
# !{sys.executable} -m pip install --quiet imutils

# Do not run:
# > !pip install dlib
# as it may not install in the current active kernel

# If using conda:
# > !conda install --yes --prefix {sys.prefix} dlib
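
Before running the rest of the notebook it can help to confirm that the packages are visible to the active kernel. The check below is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["dlib", "face_recognition", "imutils"]
missing = missing_packages(required)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are available.")
```

Because the check runs inside the kernel itself, it reports on the same environment that the `{sys.executable} -m pip install` commands above target.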

### Encoding faces with OpenCV and Deep Learning

- Create the 128-d embeddings for each face in the dataset. In our case we use *dlib*, which provides the “deep metric learning” implementation used to construct our face embeddings.
- Use these embeddings to recognize the faces of the characters in both images and video streams.
- The *face_recognition* library wraps dlib and makes it easier to use.

We use the pre-trained model to construct the 128-d embeddings for the faces in our dataset.

For face detection we can use a CNN model (slower but more accurate) or a HOG model (faster but less accurate).

In [3]:
# Import the CV libraries
import cv2
from imutils import paths
import face_recognition

In [4]:
DETECTION_METHOD = 'cnn'       # cnn or hog
IMAGES_FOLDER = "./data/faces/"
imagePaths = list(paths.list_images(IMAGES_FOLDER))
ENCODINGS_OUTPUT = './data/face_encodings.pickle'
OUTPUT_FILE = "./data/usb_output"
DISPLAY = 1

# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []

In [5]:
# We have 218 images to loop through, collecting their names and encodings
for (i, imagePath) in enumerate(imagePaths):
    # report progress every 25 images
    if i % 25 == 0:
        print(f"[INFO] processing image {str(i + 1).zfill(3)}/{len(imagePaths)}...")
    # extract the person name from the image path (the parent folder name)
    name = imagePath.split(os.path.sep)[-2]

    # load the input image and convert it from BGR (OpenCV ordering) to RGB (dlib ordering)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb, model=DETECTION_METHOD)

    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)

    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and
        # encodings
        knownEncodings.append(encoding)
        knownNames.append(name)
print("[INFO] processing finished!")

[INFO] processing image 001/218...
[INFO] processing image 026/218...
[INFO] processing image 051/218...
[INFO] processing image 076/218...
[INFO] processing image 101/218...
[INFO] processing image 126/218...
[INFO] processing image 151/218...
[INFO] processing image 176/218...
[INFO] processing image 201/218...
[INFO] processing finished!
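
The loop above takes the person's name from the folder that contains each image, so the dataset is assumed to be organised as one folder per person. The sketch below shows that mapping on a made-up path; the helper name `name_from_path` and the example path are hypothetical.

```python
import os

def name_from_path(image_path):
    """Return the name of the directory that directly contains the image."""
    return os.path.basename(os.path.dirname(image_path))

# Hypothetical layout: data/faces/<person_name>/<image file>
example = os.path.join("data", "faces", "alice", "0001.png")
person = name_from_path(example)  # "alice"
```

For a path like this one it gives the same result as `imagePath.split(os.path.sep)[-2]`.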


In [6]:
# Now we save the results to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
with open(ENCODINGS_OUTPUT, "wb") as encodings_file:
    encodings_file.write(pickle.dumps(data))

[INFO] serializing encodings...


Now all the face encodings, together with their names, are stored in our encodings file.
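
A quick round trip is an easy way to check that the file format works as expected. The sketch below uses a temporary directory and small stand-in lists instead of the real 128-d encodings; the variable names and values are illustrative only.

```python
import os
import pickle
import tempfile

# Stand-in data: short lists instead of real 128-d encodings (hypothetical values)
sample = {"encodings": [[0.1, 0.2], [0.3, 0.4]], "names": ["alice", "bob"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "face_encodings.pickle")
    # write the dictionary to disk
    with open(path, "wb") as f:
        pickle.dump(sample, f)
    # read it back
    with open(path, "rb") as f:
        restored = pickle.load(f)

# every encoding must have a matching name
assert len(restored["encodings"]) == len(restored["names"])
```

Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.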

### Recognizing faces on images


In [7]:
# Load the previously calculated faces and embeddings
print("[INFO] loading encodings...")
with open(ENCODINGS_OUTPUT, "rb") as encodings_file:
    data = pickle.loads(encodings_file.read())

# load the input image and convert it from BGR to RGB
test_image = './data/faces_test/example_01.png'
image = cv2.imread(test_image)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

[INFO] loading encodings...


In [8]:
# detect the (x, y)-coordinates of the bounding boxes corresponding to each
# face in the input image, then compute the facial embeddings for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb, model=DETECTION_METHOD)
encodings = face_recognition.face_encodings(rgb, boxes)

# initialize the list of names for each face detected
names = []

[INFO] recognizing faces...
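
Once the embeddings are computed, matching comes down to a distance check against every known encoding. The sketch below mirrors that idea in plain Python. The helper names are hypothetical and the vectors are toy 2-d values; the 0.6 default is the tolerance that `face_recognition.compare_faces` uses when none is given.

```python
import math

def distances_to(known_encodings, candidate):
    """Euclidean distance from `candidate` to each known encoding."""
    return [
        math.sqrt(sum((k - c) ** 2 for k, c in zip(known, candidate)))
        for known in known_encodings
    ]

def is_match(known_encodings, candidate, tolerance=0.6):
    """One boolean per known encoding: True when the distance is within tolerance."""
    return [d <= tolerance for d in distances_to(known_encodings, candidate)]

# Toy encodings at distances 0.0, 0.5 and 5.0 from the candidate
known = [[0.0, 0.0], [0.3, 0.4], [3.0, 4.0]]
matches = is_match(known, [0.0, 0.0])  # [True, True, False]
```

A lower tolerance makes matching stricter, at the cost of more faces being labelled as unknown.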


In [9]:
# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known encodings
    matches = face_recognition.compare_faces(data["encodings"], encoding)
    name = "Unknown"
    
    # compare_faces returns a list of True / False values, one for each known
    # encoding in our dataset (about 218 here, one per face found in the images).
    # Internally, it computes the Euclidean distance between the candidate embedding
    # and every known encoding and compares it with the tolerance (0.6 by default)

    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a dictionary
        #  to count the total number of times each face was matched
        matchedIdxs = [i for (i, b) in enumerate(matches) if b]
        counts = {}

        # loop over the matched indexes and maintain a count for each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            counts[name] = counts.get(name, 0) + 1

        # determine the recognized face with the largest number of votes (in the
        # unlikely event of a tie, Python selects the first entry in the dictionary)
        name = max(counts, key=counts.get)

    # update the list of names
    names.append(name)
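
The voting step can also be written more compactly with `collections.Counter`. This is an alternative sketch of the same logic, not the code used above; the helper name `vote` and the sample names are hypothetical.

```python
from collections import Counter

def vote(matches, known_names, default="Unknown"):
    """Return the most frequent name among matched entries, or `default`."""
    counts = Counter(name for matched, name in zip(matches, known_names) if matched)
    if not counts:
        return default
    # on a tie, most_common keeps the name that was encountered first
    return counts.most_common(1)[0][0]

known_names = ["alice", "alice", "bob", "bob", "bob"]
winner = vote([True, True, True, False, False], known_names)   # "alice" (2 votes vs 1)
nobody = vote([False] * 5, known_names)                        # "Unknown"
```

Voting over all matches makes the result less sensitive to a single stray match than taking the first `True` entry would be.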

In [None]:
# loop over the recognized faces, showing a progress bar
for ((top, right, bottom, left), name) in tqdm(zip(boxes, names), total=len(names)):
    # draw the predicted face name on the image
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)
    y = top - 15 if top - 15 > 15 else top + 15
    cv2.putText(image, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

# OpenCV stores images in BGR order, so convert to RGB before displaying with matplotlib
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()

### Recognizing faces on video

For video we should use the HOG detector (`DETECTION_METHOD = 'hog'`), or even OpenCV Haar cascades, to avoid performance issues. The rest of the script is similar to the one we have already seen.

In [13]:
import time
import imutils
from imutils.video import VideoStream

# Initialize the video stream and pointer to output video file, then allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()           # src=0 for our first camera
writer = None
time.sleep(2.0)

[INFO] starting video stream...


In [19]:
# loop over frames from the video file stream
while True:
    # grab the frame from the threaded video stream
    frame = vs.read()
    
    # convert the input frame from BGR to RGB then resize it to have
    # a width of 750px (to speed up processing)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    rgb = imutils.resize(rgb, width=750)
    r = frame.shape[1] / float(rgb.shape[1])

    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input frame, then compute
    # the facial embeddings for each face
    boxes = face_recognition.face_locations(rgb, model=DETECTION_METHOD)
    encodings = face_recognition.face_encodings(rgb, boxes)
    names = []

    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known encodings
        matches = face_recognition.compare_faces(data["encodings"], encoding)
        name = "Unknown"

        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a
            # dictionary to count the total number of times each face was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}

            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1

            # determine the recognized face with the largest number of votes (in the
            # unlikely event of a tie, Python selects the first entry in the dictionary)
            name = max(counts, key=counts.get)

        # update the list of names
        names.append(name)

    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # rescale the face coordinates
        top = int(top * r)
        right = int(right * r)
        bottom = int(bottom * r)
        left = int(left * r)

        # draw the predicted face name on the image
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)

    # if the video writer is None *AND* we are supposed to write
    # the output video to disk initialize the writer
    if writer is None and OUTPUT_FILE is not None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(OUTPUT_FILE, fourcc, 20, (frame.shape[1], frame.shape[0]), True)

    # if the writer is not None, write the frame with recognized faces to disk
    if writer is not None:
        writer.write(frame)

    # check to see if we are supposed to display the output frame to the screen
    if DISPLAY > 0:
        plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # as opencv loads in BGR format by default, we want to show it in RGB.
        plt.show()

# do a bit of cleanup (reached only once the loop above is interrupted)
cv2.destroyAllWindows()
vs.stop()
if writer is not None:
    writer.release()

KeyboardInterrupt: 
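
The rescaling step in the video loop is worth isolating: detection runs on the resized frame, so the boxes have to be scaled back up before drawing on the original frame. The sketch below shows the arithmetic on made-up numbers; the helper name `rescale_box` and the frame sizes are hypothetical.

```python
def rescale_box(box, ratio):
    """Scale a (top, right, bottom, left) box by `ratio`, truncating to ints."""
    top, right, bottom, left = box
    return (int(top * ratio), int(right * ratio), int(bottom * ratio), int(left * ratio))

# Hypothetical sizes: a 1500px-wide frame resized to 750px gives a ratio of 2.0
original_width, resized_width = 1500, 750
ratio = original_width / float(resized_width)

# A box found on the resized frame, mapped back to the original frame
scaled = rescale_box((50, 200, 150, 100), ratio)  # (100, 400, 300, 200)
```

If the camera frame is already narrower than the target width, the ratio drops below 1 and the boxes shrink accordingly, which is the same behaviour as the loop above.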