# TABLE OF CONTENT

0. [Introduction](#Introduction)<br>
1. [Setting up](#Setting-Up)<br>
2. [Training Stage](#Training-Stage)<br>
    2.1 [Creating encodings dataset](#Creating-encodings-from-the-dataset)<br>
    2.2 [Merging two datasets](#Merging-two-pickle-file-into-one)<br>
3. [Image FR](#Face-recognition-on-images)<br>
4. [Video FR](#Face-Recognition-on-a-Video)<br>
    4.1 [Frequency](#1.-Using-frequency)<br>
    4.2 [Area ratio](#2.-Using-area-ratio-relative-to-frame-size)<br>
    4.3 [Central](#3.-Using-Euclidean-distance-from-the-centre)<br>
5. [Quality Control](#Quality-Control)<br>
    5.1 [Check FPS](#To-check-video-FPS)<br>
    5.2 [Check Encodings Quality](#To-check-quality-of-the-dataset-encodings)<br>
6. [Face Clustering](#Face-Clustering)<br>
7. [Improvement Lists](#Improvement-Lists)

# Introduction 

This notebook provides the code and comments for the Face Recognition (FR) system for the Recommender System. The FR approach depends heavily on two libraries: dlib and face_recognition. The face_recognition library is built to ease the use of dlib in Python. Some of the code is also adapted from a blog specialising in computer vision. The creators of both libraries have written articles about them.<br>
<br>
References:<br>
1)	Dlib by Davis King: <br>http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html <br>
2)	Face Recognition by Adam Geitgey: <br>https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78<br>
3)  PyImageSearch by Adrian Rosebrock:<br> https://www.pyimagesearch.com/2018/06/18/face-recognition-with-opencv-python-and-deep-learning/<br>
<br>
Note that this face recognition system, although it uses deep neural networks, differs considerably from the usual image classification workflow built on CNNs such as Inception, VGG or ResNet. In fact, the system is a combination of CNNs, HoG (if you choose so), and kNN, each used at a different stage of face recognition.<br>
<br>
When an input image is fed into the system, a trained CNN (or HoG) detects the location of the faces and outputs a set of coordinates representing the boxes around them. Next, the coordinates are used to crop out the face images, which are fed into another CNN that outputs a 128-D feature vector (known as an encoding) for each face. The two CNNs are trained differently to serve different purposes. <br>
<br>
The second CNN is trained with a method known as deep metric learning using triplet loss, where 3 images (2 of the same person, and 1 of a completely different person) are fed as input. Each training step tweaks the neural network slightly so that the encodings it generates for faces of the same person move slightly closer together, while the encodings for faces of different persons move slightly further apart. Any ten different pictures of the same person should give roughly the same encodings. This is the most important feature that differentiates this CNN from a typical image classification CNN: it can produce encodings for faces it was never trained on.<br>
<br>
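
The triplet loss idea can be sketched in a few lines of plain Python. This is only an illustration of the loss on toy vectors, not the training code of dlib's network; the function names, the 3-D vectors and the margin of 0.2 are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two encodings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Loss for one triplet: zero once the different person (negative) is
    further from the anchor than the same person (positive) by at least
    `margin`. The margin value is an assumption for this sketch."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# toy 3-D "encodings" (real encodings are 128-D)
anchor = [0.0, 0.0, 0.0]
same_person = [0.1, 0.0, 0.0]
far_stranger = [1.0, 0.0, 0.0]
near_stranger = [0.2, 0.0, 0.0]

# already well separated: nothing left to learn from this triplet
print(triplet_loss(anchor, same_person, far_stranger))
# the stranger is too close: positive loss, so training would push it away
print(triplet_loss(anchor, same_person, near_stranger))
```

A triplet only contributes to training while its loss is positive, which is how the network is nudged "slightly" at each step.
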
FaceNet, the well-known face recognition system by Google, is also trained and implemented with this method.<br> FaceNet: https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf<br>
<br>
To illustrate the whole face recognition system, consider an imaginary situation. Say we want to identify the faces of person A and person B, bearing in mind that the CNN was never trained on them. First, we take a few pictures of person A, feed them into the face recognition system and get encodings as output. We label the encodings with the name of person A. Then, we do the same for person B. This is known as the 'training stage' for the kNN. After that, we have a new picture with a single face, and we don't know whether it belongs to person A or person B. The picture is fed into the system, which outputs a new encoding. We compare the distance between that encoding and the stored encodings. The stored encoding with the smallest distance to the new encoding is identified as a match, and the new face is labelled with the name attached to that stored encoding. This is known as the 'predicting stage'.<br>
<br>
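
The person A / person B scenario can be sketched without any face library by treating encodings as plain lists of numbers. The helper name `nearest_name`, the 2-D toy encodings and the tolerance of 0.5 are assumptions for illustration; the notebook's actual cells use `face_recognition.compare_faces` with voting instead of a single nearest neighbour.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two encodings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_name(known_encodings, known_names, candidate, tolerance=0.5):
    """Return the name of the closest stored encoding, or "Unknown" when
    nothing is stored or the closest one is further away than `tolerance`."""
    if not known_encodings:
        return "Unknown"
    distances = [euclidean(k, candidate) for k in known_encodings]
    best = min(range(len(distances)), key=distances.__getitem__)
    return known_names[best] if distances[best] <= tolerance else "Unknown"


# 'training stage': store labelled encodings (toy 2-D values)
known_encodings = [[0.0, 0.0], [1.0, 1.0]]
known_names = ["person A", "person B"]

# 'predicting stage': compare new encodings against the stored ones
print(nearest_name(known_encodings, known_names, [0.1, 0.1]))
print(nearest_name(known_encodings, known_names, [0.6, 0.6]))
```

The second query lies between both people and further than the tolerance from each, so it is reported as "Unknown" rather than forced onto the nearest name.
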
The advantages of this methodology are that retraining the CNN (which takes 24 hours to get a decent model) is not required for new faces, and that the number of pictures needed for the 'training stage' is significantly lower than is commonly required. There is definitely room for improvement, which is listed in the last section of the notebook.<br>
<br>

# Setting up

[Back to top](#TABLE-OF-CONTENT)

Installation guide:<br>

conda create --name `<env_name>` python=3.6<br>
conda install matplotlib<br>
conda install scikit-learn<br>
pip install opencv-python<br>
pip install dlib<br>
pip install face_recognition<br>
pip install imutils<br>

Current version:<br>

python == 3.6.6<br>
matplotlib == 2.2.3<br>
scikit-learn == 0.19.1 <br>
opencv-python == 3.4.2.17<br>
dlib == 19.15.0<br>
face_recognition == 1.2.2<br>
imutils == 0.4.6<br>

In [None]:
# import the necessary packages
from imutils import paths
from matplotlib import pyplot as plt
import face_recognition
import pickle
import cv2
import os
import time
import dlib

# Training Stage

## Creating encodings from the dataset

[Back to top](#TABLE-OF-CONTENT)

This is the 'training stage' of the face recognition system.<br>
<br>
1) Create a parent folder containing one subfolder per person, each named after that person.<br>
2) Put all pictures of a person's face in the subfolder with the corresponding name.<br>
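
As a small illustration of why this layout matters: the label of every image is simply the name of the folder it sits in. The helper `label_from_path` and the folder names below are hypothetical and only mirror what the encoding cell does when it takes the second-to-last path component.

```python
import os


def label_from_path(image_path):
    """Return the name of the folder that directly contains the image."""
    return os.path.basename(os.path.dirname(image_path))


# hypothetical layout: Dataset/<person name>/<image file>
example_paths = [
    os.path.join("Dataset", "person_A", "001.jpg"),
    os.path.join("Dataset", "person_A", "002.jpg"),
    os.path.join("Dataset", "person_B", "001.jpg"),
]

for path in example_paths:
    print(path, "->", label_from_path(path))
```

Any picture placed in the wrong subfolder is therefore silently labelled with the wrong name, so the folders are worth checking before encoding.
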

In [None]:
# path to input directory of faces + images (parent folder)
input_path = "C:/Users/Administrator/Desktop/Dataset/part_5"
# path to serialized db of facial encodings
output_path = "C:/Users/Administrator/Desktop/5_new_100.pickle"
# face detection model to use: either `hog` or `cnn`; use `cnn` with a GPU, `hog` on CPU (faster but less accurate)
detection_method = "cnn"

print("Detection method:" , detection_method+"." )

In [None]:
# grab the paths to the input images in our dataset
start = time.time()
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images(input_path))

# initialize the list of known encodings and known names
knownEncodings = []
knownNames = []

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    print("[INFO] processing image {}/{}".format(i + 1, len(imagePaths)))
    name = imagePath.split(os.path.sep)[-2]
    # load the input image and convert it from BGR (OpenCV ordering) to dlib ordering (RGB)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # detect the (x, y)-coordinates of the bounding boxes corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb, model=detection_method)
    # draw the boxes on a copy so the rectangles do not end up in the pixels that get encoded below
    preview = rgb.copy()
    for (top, right, bottom, left) in boxes:
        print("boxing")
        # draw the bounding box of the detected face
        cv2.rectangle(preview, (left, top), (right, bottom), (0, 0, 255), 2)
        # print boxes to check for unexpected detected objects
        print("Boxes:", top, right, bottom, left)

    # show the output image
    # plt.imshow(preview)
    # plt.show()
    #a = time.time()
    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)
    #b = time.time()-a
    #print("Image encoding time: " + str(round(b,2)) + " seconds")

    # loop over the encodings
    for encoding in encodings:
        # add each encoding + name to our set of known names and encodings
        knownEncodings.append(encoding)
        knownNames.append(name)

# dump the facial encodings + names to disk
print("[INFO] serializing encodings...")
data = {"encodings": knownEncodings, "names": knownNames}
f = open(output_path, "wb")
f.write(pickle.dumps(data))
f.close()

print("[INFO] done encodings!")
total_time = time.time()-start
print("Total time: " + str(round(total_time,2)) + " seconds")

## Merging two pickle file into one

[Back to top](#TABLE-OF-CONTENT)

In [None]:
# path to previous serialized db of facial encodings to produce new one 
# if there is no previous encodings, just put None
prev_output = None
# path to serialized db of facial encodings
new_output =  "/Users/Elwin/Desktop/FaceR/face-recognition-opencv/encodings/enc_my_2_1_jitter_100.pickle"

In [None]:
# This code merges two pickle files into one file
if prev_output is not None:
    #previous_output
    file_1 = open(prev_output, "rb")
    pkl_1 = pickle.load(file_1)
    print("pkl_1 names: ")
    print(pkl_1["names"][:40:10])  # sample every 10th name; safe even with fewer than 31 entries
    file_1.close()
    #current output
    file_2 = open(output_path, "rb")
    pkl_2 = pickle.load(file_2)
    print("pkl_2 names: ")
    print(pkl_2["names"][:40:10])  # sample every 10th name; safe even with fewer than 31 entries
    file_2.close()
    #new output
    pkl_1["encodings"].extend(pkl_2["encodings"])
    pkl_1["names"].extend(pkl_2["names"])
    print("pkl_1 names: ")
    print(pkl_1["names"])
    f = open(new_output, "wb")
    f.write(pickle.dumps(pkl_1))
    f.close()
else:
    print("No previous encodings!")

In [None]:
f = open(new_output, "rb")
pkl_file = pickle.load(f)
f.close()
for i in enumerate(pkl_file["names"]):
    print(i)

# Face recognition on images

[Back to top](#TABLE-OF-CONTENT)

In [None]:
# path to serialized db of facial encodings
input_encodings = "C:/Users/Administrator/Desktop/5_ori_1.pickle"
# path to input image
image_path = "C:/Users/Administrator/Desktop/test_2.jpg"
# face detection model to use: either `hog` or `cnn` (set again here in case it needs changing)
detection_method = "cnn"
# how much distance between faces to consider it a match, lower is more strict
# based on our investigation, Asian faces seem to need a stricter tolerance, around 0.3 to 0.5
tolerance = 0.5
# changing the pyplot figure size
plt.rcParams['figure.figsize'] = [10, 10]

In [None]:
start = time.time()
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(input_encodings, "rb").read())
# time
time_1 = time.time()
time_x = time_1 - start
print("Loading encodings: " + str(round(time_x,4)) + " seconds")

# load the input image and convert it from BGR to RGB
image = cv2.imread(image_path)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# detect the (x, y)-coordinates of the bounding boxes corresponding
# to each face in the input image, then compute the facial embeddings
# for each face
print("[INFO] recognizing faces...")
boxes = face_recognition.face_locations(rgb, model=detection_method)
print(boxes)
# time
time_2 = time.time()
time_x = time_2 - time_1
print("Locating faces: " + str(round(time_x,4)) + " seconds")

print("[INFO] encoding faces...")
encodings = face_recognition.face_encodings(rgb, boxes)
# time
time_3 = time.time()
time_x = time_3 - time_2
print("Producing encodings: " + str(round(time_x,4)) + " seconds")

print("[INFO] comparing faces...")
# initialize the list of names for each face detected
names = []
# loop over the facial embeddings
for encoding in encodings:
    # attempt to match each face in the input image to our known encodings
    matches = face_recognition.compare_faces(data["encodings"], encoding, tolerance )
    distance = face_recognition.face_distance(data["encodings"], encoding)
    name = "Unknown"
    # check to see if we have found a match
    if True in matches:
        # find the indexes of all matched faces then initialize a
        # dictionary to count the total number of times each face
        # was matched
        matchedIdxs = [i for (i, a) in enumerate(matches) if a]
        print(matchedIdxs)
        counts = {}

        # loop over the matched indexes and maintain a count for
        # each recognized face
        for i in matchedIdxs:
            name = data["names"][i]
            print("ID-name pair:", i, name, round(distance[i],2))
            counts[name] = counts.get(name, 0) + 1

        # determine the recognized face with the largest number of
        # votes (note: in the event of an unlikely tie Python will
        # select first entry in the dictionary)
        name = max(counts, key=counts.get)
        print("Final name:", name)
    # update the list of names
    names.append(name)
# time
time_4 = time.time()
time_x = time_4 -time_3
print("kNN: " + str(round(time_x,4)) + " seconds")

# loop over the recognized faces
for ((top, right, bottom, left), name) in zip(boxes, names):
    # draw the predicted face name on the image
    cv2.rectangle(rgb, (left, top), (right, bottom), (0, 0, 255), 2)
    y = top - 15 if top - 15 > 15 else top + 15
    cv2.putText(rgb, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 255), 2)
# time
time_5 = time.time()
time_x= time_5-time_4
print("Labeling images: " + str(round(time_x,4)) + " seconds")

# show the output image
plt.imshow(rgb)
plt.show()
# time
time_6 = time.time()
time_x= time_6-time_5
print("Display image: " + str(round(time_x,4)) + " seconds")

total_time = time.time()-start
print("Total time: " + str(round(total_time,4)) + " seconds")

# Face Recognition on a Video

[Back to top](#TABLE-OF-CONTENT)

## 1. Using frequency

[Back to top](#TABLE-OF-CONTENT)

In [None]:
# path to serialized db of facial encodings
input_encodings = "C:/Users/Administrator/Desktop/my_1&2_combine_jitter_1.pickle"
# path to input video
video_path = "C:/Users/Administrator/Desktop/meletop.mp4"
# path to output video, None if no output video required
output_path = "C:/Users/Administrator/Desktop/meletop_detected.mp4"
# whether or not to display the output frame on screen: 1 displays, 0 does not
display = 0
# face detection model to use: either `hog` or `cnn` (set again here in case it needs changing)
detection_method = "cnn"
# how much distance between faces to consider it a match, lower is more strict
tolerance = 0.5
# changing the pyplot figure size
plt.rcParams['figure.figsize'] = [10, 10]
# process one frame out of every `skip` frames
skip = 25
# path to the text output listing the frequency of each name
text_output = "C:/Users/Administrator/Desktop/result_jitter_1.txt"
# path to saved output frame as image directory, None if not required.
image_path = None
# the threshold value for a frequency percentage to be deemed significant
freq_tolerance = 0
# frame rate (frames per second) of the output video
frame_speed = 5

print("Defined variable," , detection_method+", skip="+str(skip)+", frequency tolerance="+str(freq_tolerance) + "." )

In [None]:
start = time.time()
# load the known faces and embeddings
print("[INFO] loading encodings...")
data = pickle.loads(open(input_encodings, "rb").read())
time_1 = time.time()
time_x = time_1 - start
print("Loading encodings: " + str(round(time_x,4)) + " seconds")

# initialize the pointer to the video file and the video writer
print("[INFO] processing video...")
stream = cv2.VideoCapture(video_path)
writer = None
frame_num = 0
freq = {}
# loop over frames from the video file stream
time_2 = time.time()
time_x = time_2 - time_1
print("Initialize video capture: " + str(round(time_x,4)) + " seconds")
while True:
    time_1=time.time()
    # grab the next frame
    (grabbed, frame) = stream.read()
    frame_num += 1
    # stop at the end of the stream, even when it falls on a frame that would be processed
    if not grabbed or frame is None:
        break
    # skip all but every `skip`-th frame
    if frame_num % skip != 0:
        continue
    print(frame_num)
    time_2 = time.time()
    time_x = time_2 - time_1
    print("Grabbing the right frame: " + str(round(time_x,4)) + " seconds")
    # convert the input frame from BGR to RGB
    # no resizing is done here, so the scale factor r is 1.0; if the frame is resized
    # (e.g. with imutils.resize to a width of 750px to speed up processing), r maps the boxes back
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    r = frame.shape[1] / float(rgb.shape[1])
    
    # detect the (x, y)-coordinates of the bounding boxes
    # corresponding to each face in the input frame, then compute
    # the facial embeddings for each face
    boxes = face_recognition.face_locations(rgb, model=detection_method)
    time_3 = time.time()
    time_x = time_3 - time_2
    print("Locating faces: " + str(round(time_x,4)) + " seconds")
    encodings = face_recognition.face_encodings(rgb, boxes)
    time_4 = time.time()
    time_x = time_4 - time_3
    print("Encoding faces: " + str(round(time_x,4)) + " seconds")
    names = []
    time_5 = time.time()
    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known
        # encodings
        matches = face_recognition.compare_faces(data["encodings"], encoding, tolerance)
        time_5 = time.time()
        time_x = time_5 - time_4
        print("kNN: " + str(round(time_x,4)) + " seconds")
        name = "Unknown"

        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}

            # loop over the matched indexes and maintain a count for
            # each recognized face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1

            # determine the recognized face with the largest number
            # of votes (note: in the event of an unlikely tie Python
            # will select first entry in the dictionary)
            name = max(counts, key=counts.get)
            
        # update the list of names
        names.append(name)
    time_6 = time.time()
    time_x = time_6 - time_5
    print("Getting name from kNN: " + str(round(time_x,4)) + " seconds")
    
    # introducing a frequency counter after the faces are identified
    
    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # rescale the face coordinates
        top = int(top * r)
        right = int(right * r)
        bottom = int(bottom * r)
        left = int(left * r)
        
        # draw the predicted face name on the image
        cv2.rectangle(rgb, (left, top), (right, bottom), (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(rgb, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
        
        # draw the predicted face name on the image to save using OpenCV
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, name, (left, y), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 255, 0), 2)
        
        freq[name] = freq.get(name, 0) + 1
        
    # The end of processing the frame images
    # Now back to output video

    # if the video writer is None *AND* we are supposed to write
    # the output video to disk initialize the writer
    if writer is None and output_path is not None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(output_path, fourcc, frame_speed ,(frame.shape[1], frame.shape[0]), True)

    # if the writer is not None, write the frame with recognized faces to disk
    if writer is not None:
        writer.write(frame)
        time_7 = time.time()
        time_x = time_7 - time_6
        print("Output video: " + str(round(time_x,4)) + " seconds")

    # check to see if we are supposed to display the output frame to the screen
    if display > 0:
        plt.imshow(rgb)
        plt.show()
        
    # check to see if we are supposed to save the output frame as images
    if image_path is not None:
        image_full_path = image_path+str(frame_num)+".png"
        #cv2.imwrite(image_full_path, frame)

# output the result into a text file
time_7 = time.time()
# frame_num ends one past the last frame read, so this is the number of processed frames
iter_num = (frame_num - 1) // skip
with open(text_output, 'a') as f:
    for key, val in freq.items():
        # recognitions of this name per processed frame, as a percentage
        freq_ratio = round(val/iter_num*100,2)
        if key == "Unknown":
            continue
        elif freq_ratio > freq_tolerance:
            f.write(str(key)+" "+str(freq_ratio)+"%" +'\n')

    f.write("Total number of frames: "+str(frame_num-1)+'\n')
    f.write("Total number of iterations: "+str(iter_num)+'\n')
time_8 = time.time()
time_x = time_8 - time_7
print("Output text file: " + str(round(time_x,4)) + " seconds")

# close the video file pointers
stream.release()

# check to see if the video writer point needs to be released
if writer is not None:
    writer.release()
    
total_time = time.time()-start
print("Total time: " + str(round(total_time,2)) + " seconds")

## 2. Using area ratio relative to frame size

[Back to top](#TABLE-OF-CONTENT)

The code for this approach is not as detailed as the frequency-based video FR above; some changes are needed before it is as functional as the code that uses frequency.
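
Since no cell is shown for this approach, the following is only a sketch of how an area ratio could be computed, under the assumption that each recognised face is weighted by the share of the frame its bounding box covers. The function name and the example numbers are hypothetical; the box order `(top, right, bottom, left)` matches what `face_recognition.face_locations` returns.

```python
def box_area_ratio(box, frame_height, frame_width):
    """Fraction of the frame area covered by one face bounding box.

    `box` is (top, right, bottom, left) in pixels.
    """
    top, right, bottom, left = box
    box_area = max(bottom - top, 0) * max(right - left, 0)
    return box_area / float(frame_height * frame_width)


# toy example: a 50x50 px face in a 100x100 px frame covers a quarter of it
area = {}
for name, box in [("person A", (10, 60, 60, 10)), ("person B", (0, 10, 10, 0))]:
    # accumulate area share per name instead of a plain count
    area[name] = area.get(name, 0.0) + box_area_ratio(box, 100, 100)

print(area)
```

Under this assumption a face that fills much of the frame counts for more than a small face in the background, which a plain frequency count cannot distinguish.
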

## 3. Using Euclidean distance from the centre

[Back to top](#TABLE-OF-CONTENT)
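
No cell is shown for this approach either, so the snippet below is a sketch under assumptions: it measures how far the centre of a face box lies from the centre of the frame and normalises by the half-diagonal, so 0 means dead centre and 1 means a frame corner. The function name and the normalisation are choices made for this example.

```python
import math


def centre_distance(box, frame_height, frame_width):
    """Normalised Euclidean distance between the box centre and the frame centre.

    `box` is (top, right, bottom, left) in pixels. The result is 0.0 for a
    centred face and 1.0 for a face centred on a corner of the frame.
    """
    top, right, bottom, left = box
    box_cx = (left + right) / 2.0
    box_cy = (top + bottom) / 2.0
    frame_cx = frame_width / 2.0
    frame_cy = frame_height / 2.0
    distance = math.sqrt((box_cx - frame_cx) ** 2 + (box_cy - frame_cy) ** 2)
    half_diagonal = math.sqrt(frame_cx ** 2 + frame_cy ** 2)
    return distance / half_diagonal


# a centred face versus one near the top-left corner of a 100x100 px frame
print(centre_distance((40, 60, 60, 40), 100, 100))
print(centre_distance((0, 20, 20, 0), 100, 100))
```

A score such as `1 - centre_distance(...)` could then weight each recognition so that faces near the middle of the frame count for more; that weighting is an assumption, not something the notebook specifies.
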

# Quality Control

## To check video FPS

[Back to top](#TABLE-OF-CONTENT)

In [None]:
start = time.time()
# path to input video
video_path = "/Users/Elwin/Desktop/FaceR/face-recognition-opencv/demo/demo_my_outlier/jep.mp4"

# initialize the pointer to the video file
stream = cv2.VideoCapture(video_path)
fps = stream.get(cv2.CAP_PROP_FPS)
length = stream.get(cv2.CAP_PROP_FRAME_COUNT)

# close the video file pointers
stream.release()
print("FPS: " + str(fps))
print("Number of frames: " + str(length))
    
total_time = time.time()-start
print("Total time: " + str(round(total_time,2)) + " seconds")

## To check quality of the dataset encodings

[Back to top](#TABLE-OF-CONTENT)

In [None]:
# path to serialized db of facial encodings
check = "C:/Users/Administrator/Desktop/5_ori_1.pickle"

# load the encodings to check
file_1 = open(check, "rb")
pkl_1 = pickle.load(file_1)
file_1.close()
test_count = 0
for i,j in enumerate(pkl_1["encodings"]):        
    encoding = j
    name = pkl_1["names"][i]

    matches = face_recognition.face_distance(pkl_1["encodings"], encoding)

    # flag pairs of encodings that belong to different names but are suspiciously close together
    for k,m in enumerate(matches):
        if m < 0.4:
            if i != k and pkl_1["names"][k] != name:
                test_count +=1
                print("Fault",test_count,":",i, pkl_1["names"][i])
                print(k, pkl_1["names"][k],round(m,2), "Very confident")
                print("\n")

In [None]:
name = None
number = 0
for i, t in enumerate(pkl_1["names"]):
    if t != name:
        number = 1
        name = t
    else: 
        number +=1
    if i == 69:
        print(i, t, number)

In [None]:
file_1 = open(output_path, "rb")
pkl_1 = pickle.load(file_1)
file_1.close()
print(len(pkl_1["encodings"]))



# Face Clustering

[Back to top](#TABLE-OF-CONTENT)

In [None]:
# path to input directory of faces + images
input_path = ""
# path to serialized db of facial encodings
output_path = "/Users/Elwin/Desktop/FaceR/face-recognition-opencv/encodings/enc_clus_2.pickle"
# face detection model to use: either `hog` or `cnn`
detection_method = "cnn"
# changing the pyplot figure size
plt.rcParams['figure.figsize'] = [10, 10]
# path to serialized db of facial encodings
input_encodings = output_path
# 1 to display the image, 0 does not 
display = 0
# of parallel jobs to run (-1 will use all CPUs)
job = -1

In [None]:
# grab the paths to the input images in our dataset, then initialize
# our data list (which we'll soon populate)
start = time.time()
print("[INFO] quantifying faces...")
imagePaths = list(paths.list_images(input_path))
data = []

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # load the input image and convert it from BGR (OpenCV ordering)
    # to dlib ordering (RGB)
    print("[INFO] processing image {}/{}".format(i + 1, len(imagePaths)))
    print(imagePath)
    image = cv2.imread(imagePath)
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # detect the (x, y)-coordinates of the bounding boxes corresponding to each face in the input image
    boxes = face_recognition.face_locations(rgb, model=detection_method)

    # show the output image; the boxes are drawn on a copy so the rectangles
    # do not end up in the pixels that get encoded below
    if display > 0:
        preview = rgb.copy()
        for (top, right, bottom, left) in boxes:
            cv2.rectangle(preview, (left, top), (right, bottom), (0, 0, 255), 2)
        plt.imshow(preview)
        plt.show()
    
    # compute the facial embedding for the face
    encodings = face_recognition.face_encodings(rgb, boxes)

    # build a dictionary of the image path, bounding box location, and facial encodings for the current image
    d = [{"imagePath": imagePath, "loc": box, "encoding": enc} for (box, enc) in zip(boxes, encodings)]
    data.extend(d)

# dump the facial encodings data to disk
print("[INFO] serializing encodings...")
f = open(output_path, "wb")
f.write(pickle.dumps(data))
f.close()

print("[INFO] done encodings!")
total_time = time.time()-start
print("Total time: " + str(round(total_time,2)) + " seconds")

In [None]:
from sklearn.cluster import DBSCAN
from imutils import build_montages
import numpy as np

# load the serialized face encodings + bounding box locations from disk, to cluster them
print("[INFO] loading encodings...")
data = pickle.loads(open(input_encodings, "rb").read())
data = np.array(data)
encodings = [d["encoding"] for d in data]

# cluster the embeddings
print("[INFO] clustering...")
clt = DBSCAN(metric="euclidean", n_jobs=job)
clt.fit(encodings)

# determine the total number of unique faces found in the dataset
labelIDs = np.unique(clt.labels_)
numUniqueFaces = len(np.where(labelIDs > -1)[0])
print("[INFO] # unique faces: {}".format(numUniqueFaces))

# loop over the unique face integers
for labelID in labelIDs:
    # find all indexes into the `data` array that belong to the
    # current label ID, then randomly sample a maximum of 25 indexes
    # from the set
    print("[INFO] faces for face ID: {}".format(labelID))
    idxs = np.where(clt.labels_ == labelID)[0]
    idxs = np.random.choice(idxs, size=min(25, len(idxs)), replace=False)

    # initialize the list of faces to include in the montage
    faces = []

    # loop over the sampled indexes
    for i in idxs:
        # load the input image and extract the face ROI
        image = cv2.imread(data[i]["imagePath"])
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        (top, right, bottom, left) = data[i]["loc"]
        face = rgb[top:bottom, left:right]

        # force resize the face ROI to 96x96 and then add it to the
        # faces montage list
        face = cv2.resize(face, (96, 96))
        faces.append(face)

    # create a montage using 96x96 "tiles" with 5 rows and 5 columns
    montage = build_montages(faces, (96, 96), (5, 5))[0]

    # show the output montage
    title = "Face ID #{}".format(labelID)
    title = "Unknown Faces" if labelID == -1 else title
    
    plt.rcParams['figure.figsize'] = [20, 20]
    # show the output image with its cluster title
    plt.imshow(montage)
    plt.title(title)
    plt.show()

# Improvement Lists

[Back to top](#TABLE-OF-CONTENT)

1) k-NN can be slow when the number of encodings is large, because as a lazy learner it compares against every stored encoding at prediction time. Thus, the next step is probably to use approximate search to speed up the process (or even to use another classification algorithm altogether).<br><br>
2)	The tolerance has been tweaked to see its effect on the result of the classifier. A lower (i.e. stricter) tolerance produces more accurate results, but too small a value leaves the classifier unable to identify any faces. <br><br>
3)	After a few experiments, the current code has been kept as it is, but a possible improvement is to return the name “Unknown” when the number of votes does not reach a threshold value, to reduce the number of false positives (false positives increase when the tolerance is relaxed to reduce false negatives). This method can be implemented quickly, but it has not been properly evaluated to see whether it actually improves accuracy.<br> <br>
4)	The quality of the encodings is quantified by comparing the distance of each encoding in a dataset to the other encodings of the same dataset. Pairs of encodings from two different people with a distance lower than 0.5 occur more often in the Asian dataset than in the Caucasian dataset. This suggests that the encodings in the Asian dataset are spread less widely than those in the Caucasian dataset, leading to the assumption that the encoding-producing CNN is the reason the face recognition system does not work well with Asian faces.<br><br>
5)	An Asian dataset provided by DeepGlint has been found. It comes as 2 datasets: one is a cleaned, smaller version of the MS-Celeb-1M dataset (3.9M images, 87k classes), the other is the Asian Celeb dataset (2.8M images, 94k classes). The Asian Celeb dataset has been used to train the encoding CNN, but the distance variability is actually worse than with the original model, even after a second retraining with the dataset resized from 400x400 to 150x150 to match the original Caucasian dataset on which the CNN was initially trained.<br> <br>
6)	A search of existing face recognition algorithms shows that FaceNet (developed by Google Research) has the highest accuracy, 99.63%, on the Labeled Faces in the Wild (LFW) evaluation. However, it should be noted that the TensorFlow (by David Sandberg) and OpenFace implementations of FaceNet vary in accuracy because of differences in their training sets (the quality of the implementations might also affect the result in ways that are not yet clearly understood). Other notable algorithms are DeepFace by Facebook, VGG Face by the VGG Oxford Group, and Fusion (also developed by Facebook).<br>
<br>
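
The vote threshold mentioned in item 3 can be sketched as follows. The function name and the default of 3 votes are assumptions for illustration; as noted above, the effect on accuracy has not been evaluated.

```python
def vote_with_threshold(matched_names, min_votes=3):
    """Majority vote over the names of all matched encodings, returning
    "Unknown" when the winning name has fewer than `min_votes` votes."""
    counts = {}
    for name in matched_names:
        counts[name] = counts.get(name, 0) + 1
    if not counts:
        return "Unknown"
    best = max(counts, key=counts.get)
    return best if counts[best] >= min_votes else "Unknown"


# four stored encodings of person A matched, one of person B: confident enough
print(vote_with_threshold(["person A"] * 4 + ["person B"]))
# a single stray match is not trusted
print(vote_with_threshold(["person B"]))
```

In the recognition cells this would replace the plain `max(counts, key=counts.get)` step, with `matched_names` built from `data["names"][i]` for each matched index.
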
