# Real-time Face Recognition

Let's put everything we have learned so far together and build real-time face recognition: continuously identifying the faces in a video stream. This example uses an offline video as input, but the same code can be applied to a webcam, a CCTV camera's live feed, and so on.

In [15]:
import cv2
import matplotlib.pyplot as plt
import keras_vggface as kv
import modules.utils as utils
import tensorflow as tf
import os
import pandas as pd
import numpy as np
import nmslib

In [16]:
# Declare a FacePreprocess instance.
from modules.FacePreprocess import FacePreprocess
ssd_model = r'./models/ssd/deploy.prototxt.txt'
ssd_weights = r'./models/ssd/res10_300x300_ssd_iter_140000.caffemodel'
processor = FacePreprocess(ssd_model, ssd_weights)

In [17]:
# Use the facial embedding model you want to use
model = kv.VGGFace(
    model='resnet50', 
    include_top=False, 
    input_shape=(224, 224, 3), 
    pooling='avg'
)
input_size = (224, 224)

In [18]:
# load our nmslib (HNSW) indexes
nmslib_path = './output/large_scale_face_recognition/'

# load id_list
id_list = pd.read_csv(nmslib_path + 'IDlist.csv')

# Euclidean distance
index_l2 = nmslib.init(method='hnsw', space='l2', data_type=nmslib.DataType.DENSE_VECTOR)
index_l2.loadIndex(nmslib_path + 'resnet50_index_l2.bin')

# Cosine similarity
index_cos = nmslib.init(method='hnsw', space='cosinesimil', data_type=nmslib.DataType.DENSE_VECTOR)
index_cos.loadIndex(nmslib_path + 'resnet50_index_cos.bin')

For easier evaluation, we will start with a video that contains only one of the subjects (Joy).

In [19]:
# original clip from https://youtu.be/Ia3x_X_OX58?si=aA5GdMpRGcCar2xF 
video_path = './dataset/test/test_2/joy.mp4'

## Capture frame-by-frame

We will use OpenCV's `VideoCapture`, which returns the video one frame at a time, and try to recognize the person in each frame. To measure accuracy, we will keep a tally of the predictions. We will also write an output video annotated with the prediction results.
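The frame-by-frame pattern described above can be sketched as a small generator. This is only an illustration: `iter_frames` and `FakeCapture` are hypothetical names that do not appear in the notebook, and `FakeCapture` merely imitates the two `cv2.VideoCapture` methods the loop relies on (`isOpened()` and `read()`), so the sketch runs without a video file.

```python
from typing import Any, Iterator


def iter_frames(cap: Any) -> Iterator[Any]:
    """Yield frames from a capture-like object until the stream ends.

    `cap` is assumed to behave like cv2.VideoCapture: `isOpened()` returns
    a bool and `read()` returns a `(ret, frame)` pair.
    """
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # no frame returned: end of file or a dropped feed
            break
        yield frame


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, used only for this demo."""

    def __init__(self, frames):
        self._frames = list(frames)

    def isOpened(self):
        return True

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(iter_frames(FakeCapture(["frame0", "frame1", "frame2"])))
print(len(frames))
```

With a real input you would pass `cv2.VideoCapture(video_path)` instead of `FakeCapture(...)`. Checking `ret` makes the end of the stream explicit instead of relying on an exception to leave the loop.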

In [20]:
# one row per distance metric, every counter starting at 0
results = pd.DataFrame(
    0,
    columns = ['count', 'irene', 'seulgi', 'wendy', 'joy', 'yeri'], 
    index = ['l2', 'cosinesimil']
)

`count` tracks the number of predictions made, and each subject's column counts how many times that subject was predicted as the person in the video.

In [21]:
# read: https://docs.opencv.org/3.4/dd/d9e/classcv_1_1VideoWriter.html#afec93f94dc6c0b3e28f4dd153bc5a7f0 
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
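For context, a FOURCC is just four ASCII characters packed into one 32-bit integer, lowest byte first. The sketch below reproduces that packing in plain Python; `pack_fourcc` is a hypothetical helper written for illustration, not an OpenCV function.

```python
def pack_fourcc(c1: str, c2: str, c3: str, c4: str) -> int:
    """Pack four characters into a little-endian 32-bit FOURCC code."""
    return ord(c1) | (ord(c2) << 8) | (ord(c3) << 16) | (ord(c4) << 24)


code = pack_fourcc('m', 'p', '4', 'v')
print(hex(code))

# Unpack again to confirm the round trip back to the four characters.
chars = ''.join(chr((code >> (8 * i)) & 0xFF) for i in range(4))
print(chars)
```

The codec identified by the code still has to be available in your OpenCV build for the writer to produce a playable file.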

#### Euclidean dist. `l2`

In [22]:
dist = 'l2'

# load video
cap = cv2.VideoCapture(video_path)

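# set up the video writer; frame rate and frame size (w, h) must match the input
vid = './output/real_time_face_recognition/resnet50_l2.mp4'
fps = cap.get(cv2.CAP_PROP_FPS)
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter() 
out.open(vid, fourcc, fps, frame_size, True)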

# read video frame-by-frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of the video (or the frame could not be read)
        break

    faces = processor.preproc(frame.copy())

    for face in faces:
        results.loc[dist, 'count'] += 1

        # target embedding
        target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
        target = np.array(target, dtype='f')
        target = np.expand_dims(target, axis=0)

        # predict: nearest neighbor in the index
        neighbors, distances = index_l2.knnQueryBatch(target, k=1, num_threads=4)[0]

        # results
        name = id_list['name'][neighbors[0]]
        results.loc[dist, name] += 1

        # annotate this face on the output frame
        top, bottom, left, right = (int(v) for v in face[1][:4])
        cv2.putText(frame, str(name), (left, top-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        cv2.rectangle(frame, (left, top), (right, bottom), (255, 255, 255), 1)
    out.write(frame)

    # Uncomment the two lines below to show the frames in a pop-up window. This only works when run as a .py script;
    # in a Jupyter notebook the window crashes after the loop ends and you have to restart the kernel.
    # cv2.imshow("frame", frame)
    # cv2.waitKey(1)

cap.release()
out.release()
cv2.destroyAllWindows()



#### Cosine simil. `cosinesimil`

In [23]:
dist = 'cosinesimil'

# load video
cap = cv2.VideoCapture(video_path)

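# set up the video writer; frame rate and frame size (w, h) must match the input
vid = './output/real_time_face_recognition/resnet50_cosinesimil.mp4'
fps = cap.get(cv2.CAP_PROP_FPS)
frame_size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
out = cv2.VideoWriter() 
out.open(vid, fourcc, fps, frame_size, True)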

# read video frame-by-frame
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of the video (or the frame could not be read)
        break

    faces = processor.preproc(frame.copy())

    for face in faces:
        results.loc[dist, 'count'] += 1

        # target embedding
        target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
        target = np.array(target, dtype='f')
        target = np.expand_dims(target, axis=0)

        # predict: nearest neighbor in the index
        neighbors, distances = index_cos.knnQueryBatch(target, k=1, num_threads=4)[0]

        # results
        name = id_list['name'][neighbors[0]]
        results.loc[dist, name] += 1

        # annotate this face on the output frame
        top, bottom, left, right = (int(v) for v in face[1][:4])
        cv2.putText(frame, str(name), (left, top-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        cv2.rectangle(frame, (left, top), (right, bottom), (255, 255, 255), 1)
    out.write(frame)

    # Uncomment the two lines below to show the frames in a pop-up window. This only works when run as a .py script;
    # in a Jupyter notebook the window crashes after the loop ends and you have to restart the kernel.
    # cv2.imshow("frame", frame)
    # cv2.waitKey(1)

cap.release()
out.release()
cv2.destroyAllWindows()
results

| | count | irene | seulgi | wendy | joy | yeri |
|---|---|---|---|---|---|---|
| l2 | 542 | 0 | 0 | 0 | 542 | 0 |
| cosinesimil | 542 | 0 | 0 | 0 | 542 | 0 |


The test video contains only Joy, and all 542 detected faces were identified as Joy, so the predictions are correct. L2 and cosine distance give identical results on this clip.
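One reason the two metrics can agree is that they are closely related: for unit-length vectors, the squared L2 distance equals twice the cosine distance, so both rank neighbors identically. The sketch below checks this on random vectors. The vectors are stand-ins rather than real face embeddings, the dimensionality of 2048 is an assumption, and the notebook itself does not normalize its embeddings, so the two indexes are not guaranteed to agree in general.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 2048))  # random stand-ins for two embeddings


def cosine_distance(u, v):
    """Cosine distance, i.e. 1 - cosine similarity."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Normalize both vectors to unit length.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

sq_l2 = np.sum((a_unit - b_unit) ** 2)
cos_dist = cosine_distance(a, b)

# For unit vectors: ||a - b||^2 = 2 * (1 - cos(a, b)).
identity_holds = bool(np.isclose(sq_l2, 2.0 * cos_dist))
print(identity_holds)

# Cosine distance ignores vector length, while L2 distance does not.
print(np.isclose(cosine_distance(3.0 * a, b), cos_dist))
print(np.isclose(np.linalg.norm(3.0 * a - b), np.linalg.norm(a - b)))
```

Without normalization, the length of an embedding affects the L2 distance but not the cosine distance, which is one way the two metrics can return different nearest neighbors.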

## Full test

Let's benchmark the other videos in the test set too.

In [24]:
results_all = pd.DataFrame(
    columns = ['ID', 'Model', 'Dist.', 'True', 'False', 'Avg. Distance', 'Std. Distance'], 
)

In [25]:
for file in os.listdir('./dataset/test/test_2/'):
    if not file.endswith('.mp4'):
        continue  # skip anything that is not a test clip
    video_path = os.path.join('./dataset/test/test_2/', file)
    subject = file.replace('.mp4', '')  # ground-truth name, taken from the file name

    cap = cv2.VideoCapture(video_path)
    # per-metric tallies; 'conf' stores the distances of correct matches only
    count = {
        'l2':{'True': 0, 'False':0, 'conf':[]},
        'cosine':{'True': 0, 'False':0, 'conf':[]},
    }
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # end of the video (or the frame could not be read)
            break

        faces = processor.preproc(frame.copy())

        for face in faces:
            # target embedding
            target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
            target = np.array(target, dtype='f')
            target = np.expand_dims(target, axis=0)

            # query both indexes (l2, cosinesimil) with the same embedding
            for key, index in (('l2', index_l2), ('cosine', index_cos)):
                neighbors, distances = index.knnQueryBatch(target, k=1, num_threads=4)[0]
                name = id_list['name'][neighbors[0]]
                if name == subject:
                    count[key]['True'] += 1
                    count[key]['conf'].append(distances[0])
                else:
                    count[key]['False'] += 1
    cap.release()

    results_all.loc[len(results_all)] = [subject, 'resnet50', 'l2', count['l2']['True'], count['l2']['False'], np.average(count['l2']['conf']), np.std(count['l2']['conf'])]
    results_all.loc[len(results_all)] = [subject, 'resnet50', 'cosinesimil', count['cosine']['True'], count['cosine']['False'], np.average(count['cosine']['conf']), np.std(count['cosine']['conf'])]
results_all

| | ID | Model | Dist. | True | False | Avg. Distance | Std. Distance |
|---|---|---|---|---|---|---|---|
| 0 | yeri | resnet50 | l2 | 307 | 106 | 5638.69043 | 960.973389 |
| 1 | yeri | resnet50 | cosinesimil | 383 | 30 | 0.262753 | 0.084763 |
| 2 | seulgi | resnet50 | l2 | 217 | 2 | 5110.183105 | 852.788635 |
| 3 | seulgi | resnet50 | cosinesimil | 217 | 2 | 0.20578 | 0.043738 |
| 4 | irene | resnet50 | l2 | 158 | 97 | 5322.081543 | 799.970825 |
| 5 | irene | resnet50 | cosinesimil | 165 | 90 | 0.269541 | 0.040678 |
| 6 | wendy | resnet50 | l2 | 136 | 72 | 8033.90625 | 1106.328613 |
| 7 | wendy | resnet50 | cosinesimil | 146 | 62 | 0.34404 | 0.052192 |
| 8 | joy | resnet50 | l2 | 542 | 0 | 5262.92334 | 1023.09198 |
| 9 | joy | resnet50 | cosinesimil | 542 | 0 | 0.238197 | 0.039151 |


In [26]:
accuracy = np.sum(results_all['True'])/(np.sum(results_all['True'])+np.sum(results_all['False']))*100
print('Accuracy: {:0.2f} %'.format(accuracy))

Accuracy: 85.92 %
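The figure above pools the L2 and cosine rows together. To compare the two metrics, the same counts can be split by metric. Below is a small sketch in plain Python, with the counts copied from the results table; `accuracy_by_metric` is a hypothetical helper, not part of the notebook.

```python
# (ID, distance metric, correct, incorrect), copied from the results table
rows = [
    ("yeri", "l2", 307, 106), ("yeri", "cosinesimil", 383, 30),
    ("seulgi", "l2", 217, 2), ("seulgi", "cosinesimil", 217, 2),
    ("irene", "l2", 158, 97), ("irene", "cosinesimil", 165, 90),
    ("wendy", "l2", 136, 72), ("wendy", "cosinesimil", 146, 62),
    ("joy", "l2", 542, 0), ("joy", "cosinesimil", 542, 0),
]


def accuracy_by_metric(rows):
    """Return {metric: accuracy in percent}, summing counts over all subjects."""
    totals = {}
    for _subject, metric, n_true, n_false in rows:
        t, f = totals.get(metric, (0, 0))
        totals[metric] = (t + n_true, f + n_false)
    return {m: 100.0 * t / (t + f) for m, (t, f) in totals.items()}


per_metric = accuracy_by_metric(rows)
for metric, acc in per_metric.items():
    print(f"{metric}: {acc:.2f} %")
```

On these counts, cosine distance comes out ahead of L2 (roughly 89 % versus 83 %), mostly because of the `yeri` clip.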


In [28]:
output_path = './output/real_time_face_recognition/results.xlsx'
with pd.ExcelWriter(output_path, engine='openpyxl', mode='w') as writer:  
    results_all.to_excel(writer, sheet_name='default', index=False)

Although this notebook uses a recorded video as its example, the same code works with a webcam or other live video feed, which makes it truly real-time. You only need to change the input source passed to `cv2.VideoCapture`.
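`cv2.VideoCapture` accepts either an integer device index (for example `0` for the default webcam) or a string such as a file path or stream URL. A minimal sketch of a helper that picks the right argument type from a user-supplied string follows; `resolve_source` is a hypothetical name and the RTSP address is a made-up placeholder.

```python
from typing import Union


def resolve_source(source: str) -> Union[int, str]:
    """Convert a source string into an argument for cv2.VideoCapture.

    Purely numeric strings are treated as camera device indexes; anything
    else (file path, stream URL) is passed through unchanged.
    """
    s = source.strip()
    return int(s) if s.isdigit() else s


print(resolve_source("0"))                               # webcam index
print(resolve_source("./dataset/test/test_2/joy.mp4"))   # video file
print(resolve_source("rtsp://camera.example/stream"))    # placeholder URL
```

The returned value would then be passed as `cv2.VideoCapture(resolve_source(...))`; the rest of the loop stays the same.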

## Conclusion

In this notebook I showed how to perform face recognition on a video stream in real time. This has many real-world applications, including surveillance and smart-home door automation. I hope you find it useful for your projects!