# Improving Accuracy

You won't always have a perfect dataset or ideal test subjects. Imagine you have only one picture per subject in your dataset, and you need to find one person's face among thousands of images. The results can get messy very quickly. In this notebook, I will show a few ways to tackle this problem.

In [30]:
import cv2
import matplotlib.pyplot as plt
import keras_vggface as kv
import modules.utils as utils
import tensorflow as tf
import os
import pandas as pd
import numpy as np
import nmslib

In [31]:
# Declare a FacePreprocess instance.
from modules.FacePreprocess import FacePreprocess
ssd_model = r'./models/ssd/deploy.prototxt.txt'
ssd_weights = r'./models/ssd/res10_300x300_ssd_iter_140000.caffemodel'
processor = FacePreprocess(ssd_model, ssd_weights)

In [32]:
# Use the facial embedding model you want to use
model = kv.VGGFace(
    model='resnet50', 
    include_top=False, 
    input_shape=(224, 224, 3), 
    pooling='avg'
)
input_size = (224, 224)

In [33]:
# ==========================================================================
# Compute Embeddings
id_list = pd.DataFrame(columns=['name', 'file'])
embeddings = []
for id in os.listdir('./dataset/train/'):
    folder = './dataset/train/{}/'.format(id)
    
    # use img_1 for every subject
    file = 'img_1.jpg'
    try:
        filepath = os.path.join(folder, file)
        processed_img = processor.preproc(cv2.imread(filepath))[0][0]
        embedding = model.predict(utils.resize(processed_img, input_size), verbose=False)[0,:]

        id_list.loc[len(id_list.index)] = [id, filepath]
        embeddings.append(embedding)
    except Exception:  # e.g. the image could not be read or no face was detected
        print('Failed to process/predict {}'.format(filepath))
print('='*10 + '\n{} images processed.'.format(len(embeddings)))

# ==========================================================================
# Initialize nmslib
index_time_params = {'M': 15, 'indexThreadQty': 4, 'efConstruction': 100, 'post' : 0}

# l2 dist.
index_l2 = nmslib.init(
    method = 'hnsw', # hierarchical navigable small world graph
    space = 'l2', # euclidean
    data_type = nmslib.DataType.DENSE_VECTOR
) 
index_l2.addDataPointBatch(embeddings)
index_l2.createIndex(index_time_params)

# cosine simil.
index_cos = nmslib.init(
    method = 'hnsw', # hierarchical navigable small world graph
    space = 'cosinesimil', # cosine
    data_type = nmslib.DataType.DENSE_VECTOR
) 
index_cos.addDataPointBatch(embeddings)
index_cos.createIndex(index_time_params)

5 images processed.


In [34]:
output_path = './output/real_time_face_recognition/results.xlsx'

Let's test this model on `./dataset/test/test_2/`

In [35]:
results_all = pd.DataFrame(
    columns = ['ID', 'Model', 'Dist.', 'True', 'False', 'Avg. Distance', 'Std. Distance'], 
)

for file in os.listdir('./dataset/test/test_2/'):
    video_path = './dataset/test/test_2/'+file
    id = file.replace('.mp4', '')

    cap = cv2.VideoCapture(video_path)
    count = {
        'l2':{'True': 0, 'False':0, 'conf':[]},
        'cosine':{'True': 0, 'False':0, 'conf':[]},
    }
    while(cap.isOpened()):
        ret, frame = cap.read()
        if not ret:  # no more frames (end of video or read failure)
            break
        try:
            img = frame.copy()
            faces = processor.preproc(img)

            if len(faces)>0:
                for face in faces:
                    # target embeddings
                    target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
                    target = np.array(target, dtype='f')
                    target = np.expand_dims(target, axis=0)

                    # l2
                    neighbors, distances = index_l2.knnQueryBatch(target, k=1, num_threads=4)[0]
                    name = id_list['name'][neighbors[0]]
                    if name == id:
                        count['l2']['True'] += 1
                        count['l2']['conf'].append(distances[0])
                    else:
                        count['l2']['False'] += 1

                    # cosinesimil
                    neighbors, distances = index_cos.knnQueryBatch(target, k=1, num_threads=4)[0]
                    name = id_list['name'][neighbors[0]]
                    if name == id:
                        count['cosine']['True'] += 1
                        count['cosine']['conf'].append(distances[0])
                    else:
                        count['cosine']['False'] += 1
        except:
            break
    cap.release()

    results_all.loc[len(results_all)] = [id, 'resnet50', 'l2', count['l2']['True'], count['l2']['False'], np.average(count['l2']['conf']), np.std(count['l2']['conf'])]
    results_all.loc[len(results_all)] = [id, 'resnet50', 'cosinesimil', count['cosine']['True'], count['cosine']['False'], np.average(count['cosine']['conf']), np.std(count['cosine']['conf'])]
results_all

| | ID | Model | Dist. | True | False | Avg. Distance | Std. Distance |
|---|---|---|---|---|---|---|---|
| 0 | yeri | resnet50 | l2 | 318 | 95 | 7122.841309 | 991.787048 |
| 1 | yeri | resnet50 | cosinesimil | 338 | 75 | 0.33253 | 0.102192 |
| 2 | seulgi | resnet50 | l2 | 214 | 5 | 6040.395996 | 1006.721985 |
| 3 | seulgi | resnet50 | cosinesimil | 214 | 5 | 0.245507 | 0.049272 |
| 4 | irene | resnet50 | l2 | 166 | 89 | 6901.102539 | 689.785156 |
| 5 | irene | resnet50 | cosinesimil | 154 | 101 | 0.330749 | 0.034829 |
| 6 | wendy | resnet50 | l2 | 102 | 106 | 9797.87207 | 1324.106812 |
| 7 | wendy | resnet50 | cosinesimil | 98 | 110 | 0.418905 | 0.062939 |
| 8 | joy | resnet50 | l2 | 541 | 1 | 7854.515625 | 1052.877441 |
| 9 | joy | resnet50 | cosinesimil | 541 | 1 | 0.326842 | 0.036252 |


In [36]:
accuracy = np.sum(results_all['True'])/(np.sum(results_all['True'])+np.sum(results_all['False']))*100
print('Accuracy: {:0.2f} %'.format(accuracy))

Accuracy: 82.04 %


In [37]:
with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  
    results_all.to_excel(writer, sheet_name='single sample', index=False)

As we can see, with only one image per subject the accuracy is not great (for reference, with 5 images per subject the accuracy is 85.92%). We only have 5 subjects in this dataset, so the results aren't too bad, but they will only get worse as more people are added. Here are some methods you can use to improve prediction accuracy.

## Method 1: Add distance threshold

Every query returns two items: the nearest neighbours and their distances. The smaller the distance, the more similar the faces are. We don't want to accept predictions whose distance is too large, so let's set a threshold.
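To see why each metric needs its own threshold, here is a minimal plain-Python sketch of the two distances. It assumes that nmslib's `cosinesimil` space reports 1 minus the cosine similarity, which is consistent with the sub-1 values in the table above. The vectors are toy values, not real face embeddings.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (hypothetical values, for illustration only).
reference = [1.0, 2.0, 3.0]
same_direction = [10.0, 20.0, 30.0]  # same direction, 10x the magnitude
different = [3.0, -1.0, 0.5]

d_l2_same = l2_distance(reference, same_direction)
d_cos_same = cosine_distance(reference, same_direction)
d_cos_diff = cosine_distance(reference, different)

print('l2 (same direction):     {:0.2f}'.format(d_l2_same))
print('cosine (same direction): {:0.2f}'.format(d_cos_same))
print('cosine (different):      {:0.2f}'.format(d_cos_diff))
```

The L2 distance grows with the magnitude of the embeddings, while the cosine distance only looks at their direction. That is why the L2 distances in the table are in the thousands and the cosine distances are fractions, and why a single threshold cannot serve both.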

#### Determine threshold

The average and standard deviation of the distances from the run above can help determine a threshold for your dataset. Here we use the average plus one standard deviation.

In [38]:
# l2
avg_l2 = np.average(results_all[results_all['Dist.']=='l2']['Avg. Distance'])
std_l2 = np.average(results_all[results_all['Dist.']=='l2']['Std. Distance'])
print('l2 --> avg: {:0.2f}   std: {:0.2f}'.format(avg_l2, std_l2))

# cosinesimil
avg_cos = np.average(results_all[results_all['Dist.']=='cosinesimil']['Avg. Distance'])
std_cos = np.average(results_all[results_all['Dist.']=='cosinesimil']['Std. Distance'])
print('cosinesimil --> avg: {:0.2f}   std: {:0.2f}'.format(avg_cos, std_cos))

l2 --> avg: 7543.35   std: 1013.06
cosinesimil --> avg: 0.33   std: 0.06


In [39]:
thld_l2 = avg_l2+std_l2
thld_cos = avg_cos+std_cos
print('l2 threshold: {:0.2f}\ncosinesimil threshold: {:0.2f}'.format(thld_l2, thld_cos))

l2 threshold: 8556.40
cosinesimil threshold: 0.39


Let's try adding the threshold to our predictions. This time, we will add a new class called `Unknown`. If the distance is above our threshold, we will classify it as `Unknown`.
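Stripped of the video and model code, the rule looks like the sketch below. The helper names `classify_with_threshold` and `tally` and the sample predictions are hypothetical, made up for illustration; only the 0.39 threshold comes from the cell above.

```python
def classify_with_threshold(predicted_name, distance, threshold):
    """Return the predicted name, or 'Unknown' if the match is too far away."""
    return predicted_name if distance <= threshold else 'Unknown'

def tally(label, true_name, count):
    """Update a True/False/Unknown counter for one detected face."""
    if label == 'Unknown':
        count['Unknown'] += 1
    elif label == true_name:
        count['True'] += 1
    else:
        count['False'] += 1
    return count

# Hypothetical (predicted name, cosine distance) pairs for a video of 'wendy'.
predictions = [('wendy', 0.31), ('irene', 0.45), ('wendy', 0.42), ('joy', 0.35)]
threshold = 0.39  # cosinesimil threshold computed above

count = {'True': 0, 'False': 0, 'Unknown': 0}
for name, dist in predictions:
    tally(classify_with_threshold(name, dist, threshold), 'wendy', count)

print(count)
```

Note that the threshold rejects the far-away wrong match (`irene` at 0.45) but also a far-away correct one (`wendy` at 0.42), and it cannot catch a wrong match that happens to be close (`joy` at 0.35).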

In [40]:
results_1 = pd.DataFrame(
    columns = ['ID', 'Model', 'Dist.', 'True', 'False', 'Unknown', 'Avg. Distance', 'Std. Distance'], 
)

for file in os.listdir('./dataset/test/test_2/'):
    video_path = './dataset/test/test_2/'+file
    id = file.replace('.mp4', '')

    cap = cv2.VideoCapture(video_path)
    count = {
        'l2':{'True': 0, 'False':0, 'Unknown':0, 'conf':[]},
        'cosine':{'True': 0, 'False':0, 'Unknown':0, 'conf':[]},
    }
    while(cap.isOpened()):
        ret, frame = cap.read()
        if not ret:  # no more frames (end of video or read failure)
            break
        try:
            img = frame.copy()
            faces = processor.preproc(img)

            if len(faces)>0:
                for face in faces:
                    # target embeddings
                    target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
                    target = np.array(target, dtype='f')
                    target = np.expand_dims(target, axis=0)

                    # l2
                    neighbors, distances = index_l2.knnQueryBatch(target, k=1, num_threads=4)[0]
                    name = id_list['name'][neighbors[0]]
                    if distances[0] <= thld_l2: # add threshold 
                        if name == id:
                            count['l2']['True'] += 1
                            count['l2']['conf'].append(distances[0])
                        else:
                            count['l2']['False'] += 1
                    else:
                        count['l2']['Unknown'] += 1

                    # cosinesimil
                    neighbors, distances = index_cos.knnQueryBatch(target, k=1, num_threads=4)[0]
                    name = id_list['name'][neighbors[0]]
                    if distances[0] <= thld_cos: # add threshold 
                        if name == id:
                            count['cosine']['True'] += 1
                            count['cosine']['conf'].append(distances[0])
                        else:
                            count['cosine']['False'] += 1
                    else:
                        count['cosine']['Unknown'] += 1
        except:
            break
    cap.release()

    results_1.loc[len(results_1)] = [
        id, 'resnet50', 'l2', 
        count['l2']['True'], count['l2']['False'], count['l2']['Unknown'], 
        np.average(count['l2']['conf']), np.std(count['l2']['conf'])
    ]
    results_1.loc[len(results_1)] = [
        id, 'resnet50', 'cosinesimil', 
        count['cosine']['True'], count['cosine']['False'], count['cosine']['Unknown'], 
        np.average(count['cosine']['conf']), np.std(count['cosine']['conf'])
    ]
results_1

| | ID | Model | Dist. | True | False | Unknown | Avg. Distance | Std. Distance |
|---|---|---|---|---|---|---|---|---|
| 0 | yeri | resnet50 | l2 | 295 | 67 | 51 | 6944.459473 | 769.410645 |
| 1 | yeri | resnet50 | cosinesimil | 309 | 65 | 39 | 0.304352 | 0.038588 |
| 2 | seulgi | resnet50 | l2 | 214 | 0 | 5 | 6040.395996 | 1006.721985 |
| 3 | seulgi | resnet50 | cosinesimil | 213 | 0 | 6 | 0.24481 | 0.048326 |
| 4 | irene | resnet50 | l2 | 164 | 88 | 3 | 6878.487793 | 662.611389 |
| 5 | irene | resnet50 | cosinesimil | 146 | 92 | 17 | 0.326389 | 0.030051 |
| 6 | wendy | resnet50 | l2 | 19 | 4 | 185 | 8247.113281 | 225.731934 |
| 7 | wendy | resnet50 | cosinesimil | 37 | 3 | 168 | 0.361471 | 0.016875 |
| 8 | joy | resnet50 | l2 | 408 | 0 | 134 | 7420.274414 | 796.087402 |
| 9 | joy | resnet50 | cosinesimil | 506 | 0 | 36 | 0.321764 | 0.031645 |


In [41]:
accuracy = np.sum(results_1['True'])/(np.sum(results_1['True'])+np.sum(results_1['False']))*100
print('Accuracy: {:0.2f} %'.format(accuracy))

with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  
    results_1.to_excel(writer, sheet_name='threshold', index=False)

Accuracy: 87.87 %


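As we can see, simply adding a distance threshold raised the accuracy from 82.04% to 87.87%, which is even better than having 5 images per subject in the dataset (85.92%). Note that this accuracy counts only `True` and `False` predictions; faces classified as `Unknown` are left out. With this method we make fewer `True` predictions, but we can be more confident in the predictions we do make.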

## Method 2: Double verification

For this method, we will query both indexes (L2 and cosine) for every face. If both return the same ID, we'll add the prediction to the tally; otherwise we'll count it as `Unknown`.

![fig](./assets/fig3.svg)
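The decision rule itself is small. Here is a minimal sketch of it, assuming each index returns the position of its nearest neighbour in the list of enrolled names. The function name `double_verify` and the order of `names` are hypothetical.

```python
def double_verify(idx_l2, idx_cos, names):
    """Accept a match only when both indexes agree on the nearest neighbour."""
    if idx_l2 == idx_cos:
        return names[idx_l2]
    return 'Unknown'

# Hypothetical enrolment order; in the notebook this comes from id_list['name'].
names = ['yeri', 'seulgi', 'irene', 'wendy', 'joy']

agree = double_verify(3, 3, names)     # both indexes point at 'wendy'
disagree = double_verify(3, 2, names)  # l2 says 'wendy', cosine says 'irene'

print(agree, disagree)
```

Comparing neighbour positions works here because there is exactly one enrolled image per subject. With several images per subject you would likely want to compare the names instead.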

In [42]:
results_2 = pd.DataFrame(
    columns = ['ID', 'Model', 'True', 'False', 'Unknown'], 
)

for file in os.listdir('./dataset/test/test_2/'):
    video_path = './dataset/test/test_2/'+file
    id = file.replace('.mp4', '')

    cap = cv2.VideoCapture(video_path)
    count = {'True': 0, 'False':0, 'Unknown':0}
    while(cap.isOpened()):
        ret, frame = cap.read()
        if not ret:  # no more frames (end of video or read failure)
            break
        try:
            img = frame.copy()
            faces = processor.preproc(img)

            if len(faces)>0:
                for face in faces:
                    # target embeddings
                    target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
                    target = np.array(target, dtype='f')
                    target = np.expand_dims(target, axis=0)

                    # query
                    n_l2, d_l2 = index_l2.knnQueryBatch(target, k=1, num_threads=4)[0]
                    n_cos, d_cos = index_cos.knnQueryBatch(target, k=1, num_threads=4)[0]

                    # classify
                    if n_l2[0] == n_cos[0]:
                        name = id_list['name'][n_l2[0]]
                        if name == id:
                            count['True'] += 1
                        else:
                            count['False'] += 1
                    else:
                        count['Unknown'] += 1
        except:
            break
    cap.release()

    results_2.loc[len(results_2)] = [
        id, 'resnet50', count['True'], count['False'], count['Unknown'], 
    ]
results_2

| | ID | Model | True | False | Unknown |
|---|---|---|---|---|---|
| 0 | yeri | resnet50 | 315 | 62 | 36 |
| 1 | seulgi | resnet50 | 214 | 3 | 2 |
| 2 | irene | resnet50 | 154 | 89 | 12 |
| 3 | wendy | resnet50 | 95 | 97 | 16 |
| 4 | joy | resnet50 | 541 | 1 | 0 |


In [43]:
accuracy = np.sum(results_2['True'])/(np.sum(results_2['True'])+np.sum(results_2['False']))*100
print('Accuracy: {:0.2f} %'.format(accuracy))

with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  
    results_2.to_excel(writer, sheet_name='double verification', index=False)

Accuracy: 83.96 %


The results differ a little from the first method. The accuracy here is lower (83.96% versus 87.87%), but it is still better than the 82.04% we got with one image per subject and no additional method, and we keep more `True` predictions (pay attention to the predictions for Wendy).

Note that this method comes at a higher computational cost, because every face is queried against two indexes, so it is up to you to decide whether it is worth it. Here the accuracy improved only slightly, but in my own project this method improved the accuracy greatly, which I think was well worth the additional cost.

## Method 3: Threshold + Double Verification

To be even stricter, let's see what happens when we combine both methods. A prediction is only accepted (instead of being counted as `Unknown`) when:

1. Both distances are less than or equal to their thresholds.
2. Both predictions return the same ID.
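The two conditions can be sketched as a single function. The name `strict_label`, the enrolment order and the sample distances are hypothetical; the thresholds are the ones computed in Method 1.

```python
def strict_label(idx_l2, d_l2, idx_cos, d_cos, names, thld_l2, thld_cos):
    """Return a name only if both distances pass and both indexes agree."""
    # Condition 1: both distances must be within their thresholds.
    if d_l2 > thld_l2 or d_cos > thld_cos:
        return 'Unknown'
    # Condition 2: both indexes must point at the same enrolled face.
    if idx_l2 != idx_cos:
        return 'Unknown'
    return names[idx_l2]

names = ['yeri', 'seulgi', 'irene', 'wendy', 'joy']  # hypothetical order
thld_l2, thld_cos = 8556.40, 0.39                    # thresholds from Method 1

accepted = strict_label(4, 7400.0, 4, 0.32, names, thld_l2, thld_cos)
too_far = strict_label(4, 9100.0, 4, 0.32, names, thld_l2, thld_cos)
mismatch = strict_label(4, 7400.0, 0, 0.32, names, thld_l2, thld_cos)

print(accepted, too_far, mismatch)
```

Checking the thresholds first and the agreement second gives the same labels as the nested `if` in the cell below; the order only affects readability.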

In [44]:
results_3 = pd.DataFrame(
    columns = ['ID', 'Model', 'True', 'False', 'Unknown'], 
)

for file in os.listdir('./dataset/test/test_2/'):
    video_path = './dataset/test/test_2/'+file
    id = file.replace('.mp4', '')

    cap = cv2.VideoCapture(video_path)
    count = {'True': 0, 'False':0, 'Unknown':0}
    while(cap.isOpened()):
        ret, frame = cap.read()
        if not ret:  # no more frames (end of video or read failure)
            break
        try:
            img = frame.copy()
            faces = processor.preproc(img)

            if len(faces)>0:
                for face in faces:
                    # target embeddings
                    target = model.predict(utils.resize(face[0], input_size), verbose=False)[0,:]
                    target = np.array(target, dtype='f')
                    target = np.expand_dims(target, axis=0)

                    # query
                    n_l2, d_l2 = index_l2.knnQueryBatch(target, k=1, num_threads=4)[0]
                    n_cos, d_cos = index_cos.knnQueryBatch(target, k=1, num_threads=4)[0]

                    # classify
                    if (d_l2[0] <= thld_l2) and (d_cos[0] <= thld_cos):
                        if n_l2[0] == n_cos[0]:
                            name = id_list['name'][n_l2[0]]
                            if name == id:
                                count['True'] += 1
                            else:
                                count['False'] += 1
                        else: 
                            count['Unknown'] += 1
                    else:
                        count['Unknown'] += 1
        except:
            break
    cap.release()

    results_3.loc[len(results_3)] = [
        id, 'resnet50', count['True'], count['False'], count['Unknown'], 
    ]
results_3

| | ID | Model | True | False | Unknown |
|---|---|---|---|---|---|
| 0 | yeri | resnet50 | 292 | 56 | 65 |
| 1 | seulgi | resnet50 | 213 | 0 | 6 |
| 2 | irene | resnet50 | 145 | 85 | 25 |
| 3 | wendy | resnet50 | 15 | 0 | 193 |
| 4 | joy | resnet50 | 408 | 0 | 134 |


In [45]:
accuracy = np.sum(results_3['True'])/(np.sum(results_3['True'])+np.sum(results_3['False']))*100
print('Accuracy: {:0.2f} %'.format(accuracy))

with pd.ExcelWriter(output_path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:  
    results_3.to_excel(writer, sheet_name='threshold+DV', index=False)

Accuracy: 88.39 %


As expected, combining the methods increases our accuracy even further, to 88.39%, at the cost of more `Unknown` predictions.
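Because the accuracy above ignores `Unknown` predictions, it helps to look at how many faces each method still labels. The sketch below uses the column totals of the Method 2 and Method 3 tables. "Coverage" is my own name for the share of faces that received a name, and the helper is hypothetical.

```python
def accuracy_and_coverage(true, false, unknown):
    """Return (accuracy %, coverage %) for one method.

    accuracy: share of named predictions that are correct (Unknown excluded)
    coverage: share of all detected faces that received a name at all
    """
    decided = true + false
    total = decided + unknown
    return 100 * true / decided, 100 * decided / total

# Column totals (True, False, Unknown) summed over the five subjects.
method_2 = accuracy_and_coverage(1319, 252, 66)   # double verification
method_3 = accuracy_and_coverage(1073, 141, 423)  # threshold + double verification

print('Method 2 --> accuracy: {:0.2f} %   coverage: {:0.2f} %'.format(*method_2))
print('Method 3 --> accuracy: {:0.2f} %   coverage: {:0.2f} %'.format(*method_3))
```

Method 3 is more accurate on the faces it names, but it names noticeably fewer of them (roughly 74% versus 96%), so which one is better depends on how costly a missed face is compared with a wrong name.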

## Conclusion

This notebook has shown how to improve accuracy without any additional training. These are just my own creative ways of dealing with the problems I faced in a personal project; there are many other methods you can experiment with too. The things to consider are the costs you are willing to pay (time, computing power) and your dataset. I hope this inspires your creativity!

\* all results are stored in [results.xlsx](./output/real_time_face_recognition/results.xlsx)