# Large-Scale Face Recognition

What if your dataset contains thousands of identities, each with thousands of pictures? Comparing a query embedding against every stored embedding one by one quickly becomes impractical. This notebook shows how to use `NMSLIB`, a similarity search library based on approximate nearest neighbors (ANN), to find the closest match.

source: https://github.com/nmslib/nmslib 
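
To make the cost of exhaustive comparison concrete, here is a minimal brute-force sketch. It assumes embeddings are stored as rows of a NumPy array and compared by cosine similarity. The function name `brute_force_match`, the random data, and the vector dimension are illustrative assumptions, not part of this notebook's modules.

```python
import numpy as np

def brute_force_match(query, gallery):
    """Return (index, cosine similarity) of the gallery row closest to query."""
    # Normalize so that a dot product equals cosine similarity.
    gallery_unit = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    # One similarity per stored embedding: O(N * D) work for every query.
    similarities = gallery_unit @ query_unit
    best = int(np.argmax(similarities))
    return best, float(similarities[best])

# Hypothetical data: 1,000 random vectors standing in for face embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 2048))
# A slightly perturbed copy of row 42 plays the role of a new photo of a known face.
query = gallery[42] + 0.01 * rng.normal(size=2048)
index, score = brute_force_match(query, gallery)
print(index, round(score, 3))
```

Every query touches every stored vector, so the work grows linearly with the size of the gallery. An ANN index gives up a small amount of accuracy to avoid that full scan.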

In [1]:
import cv2
import matplotlib.pyplot as plt
import keras_vggface as kv
import modules.utils as utils
from tensorflow.keras.models import save_model
import tensorflow as tf
from tensorflow.python.keras.engine import training
from tensorflow import keras
import os
from statistics import mean
import pandas as pd

In [2]:
# Declare a FacePreprocess instance.
from modules.FacePreprocess import FacePreprocess
ssd_model = r'./models/ssd/deploy.prototxt.txt'
ssd_weights = r'./models/ssd/res10_300x300_ssd_iter_140000.caffemodel'
processor = FacePreprocess(ssd_model, ssd_weights)

In [3]:
# Load the facial embedding model of your choice
model = kv.VGGFace(
    model='resnet50', 
    include_top=False, 
    input_shape=(224, 224, 3), 
    pooling='avg'
)
input_size = (224, 224)

## Compute Embeddings

Since we want to build a search index from the facial embeddings, we first compute an embedding for every image in the dataset.

In [8]:
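embeddings = []
train_dir = './dataset/train/'
# Each sub-folder of the training set is named after one identity.
for identity in os.listdir(train_dir):
    folder = os.path.join(train_dir, identity)
    for file in os.listdir(folder):
        # Build the path outside the try block so the error message can always use it.
        filepath = os.path.join(folder, file)
        try:
            # Preprocess the image, then compute its embedding.
            processed_img = processor.preproc(cv2.imread(filepath))[0][0]
            embedding = model.predict(utils.resize(processed_img, input_size), verbose=False)[0, :]

            embeddings.append([identity, embedding])
        except Exception:
            print('Failed to process/predict {}'.format(filepath))
print('='*10 + '\n{} images processed.'.format(len(embeddings)))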

25 images processed.
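
An ANN index typically expects a single 2-D array of vectors rather than a list of `[id, embedding]` pairs. Below is a minimal sketch of that conversion, assuming the list has the layout built in the cell above. The sample identities and 4-dimensional vectors are made up for illustration, and `labels` and `matrix` are hypothetical names.

```python
import numpy as np

# Hypothetical stand-in for the `embeddings` list: [identity, vector] pairs.
sample_embeddings = [
    ['alice', np.array([0.1, 0.2, 0.3, 0.4])],
    ['alice', np.array([0.2, 0.1, 0.4, 0.3])],
    ['bob', np.array([0.9, 0.8, 0.7, 0.6])],
]

# Keep labels and vectors in parallel so that row i of `matrix` belongs to labels[i].
labels = [identity for identity, _ in sample_embeddings]
matrix = np.vstack([vector for _, vector in sample_embeddings]).astype(np.float32)

print(matrix.shape, labels)
```

Keeping the labels in a separate list means the row numbers returned by a nearest-neighbor query can be mapped back to identities with a plain lookup.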
