# Face detection and recognition inference pipeline

The following example illustrates how to use the `facenet_pytorch` python package to perform face detection and recogition on an image dataset using an Inception Resnet V1 pretrained on the VGGFace2 dataset.

The following Pytorch methods are included:
* Datasets
* Dataloaders
* GPU/CPU processing

In [2]:
from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
import numpy as np
import pandas as pd
import os
import cv2
import preprocess
from PIL import Image
workers = 0 if os.name == 'nt' else 4

#### Determine if an nvidia GPU is available

In [7]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# device = torch.device('cpu')
print('Running on device: {}'.format(device))

Running on device: cuda:0


#### Define MTCNN module

Default params shown for illustration, but not needed. Note that, since MTCNN is a collection of neural nets and other code, the device must be passed in the following way to enable copying of objects when needed internally.

See `help(MTCNN)` for more details.

In [8]:
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)

#### Define Inception Resnet V1 module

Set classify=True for pretrained classifier. For this example, we will use the model to output embeddings/CNN features. Note that for inference, it is important to set the model to `eval` mode.

See `help(InceptionResnetV1)` for more details.

In [5]:
resnet = InceptionResnetV1(classify=True, pretrained='vggface2').eval().to(device)

#### Define a dataset and data loader

We add the `idx_to_class` attribute to the dataset to enable easy recoding of label indices to identity names later one.

In [7]:
def collate_fn(x):
    return x[0]

dataset = datasets.ImageFolder('D:\\dev\\project\\pythonProject\\temp_img')
dataset.idx_to_class = {i:c for c, i in dataset.class_to_idx.items()}
loader = DataLoader(dataset, collate_fn=collate_fn, num_workers=workers)

FileNotFoundError: Couldn't find any class folder in D:\dev\project\pythonProject\temp_img.

#### Perfom MTCNN facial detection

Iterate through the DataLoader object and detect faces and associated detection probabilities for each. The `MTCNN` forward method returns images cropped to the detected face, if a face was detected. By default only a single detected face is returned - to have `MTCNN` return all detected faces, set `keep_all=True` when creating the MTCNN object above.

To obtain bounding boxes rather than cropped face images, you can instead call the lower-level `mtcnn.detect()` function. See `help(mtcnn.detect)` for details.

In [6]:
aligned = []
names = []
folder_name = 'D:/dev/project/pythonProject/imgmoi/'
for (root, dirs, files) in os.walk(folder_name, topdown = True):
    for path in files:
        # preprocess.autorotate(folder_name + path)
        # preprocess.autoresize(folder_name + path)
        img = cv2.imread(folder_name + path)
        x_aligned, prob = mtcnn(img, return_prob=True)
        print(path, ' : ')
        if x_aligned is not None:
            print('Face detected with probability: {:8f}'.format(prob))
            aligned.append(x_aligned)
            names.append(path)
        else:
            pass
            # print("bro wtf") 

0332051368_10.jpg  : 
Face detected with probability: 0.999704
0332051368_11.jpg  : 
Face detected with probability: 0.999991
0332051368_6.jpg  : 
Face detected with probability: 0.999772
0332051368_7.jpg  : 
Face detected with probability: 0.999470
0332051368_8.jpg  : 
Face detected with probability: 0.992151
0359505511_1.jpg  : 
Face detected with probability: 0.998771
0359505511_10.jpg  : 
Face detected with probability: 0.999953
0359505511_11.jpg  : 
Face detected with probability: 0.999778
0359505511_12.jpg  : 
Face detected with probability: 0.998244
0359505511_13.jpg  : 
Face detected with probability: 0.999840
0359505511_14.jpg  : 
Face detected with probability: 0.999946
0359505511_15.jpg  : 
Face detected with probability: 0.999958
0359505511_16.jpg  : 
0359505511_17.jpg  : 
Face detected with probability: 0.999637
0359505511_18.jpg  : 
Face detected with probability: 0.999854
0359505511_19.jpg  : 
Face detected with probability: 0.999747
0359505511_2.jpg  : 
Face detected wi

#### Calculate image embeddings

MTCNN will return images of faces all the same size, enabling easy batch processing with the Resnet recognition module. Here, since we only have a few images, we build a single batch and perform inference on it. 

For real datasets, code should be modified to control batch sizes being passed to the Resnet, particularly if being processed on a GPU. For repeated testing, it is best to separate face detection (using MTCNN) from embedding or classification (using InceptionResnetV1), as calculation of cropped faces or bounding boxes can then be performed a single time and detected faces saved for future use.

In [7]:
aligned = torch.stack(aligned).to(device)
embeddings = resnet(aligned).detach().cpu()

In [10]:
print(folder_name + names[0])
cv2.imshow('pls', cv2.imread(folder_name + names[100]))
cv2.waitKey(0)
cv2.destroyAllWindows()

D:/dev/project/pythonProject/imgmoi/0332051368_10.jpg


#### Print distance matrix for classes

In [12]:
dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]
#print(pd.DataFrame(dists, columns=names, index=names))
for threshold in np.arange(0.6, 0.9, 0.02):
    trupos = truneg = falpos = falneg = 0
    cnt0 = cnt1 = 0
    for a in names:
        for b in names:
            st = a.split("_")[0]
            nd = b.split("_")[0]
            if dists[cnt0][cnt1] <= threshold and st == nd:
                trupos += 1
            elif dists[cnt0][cnt1] > threshold and st != nd:
                truneg += 1
            elif dists[cnt0][cnt1] <= threshold and st != nd:
                falpos += 1
            else:
                falneg += 1
            cnt1+=1
        cnt1 = 0
        cnt0 += 1
    print("Threhold:", threshold)
    print("True positive: ", trupos)
    print("True negative: ", truneg)
    print("False positive: ", falpos)
    print("False negative: ", falneg, "\n")

Threhold: 0.6
True positive:  3327
True negative:  56374
False positive:  18
False negative:  2282 

Threhold: 0.62
True positive:  3587
True negative:  56354
False positive:  38
False negative:  2022 

Threhold: 0.64
True positive:  3869
True negative:  56302
False positive:  90
False negative:  1740 

Threhold: 0.66
True positive:  4143
True negative:  56248
False positive:  144
False negative:  1466 

Threhold: 0.68
True positive:  4375
True negative:  56168
False positive:  224
False negative:  1234 

Threhold: 0.7000000000000001
True positive:  4581
True negative:  56022
False positive:  370
False negative:  1028 

Threhold: 0.7200000000000001
True positive:  4753
True negative:  55824
False positive:  568
False negative:  856 

Threhold: 0.7400000000000001
True positive:  4911
True negative:  55548
False positive:  844
False negative:  698 

Threhold: 0.7600000000000001
True positive:  5049
True negative:  55198
False positive:  1194
False negative:  560 

Threhold: 0.78000000000

In [5]:
help(InceptionResnetV1)

Help on class InceptionResnetV1 in module facenet_pytorch.models.inception_resnet_v1:

class InceptionResnetV1(torch.nn.modules.module.Module)
 |  InceptionResnetV1(pretrained=None, classify=False, num_classes=None, dropout_prob=0.6, device=None)
 |  
 |  Inception Resnet V1 model with optional loading of pretrained weights.
 |  
 |  Model parameters can be loaded based on pretraining on the VGGFace2 or CASIA-Webface
 |  datasets. Pretrained state_dicts are automatically downloaded on model instantiation if
 |  requested and cached in the torch cache. Subsequent instantiations use the cache rather than
 |  redownloading.
 |  
 |  Keyword Arguments:
 |      pretrained {str} -- Optional pretraining dataset. Either 'vggface2' or 'casia-webface'.
 |          (default: {None})
 |      classify {bool} -- Whether the model should output classification probabilities or feature
 |          embeddings. (default: {False})
 |      num_classes {int} -- Number of output classes. If 'pretrained' is s