# Face Recognition (Part 3: VGG-Face)

In this tutorial, we use a pre-trained VGG-Face network to recognize the faces of celebrities. VGG-Face is a deep CNN that was trained with a softmax loss to recognize the faces of 2,622 celebrity identities. We will use face images of a random subset of 10 of those 2,622 celebrities.


# Image Classification

In this experiment, we shall see:

- **1. VGG-Face (deep CNN) for Face Recognition**
    - using a pretrained deep CNN for Face Recognition


- **2. Feature Extraction**:
    - extracting deep features 
    
Let us go through these step-by-step.

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import math

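import torch
# Note: load_lua and torch.legacy were removed in PyTorch 1.0,
# so this notebook needs PyTorch 0.4.1 or earlier.
from torch.utils.serialization import load_lua
from torch.legacy import nn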

DATA_ROOT = "/tmp/data/lab3"

# Recap of Part 1 (Preprocessing)

In Part 1, we explored the data and split it into train, val, and test sets. We then processed it so that all samples are of uniform size, normalized, and mean subtracted. (Note that these operations can be performed on any data, not just images.) Finally, we saved the resulting datasets in a "data.npz" file.

However, we will not be using that same dataset here. Reasons:
- The old dataset has just 120 training images (12 per class), which is far too few for a deep-CNN architecture.
- The VGG-Face network takes RGB images as input.
- The input dimension should be: $(num\_class * n\_train\_per\_class)\times3\times224\times224$
- Other preprocessing steps such as resizing, normalization, and mean subtraction are still applied.

So, here we will load a small test dataset (**test_data_vgg_face.npz**) which has 10 random celebs (1 image per celeb) and perform the necessary preprocessing steps.

In [None]:
data = np.load(DATA_ROOT+"/test_data_vgg_face.npz")
image = data["images"][0]
label = data["labels"][0]
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title(label)
plt.axis("off")
plt.show()
print("all labels in test data = ",data["labels"])

However, can we simply perform a resize operation on our input image and convert it to the appropriate size? Performing a simple resize operation on arbitrary sized input images may result in a change in the aspect ratio, thereby making the face image look distorted.

To avoid that, we perform the following sequence of operations:
- Resize the image such that the smaller dimension (out of height and width) is 256 and the aspect ratio remains the same.
- Crop a 224 $\times$ 224 region from the center of the resized image.
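
To make the two steps concrete, here is a small sketch that computes only the geometry (the resized size and the crop offsets) without touching any pixels. It mirrors the arithmetic used by `optimized_resize` in the next cell; the helper name `resize_then_crop_geometry` and the 300 $\times$ 400 example size are illustrative assumptions, not part of the original notebook.

```python
import math

def resize_then_crop_geometry(in_h, in_w, ref_size=256, target=224):
    """Return ((resized_h, resized_w), (crop_top, crop_left))."""
    # Scale the smaller side to ref_size, keeping the aspect ratio.
    if in_w < in_h:
        new_w, new_h = ref_size, int(ref_size * in_h / in_w)
    else:
        new_w, new_h = int(ref_size * in_w / in_h), ref_size
    # Offsets of a target x target window centered in the resized image.
    top = int(math.ceil((new_h - target) / 2))
    left = int(math.ceil((new_w - target) / 2))
    return (new_h, new_w), (top, left)

# Hypothetical landscape image: height 300, width 400.
print(resize_then_crop_geometry(300, 400))  # ((256, 341), (16, 59))
```

The smaller side (height) becomes 256, the width scales to 341, and the 224 $\times$ 224 window starts 16 rows down and 59 columns in, so the aspect ratio of the face is never altered.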

In [None]:
def optimized_resize(inputImg, targetH, targetW):
    inpH, inpW = inputImg.shape[0], inputImg.shape[1]
    # re-scale the smaller dim (among width, height) to refSize
    refSize = 256
    if inpW < inpH:
        resizedImg = cv2.resize(inputImg, (refSize, int(refSize*inpH/inpW)))
    else:
        resizedImg = cv2.resize(inputImg, (int(refSize*inpW/inpH), refSize))

    # center-crop
    iH, iW = resizedImg.shape[0], resizedImg.shape[1]
    anchorH, anchorW = int(math.ceil((iH - targetH)/2)), int(math.ceil((iW - targetW) / 2))
    croppedImg = resizedImg[anchorH:anchorH+targetH, anchorW:anchorW+targetW]
    return croppedImg


print("old dimensions:",image.shape)
naive_resized_image = cv2.resize(image, (224, 224))
opt_resized_image = optimized_resize(image, 224, 224)

plt.figure(figsize=(20,20))
plt.subplot(131)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.title("original image")
plt.axis("off")

plt.subplot(132)
plt.imshow(cv2.cvtColor(naive_resized_image, cv2.COLOR_BGR2RGB))
plt.title("naive resize")
plt.axis("off")

plt.subplot(133)
plt.imshow(cv2.cvtColor(opt_resized_image, cv2.COLOR_BGR2RGB))
plt.title("resize+crop")
plt.axis("off")

plt.show()
print("dimension after resizing:", opt_resized_image.shape)

## change dimension from 224x224x3 (H x W x C) to 3x224x224 (C x H x W)
# The channel order is left as stored in the array; only the axes are moved.
final_image = np.transpose(opt_resized_image, (2, 0, 1)).astype(np.float64)
print("new_dimensions:",final_image.shape)

## Per-channel mean computed from the training data (available for VGG-Face)
trainingMean = [129.1863, 104.7624, 93.5940]
for i in range(3):
    final_image[i] = final_image[i] - trainingMean[i]

print(final_image.shape, label)
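
The cell above prepares a single image. As a rough sketch of how the same steps could be applied to several images at once to obtain the $N\times3\times224\times224$ layout mentioned earlier, the snippet below stacks already-cropped images with NumPy. The helper name `to_network_batch` and the random stand-in images are assumptions made for illustration only.

```python
import numpy as np

# Per-channel mean quoted in this tutorial for VGG-Face.
TRAINING_MEAN = np.array([129.1863, 104.7624, 93.5940])

def to_network_batch(images):
    """Stack cropped H x W x 3 images into an N x 3 x H x W float array."""
    batch = np.stack(images).astype(np.float64)   # N x H x W x 3
    batch = batch.transpose(0, 3, 1, 2)           # N x 3 x H x W
    # Same per-channel subtraction as in the cell above, via broadcasting.
    return batch - TRAINING_MEAN.reshape(1, 3, 1, 1)

# Hypothetical stand-ins for ten cropped 224x224 face images.
rng = np.random.default_rng(0)
dummy_images = [rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
                for _ in range(10)]
batch = to_network_batch(dummy_images)
print(batch.shape)  # (10, 3, 224, 224)
```

Note that the model-loading cell later in this tutorial sets `nn.View(1, 25088)`, which fixes the batch size at 1, so that layer would have to be adjusted before a batch like this could be passed through the network.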

# 1. VGG-Face (Deep CNN for face recognition)

Here, we will use a pretrained VGG Face model to generate predictions for the test images.

In [None]:
# load pre-trained VGG-Face network
vggFace = load_lua(DATA_ROOT+"/VGG_FACE_pyTorch_small.t7")
vggFace.modules[31] = nn.View(1, 25088)
print(vggFace)

As you can see, there are 40 layers in total. The architecture can be divided into 5 convolutional blocks followed by 2 fc layers and a classification layer. Each convolutional block consists of multiple Conv+ReLU layers followed by a pooling layer.

Let us now send our pre-processed input image through the network, i.e. we are going to do a forward pass through the pre-trained VGG-Face network.
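
Before running the forward pass, it may help to see where the `25088` in `nn.View(1, 25088)` plausibly comes from. The sketch below assumes that each of the 5 pooling layers halves the height and width (2 $\times$ 2 pooling with stride 2), that the convolutions preserve spatial size, and that the last convolutional block has 512 channels, as in VGG-16-style networks. These assumptions should be checked against the printed model; the helper name `flattened_size` is hypothetical.

```python
def flattened_size(input_hw=224, num_pools=5, final_channels=512):
    """Return (flattened feature length, spatial side) after the conv blocks."""
    side = input_hw
    for _ in range(num_pools):
        side //= 2  # assumed: each pooling layer halves height and width
    return final_channels * side * side, side

size, side = flattened_size()
print(side, size)  # 7 25088
```

Under these assumptions a 224 $\times$ 224 input shrinks to 7 $\times$ 7 after five poolings, and $512 \times 7 \times 7 = 25088$ is the length of the vector fed to the first fc layer.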

In [None]:
# forward pass (unsqueeze(0) adds the batch dimension: 1 x 3 x 224 x 224)
inputTensor = torch.Tensor(final_image).unsqueeze(0)
output = vggFace.forward(inputTensor)
output = output.cpu().numpy()

print(output.shape)

The network has returned 2,622 values for our test image, one for each celebrity ID in the training set. These values are the normalized log probabilities of our input face image belonging to each of the 2,622 IDs. So, in order to know which ID is the most likely (as per the network), we pick the entry with the maximum value.
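
As a minimal illustration of this step, the sketch below builds a stand-in vector of normalized log probabilities with NumPy, converts it back to probabilities, and picks the most likely index. The random scores and the `log_softmax` helper are assumptions used in place of the real network output. If the final layer of the loaded model is a plain softmax rather than a log-softmax, the output is already a probability vector and the `np.exp` step should be skipped.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.sum(np.exp(shifted)))

# Stand-in for the network output: 2,622 made-up scores, not real predictions.
rng = np.random.default_rng(0)
fake_scores = rng.normal(size=2622)

log_probs = log_softmax(fake_scores)
probs = np.exp(log_probs)          # back to probabilities, summing to 1
best = int(np.argmax(log_probs))   # same index as np.argmax(probs)
print(best, probs[best], probs.sum())
```

Because the exponential is monotonic, taking the argmax of the log probabilities gives the same index as taking the argmax of the probabilities themselves.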

To get the name of the ID from the index, we will use the list of ID names present in **names.txt**. As you can guess, the order of names in the list is important and serves as a mapping between the index number and the name.

In [None]:
def getNameList(filePath):
    # one ID name per line; the line position is the class index
    with open(filePath) as f:
        names = [line.strip() for line in f]
    return names

idNames = getNameList(DATA_ROOT+"/names.txt")
print("predicted_label = ", idNames[np.argmax(output[0])], "\ntrue_label = ", label,"\n" )

# 2. FEATURE EXTRACTION (VGG-Face)

## Deep features

We have seen how to use the pre-trained net to make predictions (face recognition). The VGG-Face net can also be used as a fixed feature extractor for face images. We simply ignore the outputs of the classification layer and take the output of the last fc-layer instead.

In [None]:
vggFeatures = vggFace.modules[35].output.cpu().numpy()
print(vggFeatures.shape)
print(np.linalg.norm(vggFeatures))

The 4096-d face features that we obtain in this fashion can be used with the classifiers that we used in the previous tutorials for a variety of face-related tasks.
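
As one hedged example of such a downstream use, the sketch below matches a query feature vector against a small gallery using cosine similarity. The classifiers from the previous tutorials are not reproduced here; cosine-similarity nearest neighbor is swapped in as a simple stand-in, and the random vectors, label names, and function names are all made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_identity(query, gallery, gallery_labels):
    """Return the label of the gallery vector most similar to the query."""
    sims = [cosine_similarity(query, g) for g in gallery]
    return gallery_labels[int(np.argmax(sims))], sims

# Random stand-ins for 4096-d deep features (not real VGG-Face outputs).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3, 4096))
gallery_labels = ["person_a", "person_b", "person_c"]
query = gallery[1] + 0.1 * rng.normal(size=4096)  # noisy copy of person_b

predicted, sims = nearest_identity(query, gallery, gallery_labels)
print(predicted)  # person_b
```

With real features, `query` would be something like `vggFeatures.ravel()` if the extracted array carries a leading batch dimension, and the gallery would hold features extracted from labeled face images.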

## Exercises

5. Modify the code to support batch mode of operation -- get top-k predictions for multiple face images at once.
6. Along with the predicted ID name, also visualize the probability of prediction.