# Face Detection using Sliding Window

After training the classifier, the next step was to use it for face detection. We built an image pyramid from the input image and slid a fixed-size window over each pyramid layer to extract regions of interest (ROIs). Each ROI was resized to 36x36 pixels and fed into the classifier, which outputs the probability that the ROI contains a face. An ROI was classified as a face if that probability was at least 0.95. Non-maximum suppression (NMS) was then used to remove overlapping bounding boxes.

## Result

The bounding boxes do not fit the faces perfectly because every detection is a square window (resized to 36x36 pixels for the classifier), but they were located on the faces. There were many false positive detections, which we were able to remove by lowering the threshold of the non-maximum suppression function or by reducing the size of the original image. An NMS threshold of 5% with an image width of 1500 pixels gave the same result as an NMS threshold of 10% with an image width of 1200 pixels.
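
To make the effect of the NMS threshold concrete, here is a minimal pure-Python sketch of greedy IoU-based suppression. It is not the `torchvision.ops.nms` implementation used in the notebook, only an illustration of the same rule: a box is dropped when its IoU with a higher-scoring kept box exceeds the threshold. The boxes and scores are hypothetical values chosen for illustration, and `iou` and `greedy_nms` are names introduced here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap any kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical 128x128 windows: the second overlaps the first only slightly.
boxes = [(0, 0, 128, 128), (96, 64, 224, 192), (400, 0, 528, 128)]
scores = [0.99, 0.97, 0.96]

print(round(iou(boxes[0], boxes[1]), 4))   # 0.0667
print(greedy_nms(boxes, scores, 0.10))     # [0, 1, 2]
print(greedy_nms(boxes, scores, 0.05))     # [0, 2]
```

A pair of windows with an IoU of about 6.7% survives a 10% threshold but not a 5% one, which is why the lower threshold leaves fewer boxes.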

![10%, 1500x1000](https://paapuruhoshi.s-ul.eu/H5TRW5s1)
<div align="center">10% threshold, 1500x1000 image</div>

![5%, 1500x1000](https://paapuruhoshi.s-ul.eu/OQElaFCn)
<div align="center">5% threshold, 1500x1000</div>

![10%, 1200x800](https://paapuruhoshi.s-ul.eu/jvi9KBm3)
<div align="center">10% threshold, 1200x800</div>


In [91]:
import torch
import torchvision.transforms as transforms
import torchvision.ops as ops
from net import Net
import cv2
import warnings
from sliding_window import sliding_window
from image_pyramid import image_pyramid
import torch.nn.functional as F
import imutils

# suppress all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

In [92]:
face_detection_net = Net()
# map_location="cpu" lets weights saved on a GPU load on a CPU-only machine
face_detection_net.load_state_dict(torch.load("./saved_model.pth", map_location="cpu"))

<All keys matched successfully>

In [93]:
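# initialize variables used for the object detection procedure
MAX_WIDTH = 1200        # wider images are downscaled to this width
PYR_SCALE = 1.5         # downscale factor between pyramid layers
WINDOW_STEP = 16        # sliding-window stride in pixels
ROI_SIZE = (128, 128)   # sliding-window size (width, height)
INPUT_SIZE = (36, 36)   # input size expected by the classifier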

In [94]:
transform = transforms.Compose([ 
    transforms.ToPILImage(),
    transforms.ToTensor(),    
    transforms.Resize(INPUT_SIZE)
])

In [95]:
rois = []
locs = []

#image_path = "./image_face_detection/0000bee39176697a.jpg"
image_path = './face_detection_images/detection_test2.jpg'
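original_image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
# cv2.imread returns None instead of raising when the file cannot be read
if original_image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")

if original_image.shape[1] > MAX_WIDTH:
    original_image = imutils.resize(original_image, width=MAX_WIDTH)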

(H, W) = original_image.shape[:2]

pyramid = image_pyramid(original_image, scale=PYR_SCALE, min_size=ROI_SIZE)

for image in pyramid:
    # determine the scale factor between the *original* image
    # dimensions and the *current* layer of the pyramid
    scale = W / float(image.shape[1])
    
    # for each layer of the image pyramid, loop over the sliding
    # window locations
    for (x, y, roiOrig) in sliding_window(image, WINDOW_STEP, ROI_SIZE):
        # scale the (x, y)-coordinates of the ROI with respect to the
        # *original* image dimensions
        x = int(x * scale)
        y = int(y * scale)
        w = int(ROI_SIZE[0] * scale)
        h = int(ROI_SIZE[1] * scale)
        # take the ROI and preprocess it (convert to a tensor, resize to
        # INPUT_SIZE) so we can later classify the region
        roi_tensor_gray = transform(roiOrig)

        # update our list of ROIs and associated coordinates
        rois.append(roi_tensor_gray)
        locs.append((x, y, x + w, y + h))
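
As a worked example of the coordinate scaling in the loop above, the sketch below maps a window from a pyramid layer back to the original image using the same arithmetic. The layer width of 533 and the window positions are hypothetical inputs chosen for illustration, and `to_original_box` is a helper name introduced here, not part of the notebook.

```python
def to_original_box(x, y, layer_width, original_width, roi_size=(128, 128)):
    """Map a window's top-left corner in a pyramid layer to an
    (x1, y1, x2, y2) box in the original image."""
    scale = original_width / float(layer_width)
    x1, y1 = int(x * scale), int(y * scale)
    w, h = int(roi_size[0] * scale), int(roi_size[1] * scale)
    return (x1, y1, x1 + w, y1 + h)


# On the full-size layer the window keeps its 128x128 size.
print(to_original_box(544, 192, layer_width=1200, original_width=1200))
# (544, 192, 672, 320)

# On a smaller layer (hypothetical width 533) the same window covers
# a larger area of the original image.
print(to_original_box(16, 32, layer_width=533, original_width=1200))
# (36, 72, 324, 360)
```

This is why detections from deeper pyramid layers produce larger boxes: the window size is fixed per layer, but the scale factor grows as the layer shrinks.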


In [96]:
stacked_tensor = torch.stack(rois, dim=0)

print(stacked_tensor.size())

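# inference only: eval() matters if Net uses dropout or batch norm,
# and no_grad() avoids building the autograd graph for the whole batch
face_detection_net.eval()
with torch.no_grad():
    output = face_detection_net(stacked_tensor)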

torch.Size([4415, 1, 36, 36])
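
The batch size of 4415 can be reproduced with a little arithmetic. The sketch below assumes a 1200x800 input, and it assumes that `image_pyramid` and `sliding_window` (whose source is not shown here) follow a common pattern: each layer's width is divided by the scale and truncated, the height follows the aspect ratio the way `imutils.resize` computes it, layers smaller than the window are dropped, and window origins run over `range(0, size - window, step)`. `pyramid_sizes` and `count_windows` are names introduced for this illustration.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=(128, 128)):
    """Yield (width, height) of each pyramid layer, largest first."""
    yield (width, height)
    while True:
        new_width = int(width / scale)
        # aspect-ratio-preserving height, truncated to an integer
        new_height = int(height * (new_width / float(width)))
        if new_width < min_size[0] or new_height < min_size[1]:
            break
        width, height = new_width, new_height
        yield (width, height)


def count_windows(width, height, step=16, window=(128, 128)):
    """Number of window positions under the assumed range() convention."""
    nx = len(range(0, width - window[0], step))
    ny = len(range(0, height - window[1], step))
    return nx * ny


layers = list(pyramid_sizes(1200, 800))
counts = [count_windows(w, h) for (w, h) in layers]
print(layers)       # [(1200, 800), (800, 533), (533, 355), (355, 236), (236, 156)]
print(counts)       # [2814, 1092, 390, 105, 14]
print(sum(counts))  # 4415
```

Under these assumptions the total matches the first dimension of the printed tensor shape. That suggests the helpers behave this way, though it is an inference rather than something the notebook states.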


In [97]:
probs = F.softmax(output, dim=1)
probs_list = probs.tolist()

labels = {'valid_probs': [],
          'boxes': []}
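
For a two-class output, the softmax probability of the face class depends only on the difference between the two logits, so a 0.95 probability cut-off corresponds to a fixed logit margin. The sketch below assumes the network emits exactly two logits per ROI with index 1 as the face class, as the indexing `probs_list[i][1]` suggests. `softmax2` and the example logits are illustrative.

```python
import math


def softmax2(z0, z1):
    """Numerically stable softmax over two logits."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return (e0 / (e0 + e1), e1 / (e0 + e1))


# p_face = 1 / (1 + exp(-(z1 - z0))), so p_face >= 0.95 exactly when
# z1 - z0 >= ln(0.95 / 0.05) = ln(19).
margin = math.log(0.95 / 0.05)
print(round(margin, 3))  # 2.944

# Hypothetical logits: a margin of 3.0 passes, a margin of 2.0 does not.
print(softmax2(0.0, 3.0)[1] >= 0.95)  # True
print(softmax2(0.0, 2.0)[1] >= 0.95)  # False
```

In other words, an ROI is accepted only when the face logit exceeds the non-face logit by roughly 2.94 or more.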

In [98]:
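# keep only the ROIs whose face probability (class index 1) is at least 0.95
for prob, box in zip(probs_list, locs):
    if prob[1] >= 0.95:
        labels['valid_probs'].append(prob[1])
        labels['boxes'].append(box)

tensor_boxes = torch.tensor(labels['boxes'], dtype=torch.float32)
print(tensor_boxes)
tensor_probs = torch.tensor(labels['valid_probs'], dtype=torch.float32)

# nms returns the indices of the kept boxes, sorted by decreasing score;
# a box is discarded if its IoU with a higher-scoring box exceeds 0.1
valid_box = ops.nms(tensor_boxes, tensor_probs, iou_threshold=0.1)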

tensor([[ 544.,  192.,  672.,  320.],
        [ 560.,  192.,  688.,  320.],
        [ 560.,  208.,  688.,  336.],
        [ 496.,  224.,  624.,  352.],
        [ 512.,  224.,  640.,  352.],
        [ 528.,  224.,  656.,  352.],
        [ 544.,  224.,  672.,  352.],
        [ 560.,  224.,  688.,  352.],
        [ 496.,  240.,  624.,  368.],
        [ 512.,  240.,  640.,  368.],
        [ 528.,  240.,  656.,  368.],
        [ 544.,  240.,  672.,  368.],
        [ 560.,  240.,  688.,  368.],
        [  32.,  256.,  160.,  384.],
        [ 496.,  256.,  624.,  384.],
        [ 512.,  256.,  640.,  384.],
        [ 528.,  256.,  656.,  384.],
        [ 544.,  256.,  672.,  384.],
        [   0.,  272.,  128.,  400.],
        [  16.,  272.,  144.,  400.],
        [  32.,  272.,  160.,  400.],
        [  48.,  272.,  176.,  400.],
        [ 528.,  272.,  656.,  400.],
        [ 544.,  272.,  672.,  400.],
        [ 768.,  272.,  896.,  400.],
        [ 784.,  272.,  912.,  400.],
        [ 80

In [99]:
print(labels['boxes'][1])

print(valid_box)

(560, 192, 688, 320)
tensor([ 8, 20, 42, 25, 40, 45])


In [100]:
 
# Blue color in BGR 
color = (255, 0, 0) 
  
# Line thickness of 2 px
thickness = 2

img_color = cv2.imread(image_path)

if img_color.shape[1] > MAX_WIDTH:
    img_color = imutils.resize(img_color, width=MAX_WIDTH)

# Draw each box kept by NMS as a blue rectangle with a 2 px border
for index in valid_box.tolist():
    (x1, y1, x2, y2) = labels['boxes'][index]
    cv2.rectangle(img_color, (x1, y1), (x2, y2), color, thickness)

cv2.imshow('image', img_color)

# add wait key. window waits until user presses a key
cv2.waitKey(0)
# and finally destroy/close all open windows
cv2.destroyAllWindows()
