# Face detectors

In [1]:
from imutils.video import VideoStream
import imutils
import time
import cv2
import dlib
import numpy as np

## 1. Haar Cascade

OpenCV’s Haar cascade face detector is the original face detector that shipped with the library. It’s also the face detector that is familiar to almost everyone.

**Pros:**

- Very fast, capable of running in super real-time
- Low computational requirements — can easily be run on embedded, resource-constrained devices such as the Raspberry Pi (RPi), NVIDIA Jetson Nano, and Google Coral
- Small model size (just over 400KB; for reference, most deep neural networks will be anywhere between 20-200MB).

**Cons:**

- Highly prone to false-positive detections
- Typically requires manual tuning of the detectMultiScale parameters
- Not anywhere near as accurate as its HOG + Linear SVM and deep learning-based face detection counterparts

In [None]:
#Initialize the model
#Remember to use the full path
detector = cv2.CascadeClassifier('./haarcascade_frontalface_default.xml')

In [None]:
#Read the image, resize it, and convert it to grayscale
image = cv2.imread('./messi.png')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

Before running the detector, let’s take a look at what each detectMultiScale argument means:

**scaleFactor:** 

How much the image size is reduced at each image scale. This value is used to create the scale pyramid, which lets the detector find faces at multiple scales (some faces are closer to the foreground and therefore larger, while others are in the background and smaller). **A value of 1.05 indicates that we are reducing the size of the image by roughly 5% at each level in the pyramid.**
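To get a feel for how scaleFactor shapes the pyramid, the sketch below lists the image size at each level. It is a simplified illustration rather than OpenCV's internal implementation: the helper name `pyramid_sizes` and the 24×24 base window are assumptions made for the example.

```python
def pyramid_sizes(width, height, scale_factor=1.05, window=(24, 24)):
    """List the (width, height) of every pyramid level.

    Hypothetical helper for illustration only. `window` is an assumed
    base detection window; the pyramid stops once the shrunken image
    can no longer contain it.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")

    sizes = []
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < window[0] or h < window[1]:
            break
        sizes.append((w, h))
        # the next level is smaller by a factor of scale_factor
        scale *= scale_factor
    return sizes


fine = pyramid_sizes(500, 375, scale_factor=1.05)
coarse = pyramid_sizes(500, 375, scale_factor=1.3)
print(len(fine), "levels at 1.05 vs", len(coarse), "levels at 1.3")
```

A smaller scaleFactor produces more levels, so faces are less likely to fall between two scales, but the detector has more windows to evaluate.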


**minNeighbors:** 

How many neighbors each window should have for the area in the window to be considered a face. The cascade classifier will detect multiple windows around a face. **This parameter controls how many rectangles (neighbors) need to be detected for the window to be labeled a face.**
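The idea behind minNeighbors can be sketched in plain Python: raw windows that overlap heavily are grouped together, and a group only counts as a face if it contains enough windows. This is an approximation for intuition only. OpenCV's own grouping logic differs, and the names `iou`, `group_windows`, and `iou_threshold` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def group_windows(windows, min_neighbors=5, iou_threshold=0.5):
    """Keep only clusters supported by at least `min_neighbors` windows."""
    clusters = []
    for win in windows:
        for cluster in clusters:
            # compare against the first window of each existing cluster
            if iou(win, cluster[0]) >= iou_threshold:
                cluster.append(win)
                break
        else:
            clusters.append([win])

    faces = []
    for cluster in clusters:
        if len(cluster) >= min_neighbors:
            n = len(cluster)
            # report the average box of the cluster
            faces.append(tuple(sum(box[i] for box in cluster) // n for i in range(4)))
    return faces


# six made-up windows around one face, plus a single stray window
windows = [(100, 100, 50, 50), (102, 101, 50, 50), (98, 99, 50, 50),
           (101, 100, 50, 50), (99, 102, 50, 50), (100, 98, 50, 50),
           (300, 300, 40, 40)]
print(group_windows(windows, min_neighbors=5))
```

Raising minNeighbors removes isolated windows like the stray one above, which reduces false positives but can also drop faces that are only weakly detected.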


**minSize:** 

A tuple of width and height (in pixels) **indicating the window’s minimum size**. **Bounding boxes smaller than this size are ignored.** It is a good idea to start with (30, 30) and fine-tune from there.

In [None]:
#Run the model
rects = detector.detectMultiScale(gray, scaleFactor=1.05,
                                  minNeighbors=5, minSize=(30, 30),
                                  flags=cv2.CASCADE_SCALE_IMAGE)

In [None]:
#get the coordinates of the bounding box and draw it
for (x, y, w, h) in rects:
    cv2.rectangle(image, (x,y), (x+w, y+h), (0,255,0), 2)

In [None]:
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### On video

In [None]:
vs = VideoStream(0).start()
time.sleep(2.0)

In [None]:
while True:
    # grab the next frame from the video stream
    frame = vs.read()
    if frame is None:
        # stop if no frame could be read (e.g., the camera is unavailable)
        break

    # resize the frame and convert it to grayscale
    frame = imutils.resize(frame, width=500)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # perform face detection
    rects = detector.detectMultiScale(gray, scaleFactor=1.05,
                                      minNeighbors=5, minSize=(30, 30),
                                      flags=cv2.CASCADE_SCALE_IMAGE)

    # loop over the bounding boxes
    for (x, y, w, h) in rects:
        # draw the face bounding box on the image
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

## 2. OpenCV DNN face detector

OpenCV’s deep learning face detector is based on a Single Shot Detector (SSD) with a small ResNet backbone, allowing it to be both accurate and fast.

**Pros:**

- Accurate face detector
- Utilizes modern deep learning algorithms
- No parameter tuning required
- Can run in real-time on modern laptops and desktops
- Model is reasonably sized (just over 10MB)
- Relies on OpenCV’s cv2.dnn module
- Can be made faster on embedded devices by using OpenVINO and the Movidius NCS

**Cons:**

- More accurate than Haar cascades and HOG + Linear SVM, but not as accurate as dlib’s CNN MMOD face detector
- May have unconscious biases in the training set — may not detect darker-skinned people as accurately as lighter-skinned people

In [None]:
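#Initialize the model
#We need to point to both the prototxt (network architecture) and the caffemodel (weights) files
#NOTE: './deploy.prototxt' is an assumed filename; use the prototxt that ships with your model
net = cv2.dnn.readNetFromCaffe('./deploy.prototxt', './res10_300x300_ssd_iter_140000.caffemodel')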

In [None]:
#The dnn.blobFromImage takes care of pre-processing which includes setting the blob dimensions and normalization.
image = cv2.imread('./iron_chic.jpg')
(h,w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300,300)), 1.0, (300,300), (104.0, 177.0,123.0))
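For intuition, the pre-processing can be approximated by hand with NumPy. The sketch assumes the image has already been resized to 300×300 and that the channel order is left untouched (BGR), as in the call above. `manual_blob` is a hypothetical name, and the uniform gray test image is made up.

```python
import numpy as np


def manual_blob(image_bgr, mean=(104.0, 177.0, 123.0), scale=1.0):
    """Approximate the blob layout for an already-resized BGR image.

    Hypothetical helper: subtract the per-channel mean, apply the scale
    factor, then reorder from (H, W, C) to (1, C, H, W).
    """
    img = image_bgr.astype(np.float32)
    img -= np.array(mean, dtype=np.float32)   # per-channel mean subtraction
    img *= scale                              # scale factor (1.0 above)
    return img.transpose(2, 0, 1)[np.newaxis, ...]


# a made-up uniform gray image standing in for the resized input
fake_image = np.full((300, 300, 3), 128, dtype=np.uint8)
fake_blob = manual_blob(fake_image)
print(fake_blob.shape, fake_blob[0, :, 0, 0])
```

The resulting array has a batch dimension followed by channels, height, and width, which is the layout the network expects from `net.setInput`.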

In [None]:
#pass the blob through the network and obtain the detections and predictions
net.setInput(blob)
detections = net.forward()

In [None]:
#loop over the obtained detections and draw them
for i in range(0, detections.shape[2]):
    #extract the confidence (probability) associated with the prediction
    confidence = detections[0, 0, i, 2]

    if confidence > 0.5: #set threshold
        #compute the x, y coordinates of the bounding box
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")

        #Draw the bounding box
        text = '{:.2f}'.format(confidence*100)
        #place the text above the box, or just inside it when the box is near the top of the image
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

#show the result once all detections have been drawn
cv2.imshow("output", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
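Each detection is a row of seven values. As the loop above shows, index 2 holds the confidence and indices 3 to 6 hold the box corners, normalized to the image size. The network may return corners slightly outside the [0, 1] range, so a sketch like the following clips them to the image before drawing. `decode_detection` is a hypothetical helper, not part of OpenCV, and the example row is made up.

```python
def decode_detection(row, width, height, threshold=0.5):
    """Turn one detection row into a pixel box, or None if below threshold.

    Hypothetical helper. `row` is assumed to be a 7-element sequence laid
    out as in the loop above: confidence at index 2 and normalized
    (startX, startY, endX, endY) at indices 3 to 6.
    """
    confidence = float(row[2])
    if confidence <= threshold:
        return None

    x1, y1, x2, y2 = (float(v) for v in row[3:7])
    # scale to pixels, then clip so the box stays inside the image
    start_x = min(max(int(x1 * width), 0), width - 1)
    start_y = min(max(int(y1 * height), 0), height - 1)
    end_x = min(max(int(x2 * width), 0), width - 1)
    end_y = min(max(int(y2 * height), 0), height - 1)
    return confidence, (start_x, start_y, end_x, end_y)


# a made-up row whose bottom edge spills past the image
row = [0.0, 1.0, 0.9, 0.25, 0.5, 0.75, 1.1]
print(decode_detection(row, width=400, height=300))
```

With the real output, the same function could be applied to each `detections[0, 0, i]` row in place of the manual indexing.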


## 3. DLIB HOG + Linear SVM

Similar to Haar cascades, HOG + Linear SVM relies on image pyramids and sliding windows to detect objects/faces in an image.

The algorithm is a classic in computer vision literature and is still used today.

**Pros:**

- More accurate than Haar cascades
- More stable detection than Haar cascades (i.e., fewer parameters to tune)
- Extremely well documented, both in terms of the dlib implementation and the HOG + Linear SVM framework in the computer vision literature

**Cons:**

- Only works on frontal views of the face — profile faces will not be detected as the HOG descriptor does not tolerate changes in rotation or viewing angle well
- Requires an additional library (dlib) be installed — not necessarily a problem per se, but if you’re using just OpenCV, then you may find adding another library into the mix cumbersome
- Not as accurate as deep learning-based face detectors
- For the accuracy, it’s actually quite computationally expensive due to image pyramid construction, sliding windows, and computing HOG features at every stop of the window

In [2]:
#First, define a helper function that converts a dlib rectangle object to an
# OpenCV-style bounding box and ensures the box lies within the bounds of the
# input image

def convert_and_trim_bb(image, rect):
	# extract the starting and ending (x, y)-coordinates of the bounding box
	startX = rect.left()
	startY = rect.top()
	endX = rect.right()
	endY = rect.bottom()

	# ensure the bounding box coordinates fall within the spatial dimensions of the image
	startX = max(0, startX)
	startY = max(0, startY)
	endX = min(endX, image.shape[1])
	endY = min(endY, image.shape[0])

	# compute the width and height of the bounding box
	w = endX - startX
	h = endY - startY

	# return our bounding box coordinates
	return (startX, startY, w, h)
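A quick way to check the helper without running dlib is to feed it a stand-in rectangle. The `FakeRect` class and the 600×400 image shape below are made up for the check, and the function body repeats the helper above so the snippet runs on its own.

```python
from types import SimpleNamespace


def convert_and_trim_bb(image, rect):
    # same logic as the helper above, repeated so this snippet is self-contained
    startX = max(0, rect.left())
    startY = max(0, rect.top())
    endX = min(rect.right(), image.shape[1])
    endY = min(rect.bottom(), image.shape[0])
    return (startX, startY, endX - startX, endY - startY)


class FakeRect:
    """Stand-in for a dlib rectangle exposing the four accessors used above."""

    def __init__(self, left, top, right, bottom):
        self._coords = (left, top, right, bottom)

    def left(self):
        return self._coords[0]

    def top(self):
        return self._coords[1]

    def right(self):
        return self._coords[2]

    def bottom(self):
        return self._coords[3]


# only the .shape attribute is needed, given as (height, width, channels)
fake_image = SimpleNamespace(shape=(400, 600, 3))

# a rectangle that spills past the left and right edges of the image
box = convert_and_trim_bb(fake_image, FakeRect(-10, 20, 650, 380))
print(box)
```

The returned tuple uses the (x, y, w, h) layout, so it can be passed to the same drawing loop used for the Haar cascade results.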

In [3]:
#Initialize the detector
detector = dlib.get_frontal_face_detector()

In [4]:
#read, resize and recolor the image
image = cv2.imread('./images/concert.jpg')
image = imutils.resize(image, width=600)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

In [5]:
#perform face detection
rects = detector(rgb, 1 ) # 1 is the number of times to upsample
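Each upsampling pass enlarges the image before detection, which lets the detector find smaller faces at the cost of more computation. As a rough rule of thumb, the arithmetic can be sketched as below. Both the doubling per pass and the 80-pixel base size are assumptions drawn from comments in dlib's example code, not guarantees, and `min_detectable_face` is a hypothetical helper.

```python
def min_detectable_face(upsample_times, base_size=80):
    """Approximate smallest face side, in pixels of the original image.

    Hypothetical helper. `base_size` is an assumed minimum face size for
    the detector; each upsampling pass is assumed to double the image,
    halving the size a face needs in the original image.
    """
    if upsample_times < 0:
        raise ValueError("upsample_times must be non-negative")
    return base_size / (2 ** upsample_times)


for n in range(3):
    print(n, "upsample pass(es): about", min_detectable_face(n), "px")
```

Since every pass also multiplies the number of pixels to scan, it is usually worth starting with 0 or 1 and increasing only if small faces are being missed.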

In [6]:
#obtain the boxes coordinates from the face detection
boxes = [convert_and_trim_bb(image, r) for r in rects]

#Draw the bb on the image
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

## 4. Dlib’s CNN face detector

Davis King, the creator of dlib, trained a CNN face detector based on his work on max-margin object detection (MMOD). The method is highly accurate.

Without GPU acceleration, this model cannot realistically run in real-time.

**Pros:**

- Incredibly accurate face detector
- Small model size (under 1MB)
- Expertly implemented and documented

**Cons:**

- Requires an additional library (dlib) be installed
- Code is more verbose — end-user must take care to convert and trim bounding box coordinates if using OpenCV
- Cannot run in real-time without GPU acceleration
- Not out-of-the-box compatible for acceleration via OpenVINO, Movidius NCS, NVIDIA Jetson Nano, or Google Coral

In [8]:
#instead of using get_frontal_face_detector() we use cnn_face_detection_model_v1() and add the model path

#Initialize the detector
detector = dlib.cnn_face_detection_model_v1('./mmod_human_face_detector.dat')

In [9]:
#read, resize and recolor the image
image = cv2.imread('./images/concert.jpg')
image = imutils.resize(image, width=600)
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

In [10]:
#perform face detection. This takes noticeably longer than dlib's HOG + Linear SVM detector
rects = detector(rgb, 1)  # 1 is the number of times to upsample

In [12]:
#obtain the boxes coordinates from the face detection
boxes = [convert_and_trim_bb(image, r.rect) for r in rects]

#Draw the bb on the image
for (x, y, w, h) in boxes:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
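Since speed is one of the deciding factors between these four detectors, it helps to measure it on your own hardware. Below is a minimal timing sketch, where `average_latency` is a hypothetical helper and `detect_fn` is any callable that wraps one of the detectors above. The dummy detector exists only so the snippet runs on its own.

```python
import time


def average_latency(detect_fn, image, runs=10):
    """Return the mean time in seconds that detect_fn(image) takes.

    Hypothetical helper: one untimed warm-up call, then `runs` timed calls.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    detect_fn(image)  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(runs):
        detect_fn(image)
    return (time.perf_counter() - start) / runs


def dummy_detector(image):
    # placeholder that "detects" nothing; swap in a real detector call
    return []


latency = average_latency(dummy_detector, image=None, runs=5)
print("average latency: {:.6f} s".format(latency))
```

With the dlib detectors above, the call could look like `average_latency(lambda img: detector(img, 1), rgb)`, and for the Haar cascade the lambda would wrap `detectMultiScale` on the grayscale image.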