# Face Detection - OpenCV, Dlib and Deep Learning
# 1. Introduction
In this tutorial, we discuss the face detection methods available in OpenCV and Dlib and compare them quantitatively. We share code in C++ and Python for the following face detectors:

1. Haar Cascade Face Detector in OpenCV
2. Deep Learning based Face Detector in OpenCV
3. HoG Face Detector in Dlib
4. Deep Learning based Face Detector in Dlib

We will not go into the theory of any of them and only discuss their usage. We will also share some rules of thumb on which model to prefer according to your application.

# 2. Haar Cascade Face Detector in OpenCV
The Haar cascade based face detector was the state of the art in face detection for many years after Viola and Jones introduced it in 2001. There have been many improvements in recent years. OpenCV ships many Haar based models, which can be found [here](https://github.com/opencv/opencv/tree/master/data/haarcascades).

import cv2

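# Load the pre-trained Haar cascade models for faces and eyes
face_cascade = cv2.CascadeClassifier('models/haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('models/haarcascade_eye.xml')

# Read the image and convert it to grayscale before running the cascade
img = cv2.imread('data/MU.jpg', cv2.IMREAD_COLOR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Same values as before (1.3 and 5), passed by keyword for readability
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)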

for (x,y,w,h) in faces:
    img = cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = img[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex,ey,ew,eh) in eyes:
        cv2.rectangle(roi_color,(ex,ey),(ex+ew,ey+eh),(0,255,0),2)

cv2.imshow("Face Detection Comparison", img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The above code snippet loads the Haar cascade model file and applies it to a grayscale image. The output is a list of detected faces. Each member of the list is itself a list of 4 elements: the (x, y) coordinates of the top-left corner, followed by the width and height of the detected face.
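To make the output layout concrete, here is a minimal sketch that converts rectangles in the `(x, y, w, h)` layout into corner pairs, which is the form `cv2.rectangle` takes. The helper name `rects_to_corners`, the `min_size` filter, and the sample values are assumptions made for illustration; they are not part of OpenCV.

```python
def rects_to_corners(rects, min_size=0):
    """Convert (x, y, w, h) rectangles to (x1, y1, x2, y2) corner tuples.

    Rectangles narrower or shorter than min_size pixels are dropped.
    """
    corners = []
    for (x, y, w, h) in rects:
        if w < min_size or h < min_size:
            continue
        corners.append((x, y, x + w, y + h))
    return corners


# Hypothetical detections in the (x, y, w, h) layout described above.
sample_faces = [(50, 40, 120, 120), (300, 80, 24, 24)]

print(rects_to_corners(sample_faces))
print(rects_to_corners(sample_faces, min_size=30))
```

Dropping very small rectangles is one simple way to reduce spurious detections, at the cost of missing genuinely small faces.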

## Pros
1. Works almost real-time on CPU
2. Simple Architecture
3. Detects faces at different scales

## Cons
1. The major drawback of this method is that it produces a lot of false detections.
2. Doesn’t work on non-frontal images.
3. Doesn’t work under occlusion

# 3. DNN Face Detector in OpenCV
This model has been included in OpenCV since version 3.3. It is based on the [**Single-Shot-Multibox detector**](https://arxiv.org/abs/1512.02325) and uses a **ResNet-10** architecture as its backbone. The model was trained using images available from the web, but the source is not disclosed. OpenCV provides 2 models for this face detector.

* Floating point 16 version of the original Caffe implementation ( 5.4 MB )
* 8 bit quantized version using TensorFlow ( 2.7 MB )

We have included both the models along with the code.

import numpy as np
import cv2

modelFile = "models/res10_300x300_ssd_iter_140000.caffemodel"
configFile = "models/deploy.prototxt"

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(configFile, modelFile)

# load the input image and construct an input blob for the image
# by resizing to a fixed 300x300 pixels and then normalizing it
image = cv2.imread("data/MU.jpg")
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
    (300, 300), (104.0, 177.0, 123.0))

# pass the blob through the network and obtain the detections and
# predictions
print("[INFO] computing object detections...")
net.setInput(blob)
detections = net.forward()

# loop over the detections
for i in range(0, detections.shape[2]):
    # extract the confidence (i.e., probability) associated with the
    # prediction
    confidence = detections[0, 0, i, 2]

    # filter out weak detections by ensuring the `confidence` is
    # greater than the minimum confidence
    if confidence > 0.2:
        # compute the (x, y)-coordinates of the bounding box for the
        # object
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
 
        # draw the bounding box of the face along with the associated
        # probability
        text = "{:.2f}%".format(confidence * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY),
            (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

# show the output image
cv2.imshow("Output", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

[INFO] loading model...
[INFO] computing object detections...


In the above code, the image is converted to a blob and passed through the network using the `forward()` function. The output `detections` is a 4-D matrix, where

* The third dimension iterates over the detected faces (`i` is the iterator over the number of faces).
* The fourth dimension contains the score and the bounding box for each face. For example, `detections[0,0,0,2]` gives the confidence score for the first face, and `detections[0,0,0,3:7]` gives its bounding box.

The output coordinates of the bounding box are normalized to the range [0, 1]. Thus, the x coordinates should be multiplied by the width and the y coordinates by the height of the original image to get the correct bounding box on the image.
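The scaling step can be sketched without OpenCV by treating each detection as a plain row of seven numbers, with the confidence at index 2 and the normalized box at indices 3 to 6, as described above. The function name `scale_detections`, the sample rows, the 0.5 threshold, and the clamping to the image bounds are assumptions added for illustration.

```python
def scale_detections(rows, width, height, min_confidence=0.5):
    """Turn normalized detection rows into pixel boxes.

    Each row is assumed to hold the confidence at index 2 and the
    normalized (x1, y1, x2, y2) box at indices 3..6.
    """
    boxes = []
    for row in rows:
        confidence = row[2]
        if confidence < min_confidence:
            continue
        # x values scale with the image width, y values with the height.
        x1 = int(row[3] * width)
        y1 = int(row[4] * height)
        x2 = int(row[5] * width)
        y2 = int(row[6] * height)
        # Defensive clamp (an assumption): keep the box inside the image.
        x1 = max(0, min(x1, width - 1))
        y1 = max(0, min(y1, height - 1))
        x2 = max(0, min(x2, width - 1))
        y2 = max(0, min(y2, height - 1))
        boxes.append((x1, y1, x2, y2, confidence))
    return boxes


# Hypothetical rows; the first two entries of each row are not used here.
sample_rows = [
    [0, 1, 0.875, 0.25, 0.125, 0.5, 0.75],
    [0, 1, 0.75, -0.25, 0.0, 1.25, 1.0],
    [0, 1, 0.125, 0.5, 0.5, 0.75, 0.75],
]

for box in scale_detections(sample_rows, width=400, height=200):
    print(box)
```

The third row falls below the threshold and is skipped, and the second row shows the clamp pulling an out-of-range box back to the image edges.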

## Pros
The method has the following merits :

1. Most accurate out of the four methods
2. Runs at real-time on CPU
3. Works for different face orientations – up, down, left, right, side-face etc.
4. Works even under substantial occlusion
5. Detects faces across various scales ( detects big as well as tiny faces )

The DNN based detector overcomes all the drawbacks of Haar cascade based detector, without compromising on any benefit provided by Haar. We could not see any major drawback for this method except that it is slower than the Dlib HoG based Face Detector discussed next.

# 4. HoG Face Detector in Dlib
This is a widely used face detection model, based on HoG features and an SVM. You can read more about HoG in [this post](https://www.learnopencv.com/histogram-of-oriented-gradients/). The model is built out of 5 HoG filters: front looking, left looking, right looking, front looking but rotated left, and front looking but rotated right. The model comes embedded in the [header file](https://github.com/davisking/dlib/blob/master/dlib/image_processing/frontal_face_detector.h) itself.

The training dataset consists of 2825 images obtained from the LFW dataset and manually annotated by Davis King, the author of Dlib.

import cv2
import dlib

image = cv2.imread("data/MU.jpg")
hogFaceDetector = dlib.get_frontal_face_detector()
faceRects = hogFaceDetector(image, 0)
for faceRect in faceRects:
    x1 = faceRect.left()
    y1 = faceRect.top()
    x2 = faceRect.right()
    y2 = faceRect.bottom()
    cv2.rectangle(image, (x1, y1), (x2, y2),(0, 0, 255), 2)
cv2.imshow("image",image)
cv2.waitKey(0)
cv2.destroyAllWindows()

In the above code, we first load the face detector. Then we pass the image through the detector. The second argument is the number of times we want to upscale the image. The more you upscale, the better the chances of detecting smaller faces. However, upscaling the image has a substantial impact on computation speed. The output is a list of faces, each given by the (x, y) coordinates of its diagonal corners.
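The trade-off between upscaling and speed can be estimated with simple arithmetic. The sketch below assumes that each upscaling step doubles the image width and height, and that the detector needs faces of roughly 80 pixels; both numbers are assumptions used for illustration, and the function name `upsample_tradeoff` is hypothetical.

```python
def upsample_tradeoff(min_face=80, max_upsamples=3, scale_per_step=2):
    """Estimate the effect of upscaling the image n times.

    Returns (n, smallest detectable face in original pixels,
    growth in the number of pixels to scan) for each n.
    """
    rows = []
    for n in range(max_upsamples + 1):
        factor = scale_per_step ** n
        smallest_face = min_face / factor
        pixel_growth = factor ** 2
        rows.append((n, smallest_face, pixel_growth))
    return rows


for n, face, growth in upsample_tradeoff():
    print(f"upsample={n}: faces from ~{face:g}px, {growth}x the pixels to scan")
```

Under these assumptions, every extra upscaling step halves the smallest detectable face but quadruples the number of pixels the detector has to scan.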

## Pros
1. Fastest method on CPU
2. Works very well for frontal and slightly non-frontal faces
3. Light-weight model as compared to the other three.
4. Works under small occlusion

Basically, this method works under most cases except a few as discussed below.

## Cons
1. The major drawback is that it does not detect small faces, as it is trained for a minimum face size of 80×80. Thus, you need to make sure that the faces in your application are larger than that. You can, however, train your own face detector for smaller faces.
2. The bounding box often excludes part of forehead and even part of chin sometimes.
3. Does not work very well under substantial occlusion
4. Does not work for side face and extreme non-frontal faces, like looking down or up.

# 4. CNN Face Detector in Dlib
This method uses a **Maximum-Margin Object Detector** ( [MMOD](https://arxiv.org/pdf/1502.00046.pdf) ) with CNN based features. The training process for this method is very simple and you don’t need a large amount of data to train a custom object detector. For more information on training, visit [the website](http://blog.dlib.net/2016/10/easily-create-high-quality-object.html).

The model can be downloaded from the [dlib-models repository](https://github.com/davisking/dlib-models).
It uses a dataset manually labeled by its author, Davis King, consisting of 7220 images drawn from various datasets like ImageNet, PASCAL VOC, VGG, WIDER and Face Scrub.

import dlib
import cv2

model_path = "models/mmod_human_face_detector.dat"
cnn_face_detector = dlib.cnn_face_detection_model_v1(model_path)
img = cv2.imread("data/MU.jpg")
# The 1 in the second argument indicates that we should upsample the image
# 1 time.  This will make everything bigger and allow us to detect more faces.
dets = cnn_face_detector(img, 1)
print("Number of faces detected: {}".format(len(dets)))
for i, d in enumerate(dets):
    x1 = d.rect.left()
    y1 = d.rect.top()
    x2 = d.rect.right()
    y2 = d.rect.bottom()
    print("Detection {}: Left: {} Top: {} Right: {} Bottom: {} Confidence: {}".format(
        i, d.rect.left(), d.rect.top(), d.rect.right(), d.rect.bottom(), d.confidence))
    cv2.rectangle(img, (x1, y1), (x2, y2),(0, 0, 255), 2) 

rects = dlib.rectangles()
rects.extend([d.rect for d in dets])

cv2.imshow("image",img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Number of faces detected: 4
Detection 0: Left: 257 Top: 157 Right: 296 Bottom: 197 Confidence: 1.1261976957321167
Detection 1: Left: 359 Top: 37 Right: 416 Bottom: 94 Confidence: 1.114716649055481
Detection 2: Left: 462 Top: 55 Right: 519 Bottom: 112 Confidence: 1.0968915224075317
Detection 3: Left: 146 Top: 89 Right: 202 Bottom: 146 Confidence: 1.088628888130188


The code is similar to the HoG detector except that in this case, we load the CNN face detection model. Also, each detection holds its coordinates inside a `rect` object and carries a `confidence` score.
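As a small illustration of working with these values, the sketch below takes the four detections printed above as plain tuples, keeps those above a confidence threshold, and converts them to the `(x, y, w, h)` layout used by the Haar detector. The helper names and the threshold of 1.1 are assumptions chosen for illustration, and the width and height are computed as simple differences of the corner coordinates.

```python
# (left, top, right, bottom, confidence) copied from the printed output above.
detections = [
    (257, 157, 296, 197, 1.1261976957321167),
    (359, 37, 416, 94, 1.114716649055481),
    (462, 55, 519, 112, 1.0968915224075317),
    (146, 89, 202, 146, 1.088628888130188),
]


def keep_confident(dets, threshold):
    """Keep detections whose confidence is at least the threshold."""
    return [d for d in dets if d[4] >= threshold]


def to_xywh(det):
    """Convert (left, top, right, bottom, confidence) to (x, y, w, h)."""
    left, top, right, bottom, _ = det
    return (left, top, right - left, bottom - top)


for det in keep_confident(detections, threshold=1.1):
    print(to_xywh(det), round(det[4], 3))
```

Note that the detected boxes here are roughly 40 to 57 pixels wide, which is consistent with the image having been upsampled once before detection.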

## Pros
1. Works for different face orientations
2. Robust to occlusion
3. Works very fast on GPU
4. Very easy training process

## Cons
1. Very slow on CPU
2. Does not detect small faces, as it is trained for a minimum face size of 80×80. Thus, you need to make sure that the faces in your application are larger than that. You can, however, train your own face detector for smaller faces.
3. The bounding box is even smaller than that of the HoG detector.

# 5. Accuracy Comparison
I tried to evaluate the 4 models on the FDDB dataset using [the script used for evaluating the OpenCV-DNN model](https://github.com/opencv/opencv/blob/master/modules/dnn/misc/face_detector_accuracy.py). However, I found surprising results. *Dlib* had worse numbers than *Haar*, although visually the dlib outputs look much better. Given below are the precision scores for the 4 methods.

![](https://www.learnopencv.com/wp-content/uploads/2018/10/face-detection-coco-comparison.jpg)

Where,
* AP_50 = Precision when the overlap between the ground truth and the predicted bounding box is at least 50% ( IoU = 50% )
* AP_75 = Precision when the overlap between the ground truth and the predicted bounding box is at least 75% ( IoU = 75% )
* AP_Small = Average Precision for small faces ( averaged over IoU thresholds from 50% to 95% )
* AP_medium = Average Precision for medium faces ( averaged over IoU thresholds from 50% to 95% )
* AP_Large = Average Precision for large faces ( averaged over IoU thresholds from 50% to 95% )
* mAP = Average Precision across different IoU thresholds ( averaged over IoU thresholds from 50% to 95% )
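All of these metrics rest on the Intersection over Union (IoU) between a predicted box and a ground-truth box. A minimal sketch of that computation follows, assuming boxes are given as `(x1, y1, x2, y2)` corner tuples; the function name `iou` and the sample boxes are chosen for illustration and are not taken from the evaluation script.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 100, 100)
shifted = (50, 0, 150, 100)  # hypothetical prediction, shifted right by half a box

print(iou(ground_truth, ground_truth))
print(iou(ground_truth, shifted))
```

A detection counts as correct at a given threshold only if its IoU with a ground-truth box reaches that threshold, so the shifted box in this example would fail even the 50% criterion.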

On closer inspection I found that this evaluation is not fair for Dlib.

## 5.1. Evaluating accuracy the wrong way!
According to my analysis, the reasons for the lower numbers for dlib are as follows:

1. The major reason is that dlib was trained using standard datasets, but without their annotations. The images were annotated by its author. Thus, I found that even when the faces are detected, **the bounding boxes are quite different** from those of Haar or OpenCV-DNN. They were smaller and often clipped parts of the forehead and chin, as shown below.

![](https://www.learnopencv.com/wp-content/uploads/2018/10/fd-acc-result3-e1539872783684.jpg)

This can be further explained from the AP_50 and AP_75 scores in the above graph. AP_X means precision when there is X% overlap between ground truth and detected boxes. The AP_75 scores for the dlib models are 0, although their AP_50 scores are higher than that of Haar. This only means that the Dlib models are able to detect **more faces** than Haar, but the smaller bounding boxes of dlib lower their AP_75 and other numbers.

2. The second reason is that dlib is unable to detect small faces which further drags down the numbers.
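The effect of a smaller bounding box on the IoU thresholds can be checked with a small worked example. When a square detection lies entirely inside a square ground-truth box, the IoU is just the ratio of the two areas. The 100 pixel and 80 pixel sides below are hypothetical values, not measurements from the evaluation, and the helper names are chosen for illustration.

```python
import math


def nested_iou(outer_side, inner_side):
    """IoU of a square box fully contained in a larger square box."""
    return (inner_side ** 2) / (outer_side ** 2)


def min_side_ratio(iou_threshold):
    """Smallest inner/outer side ratio that still reaches the threshold."""
    return math.sqrt(iou_threshold)


value = nested_iou(outer_side=100, inner_side=80)
print(f"IoU = {value:.2f}, passes 0.50: {value >= 0.5}, passes 0.75: {value >= 0.75}")
print(f"side ratio needed for IoU 0.75: {min_side_ratio(0.75):.3f}")
```

In this example, a detection whose sides are only 20% shorter than the ground truth still counts at the 50% threshold but is rejected at the 75% threshold, which matches the pattern described above.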

# 6. Speed Comparison
We used a 300×300 image for the comparison of the methods. The MMOD detector can be run on a GPU, but OpenCV does not yet support NVIDIA GPUs. So, we evaluate all the methods on CPU and additionally report the result for MMOD on GPU.

Hardware used:
* Processor : Intel Core i7 6850K – 6 Core
* RAM : 32 GB
* GPU : NVIDIA GTX 1080 Ti with 11 GB RAM
* OS : Linux 16.04 LTS
* Programming Language : Python

We run each method 10000 times on the given image, repeat this for 10 such iterations, and average the time taken. The results are given below.
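A timing loop of this kind could look like the sketch below. It is an illustration of the procedure described above, not the script used for the reported numbers; the function name `average_run_time` and the stand-in workload `dummy_detector` are assumptions, and a real measurement would call one of the detectors instead.

```python
import time


def average_run_time(func, runs_per_batch=10000, batches=10):
    """Average time of one call to func, in seconds.

    func is called runs_per_batch times per batch, and the per-call
    time is averaged over all batches.
    """
    batch_means = []
    for _ in range(batches):
        start = time.perf_counter()
        for _ in range(runs_per_batch):
            func()
        elapsed = time.perf_counter() - start
        batch_means.append(elapsed / runs_per_batch)
    return sum(batch_means) / len(batch_means)


def dummy_detector():
    # Stand-in workload so that the sketch runs without any model files.
    return sum(range(100))


seconds = average_run_time(dummy_detector, runs_per_batch=1000, batches=3)
print(f"{seconds * 1e6:.2f} microseconds per call, {1 / seconds:.0f} calls per second")
```

Using `time.perf_counter()` rather than `time.time()` gives a monotonic, high-resolution clock, which matters when a single call takes only a few milliseconds.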

![](https://www.learnopencv.com/wp-content/uploads/2018/10/face-detection-speed-comparison.jpg)

As you can see, for an image of this size, **all the methods perform in real-time**, except MMOD. **The MMOD detector is very fast on a GPU but is very slow on a CPU**.

It should also be noted that these numbers can be different on different systems.

# 7. Comparison under various conditions
Apart from accuracy and speed, there are some other factors which help us decide which one to use. In this section we will compare the methods on the basis of various other factors which are also important.

## 7.1. Detection across scale
We will see an example where, in the same video, the person moves back and forth, making the face smaller and bigger. We notice that OpenCV-DNN detects all the faces, while Dlib detects only the larger ones. We also show the size of the detected face along with the bounding box.

![](https://www.learnopencv.com/wp-content/uploads/2018/10/face-detection-scale-comparison.gif)

It can be seen that the dlib based methods detect faces down to a size of about 70×70, below which they fail. As discussed earlier, I think this is the major drawback of the Dlib based methods, since in most cases it is not possible to know the size of the face beforehand. We can work around this problem by upscaling the image, but then the speed advantage of dlib over OpenCV-DNN goes away.

## 7.2. Non-frontal Face
A non-frontal face can be looking right, left, up or down. Again, to be fair to dlib, we make sure the face size is more than 80×80. Given below are some examples.

![](https://www.learnopencv.com/wp-content/uploads/2018/10/fd-non-frontal-result2.jpg)

As expected, the Haar based detector fails totally. The HoG based detector does detect left or right looking faces ( since it was trained on them ), but not as accurately as the DNN based detectors of OpenCV and Dlib.

## 7.3. Occlusion
Let us see how well the methods perform under occlusion.

![](https://www.learnopencv.com/wp-content/uploads/2018/10/fd-occlusion-result1.jpg)

Again, the DNN methods outperform the other two, with OpenCV-DNN slightly better than Dlib-MMOD. This is mainly because the CNN features are much more robust than HoG or Haar features.

# 8. Conclusion
We discussed the pros and cons of each method in the respective sections. I recommend trying both the OpenCV-DNN and HoG methods for your application and deciding accordingly. We share some tips to get started.

**General Case**

In most applications, we won’t know the size of the face in the image beforehand. Thus, it is better to use the OpenCV-DNN method, as it is pretty fast and very accurate, even for small faces. It also detects faces at various angles. We recommend using **OpenCV-DNN** in most cases.

**For medium to large image sizes**

Dlib HoG is the fastest method on CPU, but it does not detect small faces ( < 70x70 ). So, if you know that your application will not be dealing with very small faces ( for example, a selfie app ), then the HoG based face detector is a better option. Also, if you can use a GPU, then the **MMOD face detector** is the best option, as it is very fast on GPU and also detects faces at various angles.
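The two rules of thumb above can be paraphrased as a small helper. This is only a restatement of the tips in code form, not a substitute for testing on your own data; the function name `suggest_detector`, its parameters, and the returned labels are hypothetical, and the 70 pixel default comes from the face size quoted above.

```python
def suggest_detector(min_face_px=None, has_gpu=False, small_face_px=70):
    """Suggest a detector from the rules of thumb in this section.

    min_face_px is the smallest face side expected in the application,
    or None when the face size is not known in advance.
    """
    # General case: unknown or small faces favor OpenCV-DNN.
    if min_face_px is None or min_face_px < small_face_px:
        return "OpenCV-DNN"
    # Faces are known to be large enough for the Dlib detectors.
    if has_gpu:
        return "Dlib MMOD"
    return "Dlib HoG"


print(suggest_detector())
print(suggest_detector(min_face_px=150))
print(suggest_detector(min_face_px=150, has_gpu=True))
```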

**High resolution images**

Since it is not practical to feed high resolution images to these algorithms ( for computation speed ), the **HoG / MMOD** detectors might fail when you scale down the image. On the other hand, the **OpenCV-DNN** method can be used for these since it detects small faces.
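The arithmetic behind this tip can be sketched as follows: compute the factor used to shrink the image, see how small a face becomes after shrinking, and map a detected box back to the original resolution. The function names, the 500 pixel working size, and the sample numbers are assumptions chosen for illustration.

```python
def downscale_factor(width, height, max_side=500):
    """Factor that shrinks the longer image side to at most max_side."""
    longest = max(width, height)
    return min(1.0, max_side / longest)


def map_box_to_original(box, factor):
    """Map an (x1, y1, x2, y2) box from the downscaled image back
    to the coordinates of the original image."""
    return tuple(int(round(value / factor)) for value in box)


factor = downscale_factor(4000, 3000)
print("scale factor:", factor)

# A hypothetical 64 pixel face in the original image after downscaling.
print("face size after downscaling:", 64 * factor)

# A hypothetical box detected on the downscaled image.
print(map_box_to_original((50, 25, 100, 75), factor))
```

In this example a 64 pixel face shrinks to 8 pixels, far below the sizes the HoG and MMOD detectors handle, which is why a detector that copes with small faces is the safer choice here.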

# 9. Source: [learnopencv](https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/)