# Multi Task Cascaded CNN

MTCNN (Multi-task Cascaded Convolutional Neural Networks) is a three-stage algorithm that detects the bounding boxes of faces in an image along with five facial landmark points for each face. Each stage refines the detection results by passing its inputs through a CNN, which returns candidate bounding boxes with their scores, and then applying non-maximum suppression to discard overlapping candidates.
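
The non-maximum suppression step can be illustrated with a minimal pure-Python sketch. It is a generic greedy version, not the code the `mtcnn` package runs internally. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates and one separate candidate (made-up values)
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is dropped in favour of the higher-scoring one.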


In stage 1, the input image is scaled down multiple times to build an image pyramid, and each scaled version of the image is passed through the stage 1 CNN. In stages 2 and 3, we extract an image patch for each bounding box, resize it (to 24x24 in stage 2 and 48x48 in stage 3), and forward it through the CNN of that stage. Besides bounding boxes and scores, stage 3 additionally computes five facial landmark points for each bounding box.
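
As a rough sketch of how the stage 1 image pyramid can be built, the function below computes the list of scale factors to apply to the input image. The function name `pyramid_scales` is hypothetical. The defaults are assumptions based on values commonly used with MTCNN: a 12x12 input window for the stage 1 network, a minimum face size of 20 pixels, and a scale step of 0.709.

```python
def pyramid_scales(height, width, min_face_size=20, scale_factor=0.709, cell_size=12):
    """Return the scale factors for the stage 1 image pyramid.

    The first scale maps a face of `min_face_size` pixels onto the
    `cell_size` input window; each following scale shrinks the image by
    `scale_factor` until its shorter side no longer fits the window.
    All default values are assumptions, not read from the library.
    """
    scales = []
    scale = cell_size / min_face_size
    min_side = min(height, width) * scale
    while min_side >= cell_size:
        scales.append(scale)
        scale *= scale_factor
        min_side *= scale_factor
    return scales


# Scales for a 480x640 frame, with the resulting image size at each level
for s in pyramid_scales(480, 640):
    print(f"scale {s:.3f} -> {int(480 * s)}x{int(640 * s)}")
```

A larger `min_face_size` gives a smaller first scale and fewer pyramid levels, which is a typical way to trade recall on small faces for speed.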

The MTCNN algorithm works in three steps and uses one neural network for each. The first network is a proposal network. It predicts potential face positions and their bounding boxes, much like the region proposal network in Faster R-CNN. The result of this step is a large number of candidate detections, many of them false positives. The second network takes the image and the outputs of the first step, and refines the result by eliminating most of the false detections and merging overlapping bounding boxes. The last network refines the predictions further and adds facial landmark predictions (in the original MTCNN implementation).

In [3]:
from mtcnn import MTCNN
import cv2
import numpy as np
print("imported")

imported


In [4]:
detector = MTCNN()
print("loaded")

loaded


In [6]:
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise RuntimeError("Could not open the webcam (device 0)")

print("Stream started")
while True:
    # Capture frame-by-frame; stop if the camera returns no frame
    ok, frame = cap.read()
    if not ok:
        break

    # OpenCV delivers BGR frames, while MTCNN expects RGB input
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    results = detector.detect_faces(rgb)
    for result in results:
        # 'box' is [x, y, width, height] of the detected face
        x, y, w, h = result['box']

        # Rectangle around the face (colour is given in BGR, so this is blue)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 3)

    # Display the video output
    cv2.imshow('Video', frame)

    # Quit video by typing Q
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close the display window
cap.release()
cv2.destroyAllWindows()

Stream started
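
Each entry returned by `detect_faces` carries more than the box. As far as the `mtcnn` package documents it, every result is a dictionary with `'box'`, `'confidence'` and `'keypoints'` entries. The sketch below filters detections by confidence and converts the box to corner coordinates. The helper name `confident_faces`, the 0.9 threshold, and the sample values are made up for illustration, so the code runs without a camera or the detector.

```python
def confident_faces(results, min_confidence=0.9):
    """Keep detections at or above `min_confidence`, with corner coordinates.

    `results` is assumed to follow the list-of-dicts layout returned by
    `detect_faces`: 'box' as [x, y, width, height], 'confidence' as a
    float, and 'keypoints' as a dict of named (x, y) landmark points.
    """
    faces = []
    for result in results:
        if result["confidence"] < min_confidence:
            continue
        x, y, w, h = result["box"]
        faces.append({
            "top_left": (x, y),
            "bottom_right": (x + w, y + h),
            "landmarks": dict(result["keypoints"]),
        })
    return faces


# Hand-written sample in the assumed result format (values are made up)
sample = [
    {"box": [100, 50, 40, 60], "confidence": 0.99,
     "keypoints": {"left_eye": (112, 70), "right_eye": (128, 70),
                   "nose": (120, 82), "mouth_left": (113, 95),
                   "mouth_right": (127, 95)}},
    {"box": [300, 200, 20, 25], "confidence": 0.55,
     "keypoints": {"left_eye": (305, 208), "right_eye": (314, 208),
                   "nose": (310, 213), "mouth_left": (306, 219),
                   "mouth_right": (313, 219)}},
]

for face in confident_faces(sample):
    print(face["top_left"], face["bottom_right"], face["landmarks"]["nose"])
```

Raising the threshold is a simple way to suppress false detections in the video loop, at the cost of missing some harder faces.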


In [None]:
# Faster, and the best result so far, though the accuracy is still not that good.
