Paper Information

Title

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

Author

Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao

Published

IEEE Signal Processing Letters, CCF C

Code

Official MATLAB implementation; PyTorch port (inference only, with weights converted from the MATLAB model)

Paper Content

Why

To improve the performance of face detection and face alignment using deep learning.

The main problems are:

occlusion
illumination
various poses

What

Multi-task learning: each network is trained on three tasks together: face/non-face classification (binary), bounding box regression, and facial landmark localization.
Cascaded convolutional networks: three networks (P-Net, R-Net, O-Net) produce the final result from coarse to fine.
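The coarse-to-fine idea can be sketched as a chain of stages, where each stage scores the surviving candidate boxes, drops the weak ones, and passes refined boxes to the next stage. This is only an illustrative skeleton: `run_cascade` and the `(stage_fn, threshold)` interface are hypothetical names, and the non-maximum suppression and actual networks that a real detector needs are left out.

```python
def run_cascade(boxes, stages):
    """Filter and refine candidate boxes through successive stages.

    boxes:  list of (x1, y1, x2, y2) candidate windows.
    stages: list of (stage_fn, threshold) pairs, ordered coarse to fine.
            stage_fn(box) must return (face_score, refined_box).
            Both names are hypothetical, for illustration only.
    """
    for stage_fn, threshold in stages:
        kept = []
        for box in boxes:
            score, refined = stage_fn(box)
            if score >= threshold:
                # Only candidates this stage accepts reach the next, finer stage.
                kept.append(refined)
        boxes = kept
    return boxes
```

Because the early stages are cheap and reject most windows, the more expensive later stages only see a small number of candidates.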

How

Face classification uses a softmax (cross-entropy) loss.
Bounding box regression and landmark regression use a Euclidean loss.
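A minimal sketch of how these losses might be combined for a single training sample is shown below. The task weights `(1.0, 0.5, 0.5)` and the `active` mask, which switches off tasks for which a sample has no label, are assumptions for illustration and are not stated in these notes; the function names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of a softmax over raw class scores (face / non-face)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean_loss(pred, target):
    """Squared L2 distance between a predicted and a target vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def multitask_loss(cls_logits, cls_label, box_pred, box_target,
                   lmk_pred, lmk_target,
                   weights=(1.0, 0.5, 0.5), active=(1, 1, 1)):
    """Weighted sum of the three task losses for one sample.

    weights: per-task weights (assumed values, for illustration).
    active:  0/1 flags; e.g. (1, 0, 0) for a sample that only has a
             face/non-face label and no box or landmark annotation.
    """
    losses = (
        softmax_cross_entropy(cls_logits, cls_label),
        euclidean_loss(box_pred, box_target),
        euclidean_loss(lmk_pred, lmk_target),
    )
    return sum(w * a * l for w, a, l in zip(weights, active, losses))
```

For example, a background crop would typically contribute only the classification term, which corresponds to `active=(1, 0, 0)` here.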

Results

The effectiveness of online hard sample mining
The effectiveness of joint detection and alignment
Evaluation on face detection
Evaluation on face alignment
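For the online hard sample mining item above, one common way to implement the idea is to rank the samples of a mini-batch by their loss and back-propagate only through the hardest fraction. The sketch below assumes that scheme; the function name `hard_sample_mask` and the default `keep_ratio=0.7` are illustrative assumptions, not values taken from these notes.

```python
def hard_sample_mask(losses, keep_ratio=0.7):
    """Return a 0/1 mask that selects the highest-loss samples in a batch.

    losses:     per-sample loss values for one mini-batch.
    keep_ratio: fraction of the batch treated as hard samples (assumed value).
    """
    n_keep = max(1, round(len(losses) * keep_ratio))
    # Indices sorted from highest loss to lowest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    mask = [0] * len(losses)
    for i in order[:n_keep]:
        mask[i] = 1
    return mask
```

Multiplying the per-sample losses by this mask before averaging means that easy samples contribute no gradient in that step.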

Thoughts

The first cascaded face detection classifier was proposed in 2004, not by this paper. This paper carries the traditional cascade over to deep learning.
The coarse-to-fine idea has a long history.
Multi-task learning that combines face detection and face alignment had also been done before this paper.
Deformable part models have a high computational cost.
Are there better hard sample mining methods, such as focal loss?
This paper uses an image pyramid. Could a feature pyramid be used instead?
For detection problems, there are two subproblems:
1. Construct the training set, like the roidb in Faster R-CNN.
2. Train the network using that roidb.
Why is the input size so small?
Do we really need facial landmarks when training P-Net and R-Net?
Are the 3 networks trained independently or jointly? Independently.
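On the image pyramid question above, the sketch below shows how the pyramid scales are typically computed in MTCNN-style detectors: the image is shrunk repeatedly so that a fixed-size first-stage window can cover faces of different sizes. The defaults (`min_face_size=20`, `net_input=12`, `factor=0.709`) follow values commonly seen in public implementations and should be read as assumptions, as should the function name.

```python
def pyramid_scales(height, width, min_face_size=20, net_input=12, factor=0.709):
    """List the resize factors of an image pyramid, from largest to smallest.

    min_face_size: smallest face (in pixels) the detector should find (assumed).
    net_input:     side length of the first-stage network input (assumed).
    factor:        shrink ratio between consecutive pyramid levels (assumed).
    """
    scales = []
    # First level: a face of min_face_size pixels maps onto the network input.
    scale = net_input / min_face_size
    min_side = min(height, width) * scale
    # Keep shrinking until the image is smaller than the network input.
    while min_side >= net_input:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales
```

Each pyramid level costs a full pass of the first-stage network over a resized copy of the image, which is the overhead a feature pyramid would try to avoid.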