Facial Keypoint Detection with CNNs

This repository contains a facial keypoint detection project. Deep Convolutional Neural Networks (CNNs) are used to perform a regression which maps grayscale images to 68 facial landmarks, each with floating point (x, y) coordinates. The original starter code comes from a project/challenge presented in the Udacity Computer Vision Nanodegree, which can be found here: P1_Facial_Keypoints. PyTorch is used as the deep learning framework.

The final result is far from production deployment quality; among other things, the architecture, the model hyperparameters, and the training leave a large margin for improvement, as one can deduce from the evaluation metrics -- the Improvements section provides some hints about what I'd try to change if I had time. However, I am impressed by how easy it is, with the modern tools we have for free, to address problems that not long ago were considered difficult. The fact that less than 30 minutes of GPU training is enough to achieve a plausible prediction is also astonishing. My son below finds it cool, too 😄

Face Detection

Face Keypoint Detection

Images from the YouTube Faces Dataset are used to train the network; the preprocessed pictures as well as their ground-truth facial keypoints can be downloaded from the Udacity repository P1_Facial_Keypoints, and they consist of 5770 face instances altogether.

In short, the following is implemented:

  • Custom data loader class which yields images and facial keypoints (a sketch follows this list).
  • Custom transform classes.
  • The definition of a CNN from scratch, as well as its training and evaluation.
  • The implementation of face detection using the Haar cascades classifier from OpenCV.
  • Image in-painting of funny objects (e.g., glasses, etc.) by using the predicted facial landmarks.
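
As a taste of the data loading part, here is a minimal sketch of such a custom dataset; the class name, the CSV layout (an image filename followed by 136 keypoint values per row), and the file paths are assumptions for illustration, not necessarily what data_load.py implements:

import os
import pandas as pd
from torch.utils.data import Dataset
import matplotlib.image as mpimg

class FacialKeypointsDataset(Dataset):
    # Hypothetical custom dataset: each CSV row is assumed to hold an image
    # filename followed by 68 x 2 = 136 keypoint coordinates.
    def __init__(self, csv_file, root_dir, transform=None):
        self.key_pts_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.key_pts_frame)

    def __getitem__(self, idx):
        image_name = os.path.join(self.root_dir, self.key_pts_frame.iloc[idx, 0])
        image = mpimg.imread(image_name)
        # Reshape the flat coordinate list into 68 (x, y) pairs
        key_pts = self.key_pts_frame.iloc[idx, 1:].values.astype('float32').reshape(-1, 2)
        sample = {'image': image, 'keypoints': key_pts}
        if self.transform:
            sample = self.transform(sample)
        return sample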

In summary, I think the project is a nice example of a computer vision application which leverages artificial neural networks.

For the original project implementation instructions and evaluation overview, see the file Instructions.md; this text focuses instead on more general explanations of the methods used in the project.


Overview and File Structure

The project is mainly implemented in four notebooks that guide the undertaking end-to-end, plus two Python scripts which contain the data loader and the model definition. Altogether, the folder contains the following files:

1. Load and Visualize Data.ipynb                        # Dataset preprocessing and loading
2. Define the Network Architecture.ipynb                # Network training
3. Facial Keypoint Detection, Complete Pipeline.ipynb   # Face detection and keypoint inference
4. Fun with Keypoints.ipynb                             # Image in-painting of objects
Instructions.md                                         # Original instructions
LICENSE                                                 # MIT license by Udacity
README.md                                               # Current file
data/                                                   # Dataset images and facial keypoints
data_load.py                                            # Data loader definition
detector_architectures/                                 # Haar cascades classifiers
images/                                                 # Auxiliary images
models.py                                               # Model definition
requirements.txt                                        # Dependencies to be installed

How to Use This

The implementation has a research side-project character; thus, most of the code is contained in the enumerated Jupyter notebooks, which should be executed in order and from start to end. Note that the data loader and the model definition are imported in the notebooks from separate scripts.

If you want to train the model, you should consider doing it on a machine with a (powerful) GPU, although that is not a strict requirement.
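
In that vein, a common PyTorch pattern is to pick the GPU when available and fall back to the CPU otherwise; here, model is a placeholder for whatever network instance is being trained:

import torch

# Use the GPU if one is available, otherwise train on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # 'model' stands for the instantiated network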

Dependencies

Please have a look at the dependencies description in the repository P1_Facial_Keypoints.

A short summary of the commands required to have everything in place:

conda create -n faces python=3.6
conda activate faces
conda install pytorch torchvision -c pytorch 
conda install pip
pip install -r requirements.txt

Face Detection and the Facial Keypoint Regression Model

The final application can be broken down into two major steps:

  1. Face detection: given an image, the bounding boxes of the regions that contain human faces are returned.
  2. Facial landmark estimation: given a Region of Interest (ROI), i.e., the image patch under one of the detected bounding boxes, the trained network regresses the 68 parametrized points to fit them to the face in the ROI.

Face detection is implemented with the Haar cascade classifier from OpenCV. This classifier applies several pre-determined filters in sequence to the target image to determine whether it contains a face, and returns bounding boxes for the likeliest face regions.
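
A minimal sketch of this step follows; the exact cascade filename inside detector_architectures/ and the input image path are assumptions:

import cv2

# Load a pre-trained Haar cascade; the repository ships the XML files
# in detector_architectures/, the exact filename here is an assumption
face_cascade = cv2.CascadeClassifier(
    'detector_architectures/haarcascade_frontalface_default.xml')

image = cv2.imread('images/example.jpg')  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) bounding box per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=2)
for (x, y, w, h) in faces:
    roi = gray[y:y+h, x:x+w]  # this patch is passed on to the keypoint network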

The second step works with a custom-defined Convolutional Neural Network (CNN) that follows guidelines from the literature; it consists of 4 convolutional layers that repeat the basic formula Convolution (5x5 or 3x3) + ReLU + MaxPool, and a final sequence of two fully connected layers that map the feature maps of the fourth convolutional layer to the 136 = 68 x 2 facial keypoint coordinates. Dropout is also applied in the last two convolutional layers and after every linear layer to prevent overfitting. The model has around 9.4 million parameters.
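
The following sketch illustrates an architecture of that shape; the filter counts, the 224x224 grayscale input size, and the dropout probability are assumptions for illustration, not necessarily the exact values in models.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # Illustrative CNN: 4 x (Conv + ReLU + MaxPool) + 2 linear layers.
    # Assumes 1 x 224 x 224 grayscale inputs; feature map sizes in comments.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 5)     # 224 -> 220, pooled to 110
        self.conv2 = nn.Conv2d(32, 64, 3)    # 110 -> 108, pooled to 54
        self.conv3 = nn.Conv2d(64, 128, 3)   # 54 -> 52, pooled to 26
        self.conv4 = nn.Conv2d(128, 256, 3)  # 26 -> 24, pooled to 12
        self.pool = nn.MaxPool2d(2, 2)
        self.drop = nn.Dropout(p=0.25)       # assumed dropout probability
        self.fc1 = nn.Linear(256 * 12 * 12, 256)
        self.fc2 = nn.Linear(256, 136)       # 68 keypoints x 2 coordinates

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.drop(self.pool(F.relu(self.conv3(x))))  # dropout: last two conv layers
        x = self.drop(self.pool(F.relu(self.conv4(x))))
        x = torch.flatten(x, 1)
        x = self.drop(F.relu(self.fc1(x)))
        return self.fc2(x)

With these (assumed) sizes, the first linear layer alone contributes roughly 256 x 12 x 12 x 256 ≈ 9.4 million weights, which is in the ballpark of the parameter count mentioned above.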

CNN Model Summary

This basic network resembles LeNet, one of the first CNNs; despite its simplicity, it achieves considerable performance after short training times. Several straightforward modifications are worth trying, e.g.:

  • The extension of the architecture with at least one more convolutional layer and a linear one.
  • The use of batch normalization to normalize intermediate activations, stabilize the training, and avoid dropout in the convolutional layers (a sketch follows this list).
  • The use of transfer learning with backbone models pre-trained on the ImageNet dataset; e.g., VGG16, or even better, ResNet50.
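
As an example of the batch normalization idea, a convolutional block could be extended as follows; the channel counts are illustrative assumptions:

import torch.nn as nn

# Conv block with batch normalization instead of dropout;
# the 64 -> 128 channel counts are illustrative
conv_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3),
    nn.BatchNorm2d(128),  # normalizes activations, stabilizing training
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

Using nn.Sequential here also makes the network definition more compact, as suggested in the improvements below.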

More improvements are briefly summarized in the next section.

Improvements and Possible Extensions

  • Use data augmentation more extensively to improve generalization.
  • Experiment with other methods for face detection and keypoint prediction; MTCNN could be another option.
  • Use PyTorch profiling.
  • Use nn.Sequential for a cleaner network definition.
  • Try a learning rate scheduler for the from-scratch model training, since the current learning rate seems sub-optimal.
  • Try different (more complex) architectures:
    • More convolutional layers.
    • Batch normalization to normalize activations, stabilize training, and avoid dropout in the convolutional layers.
  • Try transfer learning (a sketch follows this list); an example of transfer learning using ResNet50 as backbone can be found in my side project on dog breed classification.
  • Use random seeds and controlled weight initialization to allow reproducibility.
  • Create a web app with Flask. To that end, the code needs to be transformed for production (e.g., using OOP, logging, etc.).
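
Regarding the transfer learning item, a minimal sketch of adapting a pre-trained ResNet50 to this regression task might look as follows; the freezing policy and the single-layer head are illustrative choices:

import torch.nn as nn
from torchvision import models

# Pre-trained ImageNet backbone; note that ResNet50 expects 3-channel
# inputs, so grayscale images would need to be stacked to 3 channels
backbone = models.resnet50(pretrained=True)
for param in backbone.parameters():
    param.requires_grad = False  # freeze the pre-trained weights

# Replace the 1000-class ImageNet head with a regression head
# for the 136 = 68 x 2 keypoint coordinates
backbone.fc = nn.Linear(backbone.fc.in_features, 136)

With this setup, only the new head is trained, which usually converges much faster than training from scratch.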

Authorship

Mikel Sagardia, 2022.
No guarantees.

You are free to use this project, but please link it back to the original source.
