ROB311 - TP n°3 - KNN for Facial Expression Detection

Introduction to the task

---

The objective of this task is to sort out a dataset of size 48 x 48 pixel grayscale images depicting human expressions. 

After extracting features through the Local Binary Patterns (LBP) method, the images from the FER-2013 dataset will be classified in seven categories of emotions with a KNN-Classifier. 

Principles

---

° The Facial Expression Recognition (FER) 2013 Dataset contains around 30,000 labeled 48 x 48 pixel grayscale images depicting human expressions, that can be classified into seven categories:

0 - Angry; 1 - Disgust; 2 - Fear; 3 - Happy; 4 - Sad; 5 - Surprise ; 6 - Neutral

This dataset is widely used in the field of facial expression recognition. 

° The Local Binary Patterns method consists in thresholding the neighbourhood of each pixel (that is to say, its 8 neighbors). The threshold value being the value of the considered pixel, neighbors with a value inferior to the threshold are assigned the value 0 while the others are assigned the value 1. After converting the resulting binary number of 8 bits into an integer, its value is assigned to the considered pixel and is called the LBP value of the pixel. 

For an image, the LBP method consists in dividing the image into blocks, calculating the LBP value of each of its pixels, and represent these values in a histogram for each block. Then, all of these histograms are concatenated in a feature vector. 

° KNN:

Once every image is translated into a feature vector, the images can be represented in a "feature space", where each image is a point while the different types of features are the axes. As a result, images with similar features form groups in the feature space. 

To classify correctly a new, unlabeled image, we use the LBP method to translate the image into a feature vector, that is to say, a new point in the feature space. 

Then, we use the KNN method, where KNN stands for "K Nearest Neighbors", with K being chosen as the number of nearest neighbors to consider for the request. We calculate a distance (for instance a Euclidean distance) between the new point and all the other labeled points, and pick the K nearest points. A vote is held between the labels of these K points, and the label with the majority is finally chosen as the label of our new image.

Description of the code

---

Five functions compose the following code:

° "lecture" allows to:

- Convert each image of a given folder into a feature vector through the LBP method. These feature vectors are saved in a list containing arrays, called data. 
- Save the label associated to the image in a list: 0 for "Angry", 1 for "Disgust", 2 for "Fear", 3 for "Happy", 4 for "Sad", 5 for "Surprise" and 6 for "Neutral"

° "repertoire_train" creates the lists X_train and y_train:

- X_train contains the histograms associated with the images reserved for training
- y_train contains the labels of these images

° "repertoire_test" creates the lists X_test and y_test:

- X_test contains the histograms associated with the images reserved for testing
- y_test contains the labels of these images

X_test and y_test allow us to evaluate the performances of the recognition algorithm on data it has not yet seen.

° "LBP" assigns each image the corresponding LBP histogram.

° "classification" applies the KNN method to X_train, y_train, X_test dans y_test:

- It creates a class "knn" that contains the KNN Classifier
- Then, the fit function trains the model on X_train and y_train : it distinguishes the different features in the data and helps the model associate them different parameters.
- Once our model is trained, it is able to make a prediction on X_test
- "accuracy" compares the predicted labels y_pred with the true labels y_train

In [92]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Created on Mon Oct  9 08:56:37 2023

@author: remimotio

ROB311 - Apprentissage pour la robotique 

TP : Facial Expression Classification

"""

import os

import skimage as ski

from skimage.feature import local_binary_pattern

from sklearn.neighbors import KNeighborsClassifier

from sklearn import metrics

PATH_TRAIN = './archive/train/'

train = os.listdir(PATH_TRAIN) # train est un tableau contenant les sept mots désignant les catégories d'expressions faciales

def lecture(folder, data, labels, label):
    dossier = os.listdir(folder)
    for img in dossier :
        image_path = os.path.join(folder, img)
        image = ski.io.imread(image_path)
        features = LBP(image)
        data.append(features.flatten())
        labels.append(label)
    return data, labels
        
def repertoire_train():
    folder_angry = './archive/train/angry'
    folder_disgust = './archive/train/disgust'
    folder_fear = './archive/train/fear'
    folder_happy = './archive/train/happy'
    folder_neutral = './archive/train/neutral'
    folder_sad = './archive/train/sad'
    folder_surprise = './archive/train/surprise'
    X_train = []
    y_train = []
    X_train, y_train = lecture(folder_angry, X_train, y_train, 0)
    X_train, y_train = lecture(folder_disgust, X_train, y_train, 1)
    X_train, y_train = lecture(folder_fear, X_train, y_train, 2)
    X_train, y_train = lecture(folder_happy, X_train, y_train, 3)
    X_train, y_train = lecture(folder_neutral, X_train, y_train, 4)
    X_train, y_train = lecture(folder_sad, X_train, y_train, 5)
    X_train, y_train = lecture(folder_surprise, X_train, y_train, 6)
    return X_train, y_train

def repertoire_test():
    folder_angry = './archive/test/angry'
    folder_disgust = './archive/test/disgust'
    folder_fear = './archive/train/fear'
    folder_happy = './archive/train/happy'
    folder_neutral = './archive/train/neutral'
    folder_sad = './archive/train/sad'
    folder_surprise = './archive/train/surprise'
    X_test = []
    y_test = []
    X_test, y_test = lecture(folder_angry, X_test, y_test, 0)
    X_test, y_test = lecture(folder_disgust, X_test, y_test, 1)
    X_test, y_test = lecture(folder_fear, X_test, y_test, 2)
    X_test, y_test = lecture(folder_happy, X_test, y_test, 3)
    X_test, y_test = lecture(folder_neutral, X_test, y_test, 4)
    X_test, y_test = lecture(folder_sad, X_test, y_test, 5)
    X_test, y_test = lecture(folder_surprise, X_test, y_test, 6)
    return X_test, y_test

def LBP(image):
    radius = 2 
    n_points = 8 * radius
    lbp = local_binary_pattern(image, n_points, radius)
    return lbp

def classification(X_train, y_train, X_test, y_test):
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = metrics.accuracy_score(y_test, y_pred)
    return accuracy

In [None]:
X_train, y_train = repertoire_train()

In [None]:
X_test, y_test = repertoire_test()

In [None]:
classification(X_train, y_train, X_test, y_test)

Results

---

We can able the folders one at a time to evaluate the performances of the algorithm:

- With only the "angry" folder given to X_train and X_test, we obtain the expected accuracy of 1.0 
- With "angry" and "disgust", we obtain an accuracy of 0.8811973807296539
- Adding "fear", we reach 0.705187766163376
- With "happy", the accuracy is 0.7193280025846055
- With "neutral", it is 0.657961489680618
- Adding "sad", the accuracy is 0.5903679653679653
- And finally, with all seven emotions, we obtain an accuracy of 0.5835404584368958

After numerous trials, it appears that having radius = 2 and n_points = 8 * radius provides us with the best performance on this data. According to python, n_points represents the number of circularly symmetric neighbour set points, while radius is the spatial resolution of the operator. 

Analysis

---

The combination of the LBP and KNN methods provide a performing facial expression recognition algorithm, with a competitive accuracy on a large dataset such as FER-2013.

Nonetheless, these performances could still be improved, through various means:

1. The computation time of the algorithm is still high, and might be reduced by reorganizing the algorithm.

2. In some applications, an accuracy of 58 % might not be sufficient, and could justify using different methods, or combining the one presented here with other algorithms. 

3. An improved training would be likely improve the performances of the algorithm. However, optimizing the training, in particular the dataset provided to the algorithm to train, demands knowledge and efforts. Moreover, the context of use must be taken into account to provide adequate training.