## Emotion Recognition using Facial Landmarks, Python, DLib and OpenCV

van Gent, P. (2016). Emotion Recognition Using Facial Landmarks, Python, DLib and OpenCV. A tech blog about fun things with Python and embedded electronics. Retrieved from: http://www.paulvangent.com/2016/08/05/emotion-recognition-using-facial-landmarks/

http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html
    


## Testing the landmark detector

This will result in your face with a lot of dots outlining the shape and all the “moveable parts”. The latter is of course important because it is what makes emotional expressions possible.

I have added a code to treat some exceptions.

This [stackoverflow question](https://stackoverflow.com/questions/43184887/dll-load-failed-error-when-importing-cv2) solved a big issue with DLLs, opencv and anaconda use.

To install Dlib, please use conda:
**conda install -c conda-forge dlib=19.4**

In [26]:
#Import required modules
import cv2
import dlib
import time

#Set up some required objects
video_capture = cv2.VideoCapture(1) #Webcam object
detector = dlib.get_frontal_face_detector() #Face detector

#Landmark identifier.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat") 

while True:
        
    ret, frame = video_capture.read()
           
    #Sometimes, cap may not have initialized the capture.
    #In that case, this code shows error. You can check whether it is initialized or not by the method cap.isOpened().
    #If it is True, OK. Otherwise open it using cap.open().
    if not video_capture.isOpened():
        video_capture.open()
    
    #video_capture.read() returns a bool (True/False).
    #If frame is read correctly, it will be True. So you can check end of the video by checking this return value.
    if not ret:
        break 
        
    #Waiting input key from the user
    k = cv2.waitKey(1)
    
    if k%256 == 27: # ESC pressed
        break
    else:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
        clahe_image = clahe.apply(gray)

        detections = detector(clahe_image, 1)#Detect the faces in the image
        for k,d in enumerate(detections): #For each detected face
            shape = predictor(clahe_image, d) #Get coordinates
            for i in range(1,68): #There are 68 landmark points on each face
                     #For each point, draw a red circle with thickness2 on the original frame
                    cv2.circle(frame, (shape.part(i).x, shape.part(i).y), 1, (0,0,255), thickness=2)
        cv2.imshow("image", frame) #Display the frame

# When everything done, release the capture
video_capture.release()
cv2.destroyAllWindows()

Landmarks detected in real time through Dlib
![landmarks](landmarks.png)



## Extracting features from the faces

   The first thing to do is find ways to transform these nice dots overlaid on your face into features to feed the classifer. Features are little bits of information that describe the object or object state that we are trying to divide into categories.

   How you extract features from your source data is actually where a lot of research is, it’s not just about creating better classifying algorithms but also about finding better ways to collect and describe data. The same classifying algorithm might function tremendously well or not at all depending on how well the information we feed it is able to discriminate between different objects or object states. If, for example, we would extract eye colour and number of freckles on each face, feed it to the classifier, and then expect it to be able to predict what emotion is expressed, we would not get far. However, the facial landmarks from the same image material describe the position of all the “moving parts” of the depicted face, the things you use to express an emotion. This is certainly useful information!
   
   To get started, let’s take the code from the example above and change it so that it fits our current needs, like this:

In [8]:
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def get_landmarks(image):
    detections = detector(image, 1)
    for k,d in enumerate(detections): #For all detected face instances individually
        shape = predictor(image, d) #Draw Facial Landmarks with the predictor class
        xlist = []
        
        ylist = []
        for i in range(1,68): #Store X and Y coordinates in two lists
            xlist.append(float(shape.part(i).x))
            ylist.append(float(shape.part(i).y))
            
        for x, y in zip(xlist, ylist): #Store all landmarks in one list in the format x1,y1,x2,y2,etc.
            landmarks.append(x)
            landmarks.append(y)
    if len(detections) > 0:
        return landmarks
    else: #If no faces are detected, return error message to other function to handle
        landmarks = "error"
        return landmarks

Here we extract the coordinates of all face landmarks. These coordinates are the first collection of features, and this might be the end of the road. You might also continue and try to derive other measures from this that will tell the classifier more about what is happening on the face. Whether this is necessary or not depends. For now let’s assume it is necessary, and look at ways to extract more information from what we have. Feature generation is always a good thing to try, if only because it brings you closer to the data and might give you ideas or alternative views at it because you’re getting your hands dirty. Later on we’ll see if it was really necessary at a classification level.

To start, look at the coordinates. They may change as my face moves to different parts of the frame. I could be expressing the same emotion in the top left of an image as in the bottom right of another image, but the resulting coordinate matrix would express different numerical ranges. However, the relationships between the coordinates will be similar in both matrices so some information is present in a location invariant form, meaning it is the same no matter where in the picture my face is.

Maybe the most straightforward way to remove numerical differences originating from faces in different places of the image would be normalising the coordinates between 0 and 1.This is easily done by:
\begin{equation*}
Xnorm = \frac{X-Xmin}{Xmax-Xmin}
\end{equation*}

However, there is a problem with this approach because it fits the entire face in a square with both axes ranging from 0 to 1. Imagine one face with its eyebrows up high and mouth open, the person could be surprised. Now imagine an angry face with eyebrows down and mouth closed. If we normalise the landmark points on both faces from 0-1 and put them next to each other we might see two very similar faces. Because both distinguishing features lie at the edges of the face, normalising will push both back into a very similar shape. The faces will end up looking very similar. Take a moment to appreciate what we have done; we have thrown away most of the variation that in the first place would have allowed us to tell the two emotions from each other! Probably this will not work. Of course some variation remains from the open mouth, but it would be better not to throw so much away.

A less destructive way could be to calculate the position of all points relative to each other. To do this we calculate the mean of both axes, which results in the point coordinates of the sort-of “centre of gravity” of all face landmarks. We can then get the position of all points relative to this central point. 

But, you may ask, why don’t we take for example the tip of the nose as the central point? This would work as well, but would also throw extra variance in the mix due to short, long, high- or low-tipped noses. The “centre point method” also introduces extra variance; the centre of gravity shifts when the head turns away from the camera, but I think this is less than when using the nose-tip method because most faces more or less face the camera in our sets. There are techniques to estimate head pose and then correct for it, but that is beyond this article.

There is one last thing to note. Faces may be tilted, which might confuse the classifier. We can correct for this rotation by assuming that the bridge of the nose in most people is more or less straight, and offset all calculated angles by the angle of the nose bridge. This rotates the entire vector array so that tilted faces become similar to non-tilted faces with the same expression.

We just slightly modify the **get_landmarks()** function from above.

In [9]:
def get_landmarks(image):
    detections = detector(image, 1)
    for k,d in enumerate(detections): #For all detected face instances individually
        shape = predictor(image, d) #Draw Facial Landmarks with the predictor class
        xlist = []
        ylist = []
        for i in range(1,68): #Store X and Y coordinates in two lists
            xlist.append(float(shape.part(i).x))
            ylist.append(float(shape.part(i).y))
            
        xmean = np.mean(xlist) #Find both coordinates of centre of gravity
        ymean = np.mean(ylist)
        xcentral = [(x-xmean) for x in xlist] #Calculate distance centre <-> other points in both axes
        ycentral = [(y-ymean) for y in ylist]
        
        landmarks_vectorised = []
        for x, y, w, z in zip(xcentral, ycentral, xlist, ylist):
            landmarks_vectorised.append(w)
            landmarks_vectorised.append(z)
            meannp = np.asarray((ymean,xmean))
            coornp = np.asarray((z,w))
            dist = np.linalg.norm(coornp-meannp)
            landmarks_vectorised.append(dist)
            landmarks_vectorised.append((math.atan2(y, x)*360)/(2*math.pi))

        data['landmarks_vectorised'] = landmarks_vectorised
    if len(detections) < 1: 
        data['landmarks_vestorised'] = "error"

That was actually quite manageable, no? **Now it’s time to put all of the above together with some stuff from the first post**. The goal is to read the existing dataset into a training and prediction set with corresponding labels, train the classifier (we use Support Vector Machines with linear kernel from SKLearn, but feel free to experiment with other available kernels such as polynomial or rbf, or other classifiers!), and evaluate the result. This evaluation will be done in two steps; first we get an overall accuracy after ten different data segmentation, training and prediction runs, second we will evaluate the predictive probabilities.

**The next thing we will be doing is returning to the two datasets from the [original post](http://www.paulvangent.com/2016/04/01/emotion-recognition-with-python-opencv-and-a-face-dataset/). Let’s see how this approach stacks up.**

## Emotion Recognition With Python, OpenCV and a Face Dataset

In the readme file, the authors mention that only a subset (327 of the 593) of the emotion sequences actually contain archetypical emotions. Each image sequence consists of the forming of an emotional expression, starting with a neutral face and ending with the emotion. So, from each image sequence we want to extract two images; one neutral (the first image) and one with an emotional expression (the last image). To help, let’s write a small python snippet to do this for us:

In [16]:
import glob
from shutil import copyfile

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Define emotion order
participants = glob.glob("source_emotion\\*") #Returns a list of all folders with participant numbers

for x in participants:
    part = "%s" %x[-4:] #store current participant number
    for sessions in glob.glob("%s\\*" %x): #Store list of sessions for current participant
        for files in glob.glob("%s\\*" %sessions):
            current_session = files[20:-30]
            file = open(files, 'r')
            
            emotion = int(float(file.readline())) #emotions are encoded as a float, readline as float, then convert to integer.
            
            sourcefile_emotion = glob.glob("source_images\\%s\\%s\\*" %(part, current_session))[-1] #get path for last image in sequence, which contains the emotion
            sourcefile_neutral = glob.glob("source_images\\%s\\%s\\*" %(part, current_session))[0] #do same for neutral image
            
            dest_neut = "sorted_set\\neutral\\%s" %sourcefile_neutral[25:] #Generate path to put neutral image
            dest_emot = "sorted_set\\%s\\%s" %(emotions[emotion], sourcefile_emotion[25:]) #Do same for emotion containing image
            
            copyfile(sourcefile_neutral, dest_neut) #Copy file
            copyfile(sourcefile_emotion, dest_emot) #Copy file

## Extracting faces

The classifier will work best if the training and classification images are all of the same size and have (almost) only a face on them (no clutter). We need to find the face on each image, convert to grayscale, crop it and save the image to the dataset. We can use a HAAR filter from OpenCV to automate face finding. Actually, OpenCV provides 4 pre-trained classifiers, so to be sure we detect as many faces as possible let’s use all of them in sequence, and abort the face search once we have found one. Get them from the OpenCV directory or from [here](http://www.paulvangent.com/wp-content/uploads/2016/04/OpenCV_FaceCascade.zip) and extract to the same file you have your python files.

Create another folder called “dataset”, and in it create subfolders for each emotion (“neutral”, “anger”, etc.). The dataset we can use will live in these folders. Then, detect, crop and save faces as such;

In [17]:
import cv2
import glob

faceDet = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
faceDet_two = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")
faceDet_three = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")
faceDet_four = cv2.CascadeClassifier("haarcascade_frontalface_alt_tree.xml")

emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Define emotions

def detect_faces(emotion):
    files = glob.glob("sorted_set\\%s\\*" %emotion) #Get list of all images with emotion

    filenumber = 0
    for f in files:
        frame = cv2.imread(f) #Open image
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert image to grayscale
        
        #Detect face using 4 different classifiers
        face = faceDet.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face_two = faceDet_two.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face_three = faceDet_three.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)
        face_four = faceDet_four.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=10, minSize=(5, 5), flags=cv2.CASCADE_SCALE_IMAGE)

        #Go over detected faces, stop at first detected face, return empty if no face.
        if len(face) == 1:
            facefeatures = face
        elif len(face_two) == 1:
            facefeatures = face_two
        elif len(face_three) == 1:
            facefeatures = face_three
        elif len(face_four) == 1:
            facefeatures = face_four
        else:
            facefeatures = ""
        
        #Cut and save face
        for (x, y, w, h) in facefeatures: #get coordinates and size of rectangle containing face
            print ("face found in file: %s" %f)
            gray = gray[y:y+h, x:x+w] #Cut the frame to size
            
            try:
                out = cv2.resize(gray, (350, 350)) #Resize face so all images have same size
                cv2.imwrite("dataset\\%s\\%s.jpg" %(emotion, filenumber), out) #Write image
            except:
               pass #If error, pass file
        filenumber += 1 #Increment image number

for emotion in emotions: 
    detect_faces(emotion) #Call functiona

face found in file: sorted_set\neutral\00_002_00000001.png
face found in file: sorted_set\neutral\00_005_00000001.png
face found in file: sorted_set\neutral\00_006_00000001.png
face found in file: sorted_set\neutral\01_001_00000001.png
face found in file: sorted_set\neutral\01_002_00000001.png
face found in file: sorted_set\neutral\01_004_00000001.png
face found in file: sorted_set\neutral\01_006_00000001.png
face found in file: sorted_set\neutral\02_001_00000001.png
face found in file: sorted_set\neutral\02_002_00000001.png
face found in file: sorted_set\neutral\02_003_00000001.png
face found in file: sorted_set\neutral\02_004_00000001.png
face found in file: sorted_set\neutral\02_009_00000001.png
face found in file: sorted_set\neutral\03_001_00000001.png
face found in file: sorted_set\neutral\03_002_00000001.png
face found in file: sorted_set\neutral\03_006_00000001.png
face found in file: sorted_set\neutral\04_001_00000001.png
face found in file: sorted_set\neutral\04_002_00000001.p

## Creating the training and classification set

Now we get to the fun part! The dataset has been organised and is ready to be recognized, but first we need to actually teach the classifier what certain emotions look like. The usual approach is to split the complete dataset into a training set and a classification set. We use the training set to teach the classifier to recognize the to-be-predicted labels, and use the classification set to estimate the classifier performance.

Note the reason for splitting the dataset: estimating the classifier performance on the same set as it has been trained is unfair, because we are not interested in how well the classifier memorizes the training set. Rather, we are interested in how well the classifier generalizes its recognition capability to never-seen-before data.

For now let’s create the training and classification set, we randomly sample and train on 80% of the data and classify the remaining 20%, and repeat the process 10 times. Afterwards we play around with several settings a bit and see what useful results we can get.

In [22]:
import cv2
import glob
import random
import numpy as np

#emotions = ["neutral", "anger", "contempt", "disgust", "fear", "happy", "sadness", "surprise"] #Emotion list
emotions = ["anger", "disgust", "happiness", "neutral", "surprise"]
fishface = cv2.face.FisherFaceRecognizer_create() #createFisherFaceRecognizer() #Initialize fisher face classifier


data = {}

def get_files(emotion): #Define function to get file list, randomly shuffle it and split 80/20
    files = glob.glob("dataset\\%s\\*" %emotion)
    random.shuffle(files)
    training = files[:int(len(files)*0.8)] #get first 80% of file list
    prediction = files[-int(len(files)*0.2):] #get last 20% of file list
    return training, prediction

def make_sets():
    training_data = []
    training_labels = []
    prediction_data = []
    prediction_labels = []
    for emotion in emotions:
        training, prediction = get_files(emotion)
        #Append data to training and prediction list, and generate labels 0-7
        for item in training:
            image = cv2.imread(item) #open image
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) #convert to grayscale
            training_data.append(gray) #append image array to training data list
            training_labels.append(emotions.index(emotion))
    
        for item in prediction: #repeat above process for prediction set
            image = cv2.imread(item)
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            prediction_data.append(gray)
            prediction_labels.append(emotions.index(emotion))

    return training_data, training_labels, prediction_data, prediction_labels

def run_recognizer():
    training_data, training_labels, prediction_data, prediction_labels = make_sets()
    
    print ("training fisher face classifier")
    print ("size of training set is:", len(training_labels), "images")
    fishface.train(training_data, np.asarray(training_labels))

    print ("predicting classification set")
    cnt = 0
    correct = 0
    incorrect = 0
    for image in prediction_data:
        pred, conf = fishface.predict(image)
        if pred == prediction_labels[cnt]:
            correct += 1
            cnt += 1
        else:
            cv2.imwrite("dataset\\difficult\\%s_%s_%s.jpg" %(emotions[prediction_labels[cnt]], emotions[pred], cnt), image) #<-- this one is new
            incorrect += 1
            cnt += 1
    return ((100*correct)/(correct + incorrect))

#Now run it
metascore = []
for i in range(0,10):
    correct = run_recognizer()
    print ("got", correct, "percent correct!")
    metascore.append(correct)

print ("\n\nend score:", np.mean(metascore), "percent correct!")

training fisher face classifier
size of training set is: 395 images
predicting classification set
got 87.62886597938144 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 87.62886597938144 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 91.75257731958763 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 90.72164948453609 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 91.75257731958763 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 89.69072164948453 percent correct!
training fisher face classifier
size of training set is: 395 images
predicting classification set
got 89.69072164948453 percent correct!
training fisher face classifier
size of t

## Looking at mistakes

The last thing that might be nice to look at is what mistakes the algorithm makes. Maybe the mistakes are understandable, maybe not. Add an extra line to the the last part of the function run_recognizer() to copy images that are wrongly classified, also create a folder “difficult” in your root working directory to house the images:

"Neutral", classified as "Sadness"
![neutral_sadness](neutral_sadness_10.jpg)

"Anger", classified as "Neutral"
![anger_neutral_29](anger_neutral_29.jpg)


Returning to the tutorial: **Emotion Recognition using Facial Landmarks, Python, DLib and OpenCV**

First let’s write some code. The approach is to first extract facial landmark points from the images, randomly divide 80% of the data into a training set and 20% into a test set, then feed these into the classifier and train it on the training set. Finally we evaluate the resulting model by predicting what is in the test set to see how the model handles the unknown data. Basically a lot of the steps are the same as what we did earlier.

In [24]:
import cv2
import glob
import random
import math
import numpy as np
import dlib
import itertools
from sklearn.svm import SVC

#emotions = ["anger", "contempt", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"] #Emotion list
emotions = ["anger", "disgust", "happiness", "neutral", "surprise"] #Emotion list
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat") #Or set this to whatever you named the downloaded file
clf = SVC(kernel='linear', probability=True, tol=1e-5, verbose = False) #Set the classifier as a support vector machines with polynomial kernel

data = {} #Make dictionary for all values
#data['landmarks_vectorised'] = []

def get_files(emotion): #Define function to get file list, randomly shuffle it and split 80/20
    files = glob.glob("dataset\\%s\\*" %emotion)
    random.shuffle(files)
    training = files[:int(len(files)*0.8)] #get first 80% of file list
    prediction = files[-int(len(files)*0.2):] #get last 20% of file list
    return training, prediction

def get_landmarks(image):
    detections = detector(image, 1)
    for k,d in enumerate(detections): #For all detected face instances individually
        shape = predictor(image, d) #Draw Facial Landmarks with the predictor class
        xlist = []
        ylist = []
        for i in range(1,68): #Store X and Y coordinates in two lists
            xlist.append(float(shape.part(i).x))
            ylist.append(float(shape.part(i).y))
            
        xmean = np.mean(xlist)
        ymean = np.mean(ylist)
        xcentral = [(x-xmean) for x in xlist]
        ycentral = [(y-ymean) for y in ylist]

        landmarks_vectorised = []
        for x, y, w, z in zip(xcentral, ycentral, xlist, ylist):
            landmarks_vectorised.append(w)
            landmarks_vectorised.append(z)
            meannp = np.asarray((ymean,xmean))
            coornp = np.asarray((z,w))
            dist = np.linalg.norm(coornp-meannp)
            landmarks_vectorised.append(dist)
            landmarks_vectorised.append((math.atan2(y, x)*360)/(2*math.pi))

        data['landmarks_vectorised'] = landmarks_vectorised
    if len(detections) < 1: 
        data['landmarks_vestorised'] = "error"

def make_sets():
    training_data = []
    training_labels = []
    prediction_data = []
    prediction_labels = []
    for emotion in emotions:
        print(" working on %s" %emotion)
        training, prediction = get_files(emotion)
        #Append data to training and prediction list, and generate labels 0-7
        for item in training:
            image = cv2.imread(item) #open image
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) #convert to grayscale
            clahe_image = clahe.apply(gray)
            get_landmarks(clahe_image)
            if data['landmarks_vectorised'] == "error":
                print("no face detected on this one")
            else:
                training_data.append(data['landmarks_vectorised']) #append image array to training data list
                training_labels.append(emotions.index(emotion))
    
        for item in prediction:
            image = cv2.imread(item)
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            clahe_image = clahe.apply(gray)
            get_landmarks(clahe_image)
            if data['landmarks_vectorised'] == "error":
                print("no face detected on this one")
            else:
                prediction_data.append(data['landmarks_vectorised'])
                prediction_labels.append(emotions.index(emotion))

    return training_data, training_labels, prediction_data, prediction_labels   

accur_lin = []
for i in range(0,10):
    print("Making sets %s" %i) #Make sets by random sampling 80/20%
    training_data, training_labels, prediction_data, prediction_labels = make_sets()

    npar_train = np.array(training_data) #Turn the training set into a numpy array for the classifier
    npar_trainlabs = np.array(training_labels)
    print("training SVM linear %s" %i) #train SVM
    clf.fit(npar_train, training_labels)

    print("getting accuracies %s" %i) #Use score() function to get accuracy
    npar_pred = np.array(prediction_data)
    pred_lin = clf.score(npar_pred, prediction_labels)
    print ("linear: ", pred_lin)
    accur_lin.append(pred_lin) #Store accuracy in a list

print("Mean value lin svm: %s" %np.mean(accur_lin)) #FGet mean accuracy of the 10 runs

Making sets 0
 working on anger
 working on disgust
 working on happiness
 working on neutral
 working on surprise
training SVM linear 0
getting accuracies 0
linear:  0.917525773196
Making sets 1
 working on anger
 working on disgust
 working on happiness
 working on neutral
 working on surprise
training SVM linear 1
getting accuracies 1
linear:  0.865979381443
Making sets 2
 working on anger
 working on disgust
 working on happiness
 working on neutral
 working on surprise
training SVM linear 2
getting accuracies 2
linear:  0.886597938144
Making sets 3
 working on anger
 working on disgust
 working on happiness
 working on neutral
 working on surprise
training SVM linear 3
getting accuracies 3
linear:  0.855670103093
Making sets 4
 working on anger
 working on disgust
 working on happiness
 working on neutral
 working on surprise
training SVM linear 4
getting accuracies 4
linear:  0.927835051546
Making sets 5
 working on anger
 working on disgust
 working on happiness
 working on neut

## Results

In the previous post, for the standard set at 8 categories we managed to get **83.13%** accuracy with the FisherFace classifier. This approach yields **83,52%** on the same data with the linear kernel, a few better. 
Using the polynomial SVM for the 8 categories we managed to get **80.76%**

Let's run with a reduced dataset with 5 emotions (leaving out "contempt", "fear" and "sadness"), because the 3 categories had very few images:  
Fisher Face classifier:  **90.30%**  
Linear SVM:  **89.69**  
Polynomial SVM: **89.58%**

Let's summarize the results in the next table: 

| Classifier | Full Dataset Acc | Reduced Dataset Acc         
| :-: |:-------------: | :-:
| Fisher  | 83.13% | 90.30%
| Linear SVM | 83,52% | 89.69
| Polynomial SVM | 80.76% | 89.58%

Therefore, we can note that all classifiers were better when using the reduced dataset because the categories are more balanced.
Also, we can note that the Fisher Classifier had a similar or better accuracy than others classifiers. I am not sure why my results are different from the [original tutorial](http://www.paulvangent.com/2016/08/05/emotion-recognition-using-facial-landmarks/), but I can suppose that the Fisher Classifier that I have used is different from him: I have used the "cv2.face.FisherFaceRecognizer_create()" while the original tutorial has used "cv2.createFisherFaceRecognizer()". Besides that, the environment in general (python, dlib, opencv) can have different versions.

The main problem using this tutorial was in the inital setup. I could not managed to follow the steps suggested by the author,  so I tried to setup my environment by myself:  
Dowload OpenCV from [here](https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv) ou try "pip install opencv-python"  
Install Dlib through conda: "conda install -c conda-forge dlib=19.4"

Finally, I am sure that I have learned new things about how to detect faces, landmarks and how to use them in a classifier. Dlib is a useful library and makes the work to be easy!