# <font style = "color:rgb(50,120,229)">Facial Recognition (Fisher faces)</font>

# <font style = "color:rgb(50,120,229)">Introduction</font>
In the last lecture we learned about Principal Component Analysis (PCA). The idea behind PCA is to transform the data so that the dimensions along which the variance is high are preserved and the dimensions along which the data does not vary much are removed. 

There is one shortcoming of PCA when used for image classification or face recognition: it is independent of the classes in the data. It is not designed for separability of the classes, but simply for dimensionality reduction. 

We can do better with Linear Discriminant Analysis (LDA). 

**<font style="color:rgb(255,0,0)">Note:</font>** It is a pleasure to present this idea in this class because I have published several research papers with two of the authors of the Fisher Faces paper, Dr. David Kriegman and Dr. Peter Belhumeur. In addition, Dr. David Kriegman was my Ph.D. adviser and I co-founded my first company with him.

# <font style = "color:rgb(50,120,229)">Linear Discriminant Analysis (LDA)</font>

Linear Discriminant Analysis was invented in 1936 by Ronald A. Fisher. It is similar in spirit to PCA in that the goal is to reduce the dimensions, but unlike PCA, LDA takes the class (category) information in the data into account. It looks for a new space with two properties:

1. Data belonging to the same class should cluster tightly in the new space.
2. Data belonging to different classes should be far removed (i.e. separable) from each other. 

## <font style = "color:rgb(50,120,229)">PCA vs LDA</font>

In Figure 1, we have shown data points that belong to two different classes: red and blue. Performing PCA on this data results in the principal component shown using the solid line in Figure 1 (a). In Figure 1 (b) we have rotated the space so the principal component is horizontal. Finally, we project the points vertically down in Figure 1 (c). This gives the coordinates of the points along the principal axis. You can see that even though we now have one dimension instead of two, the dots belonging to the two classes are not separable. 


| <center><a href="https://www.learnopencv.com/wp-content/uploads/2018/01/opcv4face-w9-m2-dataPointsOfTwoClasses.png"><img src="https://www.learnopencv.com/wp-content/uploads/2018/01/opcv4face-w9-m2-dataPointsOfTwoClasses.png"/></a></center> | 
| -------- | 
| Figure 1. (a) Data points belonging to two different classes are shown. The solid line is the first principal component. The dashed line shows the direction of projection. (b) The space is rotated so the principal component is horizontal (for better visualization). (c) The data points are projected onto the principal component. Notice that the red and blue dots overlap and there is no clear way to classify the dots when they are projected onto the principal component. | 


 

This is why, depending on the data, applying PCA may not be a great idea if the final goal is classification. 

On the other hand, the goal of LDA is to reduce the dimension while discriminating between classes. Figure 2 shows the projection of the data along the first LDA component. You can see in Figure 2 (c) that the two classes cluster nicely and are easily separable. 

| <center><a href="https://www.learnopencv.com/wp-content/uploads/2018/01/opcv4face-w9-m2-dataPointsOfTwoClasses1.png"><img src="https://www.learnopencv.com/wp-content/uploads/2018/01/opcv4face-w9-m2-dataPointsOfTwoClasses1.png"/></a></center> | 
| -------- | 
| Figure 2. (a) Data points belonging to two different classes are shown. The solid line is the first LDA component. The dashed line shows the direction of projection. (b) The space is rotated so the LDA component is horizontal (for better visualization). (c) The data points are projected onto the LDA component. Notice that the red and blue dots form separate clusters. |
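
The contrast between the two figures can be reproduced numerically. The following is a minimal sketch on synthetic data (the cluster positions and covariances are assumptions chosen for illustration, not the data behind the figures). It projects the points onto the first principal component and onto Fisher's direction, and scores each projection by the squared gap between the projected class means divided by the sum of the projected class variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two elongated, parallel clusters: most of the variance lies along x,
# but the classes differ only along y (synthetic data, assumed for illustration).
cov = np.array([[9.0, 0.0], [0.0, 0.25]])
red = rng.multivariate_normal([0.0, 0.0], cov, size=200)
blue = rng.multivariate_normal([0.0, 3.0], cov, size=200)
data = np.vstack([red, blue])

# PCA: the first principal component is the top eigenvector of the data covariance.
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
pca_dir = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# LDA (two classes): Fisher's direction is proportional to Sw^{-1} (mu_red - mu_blue),
# where Sw is the sum of the two class covariance matrices.
mu_red, mu_blue = red.mean(axis=0), blue.mean(axis=0)
s_w = np.cov(red, rowvar=False) + np.cov(blue, rowvar=False)
lda_dir = np.linalg.solve(s_w, mu_red - mu_blue)
lda_dir /= np.linalg.norm(lda_dir)


def separation_score(direction):
    """Squared gap between projected means over the sum of projected variances."""
    r, b = red @ direction, blue @ direction
    return (r.mean() - b.mean()) ** 2 / (r.var() + b.var())


pca_score = separation_score(pca_dir)
lda_score = separation_score(lda_dir)
print(f"PCA direction {pca_dir}, separation score {pca_score:.3f}")
print(f"LDA direction {lda_dir}, separation score {lda_score:.3f}")
```

With data shaped like this, the principal component follows the long axis of the clusters, where the classes overlap, while the LDA direction points across the gap between them.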

 

### <font style = "color:rgb(50,120,229)">How does LDA work?</font>

As mentioned earlier, the goal of LDA is to choose a direction along which the classes are nicely separated. Let’s look at the simple example of two classes in two dimensions, such as the red and blue dots shown in Figure 1 and Figure 2. 

Let the two classes have means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, after they have been projected onto an axis. LDA finds the axis that maximizes the following objective 

$$
\frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \hspace{9cm}(1)
$$

Now, remember the mean of a class represents a dot in the center of all data points of the same class. The variance is a measure of spread of the data points. If the data points are far from each other, the  variance is high. 

The objective shown in (1) is maximized when 
1. The means $\mu_1$ and $\mu_2$ are far from each other so that the numerator is large. 
2. The variances $\sigma_1^2$ and $\sigma_2^2$ are low so that the denominator is small. When this happens, the data from the same class clusters tightly. 
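
To see both conditions at work, objective (1) can be evaluated for many candidate axes. This is a small sketch on synthetic two-class data (the means and covariance below are assumptions for illustration). It sweeps the angle of the axis, keeps the best one, and compares it with the axis that simply joins the two class means, which makes the numerator large but ignores the denominator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data in 2D (an assumption for illustration).
cov = np.array([[4.0, 1.5], [1.5, 1.0]])
class_1 = rng.multivariate_normal([0.0, 0.0], cov, size=300)
class_2 = rng.multivariate_normal([2.0, 3.0], cov, size=300)


def objective(theta):
    """Objective (1) for the unit direction at angle theta (radians)."""
    w = np.array([np.cos(theta), np.sin(theta)])
    p1, p2 = class_1 @ w, class_2 @ w     # 1D coordinates along the axis
    mu_1, mu_2 = p1.mean(), p2.mean()     # projected means
    var_1, var_2 = p1.var(), p2.var()     # projected variances (sigma squared)
    return (mu_1 - mu_2) ** 2 / (var_1 + var_2)


# Brute force: try many directions between 0 and 180 degrees.
thetas = np.linspace(0.0, np.pi, 1801)
scores = np.array([objective(t) for t in thetas])
best_theta = thetas[np.argmax(scores)]
best_score = scores.max()

# Baseline: the axis that joins the two class means.
delta = class_2.mean(axis=0) - class_1.mean(axis=0)
mean_axis_theta = np.arctan2(delta[1], delta[0])
mean_axis_score = objective(mean_axis_theta)

print(f"best axis:       {np.degrees(best_theta):6.1f} deg, objective {best_score:.2f}")
print(f"mean-to-mean axis: {np.degrees(mean_axis_theta):6.1f} deg, objective {mean_axis_score:.2f}")
```

The sweep is only for intuition. For two classes the maximizer has a closed form, so practical implementations do not search over angles.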

When we have more than two classes, we use an extension of Fisher’s discriminant by Dr. C. R. Rao. We first find the mean $\mu$ of all data points (regardless of their class) and maximize the distance of the class means $\mu_i$ from this global mean. For $n$ classes, multiclass LDA optimizes the following objective 


$$
\frac{\sum_{i=1}^n (\mu - \mu_i)^2}{\sum_{i=1}^n \sigma_i^2} \hspace{9cm}(2)
$$
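
Objective (2) can be checked in the same way. The sketch below uses three synthetic classes of equal size (an assumption, so that the unweighted sums in (2) agree with the usual class-size-weighted scatter matrices). It evaluates (2) directly on projected data, and also finds the best axis as the top eigenvector of $S_w^{-1} S_b$, where $S_b$ and $S_w$ are names introduced here for the between-class and within-class scatter matrices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Three synthetic classes in 2D with equal sizes (assumed for illustration).
cov = np.array([[3.0, 1.0], [1.0, 1.0]])
centers = [np.array([0.0, 0.0]), np.array([1.0, 3.0]), np.array([2.0, 6.0])]
classes = [rng.multivariate_normal(center, cov, size=200) for center in centers]


def objective_2(w):
    """Objective (2) evaluated on the data projected onto direction w."""
    w = w / np.linalg.norm(w)
    projected = [x @ w for x in classes]
    mu = np.concatenate(projected).mean()                 # global mean
    between = sum((mu - p.mean()) ** 2 for p in projected)
    within = sum(p.var() for p in projected)
    return between / within


# Matrix form of the same ratio: (w^T S_b w) / (w^T S_w w).
global_mean = np.vstack(classes).mean(axis=0)
s_b = sum(np.outer(x.mean(axis=0) - global_mean, x.mean(axis=0) - global_mean)
          for x in classes)
s_w = sum(np.cov(x, rowvar=False, bias=True) for x in classes)

# The maximizer is the eigenvector of S_w^{-1} S_b with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
order = np.argsort(eigvals.real)[::-1]
w_best = eigvecs[:, order[0]].real

print("best direction:", w_best)
print("objective (2) along it:", objective_2(w_best))
print("largest eigenvalue:    ", eigvals.real.max())
```

The value of objective (2) along the best direction equals the largest eigenvalue, which is why multiclass LDA is usually solved as an eigenvalue problem rather than by search.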


# <font style = "color:rgb(50,120,229)">Fisherfaces</font> 
Fisherfaces is the idea of LDA applied to the problem of face recognition. The most compelling argument given in favor of Fisherfaces in the original paper was that it deals with lighting variations much better than Eigenfaces. 

If you keep the pose of the head constant and change only the lighting in the scene, the face pixels can look dramatically different. The raw pixel values are often closer for images of two different people under the same lighting conditions than for images of the same person under different lighting conditions.

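But there is hope. Even though a 100x100 face image is a point in a 10,000-dimensional space, the variation caused by lighting is very restricted: for a fixed pose, and ignoring shadows and specular highlights, the images of a face under different lighting lie in a three-dimensional linear subspace of this large space! 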
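
The three-dimensional claim is easy to verify under an idealized model. The sketch below assumes a Lambertian (perfectly matte) surface with no shadows, where each pixel value is the albedo times the dot product of the surface normal and the light vector. The normals, albedos and lights are random stand-ins, not a real face.

```python
import numpy as np

rng = np.random.default_rng(3)

num_pixels = 100 * 100   # a 100x100 image flattened into a vector
num_images = 50          # number of lighting conditions to simulate

# Hypothetical surface: a unit normal and an albedo (reflectance) per pixel.
normals = rng.normal(size=(num_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(num_pixels, 1))

# Lambertian shading without shadows: pixel = albedo * (normal . light).
# Each column of `lights` is one light direction scaled by its intensity.
lights = rng.normal(size=(3, num_images))
images = (albedo * normals) @ lights   # shape: (num_pixels, num_images)

rank = np.linalg.matrix_rank(images)
print("dimension of each image:", num_pixels)
print("rank of the stack of images:", rank)
```

Every image is a linear combination of the same three columns of `albedo * normals`, so the stack cannot have rank above three. Real images clip negative shading values and contain cast shadows and highlights, so they only approximately satisfy this model.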

The authors in the Fisherfaces paper “choose projection directions that are nearly orthogonal to the within-class scatter, projecting away variations in lighting and facial expression while maintaining discriminability.” 
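
To make the idea concrete, here is a rough NumPy sketch of a Fisherfaces-style pipeline, not the authors' code. It follows the two-step recipe from the paper: PCA first reduces $N$ training images of $c$ people to $N - c$ dimensions so that the within-class scatter is invertible, and LDA then reduces them to $c - 1$ dimensions. The data is synthetic (random "identity" templates plus a strong shared lighting-like variation), all function names are hypothetical, and classification by nearest class mean is a simplification chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for a face dataset (not real faces).
c, d = 4, 400                               # people, pixels per image
identities = rng.normal(size=(c, d))        # one template per person
lighting_basis = rng.normal(size=(3, d))    # shared nuisance directions


def make_images(labels):
    """Identity template + strong lighting-like variation + small pixel noise."""
    n = len(labels)
    lighting = 5.0 * rng.normal(size=(n, 3)) @ lighting_basis
    return identities[labels] + lighting + 0.1 * rng.normal(size=(n, d))


def fit_fisherfaces(X, labels):
    """Return the mean image and a (d, c - 1) projection matrix."""
    n, classes = len(X), np.unique(labels)
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step 1: PCA down to n - c dimensions so that S_w is non-singular.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    w_pca = vt[: n - len(classes)].T
    Z = Xc @ w_pca                          # centered, so the global mean of Z is zero
    # Step 2: LDA in the reduced space.
    s_w = np.zeros((Z.shape[1], Z.shape[1]))
    s_b = np.zeros_like(s_w)
    for cls in classes:
        Zc = Z[labels == cls]
        mu_c = Zc.mean(axis=0)
        s_w += (Zc - mu_c).T @ (Zc - mu_c)
        s_b += len(Zc) * np.outer(mu_c, mu_c)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    top = np.argsort(eigvals.real)[::-1][: len(classes) - 1]
    return mean, w_pca @ eigvecs[:, top].real


y_train = np.repeat(np.arange(c), 10)
y_test = np.repeat(np.arange(c), 5)
X_train, X_test = make_images(y_train), make_images(y_test)

mean, W = fit_fisherfaces(X_train, y_train)
train_proj = (X_train - mean) @ W
class_means = np.array([train_proj[y_train == cls].mean(axis=0) for cls in range(c)])


def predict(X_new):
    """Assign each image to the nearest class mean in the projected space."""
    proj = (X_new - mean) @ W
    dists = np.linalg.norm(proj[:, None, :] - class_means[None, :, :], axis=2)
    return dists.argmin(axis=1)


train_accuracy = (predict(X_train) == y_train).mean()
test_accuracy = (predict(X_test) == y_test).mean()
print("projection shape:", W.shape)
print("train accuracy:", train_accuracy, "test accuracy:", test_accuracy)
```

In this toy setup the lighting-like variation is much larger than the difference between identities, yet it is shared by all classes, so it ends up in the within-class scatter and the learned projection suppresses it.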

# <font style = "color:rgb(50,120,229)">References and Further Reading</font>
[Linear Discriminant Analysis](https://en.wikipedia.org/wiki/Linear_discriminant_analysis)

[Fisher Faces - Scholarpedia](http://www.scholarpedia.org/article/Fisherfaces)

[Fisher Faces - Blog](http://www.bytefish.de/blog/fisherfaces/)