# Table of Contents
---
* Understand how it works
* Case study
* Implementation
* Understanding the Implementation
* Applications of the algorithm

## Understand how it works
---
### Feature vector
---
In general, ML algorithms take a dataset as input and learn from the given data. The algorithm goes through the data and identifies patterns. To identify whose face is present in a given image, we could look at multiple things to find a pattern:
* Height/width of the face. Absolute measurements are not reliable, since the image could be rescaled and the face would appear smaller. The ratios between measurements, however, stay the same after rescaling.
* Colour of the face.
* Width of other parts of the face like nose, lips, etc.

As you can see, there is a clear pattern: different faces have different dimensions, and similar faces have similar dimensions.
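The point about ratios surviving a rescale can be checked with a few lines of Python. This is only a sketch: the measurements are made-up numbers and `aspect_ratio` is a hypothetical helper, not part of any library.

```python
def aspect_ratio(height, width):
    """Return height divided by width; the unit of measurement cancels out."""
    return height / width


# Hypothetical face measured in the original image (cm).
original = (23.1, 15.8)

# The same face after the image has been shrunk to half its size.
rescaled = (original[0] * 0.5, original[1] * 0.5)

# The absolute sizes differ, but the ratio is the same in both cases.
print(aspect_ratio(*original))
print(aspect_ratio(*rescaled))
```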

The main challenge is to convert a particular face into numbers, because ML algorithms only understand numbers. The numerical representation of a "face" (or of any element in the training set) is termed a **feature vector**: a list of numbers in a specific order.

As a simple example, we can map a "face" into a feature vector that comprises features like:
* Height of face (cm)
* Width of face (cm)
* Average colour of face (RGB)
* Width of lips (cm)
* Height of nose (cm)

We can then convert it to a feature vector like:

| Height of face (cm) | Width of face (cm) | Average colour of face (RGB) | Width of lips (cm) | Height of nose (cm) |
| --- | --- | --- | --- | --- |
| 23.1 | 15.8 | (255, 224, 189) | 5.2 | 4.4 |

Therefore, the vector could be represented as (23.1, 15.8, 255, 224, 189, 5.2, 4.4).
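A minimal sketch of that flattening step, assuming the measurements are stored in a dictionary. The key names, `FEATURE_ORDER` and `to_feature_vector` are illustrative choices, not something the article's library provides.

```python
# Measurements of one face, taken from the table above.
face = {
    "face_height_cm": 23.1,
    "face_width_cm": 15.8,
    "avg_colour_rgb": (255, 224, 189),
    "lip_width_cm": 5.2,
    "nose_height_cm": 4.4,
}

# The order matters: every face must be encoded in the same order.
FEATURE_ORDER = [
    "face_height_cm",
    "face_width_cm",
    "avg_colour_rgb",
    "lip_width_cm",
    "nose_height_cm",
]


def to_feature_vector(record, order=FEATURE_ORDER):
    """Flatten a record into a tuple of numbers, expanding RGB triples."""
    vector = []
    for name in order:
        value = record[name]
        if isinstance(value, (tuple, list)):
            vector.extend(value)   # e.g. (255, 224, 189) becomes three numbers
        else:
            vector.append(value)
    return tuple(vector)


print(to_feature_vector(face))  # (23.1, 15.8, 255, 224, 189, 5.2, 4.4)
```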

Countless other features could be derived from the image (for instance, hair colour, facial hair, glasses, etc.).

### After encoding each image into a feature vector
---
Now that everything is encoded, the problem becomes much simpler.
When two images show the same person, the feature vectors derived from them will be quite similar. In other words, **the "distance" between the two feature vectors will be quite small.**
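One common way to measure that "distance" is the Euclidean distance. The sketch below assumes three made-up feature vectors in the layout used earlier (height, width, R, G, B, lip width, nose height); only the first one comes from the article's table.

```python
import math

# Two hypothetical photos of the same person, and one of somebody else.
same_person_a = (23.1, 15.8, 255, 224, 189, 5.2, 4.4)
same_person_b = (23.0, 15.9, 250, 221, 190, 5.1, 4.5)
other_person = (20.4, 14.1, 141, 85, 36, 4.6, 5.3)


def euclidean_distance(u, v):
    """Square root of the summed squared differences between two vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The first distance is much smaller than the second.
print(euclidean_distance(same_person_a, same_person_b))
print(euclidean_distance(same_person_a, other_person))
```

Note that the colour values dominate here because they are on a much larger scale than the centimetre measurements; in practice features are usually normalised before distances are compared.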

ML can help with 2 things:
1. *Deriving the feature vector*: it is difficult to list all of the features manually because there are too many. An ML algorithm can derive many such features automatically. For example, a complex feature could be the ratio of the height of the nose to the width of the forehead. It would be quite difficult for a human to list all such "second-order" features.
2. *Matching algorithms*: once the feature vectors are available, an ML algorithm needs to match a new image against the set of feature vectors present in the corpus.
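The matching step can be sketched as a nearest-neighbour search with a cut-off. Everything below is an assumption made for illustration: the corpus entries, the threshold of `1.0` and the helper `closest_match` are invented, and real systems learn their features rather than measuring them by hand.

```python
import math

# Hypothetical corpus: name -> (face height, face width, lip width, nose height) in cm.
corpus = {
    "person_a": (23.1, 15.8, 5.2, 4.4),
    "person_b": (20.4, 14.1, 4.6, 5.3),
}


def closest_match(query, corpus, threshold):
    """Return (name, distance) of the nearest corpus entry.

    The name is None when even the nearest entry is farther than `threshold`,
    meaning the query face is probably not in the corpus.
    """
    best_name, best_distance = None, float("inf")
    for name, vector in corpus.items():
        distance = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance <= threshold:
        return best_name, best_distance
    return None, best_distance


new_photo = (23.0, 15.9, 5.1, 4.5)   # close to person_a
stranger = (30.0, 10.0, 8.0, 2.0)    # far from everyone

print(closest_match(new_photo, corpus, threshold=1.0))
print(closest_match(stranger, corpus, threshold=1.0))
```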

## Case Study
---
We are given a bunch of faces, possibly of celebrities like Mark Zuckerberg, Warren Buffett, Bill Gates, Shah Rukh Khan, etc. Call this bunch of faces our "corpus". Now we are given an image of yet another celebrity (the "new celebrity"). The task is simple: identify whether this "new celebrity" is among those present in the "corpus".

Here are some of the images in the corpus:

![faces](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/08/face1.png "Some faces!")

Now here is the "new celebrity":
![faces](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/08/face2.png "Some faces!")

There is a simple Python library that encapsulates everything we learned above: creating feature vectors out of faces and knowing how to differentiate between faces. The library is called **face_recognition** and, deep within, it employs **dlib**, a modern C++ toolkit that contains several ML algorithms that help in writing sophisticated C++ based applications.

The **face_recognition** library in Python can perform a large number of tasks:
* Find all the faces in a given image
* Find and manipulate facial features in an image
* Identify faces in an image
* Real-time face recognition
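The first two tasks in the list can be sketched with the library's `face_locations` and `face_landmarks` functions. The helpers `box_size` and `describe_faces` are hypothetical names introduced here, and any image path passed in is up to you; the library is imported inside the function so that the small helper can be tried without it.

```python
def box_size(location):
    """Convert a (top, right, bottom, left) box into (width, height) in pixels."""
    top, right, bottom, left = location
    return right - left, bottom - top


def describe_faces(image_path):
    """Print the size and available landmark groups of every face found."""
    import face_recognition  # imported lazily; requires the library to be installed

    image = face_recognition.load_image_file(image_path)
    # One (top, right, bottom, left) tuple per detected face.
    locations = face_recognition.face_locations(image)
    # One dict per face, mapping a feature name to a list of (x, y) points.
    landmarks = face_recognition.face_landmarks(image, face_locations=locations)

    for location, marks in zip(locations, landmarks):
        width, height = box_size(location)
        print(f"face of {width}x{height} px, features: {sorted(marks)}")
    return len(locations)


# The box helper on its own, with a made-up bounding box:
print(box_size((10, 110, 90, 40)))  # (70, 80)
```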

## Implementation
---
This section covers the code for building a straightforward face recognition system using the **face_recognition** library.

```python
# import the libraries
import os
import face_recognition

# make a list of all the available images
images = os.listdir('images')

# load the image to be matched
image_to_be_matched = face_recognition.load_image_file('guess.jpeg')
```

```python
# turn the loaded image into a feature vector
image_to_be_matched_encoded = face_recognition.face_encodings(image_to_be_matched)[0]
```
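`face_encodings` returns a list with one encoding per detected face, so indexing it with `[0]` raises an `IndexError` when no face is found in the image. A minimal sketch of a guard follows; `first_encoding` is a hypothetical helper, not part of the library, and the lists passed to it are stand-ins so the snippet runs on its own.

```python
def first_encoding(encodings):
    """Return the first face encoding, or None when no face was detected.

    `encodings` is assumed to be the list returned by
    face_recognition.face_encodings (one entry per detected face).
    """
    if len(encodings) == 0:
        return None
    return encodings[0]


# Stand-in inputs, so the sketch runs without the library installed.
print(first_encoding([]))                    # no face detected
print(first_encoding([[0.1, -0.2, 0.05]]))   # one shortened, made-up encoding
```

With the library, the call would look like `first_encoding(face_recognition.face_encodings(image_to_be_matched))`, followed by a check for `None` before any comparison.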

```python
# iterate over each image in the corpus
for image in images:
    # load the corpus image
    current_image = face_recognition.load_image_file(os.path.join("images", image))
    # turn it into a feature vector
    current_image_encoded = face_recognition.face_encodings(current_image)[0]
    # compare it with the image to be matched
    result = face_recognition.compare_faces(
        [image_to_be_matched_encoded], current_image_encoded)
    # report whether it was a match
    if result[0]:
        print("Matched: " + image)
    else:
        print("Not matched: " + image)
```

```
Not matched: hunter.jpg
Not matched: kaitlyn.jpg
Not matched: mikaela.jpg
Matched: brie.jpg
Not matched: zendaya.jpg
Not matched: barbie.jpg
Not matched: diana.jpg
```
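`compare_faces` only gives a yes/no answer, based on a `tolerance` parameter that defaults to 0.6 (lower is stricter). To see how close each corpus image actually is, the library also offers `face_distance`. The following is a sketch under assumptions: `rank_by_distance` and `rank_corpus` are hypothetical helpers, and the directory layout mirrors the `images` folder used above.

```python
import os


def rank_by_distance(names, distances, tolerance=0.6):
    """Sort (name, distance) pairs by distance and flag those within tolerance."""
    ranked = sorted(zip(names, distances), key=lambda pair: pair[1])
    return [(name, float(d), float(d) <= tolerance) for name, d in ranked]


def rank_corpus(query_path, corpus_dir="images", tolerance=0.6):
    """Rank every corpus image by its distance to the query face."""
    import face_recognition  # imported lazily; requires the library to be installed

    # Assumes the query image contains at least one detectable face.
    query = face_recognition.face_encodings(
        face_recognition.load_image_file(query_path))[0]

    names, encodings = [], []
    for name in sorted(os.listdir(corpus_dir)):
        image = face_recognition.load_image_file(os.path.join(corpus_dir, name))
        found = face_recognition.face_encodings(image)
        if found:  # skip images in which no face was detected
            names.append(name)
            encodings.append(found[0])

    # One distance per corpus encoding; smaller means more similar.
    distances = face_recognition.face_distance(encodings, query)
    return rank_by_distance(names, distances, tolerance)


# The ranking helper on its own, with made-up distances:
for row in rank_by_distance(["a.jpg", "b.jpg", "c.jpg"], [0.8, 0.4, 0.65]):
    print(row)
```

Calling `rank_corpus('guess.jpeg')` would then list the corpus images from most to least similar, which makes it easier to judge whether the default tolerance suits your data.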
