# Face Clustering

Face clustering is an unsupervised learning task that finds the unique faces in a group of unlabeled faces. I have created a module, `FacialClustering`, that clusters faces using two methods, `DBSCAN` and `chinese_whispers`, utilizing the facial embedding models from the previous notebooks.

In [8]:
# The module takes a FacePreprocess instance as input
# I designed it this way since we need to load the SSD model and weights
from modules.FacePreprocess import FacePreprocess
ssd_model = r'./models/ssd/deploy.prototxt.txt'
ssd_weights = r'./models/ssd/res10_300x300_ssd_iter_140000.caffemodel'
processor = FacePreprocess(ssd_model, ssd_weights)

## Initialize the `FacialClustering` module

In [9]:
from modules.FacialClustering import FacialClustering

# set input paths --> make sure that every image inside the directory ends with '.jpg' or '.png'
# raw strings keep the Windows backslashes from being read as escape sequences
input_paths = [
    r'.\dataset'
]
output_path = r'.\output\face_clustering'
cluster = FacialClustering(
    pathlist = input_paths, 
    processor = processor, 
    out_path = output_path,
    preprocess = True, # since the images in our dataset haven't been preprocessed, set this to True
)

If the module loads correctly, you should see a `log.txt` file inside your output directory. This file records the parameters of every clustering run.
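Since the input directory is expected to contain only files ending in `.jpg` or `.png`, a quick check before constructing the module can save a failed run. The helper below is a minimal sketch and not part of `FacialClustering`; the name `find_unsupported_files` is hypothetical, and it assumes the check should be case-sensitive and should include subdirectories, neither of which I have verified against the module.

```python
from pathlib import Path

# Extensions named in the comment above; assumed to be matched case-sensitively.
ALLOWED_SUFFIXES = {'.jpg', '.png'}

def find_unsupported_files(directory):
    """Return the files under `directory` that do not end with .jpg or .png."""
    root = Path(directory)
    return sorted(
        p for p in root.rglob('*')  # rglob also walks subdirectories (an assumption)
        if p.is_file() and p.suffix not in ALLOWED_SUFFIXES
    )

# Example usage (hypothetical path):
# for path in find_unsupported_files('./dataset'):
#     print('unsupported file:', path)
```

An empty result means every file in the directory has one of the two expected extensions.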

## Method 1: Chinese Whispers

References:
- https://github.com/zhly0/facenet-face-cluster-chinese-whispers-/blob/master/clustering.py 
- https://en.wikipedia.org/wiki/Chinese_whispers_(clustering_method) 

In [10]:
cluster.chinese_whispers(
    FE = 'kv-resnet50', # choose your best feature extractor
    threshold = 8000, # min distance between clusters
    iterations = 3000, # number of iterations
    saveas = False, # save a copy of the clustered faces
)

Finding a good threshold and number of iterations takes trial and error, so take your time. The `log.txt` file keeps a history of all the clustering results along with the parameters used (as long as you keep using the same `FacialClustering` instance), and every call to a clustering function outputs an Excel file containing the clustering results. You can also use the `saveas` parameter to save a copy of the clustered faces, with each cluster in its own folder.
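To build some intuition for what the threshold and iteration count control, here is a minimal, standard-library sketch of the Chinese Whispers idea. It is not the code inside `FacialClustering`: the function name `chinese_whispers_sketch` is hypothetical, and I am assuming Euclidean distance, an edge between any two faces closer than `threshold`, and an edge weight of `1 / (1 + distance)`. The module may define these differently.

```python
import math
import random
from collections import defaultdict

def chinese_whispers_sketch(embeddings, threshold, iterations=20, seed=0):
    """Return one cluster label per embedding (illustrative only)."""
    n = len(embeddings)

    # Build the graph: connect two faces when their distance is below the threshold.
    edges = defaultdict(dict)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(embeddings[i], embeddings[j])
            if d < threshold:
                weight = 1.0 / (1.0 + d)  # closer faces get a stronger vote (assumption)
                edges[i][j] = weight
                edges[j][i] = weight

    labels = list(range(n))  # every face starts in its own cluster
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(iterations):
        rng.shuffle(order)  # visit the nodes in a random order each pass
        for i in order:
            if not edges[i]:
                continue  # isolated faces keep their own label
            votes = defaultdict(float)
            for j, weight in edges[i].items():
                votes[labels[j]] += weight
            labels[i] = max(votes, key=votes.get)  # adopt the strongest neighbouring label
    return labels

# Two well-separated groups of toy 2-D "embeddings"
points = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
print(chinese_whispers_sketch(points, threshold=1.0))
```

In this sketch, a larger threshold adds more edges and so tends to merge clusters, while more iterations simply give the labels more time to settle.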

## Method 2: DBSCAN

References:
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html 
- https://en.wikipedia.org/wiki/DBSCAN 
- https://github.com/AsutoshPati/Face-Clustering-using-DBSCAN 

In [11]:
cluster.DBSCAN(
    FE = 'kv-resnet50', # choose your best feature extractor
    eps = 75, # epsilon -> maximum distance between two samples for them to count as neighbours
    min_samples = 3, # min number of samples in a neighbourhood for a point to be a core point
    metric = 'euclidean', # distance metric
    saveas = False, # save a copy of the clustered faces
)

Any metric accepted by [sklearn.metrics.pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html#sklearn.metrics.pairwise_distances) can be used.

Epsilon should be small, but not too small. If it is too large, all the faces will end up in the same cluster; if it is too small, most faces will not join any cluster and will be labeled `no_class`.
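The effect of epsilon is easy to see on toy data by calling scikit-learn's `DBSCAN` directly. This is only a sketch: the 2-D points stand in for real face embeddings, the helper name `summarize` is hypothetical, and I am assuming that the module's `no_class` label corresponds to scikit-learn's noise label `-1`.

```python
from sklearn.cluster import DBSCAN

# Two tight groups of toy 2-D points standing in for two identities
group_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
group_b = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
X = group_a + group_b

def summarize(eps):
    """Return (number of clusters, number of noise points) for a given eps."""
    labels = DBSCAN(eps=eps, min_samples=3, metric='euclidean').fit_predict(X)
    labels = labels.tolist()
    n_clusters = len(set(labels) - {-1})  # -1 is scikit-learn's noise label
    n_noise = labels.count(-1)
    return n_clusters, n_noise

for eps in (0.001, 1.0, 20.0):
    print(eps, summarize(eps))
```

With the smallest value every point is treated as noise, with the middle value the two groups are recovered, and with the largest value everything collapses into a single cluster.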

## Conclusion

This notebook is a tutorial on how to use the `FacialClustering` module. I didn't write the algorithms myself, so a huge thanks to [zhly0](https://github.com/zhly0/) and [AsutoshPati](https://github.com/AsutoshPati/)! I combined their code into a single class to make it easier to test out the different algorithms and parameters.

Remember that for face recognition tasks (or basically any machine learning task), it takes trial and error to find the right model and configurations, so I hope you find this notebook useful!