# Introduction

This notebook runs all experiments for this project. First, the information loss (IL) and re-identification (RI) metrics are evaluated. Second, a single k value is chosen, together with a partitioning algorithm and an embedding method, to produce the box-plot and average pair-wise distance results. Third, different versions of AnonFACES are evaluated.

Two datasets are used. The first is the RafD dataset, with only 67 identities. The second is a sample of CelebA with 979 identities. We use only a sample of CelebA mainly because of running time: evaluating a range of k values on the whole CelebA dataset, with 10177 identities, would take significantly longer.

In [None]:
#Load important modules
%load_ext autoreload
%autoreload 2

import utils
import numpy as np
import pickle
import pandas as pd

import partitioning
import anonymizer


# IL and RI evaluations

There are three separate experiments in this section:
- Embedding evaluation: Dlib, FaceNet and AAM
- Partitioning evaluation: Hierarchical Partitioning (HP), k-Means Partitioning, k-NN Partitioning.
- Generator evaluation: StyleGAN, CNN, AAM

In [None]:
#Load the embeddings
raw_dlib, data_dlib = utils.load_pickle('datasets/encoding_data/dlib_celeba_979.pickle')
raw_pca, data_pca = utils.load_pickle('datasets/encoding_data/pca_celeba_979.pickle')
raw_facenet, data_facenet = utils.load_pickle('datasets/encoding_data/facenet_celeba_979.pickle')

#Load latent vectors for StyleGAN
latent_vecs = utils.get_vectornames('datasets/stylegan_data/latent_vectors/', raw_dlib)

#Load StyleGAN 
generator_network, _, _ = anonymizer.styleGan.load_styleGan()

#K values
k_values = range(2,20)

In [16]:
#Prepare for results
IL_results = pd.DataFrame({'k': k_values})
RI_results = pd.DataFrame({'k': k_values})

## Embedding evaluation
This experiment is carried out on the CelebA sample with the HP algorithm and the StyleGAN generator. For each embedding, k takes the values in range(2,20), i.e. 2 to 19.

In [None]:
#Dlib
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
IL_results['hier_dlib_adjust'] = ILs
RI_results['hier_dlib_adjust'] = RIs

#FaceNet
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   clt_data= data_facenet,
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
IL_results['hier_facenet_adjust'] = ILs
RI_results['hier_facenet_adjust'] = RIs

#PCA
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   clt_data= data_pca,
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
IL_results['hier_pca_adjust'] = ILs
RI_results['hier_pca_adjust'] = RIs

# Save result
IL_results.to_pickle('Outputs/IL_results.pickle')
RI_results.to_pickle('Outputs/RI_results.pickle')

## Partitioning algorithm evaluation
In this experiment, the Dlib embedding and the StyleGAN generator are used. Two tests are conducted, with the k-Means and k-NN partitioning algorithms (the test with HP has already been run above).

In [None]:

#k-Means partitioning
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   clustering=partitioning.kmeans_partition,
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
IL_results['kmeans_dlib_adjust'] = ILs
RI_results['kmeans_dlib_adjust'] = RIs

#k-NN partitioning
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   clustering=partitioning.kNN_partition,
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
IL_results['knn_dlib_adjust'] = ILs
RI_results['knn_dlib_adjust'] = RIs

# Save result
IL_results.to_pickle('Outputs/IL_results.pickle')
RI_results.to_pickle('Outputs/RI_results.pickle')
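
To illustrate what a size-constrained partitioning step does, here is a minimal sketch of a greedy k-NN partition in pure Python. It is not the implementation behind `partitioning.kNN_partition`: the function name `greedy_knn_partition`, the choice of the first unassigned point as seed, and the way leftovers are handled are all assumptions made for this example.

```python
import math


def greedy_knn_partition(points, k):
    """Group points into clusters of at least k members (illustrative only).

    Repeatedly take the first unassigned point as a seed and group it with its
    k-1 nearest unassigned neighbours. If fewer than k points remain at the
    end, attach each of them to the cluster whose seed is closest.
    """
    if k < 1 or len(points) < k:
        raise ValueError("need k >= 1 and at least k points")

    unassigned = list(range(len(points)))
    clusters = []  # each cluster is a list of point indices; element 0 is the seed

    while len(unassigned) >= k:
        seed = unassigned[0]
        # Sort remaining points by distance to the seed (the seed itself comes first).
        by_distance = sorted(
            unassigned, key=lambda i: math.dist(points[seed], points[i])
        )
        members = by_distance[:k]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]

    # Leftovers (fewer than k of them) join the cluster with the nearest seed.
    for i in unassigned:
        nearest = min(clusters, key=lambda c: math.dist(points[c[0]], points[i]))
        nearest.append(i)

    return clusters


# Toy 2-D "embeddings": two well-separated groups plus one straggler.
toy_points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (0.2, 0.2),
]
toy_clusters = greedy_knn_partition(toy_points, k=3)
```

Every cluster ends up with at least k members, which is the property a k-anonymity style partition needs; how close the members are to each other depends on the order in which seeds are picked.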

## Generator evaluation
Three generators are compared on the RafD dataset. This dataset is chosen because the CNN was trained on it and its training code is relatively difficult to adapt to another dataset. In this experiment, the Dlib embedding and HP partitioning are used.

In [None]:
# Load latent vectors for RafD dataset
rafd_latents = 'datasets/stylegan_data/latent_vectors_rafd/'

#Load embeddings for RafD dataset
raw_data, data = utils.load_pickle('datasets/encoding_data/encodings.pickle')

# StyleGAN
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(rafd_latents, 
                                                       data, 
                                                       raw_data, 
                                                       generator_network, 
                                                       k_range=k_values, 
                                                       isBackward=True,
                                                       isAdjustWeight=True
                                                      )
IL_results['stylegan_rafd'] = ILs
RI_results['stylegan_rafd'] = RIs

#CNN
ILs, RIs = anonymizer.cnn.evaluate_cnn(data, raw_data, k_range=k_values)
IL_results['cnn_rafd'] = ILs
RI_results['cnn_rafd'] = RIs

# Save result
IL_results.to_pickle('Outputs/IL_results.pickle')
RI_results.to_pickle('Outputs/RI_results.pickle')

The AAM generator depends on a different set of packages, so its results were produced in a separate notebook (folder /related_works/k-same-m/). Here we only load the results.

In [29]:
aam_results = pd.read_pickle('related_works/k-same-m/k_same_m_k20.pkl')
IL_results['aam_rafd'] = aam_results['IL']
RI_results['aam_rafd'] = aam_results['FailProb']
# Save result
IL_results.to_pickle('Outputs/IL_results.pickle')
RI_results.to_pickle('Outputs/RI_results.pickle')
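
Once all columns have been collected, the configurations can be compared at a glance. The sketch below averages each metric column over all k values and sorts the result. The helper name `rank_configurations` and the toy numbers are assumptions made for illustration; they are not results from the experiments.

```python
import pandas as pd


def rank_configurations(results, k_column="k"):
    """Return the mean of every metric column over all k, sorted ascending."""
    metric_columns = results.drop(columns=[k_column])
    # DataFrame.mean() skips NaN by default, so partially filled columns still work.
    return metric_columns.mean().sort_values()


# Toy numbers for illustration only.
toy_results = pd.DataFrame({
    "k": [2, 3, 4],
    "config_a": [0.10, 0.20, 0.30],
    "config_b": [0.05, 0.10, 0.15],
})
toy_ranking = rank_configurations(toy_results)
```

The same helper could be applied to the saved tables, for example `rank_configurations(pd.read_pickle('Outputs/IL_results.pickle'))`. A mean over k is only a coarse summary, so the per-k curves remain the primary result.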

# Pair-wise distance evaluation
Based on the IL metric, it is possible to calculate the pair-wise distance between original and anonymized images. This experiment is conducted on the CelebA sample for a fixed k value (k=5). This instance of AnonFACES uses the Dlib embedding, HP partitioning and the StyleGAN generator.

In [None]:
# Do clustering
k_value = 5
clusters = partitioning.hierarchical_partition(data_dlib, cluster_size= k_value)

# Synthesize new images for clusters
avg_dist, pair_wise_dists,_, label_list,_, _ = anonymizer.styleGan.cluster_gen(latent_vecs, clusters, data_dlib, raw_dlib, 
                                                            generator_network,
                                                            isAdjustWeight=True,
                                                            isBackward=False,
                                                            k=k_value
                                                           )

# Save results
with open('Outputs/pair_wise_dists_k5.pickle', 'wb') as f: 
    pickle.dump(pair_wise_dists, f)
avg_per_cluster = pd.DataFrame({'label': label_list, 'avg_dist': avg_dist})
avg_per_cluster.to_pickle('Outputs/avg_per_cluster_k5.pickle')
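
As a rough illustration of how per-cluster averages relate to pair-wise distances, the sketch below assumes that a pair-wise distance is the Euclidean distance between the embedding of an original face and the embedding of its anonymized counterpart. That definition, the helper names, and the toy data (in which every cluster shares one surrogate embedding) are assumptions; the values actually returned by `cluster_gen` may be computed differently.

```python
import math
from collections import defaultdict
from statistics import mean


def pairwise_distances(originals, anonymized):
    """Euclidean distance between each original embedding and its anonymized counterpart."""
    if len(originals) != len(anonymized):
        raise ValueError("both lists must have the same length")
    return [math.dist(o, a) for o, a in zip(originals, anonymized)]


def average_per_cluster(distances, labels):
    """Mean distance per cluster label, returned as {label: mean}."""
    grouped = defaultdict(list)
    for d, label in zip(distances, labels):
        grouped[label].append(d)
    return {label: mean(values) for label, values in grouped.items()}


# Toy data: four faces in two clusters, one shared surrogate per cluster.
toy_originals = [(0.0, 0.0), (3.0, 0.0), (4.0, 4.0), (4.0, 6.0)]
toy_anonymized = [(0.0, 4.0), (0.0, 4.0), (4.0, 5.0), (4.0, 5.0)]
toy_labels = [0, 0, 1, 1]

toy_dists = pairwise_distances(toy_originals, toy_anonymized)
toy_avg = average_per_cluster(toy_dists, toy_labels)
```

The full list of distances is what a box plot would be drawn from, while the per-cluster means show whether a few clusters account for most of the distortion.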

# AnonFACES versions
Different versions of AnonFACES are evaluated. There are two on/off switches: isAdjustWeight and isRandWeight. This gives four combinations in total; classified by security level, however, there are three test cases. In the first (Naive), both switches are off. In the second (Weight Adjusted), only isAdjustWeight is on. In the last (Random Weight), both switches are on. By default, the Dlib embedding, HP partitioning and the StyleGAN generator are used.

In [None]:
AnonFACES_ILs = pd.DataFrame({'k': k_values})
AnonFACES_RIs = pd.DataFrame({'k': k_values})
# Naive
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                )
AnonFACES_ILs['Naive'] = ILs
AnonFACES_RIs['Naive'] = RIs

# Weight Adjusted 
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   isAdjustWeight=True
                                                )
AnonFACES_ILs['Weight Adjusted'] = ILs
AnonFACES_RIs['Weight Adjusted'] = RIs

# Random Weight
ILs, RIs = anonymizer.styleGan.evaluate_styleGan(  latent_vecs, 
                                                   data_dlib, 
                                                   raw_dlib, 
                                                   generator_network, 
                                                   k_range=k_values, 
                                                   isBackward=False,
                                                   alpha=1.67,
                                                   isAdjustWeight=True,
                                                   isRandWeight=True
                                                )
AnonFACES_ILs['Random Weight'] = ILs
AnonFACES_RIs['Random Weight'] = RIs

#Save results
AnonFACES_ILs.to_pickle('Outputs/AnonFACES_ILs.pickle')
AnonFACES_RIs.to_pickle('Outputs/AnonFACES_RIs.pickle')
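
The mapping from the two switches to the three tested versions can also be written as a small lookup table. The sketch below enumerates all four switch combinations and keeps the three that are evaluated. The names `VERSION_NAMES` and `version_kwargs` are hypothetical helpers for this example.

```python
from itertools import product

# Names of the three tested versions, keyed by (isAdjustWeight, isRandWeight).
VERSION_NAMES = {
    (False, False): "Naive",
    (True, False): "Weight Adjusted",
    (True, True): "Random Weight",
}


def version_kwargs():
    """Return {version name: keyword arguments} for the tested switch combinations."""
    configs = {}
    for adjust, rand in product([False, True], repeat=2):
        name = VERSION_NAMES.get((adjust, rand))
        if name is None:
            continue  # (False, True) is not evaluated in this notebook
        configs[name] = {"isAdjustWeight": adjust, "isRandWeight": rand}
    return configs


configs = version_kwargs()
```

Each entry could then be unpacked with `**` into `anonymizer.styleGan.evaluate_styleGan`, under the assumption that both switches default to off when omitted, as the Naive call above implies. The Random Weight run in this notebook additionally passes `alpha=1.67`, which this table leaves out.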