# Model Comparison

## 1. Introduction

We will now classify the test sets using our two models and compare the average F1-scores.

The F1-score is the harmonic mean of precision and recall, so it is a good measure of success when precision and recall are equally important. This is explained further, along with other evaluation metrics for multi-class classification, in [Evaluating Multi-Class Classifiers](https://medium.com/apprentice-journal/evaluating-multi-class-classifiers-12b2946e755b#:~:text=Two%20methods%2C%20micro%2Daveraging%2C,class%20to%20calculate%20the%20average.). If there are fewer classes, other techniques such as confusion matrices might be appropriate to look at alongside F1-scores.
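
As a small illustration of the harmonic mean, the sketch below computes an F1-score from a precision and a recall. The function name `f1_from_precision_recall` and the input values are illustrative assumptions, not part of the notebook.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values: perfect precision but only 85% recall
print(round(f1_from_precision_recall(1.00, 0.85), 2))  # 0.92
```

The harmonic mean is pulled towards the lower of the two values, so a class cannot score well on F1 by doing well on only one of precision or recall.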

The F1-score is given for each separate class, but we want an overall score for all the classes. We therefore choose to take the macro average, which is the unweighted mean of the per-class F1-scores. This means that each class is treated as equally important regardless of how many instances of it there are, so the most common classes in the test sets do not dominate the score.
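
The difference between the macro average and a support-weighted average can be seen on a toy example. This is a minimal sketch using scikit-learn's `f1_score`; the labels and predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical imbalanced data: class 0 is common, class 1 is rare
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0])

per_class = f1_score(y_true, y_pred, average=None)       # one F1 per class
macro = f1_score(y_true, y_pred, average='macro')        # unweighted mean
weighted = f1_score(y_true, y_pred, average='weighted')  # weighted by support

print(per_class, macro, weighted)
```

Here the rare class is classified less well, so the macro average comes out lower than the weighted average: the rare class counts for half of the macro score but only a fifth of the weighted one.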

In [2]:
import pandas as pd
import numpy as np
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
import tensorflow as tf

## 2. Load in test sets

We will be looking at each one separately as well as the combined scores, in order to see the strengths and weaknesses of the models.

In [3]:
test_labels=np.load('32_filter_test_label.npy')

orig_test=np.load('32_filter_test_data.npy')
noisy_test=np.load('noisy_test.npy')
colour_patch_test=np.load('colour_patched_test.npy')
other_patch_test=np.load('other_patched_test.npy')
same_patch_test=np.load('same_patched_test.npy')
cw_005_advs_test=np.load('005_cw_advs.npy')
cw_02_advs_test=np.load('02_cw_advs.npy')

In [4]:
cw_advs_test=np.concatenate((cw_005_advs_test, cw_02_advs_test), axis=0)

## 2. Basic CNN 
First we load in the model.

In [5]:
cnn_model=tf.keras.models.load_model('basic_cnn_model_filter.h5')

Then we get the predictions for the test sets. We begin with the original test set.

The first line below returns probabilities across the classes. We then take the index of the maximum in each row to be the predicted class.

In [6]:
y_pred_prob = cnn_model.predict(orig_test)



In [8]:
y_pred = np.argmax(y_pred_prob, axis=-1)
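
To make the arg-max step concrete, here is a tiny sketch with a made-up probability array (two images, three classes) in place of the model output.

```python
import numpy as np

# Hypothetical class probabilities: one row per image, one column per class
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])

# Index of the largest probability in each row is the predicted class
pred = np.argmax(probs, axis=-1)
print(pred)  # [1 0]
```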

We can look at the classification report if we want. For each class it shows the precision, recall, F1-score and support (the number of instances of that class in the test set), as well as averages of these.

In [57]:
from sklearn.metrics import classification_report
print(classification_report(test_labels, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        60
           1       0.99      1.00      1.00       720
           2       1.00      1.00      1.00       750
           3       0.99      0.98      0.98       450
           4       1.00      0.99      1.00       660
           5       0.93      1.00      0.96       630
           6       1.00      0.85      0.92       150
           7       1.00      1.00      1.00       450
           8       1.00      0.95      0.98       450
           9       1.00      1.00      1.00       480
          10       1.00      1.00      1.00       660
          11       0.99      0.99      0.99       420
          12       1.00      1.00      1.00       690
          13       1.00      1.00      1.00       720
          14       0.97      1.00      0.98       270
          15       1.00      1.00      1.00       210
          16       1.00      1.00      1.00       150
          17       1.00    

However, we have chosen the macro-average F1-score as our evaluation metric, so we can extract just that value from the report.

In [10]:
report=classification_report(test_labels, y_pred,output_dict=True)
macro_f1 = report['macro avg']['f1-score']
print(macro_f1)

0.9832841326326808


We do the same for each of our adversarial test sets.

In [11]:
y_pred_prob_noisy = cnn_model.predict(noisy_test)
y_pred_noisy = np.argmax(y_pred_prob_noisy, axis=-1)
report_noisy=classification_report(test_labels, y_pred_noisy,output_dict=True)
macro_f1_noisy = report_noisy['macro avg']['f1-score']
print(macro_f1_noisy)

0.9190819000175338


In [12]:
y_pred_prob_colour_patch = cnn_model.predict(colour_patch_test)
y_pred_colour_patch = np.argmax(y_pred_prob_colour_patch, axis=-1)
report_colour_patch=classification_report(test_labels, y_pred_colour_patch,output_dict=True)
macro_f1_colour_patch = report_colour_patch['macro avg']['f1-score']
print(macro_f1_colour_patch)

0.9422006227902302


In [13]:
y_pred_prob_other_patch = cnn_model.predict(other_patch_test)
y_pred_other_patch = np.argmax(y_pred_prob_other_patch, axis=-1)
report_other_patch=classification_report(test_labels, y_pred_other_patch,output_dict=True)
macro_f1_other_patch = report_other_patch['macro avg']['f1-score']
print(macro_f1_other_patch)

0.9368071543386852


In [14]:
y_pred_prob_same_patch = cnn_model.predict(same_patch_test)
y_pred_same_patch = np.argmax(y_pred_prob_same_patch, axis=-1)
report_same_patch=classification_report(test_labels, y_pred_same_patch,output_dict=True)
macro_f1_same_patch = report_same_patch['macro avg']['f1-score']
print(macro_f1_same_patch)

0.9517097250498128


In [15]:
y_pred_prob_cw = cnn_model.predict(cw_advs_test)
y_pred_cw = np.argmax(y_pred_prob_cw, axis=-1)
report_cw=classification_report(test_labels, y_pred_cw,output_dict=True)
macro_f1_cw = report_cw['macro avg']['f1-score']
print(macro_f1_cw)

0.02174800572586635


We can see that the score for the CW adversarial images is very low (about 0.02). This is because these images were created specifically to fool this model. Of the other adversarial images, the noisy ones have the lowest score (about 0.92), although this is still fairly high. They are followed, in increasing order of score, by images patched with other images, images with randomly coloured patches, and images patched with the same image.

Now we obtain a combined F1-score for how well the model performed on the test set as a whole. We do this by concatenating the predictions from all six test sets and repeating the labels to match. This score represents how well the model performs if we say that we care equally about misclassification in each of the separate test sets.

In [16]:
y_pred_prob_combined=np.concatenate((y_pred_prob, y_pred_prob_noisy, y_pred_prob_colour_patch, y_pred_prob_other_patch, y_pred_prob_same_patch, y_pred_prob_cw), axis=0)

In [17]:
y_pred_combined=np.concatenate((y_pred, y_pred_noisy, y_pred_colour_patch, y_pred_other_patch, y_pred_same_patch, y_pred_cw), axis=0)

In [18]:
test_labels_repeated=np.concatenate((test_labels, test_labels, test_labels, test_labels, test_labels, test_labels), axis=0)

In [19]:
report_combined=classification_report(test_labels_repeated, y_pred_combined,output_dict=True)
macro_f1_combined = report_combined['macro avg']['f1-score']
print(macro_f1_combined)

0.7623218084605388
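
Note that the combined score is computed on the pooled predictions, which is not in general the same as averaging the per-set macro F1-scores. The sketch below shows this on made-up data with two small "test sets" that share the same labels; all arrays are hypothetical. It also uses `np.tile` as a shorter way of repeating the labels.

```python
import numpy as np
from sklearn.metrics import f1_score

labels = np.array([0, 0, 1, 1])    # hypothetical labels shared by both sets
preds_a = np.array([0, 0, 1, 1])   # set A: every prediction correct
preds_b = np.array([1, 1, 1, 0])   # set B: mostly wrong

# Pool the predictions and repeat the labels to match
labels_repeated = np.tile(labels, 2)
preds_combined = np.concatenate((preds_a, preds_b), axis=0)

combined = f1_score(labels_repeated, preds_combined, average='macro')
mean_of_sets = np.mean([f1_score(labels, p, average='macro') for p in (preds_a, preds_b)])

print(combined, mean_of_sets)
```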


## 3. Model With Defences

This model consists of three stages: an autoencoder, a classifier of whether an image is adversarial or not, and then the CNN. So we begin by loading in the three models.

In [21]:
autoencoder=tf.keras.models.load_model('denoising_autoencoder.h5')

In [22]:
adv_classifier=tf.keras.models.load_model('adv_classifier_200.h5')

In [23]:
defences_cnn=tf.keras.models.load_model('DEFENCE_cnn_200.h5')

First we pass each set of images through the autoencoder. This is intended to remove noise from the images.

In [24]:
orig_test_autoencoder=autoencoder.predict(orig_test)
noisy_test_autoencoder=autoencoder.predict(noisy_test)
colour_patch_test_autoencoder=autoencoder.predict(colour_patch_test)
other_patch_test_autoencoder=autoencoder.predict(other_patch_test)
same_patch_test_autoencoder=autoencoder.predict(same_patch_test)
cw_advs_test_autoencoder=autoencoder.predict(cw_advs_test)



Next we obtain a classification of whether the image is adversarial or not.

In [25]:
orig_test_classifications=adv_classifier.predict(orig_test_autoencoder)
noisy_test_classifications=adv_classifier.predict(noisy_test_autoencoder)
colour_patch_test_classifications=adv_classifier.predict(colour_patch_test_autoencoder)
other_patch_test_classifications=adv_classifier.predict(other_patch_test_autoencoder)
same_patch_test_classifications=adv_classifier.predict(same_patch_test_autoencoder)
cw_advs_test_classifications=adv_classifier.predict(cw_advs_test_autoencoder)



The outputs above are probabilities, so we convert them into binary labels by checking whether each one is greater than 0.5.

In [26]:
orig_test_classifications_b = np.where(orig_test_classifications[:, 0] > 0.5, 1, 0)
noisy_test_classifications_b = np.where(noisy_test_classifications[:, 0] > 0.5, 1, 0)
colour_patch_test_classifications_b = np.where(colour_patch_test_classifications[:, 0] > 0.5, 1, 0)
other_patch_test_classifications_b = np.where(other_patch_test_classifications[:, 0] > 0.5, 1, 0)
same_patch_test_classifications_b = np.where(same_patch_test_classifications[:, 0] > 0.5, 1, 0)
cw_advs_test_classifications_b = np.where(cw_advs_test_classifications[:, 0] > 0.5, 1, 0)
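
The thresholding step can be checked on a small made-up array. In this sketch `probs` is a hypothetical stand-in for the classifier output, with one column as in the indexing above. Because the comparison is strict, a probability of exactly 0.5 maps to 0.

```python
import numpy as np

# Hypothetical classifier outputs, shape (n_images, 1)
probs = np.array([[0.2], [0.5], [0.9]])

# Same pattern as the cell above: 1 where the probability exceeds 0.5, else 0
binary = np.where(probs[:, 0] > 0.5, 1, 0)
print(binary)  # [0 0 1]

# An equivalent form: cast the boolean mask to integers
binary_alt = (probs[:, 0] > 0.5).astype(int)
```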

Now we obtain the predictions using the CNN as we did before, except now we have the extra input of the adversarial classification.

We begin with the original test set.

In [27]:
y_pred_prob_defence = defences_cnn.predict([orig_test_autoencoder,orig_test_classifications_b])



In [28]:
y_pred_defence = np.argmax(y_pred_prob_defence, axis=-1)

In [58]:
report_defence=classification_report(test_labels, y_pred_defence,output_dict=True)
macro_f1_defence = report_defence['macro avg']['f1-score']
print(macro_f1_defence)

0.8972372679996006


Now we move on to the adversarial test sets.

In [30]:
y_pred_prob_noisy_defence = defences_cnn.predict([noisy_test_autoencoder,noisy_test_classifications_b])
y_pred_noisy_defence = np.argmax(y_pred_prob_noisy_defence, axis=-1)
report_noisy_defence=classification_report(test_labels, y_pred_noisy_defence,output_dict=True)
macro_f1_noisy_defence = report_noisy_defence['macro avg']['f1-score']
print(macro_f1_noisy_defence)

0.8856664256720654


In [32]:
y_pred_prob_colour_patch_defence = defences_cnn.predict([colour_patch_test_autoencoder,colour_patch_test_classifications_b])
y_pred_colour_patch_defence = np.argmax(y_pred_prob_colour_patch_defence, axis=-1)
report_colour_patch_defence=classification_report(test_labels, y_pred_colour_patch_defence,output_dict=True)
macro_f1_colour_patch_defence = report_colour_patch_defence['macro avg']['f1-score']
print(macro_f1_colour_patch_defence)

0.8480705305317889


In [34]:
y_pred_prob_other_patch_defence = defences_cnn.predict([other_patch_test_autoencoder,other_patch_test_classifications_b])
y_pred_other_patch_defence = np.argmax(y_pred_prob_other_patch_defence, axis=-1)
report_other_patch_defence=classification_report(test_labels, y_pred_other_patch_defence,output_dict=True)
macro_f1_other_patch_defence = report_other_patch_defence['macro avg']['f1-score']
print(macro_f1_other_patch_defence)

0.855512027419354


In [35]:
y_pred_prob_same_patch_defence = defences_cnn.predict([same_patch_test_autoencoder,same_patch_test_classifications_b])
y_pred_same_patch_defence = np.argmax(y_pred_prob_same_patch_defence, axis=-1)
report_same_patch_defence=classification_report(test_labels, y_pred_same_patch_defence,output_dict=True)
macro_f1_same_patch_defence = report_same_patch_defence['macro avg']['f1-score']
print(macro_f1_same_patch_defence)

0.8697185820849073


In [31]:
y_pred_prob_cw_defence = defences_cnn.predict([cw_advs_test_autoencoder,cw_advs_test_classifications_b])
y_pred_cw_defence = np.argmax(y_pred_prob_cw_defence, axis=-1)
report_cw_defence=classification_report(test_labels, y_pred_cw_defence,output_dict=True)
macro_f1_cw_defence = report_cw_defence['macro avg']['f1-score']
print(macro_f1_cw_defence)

0.8691694353521981


We then compute the combined F1-score again.

In [36]:
y_pred_combined_defence=np.concatenate((y_pred_defence, y_pred_noisy_defence, y_pred_colour_patch_defence, y_pred_other_patch_defence, y_pred_same_patch_defence, y_pred_cw_defence), axis=0)

In [39]:
report_combined_defence=classification_report(test_labels_repeated, y_pred_combined_defence,output_dict=True)
macro_f1_combined_defence = report_combined_defence['macro avg']['f1-score']
print(macro_f1_combined_defence)

0.8709262001405003


In comparison to the basic CNN, the model with added defences saw a very large improvement on the CW adversarial images (macro F1 from about 0.02 to about 0.87), but a lower F1-score on every other test set. The improvement on the CW images was large enough that it performed better on the combined test set (about 0.87 against about 0.76).

## 4. No Classifier

Because we tested the autoencoder and the adversarial classifier together, we cannot tell what each of them contributed. We therefore trained an extra CNN on autoencoder outputs only, without the binary adversarial label, and test it here in order to compare the results to the full defences model above.

In [59]:
defences_cnn_no_class=tf.keras.models.load_model('defence_no_classifier200.h5')

In [60]:
y_pred_prob_no_class = defences_cnn_no_class.predict(orig_test_autoencoder)



In [61]:
y_pred_no_class = np.argmax(y_pred_prob_no_class, axis=-1)

In [62]:
print(classification_report(test_labels, y_pred_no_class))

              precision    recall  f1-score   support

           0       0.90      0.60      0.72        60
           1       0.91      0.97      0.94       720
           2       0.91      0.90      0.91       750
           3       0.84      0.91      0.87       450
           4       0.94      0.90      0.92       660
           5       0.85      0.91      0.88       630
           6       0.97      0.69      0.80       150
           7       0.91      0.84      0.87       450
           8       0.89      0.86      0.87       450
           9       0.96      0.93      0.94       480
          10       0.98      0.98      0.98       660
          11       0.94      0.97      0.96       420
          12       0.99      0.98      0.99       690
          13       1.00      1.00      1.00       720
          14       1.00      1.00      1.00       270
          15       0.94      0.98      0.96       210
          16       0.99      1.00      1.00       150
          17       1.00    

In [63]:
report_no_class=classification_report(test_labels, y_pred_no_class,output_dict=True)
macro_f1_no_class = report_no_class['macro avg']['f1-score']
print(macro_f1_no_class)

0.9004759492978515


In [64]:
y_pred_prob_noisy_no_class = defences_cnn_no_class.predict(noisy_test_autoencoder)
y_pred_noisy_no_class = np.argmax(y_pred_prob_noisy_no_class, axis=-1)
report_noisy_no_class=classification_report(test_labels, y_pred_noisy_no_class,output_dict=True)
macro_f1_noisy_no_class = report_noisy_no_class['macro avg']['f1-score']
print(macro_f1_noisy_no_class)

0.8872441267727066


In [65]:
y_pred_prob_colour_patch_no_class = defences_cnn_no_class.predict(colour_patch_test_autoencoder)
y_pred_colour_patch_no_class = np.argmax(y_pred_prob_colour_patch_no_class, axis=-1)
report_colour_patch_no_class=classification_report(test_labels, y_pred_colour_patch_no_class,output_dict=True)
macro_f1_colour_patch_no_class = report_colour_patch_no_class['macro avg']['f1-score']
print(macro_f1_colour_patch_no_class)

0.851108092660862


In [67]:
y_pred_prob_other_patch_no_class = defences_cnn_no_class.predict(other_patch_test_autoencoder)
y_pred_other_patch_no_class = np.argmax(y_pred_prob_other_patch_no_class, axis=-1)
report_other_patch_no_class=classification_report(test_labels, y_pred_other_patch_no_class,output_dict=True)
macro_f1_other_patch_no_class = report_other_patch_no_class['macro avg']['f1-score']
print(macro_f1_other_patch_no_class)

0.8559818387701263


In [68]:
y_pred_prob_same_patch_no_class = defences_cnn_no_class.predict(same_patch_test_autoencoder)
y_pred_same_patch_no_class = np.argmax(y_pred_prob_same_patch_no_class, axis=-1)
report_same_patch_no_class=classification_report(test_labels, y_pred_same_patch_no_class,output_dict=True)
macro_f1_same_patch_no_class = report_same_patch_no_class['macro avg']['f1-score']
print(macro_f1_same_patch_no_class)

0.8714061233428062


In [69]:
y_pred_prob_cw_no_class = defences_cnn_no_class.predict(cw_advs_test_autoencoder)
y_pred_cw_no_class = np.argmax(y_pred_prob_cw_no_class, axis=-1)
report_cw_no_class=classification_report(test_labels, y_pred_cw_no_class,output_dict=True)
macro_f1_cw_no_class = report_cw_no_class['macro avg']['f1-score']
print(macro_f1_cw_no_class)

0.8606667320155059


In [72]:
y_pred_combined_no_class=np.concatenate((y_pred_no_class, y_pred_noisy_no_class, y_pred_colour_patch_no_class, y_pred_other_patch_no_class, y_pred_same_patch_no_class, y_pred_cw_no_class), axis=0)

In [73]:
report_combined_no_class=classification_report(test_labels_repeated, y_pred_combined_no_class,output_dict=True)
macro_f1_combined_no_class = report_combined_no_class['macro avg']['f1-score']
print(macro_f1_combined_no_class)

0.8711554757080944


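We find that the results of this model are essentially the same as those of the full defences model: the macro F1-scores differ by less than 0.01 on every test set, and the combined scores are about 0.8712 and 0.8709 respectively. This suggests that the classification of adversarial images provided little or no benefit to the model.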
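
To compare the three models side by side, the macro F1-scores printed in this notebook can be collected into one table. The sketch below copies those scores, rounded to four decimal places; the column and row names are chosen here for illustration.

```python
import pandas as pd

# Macro F1-scores printed earlier in the notebook, rounded to 4 decimal places
scores = pd.DataFrame(
    {
        'basic_cnn': [0.9833, 0.9191, 0.9422, 0.9368, 0.9517, 0.0217, 0.7623],
        'defences': [0.8972, 0.8857, 0.8481, 0.8555, 0.8697, 0.8692, 0.8709],
        'no_classifier': [0.9005, 0.8872, 0.8511, 0.8560, 0.8714, 0.8607, 0.8712],
    },
    index=['original', 'noisy', 'colour_patch', 'other_patch', 'same_patch', 'cw', 'combined'],
)

# Positive values mean the adversarial classifier input helped on that test set
scores['defences_minus_no_classifier'] = scores['defences'] - scores['no_classifier']

print(scores.round(4))
```

The last column makes it easy to see how much, if anything, the adversarial classifier input adds on each test set.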