# 02 - Kernel methods and SVMs
**Ecole Centrale Nantes**

**Diana Mateus**





PARTICIPANTS: **Yassine Jamoud, Samy Haffoudhi**
    

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import skimage
from skimage import io
import random


from skimage.color import rgb2gray
from skimage.transform import resize

from sklearn.svm import SVC
from sklearn.utils import shuffle


# 1. Image classification on Caltech 101

**a)** Download the images from
http://www.vision.caltech.edu/feifeili/Datasets.htm
and run the code below to check the files and store the names of the classes in the list ```labelNamesAll```

(Just run)

In [None]:
## VERIFY LOCATION AND STORE LABEL NAMES

IMDIR = '101_ObjectCategories/'


labelNamesAll = []

for root, dirnames, filenames in os.walk(IMDIR):
    labelNamesAll.append(dirnames)
    #uncomment to check what is found in this folder
    #for filename in filenames:
        #f = os.path.join(root, filename)
        #if f.endswith(('.png', '.jpg', '.jpeg','.JPG', '.tif', '.gif')):
        #    print(f)

labelNamesAll = sorted(labelNamesAll[0]) #sorted, since os.walk returns the folders in an arbitrary, system-dependent order

#The list of all labels/directories is
print(labelNamesAll)

**b) Build a reduced dataset to speed up processing.** To do so: 
- Consider only up to $K$ randomly drawn categories (start with a binary case)
- Read only up to $N$ images for each class
- Resize the images to $imHeight \times imWidth$ pixels

The dataset consists of:
- Input matrix $\mathbf{X}$ of size $(K\cdot N)\times (imWidth\cdot imHeight)$ with one image in every row of the matrix. 
- Output vector $\mathbf{y}$ of size $(K\cdot N)\times 1$ with the label index of each input point in $\bf X$.
- The reduced list of label names, of size $K$, to map between the indices and the names.

**Note that different classes may have different numbers of images, so the actual number of rows of $\bf X$ and $\bf y$ may be less than $K \cdot N$.**

(Run and try to understand the structure of the data)
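Below is a minimal sketch of the bookkeeping described above, run on synthetic data. The class names, the image size and the `fake_dataset` dictionary are hypothetical stand-ins for the Caltech folders. The sketch also shows that `random.sample` draws $K$ distinct classes in one call, so no retry loop is needed:

```python
import random

import numpy as np

# Hypothetical stand-in for the Caltech folders: class name -> list of images.
# In the notebook the images come from skimage.io.imread + resize; here they are random arrays.
rng = np.random.default_rng(0)
imHeight, imWidth = 4, 5
fake_dataset = {name: [rng.random((imHeight, imWidth)) for _ in range(n)]
                for name, n in [('ant', 7), ('bass', 3), ('chair', 12), ('dog', 5)]}

K, N = 3, 6  # K classes, at most N images per class

random.seed(42)
labelNames = random.sample(sorted(fake_dataset), K)  # K distinct class names

X = np.zeros((K * N, imHeight * imWidth))  # one flattened image per row
y = -np.ones(K * N)                        # -1 marks rows that stay unused
count = 0
for i, name in enumerate(labelNames):
    for image in fake_dataset[name][:N]:   # at most N images per class
        X[count] = image.flatten()         # row-major: pixel (r, c) -> column r * imWidth + c
        y[count] = i                       # label index i maps back to labelNames[i]
        count += 1

# Classes with fewer than N images leave unused rows at the end: trim them
X, y = X[:count], y[:count]
print(labelNames, X.shape)
```

Because `flatten` is row-major, `X[j].reshape(imHeight, imWidth)` recovers the original image. The display cell later in the notebook relies on this.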

In [None]:
#build DATASET from K categories and (up to) N images from category
K = 3 
N = 200
imWidth = 100 #resize images
imHeight = 100

#selection of label indices
X = np.zeros([K*N,imHeight*imWidth]) #data matrix, one image per row
#Y = np.zeros([K*N,1]) #label indices
Y = -np.ones([K*N,1]) #label indices
labelNames = []

random.seed(a=42) #uncomment to make errors reproducible/comment to see variability

globalCount = 0
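for i in range(K): 
    #draw a class that has not been selected yet (compare names, since labelNames stores names, not indices)
    while True:
        lab = random.randint(0,len(labelNamesAll)-1)
        if labelNamesAll[lab] not in labelNames:
            break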
    #folders are named after the class label
    filedir = os.path.join(IMDIR,labelNamesAll[lab])
    print(filedir)

    #save the name of the class
    labelNames.append(labelNamesAll[lab])       

    classCount = 0
    for filename in os.listdir(filedir):
        f = os.path.join(filedir, filename)
        if f.endswith(('.jpg')) and (classCount < N):
            #image = skimage.io.imread(f, as_grey=True) #Try this line instead of the one below if there is an error
            image = skimage.io.imread(f, as_gray=True)
            image = skimage.transform.resize(image, [imHeight,imWidth],mode='constant')#,anti_aliasing=True)
            X[globalCount,:] = image.flatten()
            Y[globalCount,:] = i
            globalCount += 1
            classCount += 1

#Remove the unused entries of X and Y
print("Total number of samples",globalCount)
X = X[:globalCount,:]
Y = Y[:globalCount,:]

#Check the stored classes
print("used labels",labelNames)
print("Size of data matrix", X.shape)
print("class labels", Y.T)



**c)** Split the dataset into train (80% of samples) and test (20% of samples) sets. 
(Run and try to understand the structure of the data)
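Shuffling and slicing does not guarantee that each class keeps the same proportion in the train and test sets. This matters when classes have very different numbers of images. One alternative, which this notebook does not use, is sklearn's `train_test_split` with `stratify`. Here is a sketch on toy labels with unequal class sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data with unequal class sizes (stand-in for the Caltech subset)
X = np.arange(100, dtype=float).reshape(50, 2)
y = np.array([0] * 30 + [1] * 15 + [2] * 5)

# stratify=y keeps the class proportions (60% / 30% / 10%) in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)           # (40, 2) (10, 2)
print(np.bincount(y_train), np.bincount(y_test))
```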

In [None]:
# Split in Train and test set with 80% - 20% rule

Ntrain = np.rint(.8*Y.shape[0]).astype(int)
Ntest = Y.shape[0]-Ntrain
print('Training with', Ntrain , 'training samples and ', Ntest, 'testing samples.')

# Randomize the order of X and Y
X, Y = shuffle(X, Y, random_state=0)


# Split the data and labels into training/testing sets
X_train = X[0:Ntrain,:]
Y_train = Y[0:Ntrain,:]

X_test = X[Ntrain:,:]
Y_test = Y[Ntrain:,:]

print("size of train dataset",X_train.shape)
print("size of test dataset",X_test.shape)
print("train target vector",Y_train.T)
print("test target vector",Y_test.T)

**d)** Training and testing an SVM
- Create an SVC model using the sklearn module, 
- train it on the train set, 
- and test it on the test set. 

(Fill in the code and answer the questions)

**Question** SVMs are intrinsically binary classifiers. Can you train the SVC for K>2? How is this achieved?

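**ANSWER**: Yes. The multi-class problem can be decomposed into several binary problems:
* One-vs-One: train $K(K-1)/2$ SVMs, one for each pair of classes, and predict by majority vote;
* One-vs-Rest: train $K$ SVMs, each separating one class from all the others, and predict the class with the highest score.

The sklearn `SVC` handles $K>2$ automatically, using the one-vs-one scheme internally.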
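The two decompositions can be made explicit with sklearn's `OneVsOneClassifier` and `OneVsRestClassifier` wrappers. The sketch below uses toy 2-D data from `make_blobs` rather than the Caltech images, and counts the binary SVMs each scheme trains for $K=4$:

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

K = 4
X, y = make_blobs(n_samples=200, centers=K, random_state=0)

# Explicit decompositions into binary SVMs
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ovo.estimators_), len(ovr.estimators_))   # 6 4

# SVC itself trains one-vs-one internally; decision_function_shape='ovo'
# exposes one decision column per pair of classes
svc = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
print(svc.decision_function(X).shape)               # (200, 6)
```

With the default `decision_function_shape='ovr'`, `SVC` still trains the pairwise models but reshapes their votes into one score per class.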

In [None]:
# Create, train and test an svm model using the sklearn SVC

clf = SVC(kernel='linear')
clf.fit(X_train, Y_train.ravel())

Y_pred = clf.predict(X_test)

print("True classes",Y_test.T)
print("Predictions",Y_pred)
errors = np.sum((Y_test.ravel()!=Y_pred))
print('There were ', errors, 'errors')

**e) Fill in the functions below to compute different evaluation measures and give a performance report**
Look at the formulas and definitions in https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)

Start by computing the confusion matrix and the values TP, TN, FP, FN for a binary case. When considering multiple classes ($K>2$), treat one class at a time as the positive class and the remaining classes as negative. You may want to pass the positive class as a parameter of the indicator function.
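Below is a vectorized sketch of these one-vs-rest counts, obtained for every class at once from a confusion matrix. `confusion_matrix_np` is a hypothetical helper. `sklearn.metrics.confusion_matrix` follows the same convention, with rows for true classes and columns for predicted classes:

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, K):
    """C[i, j] = number of samples of true class i predicted as class j."""
    C = np.zeros((K, K), dtype=int)
    np.add.at(C, (y_true, y_pred), 1)  # unbuffered add, so repeated (i, j) pairs all count
    return C

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
C = confusion_matrix_np(y_true, y_pred, K=3)

# One-vs-rest counts, class k being the positive class
TP = np.diag(C)
FP = C.sum(axis=0) - TP   # predicted as k, but the true class is another one
FN = C.sum(axis=1) - TP   # true class k, but predicted as another one
TN = C.sum() - TP - FP - FN
print(C)
print(TP, FP, FN, TN)
```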

**Question:** There are three ways of summarizing the scores for a multi-class problem ($K>2$): the macro-average, the micro-average and the weighted average. Implement and EXPLAIN them below.

**Hint** Add a small numerical constant eps to the denominators to prevent division by zero.

**Hint2** for the multi-class case:

https://datascience.stackexchange.com/questions/15989/micro-average-vs-macro-average-performance-in-a-multiclass-classification-settin.

**ANSWER** Write your answer in the report
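The three averages can be written as follows, using precision as the example; recall and F1 are averaged the same way. For class $k$, let $TP_k, FP_k, FN_k$ be its one-vs-rest counts and $n_k = TP_k + FN_k$ its support, the number of samples whose true class is $k$:

$$P_{\text{macro}} = \frac{1}{K}\sum_{k=1}^{K}\frac{TP_k}{TP_k+FP_k},\qquad
P_{\text{micro}} = \frac{\sum_{k} TP_k}{\sum_{k}\left(TP_k+FP_k\right)},\qquad
P_{\text{weighted}} = \frac{\sum_{k} n_k\,\frac{TP_k}{TP_k+FP_k}}{\sum_{k} n_k}$$

Worked example (binary, illustrative numbers): 8 samples of class A and 4 of class B. All A samples are predicted correctly and 2 of the B samples are predicted as A. This gives $TP_A=8, FP_A=2, TP_B=2, FP_B=0$, so $P_A = 0.8$ and $P_B = 1$:

$$P_{\text{macro}} = \tfrac{0.8+1}{2} = 0.9,\qquad P_{\text{micro}} = \tfrac{8+2}{10+2} \approx 0.833,\qquad P_{\text{weighted}} = \tfrac{8\cdot 0.8 + 4\cdot 1}{12} \approx 0.867$$

The macro-average gives every class the same weight, however small. The micro-average pools the counts so that every sample weighs the same; for single-label data it equals the accuracy. The weighted average weights each per-class score by the class size.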

In [None]:
# Functions to compute the errors between prediction and ground truth 

def compute_measures(Y_gt,Y_pred, positiveClass=1): #Y_gt = ground truth
    measures = dict()
    
    eps = 1e-12
    
    TP = TN = FP = FN = 0 
    for pred, gt in zip(Y_pred, Y_gt):
        if pred != positiveClass and gt != positiveClass:
            TN += 1
        elif pred == gt and gt == positiveClass:
            TP += 1
        elif pred != gt and gt == positiveClass:
            FN += 1
        else:
            FP += 1
    
    print('TP ', TP, 'TN ', TN, 'FP ', FP, 'FN ', FN, 'Total', TP+TN+FP+FN)
    measures['TP'] = TP
    measures['TN'] = TN
    measures['FP'] = FP
    measures['FN'] = FN
    
    
    # Accuracy
    measures['accuracy'] = (TP + TN) / (TP + TN + FP + FN)
    
    # Precision
    measures['precision'] = TP / (TP + FP + eps)
        
    # Specificity (true negative rate)
    measures['specificity'] = TN / (TN + FP + eps)
    
    # Recall
    measures['recall'] = TP / (TP + FN + eps)
    
    # F-measure
    measures['f1'] = 2 * TP / (2 * TP + FP + FN + eps)
    
    # Negative Predictive Value
    measures['npv'] = TN / (TN + FN + eps)
    
    # False Omission Rate: FN / (FN + TN) = 1 - NPV (kept under the key 'fpr')
    measures['fpr'] = FN / (FN + TN + eps)
    
    print('Accuracy ', measures['accuracy'], '\n',
          'Precision', measures['precision'], '\n',
          'Recall', measures['recall'], '\n',
          'Specificity ', measures['specificity'], '\n',
          'F-measure', measures['f1'], '\n',
          'NPV', measures['npv'],'\n',
          'FOR', measures['fpr'],'\n')

    return measures

def micro_average(measuresList):
    microAverage = dict()
    eps = 1e-12
    
    TP = np.sum([measures['TP'] for measures in measuresList])
    FP = np.sum([measures['FP'] for measures in measuresList])
    TN = np.sum([measures['TN'] for measures in measuresList])
    FN = np.sum([measures['FN'] for measures in measuresList])
    
    # Accuracy
    microAverage['accuracy'] = (TP + TN) / (TP + TN + FP + FN)
    
    # Precision
    microAverage['precision'] = TP / (TP + FP + eps)
        
    # Specificity
    microAverage['specificity'] = TN / (TN + FP + eps)
    
    # Recall
    microAverage['recall'] = TP / (TP + FN + eps)
    
    # F-measure
    microAverage['f1'] = 2 * TP / (2 * TP + FP + FN + eps)
    
    # Negative Predictive Value
    microAverage['npv'] = TN / (TN + FN + eps)
    
    # False Omission Rate: FN / (FN + TN) = 1 - NPV (kept under the key 'fpr')
    microAverage['fpr'] = FN / (FN + TN + eps)
        
    print('Accuracy ', microAverage['accuracy'], '\n',
          'Precision', microAverage['precision'], '\n',
          'Recall', microAverage['recall'], '\n',
          'Specificity ', microAverage['specificity'], '\n',
          'F-measure', microAverage['f1'], '\n',
          'NPV', microAverage['npv'],'\n',
          'FOR', microAverage['fpr'],'\n')
    
    return microAverage

def macro_average(measuresList):
    macroAverage = dict()

    # Accuracy
    macroAverage['accuracy'] = np.average([measure['accuracy'] for measure in measuresList])
    
    # Precision
    macroAverage['precision'] = np.average([measure['precision'] for measure in measuresList])
        
    # Specificity
    macroAverage['specificity'] = np.average([measure['specificity'] for measure in measuresList])
    
    # Recall
    macroAverage['recall'] = np.average([measure['recall'] for measure in measuresList])
    
    # F-measure
    macroAverage['f1'] = np.average([measure['f1'] for measure in measuresList])
    
    # Negative Predictive Value
    macroAverage['npv'] = np.average([measure['npv'] for measure in measuresList])
    
    # False Omission Rate: FN / (FN + TN) = 1 - NPV (kept under the key 'fpr')
    macroAverage['fpr'] = np.average([measure['fpr'] for measure in measuresList])
    
    print('Accuracy ', macroAverage['accuracy'], '\n',
          'Precision', macroAverage['precision'], '\n',
          'Recall', macroAverage['recall'], '\n',
          'Specificity ', macroAverage['specificity'], '\n',
          'F-measure', macroAverage['f1'], '\n',
          'NPV', macroAverage['npv'],'\n',
          'FOR', macroAverage['fpr'],'\n')
    
    return macroAverage
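The weighted average is not implemented above. Here is a minimal sketch that follows the same dictionary convention as `compute_measures`; it assumes each dictionary contains the keys 'TP', 'FN', 'precision', 'recall' and 'f1'. `weighted_average` is a hypothetical helper name. Each per-class score is weighted by its support $n_k = TP_k + FN_k$, and the result is checked against sklearn's `average='weighted'`:

```python
import numpy as np
from sklearn.metrics import precision_score

def weighted_average(measuresList, keys=('precision', 'recall', 'f1')):
    """Average each per-class measure, weighted by the class support TP + FN."""
    support = np.array([m['TP'] + m['FN'] for m in measuresList], dtype=float)
    return {key: float(np.average([m[key] for m in measuresList], weights=support))
            for key in keys}

# Hand-made per-class measures: 8 samples of class 0 (all correct),
# 4 samples of class 1 (2 of them predicted as class 0)
measures0 = {'TP': 8, 'FN': 0, 'precision': 0.8, 'recall': 1.0, 'f1': 16 / 18}
measures1 = {'TP': 2, 'FN': 2, 'precision': 1.0, 'recall': 0.5, 'f1': 4 / 6}
res = weighted_average([measures0, measures1])
print(res)

# Same predictions written as label vectors, for comparison with sklearn
y_true = [0] * 8 + [1] * 4
y_pred = [0] * 8 + [0, 0, 1, 1]
sk_weighted = precision_score(y_true, y_pred, average='weighted')
print(sk_weighted)  # should match res['precision']
```

With supports as weights, the weighted recall reduces to the accuracy (here 10/12), since $\sum_k n_k \cdot TP_k / n_k = \sum_k TP_k$.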

**e) (continued)** Measure the performance of the SVC model for multiple classes ($K>2$)

First collect the measures obtained when each class in turn is taken as the positive class, then compute the macro- and micro-averages.

Compare your results with those of the sklearn metrics. 

In [None]:
#Fill in a list of measure dictionaries taking as input a different positive class

multiclass = []
for k in range(K):
    print('For class',labelNames[k])
    multiclass.append(compute_measures(Y_test.ravel(),Y_pred, positiveClass=k))

print('Macro-average')
macro_average(multiclass)
    
print('Micro-average')
micro_average(multiclass)

from sklearn.metrics import classification_report #see also confusion_matrix, accuracy_score, precision_score, recall_score, f1_score(average='micro' or 'macro')
print(classification_report(Y_test.ravel(), Y_pred, target_names=labelNames, zero_division=1))

We obtain the same results as those reported by scikit-learn.
scikit-learn does not display a micro-average row here. For a single-label multi-class problem, the micro-averaged precision, recall and F1 all equal the accuracy, which is already displayed.
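Below is a quick check, on toy labels, that micro-averaged precision, recall and F1 coincide with accuracy in the single-label multi-class case. The second part illustrates a caveat. Pooling the one-vs-rest counts of all classes, TN included (as in the 'accuracy' entry of `micro_average`), counts every sample $K$ times and gives a different number:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy single-label predictions for K = 3 classes (7 samples, 5 correct)
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

acc = accuracy_score(y_true, y_pred)
p = precision_score(y_true, y_pred, average='micro')
r = recall_score(y_true, y_pred, average='micro')
f = f1_score(y_true, y_pred, average='micro')
print(acc, p, r, f)                                  # all four are about 5/7 = 0.714

# One-vs-rest counts pooled over the classes
C = confusion_matrix(y_true, y_pred)
TP = np.diag(C)
FP = C.sum(axis=0) - TP
FN = C.sum(axis=1) - TP
TN = C.sum() - TP - FP - FN
pooled_acc = (TP.sum() + TN.sum()) / (TP + TN + FP + FN).sum()
print(pooled_acc)                                    # 17/21, not 5/7
```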

**f) Show the test images as well as the predictions (Y_pred) vs the ground-truth (Y_gt) labels for the best model**
(Just run for each analysed model)

In [None]:
# Show some results
width=20
height=15
plt.rcParams['figure.figsize'] = [width, height]
fig=plt.figure()
imCounter = 1
for i in range(min(len(Y_test), 5*7)): #the 5x7 subplot grid holds at most 35 images
    image=np.reshape(X_test[i,:], (imHeight,imWidth)) 

    plt.subplot(5,7,imCounter)
    plt.imshow(image,cmap='gray')
    plt.axis('off')
    gtLabel = labelNames[Y_test.ravel()[i].astype(int)]
    predLabel = labelNames[Y_pred.ravel()[i].astype(int)]
    plt.title('GT: {}. \n Pred: {}'.format(gtLabel, predLabel))

    imCounter += 1
plt.show()


**g) REPORT:**  Change the kernel and other hyperparameters of your SVC trying to optimize the F1 measure for different cases. Describe in your report the different variants of the model tried. You may want to split your dataset into train, validation and test sets this time to find the best hyperparameters. Present and discuss your findings for different hyperparameters, number of classes and numbers of images. THIS IS THE MOST IMPORTANT PART FOR THE EVALUATION. 

**Methodology:**

* We optimize the micro-averaged F1 score (f1_micro), since the number of examples varies from class to class: it pools the counts of all classes, so every sample weighs the same.
* The dataset is split into 3 parts: training (80%), validation (10%) and test (10%).
* We start by selecting the kernel: with default hyperparameters, we keep the kernel with the best f1_micro score on the validation set.
* We then tune the hyperparameters of the selected model(s).
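A cross-validated alternative to the single train/validation split is sklearn's `GridSearchCV` with `scoring='f1_micro'`. This is not the method used in this notebook. The sketch below runs on toy data from `make_blobs`, and its grids are illustrative assumptions. The linear kernel gets its own grid because `gamma` only affects the rbf, poly and sigmoid kernels:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, cluster_std=3.0, random_state=0)

# Separate grids so that gamma is not searched uselessly for the linear kernel
param_grid = [
    {'kernel': ['linear'], 'C': np.logspace(-3, 3, 7)},
    {'kernel': ['rbf'], 'C': np.logspace(-3, 3, 7), 'gamma': np.logspace(-4, 1, 6)},
]
search = GridSearchCV(SVC(), param_grid, scoring='f1_micro', cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Each candidate is scored on 5 folds instead of one small validation set. This makes the selection less sensitive to which samples end up in the validation set, at the cost of 5 times more fits.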

In [None]:
# Split the dataset into 3 parts: training (80%), validation (10%) and test (10%)

Ntrain = np.rint(.8*Y.shape[0]).astype(int)
Nvalid = (Y.shape[0]-Ntrain) // 2
Ntest = Y.shape[0]-Ntrain-Nvalid #the test set gets the extra sample when the remainder is odd
print('Training with', Ntrain , 'training samples, ', Nvalid, 'validation samples and ', Ntest, 'testing samples')

# Randomize the order of X and Y
X, Y = shuffle(X, Y, random_state=0)

# Split the data and labels into training/validation/testing sets
X_train = X[0:Ntrain,:]
Y_train = Y[0:Ntrain,:]

X_valid = X[Ntrain:Ntrain+Nvalid,:]
Y_valid = Y[Ntrain:Ntrain+Nvalid,:]

X_test = X[Ntrain+Nvalid:,:]
Y_test = Y[Ntrain+Nvalid:,:]

print("size of train dataset",X_train.shape)
print("size of validation dataset",X_valid.shape)
print("size of test dataset",X_test.shape)
print("train target vector",Y_train.T)
print("validation target vector",Y_valid.T)
print("test target vector",Y_test.T)

In [None]:
# choose the most promising kernel

from sklearn.metrics import f1_score

kernels = ['linear', 'rbf', 'poly', 'sigmoid']

for k in kernels:
    clf = SVC(kernel=k)
    clf.fit(X_train, Y_train.ravel())
    Y_pred = clf.predict(X_valid)
    print(f"{k} kernel score: {f1_score(Y_valid, Y_pred, average='micro')}")
    # print(classification_report(Y_valid.ravel(), Y_pred, target_names=labelNames))

In [None]:
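C_values = np.logspace(-3,10,10)
gamma_values = np.logspace(-9,3,10)

kernels = ['linear', 'rbf', 'poly', 'sigmoid']
clfs = [None] * len(kernels)   # best model found for each kernel
f1_best = [0] * len(kernels)   # its micro-F1 score on the validation set
for C in C_values: 
    for gamma in gamma_values:
        for i, kernel in enumerate(kernels):
            # gamma is ignored by the linear kernel: fit it only once per value of C
            if kernel == 'linear' and gamma != gamma_values[0]:
                continue
            clf = SVC(kernel=kernel, C=C, gamma=gamma)
            clf.fit(X_train, Y_train.ravel())
            Y_pred = clf.predict(X_valid)
            f1 = f1_score(Y_valid.ravel(), Y_pred, average='micro')
            if f1 > f1_best[i]:
                clfs[i] = clf
                f1_best[i] = f1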

In [None]:
print('F1 MICRO SCORES (validation):')

for i in range(4):
    print(kernels[i], f1_best[i])

Three of the models reach a good f1_micro score of about 0.92 on the validation set.

In [None]:
print('F1 MICRO SCORES (test):')

for i in range(4):
    clf = clfs[i]
    Y_pred = clf.predict(X_test)
    print(kernels[i], f1_score(Y_test, Y_pred, average='micro'))

**Observations:**

* With 3 classes and all available images:
    * we reach good scores on the validation data, but much worse scores on the test data.
* With 3 classes and at most 20 images per class:
    * the f1_micro scores are lower than before, as expected.
* With 2 classes and at most 20 images per class:
    * the scores of the different kernels are closer to each other than before.
* With 2 classes and all available images:
    * the scores are better than in the previous case;
    * all 4 kernels obtain the same score. In the 2-class case, the choice of kernel therefore matters less than in the 3-class case.
* With 5 classes and all images:
    * the 4 kernels again give different scores;
    * the scores are lower than in all the previous cases.
    
Therefore:
* In our experiments, the more images we use, the higher the scores.
* The more classes we use, the harder the problem becomes, and the more the choice of model hyperparameters and the computation time matter. More SVMs must be trained (with the one-vs-one scheme used by SVC: 1, 3 and 10 binary SVMs for 2, 3 and 5 classes), and more data is needed.