# Basic Classification Systems
Enrique Juliá Arévalo, Sara Verde Camacho, Leo Pérez Peña

#### In this exercise, you are asked to compare the prediction error of:

- The Naive Bayes Classifier
- LDA
- QDA
- Nearest Shrunken Centroids Classifier

Run the comparison on the Breast Cancer dataset provided in the previous notebooks and on the attached Prostate cancer dataset.

#### Importantly:

- Use a random split of 2 / 3 of the data for training and 1 / 3 for testing each classifier. 
- Any hyper-parameter of each method should be tuned using a grid-search guided by an inner cross-validation procedure that uses only training data. If test data is used to tune the hyper-parameters (even indirectly) the submission will be penalized.
- To reduce the variance of the estimates, report average error results over 20 different partitions of the data into training and testing as described above. You can use a for loop over the steps described above.
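
The three requirements above can be sketched as a single helper. This is only an illustrative outline, not the code used later in this notebook: the function name `mean_test_error`, the synthetic data from `make_classification`, the three-value grid, and the small `n_partitions=3` in the demo call are all assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def mean_test_error(X, y, estimator, param_grid, n_partitions=20):
    """Average test error over repeated 2/3 - 1/3 splits with inner-CV tuning."""
    errors = []
    for seed in range(n_partitions):
        # Outer split: 2/3 for training, 1/3 held out for testing.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=1 / 3, random_state=seed)
        # Inner cross-validation sees the training portion only.
        inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        search = GridSearchCV(estimator, param_grid, cv=inner_cv, scoring="accuracy")
        search.fit(X_train, y_train)
        # The refitted best model is scored once on the held-out test set.
        errors.append(1.0 - search.score(X_test, y_test))
    errors = np.array(errors)
    return errors.mean(), errors.std()


# Synthetic stand-in data (an assumption; not the notebook's datasets).
X, y = make_classification(n_samples=150, n_features=10, n_informative=5,
                           n_redundant=0, random_state=0)
model = Pipeline([("scale", StandardScaler()),
                  ("qda", QuadraticDiscriminantAnalysis())])
grid = {"qda__reg_param": [0.0, 0.5, 1.0]}
mean_err, sd_err = mean_test_error(X, y, model, grid, n_partitions=3)
print(f"mean error: {mean_err:.3f}, SD: {sd_err:.3f}")
```

Placing the scaler inside the pipeline means it is refitted on each inner training fold, so neither the test set nor the inner validation folds influence the scaling statistics.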

#### Give some comments about the results and respond to these questions based on what you have seen in the lectures:

- What method performs best on each dataset in terms of prediction error?
- What method is more flexible according to what we have seen in the lectures? Is this compatible with the obtained results?
- What method is more robust to over-fitting according to what we have seen in the lectures? Is this compatible with the obtained results?
- Discuss and compare, in terms of bias and variance, the two procedures used to estimate the generalization error of a classifier trained on the available data: 10-fold cross-validation and leave-one-out cross-validation. Which one has a bigger bias and which a bigger variance? Explain your response. You need not carry out additional experiments; only explain what you would expect to obtain according to what you have seen in the lectures.

## Importing the required modules

In [1]:
from sklearn.naive_bayes import GaussianNB
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Loading and inspecting the data

In [2]:
breast_data = pd.read_csv('wdbc.csv', header=None)
prostate_data = pd.read_csv('prostate.csv')

In [3]:
X_breast = breast_data.values[:, 2:].astype(float)
y_breast = breast_data.values[:, 1] == 'M'
y_breast = y_breast.astype(int) # 1 when M and 0 when B
breast_data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22,23,24,25,26,27,28,29,30,31
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [4]:
print(prostate_data.head())
print(prostate_data.values[:, -1].sum()) 

   100_g_at   1000_at   1001_at  1002_f_at  1003_s_at   1004_at   1005_at  \
0  6.927460  7.391657  3.812922   3.453385   6.070151  5.527153  5.812353   
1  7.222432  7.329050  3.958028   3.407226   5.921265  5.376464  7.303408   
2  6.776402  7.664007  3.783702   3.152019   5.452293  5.111794  7.207638   
3  6.919134  7.469634  4.004581   3.341170   6.070925  5.296108  8.744059   
4  7.113561  7.322408  4.242724   3.489324   6.141657  5.628390  6.825370   

    1006_at  1007_s_at  1008_f_at  ...  AFFX-ThrX-5_at  AFFX-ThrX-M_at  \
0  3.167275   7.354981   9.419909  ...        3.770583        2.884436   
1  3.108708   7.391872  10.539579  ...        3.190759        2.460119   
2  3.077360   7.488371   6.833428  ...        3.325183        2.603014   
3  3.117104   7.203028  10.400557  ...        3.625057        2.765521   
4  3.794904   7.403024  10.240322  ...        3.698067        3.026876   

   AFFX-TrpnX-3_at  AFFX-TrpnX-5_at  AFFX-TrpnX-M_at  AFFX-YEL002c/WBP1_at  \
0         2.73

In [5]:
X_prostate = prostate_data.values[:,:-1].astype(float)
y_prostate = prostate_data.values[:,-1] == 1 # 1 when Malignant and 0 when Benign
y_prostate = y_prostate.astype(int) 

## Splitting the data

We split the data into a set used to train the models and a set used to test them. As the assignment specifies, we use 1/3 of the data for testing (`test_size=0.33`) and 2/3 for training.
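
As a small illustration of how `train_test_split` divides the samples, the sketch below uses placeholder arrays rather than the real datasets. The sample count, the label imbalance, and the use of `stratify=y` are assumptions made for the example; the cell that follows does not stratify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: the sample count and the labels are hypothetical.
n_samples = 569
X = np.arange(n_samples * 2, dtype=float).reshape(n_samples, 2)
y = (np.arange(n_samples) < 200).astype(int)

# stratify=y is an optional variation that keeps the class proportions
# roughly equal in the training and test portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

print(f"train: {len(X_train)}, test: {len(X_test)}")
print(f"positive rate  all: {y.mean():.3f}  "
      f"train: {y_train.mean():.3f}  test: {y_test.mean():.3f}")
```

Changing `random_state` yields a different partition, which is what the loop over 20 seeds relies on.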

In [6]:
breast_complete = []
prostate_complete = []

for i in range(20):
    # The same seed is used for both datasets so each partition is reproducible
    bre_X_train, bre_X_test, bre_y_train, bre_y_test = train_test_split(
        X_breast, y_breast, test_size=0.33, random_state=i)
    breast_complete.append([bre_X_train, bre_X_test, bre_y_train, bre_y_test])

    pro_X_train, pro_X_test, pro_y_train, pro_y_test = train_test_split(
        X_prostate, y_prostate, test_size=0.33, random_state=i)
    prostate_complete.append([pro_X_train, pro_X_test, pro_y_train, pro_y_test])
    print(f'Partition {i+1} done')

Partition 1 done
Partition 2 done
Partition 3 done
Partition 4 done
Partition 5 done
Partition 6 done
Partition 7 done
Partition 8 done
Partition 9 done
Partition 10 done
Partition 11 done
Partition 12 done
Partition 13 done
Partition 14 done
Partition 15 done
Partition 16 done
Partition 17 done
Partition 18 done
Partition 19 done
Partition 20 done


## Normalizing the data

To normalize the data, we fit the scaler on the training set only and then use the resulting parameters (mean and standard deviation per feature) to transform both the training and the test sets. This way the test data plays the role of genuinely unseen data, whose mean and standard deviation we would not know in advance, and we can check whether the fitted parameters carry over to it.
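
The effect of fitting the scaler on the training set alone can be checked on toy data. The sketch below is a minimal illustration with randomly generated arrays (the shapes and distribution parameters are arbitrary assumptions); it also verifies that `StandardScaler` matches the manual formula that uses the training mean and standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data standing in for one train/test partition (values are arbitrary).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

# Fit on the training data only, then apply the same transform to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Equivalent manual computation using the training statistics.
manual_test = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

print("train means after scaling:", X_train_scaled.mean(axis=0).round(6))
print("test means after scaling: ", X_test_scaled.mean(axis=0).round(3))
print("matches manual formula:   ", np.allclose(manual_test, X_test_scaled))
```

The scaled training features have zero mean by construction, whereas the scaled test features are only approximately centered, since their own statistics were never used.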

In [7]:
# breast dataset
for i in breast_complete:
    bre_X_train, bre_X_test, bre_y_train, bre_y_test=i
    scaler = preprocessing.StandardScaler().fit(bre_X_train)
    bre_X_train_scaled = scaler.transform(bre_X_train)
    bre_X_test_scaled = scaler.transform(bre_X_test)
    i.append(bre_X_train_scaled)
    i.append(bre_X_test_scaled)

In [8]:
# prostate dataset
for i in prostate_complete:
    pro_X_train, pro_X_test, pro_y_train, pro_y_test=i
    scaler = preprocessing.StandardScaler().fit(pro_X_train)
    pro_X_train_scaled = scaler.transform(pro_X_train)
    pro_X_test_scaled = scaler.transform(pro_X_test)
    i.append(pro_X_train_scaled)
    i.append(pro_X_test_scaled)

## Naive Bayes classifier

In [9]:
nb = GaussianNB()

In [10]:
pred_accuracy=[]
for i in breast_complete:
    bre_X_train, bre_X_test, bre_y_train, bre_y_test, bre_X_train_scaled, bre_X_test_scaled=i
    nb.fit(bre_X_train_scaled, bre_y_train)
    bre_y_pred = nb.predict(bre_X_test_scaled)
    conf = confusion_matrix(bre_y_test, bre_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(F"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")

Partition done. Prediction accuracy: 0.898936170212766
Partition done. Prediction accuracy: 0.9361702127659575
Partition done. Prediction accuracy: 0.9308510638297872
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9361702127659575
Partition done. Prediction accuracy: 0.9308510638297872
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9468085106382979
Partition done. Prediction accuracy: 0.9308510638297872
Partition done. Prediction accuracy: 0.9361702127659575
Partition done. Prediction accuracy: 0.9361702127659575
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9202127659574468
Partition done. Prediction accuracy: 0.9148936170212766
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9308510638297872
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9468085106

In [11]:
pred_accuracy=[]
for i in prostate_complete:
    pro_X_train, pro_X_test, pro_y_train, pro_y_test, pro_X_train_scaled, pro_X_test_scaled=i
    nb.fit(pro_X_train_scaled, pro_y_train)
    pro_y_pred = nb.predict(pro_X_test_scaled)
    conf = confusion_matrix(pro_y_test, pro_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(F"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")

Partition done. Prediction accuracy: 0.6764705882352942
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.4411764705882353
Partition done. Prediction accuracy: 0.7352941176470589
Partition done. Prediction accuracy: 0.7058823529411765
Partition done. Prediction accuracy: 0.7647058823529411
Partition done. Prediction accuracy: 0.5882352941176471
Partition done. Prediction accuracy: 0.7058823529411765
Partition done. Prediction accuracy: 0.7647058823529411
Partition done. Prediction accuracy: 0.5
Partition done. Prediction accuracy: 0.5588235294117647
Partition done. Prediction accuracy: 0.4411764705882353
Partition done. Prediction accuracy: 0.6470588235294118
Partition done. Prediction accuracy: 0.7352941176470589
Partition done. Prediction accuracy: 0.5588235294117647
Partition done. Prediction accuracy: 0.5
Partition done. Prediction accuracy: 0.5294117647058824
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Predic

## Discriminant Analysis

In [12]:
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import matplotlib as mpl
from matplotlib import colors
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, GridSearchCV
from sklearn import preprocessing
from sklearn.metrics import accuracy_score, make_scorer, confusion_matrix
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

### LDA

In [13]:
lda = LinearDiscriminantAnalysis()

In [14]:
pred_accuracy=[]
for i in breast_complete:
    bre_X_train, bre_X_test, bre_y_train, bre_y_test, bre_X_train_scaled, bre_X_test_scaled=i
    lda.fit(bre_X_train_scaled, bre_y_train)
    bre_y_pred = lda.predict(bre_X_test_scaled)
    conf = confusion_matrix(bre_y_test, bre_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(F"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")

Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9787234042553191
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9468085106382979
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9574468085106383
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9574468085106383
Partition done. Prediction accuracy: 0.9574468085

In [15]:
pred_accuracy=[]
for i in prostate_complete:
    pro_X_train, pro_X_test, pro_y_train, pro_y_test, pro_X_train_scaled, pro_X_test_scaled=i
    lda.fit(pro_X_train_scaled, pro_y_train)
    pro_y_pred = lda.predict(pro_X_test_scaled)
    conf = confusion_matrix(pro_y_test, pro_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(F"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")

Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.8823529411764706
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8823529411764706
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.7647058823529411
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.823529411

### QDA

In [16]:
pipeline = Pipeline([ ('qda', QuadraticDiscriminantAnalysis()) ])

In [23]:
reg_param_values = np.linspace(0, 1, 21).tolist()
param_grid = { 'qda__reg_param': reg_param_values }

In [24]:
pred_accuracy = []

for i in breast_complete:
    
    bre_X_train, bre_X_test, bre_y_train, bre_y_test, bre_X_train_scaled, bre_X_test_scaled = i

    # Inner 10-fold stratified cross-validation on the training data only
    skfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=0)
    gridcv = GridSearchCV(pipeline, cv=skfold, n_jobs=1, param_grid=param_grid, \
        scoring = make_scorer(accuracy_score))
    result = gridcv.fit(bre_X_train_scaled, bre_y_train)

    # Extract the parameter value with the highest mean cross-validation score
    accuracies = gridcv.cv_results_['mean_test_score']
    best_reg_param_value = reg_param_values[max(enumerate(accuracies), key=lambda x: x[1])[0]]
    
    qda = QuadraticDiscriminantAnalysis(reg_param = best_reg_param_value)

    qda.fit(bre_X_train_scaled, bre_y_train)  
    bre_y_pred = qda.predict(bre_X_test_scaled)
    
    conf = confusion_matrix(bre_y_test, bre_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(f"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")    

Partition done. Prediction accuracy: 0.9574468085106383
Partition done. Prediction accuracy: 0.9787234042553191
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.9680851063829787
Partition done. Prediction accuracy: 0.9627659574468085
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.9840425531914894
Partition done. Prediction accuracy: 0.9574468085106383
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9840425531914894
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.973404255319149
Partition done. Prediction accuracy: 0.957446808510638

In [25]:
pred_accuracy = []

for i in prostate_complete:
    
    pro_X_train, pro_X_test, pro_y_train, pro_y_test, pro_X_train_scaled, pro_X_test_scaled = i
    
    # Inner 10-fold stratified cross-validation on the training data only
    skfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=0)
    gridcv = GridSearchCV(pipeline, cv=skfold, n_jobs=1, param_grid=param_grid, \
        scoring = make_scorer(accuracy_score))
    result = gridcv.fit(pro_X_train_scaled, pro_y_train)
    
    # Extract the parameter value with the highest mean cross-validation score
    accuracies = gridcv.cv_results_['mean_test_score']
    best_reg_param_value = reg_param_values[max(enumerate(accuracies), key=lambda x: x[1])[0]]

    qda = QuadraticDiscriminantAnalysis(reg_param = best_reg_param_value)

    qda.fit(pro_X_train_scaled, pro_y_train)
    pro_y_pred = qda.predict(pro_X_test_scaled)
    
    conf = confusion_matrix(pro_y_test, pro_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(F"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")

Partition done. Prediction accuracy: 0.6176470588235294
Partition done. Prediction accuracy: 0.6764705882352942
Partition done. Prediction accuracy: 0.4411764705882353
Partition done. Prediction accuracy: 0.6764705882352942
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.6176470588235294
Partition done. Prediction accuracy: 0.7647058823529411
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.5882352941176471
Partition done. Prediction accuracy: 0.6470588235294118
Partition done. Prediction accuracy: 0.6764705882352942
Partition done. Prediction accuracy: 0.7058823529411765
Partition done. Prediction accuracy: 0.7352941176470589
Partition done. Prediction accuracy: 0.8823529411764706
Partition done. Prediction accuracy: 0.5882352941176471
Partition done. Prediction accuracy: 0.7352941176470589
Partition done. Prediction accuracy: 0.823529411

## Nearest Shrunken Centroids classifier

In [26]:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, GridSearchCV
from sklearn.metrics import accuracy_score, make_scorer, confusion_matrix
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.pipeline import Pipeline
from sklearn.neighbors import NearestCentroid
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import numpy as np
import pandas as pd

In [27]:
pipeline = Pipeline([ ('nsc', NearestCentroid()) ])

In [28]:
shrinkage_param_values = np.linspace(1e-6, 8, 21).tolist()
param_grid_nsc = {'nsc__shrink_threshold': shrinkage_param_values}

In [29]:
pred_accuracy = []

for i in breast_complete:
    
    bre_X_train, bre_X_test, bre_y_train, bre_y_test, bre_X_train_scaled, bre_X_test_scaled = i

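    # Inner 10-fold stratified cross-validation on the training data only
    # (skfold is defined here so this cell does not depend on the QDA cells)
    skfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=0)
    gridcv_nsc = GridSearchCV(pipeline, cv=skfold, n_jobs=1, param_grid=param_grid_nsc, \
        scoring=make_scorer(accuracy_score))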
    result_nsc = gridcv_nsc.fit(bre_X_train_scaled, bre_y_train)

    # Extract the parameter value with the highest mean cross-validation score
    accuracies = gridcv_nsc.cv_results_['mean_test_score']
    
    nsc = NearestCentroid(shrink_threshold = shrinkage_param_values[np.argmax(accuracies)])

    nsc.fit(bre_X_train_scaled, bre_y_train)
    bre_y_pred = nsc.predict(bre_X_test_scaled)
    
    conf = confusion_matrix(bre_y_test, bre_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(f"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")    

Partition done. Prediction accuracy: 0.9148936170212766
Partition done. Prediction accuracy: 0.925531914893617
Partition done. Prediction accuracy: 0.9148936170212766
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9202127659574468
Partition done. Prediction accuracy: 0.9574468085106383
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9468085106382979
Partition done. Prediction accuracy: 0.925531914893617
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9521276595744681
Partition done. Prediction accuracy: 0.9468085106382979
Partition done. Prediction accuracy: 0.9202127659574468
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.9414893617021277
Partition done. Prediction accuracy: 0.925531914893617
Partition done. Prediction accuracy: 0.9361702127659575
Partition done. Prediction accuracy: 0.941489361702

In [30]:
pred_accuracy = []

for i in prostate_complete:
    
    pro_X_train, pro_X_test, pro_y_train, pro_y_test, pro_X_train_scaled, pro_X_test_scaled = i

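    # Inner 10-fold stratified cross-validation on the training data only
    # (skfold is defined here so this cell does not depend on the QDA cells)
    skfold = RepeatedStratifiedKFold(n_splits=10, n_repeats=1, random_state=0)
    gridcv_nsc = GridSearchCV(pipeline, cv=skfold, n_jobs=1, param_grid=param_grid_nsc, \
        scoring=make_scorer(accuracy_score))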
    result_nsc = gridcv_nsc.fit(pro_X_train_scaled, pro_y_train)

    # Extract the parameter value with the highest mean cross-validation score
    accuracies = gridcv_nsc.cv_results_['mean_test_score']
    
    nsc = NearestCentroid(shrink_threshold = shrinkage_param_values[ np.argmax(accuracies)])

    nsc.fit(pro_X_train_scaled, pro_y_train)
    pro_y_pred = nsc.predict(pro_X_test_scaled)
    
    conf = confusion_matrix(pro_y_test, pro_y_pred)
    TN = conf[0][0]
    TP = conf[1][1]
    FP = conf[0][1]
    FN = conf[1][0]
    pred_accuracy.append(((TP + TN) / (TN + TP + FP + FN)))
    print(f'Partition done. Prediction accuracy: {((TP + TN) / (TN + TP + FP + FN))}')

pred_accuracy = np.array(pred_accuracy)
print(f"Prediction accuracy\nMEAN: {pred_accuracy.mean()}, SD: {pred_accuracy.std()}")    

Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.9411764705882353
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.9411764705882353
Partition done. Prediction accuracy: 0.9411764705882353
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.9117647058823529
Partition done. Prediction accuracy: 0.8823529411764706
Partition done. Prediction accuracy: 0.7941176470588235
Partition done. Prediction accuracy: 0.7647058823529411
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.8235294117647058
Partition done. Prediction accuracy: 0.8529411764705882
Partition done. Prediction accuracy: 0.941176470