# Evaluating learning algorithms

## Grid-search

***

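Grid search is used to find the hyperparameter values of a model that yield the most accurate predictions. It evaluates every combination in a user-supplied grid of candidate values and scores each one with cross-validation.

[scikit-learn `GridSearchCV` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)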
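
To make the idea concrete, here is a minimal plain-Python sketch of an exhaustive search over a parameter grid. It is an illustration only, not how scikit-learn implements `GridSearchCV`: the helper `grid_search` and the scoring function `toy_score` are hypothetical names, and `toy_score` merely stands in for a cross-validated accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() enumerates the Cartesian product of all candidate values
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # strict ">" keeps the first of any tied candidates
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score: peaks at n_neighbors=7 with the euclidean metric."""
    penalty = 0.0 if params["metric"] == "euclidean" else 0.01
    return 1.0 - abs(params["n_neighbors"] - 7) / 10 - penalty


param_grid = {"n_neighbors": [11, 9, 7, 5, 3, 1],
              "metric": ["euclidean", "manhattan"]}

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

With 6 values for `n_neighbors` and 2 for `metric`, the loop evaluates 12 combinations. The cost grows multiplicatively with each hyperparameter added to the grid.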

## Importing libraries

In [10]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV

# Learning model classes
from sklearn.neighbors import KNeighborsClassifier

# Model evaluation functions
from sklearn.metrics import classification_report, f1_score, accuracy_score

from sklearn.metrics import confusion_matrix

import warnings
warnings.filterwarnings('ignore')

### Leave-one-out

Provides train/test indices to split the data into training and test sets.

Each sample is used once as a (singleton) test set while the remaining samples form the training set. [scikit-learn `LeaveOneOut` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html)
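
The splitting scheme described above can be sketched in plain Python. This is an illustration of the idea under simple assumptions (samples identified by integer index), not scikit-learn's implementation; the generator name `leave_one_out` is hypothetical.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_indices) pairs, one pair per sample."""
    for i in range(n_samples):
        # every index except i goes to training; i alone is the test set
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


splits = list(leave_one_out(4))
for train, test in splits:
    print("train:", train, "test:", test)
```

A dataset with `n` samples produces `n` splits, so a model is fitted `n` times. This is why leave-one-out is usually reserved for small datasets.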

In [11]:
# Load the CSV
dataset = pd.read_csv("https://raw.githubusercontent.com/johnattandouglas/monitoria-ml/main/Datasets/Iris.csv")

# Map the class values to integers (for visualizing the decision region)
dataset['Species'] = pd.factorize(dataset['Species'])[0]


### Splitting the dataset

In [12]:
# Use only two features: SepalLengthCm and SepalWidthCm
X = dataset.loc[:, ["SepalLengthCm", "SepalWidthCm"]]
# Select the target as a 1-D Series (a one-column DataFrame triggers a DataConversionWarning in fit)
y = dataset["Species"]

# Split the dataset into training and test sets, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

## Training the model with the default parameters

In [13]:
# Create a kNN classifier with the default k=5 (n_neighbors=5)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Check its performance on the test set
print(classification_report(y_test, model.predict(X_test)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.63      0.80      0.71        15
           2       0.73      0.53      0.62        15

    accuracy                           0.78        45
   macro avg       0.79      0.78      0.77        45
weighted avg       0.79      0.78      0.77        45



In [14]:
cm = confusion_matrix(y_test, model.predict(X_test), labels=model.classes_)
print(cm)

[[15  0  0]
 [ 0 12  3]
 [ 0  7  8]]
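
The figures in the classification report follow directly from this confusion matrix. As a worked check, the sketch below recomputes precision, recall, F1, and accuracy in plain Python, assuming the scikit-learn convention that rows are true classes and columns are predicted classes. The variable names are illustrative.

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
cm = [[15, 0, 0],
      [0, 12, 3],
      [0, 7, 8]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(n_classes)) / total

metrics = {}
for k in range(n_classes):
    true_pos = cm[k][k]
    predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(cm[k])                                  # row sum
    precision = true_pos / predicted_k
    recall = true_pos / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    metrics[k] = (precision, recall, f1)
    print(f"class {k}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

print(f"accuracy={accuracy:.2f}")
```

For class 1, for example, 12 of the 19 samples predicted as class 1 are correct (precision 12/19 ≈ 0.63), and 12 of the 15 true class 1 samples are recovered (recall 0.80).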


## Parameter selection with grid search

In [15]:
model = KNeighborsClassifier()

parameters = {'n_neighbors': [11, 9, 7, 5, 3, 1],
              'metric':["euclidean", "manhattan"]}

grid = GridSearchCV(estimator = model,             # k-NN
                    param_grid = parameters,       # dictionary of values to try (key-value pairs)
                    scoring = 'accuracy',          # evaluation metric
                    cv = 5)                        # number of cross-validation folds

grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)

print("Best parameters:", grid.best_params_)
# Performance on the test set
print(classification_report(y_test, y_pred))

Best parameters: {'metric': 'euclidean', 'n_neighbors': 7}
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.71      0.80      0.75        15
           2       0.77      0.67      0.71        15

    accuracy                           0.82        45
   macro avg       0.83      0.82      0.82        45
weighted avg       0.83      0.82      0.82        45



In [16]:
pd.DataFrame(grid.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_metric | param_n_neighbors | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001506 | 0.001414 | 0.005677 | 0.001556 | euclidean | 11 | {'metric': 'euclidean', 'n_neighbors': 11} | 0.809524 | 0.714286 | 0.857143 | 0.619048 | 0.761905 | 0.752381 | 0.081927 | 6 |
| 1 | 0.015104 | 0.020382 | 0.003777 | 0.003907 | euclidean | 9 | {'metric': 'euclidean', 'n_neighbors': 9} | 0.809524 | 0.714286 | 0.809524 | 0.714286 | 0.761905 | 0.761905 | 0.042592 | 3 |
| 2 | 0.003591 | 0.003231 | 0.002929 | 0.003171 | euclidean | 7 | {'metric': 'euclidean', 'n_neighbors': 7} | 0.857143 | 0.714286 | 0.952381 | 0.714286 | 0.666667 | 0.780952 | 0.106904 | 1 |
| 3 | 0.001752 | 0.00177 | 0.0031 | 0.00311 | euclidean | 5 | {'metric': 'euclidean', 'n_neighbors': 5} | 0.857143 | 0.666667 | 0.952381 | 0.714286 | 0.666667 | 0.771429 | 0.114286 | 2 |
| 4 | 0.001971 | 0.002276 | 0.003535 | 0.004291 | euclidean | 3 | {'metric': 'euclidean', 'n_neighbors': 3} | 0.809524 | 0.761905 | 0.857143 | 0.714286 | 0.619048 | 0.752381 | 0.081927 | 6 |
| 5 | 0.0007 | 0.000601 | 0.003796 | 0.002845 | euclidean | 1 | {'metric': 'euclidean', 'n_neighbors': 1} | 0.714286 | 0.714286 | 0.761905 | 0.666667 | 0.619048 | 0.695238 | 0.048562 | 11 |
| 6 | 0.001251 | 0.001424 | 0.003781 | 0.003939 | manhattan | 11 | {'metric': 'manhattan', 'n_neighbors': 11} | 0.809524 | 0.714286 | 0.809524 | 0.666667 | 0.666667 | 0.733333 | 0.064594 | 9 |
| 7 | 0.000914 | 0.000186 | 0.003098 | 0.000877 | manhattan | 9 | {'metric': 'manhattan', 'n_neighbors': 9} | 0.809524 | 0.714286 | 0.809524 | 0.666667 | 0.761905 | 0.752381 | 0.055533 | 8 |
| 8 | 0.0004 | 0.00049 | 0.003585 | 0.002428 | manhattan | 7 | {'metric': 'manhattan', 'n_neighbors': 7} | 0.857143 | 0.714286 | 0.857143 | 0.714286 | 0.666667 | 0.761905 | 0.079682 | 3 |
| 9 | 0.004392 | 0.00134 | 0.015948 | 0.015562 | manhattan | 5 | {'metric': 'manhattan', 'n_neighbors': 5} | 0.857143 | 0.666667 | 0.904762 | 0.714286 | 0.666667 | 0.761905 | 0.099887 | 3 |
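
The `mean_test_score` and `std_test_score` columns summarize the five per-fold scores. As a worked check, the sketch below recomputes them for the `euclidean`, `n_neighbors=7` candidate from the `split*_test_score` values shown above. It assumes the reported spread is a population standard deviation (dividing by the number of folds), which is consistent with the numbers in the results.

```python
from statistics import mean, pstdev

# Per-fold accuracies for metric='euclidean', n_neighbors=7 (copied from the results above)
split_scores = [0.857143, 0.714286, 0.952381, 0.714286, 0.666667]

mean_score = mean(split_scores)   # compare with mean_test_score
std_score = pstdev(split_scores)  # population std; compare with std_test_score

print(f"mean={mean_score:.4f} std={std_score:.4f}")
```

The large spread relative to the gaps between candidates suggests that the ranking among the top few configurations is not very stable on a training set this small.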


In [17]:
model = KNeighborsClassifier()

parameters = {'n_neighbors': [11, 9, 7, 5, 3, 1],
              'metric':["euclidean", "manhattan"]}

# Configure stratified K-fold cross-validation
skf = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)

grid = GridSearchCV(estimator = model,             
                    param_grid = parameters,       
                    scoring = 'accuracy',
                    cv=skf)  #StratifiedKFold                      

grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)

print("Best parameters:", grid.best_params_)
# Performance on the test set
print(classification_report(y_test, y_pred))

Best parameters: {'metric': 'euclidean', 'n_neighbors': 7}
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.71      0.80      0.75        15
           2       0.77      0.67      0.71        15

    accuracy                           0.82        45
   macro avg       0.83      0.82      0.82        45
weighted avg       0.83      0.82      0.82        45



In [18]:
# Print the best parameters and the best accuracy
print("Best parameters:", grid.best_params_)
print("Best mean accuracy:", grid.best_score_)

# Show detailed results
print("\nDetailed results:")
results = grid.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(f"{params}, Mean accuracy: {mean_score:.3f}")

Best parameters: {'metric': 'euclidean', 'n_neighbors': 7}
Best mean accuracy: 0.780952380952381

Detailed results:
{'metric': 'euclidean', 'n_neighbors': 11}, Mean accuracy: 0.743
{'metric': 'euclidean', 'n_neighbors': 9}, Mean accuracy: 0.743
{'metric': 'euclidean', 'n_neighbors': 7}, Mean accuracy: 0.781
{'metric': 'euclidean', 'n_neighbors': 5}, Mean accuracy: 0.762
{'metric': 'euclidean', 'n_neighbors': 3}, Mean accuracy: 0.705
{'metric': 'euclidean', 'n_neighbors': 1}, Mean accuracy: 0.676
{'metric': 'manhattan', 'n_neighbors': 11}, Mean accuracy: 0.752
{'metric': 'manhattan', 'n_neighbors': 9}, Mean accuracy: 0.714
{'metric': 'manhattan', 'n_neighbors': 7}, Mean accuracy: 0.762
{'metric': 'manhattan', 'n_neighbors': 5}, Mean accuracy: 0.752
{'metric': 'manhattan', 'n_neighbors': 3}, Mean accuracy: 0.705
{'metric': 'manhattan', 'n_neighbors': 1}, Mean accuracy: 0.676
