# GS CV Test 1

Exploring _Grid Search_ and _Cross Validation_ to tune the hyperparameters of a **KNN** classifier on the iris dataset.

## Importing libraries

In [1]:
import numpy as np
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, train_test_split, cross_val_score

In [2]:
# Load the dataset
X, y = load_iris(return_X_y=True)

for vizinho in [1, 3, 5, 7, 12]:
    # cross-validation score for KNN with `vizinho` neighbors (3 folds)
    accuracy = cross_val_score(
        estimator=KNeighborsClassifier(n_neighbors=vizinho),
        X=X,
        y=y,
        cv=3
    )
    print(f'{vizinho} vizinhos: mean = {accuracy.mean()}; std = {accuracy.std()}\narray: {accuracy}\n')

1 vizinhos: mean = 0.96; std = 0.016329931618554536
array: [0.98 0.94 0.96]

3 vizinhos: mean = 0.9733333333333333; std = 0.009428090415820642
array: [0.98 0.96 0.98]

5 vizinhos: mean = 0.98; std = 0.0
array: [0.98 0.98 0.98]

7 vizinhos: mean = 0.9733333333333333; std = 0.009428090415820642
array: [0.98 0.98 0.96]

12 vizinhos: mean = 0.9666666666666667; std = 0.024944382578492966
array: [0.96 1.   0.94]
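
When `cv` is an integer and the estimator is a classifier, scikit-learn splits the data with stratified K-fold without shuffling, which is why the scores above do not change between runs. The following minimal sketch makes that splitter explicit and checks that both calls agree. The names `implicit` and `explicit` are illustrative and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# cv=3 lets scikit-learn choose the splitter for a classifier
implicit = cross_val_score(knn, X, y, cv=3)

# Same evaluation with the splitter spelled out
explicit = cross_val_score(knn, X, y, cv=StratifiedKFold(n_splits=3))

# Both arrays hold one accuracy value per fold
print(np.allclose(implicit, explicit))
```

Passing a splitter object instead of an integer is useful when shuffling or a fixed `random_state` is wanted.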



In [3]:
gsClassifier = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors':[1, 3, 5, 7, 12]},
    cv=3
)
gsClassifier.fit(X, y)
gsClassifier.best_params_

{'n_neighbors': 5}
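
Besides `best_params_`, a fitted `GridSearchCV` exposes the mean cross-validated score of the winning configuration and, with the default `refit=True`, an estimator retrained on all the data passed to `fit`. A short sketch under the same grid as above; the variable names `search` and `predictions` are assumptions introduced for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [1, 3, 5, 7, 12]},
    cv=3,
)
search.fit(X, y)

# Mean cross-validated accuracy of the best parameter setting
print(search.best_score_)

# Estimator refit on the full X, y with the best parameters
print(search.best_estimator_)

# The search object delegates predict() to the refit estimator
predictions = search.predict(X[:5])
print(predictions)
```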

In [4]:
gridDF = pd.DataFrame(gsClassifier.cv_results_)
gridDF

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_n_neighbors | params | split0_test_score | split1_test_score | split2_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001924 | 0.000858 | 0.007057 | 0.001884 | 1 | {'n_neighbors': 1} | 0.98 | 0.94 | 0.96 | 0.96 | 0.01633 | 5 |
| 1 | 0.001533 | 0.000374 | 0.007469 | 0.001128 | 3 | {'n_neighbors': 3} | 0.98 | 0.96 | 0.98 | 0.973333 | 0.009428 | 2 |
| 2 | 0.001107 | 0.000141 | 0.007573 | 0.001552 | 5 | {'n_neighbors': 5} | 0.98 | 0.98 | 0.98 | 0.98 | 0.0 | 1 |
| 3 | 0.001031 | 0.000315 | 0.006616 | 0.003033 | 7 | {'n_neighbors': 7} | 0.98 | 0.98 | 0.96 | 0.973333 | 0.009428 | 2 |
| 4 | 0.001417 | 0.000571 | 0.00562 | 0.002135 | 12 | {'n_neighbors': 12} | 0.96 | 1.0 | 0.94 | 0.966667 | 0.024944 | 4 |


In [5]:
gridDF[['param_n_neighbors', 'mean_test_score', 'std_test_score', 'rank_test_score']]

| | param_n_neighbors | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|
| 0 | 1 | 0.96 | 0.01633 | 5 |
| 1 | 3 | 0.973333 | 0.009428 | 2 |
| 2 | 5 | 0.98 | 0.0 | 1 |
| 3 | 7 | 0.973333 | 0.009428 | 2 |
| 4 | 12 | 0.966667 | 0.024944 | 4 |


## Cross Validation Workflow

The best hyperparameters can be determined with _grid search_ techniques.

> [Source](https://scikit-learn.org/stable/modules/cross_validation.html)

Example flowchart of a workflow that uses _Cross Validation_ during model training.

![Workflow](https://scikit-learn.org/stable/_images/grid_search_workflow.png)
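
The workflow in the flowchart can be sketched as follows: hold out a test set, run the grid search with cross-validation on the training portion only, and then evaluate the refit model once on the held-out data. This is a minimal sketch, not the notebook's own procedure (the cells above search over the full dataset). The 80/20 split, `stratify=y`, and `random_state=0` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data; it is never seen during the search (assumed split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameter search with 3-fold cross-validation on the training data only
search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [1, 3, 5, 7, 12]},
    cv=3,
)
search.fit(X_train, y_train)

# Final evaluation: accuracy of the refit best model on the held-out test set
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Keeping the test set out of the search means the final accuracy is not influenced by the choice of `n_neighbors`.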

Example of how the data are split into folds for _Cross Validation_.

![Data Distribution](https://scikit-learn.org/stable/_images/grid_search_cross_validation.png)
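
To see the fold layout in numbers, the sketch below iterates over a 5-fold stratified splitter and reports how many samples land in the training and validation parts of each fold. For simplicity it splits the whole iris dataset rather than only a training portion as in the figure. The names `splitter`, `fold_sizes`, and `val_indices` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

splitter = StratifiedKFold(n_splits=5)
fold_sizes = []
val_indices = []

for fold, (train_idx, val_idx) in enumerate(splitter.split(X, y)):
    # Count how many samples of each class fall in the validation fold
    class_counts = np.bincount(y[val_idx])
    fold_sizes.append((len(train_idx), len(val_idx)))
    val_indices.extend(val_idx.tolist())
    print(f'fold {fold}: train={len(train_idx)} validation={len(val_idx)} '
          f'validation class counts={class_counts}')
```

Each sample appears in exactly one validation fold, so every observation is used for validation once and for training in the remaining folds.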