<h1>Concept</h1>

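The *SKLearn* (scikit-learn) library provides a class for *grid search with cross-validation*, `GridSearchCV`. You define a *grid* holding several candidate values for different hyperparameters. The class then trains and evaluates the model on every combination of those values using cross-validation, and reports which combination achieved the best mean score.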
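To show what "every combination" means, the minimal sketch below expands a small, hypothetical two-hyperparameter grid with scikit-learn's `ParameterGrid`. This is the helper `GridSearchCV` uses to enumerate its candidates: it yields one parameter dict per combination. The names `grid_exemplo` and `combinacoes` are made up for this example.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid: 2 values of max_depth x 2 values of criterion
grid_exemplo = {"max_depth": [3, 5], "criterion": ["gini", "entropy"]}

# ParameterGrid yields one dict per combination (Cartesian product of the lists)
combinacoes = list(ParameterGrid(grid_exemplo))

for params in combinacoes:
    print(params)

print(len(combinacoes), "combinations")
```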

<h1>Application</h1>

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

In [2]:
# Loading the dataset (forward slashes keep the path portable across operating systems)
dados = pd.read_csv('Dados/base.csv')

In [3]:
# Splitting inputs (features) and output (target)
x = dados[['preco', 'idade_do_modelo', 'km_por_ano']].values
y = dados['vendido'].values.ravel()

In [4]:
# Defining a grid of hyperparameter values to be tested

grid_hiperparam = {
    "max_depth": [3, 5],
    "min_samples_split": [32, 64, 128],
    "min_samples_leaf": [32, 64, 128],
    "criterion": ["gini", "entropy"]
}
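Because every combination is evaluated, the cost of the search grows multiplicatively with the grid. Here there are 2 × 3 × 3 × 2 = 36 candidates. With the 10-fold cross-validation used in the next cell, that makes 360 model fits, plus one final refit of the best candidate when `refit=True`, which is the default. A quick check, assuming the same `grid_hiperparam` dictionary:

```python
from sklearn.model_selection import ParameterGrid

grid_hiperparam = {
    "max_depth": [3, 5],
    "min_samples_split": [32, 64, 128],
    "min_samples_leaf": [32, 64, 128],
    "criterion": ["gini", "entropy"]
}

n_candidatos = len(ParameterGrid(grid_hiperparam))  # product of the list lengths
n_folds = 10                                        # matches KFold(n_splits=10) in the next cell
n_ajustes = n_candidatos * n_folds                  # fits performed during the search

print(n_candidatos, n_ajustes)  # 36 360
```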

In [6]:
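# Cross-validating every hyperparameter combination:
SEED = 20

# No random_state is passed below, so KFold's shuffling and the tree's
# internal randomness both draw from NumPy's global generator seeded here
np.random.seed(SEED)

gscv = GridSearchCV(
    DecisionTreeClassifier(),
    grid_hiperparam,
    cv=KFold(n_splits=10, shuffle=True)
)

gscv.fit(x, y)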
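The sketch below makes the mechanics concrete by reproducing the idea of the search by hand. It loops over every combination from `ParameterGrid`, scores each one with `cross_val_score` on the same 10 shuffled folds, and keeps the highest mean accuracy. This is a simplified manual equivalent, not `GridSearchCV`'s actual code. It runs on hypothetical synthetic data from `make_classification` (three features standing in for `preco`, `idade_do_modelo` and `km_por_ano`), and it passes explicit `random_state` values instead of relying on the global seed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, ParameterGrid, cross_val_score
from sklearn.tree import DecisionTreeClassifier

SEED = 20

# Hypothetical stand-in data: 1000 rows, 3 numeric features, binary target
x, y = make_classification(
    n_samples=1000, n_features=3, n_informative=3, n_redundant=0,
    random_state=SEED,
)

grid_hiperparam = {
    "max_depth": [3, 5],
    "min_samples_split": [32, 64, 128],
    "min_samples_leaf": [32, 64, 128],
    "criterion": ["gini", "entropy"],
}

# An integer random_state makes every call to split() produce the same folds,
# so all candidates are compared on identical train/test partitions
cv = KFold(n_splits=10, shuffle=True, random_state=SEED)

resultados_manuais = []
for params in ParameterGrid(grid_hiperparam):
    modelo = DecisionTreeClassifier(random_state=SEED, **params)
    # Default scoring for a classifier is accuracy (estimator.score)
    scores = cross_val_score(modelo, x, y, cv=cv)
    resultados_manuais.append((np.mean(scores), params))

# Highest mean accuracy; on ties, max() keeps the first candidate in grid order
melhor_score, melhores_params = max(resultados_manuais, key=lambda r: r[0])
print(melhores_params, melhor_score)
```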

In [10]:
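# The best combination (highest mean_test_score) is available through the best_params_ attribute;
# if several combinations tie, the first one in the grid's iteration order is returned: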
gscv.best_params_

{'criterion': 'gini',
 'max_depth': 3,
 'min_samples_leaf': 32,
 'min_samples_split': 32}

In [12]:
# The full results for every combination are stored in the cv_results_ attribute, a dict that can be loaded into a DataFrame:
resultados = pd.DataFrame(gscv.cv_results_)

In [13]:
resultados

,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_criterion,param_max_depth,param_min_samples_leaf,param_min_samples_split,params,split0_test_score,...,split3_test_score,split4_test_score,split5_test_score,split6_test_score,split7_test_score,split8_test_score,split9_test_score,mean_test_score,std_test_score,rank_test_score
0,0.02645,0.009947,0.0032,0.001287,gini,3,32,32,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
1,0.034,0.015202,0.004451,0.002495,gini,3,32,64,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
2,0.031701,0.00581,0.00345,0.001234,gini,3,32,128,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
3,0.023199,0.005662,0.0025,0.001414,gini,3,64,32,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
4,0.034799,0.011348,0.008451,0.011939,gini,3,64,64,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
5,0.030852,0.005017,0.003349,0.001381,gini,3,64,128,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
6,0.025252,0.006595,0.002848,0.001306,gini,3,128,32,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
7,0.032301,0.008624,0.0051,0.002487,gini,3,128,64,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
8,0.028649,0.003887,0.002651,0.001454,gini,3,128,128,"{'criterion': 'gini', 'max_depth': 3, 'min_sam...",0.761,...,0.79,0.786,0.804,0.789,0.777,0.785,0.799,0.7869,0.011113,1
9,0.042898,0.008769,0.00225,0.001309,gini,5,32,32,"{'criterion': 'gini', 'max_depth': 5, 'min_sam...",0.763,...,0.79,0.786,0.803,0.785,0.771,0.785,0.796,0.7854,0.010781,31
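With 36 rows and one column per fold, the full table is hard to scan. Below is a minimal sketch that keeps only the summary columns and sorts by `rank_test_score`. So that it runs on its own, it fits a small, hypothetical grid on synthetic `make_classification` data. The column names are the ones `cv_results_` produces. Tied candidates share the same rank, which is why the table above shows several rows with rank 1.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data and a reduced grid, only to produce cv_results_
x, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=20
)
gscv = GridSearchCV(
    DecisionTreeClassifier(random_state=20),
    {"max_depth": [3, 5], "criterion": ["gini", "entropy"]},
    cv=KFold(n_splits=10, shuffle=True, random_state=20),
)
gscv.fit(x, y)

resultados = pd.DataFrame(gscv.cv_results_)

# Keep the per-candidate summary and put the best-ranked combinations first
colunas = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
resumo = resultados[colunas].sort_values("rank_test_score")
print(resumo.head())
```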


In [14]:
# The best estimator, refit on the whole dataset with best_params_ (GridSearchCV's default refit=True), is available directly through:

gscv.best_estimator_
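Because `refit=True` is the default, `best_estimator_` is an ordinary fitted `DecisionTreeClassifier` and can predict directly. `gscv.predict` delegates to it, and `best_score_` holds the best candidate's mean cross-validated score. The sketch below illustrates this on hypothetical synthetic data; `x_novos` is an illustrative name for rows the search never saw.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data; the last 5 rows play the role of new, unseen records
x, y = make_classification(
    n_samples=505, n_features=3, n_informative=3, n_redundant=0, random_state=20
)
x_treino, y_treino, x_novos = x[:500], y[:500], x[500:]

gscv = GridSearchCV(
    DecisionTreeClassifier(random_state=20),
    {"max_depth": [3, 5], "criterion": ["gini", "entropy"]},
    cv=KFold(n_splits=10, shuffle=True, random_state=20),
)
gscv.fit(x_treino, y_treino)

modelo = gscv.best_estimator_        # already refit on all of x_treino
previsoes = modelo.predict(x_novos)  # same result as gscv.predict(x_novos)

print(gscv.best_params_, gscv.best_score_)
print(previsoes)
```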