# 4.1.2 Exhaustive hyperparameter search with GridSearchCV

In many cases, models have several hyperparameters that control their configuration and how their parameters are estimated. For example, in the polynomial-fitting example, the degree $n$ is a hyperparameter. This tutorial shows how to handle the case where more than one hyperparameter must be tuned. GridSearchCV evaluates every combination in a user-defined grid with cross-validation and keeps the best one.

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

import warnings
warnings.filterwarnings("ignore")

## 4.1.2.1 Defining the search grid

In [4]:
#
# An SVM is used here. The parameters that can be tuned depend on the kernel
# type.
#
# The variable param_grid is a list of dictionaries; each dictionary defines
# a grid of values to try for the hyperparameters it names.
#
param_grid = [
    # -------------------------------------------------------------------------
    # First parameter grid: RBF kernel
    {
        "kernel": ["rbf"],
        "gamma": [1e-3, 1e-4],
        "C": [1, 10, 100, 1000],
    },
    # -------------------------------------------------------------------------
    # Second parameter grid: linear kernel (gamma does not apply)
    {
        "kernel": ["linear"],
        "C": [1, 10, 100, 1000],
    },
]

gridSearchCV = GridSearchCV(
    # --------------------------------------------------------------------------
    # This is assumed to implement the scikit-learn estimator interface.
    estimator=SVC(),
    # --------------------------------------------------------------------------
    # Dictionary with parameters names (str) as keys and lists of parameter
    # settings to try as values, or a list of such dictionaries
    param_grid=param_grid,
    # --------------------------------------------------------------------------
    # Determines the cross-validation splitting strategy. An integer k means
    # k-fold cross-validation; for a classifier such as SVC the folds are stratified.
    cv=5,
    # --------------------------------------------------------------------------
    # Strategy to evaluate the performance of the cross-validated model on each
    # held-out validation fold.
    scoring="accuracy",
    # --------------------------------------------------------------------------
    # Refit an estimator using the best found parameters on the whole dataset
    # passed to fit (here, the training set).
    refit=True,
    # --------------------------------------------------------------------------
    # If False, the cv_results_ attribute will not include training scores.
    return_train_score=False,
)
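To see how large the search is before running it, the grid can be expanded with `ParameterGrid`, the same helper scikit-learn uses to enumerate candidates. The sketch below copies `param_grid` so it runs on its own; the count of fits assumes `cv=5` as configured above (the final refit adds one more fit on top of that).

```python
from sklearn.model_selection import ParameterGrid

# Same search space as param_grid above (copied so the block runs alone).
param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# ParameterGrid expands each dictionary into the Cartesian product of its
# lists (keys sorted alphabetically) and concatenates the results.
candidates = list(ParameterGrid(param_grid))
n_folds = 5

print(len(candidates))            # 2 * 4 (rbf) + 4 (linear) = 12
print(len(candidates) * n_folds)  # 60 fits during cross-validation
print(candidates[0])              # {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}
```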

## 4.1.2.2 Main metrics available for scoring

Reference: https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

- Classification
    - accuracy
    - balanced_accuracy

- Regression
    - neg_mean_absolute_error
    - neg_mean_squared_error
    - neg_root_mean_squared_error
    - r2
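GridSearchCV always maximizes the score, which is why error metrics appear with a `neg_` prefix. The sketch below, on hypothetical synthetic regression data, uses `sklearn.metrics.get_scorer` to turn a scoring string into a callable and checks that `neg_mean_squared_error` is simply the MSE with its sign flipped.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error

# Hypothetical synthetic regression data, only for illustration.
rng = np.random.RandomState(0)
X_reg = rng.rand(50, 3)
y_reg = X_reg @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

model = LinearRegression().fit(X_reg, y_reg)

# get_scorer turns a scoring string into a callable scorer(estimator, X, y).
scorer = get_scorer("neg_mean_squared_error")
neg_mse = scorer(model, X_reg, y_reg)
mse = mean_squared_error(y_reg, model.predict(X_reg))

# The scorer returns the MSE with its sign flipped, so "greater is better"
# holds for every metric that GridSearchCV maximizes.
print(neg_mse, mse)
```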

## 4.1.2.3 Data preparation

In [6]:
digits = load_digits()

n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.5,
    random_state=0,
)
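Each digit is an 8x8 grayscale image, and SVC expects a 2-D feature matrix, so the images are flattened into 64 features. A quick shape check (a sketch reproducing the cell above) confirms the split: with `test_size=0.5`, scikit-learn rounds the test size up, so the 1797 samples split into 898 for training and 899 for testing. Only the training half is used by the search; the test half stays untouched for a final evaluation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
n_samples = len(digits.images)

# Each 8x8 image is flattened into a row of 64 pixel intensities.
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

print(digits.images.shape)          # (1797, 8, 8)
print(X.shape)                      # (1797, 64)
print(X_train.shape, X_test.shape)  # (898, 64) (899, 64)
# The flattened images coincide with the precomputed digits.data matrix.
print(np.array_equal(X, digits.data))  # True
```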

## 4.1.2.4 Running the search

In [7]:
gridSearchCV.fit(X_train, y_train)
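Conceptually, `fit` scores every candidate with 5-fold cross-validation on the training set, keeps the candidate with the highest mean score, and (because `refit=True`) retrains it on the whole training set. The loop below is a simplified manual sketch of that procedure using `ParameterGrid` and `cross_val_score`; it is not the library's actual implementation, which also records timings and ranks and can run in parallel. Under these assumptions it should select the same candidate as the search.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

best_score, best_params = -np.inf, None
for params in ParameterGrid(param_grid):
    # With an integer cv and a classifier, cross_val_score uses stratified
    # 5-fold CV, the same splitting that GridSearchCV(cv=5) applies.
    scores = cross_val_score(SVC(**params), X_train, y_train, cv=5, scoring="accuracy")
    if scores.mean() > best_score:  # strict ">" keeps the first candidate on ties
        best_score, best_params = scores.mean(), params

# Equivalent of refit=True: retrain the winner on the full training set.
best_model = SVC(**best_params).fit(X_train, y_train)
print(best_params, best_score)
```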

## 4.1.2.5 Search results

In [8]:
gridSearchCV.cv_results_

{'mean_fit_time': array([0.11407399, 0.04809709, 0.06134315, 0.03845797, 0.06358795,
        0.02056479, 0.03514423, 0.02034054, 0.01357017, 0.01456418,
        0.01739545, 0.01544566]),
 'std_fit_time': array([0.04217557, 0.00798209, 0.01319033, 0.00708322, 0.01461183,
        0.00168396, 0.00155122, 0.00332009, 0.000879  , 0.0009707 ,
        0.00427647, 0.0008631 ]),
 'mean_score_time': array([0.03173003, 0.02116961, 0.01861963, 0.01933002, 0.01600604,
        0.0078196 , 0.0101759 , 0.00839415, 0.0028439 , 0.00333147,
        0.00385866, 0.00381432]),
 'std_score_time': array([1.29811178e-02, 2.86232116e-03, 3.77856693e-03, 7.54697014e-03,
        3.19972056e-03, 1.36740465e-03, 1.27243768e-03, 2.30450612e-03,
        8.81792203e-05, 7.43548732e-04, 8.48042259e-04, 6.87104436e-04]),
 'param_C': masked_array(data=[1, 1, 10, 10, 100, 100, 1000, 1000, 1, 10, 100, 1000],
              mask=[False, False, False, False, False, False, False, False,
                    False, False, False,
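The raw `cv_results_` dictionary is hard to read. Since it holds equal-length arrays with one entry per candidate, a common approach is to load it into a pandas DataFrame (pandas is an extra dependency assumed here) and sort by `rank_test_score`. The sketch refits the same search so it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy").fit(X_train, y_train)

# One row per candidate; per-fold scores appear as split0_test_score, ...
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[columns].sort_values("rank_test_score").to_string(index=False))
```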

In [9]:
#
# Estimator that was chosen by the search, i.e. estimator which gave highest
# score (or smallest loss if specified) on the left out data.
#
gridSearchCV.best_estimator_
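With `refit=True`, `best_estimator_` is a new SVC trained on all of `X_train` with `best_params_`. As a sketch of that equivalence, rebuilding the model by hand with the same parameters should produce identical predictions, since SVC training is deterministic for fixed data and parameters (with the default `probability=False`).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", refit=True)
search.fit(X_train, y_train)

# Rebuild the refitted model by hand from the winning parameters.
manual = SVC(**search.best_params_).fit(X_train, y_train)
same = np.array_equal(search.best_estimator_.predict(X_test), manual.predict(X_test))

print(search.best_estimator_)
print(same)  # True
```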

In [10]:
gridSearchCV.best_score_

0.9866480446927375

In [11]:
gridSearchCV.best_params_

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
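Both `best_score_` and `best_params_` are read from `cv_results_` at position `best_index_`. Note that `best_score_` is the mean cross-validated accuracy over the validation folds of the training set, not the accuracy on `X_test`. The sketch below checks these relationships on a fresh run of the same search.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy").fit(X_train, y_train)

i = search.best_index_
# best_score_ is the mean validation score of the winning candidate.
print(search.best_score_ == search.cv_results_["mean_test_score"][i])  # True
# best_params_ is the parameter dictionary of that same candidate.
print(search.best_params_ == search.cv_results_["params"][i])  # True
print(search.cv_results_["rank_test_score"][i])  # 1
```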

## 4.1.2.6 Prediction with the best model

In [12]:
gridSearchCV.predict(X_train)

array([1, 4, 9, 0, 4, 1, 1, 5, 9, 1, 4, 2, 6, 3, 9, 7, 6, 4, 8, 6, 8, 7,
       6, 0, 5, 9, 4, 7, 3, 4, 9, 4, 9, 7, 9, 1, 5, 6, 0, 0, 4, 3, 6, 1,
       0, 9, 4, 8, 7, 5, 9, 8, 4, 5, 0, 1, 6, 0, 5, 5, 0, 4, 3, 2, 8, 7,
       6, 3, 4, 2, 5, 8, 0, 6, 9, 4, 5, 4, 9, 7, 3, 3, 1, 4, 4, 2, 6, 8,
       1, 1, 0, 3, 7, 4, 6, 7, 4, 0, 5, 2, 9, 2, 1, 9, 2, 3, 1, 7, 7, 4,
       5, 6, 5, 6, 7, 8, 1, 4, 3, 4, 4, 3, 5, 3, 3, 4, 7, 9, 8, 0, 6, 1,
       9, 0, 8, 4, 1, 2, 3, 9, 7, 8, 8, 8, 3, 7, 5, 7, 0, 1, 7, 8, 3, 8,
       0, 4, 8, 6, 2, 3, 6, 7, 3, 7, 7, 1, 3, 5, 0, 9, 8, 5, 3, 1, 2, 0,
       3, 6, 0, 3, 4, 1, 2, 3, 1, 0, 5, 8, 9, 3, 9, 6, 6, 8, 9, 0, 7, 8,
       2, 0, 0, 7, 7, 4, 5, 3, 1, 8, 5, 9, 6, 2, 9, 7, 7, 9, 5, 4, 2, 6,
       6, 1, 3, 4, 7, 2, 8, 0, 6, 1, 6, 6, 5, 8, 4, 3, 0, 5, 2, 9, 9, 7,
       8, 0, 5, 0, 6, 3, 3, 5, 1, 5, 1, 7, 9, 6, 4, 5, 0, 1, 8, 7, 8, 8,
       8, 9, 8, 7, 7, 2, 2, 2, 8, 0, 7, 8, 6, 8, 0, 4, 2, 2, 3, 7, 9, 0,
       2, 0, 0, 2, 7, 1, 5, 6, 4, 0, 0, 5, 5, 3, 9,
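Predicting on `X_train` only shows that the refitted model works; because it was trained on that data, it gives an optimistic picture. The held-out `X_test` split gives a fairer estimate of generalization. A minimal sketch: `GridSearchCV.score` applies the configured `"accuracy"` scorer to `best_estimator_`, so it should match `accuracy_score` on the same predictions.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy").fit(X_train, y_train)

# Evaluate the refitted best model on data it has never seen.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print(test_accuracy)
print(search.score(X_test, y_test))  # same value as test_accuracy
print(classification_report(y_test, y_pred))
```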

