# Exhaustive hyperparameter search using GridSearchCV --- 7:40 min

* 7:40 min | Last modified: October 1, 2021 | [YouTube](https://youtu.be/brXVJ6JkUOE)

In many cases, models have several hyperparameters that control their configuration and how their parameters are estimated. For instance, in the polynomial fitting example, the degree $n$ is a hyperparameter. This tutorial shows how to approach the problem when more than one hyperparameter must be tuned.
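As a minimal sketch of the situation described above, the snippet below tunes two hyperparameters at once for a polynomial fit: the degree and a regularization strength. The synthetic data, the use of `Ridge` instead of plain least squares, and the candidate values are all assumptions made for illustration; they do not come from the earlier polynomial example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (hypothetical): a noisy cubic curve.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(60, 1))
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 3 + rng.normal(scale=0.05, size=60)

# make_pipeline names each step after its lowercased class name, so the
# hyperparameters are addressed as "<step>__<parameter>".
model = make_pipeline(PolynomialFeatures(), Ridge())
poly_param_grid = {
    "polynomialfeatures__degree": [1, 2, 3, 4, 5],
    "ridge__alpha": [1e-3, 1e-2, 1e-1, 1.0],
}

poly_search = GridSearchCV(
    model,
    poly_param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
poly_search.fit(x, y)

# Every (degree, alpha) pair is evaluated: 5 * 4 = 20 candidates.
print(len(poly_search.cv_results_["params"]))
print(poly_search.best_params_)
```

With a single hyperparameter a simple loop is enough; with two or more, the number of combinations grows multiplicatively, which is the case `GridSearchCV` is meant to handle.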

Parameterizing the search
---

In [1]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

#
# An SVM is used here. The parameters that can be tuned depend on the kernel
# type.
#
# The variable param_grid is a list of dictionaries holding the candidate
# values for each parameter.
#
param_grid = [
    # -------------------------------------------------------------------------
    # First parameter grid
    {
        "kernel": ["rbf"],
        "gamma": [1e-3, 1e-4],
        "C": [1, 10, 100, 1000],
    },
    # -------------------------------------------------------------------------
    # Second parameter grid
    {
        "kernel": ["linear"],
        "C": [1, 10, 100, 1000],
    },
]

gridSearchCV = GridSearchCV(
    # --------------------------------------------------------------------------
    # Estimator to tune. It must implement the scikit-learn estimator
    # interface.
    estimator=SVC(),
    # --------------------------------------------------------------------------
    # Dictionary with parameters names (str) as keys and lists of parameter
    # settings to try as values, or a list of such dictionaries
    param_grid=param_grid,
    # --------------------------------------------------------------------------
    # Determines the cross-validation splitting strategy.
    cv=5,
    # --------------------------------------------------------------------------
    # Strategy to evaluate the performance of the cross-validated model on the
    # test set.
    scoring="accuracy",
    # --------------------------------------------------------------------------
    # Refit an estimator using the best found parameters on the whole dataset.
    refit=True,
    # --------------------------------------------------------------------------
    # If False, the cv_results_ attribute will not include training scores.
    return_train_score=False,
)
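To make the size of this search concrete, the following sketch enumerates the candidates with `sklearn.model_selection.ParameterGrid`, which expands a grid specification into the individual parameter combinations. The grid is copied from the cell above; the variable names `candidates`, `n_candidates` and `n_fits` are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

# Each dictionary is expanded separately: 1 * 2 * 4 = 8 RBF candidates
# plus 1 * 4 = 4 linear candidates.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# With cv=5, every candidate is fitted once per fold.
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # 12 60
for params in candidates[:3]:
    print(params)
```

Because `refit=True`, one additional fit on the whole training set is performed after the search, using the best combination found.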

Preparing the data
---

In [4]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.5,
    random_state=0,
)
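As a quick sanity check of the preparation step, the sketch below inspects the shapes involved. It repeats the cell above so that it can run on its own; the flag `same_as_data` is an illustrative name and only confirms that flattening the 8x8 images reproduces the `digits.data` array that the dataset already provides.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# Each image is an 8x8 array; reshape flattens it into a 64-feature row.
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))
y = digits.target

# The flattened images match the ready-made feature matrix.
same_as_data = np.array_equal(X, digits.data)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.5,
    random_state=0,
)

print(digits.images.shape, X.shape, same_as_data)
print(X_train.shape, X_test.shape)
```

Half of the samples are held out here, so the grid search only ever sees `X_train` and `y_train`.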

Running the search
---

In [5]:
gridSearchCV.fit(X_train, y_train)

GridSearchCV(cv=5, error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid=[{'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001],
                          'kernel': ['rbf']},
                         {'C': [1, 10, 100, 1000], 'kernel': ['linear']}],
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='accuracy', verbose=0)

Returned values
---

In [6]:
gridSearchCV.cv_results_

{'mean_fit_time': array([0.0388783 , 0.03032894, 0.03458943, 0.01701069, 0.03480916,
        0.01673946, 0.03490353, 0.01642923, 0.01093946, 0.01087055,
        0.01134777, 0.01113782]),
 'std_fit_time': array([5.18257843e-03, 7.87956479e-04, 5.77014212e-04, 1.32795253e-04,
        8.07848619e-04, 2.81633387e-04, 1.22628768e-03, 2.74976878e-04,
        1.94645524e-04, 9.26784999e-05, 4.96413888e-04, 4.87010054e-04]),
 'mean_score_time': array([0.0077136 , 0.00810127, 0.00666142, 0.00498128, 0.0065568 ,
        0.0047956 , 0.00658164, 0.00471325, 0.00291343, 0.0032692 ,
        0.00292768, 0.00296078]),
 'std_score_time': array([9.15071955e-04, 3.16003456e-04, 2.52138576e-04, 5.48435289e-05,
        2.10770406e-04, 1.18460151e-04, 1.88526458e-04, 8.20457050e-05,
        6.54805939e-05, 7.32591775e-04, 7.49389608e-05, 3.94798707e-05]),
 'param_C': masked_array(data=[1, 1, 10, 10, 100, 100, 1000, 1000, 1, 10, 100, 1000],
              mask=[False, False, False, False, False, False, False,
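The raw `cv_results_` dictionary is hard to read. One possible way to inspect it, assuming pandas is available, is to load it into a `DataFrame` and sort the candidates by rank. The sketch repeats the setup so that it runs on its own; the names `search`, `results` and `ranking` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Every key of cv_results_ becomes a column; every candidate becomes a row.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
ranking = results[columns].sort_values("rank_test_score")

print(ranking.to_string(index=False))
```

The row with `rank_test_score` equal to 1 corresponds to the combination reported by `best_params_`, and its `mean_test_score` is the value reported by `best_score_`.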

In [7]:
#
# Estimator that was chosen by the search, i.e. estimator which gave highest
# score (or smallest loss if specified) on the left out data.
#
gridSearchCV.best_estimator_

SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [8]:
gridSearchCV.best_score_

0.9866480446927375

In [9]:
gridSearchCV.best_params_

{'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
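The value of `best_score_` is the mean cross-validated accuracy, computed only on folds of the training split. A common complementary check, sketched below, is to score the refitted model on the held-out half of the data. The setup is repeated so that the snippet runs on its own; `search`, `y_pred`, `test_accuracy` and `same_predictions` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

param_grid = [
    {"kernel": ["rbf"], "gamma": [1e-3, 1e-4], "C": [1, 10, 100, 1000]},
    {"kernel": ["linear"], "C": [1, 10, 100, 1000]},
]

search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", refit=True)
search.fit(X_train, y_train)

# With refit=True, predict() delegates to best_estimator_, which was
# refitted on the whole training split.
y_pred = search.predict(X_test)
same_predictions = np.array_equal(y_pred, search.best_estimator_.predict(X_test))

# Accuracy on data that played no part in the search.
test_accuracy = accuracy_score(y_test, y_pred)

print(search.best_params_)
print(test_accuracy, same_predictions)
```

A test accuracy far below `best_score_` would suggest that the selected hyperparameters are overfitted to the cross-validation folds.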

Prediction with the best model
---

In [10]:
gridSearchCV.predict(X_train)

array([1, 4, 9, 0, 4, 1, 1, 5, 9, 1, 4, 2, 6, 3, 9, 7, 6, 4, 8, 6, 8, 7,
       6, 0, 5, 9, 4, 7, 3, 4, 9, 4, 9, 7, 9, 1, 5, 6, 0, 0, 4, 3, 6, 1,
       0, 9, 4, 8, 7, 5, 9, 8, 4, 5, 0, 1, 6, 0, 5, 5, 0, 4, 3, 2, 8, 7,
       6, 3, 4, 2, 5, 8, 0, 6, 9, 4, 5, 4, 9, 7, 3, 3, 1, 4, 4, 2, 6, 8,
       1, 1, 0, 3, 7, 4, 6, 7, 4, 0, 5, 2, 9, 2, 1, 9, 2, 3, 1, 7, 7, 4,
       5, 6, 5, 6, 7, 8, 1, 4, 3, 4, 4, 3, 5, 3, 3, 4, 7, 9, 8, 0, 6, 1,
       9, 0, 8, 4, 1, 2, 3, 9, 7, 8, 8, 8, 3, 7, 5, 7, 0, 1, 7, 8, 3, 8,
       0, 4, 8, 6, 2, 3, 6, 7, 3, 7, 7, 1, 3, 5, 0, 9, 8, 5, 3, 1, 2, 0,
       3, 6, 0, 3, 4, 1, 2, 3, 1, 0, 5, 8, 9, 3, 9, 6, 6, 8, 9, 0, 7, 8,
       2, 0, 0, 7, 7, 4, 5, 3, 1, 8, 5, 9, 6, 2, 9, 7, 7, 9, 5, 4, 2, 6,
       6, 1, 3, 4, 7, 2, 8, 0, 6, 1, 6, 6, 5, 8, 4, 3, 0, 5, 2, 9, 9, 7,
       8, 0, 5, 0, 6, 3, 3, 5, 1, 5, 1, 7, 9, 6, 4, 5, 0, 1, 8, 7, 8, 8,
       8, 9, 8, 7, 7, 2, 2, 2, 8, 0, 7, 8, 6, 8, 0, 4, 2, 2, 3, 7, 9, 0,
       2, 0, 0, 2, 7, 1, 5, 6, 4, 0, 0, 5, 5, 3, 9,