# Hyperparameter search

In past notebooks, we pointed out that some model parameters have an impact on the statistical performance of the models. Usually, we would like to optimize these parameters so that the trained model is as good as possible. This optimization is called hyperparameter tuning.

In this notebook, we will show a couple of methods to tune a model's hyperparameters.

## Introductory example

We will take an example that we showed in the linear models notebook, where we discussed the impact of the $\alpha$ parameter on a `Ridge` model. We mentioned that this parameter controls how strongly the model is regularized. However, there is no general rule specifying what a good $\alpha$ value is: it depends on the dataset.
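To make "more or less regularized" concrete: `Ridge` minimizes $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$, so a larger $\alpha$ pulls the coefficients toward zero. The minimal sketch below assumes synthetic data from `make_regression` (not the housing data used in this notebook; `X_demo` and `y_demo` are hypothetical names) and prints the L2 norm of the fitted coefficients for a few values of `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data (an assumption, used instead of a real dataset)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, noise=10.0, random_state=0
)

alphas = [0.01, 1.0, 100.0, 10_000.0]
coef_norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # The L2 norm of the coefficients shrinks as the penalty grows
    coef_norms.append(np.linalg.norm(ridge.coef_))

for alpha, norm in zip(alphas, coef_norms):
    print(f"alpha={alpha:>8}: ||coef||_2 = {norm:.2f}")
```

Which of these values generalizes best cannot be read from the norms alone; that is what the search below is for.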

Let's load a dataset to tackle a regression problem.

In [1]:
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |


In [2]:
y.head()

0    4.526
1    3.585
2    3.521
3    3.413
4    3.422
Name: MedHouseVal, dtype: float64

Now, we will define a `Ridge` model in which the data are first processed by a `PolynomialFeatures` transformer, which adds powers of the features and interactions between features, and then standardized with a `StandardScaler`.

In [4]:
import sklearn

sklearn.set_config(display="diagram")

In [6]:
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    PolynomialFeatures(),
    StandardScaler(),
    Ridge(),
)
model

So far, we have kept all the default parameters provided by scikit-learn. Let's evaluate this vanilla model.

In [7]:
import pandas as pd
from sklearn.model_selection import cross_validate

cv_results = cross_validate(model, X, y)
cv_results = pd.DataFrame(cv_results)

In [8]:
cv_results

|   | fit_time | score_time | test_score |
|---|---|---|---|
| 0 | 0.03358 | 0.002328 | 0.46765 |
| 1 | 0.014659 | 0.002118 | 0.552113 |
| 2 | 0.016746 | 0.002526 | 0.579568 |
| 3 | 0.013096 | 0.001769 | 0.500778 |
| 4 | 0.011725 | 0.001566 | -4.211175 |


In [9]:
cv_results.aggregate(["mean", "std"])

|      | fit_time | score_time | test_score |
|---|---|---|---|
| mean | 0.017961 | 0.002061 | -0.422213 |
| std | 0.008929 | 0.000394 | 2.118542 |
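The `test_score` reported above is the regressor's default score (the coefficient of determination $R^2$), and with the default `cv` the data are split into 5 folds; for a regressor this is a plain `KFold` without shuffling. The sketch below checks this equivalence on synthetic data (an assumption; `X_demo`, `y_demo` and `model_demo` are hypothetical names). If the rows of a dataset are ordered, unshuffled folds can differ markedly from one another, which might contribute to a single very low fold score such as the one above; this is a hypothesis, not something verified here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=5.0, random_state=0
)
model_demo = Ridge()

# cv=None (the default) means 5 folds; for a regressor, a KFold without shuffling
default_scores = cross_val_score(model_demo, X_demo, y_demo)
explicit_scores = cross_val_score(model_demo, X_demo, y_demo, cv=KFold(n_splits=5))

# Shuffling changes which samples end up in each fold
shuffled_scores = cross_val_score(
    model_demo, X_demo, y_demo, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)
print(default_scores)
print(shuffled_scores)
```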


At this stage, nothing tells us that our pipeline is the best one we could get. For instance, the degree of the `PolynomialFeatures` could be higher, or the `Ridge` regressor could need more regularization. Let's list the parameters of the model that we could tune:

In [10]:
# get_params() returns a dict; iterating over it yields the parameter names
for param_name in model.get_params():
    print(param_name)

memory
steps
verbose
polynomialfeatures
standardscaler
ridge
polynomialfeatures__degree
polynomialfeatures__include_bias
polynomialfeatures__interaction_only
polynomialfeatures__order
standardscaler__copy
standardscaler__with_mean
standardscaler__with_std
ridge__alpha
ridge__copy_X
ridge__fit_intercept
ridge__max_iter
ridge__normalize
ridge__positive
ridge__random_state
ridge__solver
ridge__tol


Two important parameters of this model are `polynomialfeatures__degree` and `ridge__alpha`. We will try to find the optimal values of these parameters for the current dataset.
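These names follow the `<step name>__<parameter>` convention, where `make_pipeline` names each step after its lowercased class name. A short sketch (the variable `pipe` is a hypothetical stand-in for `model`) showing that `set_params` uses the same names to reach into a given step:

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

pipe = make_pipeline(PolynomialFeatures(), StandardScaler(), Ridge())

# make_pipeline names each step after its lowercased class name
print(list(pipe.named_steps))

# "<step name>__<parameter>" addresses a parameter of a given step
pipe.set_params(polynomialfeatures__degree=3, ridge__alpha=10.0)
print(pipe.named_steps["polynomialfeatures"].degree, pipe.named_steps["ridge"].alpha)
```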

## Manual hyperparameter search

Before showing the automated tools that scikit-learn provides for hyperparameter tuning, we will implement our own simplified version manually.

<div class="alert alert-success">
    <b>EXERCISE</b>:
    <ul>
        <li>Split the dataset into a training and testing set.</li>
        <li>Make a nested <tt>for</tt> loop to try all possible combinations of the parameters defined in <tt>parameter_grid</tt>.</li>
        <li>In the inner loop, use cross-validation (using <tt>cross_val_score</tt>) on the training set to get a distribution of scores.</li>
        <li>Compute the mean and standard deviation of the cross-validation scores and pick the best set of hyperparameters.</li>
        <li>Retrain a model with the best combination of hyperparameters and evaluate it on the testing set.</li>
    </ul>
</div>

In [12]:
import numpy as np

parameter_grid = {
    "polynomialfeatures__degree": np.arange(2, 5),
    "ridge__alpha": np.logspace(1, 3, num=5),
}

In [11]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0
)

In [13]:
from collections import defaultdict
from sklearn.model_selection import cross_val_score

search_results = defaultdict(list)
for degree in parameter_grid["polynomialfeatures__degree"]:
    for alpha in parameter_grid["ridge__alpha"]:
        search_results["polynomialfeatures__degree"].append(degree)
        search_results["ridge__alpha"].append(alpha)
        model.set_params(
            polynomialfeatures__degree=degree,
            ridge__alpha=alpha,
        )
        search_results["score"].append(cross_val_score(model, X_train, y_train))
search_results = pd.DataFrame(search_results)
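The nested loop generalizes to any number of parameters with `sklearn.model_selection.ParameterGrid`, which yields one dictionary per combination. A minimal sketch, reusing the same grid definition; each `params` dictionary could be passed as `model.set_params(**params)` inside a single loop.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

parameter_grid = {
    "polynomialfeatures__degree": np.arange(2, 5),
    "ridge__alpha": np.logspace(1, 3, num=5),
}

# One dict per (degree, alpha) combination, like the nested loop above
candidates = list(ParameterGrid(parameter_grid))
print(len(candidates))
for params in candidates[:3]:
    print(params)
```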

In [17]:
search_results["mean_score"] = search_results["score"].apply(lambda x: x.mean())
search_results["std_score"] = search_results["score"].apply(lambda x: x.std())

In [29]:
search_results = search_results.sort_values(by="mean_score", ascending=False)
search_results

|    | polynomialfeatures__degree | ridge__alpha | score | mean_score | std_score |
|---|---|---|---|---|---|
| 3 | 2 | 316.227766 | [0.6011787650057057, 0.6291786154038115, 0.562... | 0.615069 | 0.030193 |
| 9 | 3 | 1000.0 | [0.6166992944746666, 0.6127822330945476, 0.536... | 0.614862 | 0.043321 |
| 2 | 2 | 100.0 | [0.6397033426134056, 0.6045685742196174, 0.494... | 0.61171 | 0.062322 |
| 4 | 2 | 1000.0 | [0.576208257127071, 0.6156474629642577, 0.5922... | 0.5979 | 0.017012 |
| 1 | 2 | 31.622777 | [0.6576675638070638, 0.5658734564523009, 0.422... | 0.592392 | 0.092297 |
| 8 | 3 | 316.227766 | [0.6655985801925857, 0.5874011840505577, 0.475... | 0.589077 | 0.073398 |
| 0 | 2 | 10.0 | [0.6558061904686903, 0.5285695073484505, 0.362... | 0.573121 | 0.116659 |
| 14 | 4 | 1000.0 | [0.6646236338304413, 0.5526846349518668, 0.467... | 0.538435 | 0.126204 |
| 7 | 3 | 100.0 | [0.5880089780415716, 0.5642941354891886, 0.418... | 0.529637 | 0.10645 |
| 6 | 3 | 31.622777 | [0.3205562616474341, 0.5150245320497143, 0.434... | 0.446145 | 0.143737 |


In [36]:
best_model = model.set_params(
    polynomialfeatures__degree=search_results["polynomialfeatures__degree"].iloc[0],
    ridge__alpha=search_results["ridge__alpha"].iloc[0],
)

In [37]:
cv_results = cross_validate(best_model, X, y)
cv_results = pd.DataFrame(cv_results)
cv_results

|   | fit_time | score_time | test_score |
|---|---|---|---|
| 0 | 0.034005 | 0.002753 | 0.521021 |
| 1 | 0.01739 | 0.002313 | 0.493117 |
| 2 | 0.015289 | 0.002213 | 0.574714 |
| 3 | 0.012836 | 0.001619 | 0.555567 |
| 4 | 0.011655 | 0.001979 | -3.220341 |
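The cell above cross-validates the selected pipeline on the full dataset; the last step of the exercise instead asks to refit on the training set and score once on the testing set. Below is a self-contained sketch of the whole exercise under stated assumptions: synthetic data from `make_regression` replaces the housing data, the grid is smaller, and `X_tr`, `X_te`, `pipe` and `best_params` are hypothetical names.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the housing data (assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

pipe = make_pipeline(PolynomialFeatures(), StandardScaler(), Ridge())

best_params, best_mean = None, -np.inf
for degree in [2, 3]:
    for alpha in [0.1, 1.0, 10.0]:
        pipe.set_params(polynomialfeatures__degree=degree, ridge__alpha=alpha)
        # Hyperparameters are selected using the training set only
        mean_score = cross_val_score(pipe, X_tr, y_tr).mean()
        if mean_score > best_mean:
            best_mean = mean_score
            best_params = {"polynomialfeatures__degree": degree, "ridge__alpha": alpha}

# Refit once on the full training set, then score on the untouched testing set
pipe.set_params(**best_params).fit(X_tr, y_tr)
test_score = pipe.score(X_te, y_te)
print(best_params, test_score)
```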


## Hyperparameter search using a grid

The search that we performed manually is known as a grid search: we try every combination of the parameter values that we provided. Scikit-learn provides an estimator, `GridSearchCV`, that automates this procedure: during `fit`, it evaluates each combination by cross-validation and selects the one with the best mean test score.
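As a sanity check of this description, the sketch below runs a manual search and `GridSearchCV` with the same default 5-fold splitting on synthetic data (an assumption, as are the names `grid`, `manual` and `manual_best`) and compares the selected parameters.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)
pipe = make_pipeline(PolynomialFeatures(), StandardScaler(), Ridge())
grid = {"polynomialfeatures__degree": [1, 2], "ridge__alpha": [0.1, 10.0, 1000.0]}

# Manual search: mean 5-fold score for every combination
manual = {}
for params in ParameterGrid(grid):
    pipe.set_params(**params)
    manual[tuple(sorted(params.items()))] = cross_val_score(pipe, X_demo, y_demo).mean()
manual_best = dict(max(manual, key=manual.get))

# Automated search with the same default 5-fold splitting
search = GridSearchCV(pipe, param_grid=grid).fit(X_demo, y_demo)
print(manual_best, search.best_params_)
```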

In [38]:
from sklearn.model_selection import GridSearchCV

search_cv = GridSearchCV(model, param_grid=parameter_grid)
search_cv.fit(X_train, y_train)

We can get the best parameters found by looking at the fitted attribute `best_params_`:

In [39]:
search_cv.best_params_

{'polynomialfeatures__degree': 2, 'ridge__alpha': 316.22776601683796}

We can get more detailed information about each combination of hyperparameters tried during `fit` by looking at the fitted attribute `cv_results_`:

In [40]:
cv_results = pd.DataFrame(search_cv.cv_results_)
cv_results.head()

|   | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_polynomialfeatures__degree | param_ridge__alpha | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.014342 | 0.005106 | 0.0021 | 0.000546 | 2 | 10.0 | {'polynomialfeatures__degree': 2, 'ridge__alph... | 0.655806 | 0.52857 | 0.362651 | 0.653063 | 0.665514 | 0.573121 | 0.116659 | 7 |
| 1 | 0.008773 | 0.000211 | 0.001417 | 0.000176 | 2 | 31.622777 | {'polynomialfeatures__degree': 2, 'ridge__alph... | 0.657668 | 0.565873 | 0.422209 | 0.654168 | 0.66204 | 0.592392 | 0.092297 | 5 |
| 2 | 0.009396 | 0.000643 | 0.001411 | 0.000128 | 2 | 100.0 | {'polynomialfeatures__degree': 2, 'ridge__alph... | 0.639703 | 0.604569 | 0.494025 | 0.665001 | 0.655253 | 0.61171 | 0.062322 | 3 |
| 3 | 0.008757 | 0.000326 | 0.001561 | 0.000361 | 2 | 316.227766 | {'polynomialfeatures__degree': 2, 'ridge__alph... | 0.601179 | 0.629179 | 0.562315 | 0.640136 | 0.642539 | 0.615069 | 0.030193 | 1 |
| 4 | 0.008848 | 0.000208 | 0.001317 | 4.4e-05 | 2 | 1000.0 | {'polynomialfeatures__degree': 2, 'ridge__alph... | 0.576208 | 0.615647 | 0.592287 | 0.585586 | 0.619774 | 0.5979 | 0.017012 | 4 |
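Since `cv_results_` has many columns, it can help to keep only a few and sort the candidates by `rank_test_score`. A minimal sketch on a plain `Ridge` with synthetic data (both assumptions; `results` and `summary` are hypothetical names):

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0, 100.0]})
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)
# Keep the most useful columns and order candidates from best to worst
summary = results[["param_alpha", "mean_test_score", "std_test_score", "rank_test_score"]]
summary = summary.sort_values("rank_test_score")
print(summary)
```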


In addition, if the parameter `refit` is set to `True` (the default), at the end of `fit` the search retrains a model with the best combination of hyperparameters on the whole data passed to `fit` (as we did in the manual hyperparameter search). This model is available through the fitted attribute `best_estimator_`:

In [41]:
search_cv.best_estimator_

This `best_estimator_` is the model used when calling `predict` and `score` on the `GridSearchCV` instance.
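A short sketch illustrating both points on synthetic data (an assumption; `X_tr`, `X_te` and `manual_refit` are hypothetical names): predictions from the search match those of `best_estimator_`, and `best_estimator_` has the same coefficients as a fresh `Ridge` built with `best_params_` and fitted on the whole training set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}).fit(X_tr, y_tr)

# With refit=True, predict and score are delegated to best_estimator_
assert np.allclose(search.predict(X_te), search.best_estimator_.predict(X_te))

# best_estimator_ matches a fresh model with best_params_ fitted on all of X_tr
manual_refit = Ridge(**search.best_params_).fit(X_tr, y_tr)
print(np.allclose(manual_refit.coef_, search.best_estimator_.coef_))
```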

In [43]:
search_cv.score(X_test, y_test)

-1.222417346053133

<div class="alert alert-success">
    <b>EXERCISE</b>:
    <br>
    Since a <tt>GridSearchCV</tt> behaves like any classifier or regressor, it can be used and evaluated by cross-validation. Use <tt>cross_validate</tt> to evaluate the grid-search model that we created previously.
</div>

In [90]:
cv_results = cross_validate(search_cv, X, y, return_estimator=True)
cv_results = pd.DataFrame(cv_results)

In [91]:
cv_results

|   | fit_time | score_time | estimator | test_score |
|---|---|---|---|---|
| 0 | 5.356896 | 0.00435 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | 0.258175 |
| 1 | 4.30081 | 0.001634 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | 0.475082 |
| 2 | 4.452103 | 0.00244 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | 0.561609 |
| 3 | 4.414243 | 0.001665 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | 0.526413 |
| 4 | 4.488308 | 0.003557 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | -17.880146 |
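This nested cross-validation is much more expensive than evaluating a single pipeline: each outer split runs a full grid search, which fits every candidate on every inner split and then refits the best one. The small helper below (a hypothetical function, written only to make the arithmetic explicit) counts the fits for the grid used here, which is consistent with the much larger `fit_time` above.

```python
def n_fits_nested(n_outer_splits, n_inner_splits, n_candidates, refit=True):
    """Count model fits for cross_validate(GridSearchCV(...))."""
    # Each inner search fits every candidate on every inner split,
    # plus one final refit of the best candidate when refit=True
    per_search = n_candidates * n_inner_splits + (1 if refit else 0)
    return n_outer_splits * per_search


# 3 degrees x 5 alphas = 15 candidates, 5 inner folds, 5 outer folds
print(n_fits_nested(n_outer_splits=5, n_inner_splits=5, n_candidates=15))  # 380
```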


In [92]:
cv_results["estimator"]

0    GridSearchCV(estimator=Pipeline(steps=[('polyn...
1    GridSearchCV(estimator=Pipeline(steps=[('polyn...
2    GridSearchCV(estimator=Pipeline(steps=[('polyn...
3    GridSearchCV(estimator=Pipeline(steps=[('polyn...
4    GridSearchCV(estimator=Pipeline(steps=[('polyn...
Name: estimator, dtype: object

In [94]:
for est in cv_results["estimator"]:
    print(est.best_params_)

{'polynomialfeatures__degree': 2, 'ridge__alpha': 31.622776601683793}
{'polynomialfeatures__degree': 2, 'ridge__alpha': 1000.0}
{'polynomialfeatures__degree': 2, 'ridge__alpha': 1000.0}
{'polynomialfeatures__degree': 2, 'ridge__alpha': 1000.0}
{'polynomialfeatures__degree': 3, 'ridge__alpha': 100.0}


<div class="alert alert-success">
    <b>QUESTION</b>:
    <br>
    Which limitations does the grid-search approach suffer from?
</div>

## Randomized hyperparameter search

The grid search used above has two limitations:

- it only explores the combinations of parameters defined in the grid;
- adding new parameters or values to explore rapidly increases the cost of the search, since the number of combinations is the product of the number of values of each parameter.
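The multiplicative growth is easy to check with `ParameterGrid`. In this sketch, the two extra boolean parameters are chosen only for illustration (both exist in the pipeline's `get_params()` output above); each one doubles the number of combinations, and each combination is then fitted once per cross-validation fold.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

grid = {
    "polynomialfeatures__degree": np.arange(2, 5),
    "ridge__alpha": np.logspace(1, 3, num=5),
}
print(len(ParameterGrid(grid)))  # 15 = 3 * 5

# Two boolean parameters multiply the number of combinations by 2 * 2
larger_grid = {
    **grid,
    "polynomialfeatures__interaction_only": [False, True],
    "ridge__fit_intercept": [False, True],
}
print(len(ParameterGrid(larger_grid)))  # 60 = 3 * 5 * 2 * 2
```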

`RandomizedSearchCV` instead lets us specify a distribution from which to draw parameter values. It explores the hyperparameter space without being restricted to a grid, and the user sets a budget (`n_iter`) for the number of combinations to try.
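`RandomizedSearchCV` draws its candidates with `sklearn.model_selection.ParameterSampler`, which can be called directly to preview them. In this sketch, lists or arrays are sampled uniformly, and `scipy.stats.loguniform(a, b)` draws values between `a` and `b` that are uniform in log space. The bounds `1e1` and `1e3` are chosen here to match the grid's $\alpha$ range; this is an assumption of the sketch, not the setting used in the cell below.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

parameter_distributions = {
    "polynomialfeatures__degree": np.arange(1, 5),  # sampled uniformly from the array
    "ridge__alpha": loguniform(1e1, 1e3),            # continuous, uniform in log space
}

# Draw a budget of 10 candidates, as RandomizedSearchCV would with n_iter=10
candidates = list(ParameterSampler(parameter_distributions, n_iter=10, random_state=0))
for params in candidates:
    print(params)
```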

In [53]:
from scipy.stats import loguniform

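parameter_distributions = {
    "polynomialfeatures__degree": np.arange(1, 5),
    # loguniform(a, b) samples values between a and b (here between 1 and 3),
    # not between 10**a and 10**b; loguniform(1e1, 1e3) would match the grid's range
    "ridge__alpha": loguniform(1, 3),
}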

In [54]:
from sklearn.model_selection import RandomizedSearchCV

search_cv = RandomizedSearchCV(
    model, param_distributions=parameter_distributions, n_iter=10,
)

In [55]:
cv_results = cross_validate(search_cv, X, y, return_estimator=True)
cv_results = pd.DataFrame(cv_results)

In [56]:
cv_results

|   | fit_time | score_time | estimator | test_score |
|---|---|---|---|---|
| 0 | 1.412861 | 0.000635 | RandomizedSearchCV(estimator=Pipeline(steps=[(... | 0.548872 |
| 1 | 3.99026 | 0.000627 | RandomizedSearchCV(estimator=Pipeline(steps=[(... | 0.468163 |
| 2 | 1.655859 | 0.000616 | RandomizedSearchCV(estimator=Pipeline(steps=[(... | 0.550803 |
| 3 | 1.952612 | 0.000857 | RandomizedSearchCV(estimator=Pipeline(steps=[(... | 0.536801 |
| 4 | 2.33049 | 0.000654 | RandomizedSearchCV(estimator=Pipeline(steps=[(... | 0.660527 |


In [57]:
for est in cv_results["estimator"]:
    print(est.best_params_)

{'polynomialfeatures__degree': 1, 'ridge__alpha': 2.428365753116008}
{'polynomialfeatures__degree': 1, 'ridge__alpha': 1.7372677706132327}
{'polynomialfeatures__degree': 1, 'ridge__alpha': 1.0581272217613393}
{'polynomialfeatures__degree': 1, 'ridge__alpha': 2.727042126862619}
{'polynomialfeatures__degree': 1, 'ridge__alpha': 2.7965527670658323}


## Model with internal hyperparameter tuning

Some classifiers and regressors come with a built-in hyperparameter selection that is more efficient than a generic grid search. Their names usually end with `CV` (e.g. `RidgeCV`).
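For `RidgeCV`, the default `cv=None` uses an efficient form of leave-one-out cross-validation over all candidate `alphas`, instead of refitting the model once per value and per fold. This differs from the 5-fold splitting used by default in `GridSearchCV`, so the two approaches need not select the same value. A minimal sketch on synthetic data (an assumption; `ridge_cv` is a hypothetical name):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
alphas = np.logspace(-2, 2, num=50)

# cv=None (default): efficient leave-one-out over all alphas in a single fit
ridge_cv = RidgeCV(alphas=alphas).fit(X_demo, y_demo)
print(ridge_cv.alpha_)  # the selected regularization strength
```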

<div class="alert alert-success">
    <b>EXERCISE</b>:
    <br>
    <ul>
        <li>Create a pipeline made of a <tt>PolynomialFeatures</tt>, a <tt>StandardScaler</tt>, and a <tt>Ridge</tt>.</li>
        <li>Create a grid-search from the previous pipeline and tune the parameter <tt>alpha</tt> so that it tries the values <tt>np.logspace(-2, 2, num=50)</tt>.</li>
        <li>Fit the grid-search on the training set and check the time it takes.</li>
        <li>Repeat the experiment by replacing the <tt>Ridge</tt> regressor by a <tt>RidgeCV</tt> regressor and removing the <tt>GridSearchCV</tt>.</li>
    </ul>
    Which approach is more efficient in terms of computational performance?
</div>

In [97]:
from sklearn.linear_model import Ridge

alphas = np.logspace(-2, 2, num=50)

model = GridSearchCV(
    make_pipeline(
        PolynomialFeatures(),
        StandardScaler(),
        Ridge(),
    ),
    param_grid={
        "ridge__alpha": alphas
    },
    scoring="neg_mean_squared_error",
)
model

In [98]:
%%time
model.fit(X_train, y_train)

CPU times: user 11.3 s, sys: 7.54 s, total: 18.8 s
Wall time: 2.77 s


In [107]:
from sklearn.linear_model import RidgeCV

model = make_pipeline(
    PolynomialFeatures(),
    StandardScaler(),
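    # store_cv_values keeps the leave-one-out errors; recent scikit-learn
    # releases renamed this parameter store_cv_results
    RidgeCV(alphas=alphas, store_cv_values=True),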
)

In [108]:
%%time
model.fit(X_train, y_train)

CPU times: user 462 ms, sys: 306 ms, total: 768 ms
Wall time: 161 ms


## Inspection of hyperparameters in cross-validation

Sometimes, we run a hyperparameter search inside a cross-validation evaluation (nested cross-validation). In this case, each cross-validation split can potentially select a different set of hyperparameter values, and we can inspect these values. Let's define a `GridSearchCV` model.

In [114]:
from sklearn.linear_model import Ridge

alphas = np.logspace(-2, 2, num=50)

model = GridSearchCV(
    make_pipeline(
        PolynomialFeatures(),
        StandardScaler(),
        Ridge(),
    ),
    param_grid={
        "ridge__alpha": alphas
    },
    scoring="neg_mean_squared_error",
)
model

Then, we can run a cross-validation by passing the model to `cross_validate`. In addition, we can keep the model trained on each cross-validation split by setting `return_estimator` to `True`.

In [115]:
cv_results = cross_validate(model, X, y, cv=3, return_estimator=True)
cv_results = pd.DataFrame(cv_results)
cv_results

|   | fit_time | score_time | estimator | test_score |
|---|---|---|---|---|
| 0 | 2.479363 | 0.002521 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | -0.693117 |
| 1 | 2.588212 | 0.002671 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | -0.493655 |
| 2 | 2.749654 | 0.003345 | GridSearchCV(estimator=Pipeline(steps=[('polyn... | -9.096473 |


We see that the `estimator` column contains the estimator fitted on each split. Thus, we can check the `best_params_` stored by each `GridSearchCV`.

In [116]:
for estimator_cv_fold in cv_results["estimator"]:
    print(estimator_cv_fold.best_params_)

{'ridge__alpha': 22.229964825261934}
{'ridge__alpha': 0.20235896477251566}
{'ridge__alpha': 1.0985411419875584}


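Such inspection allows us to study the stability of the selected hyperparameter values. Here, the selected `alpha` ranges from about 0.2 to about 22 across the three splits, a spread of roughly two orders of magnitude.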
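To make this inspection systematic, the selected values can be gathered in a `DataFrame` and their spread measured on a log scale. The sketch below assumes synthetic data and a plain `Ridge` in place of the pipeline; `selected` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, cross_validate

X_demo, y_demo = make_regression(n_samples=150, n_features=5, noise=20.0, random_state=0)
search = GridSearchCV(Ridge(), param_grid={"alpha": np.logspace(-2, 2, num=20)})

cv_results = cross_validate(search, X_demo, y_demo, cv=3, return_estimator=True)

# One row per outer split with the alpha selected by the inner search
selected = pd.DataFrame([est.best_params_ for est in cv_results["estimator"]])
selected["log10_alpha"] = np.log10(selected["alpha"])
spread = selected["log10_alpha"].max() - selected["log10_alpha"].min()
print(selected)
print("spread (orders of magnitude):", spread)
```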