# **Grid Search Algorithm**

The grid search algorithm for hyperparameter tuning works by training a model on every combination of values drawn from predetermined lists of hyperparameter values. It evaluates the model for each combination and then keeps the one that makes the model perform best.


Suppose we have two hyperparameters to tune, with 6 candidate values for the first and 5 for the second. We would then be searching a grid of thirty combinations, as shown below. Grid search fits the model and evaluates its performance at each of the points in this grid.

![image](images/grid_search.png)
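To make the size of such a grid concrete, here is a minimal sketch that enumerates every combination with `itertools.product`. The candidate values and the variable names `first_values` and `second_values` are made up for illustration and do not come from the figure.

```python
from itertools import product

# Hypothetical candidate values: 6 for the first hyperparameter, 5 for the second
first_values = [0.001, 0.01, 0.1, 1, 10, 100]
second_values = [1, 2, 3, 4, 5]

# Every (first, second) pair is one point on the grid
grid = list(product(first_values, second_values))

print(len(grid))  # 30
print(grid[:3])   # the first few grid points
```

Grid search would fit and score the model once for each of these 30 pairs, which is why the cost grows quickly as more hyperparameters or values are added.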

## **sklearn.model_selection.GridSearchCV**

The two most important parameters of `GridSearchCV` are `estimator`, the model we are tuning, and `param_grid`, a dictionary that maps hyperparameter names to lists of candidate values (stored in the variable `parameters` in the code below). To tune the hyperparameters, we call `fit()`, just as we would for a regular machine learning model.

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the data set
cancer = load_breast_cancer()

# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target)

lr = LogisticRegression(solver="liblinear", max_iter=1000)

lr.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 1000,
 'multi_class': 'deprecated',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'liblinear',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

In [None]:
parameters = {"penalty": ["l1", "l2"], "C": [1, 10, 100]}
clf = GridSearchCV(lr, parameters)
clf.get_params()

{'cv': None,
 'error_score': nan,
 'estimator__C': 1.0,
 'estimator__class_weight': None,
 'estimator__dual': False,
 'estimator__fit_intercept': True,
 'estimator__intercept_scaling': 1,
 'estimator__l1_ratio': None,
 'estimator__max_iter': 1000,
 'estimator__multi_class': 'deprecated',
 'estimator__n_jobs': None,
 'estimator__penalty': 'l2',
 'estimator__random_state': None,
 'estimator__solver': 'liblinear',
 'estimator__tol': 0.0001,
 'estimator__verbose': 0,
 'estimator__warm_start': False,
 'estimator': LogisticRegression(max_iter=1000, solver='liblinear'),
 'n_jobs': None,
 'param_grid': {'penalty': ['l1', 'l2'], 'C': [1, 10, 100]},
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'return_train_score': False,
 'scoring': None,
 'verbose': 0}

## **Cross-validation**

The “CV” in `GridSearchCV` stands for cross-validation. It’s best practice in machine learning to go beyond the usual train-test split and also evaluate on a holdout or validation dataset. Specifically, `GridSearchCV` uses a technique known as k-fold cross-validation, which works as follows.

GridSearchCV subdivides the training data further into a smaller training set and a validation set. It fits the model on this new training set and evaluates it on the validation set. To make sure that good performance is not an accident of one particular split, GridSearchCV repeats this process on different cross-validation splits, so that every point in the training data is used for validation exactly once. The number of splits is the “k” in “k-fold”. For instance, in 10-fold cross-validation the data is divided into 10 folds; each fold serves once as the 10% validation set while the model is trained on the remaining 90%, and GridSearchCV evaluates the model on each fold.
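The splitting idea can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: the helper `k_fold_indices` is hypothetical, it uses contiguous folds without shuffling, and it ignores the class stratification that scikit-learn applies to classifiers.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 20 samples and 10 folds: each split trains on 18 points and validates on 2
folds = list(k_fold_indices(n_samples=20, k=10))
for train, test in folds:
    print(len(train), len(test), test)
```

Across the 10 splits, each sample index appears in exactly one validation fold, which is the property that lets the averaged score reflect the whole training set.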

In scikit-learn, the `cv` argument of `GridSearchCV` sets the number of cross-validation splits. The default, `cv=None`, uses 5 folds.
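As a small sketch of changing the number of folds, the example below passes an integer to `cv`. The value 3, the `random_state=0` seed, and the name `clf_3fold` are arbitrary choices for illustration; the model and grid mirror the ones used earlier in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
# random_state is fixed here only to make the sketch repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression(solver="liblinear", max_iter=1000)
parameters = {"penalty": ["l1", "l2"], "C": [1, 10, 100]}

# Ask for 3 folds instead of the default 5
clf_3fold = GridSearchCV(lr, parameters, cv=3)
clf_3fold.fit(X_train, y_train)

# n_splits_ reports how many cross-validation splits were actually used
print(clf_3fold.n_splits_)  # 3
```

Fewer folds mean fewer model fits (here 6 combinations times 3 folds, plus one final refit), at the cost of a noisier estimate of each combination's score.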

After fitting a `GridSearchCV` object, we can inspect the results through the following attributes of `clf`:

- `.best_estimator_` gives us the best estimator
- `.best_score_` gives us the mean cross-validated score corresponding to the best estimator
- `.best_params_` gives us the set of hyperparameters that correspond to the best estimator

Additionally, the `.cv_results_` attribute gives us the scores for each hyperparameter combination in the grid.

In [None]:
clf.fit(X_train, y_train)
best_model = clf.best_estimator_
print(best_model)
print(clf.best_params_)

LogisticRegression(C=1, max_iter=1000, penalty='l1', solver='liblinear')
{'C': 1, 'penalty': 'l1'}


In [None]:
best_score = clf.best_score_
test_score = clf.score(X_test, y_test)
print(best_score)
print(test_score)

0.9530779753761971
0.9790209790209791


In [None]:
import pandas as pd

hyperparameter_grid = pd.DataFrame(clf.cv_results_["params"])
grid_scores = pd.DataFrame(clf.cv_results_["mean_test_score"], columns=["score"])

df = pd.concat([hyperparameter_grid, grid_scores], axis=1)
print(df)

     C penalty     score
0    1      l1  0.953078
1    1      l2  0.943748
2   10      l1  0.948427
3   10      l2  0.946074
4  100      l1  0.950807
5  100      l2  0.948427


# **Extra Reading**

https://medium.com/data-science/its-a-mistake-to-trust-the-best-model-of-a-gridsearchcv-536a73e835ad

https://towardsdatascience.com/a-highly-anticipated-time-series-cross-validator-is-finally-here-7dc99f672736/?source=post_page-----536a73e835ad---------------------------------------

https://medium.com/@abhishekjainindore24/optuna-vs-gridsearchcv-vs-randomsearchcv-hyperparameter-tuning-techniques-ea8e2ada28d0
