# Hyperparameter Tuning

<!--<badge>--><a href="https://colab.research.google.com/github/mthd98/Machine-Learning-from-Zero-to-Hero-Bootcamp-v1/blob/main/Week 04 - Advance Machine Learning/3- Grid Search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a><!--</badge>-->

Hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.

The same kind of machine learning model can require different constraints, weights or learning rates to generalize across different data patterns. These settings are called hyperparameters, and they have to be tuned so that the model can solve the machine learning problem as well as possible. Hyperparameter optimization finds a tuple of hyperparameters that yields a model which minimizes a predefined loss function on held-out data. Cross-validation is often used to estimate this generalization performance.

Grid search is a common technique for finding the best combination of hyperparameters. It involves defining the candidate values for each hyperparameter and then testing every possible combination, which is why it is also called exhaustive grid search. Scikit-learn provides this functionality through the `GridSearchCV` class.
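
To make "testing every possible combination" concrete, the sketch below expands a small grid by hand with `itertools.product`. The grid values and the `expand_grid` helper are illustrative assumptions and are not part of scikit-learn; they only mirror the idea behind the search.

```python
from itertools import product

# Hypothetical grid: each key is a hyperparameter, each list its candidate values.
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}


def expand_grid(grid):
    """Return every combination of candidate values as a list of dicts."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


candidates = expand_grid(grid)
for candidate in candidates:
    print(candidate)

# The number of candidates is the product of the list lengths: 3 * 2 = 6.
print(len(candidates))
```

Because the count is a product, every extra hyperparameter multiplies the number of models that have to be trained.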

## Import Libraries & Load Data

In [1]:
from sklearn import svm, metrics, datasets, model_selection
import pandas as pd

In [2]:
# Features and binary labels; in this dataset 0 = malignant and 1 = benign.
x, y = datasets.load_breast_cancer(return_X_y=True)

## Define & Fit GridSearchCV

First, we define the baseline model. We don't need to set any hyperparameters, since these will be set by `GridSearchCV`.

In [3]:
svc = svm.SVC()

Second, we define the candidate values for the hyperparameters we intend to tune.

For this example, we will tune the `kernel`, `C`, and `gamma` parameters. The candidate values are defined as a dictionary with parameter names (strings) as keys and lists of settings to try as values.

In [4]:
params = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 'auto', 0.1, 1]}
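
As a rough check on the size of this search, scikit-learn's `ParameterGrid` can enumerate a grid without fitting anything. The sketch below also shows a possible alternative layout, a list of dictionaries, which avoids pairing `gamma` with the linear kernel (where `SVC` does not use it). The name `compact_params` is an assumption for illustration; the notebook itself uses the single-dictionary `params`.

```python
from sklearn.model_selection import ParameterGrid

# Grid as defined in the notebook: every kernel is paired with every gamma.
params = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 'auto', 0.1, 1]}
print(len(ParameterGrid(params)))  # 2 * 3 * 4 = 24 candidates

# Hypothetical alternative: one sub-grid per kernel, so gamma is only
# varied for the RBF kernel, the only kernel in this grid that uses it.
compact_params = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 'auto', 0.1, 1]},
]
print(len(ParameterGrid(compact_params)))  # 3 + 12 = 15 candidates
```

Either form can be passed as the parameter grid of `GridSearchCV`; the list form simply skips combinations that would train the same linear model several times.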

By default, `GridSearchCV` uses the model's built-in `score` method to measure performance (for `SVC`, this is accuracy). Since we're working on a binary classification problem, we also want metrics other than accuracy, such as precision and recall.

To override that behavior, we define the metrics we want as a dictionary whose keys are metric names (strings) and whose values are scorer objects. We can create a scorer object by wrapping a metric function with `metrics.make_scorer()`.

In [5]:
scoring = {
    'accuracy': metrics.make_scorer(metrics.accuracy_score),
    'precision': metrics.make_scorer(metrics.precision_score),
    'recall': metrics.make_scorer(metrics.recall_score),
}
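
One detail worth checking is which class these metrics treat as positive. `precision_score` and `recall_score` default to `pos_label=1`, and in this dataset label 1 is `benign`. The sketch below shows that extra keyword arguments passed to `make_scorer` are forwarded to the metric, which gives one way to score the malignant class instead. The names `recall_malignant` and `scoring_by_class` are assumptions for illustration and are not used elsewhere in the notebook.

```python
from sklearn import datasets, metrics

data = datasets.load_breast_cancer()
# Label 0 is 'malignant' and label 1 is 'benign' in this dataset.
label_names = {int(i): str(name) for i, name in enumerate(data.target_names)}
print(label_names)

# Default behaviour: the positive class is label 1 (benign).
recall_benign = metrics.make_scorer(metrics.recall_score)
# Keyword arguments are forwarded to recall_score, so this scores label 0.
recall_malignant = metrics.make_scorer(metrics.recall_score, pos_label=0)

# Hypothetical scoring dictionary that reports recall for both classes.
scoring_by_class = {
    'recall_benign': recall_benign,
    'recall_malignant': recall_malignant,
}

# Direct metric calls on a toy prediction show the difference.
y_true = [0, 0, 1, 1]
y_pred = [1, 1, 1, 1]  # predicts "benign" for every sample
recall_for_label_1 = metrics.recall_score(y_true, y_pred)
recall_for_label_0 = metrics.recall_score(y_true, y_pred, pos_label=0)
print(recall_for_label_1, recall_for_label_0)
```

Which class counts as positive changes what "high recall" means, so it is worth deciding before choosing the metric that drives the search.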

Finally, we can define a custom cross-validation strategy to override `GridSearchCV`'s default 5-fold cross-validation (stratified K-fold for classifiers).

In [6]:
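# Five stratified random splits; with test_size unset, 10% of the data is held out per split.
kfold = model_selection.StratifiedShuffleSplit(n_splits=5, random_state=42)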
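
To see what this splitter produces, the sketch below iterates over its splits and prints their sizes. With `test_size` left unset, `StratifiedShuffleSplit` holds out 10% of the samples in each split and keeps the class proportions of each test set close to those of the full dataset. Variable names other than `kfold` are illustrative.

```python
import numpy as np
from sklearn import datasets, model_selection

x, y = datasets.load_breast_cancer(return_X_y=True)
kfold = model_selection.StratifiedShuffleSplit(n_splits=5, random_state=42)

split_sizes = []
for train_idx, test_idx in kfold.split(x, y):
    # Fraction of label-1 samples in the held-out part of this split.
    positive_rate = float(np.mean(y[test_idx]))
    split_sizes.append((len(train_idx), len(test_idx)))
    print(len(train_idx), len(test_idx), round(positive_rate, 3))
```

Unlike K-fold, the test sets of different splits are drawn independently, so the same sample may appear in more than one of them.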

Now we're all set to create the `GridSearchCV` object. We pass the custom parameters we created above to the `GridSearchCV` constructor.

Note that we also set `refit`, which names the metric used to pick the best hyperparameters when several metrics are passed. `verbose` is set to 2 so that the search logs each fit as it runs, which gives feedback since this process can take a few minutes.

In [7]:
gs = model_selection.GridSearchCV(svc, params, cv=kfold, scoring=scoring, refit="recall", verbose=2)
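
The effect of `refit` can be checked on a reduced grid that is quick to fit. The sketch below is a miniature of the search above under stated assumptions (RBF kernel only, `gamma='scale'`, three values of `C`, two metrics); it is not the notebook's actual run, and the names `small_params` and `small_gs` are hypothetical. It shows that `best_index_` points at the candidate ranked first on the metric named in `refit`, while the other metrics are still recorded.

```python
from sklearn import datasets, metrics, model_selection, svm

x, y = datasets.load_breast_cancer(return_X_y=True)

# Reduced, illustrative grid (not the grid used in the notebook).
small_params = {'kernel': ['rbf'], 'gamma': ['scale'], 'C': [0.1, 1, 10]}
scoring = {
    'accuracy': metrics.make_scorer(metrics.accuracy_score),
    'recall': metrics.make_scorer(metrics.recall_score),
}
cv = model_selection.StratifiedShuffleSplit(n_splits=5, random_state=42)

small_gs = model_selection.GridSearchCV(
    svm.SVC(), small_params, cv=cv, scoring=scoring, refit='recall'
)
small_gs.fit(x, y)

results = small_gs.cv_results_
# The selected candidate is the one ranked first on the refit metric.
print(results['rank_test_recall'][small_gs.best_index_])
# best_score_ is that candidate's mean test recall.
print(small_gs.best_score_ == results['mean_test_recall'][small_gs.best_index_])
# Accuracy is still available for every candidate.
print(results['mean_test_accuracy'])
```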

Now let's fit the `gs` object on the dataset. Note that we did not call `train_test_split`, since `GridSearchCV` splits the data itself during cross-validation.

In [8]:
gs.fit(x, y)

Fitting 5 folds for each of 24 candidates, totalling 120 fits
[CV] C=0.1, gamma=scale, kernel=linear ...............................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[CV] ................ C=0.1, gamma=scale, kernel=linear, total=   0.6s
[CV] C=0.1, gamma=scale, kernel=linear ...............................
[CV] ................ C=0.1, gamma=scale, kernel=linear, total=   0.2s
[CV] C=0.1, gamma=scale, kernel=linear ...............................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.6s remaining:    0.0s


[CV] ................ C=0.1, gamma=scale, kernel=linear, total=   0.1s
[CV] C=0.1, gamma=scale, kernel=linear ...............................
[CV] ................ C=0.1, gamma=scale, kernel=linear, total=   0.3s
[CV] C=0.1, gamma=scale, kernel=linear ...............................
[CV] ................ C=0.1, gamma=scale, kernel=linear, total=   0.1s
[CV] C=0.1, gamma=scale, kernel=rbf ..................................
[CV] ................... C=0.1, gamma=scale, kernel=rbf, total=   0.0s
[CV] C=0.1, gamma=scale, kernel=rbf ..................................
[CV] ................... C=0.1, gamma=scale, kernel=rbf, total=   0.0s
[CV] C=0.1, gamma=scale, kernel=rbf ..................................
[CV] ................... C=0.1, gamma=scale, kernel=rbf, total=   0.0s
[CV] C=0.1, gamma=scale, kernel=rbf ..................................
[CV] ................... C=0.1, gamma=scale, kernel=rbf, total=   0.0s
[CV] C=0.1, gamma=scale, kernel=rbf ..................................
[CV] .

[Parallel(n_jobs=1)]: Done 120 out of 120 | elapsed:  2.0min finished


GridSearchCV(cv=StratifiedShuffleSplit(n_splits=5, random_state=42, test_size=None,
            train_size=None),
             error_score=nan,
             estimator=SVC(C=1.0, break_ties=False, cache_size=200,
                           class_weight=None, coef0=0.0,
                           decision_function_shape='ovr', degree=3,
                           gamma='scale', kernel='rbf', max_iter=-1,
                           probability=False, random_state=None, shrinking=True,
                           tol=0.001, verbose=False),
             iid='deprecated', n_jobs=None,
             param_grid={'C': [0.1, 1, 10], 'gamma': ['scale', 'auto', 0.1, 1],
                         'kernel': ['linear', 'rbf']},
             pre_dispatch='2*n_jobs', refit='recall', return_train_score=False,
             scoring={'accuracy': make_scorer(accuracy_score),
                      'precision': make_scorer(precision_score),
                      'recall': make_scorer(recall_score)},
            

## View GridSearchCV Output

We can display the best set of hyperparameters that `GridSearchCV` found using the `best_params_` attribute, along with the mean cross-validated score achieved with them via `best_score_` (the score here is recall, as set by the `refit` parameter).

In [9]:
print("Best Hyperparameters:", gs.best_params_)
print("Best Score (Recall): ", gs.best_score_)

Best Hyperparameters: {'C': 0.1, 'gamma': 'auto', 'kernel': 'rbf'}
Best Score (Recall):  1.0
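
A recall of exactly 1.0 is worth a sanity check, because a classifier that labels every sample as positive reaches it trivially. The sketch below uses `DummyClassifier`, a stand-in that is not part of the original notebook, to score such a constant predictor on one split of the same cross-validation scheme. If a winning candidate shows the same pattern (recall 1.0 with accuracy equal to precision), it is likely behaving like this baseline rather than separating the classes.

```python
from sklearn import datasets, metrics, model_selection
from sklearn.dummy import DummyClassifier

x, y = datasets.load_breast_cancer(return_X_y=True)
cv = model_selection.StratifiedShuffleSplit(n_splits=5, random_state=42)
train_idx, test_idx = next(cv.split(x, y))

# Baseline that ignores the features and always predicts label 1.
always_positive = DummyClassifier(strategy='constant', constant=1)
always_positive.fit(x[train_idx], y[train_idx])
y_pred = always_positive.predict(x[test_idx])

baseline_scores = {
    'accuracy': metrics.accuracy_score(y[test_idx], y_pred),
    'precision': metrics.precision_score(y[test_idx], y_pred),
    'recall': metrics.recall_score(y[test_idx], y_pred),
}
print(baseline_scores)
```

Comparing each candidate against a baseline like this is one way to tell a genuinely good recall from a degenerate one.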


We can also get the best model it found, refit on the whole dataset, using the `best_estimator_` attribute.

In [10]:
best_svc = gs.best_estimator_
best_svc

SVC(C=0.1, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

We can also display the results of every experiment that `GridSearchCV` ran using the `cv_results_` attribute. This can help us "zoom in" on the range of hyperparameters that performed best, so that we can run a more fine-grained grid search.

We will convert `cv_results_` to a `DataFrame` for easier viewing.

In [11]:
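# One row per candidate, with per-split and mean scores for every metric.
df = pd.DataFrame(gs.cv_results_)
# Ascending sort by mean precision, so the weakest candidates come first.
df.sort_values(['mean_test_precision'])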

index,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_gamma,param_kernel,params,split0_test_accuracy,split1_test_accuracy,split2_test_accuracy,split3_test_accuracy,split4_test_accuracy,mean_test_accuracy,std_test_accuracy,rank_test_accuracy,split0_test_precision,split1_test_precision,split2_test_precision,split3_test_precision,split4_test_precision,mean_test_precision,std_test_precision,rank_test_precision,split0_test_recall,split1_test_recall,split2_test_recall,split3_test_recall,split4_test_recall,mean_test_recall,std_test_recall,rank_test_recall
11,0.021519,0.000341,0.003478,0.000121,1.0,auto,rbf,"{'C': 1, 'gamma': 'auto', 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
21,0.020412,0.001967,0.003257,5.5e-05,10.0,0.1,rbf,"{'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
19,0.021502,0.000162,0.003395,5.2e-05,10.0,auto,rbf,"{'C': 10, 'gamma': 'auto', 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
15,0.019649,0.000325,0.0033,9.1e-05,1.0,1,rbf,"{'C': 1, 'gamma': 1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
13,0.020771,0.000164,0.003386,8.1e-05,1.0,0.1,rbf,"{'C': 1, 'gamma': 0.1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
7,0.019227,0.000822,0.00326,8.5e-05,0.1,1,rbf,"{'C': 0.1, 'gamma': 1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
23,0.018996,6.4e-05,0.003242,4.1e-05,10.0,1,rbf,"{'C': 10, 'gamma': 1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
5,0.02009,0.000852,0.003308,9.4e-05,0.1,0.1,rbf,"{'C': 0.1, 'gamma': 0.1, 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
3,0.020399,0.00066,0.003765,0.000768,0.1,auto,rbf,"{'C': 0.1, 'gamma': 'auto', 'kernel': 'rbf'}",0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,0.631579,0.631579,0.631579,0.631579,0.631579,0.631579,0.0,16,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1
1,0.010098,0.001212,0.003066,0.000671,0.1,scale,rbf,"{'C': 0.1, 'gamma': 'scale', 'kernel': 'rbf'}",0.912281,0.894737,0.877193,0.894737,0.877193,0.891228,0.013129,15,0.897436,0.857143,0.837209,0.857143,0.853659,0.860518,0.01988,15,0.972222,1.0,1.0,1.0,0.972222,0.988889,0.013608,10
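
The full table is wide and hard to scan. The sketch below rebuilds three of the rows shown above by hand (values copied from the output) and keeps only the mean scores, then orders the candidates by recall and breaks ties by precision. On a fitted search, the same column selection could be applied to `pd.DataFrame(gs.cv_results_)`; the names `summary`, `ranked` and `by_precision` are illustrative.

```python
import pandas as pd

# Three rows copied from the output above (indices 3, 11 and 1).
summary = pd.DataFrame(
    {
        'param_C': [0.1, 1.0, 0.1],
        'param_gamma': ['auto', 'auto', 'scale'],
        'param_kernel': ['rbf', 'rbf', 'rbf'],
        'mean_test_accuracy': [0.631579, 0.631579, 0.891228],
        'mean_test_precision': [0.631579, 0.631579, 0.860518],
        'mean_test_recall': [1.0, 1.0, 0.988889],
    },
    index=[3, 11, 1],
)

# Highest recall first; ties on recall are broken by higher precision.
ranked = summary.sort_values(
    ['mean_test_recall', 'mean_test_precision'], ascending=[False, False]
)
print(ranked)

# Ordering by precision alone puts a different candidate on top.
by_precision = summary.sort_values('mean_test_precision', ascending=False)
print(by_precision.index[0])
```

In these rows, the candidates with the top recall are also the ones with the lowest accuracy and precision, which is why it helps to read several metrics side by side before narrowing the grid.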
