# 3.7 Hyperparameter Tuning

Hyperparameters are model settings that are fixed before training, such as ``C`` and ``gamma`` for a support vector machine. For a fixed type of algorithm, the values that work best for a specific problem can be found automatically by searching the hyperparameter space.

scikit-learn provides built-in tools to perform this search.

In [1]:
# basic tools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import wget
import h5py
import pandas as pd
import os

In [2]:
from sklearn.datasets import load_digits
digits = load_digits()
digits.keys()

dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])

In [3]:
# explore data type
# work on copies so that the raw data stored in the digits dictionary is not modified
data, y = digits["data"].copy(), digits["target"].copy()
print(type(data[0]), type(y[0]))

<class 'numpy.ndarray'> <class 'numpy.int64'>


In [4]:
print(min(data[0]), max(data[0]))
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
scaler = MinMaxScaler()
scaler.fit(data)  # learn the per-feature minimum and maximum
newdata = scaler.transform(data)  # rescale each feature to the range [0, 1]

# Split data into 80% train and 20% test subsets.
# Note: the split uses the unscaled `data`; `newdata` is not used downstream.
print(f"There are {data.shape[0]} data samples")
X_train, X_test, y_train, y_test = train_test_split(
    data, y, test_size=0.2, shuffle=False)


0.0 15.0
There are 1797 data samples


In [5]:
import sklearn
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Support Vector Machine classifier
clf = SVC(gamma=0.001) # model design
clf.fit(X_train, y_train) # learn
svc_prediction = clf.predict(X_test) # predict on test
print("SVC Accuracy:", metrics.accuracy_score(y_true=y_test, y_pred=svc_prediction))


SVC Accuracy: 0.9583333333333334


Which parameters can we optimize? ``get_params()`` lists every parameter of the estimator together with its current value.

In [6]:
clf.get_params()

{'C': 1.0,
 'break_ties': False,
 'cache_size': 200,
 'class_weight': None,
 'coef0': 0.0,
 'decision_function_shape': 'ovr',
 'degree': 3,
 'gamma': 0.001,
 'kernel': 'rbf',
 'max_iter': -1,
 'probability': False,
 'random_state': None,
 'shrinking': True,
 'tol': 0.001,
 'verbose': False}

A search consists of:

* an estimator (regressor or classifier such as ``SVC()``);

* a parameter space;

* a method for searching or sampling candidates (grid search or random selection);

* a cross-validation scheme; and

* a loss function or a scoring metric.
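The five components listed above map directly onto the arguments of a search object. The following is a minimal sketch, not the configuration used in this section: the small parameter space, the 3-fold ``StratifiedKFold`` scheme, and the names ``param_space`` and ``demo_search`` are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

estimator = SVC()                                      # 1. the estimator
param_space = {"C": [1, 10], "gamma": [1e-3, 1e-4]}    # 2. the parameter space (illustrative values)
cv_scheme = StratifiedKFold(n_splits=3)                # 4. the cross-validation scheme
scoring = "accuracy"                                   # 5. the scoring metric

# 3. the search method: GridSearchCV tries every combination in param_space
demo_search = GridSearchCV(estimator, param_space, cv=cv_scheme, scoring=scoring)
demo_search.fit(X, y)

print(demo_search.best_params_, round(demo_search.best_score_, 3))
```

Swapping ``GridSearchCV`` for another search class changes only the third component; the other four stay the same.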


## 1. Grid Search Cross Validation
Grid search evaluates every combination of values in a user-defined parameter grid by brute force, scoring each candidate with cross-validation. The scikit-learn class is ``GridSearchCV``. More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).



In [7]:
from sklearn.model_selection import GridSearchCV
param_grid = [
    {'C': [1, 5, 10, 50, 100, 500, 1000], 'kernel': ['linear']},
    {'C': [1, 5, 10, 50, 100, 500, 1000], 'gamma': [0.01, 0.001, 0.0001], 'kernel': ['rbf']},
]

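The search evaluates every combination of the listed values within each dictionary of the grid. The searched parameters can belong to the model itself or, when the estimator is a pipeline, to earlier steps such as feature selection.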
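To see how large a grid is before fitting anything, the candidates can be enumerated with ``ParameterGrid``. This sketch repeats the grid defined above so that it runs on its own; the variable names ``candidates``, ``n_candidates``, and ``n_fits`` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {'C': [1, 5, 10, 50, 100, 500, 1000], 'kernel': ['linear']},
    {'C': [1, 5, 10, 50, 100, 500, 1000], 'gamma': [0.01, 0.001, 0.0001], 'kernel': ['rbf']},
]

# Each dictionary is expanded separately: 7 linear candidates + 7 * 3 rbf candidates
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# With 5-fold cross-validation, every candidate is fitted 5 times
n_fits = n_candidates * 5
print(n_candidates, n_fits)  # 28 140
```

The cost grows multiplicatively with each additional parameter, which is the main limitation of an exhaustive grid.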

In [8]:
search = GridSearchCV(clf, param_grid, cv=5,verbose=3)

In [9]:
search.fit(X_train, y_train) # learn

Fitting 5 folds for each of 28 candidates, totalling 140 fits
[CV] C=1, kernel=linear ..............................................
[CV] .................. C=1, kernel=linear, score=0.948, total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV] .................. C=1, kernel=linear, score=0.948, total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV] .................. C=1, kernel=linear, score=0.944, total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV] .................. C=1, kernel=linear, score=0.965, total=   0.0s
[CV] C=1, kernel=linear ..............................................
[CV] .................. C=1, kernel=linear, score=0.972, total=   0.0s
[CV] C=5, kernel=linear ..............................................
[CV] .................. C=5, kernel=linear, score=0.948, total=   0.0s
[CV] C=5, kernel=linear ..............................................
[CV] ..........

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s


[CV] .................. C=5, kernel=linear, score=0.965, total=   0.0s
[CV] C=5, kernel=linear ..............................................
[CV] .................. C=5, kernel=linear, score=0.972, total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV] ................. C=10, kernel=linear, score=0.948, total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV] ................. C=10, kernel=linear, score=0.948, total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV] ................. C=10, kernel=linear, score=0.944, total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV] ................. C=10, kernel=linear, score=0.965, total=   0.0s
[CV] C=10, kernel=linear .............................................
[CV] ................. C=10, kernel=linear, score=0.972, total=   0.0s
[CV] C=50, kernel=linear .............................................
[CV] .

[Parallel(n_jobs=1)]: Done 140 out of 140 | elapsed:   11.9s finished


GridSearchCV(cv=5, estimator=SVC(gamma=0.001),
             param_grid=[{'C': [1, 5, 10, 50, 100, 500, 1000],
                          'kernel': ['linear']},
                         {'C': [1, 5, 10, 50, 100, 500, 1000],
                          'gamma': [0.01, 0.001, 0.0001], 'kernel': ['rbf']}],
             verbose=3)

In [10]:
search.get_params()

{'cv': 5,
 'error_score': nan,
 'estimator__C': 1.0,
 'estimator__break_ties': False,
 'estimator__cache_size': 200,
 'estimator__class_weight': None,
 'estimator__coef0': 0.0,
 'estimator__decision_function_shape': 'ovr',
 'estimator__degree': 3,
 'estimator__gamma': 0.001,
 'estimator__kernel': 'rbf',
 'estimator__max_iter': -1,
 'estimator__probability': False,
 'estimator__random_state': None,
 'estimator__shrinking': True,
 'estimator__tol': 0.001,
 'estimator__verbose': False,
 'estimator': SVC(gamma=0.001),
 'iid': 'deprecated',
 'n_jobs': None,
 'param_grid': [{'C': [1, 5, 10, 50, 100, 500, 1000], 'kernel': ['linear']},
  {'C': [1, 5, 10, 50, 100, 500, 1000],
   'gamma': [0.01, 0.001, 0.0001],
   'kernel': ['rbf']}],
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'return_train_score': False,
 'scoring': None,
 'verbose': 3}

In [11]:
search.cv_results_

{'mean_fit_time': array([0.0187614 , 0.01823897, 0.01769457, 0.01726675, 0.01706581,
        0.01723228, 0.01746058, 0.17803249, 0.05716662, 0.04809985,
        0.18843017, 0.05442019, 0.02756419, 0.18685474, 0.05366359,
        0.02612495, 0.19918694, 0.06084275, 0.02507   , 0.19251866,
        0.05511222, 0.02502255, 0.17860823, 0.05154009, 0.02380347,
        0.1841126 , 0.05535727, 0.02449503]),
 'std_fit_time': array([0.0004974 , 0.00063497, 0.00075296, 0.00032911, 0.000368  ,
        0.00022739, 0.00025221, 0.00081537, 0.00235309, 0.00160452,
        0.00084407, 0.00057124, 0.00060262, 0.00078404, 0.00097039,
        0.00114053, 0.02107299, 0.00676385, 0.00082129, 0.00579743,
        0.00096243, 0.0005219 , 0.00275743, 0.00036523, 0.00138058,
        0.00266993, 0.00202393, 0.00078219]),
 'mean_score_time': array([0.00597839, 0.00570369, 0.00550599, 0.00540662, 0.00537195,
        0.00541439, 0.00548282, 0.02175317, 0.01266155, 0.0147789 ,
        0.02326522, 0.01185122, 0.009369

In [12]:
search.best_params_

{'C': 5, 'gamma': 0.001, 'kernel': 'rbf'}
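The raw ``cv_results_`` dictionary is easier to read as a table, and the refitted best model can be scored on the held-out test set. The sketch below is self-contained and uses a reduced grid (an assumption made to keep the run short), so its numbers will differ from the outputs shown above; ``small_grid``, ``small_search``, and ``results`` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

# Reduced grid for illustration only
small_grid = {"C": [1, 10], "gamma": [1e-3, 1e-4], "kernel": ["rbf"]}
small_search = GridSearchCV(SVC(), small_grid, cv=5).fit(X_tr, y_tr)

# One row per candidate, sorted so that the best candidate comes first
results = pd.DataFrame(small_search.cv_results_)
cols = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
print(results[cols].sort_values("rank_test_score"))

# With refit=True (the default), best_estimator_ is retrained on the full training set
test_accuracy = small_search.best_estimator_.score(X_te, y_te)
print("Test accuracy of the best candidate:", test_accuracy)
```

The cross-validated score in ``mean_test_score`` is computed on the training folds only, so the test-set accuracy remains an independent check.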

## 2. Random Search Cross Validation
Random search does not try every combination. Instead, it samples a fixed number of candidates, ``n_iter``, from user-defined distributions (or lists) over the parameters and scores each candidate with cross-validation. The scikit-learn class is ``RandomizedSearchCV``. More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html).

The advantage is that it can explore a wide hyperparameter space while the cost stays bounded by the number of sampled candidates.
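To get a feel for what the random search draws, candidates can be sampled directly with ``ParameterSampler``, without fitting any model. This is a sketch under assumed distributions: ``C`` uniform on [1, 1001] and ``gamma`` log-uniform on [1e-4, 1e-2]; the name ``samples`` is illustrative.

```python
from scipy.stats import loguniform, uniform
from sklearn.model_selection import ParameterSampler

# uniform(loc, scale) covers the interval [loc, loc + scale]
distributions = [
    {"C": uniform(loc=1, scale=1000), "kernel": ["linear"]},
    {"C": uniform(loc=1, scale=1000), "gamma": loguniform(1e-4, 1e-2), "kernel": ["rbf"]},
]

# Draw 5 candidates; random_state makes the draw reproducible
samples = list(ParameterSampler(distributions, n_iter=5, random_state=0))
for candidate in samples:
    print(candidate)
```

A log-uniform distribution is a common choice for scale parameters such as ``gamma``, because it gives each order of magnitude the same probability.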

In [13]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, loguniform

# C is drawn uniformly from [1, 1001]; gamma is drawn log-uniformly from [1e-4, 1e-2]
distributions = [
    {'C': uniform(loc=1, scale=1000), 'kernel': ['linear']},
    {'C': uniform(loc=1, scale=1000), 'gamma': loguniform(1e-4, 1e-2), 'kernel': ['rbf']},
]
# n_iter=100 candidates are sampled and each is scored with 5-fold cross-validation
clf2 = RandomizedSearchCV(clf, distributions, random_state=0, cv=5, n_iter=100)
clf2.fit(X_train, y_train)

RandomizedSearchCV(cv=5, estimator=SVC(gamma=0.001), n_iter=100,
                   param_distributions=[{'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x14cde2cd0>,
                                         'kernel': ['linear']},
                                        {'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x14cde05b0>,
                                         'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x14cde2460>,
                                         'kernel': ['rbf']}],
                   random_state=0)

In [14]:
print(search.best_params_)
print(clf2.best_params_)

{'C': 5, 'gamma': 0.001, 'kernel': 'rbf'}
{'C': 20.987665408758737, 'gamma': 0.0007645780792982153, 'kernel': 'rbf'}
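The best parameters alone do not say which search found the better model; comparing the cross-validated score and the test accuracy does. The following sketch uses small search budgets and an RBF-only space (assumptions made to keep the run short), so its numbers are not those of the searches above; ``grid``, ``rand``, and ``summary`` are illustrative names.

```python
from scipy.stats import loguniform, uniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

# Exhaustive search over 6 fixed candidates
grid = GridSearchCV(
    SVC(kernel="rbf"), {"C": [1, 10, 100], "gamma": [1e-3, 1e-4]}, cv=3
)
# Random search over 6 candidates sampled from continuous distributions
rand = RandomizedSearchCV(
    SVC(kernel="rbf"),
    {"C": uniform(loc=1, scale=100), "gamma": loguniform(1e-4, 1e-2)},
    n_iter=6, cv=3, random_state=0,
)

summary = {}
for name, searcher in [("grid", grid), ("random", rand)]:
    searcher.fit(X_tr, y_tr)
    # best_score_ is the mean cross-validated score; score() uses the refitted best model
    summary[name] = (searcher.best_score_, searcher.score(X_te, y_te))
    print(name, searcher.best_params_, summary[name])
```

Both searches fit the same number of models here, so the comparison reflects how the candidates were chosen rather than how many were tried.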
