### Grid Search Parameter Tuning
Grid search is an approach to parameter tuning that will methodically build and evaluate a model for each combination of algorithm parameters specified in a grid.

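The recipe below evaluates combinations of n_neighbors, p and leaf_size for a k-nearest neighbors classifier (KNeighborsClassifier) on the standard iris dataset. Each combination is scored by 5-fold cross-validated macro-averaged F1. This is a three-dimensional grid search with 6 × 3 × 2 = 36 candidate settings.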

Grid Search can be thought of as an exhaustive search for selecting a model. The data scientist sets up a grid of hyperparameter values. For each combination, a model is trained and then scored, typically by cross-validation on held-out folds rather than on a single test set. Because every combination is tried, the cost grows multiplicatively with the number of parameters, which can be very inefficient. For example, searching 20 values for each of 4 parameters requires 20 × 20 × 20 × 20 = 160,000 candidate settings. With 10-fold cross-validation, that equates to 1,600,000 model fits, each followed by predictions on its held-out fold. Scikit Learn's GridSearchCV class simplifies the process, but a grid of that size would be extremely costly in both computing power and time.
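The counting argument above can be checked with scikit-learn's ParameterGrid, which enumerates every combination of a grid the way GridSearchCV does. This is a minimal sketch. The parameter names in the second grid (param_0 to param_3) are hypothetical placeholders, chosen only to reproduce the 20-values-by-4-parameters example.

```python
from sklearn.model_selection import ParameterGrid

# The grid used for the KNN example in this notebook: 6 x 3 x 2 values
knn_grid = {'n_neighbors': [1, 3, 5, 7, 9, 11], 'p': [1, 2, 3], 'leaf_size': [30, 35]}
n_candidates = len(ParameterGrid(knn_grid))
print(n_candidates)          # 36 candidate settings
print(n_candidates * 5)      # 180 fits with cv=5 (plus one final refit when refit=True)

# The example from the text: 20 values for each of 4 parameters, 10-fold CV
# (param_0 to param_3 are hypothetical names used only for counting)
big_grid = {f'param_{i}': list(range(20)) for i in range(4)}
n_big = len(ParameterGrid(big_grid))
print(n_big)                 # 160000 candidate settings
print(n_big * 10)            # 1600000 model fits
```

The grid size is the product of the number of values per parameter. Adding a fifth parameter with 20 values would multiply the cost by another factor of 20.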

In [3]:
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

In [4]:
iris = datasets.load_iris()

In [13]:
parameters = {'n_neighbors': (1, 3, 5, 7, 9, 11),
              'p': [1, 2, 3],
              'leaf_size': [30, 35]}

In [14]:
KNN = KNeighborsClassifier()

In [26]:
clf = GridSearchCV(KNN, parameters, cv=5, scoring='f1_macro')
clf.fit(iris.data, iris.target)

GridSearchCV(cv=5, error_score=nan,
             estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                            metric='minkowski',
                                            metric_params=None, n_jobs=None,
                                            n_neighbors=5, p=2,
                                            weights='uniform'),
             iid='deprecated', n_jobs=None,
             param_grid={'leaf_size': [30, 35],
                         'n_neighbors': (1, 3, 5, 7, 9, 11), 'p': [1, 2, 3]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='f1_macro', verbose=0)

In [16]:
clf.best_estimator_

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=9, p=3,
                     weights='uniform')

In [40]:
clf.best_score_

0.9866332497911445

In [42]:
clf.

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [18]:
clf.cv_results_

{'mean_fit_time': array([0.00083055, 0.00040054, 0.        , 0.0018157 , 0.00313916,
        0.00314097, 0.        , 0.        , 0.0001996 , 0.        ,
        0.00315452, 0.00020018, 0.        , 0.00312438, 0.        ,
        0.        , 0.00312419, 0.        , 0.00332384, 0.        ,
        0.        , 0.00649443, 0.00039835, 0.        , 0.00316944,
        0.00313811, 0.        , 0.        , 0.00312495, 0.        ,
        0.        , 0.        , 0.        , 0.00020652, 0.        ,
        0.        ]),
 'std_fit_time': array([0.00041684, 0.00049056, 0.        , 0.00173439, 0.00627832,
        0.00628195, 0.        , 0.        , 0.00039921, 0.        ,
        0.00630903, 0.00040035, 0.        , 0.00624876, 0.        ,
        0.        , 0.00624838, 0.        , 0.0061607 , 0.        ,
        0.        , 0.00795699, 0.00048788, 0.        , 0.00633888,
        0.00627623, 0.        , 0.        , 0.0062499 , 0.        ,
        0.        , 0.        , 0.        , 0.00041304, 0.   

In [17]:
sorted(clf.cv_results_.keys())

['mean_fit_time',
 'mean_score_time',
 'mean_test_score',
 'param_leaf_size',
 'param_n_neighbors',
 'param_p',
 'params',
 'rank_test_score',
 'split0_test_score',
 'split1_test_score',
 'split2_test_score',
 'split3_test_score',
 'split4_test_score',
 'std_fit_time',
 'std_score_time',
 'std_test_score']
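Every key in cv_results_ holds one entry per candidate, so the dictionary converts directly into a pandas DataFrame. That is easier to read than the raw arrays printed above. This is a minimal sketch and assumes pandas is installed. It reruns the same search so that the block is self-contained.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
parameters = {'n_neighbors': [1, 3, 5, 7, 9, 11], 'p': [1, 2, 3], 'leaf_size': [30, 35]}
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=5, scoring='f1_macro')
clf.fit(iris.data, iris.target)

# One row per parameter combination, one column per cv_results_ key
results = pd.DataFrame(clf.cv_results_)
cols = ['param_n_neighbors', 'param_p', 'param_leaf_size',
        'mean_test_score', 'std_test_score', 'rank_test_score']

# Best-ranked combinations first; ties share the same rank
print(results[cols].sort_values('rank_test_score').head())
```

The row at clf.best_index_ is the one whose mean_test_score is reported as clf.best_score_. The std_test_score column shows how much that score varied across the 5 folds.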

In [43]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Note: these are KNeighborsClassifier's default settings (n_neighbors=5, p=2),
# not the best_estimator_ reported by the grid search (n_neighbors=9, p=3)
knn_model = KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                                 weights='uniform')
knn_model.fit(iris.data, iris.target)

# The model predicts the same samples it was trained on, so the scores
# below measure training-set fit rather than performance on unseen data
predicted_xtest = list(knn_model.predict(iris.data))
actual_test = list(iris.target)

results = confusion_matrix(actual_test, predicted_xtest)

print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual_test, predicted_xtest))
print('Report : ')
print(classification_report(actual_test, predicted_xtest))

Confusion Matrix :
[[50  0  0]
 [ 0 47  3]
 [ 0  2 48]]
Accuracy Score : 0.9666666666666667
Report : 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.96      0.94      0.95        50
           2       0.94      0.96      0.95        50

    accuracy                           0.97       150
   macro avg       0.97      0.97      0.97       150
weighted avg       0.97      0.97      0.97       150



In [50]:
from sklearn.metrics import f1_score

In [51]:
f1_score(actual_test, predicted_xtest, average='macro')

0.9666633329999667
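The evaluation above predicts on the same 150 samples the model was trained on, so it measures training-set fit. The sketch below shows a held-out check instead. It assumes a stratified 70/30 split with random_state=0, both of which are arbitrary choices not taken from the original. The grid search runs only on the training portion, and the refit best estimator is scored once on the test portion.

```python
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()

# Hold out 30% of the samples, keeping the class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0)

parameters = {'n_neighbors': [1, 3, 5, 7, 9, 11], 'p': [1, 2, 3], 'leaf_size': [30, 35]}
clf = GridSearchCV(KNeighborsClassifier(), parameters, cv=5, scoring='f1_macro')
clf.fit(X_train, y_train)   # the search never sees X_test

# With refit=True (the default), clf.predict uses best_estimator_ refit on all of X_train
y_pred = clf.predict(X_test)
test_f1 = f1_score(y_test, y_pred, average='macro')
print(clf.best_params_, round(test_f1, 3))
```

The held-out score is a fairer estimate of how the tuned model would do on new data. On a dataset as small as iris, it can still change noticeably with a different split.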

### Random Search Parameter Tuning
Random search is an approach to parameter tuning that samples algorithm parameters from a random distribution (e.g. uniform) for a fixed number of iterations. A model is constructed and evaluated for each combination of parameters sampled.

The recipe below evaluates random alpha values between 0 and 1 for the Ridge Regression algorithm on the standard diabetes dataset.

RandomizedSearchCV may not find as accurate a result as GridSearchCV, but it often lands close to the best result in a fraction of the time GridSearchCV would take. Given the same resources, Randomized Search can even outperform Grid Search. With continuous parameters, each random draw tries a new value of every parameter, whereas a grid reuses the same few values many times. This can be visualized in the graphic below.
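RandomizedSearchCV draws its candidates with scikit-learn's ParameterSampler. The sketch below samples five alpha values from scipy's uniform distribution. With its default loc=0 and scale=1, that distribution covers [0, 1]. The values n_iter=5 and random_state=0 are illustrative choices.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Same distribution as the Ridge recipe: uniform on [0, 1]
param_distributions = {'alpha': uniform()}

# n_iter fixes the number of candidates, regardless of how wide the distribution is
samples = list(ParameterSampler(param_distributions, n_iter=5, random_state=0))
for candidate in samples:
    print(candidate)   # one dict per candidate; exact values depend on random_state
```

Fixing random_state makes the draws repeatable. Leaving it at None, as in the recipe, produces different candidates on every run.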

In [38]:
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
# load the diabetes dataset
dataset = datasets.load_diabetes()
# prepare a uniform distribution to sample for the alpha parameter
param_grid = {'alpha': sp_rand()}
# create and fit a ridge regression model, testing random alpha values
model = Ridge()
rsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)
rsearch.fit(dataset.data, dataset.target)
print(rsearch)


RandomizedSearchCV(cv=None, error_score=nan,
                   estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
                                   max_iter=None, normalize=False,
                                   random_state=None, solver='auto',
                                   tol=0.001),
                   iid='deprecated', n_iter=100, n_jobs=None,
                   param_distributions={'alpha': <scipy.stats._distn_infrastructure.rv_frozen object at 0x000002024E492C08>},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=False, scoring=None, verbose=0)


In [39]:
# summarize the results of the random parameter search
# best_score_ is the mean cross-validated R^2 (Ridge's default score) of the best alpha
print(rsearch.best_score_)
print(rsearch.best_estimator_.alpha)

0.4821549081098828
0.0026304647186116137
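One rough way to compare the two strategies on this dataset is to give them the same budget: 20 alpha values on an evenly spaced grid versus 20 random draws from uniform(0, 1). The budget, the grid's 0.05 starting point, cv=5 and random_state=0 are all illustrative assumptions. Which search scores higher depends on the data and the seed, so no outcome is implied here. Unlike the recipe above, setting random_state also makes the random search repeatable.

```python
import numpy as np
from scipy.stats import uniform
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

dataset = datasets.load_diabetes()
budget = 20   # same number of candidate alpha values for both searches

# Grid search: 20 evenly spaced alpha values between 0.05 and 1
grid = GridSearchCV(Ridge(), {'alpha': np.linspace(0.05, 1.0, budget)}, cv=5)
grid.fit(dataset.data, dataset.target)

# Random search: 20 alpha values drawn from uniform on [0, 1]
rand = RandomizedSearchCV(Ridge(), {'alpha': uniform()}, n_iter=budget,
                          cv=5, random_state=0)
rand.fit(dataset.data, dataset.target)

# Both best_score_ values are mean cross-validated R^2
print('grid  :', grid.best_params_['alpha'], grid.best_score_)
print('random:', rand.best_params_['alpha'], rand.best_score_)
```

Because both searches fit the same number of models, the comparison isolates where the candidates are placed rather than how many are tried.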


### Conclusions and key takeaways:
Model tuning is the process of finding the best machine learning model hyperparameters for a particular data set. Random and Grid Search are two uninformed methods for hyperparameter tuning, meaning that each candidate is chosen without using the results of earlier ones. Scikit Learn provides them through the GridSearchCV and RandomizedSearchCV classes.

With small data sets and ample computing resources, Grid Search will reliably find the best combination within the grid. With large data sets or many hyperparameters, however, computation slows down sharply and the search becomes very costly, because every added parameter multiplies the number of fits. In that case, Randomized Search is advisable: the data scientist sets the number of iterations (n_iter) explicitly and so controls the cost.

In [37]:
from scipy.stats import uniform as sp_rand
type(sp_rand())

scipy.stats._distn_infrastructure.rv_frozen
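The rv_frozen type shown above is a SciPy distribution whose parameters (here loc=0, scale=1) are fixed when it is created. Newer SciPy versions may report a slightly different class name. RandomizedSearchCV only requires that each distribution expose an rvs method, so any object with rvs(random_state=...) can be used. The LogUniformAlpha class below is a hypothetical helper written for illustration, and SciPy 1.4+ also ships scipy.stats.loguniform for this purpose. It samples alpha evenly across orders of magnitude.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler
from sklearn.utils import check_random_state

frozen = uniform()              # loc=0 and scale=1 are frozen into the object
print(hasattr(frozen, 'rvs'))   # True: the method RandomizedSearchCV relies on
print(frozen.rvs(size=3, random_state=0))  # three draws from [0, 1]


class LogUniformAlpha:
    """Hypothetical helper: returns 10**u with u drawn uniformly from [low, high)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def rvs(self, random_state=None):
        # ParameterSampler passes its own RandomState; None or an int also work here
        rng = check_random_state(random_state)
        return 10 ** rng.uniform(self.low, self.high)


# Spreads candidates evenly over 1e-4 to 1, whereas uniform(0, 1)
# puts about 90% of its draws above 0.1
samples = list(ParameterSampler({'alpha': LogUniformAlpha(-4, 0)}, n_iter=5, random_state=0))
for candidate in samples:
    print(candidate)
```

The same dictionary, {'alpha': LogUniformAlpha(-4, 0)}, could be passed as param_distributions to RandomizedSearchCV in place of sp_rand().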