# Randomized Grid Search

<span>Manual hyperparameter searching? No way. Scikit-learn has an amazing randomized search class, RandomizedSearchCV, that can point us toward good parameters: instantiate the class, set up a dictionary of candidate values for each parameter, and let it fly. The example below uses a K-Nearest Neighbours model. After the randomized search is done, you can pull the best parameters for your model, as well as take a look at the history of the parameter combinations that were tried.</span>
    
### Import Preliminaries

In [6]:
# Import modules
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier


# Import iris dataset 
iris = load_iris()
X, y = iris.data, iris.target

# Assign classifier
classifier = KNeighborsClassifier(n_neighbors=5, weights='uniform',
                                  metric='minkowski', p=2)

# Initiate a grid dictionary
grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

# Declare randomized search on model using our param grid
random_search = RandomizedSearchCV(estimator=classifier,
                                   param_distributions=grid,
                                   n_iter=10, scoring='accuracy',
                                   n_jobs=1, refit=True,
                                   cv=10,
                                   return_train_score=True)

# Fit the randomized search model with our data
random_search.fit(X,y)

RandomizedSearchCV(cv=10, error_score='raise',
          estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
          fit_params=None, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'weights': ['uniform', 'distance'], 'p': [1, 2]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score=True, scoring='accuracy', verbose=0)
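
The grid above lists every candidate value explicitly, but `param_distributions` also accepts objects with an `rvs` method, such as `scipy.stats` distributions. The sketch below is illustrative only: it assumes SciPy is installed and uses `ParameterSampler` (the helper that draws candidates for a randomized search) to show what a few draws look like. The `distributions` name and the seed are choices made for this example.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# randint(1, 11) draws integers from 1 to 10 (the upper bound is exclusive)
distributions = {'n_neighbors': randint(1, 11),
                 'weights': ['uniform', 'distance'],
                 'p': [1, 2]}

# Draw five candidate parameter settings; the seed only makes the draws repeatable
samples = list(ParameterSampler(distributions, n_iter=5, random_state=0))
for sample in samples:
    print(sample)
```

The same dictionary could be passed as `param_distributions` to `RandomizedSearchCV` in place of `grid`.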

### Randomized Grid Search Score

In [7]:
# Print the best parameters and their cross-validated accuracy score
print('Best parameters: %s'%random_search.best_params_)
print('CV Accuracy of best parameters: %.3f'%random_search.best_score_)

Best parameters: {'weights': 'distance', 'p': 2, 'n_neighbors': 10}
CV Accuracy of best parameters: 0.973
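
Because the search was declared with `refit=True`, the fitted search object also holds a model retrained on the full data with the winning parameters. Here is a minimal, self-contained sketch of pulling that model out. The `search` and `best_model` names and the `random_state=0` seed are assumptions for this example, not part of the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions=grid,
                            n_iter=10, scoring='accuracy', cv=10,
                            refit=True, random_state=0)
search.fit(X, y)

# best_estimator_ is a KNeighborsClassifier refitted on all of X and y
best_model = search.best_estimator_
print(best_model.get_params()['n_neighbors'], search.best_params_['n_neighbors'])

# Calling predict on the search object delegates to best_estimator_
sample = X[:5]
print(search.predict(sample))
print(best_model.predict(sample))
```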


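- This method is more computationally feasible than a full grid search: it evaluates only n_iter sampled combinations (10 here) rather than all 40 combinations in the grid
- The result can change each time the search is fitted, because the combinations are sampled at random and random_state is not set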
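
Both points can be checked with a short sketch. It counts the combinations a full grid search would try using `ParameterGrid`, then shows that fixing `random_state` makes the randomized search repeatable. The `run_search` helper and the seed value are hypothetical and exist only for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

# A full grid search would evaluate every combination: 10 * 2 * 2
n_combinations = len(ParameterGrid(grid))
print('Full grid: %d combinations, randomized search: 10' % n_combinations)


def run_search(seed):
    """Fit a randomized search with a fixed seed and return its best parameters."""
    search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions=grid,
                                n_iter=10, scoring='accuracy', cv=10,
                                random_state=seed)
    search.fit(X, y)
    return search.best_params_


# The same seed samples the same candidates, so the best parameters repeat
first = run_search(seed=0)
second = run_search(seed=0)
print(first == second)
```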

### Baseline Cross Validation Score

In [8]:
# Print our current accuracy score using our current parameters
print('Baseline with default parameters: %.3f' % np.mean(
    cross_val_score(classifier, X, y, cv=10, scoring='accuracy', n_jobs=1)))

Baseline with default parameters: 0.967


### Viewing Randomized Grid Score

In [9]:
# The grid_scores_ attribute is now deprecated in favor of cv_results_
# (scikit-learn 0.20 removes it), but I'll use it until it's completely gone
random_search.grid_scores_



[mean: 0.96000, std: 0.05333, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 2},
 mean: 0.97333, std: 0.03266, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 10},
 mean: 0.96000, std: 0.05333, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 1},
 mean: 0.96667, std: 0.04472, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 4},
 mean: 0.96667, std: 0.04472, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 8},
 mean: 0.96667, std: 0.04472, params: {'weights': 'uniform', 'p': 2, 'n_neighbors': 7},
 mean: 0.95333, std: 0.06700, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 4},
 mean: 0.95333, std: 0.05207, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 7},
 mean: 0.94000, std: 0.06289, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 2},
 mean: 0.97333, std: 0.03266, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 9}]
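
On scikit-learn versions where `grid_scores_` is no longer available, roughly the same listing can be rebuilt from `cv_results_`. This is a sketch under that assumption: the `search` and `results` names and the seed are choices for this example, and the sampled combinations will differ from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions=grid,
                            n_iter=10, scoring='accuracy', cv=10,
                            random_state=0)
search.fit(X, y)

results = search.cv_results_

# One line per sampled combination, mirroring the old grid_scores_ layout
for mean, std, params in zip(results['mean_test_score'],
                             results['std_test_score'],
                             results['params']):
    print('mean: %.5f, std: %.5f, params: %s' % (mean, std, params))
```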

In [11]:
# The new cv_results_ attribute returns our results as a dictionary of NumPy arrays
# Throw it in a dataframe to make some sense of it
json_df = pd.DataFrame(random_search.cv_results_).head(3)
json_df 

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_weights | param_p | param_n_neighbors | params | split0_test_score | split1_test_score | ... | split2_train_score | split3_train_score | split4_train_score | split5_train_score | split6_train_score | split7_train_score | split8_train_score | split9_train_score | mean_train_score | std_train_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000292 | 3.7e-05 | 0.000545 | 0.000114 | distance | 2 | 2 | {'weights': 'distance', 'p': 2, 'n_neighbors': 2} | 1.0 | 0.933333 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 1 | 0.000293 | 0.000102 | 0.000528 | 8.7e-05 | distance | 2 | 10 | {'weights': 'distance', 'p': 2, 'n_neighbors':... | 1.0 | 0.933333 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 2 | 0.000265 | 2.4e-05 | 0.000493 | 9.4e-05 | distance | 2 | 1 | {'weights': 'distance', 'p': 2, 'n_neighbors': 1} | 1.0 | 0.933333 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
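
The full dataframe is wide, so it can help to keep only the parameter and test-score columns and sort by rank. The sketch below shows one way to do that. The `search`, `columns`, and `top` names and the seed are assumptions made for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

search = RandomizedSearchCV(KNeighborsClassifier(), param_distributions=grid,
                            n_iter=10, scoring='accuracy', cv=10,
                            random_state=0)
search.fit(X, y)

# Keep the columns that describe each candidate and its test performance
columns = ['param_n_neighbors', 'param_weights', 'param_p',
           'mean_test_score', 'std_test_score', 'rank_test_score']
results_df = pd.DataFrame(search.cv_results_)

# rank_test_score is 1 for the best mean test score, so sort ascending
top = results_df[columns].sort_values('rank_test_score').head(3)
print(top)
```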


In [12]:
# Here is the raw dictionary output
random_search.cv_results_

{'mean_fit_time': array([ 0.00029211,  0.00029318,  0.00026453,  0.00026796,  0.00027027,
         0.00027721,  0.00029645,  0.00026639,  0.00025463,  0.00026634]),
 'std_fit_time': array([  3.71983109e-05,   1.02307524e-04,   2.38624127e-05,
          2.79288718e-05,   3.62967054e-05,   4.16681812e-05,
          5.36330741e-05,   2.61113629e-05,   1.27668641e-05,
          4.79880930e-05]),
 'mean_score_time': array([ 0.00054541,  0.00052838,  0.00049326,  0.00048833,  0.00048378,
         0.00047646,  0.00046542,  0.00048676,  0.00046687,  0.00046294]),
 'std_score_time': array([  1.14184241e-04,   8.74830381e-05,   9.36912486e-05,
          6.95798226e-05,   4.37161709e-05,   6.20151611e-05,
          5.55320274e-05,   1.04900880e-04,   9.93801053e-05,
          4.81664600e-05]),
 'param_weights': masked_array(data = ['distance' 'distance' 'distance' 'distance' 'distance' 'uniform' 'uniform'
  'uniform' 'uniform' 'uniform'],
              mask = [False False False False False False 

Author: Kavi Sekhon