Selecting hyperparameters is always a challenge in machine learning. Grid search and randomized search are two well-known methods for finding good parameter values.

In GridSearchCV (grid search with cross-validation), every possible combination of the listed parameter values is evaluated. In RandomizedSearchCV (randomized search with cross-validation), as its name suggests, only a fixed number of combinations are selected at random.
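To make the difference concrete, here is a minimal sketch that uses only the Python standard library. The search space, the seed, and the value of `n_iter` are assumptions chosen for illustration. scikit-learn provides `ParameterGrid` and `ParameterSampler` for the same purpose.

```python
import itertools
import random

# Illustrative search space (an assumption, not a recommendation).
param_space = {
    "C": [0.01, 1, 10],
    "kernel": ["rbf", "linear"],
}
names = list(param_space)

# Grid search: every combination (the Cartesian product) is evaluated.
grid = [dict(zip(names, values))
        for values in itertools.product(*(param_space[n] for n in names))]

# Randomized search: only n_iter combinations are drawn at random.
rng = random.Random(0)  # fixed seed so the draw is repeatable
n_iter = 4
sampled = rng.sample(grid, n_iter)

print(len(grid), "combinations in the full grid")
print(len(sampled), "combinations in the random sample")
for combo in sampled:
    print(combo)
```

With 3 values of `C` and 2 kernels the grid has 6 combinations, while the random sample contains only the 4 that were requested.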


GridSearchCV is appropriate when:

- The training set is small (roughly 1,000 to 10,000 samples); otherwise the search takes too long
- The parameters are about equally important for the model



RandomizedSearchCV is useful when:
- The training set is large and each fit takes a long time
- One or more parameters are more important than the others
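The second point can be illustrated with a small sketch. Assume a budget of 9 fits and two hypothetical parameters, A and B, where only A really affects the score. The ranges and the seed below are made up for the illustration.

```python
import random

budget = 9  # total number of model fits we can afford (illustrative)

# Grid: 3 values per parameter gives 3 x 3 = 9 trials,
# but each parameter is only ever tested at 3 distinct values.
grid_a = [0.01, 1.0, 100.0]
grid_b = [0.01, 1.0, 100.0]
grid_trials = [(a, b) for a in grid_a for b in grid_b]

# Random: 9 trials, each drawing both parameters log-uniformly
# from the range 0.01 to 100.
rng = random.Random(0)
random_trials = [(10 ** rng.uniform(-2, 2), 10 ** rng.uniform(-2, 2))
                 for _ in range(budget)]

distinct_grid = len({a for a, _ in grid_trials})
distinct_random = len({a for a, _ in random_trials})
print("distinct values of A tried by the grid:  ", distinct_grid)
print("distinct values of A tried at random:    ", distinct_random)
```

For the same number of fits, the grid explores the important parameter at 3 values while the random draws explore it at 9, which is why random sampling tends to be the better use of a limited budget in this situation.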

In [1]:
#Now we will apply both methods to a simple ML problem and compare their results
#First, load one of the datasets bundled with sklearn

from sklearn import svm, datasets
iris = datasets.load_iris()

import pandas as pd
df = pd.DataFrame(iris.data, columns = iris.feature_names)
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


In [2]:
#In the iris dataset the target is not bundled with iris.data. We have to add it separately
#Targets are integer class labels (0, 1, 2). We map them to the flower names

df['target'] = iris.target
df['flower'] = df['target'].apply(lambda x : iris.target_names[x])
df['flower'].unique()

array(['setosa', 'versicolor', 'virginica'], dtype=object)

In [3]:
#Let's split the dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.drop(labels=['flower', 'target'], axis = 1),
                                                    df['flower'], test_size = 0.2)
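The split above is drawn at random on every run, so the scores that follow can change between runs. As a hedged alternative, the sketch below fixes `random_state` and stratifies on the label so that each class is equally represented in the test set. The seed value and the variable names `X_tr`, `X_te`, `y_tr`, `y_te` are assumptions for this example.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer labels to flower names, as in the cell above.
y = pd.Series(iris.target_names[iris.target], name="flower")

# stratify=y keeps the class proportions in both parts;
# random_state makes the split repeatable (42 is an arbitrary choice).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_te.value_counts())
```

With 50 samples per class and `test_size = 0.2`, each flower contributes 10 samples to the test set.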

In [4]:
#Now we will create our ML model with an arbitrary combination of parameters
#(gamma has no effect with the linear kernel)

svc_model = svm.SVC(kernel='linear', C = 0.02, gamma='auto')
svc_model.fit(X_train, y_train)
svc_model.score(X_test, y_test)

0.9333333333333333

In [12]:
#As the dataset is very small, we can use cross-validation to check whether the score holds up across different splits
from sklearn.model_selection import cross_val_score

In [13]:
cross_val_score(svm.SVC(kernel='linear', C = 0.02, gamma='auto'), 
                 (df.drop(labels=['flower', 'target'], axis = 1)), df['flower'], cv = 5) 

array([0.9       , 0.96666667, 0.86666667, 0.93333333, 1.        ])
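A common way to read these fold scores is to summarise them as a mean and a spread. The sketch below simply reuses the five numbers printed above.

```python
import statistics

# Fold scores copied from the cross_val_score output above.
scores = [0.9, 0.96666667, 0.86666667, 0.93333333, 1.0]

mean_score = statistics.mean(scores)
std_score = statistics.pstdev(scores)  # population standard deviation

print(f"mean accuracy: {mean_score:.3f} +/- {std_score:.3f}")
# prints: mean accuracy: 0.933 +/- 0.047
```

The mean matches the single train/test score obtained earlier (0.933), and the spread shows how much a single split can vary on a dataset this small.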

In [15]:
#The accuracy is fairly consistent across folds (0.87 to 1.0), so there is no obvious sign of overfitting
#Now we can proceed to GridSearchCV
from sklearn.model_selection import GridSearchCV

In [30]:
grid_clf = GridSearchCV(svm.SVC(gamma='auto'),
                        {
                            'C' : [0.01, 1, 10],
                            'kernel' : ['rbf', 'linear']
                        },
                        cv = 5,
                        return_train_score = False
                       )

grid_clf.fit((df.drop(labels=['flower', 'target'], axis = 1)), df['flower'])
grid_result_df = pd.DataFrame(grid_clf.cv_results_)
grid_result_df

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_kernel,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.003987,0.000891,0.002008,1.3e-05,0.01,rbf,"{'C': 0.01, 'kernel': 'rbf'}",0.9,0.933333,0.9,0.933333,1.0,0.933333,0.036515,5
1,0.002987,1.2e-05,0.0018,0.000403,0.01,linear,"{'C': 0.01, 'kernel': 'linear'}",0.9,0.966667,0.866667,0.966667,0.9,0.92,0.04,6
2,0.002586,0.000486,0.001801,0.000387,1.0,rbf,"{'C': 1, 'kernel': 'rbf'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
3,0.002195,0.000399,0.001795,0.000398,1.0,linear,"{'C': 1, 'kernel': 'linear'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
4,0.002593,0.000489,0.00159,0.000485,10.0,rbf,"{'C': 10, 'kernel': 'rbf'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
5,0.002393,0.000489,0.001796,0.000399,10.0,linear,"{'C': 10, 'kernel': 'linear'}",1.0,1.0,0.9,0.966667,1.0,0.973333,0.038873,4


In [31]:
print('Best accuracy found in GridSearchCV:', grid_clf.best_score_)
print('Best parameter combination found in GridSearchCV:', grid_clf.best_params_)

Best accuracy found in GridSearchCV: 0.9800000000000001
Best parameter combination found in GridSearchCV: {'C': 1, 'kernel': 'rbf'}
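In the results table above, three combinations share the top mean score of 0.98, yet `best_params_` reports only one of them (here it appears to be the first tied row). The sketch below lists every combination ranked first, so that ties are visible. It refits the same grid on the iris data and uses the name `search` instead of `grid_clf` to stay self-contained.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)

search = GridSearchCV(
    svm.SVC(gamma="auto"),
    {"C": [0.01, 1, 10], "kernel": ["rbf", "linear"]},
    cv=5,
)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)

# Keep every row that is ranked first, not just the one reported as best.
top = results[results["rank_test_score"] == 1]
print(top[["params", "mean_test_score", "std_test_score"]])
print("Reported best:", search.best_params_)
```

When several combinations tie, a reasonable choice is the simplest model among them or the one with the smallest `std_test_score`.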


In [32]:
#RandomizedSearchCV does not try all the possible combinations like GridSearchCV.
#Let's try the RandomizedSearchCV approach

from sklearn.model_selection import RandomizedSearchCV

In [34]:
# Now I will set up a RandomizedSearchCV
# The only difference is that we have to define n_iter, which is the number of
# combinations the search will try
rand_clf = RandomizedSearchCV(svm.SVC(gamma='auto'),
                              {
                                  'C' : [0.01, 1, 10],
                                  'kernel' : ['rbf', 'linear', 'poly']
                              },
                              cv = 5,
                              return_train_score = False,
                              n_iter = 5
                             )
rand_clf.fit((df.drop(labels=['flower', 'target'], axis = 1)), df['flower'])
rand_result_df = pd.DataFrame(rand_clf.cv_results_)
rand_result_df

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_kernel,param_C,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.003593,0.000796,0.002201,0.000396,rbf,0.01,"{'kernel': 'rbf', 'C': 0.01}",0.9,0.933333,0.9,0.933333,1.0,0.933333,0.036515,4
1,0.002788,0.000396,0.001799,0.000403,linear,10.0,"{'kernel': 'linear', 'C': 10}",1.0,1.0,0.9,0.966667,1.0,0.973333,0.038873,2
2,0.003389,0.001018,0.001791,0.000396,poly,1.0,"{'kernel': 'poly', 'C': 1}",1.0,1.0,0.9,0.933333,1.0,0.966667,0.042164,3
3,0.002194,0.0004,0.001995,3e-06,rbf,1.0,"{'kernel': 'rbf', 'C': 1}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
4,0.002394,0.00049,0.001796,0.000398,linear,0.01,"{'kernel': 'linear', 'C': 0.01}",0.9,0.966667,0.866667,0.966667,0.9,0.92,0.04,5


In [35]:
print('Best accuracy found in RandomizedSearchCV:', rand_clf.best_score_)
print('Best parameter combination found in RandomizedSearchCV:', rand_clf.best_params_)

Best accuracy found in RandomizedSearchCV: 0.9800000000000001
Best parameter combination found in RandomizedSearchCV: {'kernel': 'rbf', 'C': 1}
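Because the 5 combinations are drawn at random from the 9 possible ones, a rerun of the cell above may evaluate a different subset and can report a different best combination. Passing `random_state` makes the draw repeatable. The helper `run_search` and the seed value below are assumptions made for this sketch.

```python
from sklearn import datasets, svm
from sklearn.model_selection import RandomizedSearchCV

X, y = datasets.load_iris(return_X_y=True)
space = {"C": [0.01, 1, 10], "kernel": ["rbf", "linear", "poly"]}

def run_search(seed):
    """Fit a randomized search whose sampling is controlled by `seed`."""
    search = RandomizedSearchCV(svm.SVC(gamma="auto"), space,
                                n_iter=5, cv=5, random_state=seed)
    search.fit(X, y)
    return search

first = run_search(0)
second = run_search(0)

# With the same seed, both runs evaluate the same combinations in the same order.
same_candidates = list(first.cv_results_["params"]) == list(second.cv_results_["params"])
print("Same candidates in both runs:", same_candidates)
print("Best parameters:", first.best_params_)
```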


In [37]:
#Until now we have searched over different parameters of the same model
#Now we will search over parameter combinations of different models
from sklearn.ensemble import RandomForestClassifier
all_models = {
    'svm' : {
        'model' : svm.SVC(gamma='auto'),
        'parameters' : {
            'C' : [0.01, 1, 10],
            'kernel' : ['rbf', 'linear', 'poly']
        }
    },
    'random_forest' : {
        'model' : RandomForestClassifier(),
        'parameters' : {
            'n_estimators' : [1, 10, 50]
        }
    }
}

In [38]:
# Now we will find the best accuracies and combinations
final_accuracy = []
for model_name, model_desc in all_models.items():
    clf_model = RandomizedSearchCV(model_desc['model'], model_desc['parameters'], cv = 5,
                                  return_train_score = False, n_iter = 3)
    clf_model.fit((df.drop(labels=['flower', 'target'], axis = 1)), df['flower'])
    final_accuracy.append(
    {
        'model' : model_name,
        'best accuracy' : clf_model.best_score_,
        'best parameters' : clf_model.best_params_
    })
    
final_result_df = pd.DataFrame(final_accuracy)
final_result_df

Unnamed: 0,model,best accuracy,best parameters
0,svm,0.98,"{'kernel': 'linear', 'C': 1}"
1,random_forest,0.966667,{'n_estimators': 50}
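The searches above are fitted on the full dataset, so the test split created earlier is never used to check the tuned model. One possible way to get an independent estimate is sketched below: tune on the training part only, then score the refitted best model on the held-out part. The seed and the variable names are assumptions for this example.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

search = GridSearchCV(
    svm.SVC(gamma="auto"),
    {"C": [0.01, 1, 10], "kernel": ["rbf", "linear", "poly"]},
    cv=5,
)
search.fit(X_tr, y_tr)  # tuning only ever sees the training data

# With the default refit=True, score() uses the best model
# refitted on all of the training data.
held_out_accuracy = search.score(X_te, y_te)

print("Best CV accuracy on the training part:", search.best_score_)
print("Accuracy on the held-out test part:   ", held_out_accuracy)
```

The cross-validated score is the one used to pick the parameters, so the held-out accuracy is usually the more honest number to report.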


In the next step, grid search and randomized search will be added for neural network parameter tuning.