https://www.kaggle.com/code/devharal/gridsearchcv-and-randomizedsearchcv/notebook

### Learning

https://www.javatpoint.com/hyperparameters-in-machine-learning


#### GridSearchCV vs RandomizedSearchCV
https://www.kaggle.com/general/212697

1) GridSearchCV:
We try every combination from a preset list of values for each hyper-parameter and choose the best combination based on the cross-validation score.

**-** It takes a lot of time to fit, because it tries every combination.

**+** It is guaranteed to find the best combination within the grid (the best possible values may still lie between or outside the grid points).

Example:
{ 'C': [0.1, 1, 10, 100, 1000], 'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 'kernel': ['rbf', 'linear', 'sigmoid'] }

In this case we will try 5 * 5 * 3 = 75 combinations, and each combination is fitted once per cross-validation fold.
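The count above can be checked by enumerating the grid. This is a minimal sketch using only the standard library; the fold count of 5 is an assumption chosen for illustration, and the variable names are hypothetical.

```python
from itertools import product

# The example grid from the text above.
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf", "linear", "sigmoid"],
}

# Cartesian product of all value lists: one dict per candidate combination.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumed number of cross-validation folds

print(len(combinations))            # 75 candidate combinations
print(len(combinations) * n_folds)  # 375 model fits during the search
```

The second number is the one that drives run time: every extra value in any list multiplies the total number of fits.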

2) RandomizedSearchCV:
Tries random combinations drawn from a range of values (we have to define the number of iterations). It is good at testing a wide range of values and normally reaches a very good combination quickly, but it does not guarantee the best parameter combination, because not all parameter values are tried out. It is recommended for big datasets or when there is a high number of parameters to tune.

   **-**  It doesn't guarantee that we find the best parameters.

   **+**  It is faster, because not all parameter values are tried out.
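The idea of trying only some combinations can be illustrated without scikit-learn. The sketch below draws a fixed number of candidates from a small illustrative grid; it is not how the library is implemented internally, and the seed and names are assumptions made for the example.

```python
import random
from itertools import product

# A small illustrative grid: 4 * 4 = 16 possible combinations.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [1, 5, 10, 20],
}

names = list(param_grid)
all_combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_iter = 10             # number of combinations to try
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = rng.sample(all_combinations, n_iter)  # sampling without replacement

for candidate in sampled:
    print(candidate)
```

Only the sampled candidates would be fitted and scored, so the cost is set by `n_iter` rather than by the size of the grid.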

In [51]:
# importing the dependencies
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [52]:
# loading the data from sklearn
breast_cancer_dataset = sklearn.datasets.load_breast_cancer()

In [53]:
type(breast_cancer_dataset)

sklearn.utils._bunch.Bunch

In [54]:
df = pd.DataFrame(breast_cancer_dataset.data, columns=breast_cancer_dataset.feature_names)
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [55]:
df.columns

Index(['mean radius', 'mean texture', 'mean perimeter', 'mean area',
       'mean smoothness', 'mean compactness', 'mean concavity',
       'mean concave points', 'mean symmetry', 'mean fractal dimension',
       'radius error', 'texture error', 'perimeter error', 'area error',
       'smoothness error', 'compactness error', 'concavity error',
       'concave points error', 'symmetry error', 'fractal dimension error',
       'worst radius', 'worst texture', 'worst perimeter', 'worst area',
       'worst smoothness', 'worst compactness', 'worst concavity',
       'worst concave points', 'worst symmetry', 'worst fractal dimension'],
      dtype='object')

In [56]:
# adding the 'target' column to the data frame
df['label'] = breast_cancer_dataset.target
df.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,label
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0


In [57]:
df.shape

(569, 31)

In [58]:
# checking for missing values
df.isna().sum()

mean radius                0
mean texture               0
mean perimeter             0
mean area                  0
mean smoothness            0
mean compactness           0
mean concavity             0
mean concave points        0
mean symmetry              0
mean fractal dimension     0
radius error               0
texture error              0
perimeter error            0
area error                 0
smoothness error           0
compactness error          0
concavity error            0
concave points error       0
symmetry error             0
fractal dimension error    0
worst radius               0
worst texture              0
worst perimeter            0
worst area                 0
worst smoothness           0
worst compactness          0
worst concavity            0
worst concave points       0
worst symmetry             0
worst fractal dimension    0
label                      0
dtype: int64

In [59]:
# checking the distribution of the target variable
df["label"].value_counts()

1    357
0    212
Name: label, dtype: int64

1 --> Benign

0 --> Malignant
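The label meaning can be read from the dataset itself rather than remembered. A short sketch, assuming scikit-learn's bundled copy of the dataset; the loop variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# target_names[i] is the class name for integer label i.
for label, name in enumerate(data.target_names):
    count = int(np.sum(data.target == label))
    print(label, name, count)
```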

Separating the features and target

In [60]:
X = df.drop(columns='label')
Y = df['label']

In [61]:
X.head()

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [62]:
Y.head()

0    0
1    0
2    0
3    0
4    0
Name: label, dtype: int64

In [63]:
X = np.asarray(X)
Y = np.asarray(Y)
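If the DataFrame is not needed for inspection, the same two arrays can be obtained in one call. This is an optional shortcut, not what the notebook does above.

```python
from sklearn.datasets import load_breast_cancer

# return_X_y=True returns (features, target) as NumPy arrays directly.
X, Y = load_breast_cancer(return_X_y=True)

print(X.shape, Y.shape)  # (569, 30) (569,)
```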

## GridSearchCV

GridSearchCV evaluates every combination in a parameter grid with cross-validation and reports the combination with the best mean score.

In [64]:
# loading the SVC model
model = SVC()

In [65]:
# hyperparameters

parameters = {
              'kernel':['linear','poly','rbf','sigmoid'],
              'C':[1, 5, 10, 20]
}

In [66]:
# grid search
classifier = GridSearchCV(model, parameters, cv=5)

In [67]:
# fitting the data to our model
classifier.fit(X, Y)

In [68]:
classifier.cv_results_

{'mean_fit_time': array([1.06690202e+00, 2.69250870e-03, 3.61504555e-03, 1.46060467e-02,
        2.16908674e+00, 2.70099640e-03, 2.73308754e-03, 1.00340843e-02,
        3.07536135e+00, 2.54955292e-03, 2.72965431e-03, 1.01852894e-02,
        5.74419479e+00, 2.73442268e-03, 2.52399445e-03, 9.87071991e-03]),
 'std_fit_time': array([4.27578689e-01, 6.21149823e-04, 6.38843935e-04, 4.33720091e-04,
        4.55216802e-01, 3.35482184e-04, 4.55932646e-04, 4.35110262e-04,
        7.12624570e-01, 1.24806841e-04, 3.08752115e-04, 4.63306904e-04,
        2.16344506e+00, 1.19964307e-04, 1.10443824e-04, 4.14018717e-04]),
 'mean_score_time': array([0.00051427, 0.00071192, 0.00215082, 0.00279355, 0.00048876,
        0.0006969 , 0.00138664, 0.00232654, 0.00050645, 0.00053153,
        0.00136595, 0.00224423, 0.00052662, 0.00054502, 0.0011198 ,
        0.00213242]),
 'std_score_time': array([3.37589258e-05, 1.52920272e-04, 3.95192108e-04, 2.96552846e-05,
        2.41323889e-05, 8.13105932e-05, 1.76786757e-

In [69]:
# best parameters

best_parameters = classifier.best_params_
print(best_parameters)

{'C': 10, 'kernel': 'linear'}


In [70]:
# highest accuracy

highest_accuracy = classifier.best_score_
print(highest_accuracy)

0.9525694767893185
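`best_score_` is the mean cross-validation score of the winning candidate, not an accuracy on unseen test data. The sketch below checks this for a single candidate by comparing a one-entry grid against `cross_val_score`; the choice of the `rbf` kernel with `C=1` is only to keep the example fast.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, Y = load_breast_cancer(return_X_y=True)

# A one-candidate "grid" keeps the example fast (the rbf kernel fits quickly).
search = GridSearchCV(SVC(), {"kernel": ["rbf"], "C": [1]}, cv=5)
search.fit(X, Y)

# The same candidate scored directly with 5-fold cross-validation.
fold_scores = cross_val_score(SVC(kernel="rbf", C=1), X, Y, cv=5)

print(fold_scores)         # one accuracy value per fold
print(fold_scores.mean())  # mean accuracy across the folds
print(search.best_score_)  # should agree with the mean above
```

With several candidates in the grid, the search repeats this scoring for each one and keeps the candidate with the highest mean.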


In [71]:
# loading the results to pandas dataframe
result = pd.DataFrame(classifier.cv_results_)

In [72]:
result.shape

(16, 15)

In [73]:
result.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_kernel,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,1.066902,0.427579,0.000514,3.4e-05,1,linear,"{'C': 1, 'kernel': 'linear'}",0.947368,0.929825,0.973684,0.921053,0.955752,0.945536,0.018689,4
1,0.002693,0.000621,0.000712,0.000153,1,poly,"{'C': 1, 'kernel': 'poly'}",0.842105,0.885965,0.929825,0.947368,0.938053,0.908663,0.039382,12
2,0.003615,0.000639,0.002151,0.000395,1,rbf,"{'C': 1, 'kernel': 'rbf'}",0.850877,0.894737,0.929825,0.947368,0.938053,0.912172,0.035444,11
3,0.014606,0.000434,0.002794,3e-05,1,sigmoid,"{'C': 1, 'kernel': 'sigmoid'}",0.54386,0.45614,0.464912,0.385965,0.451327,0.460441,0.050253,13
4,2.169087,0.455217,0.000489,2.4e-05,5,linear,"{'C': 5, 'kernel': 'linear'}",0.947368,0.938596,0.973684,0.929825,0.964602,0.950815,0.016216,2


In [74]:
grid_search_result = result[['param_C','param_kernel','mean_test_score']]

In [75]:
grid_search_result

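|   | param_C | param_kernel | mean_test_score |
|---|---------|--------------|-----------------|
| 0 | 1 | linear | 0.945536 |
| 1 | 1 | poly | 0.908663 |
| 2 | 1 | rbf | 0.912172 |
| 3 | 1 | sigmoid | 0.460441 |
| 4 | 5 | linear | 0.950815 |
| 5 | 5 | poly | 0.922729 |
| 6 | 5 | rbf | 0.931501 |
| 7 | 5 | sigmoid | 0.411178 |
| 8 | 10 | linear | 0.952569 |
| 9 | 10 | poly | 0.920975 |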


Highest Accuracy = 95.3% (mean 5-fold cross-validation accuracy, 0.9526)

Best Parameters = {'C':10, 'kernel':'linear'}

## RandomizedSearchCV

In [76]:
# loading the SVC model
model = SVC()

In [77]:
# hyperparameters

parameters = {
              'kernel':['linear','poly','rbf','sigmoid'],
              'C':[1, 5, 10, 20]
}

In [78]:
# randomized search (n_iter defaults to 10, so 10 of the 16 combinations are sampled)
classifier = RandomizedSearchCV(model, parameters, cv=5)

In [79]:
# fitting the data to our model
classifier.fit(X, Y)
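The search above samples from fixed lists. RandomizedSearchCV also accepts distributions, which is what makes it suited to exploring a wide range of values. The following is a sketch under stated assumptions: the `loguniform` bounds, the restriction to the `rbf` kernel, and `random_state=0` are illustrative choices, not values from this notebook.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, Y = load_breast_cancer(return_X_y=True)

# Continuous ranges instead of fixed lists; the bounds are illustrative assumptions.
param_distributions = {
    "C": loguniform(1e-1, 1e3),
    "gamma": loguniform(1e-6, 1e-2),
}

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions,
    n_iter=10,       # number of random combinations to evaluate
    cv=5,
    random_state=0,  # makes the sampled combinations repeatable
)
search.fit(X, Y)

print(search.best_params_)
print(search.best_score_)
```

Setting `random_state` is worth considering in the list-based search as well, since without it each run may sample a different subset of combinations.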

In [80]:
classifier.cv_results_

{'mean_fit_time': array([5.87854385e-03, 1.57222748e-02, 4.16970253e-03, 1.47730827e-02,
        3.70432444e+00, 2.59003258e+00, 3.84755135e-03, 1.33867264e-02,
        6.29396696e+00, 3.63683701e-03]),
 'std_fit_time': array([1.21544209e-03, 2.84860155e-03, 3.21456898e-04, 2.50008928e-04,
        1.23259711e+00, 5.44130063e-01, 1.45791384e-04, 6.21245384e-04,
        1.71577970e+00, 1.65552409e-04]),
 'mean_score_time': array([0.0018887 , 0.00309649, 0.00220594, 0.00337715, 0.00063186,
        0.00061178, 0.00075994, 0.00305271, 0.00065074, 0.00077186]),
 'std_score_time': array([4.12204512e-04, 2.25156710e-04, 1.76634064e-04, 8.65270132e-05,
        1.34920768e-04, 9.85797830e-05, 6.61836577e-05, 1.97629416e-04,
        1.19666995e-04, 3.69259814e-05]),
 'param_kernel': masked_array(data=['poly', 'sigmoid', 'rbf', 'sigmoid', 'linear',
                    'linear', 'poly', 'sigmoid', 'linear', 'poly'],
              mask=[False, False, False, False, False, False, False, False,
       

In [81]:
# best parameters

best_parameters = classifier.best_params_
print(best_parameters)

{'kernel': 'linear', 'C': 10}


In [82]:
# highest accuracy

highest_accuracy = classifier.best_score_
print(highest_accuracy)

0.9525694767893185


In [83]:
# loading the results to pandas dataframe
result = pd.DataFrame(classifier.cv_results_)

In [84]:
result.shape

(10, 15)

In [85]:
result.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_kernel,param_C,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.005879,0.001215,0.001889,0.000412,poly,5,"{'kernel': 'poly', 'C': 5}",0.885965,0.912281,0.921053,0.938596,0.955752,0.922729,0.023689,4
1,0.015722,0.002849,0.003096,0.000225,sigmoid,20,"{'kernel': 'sigmoid', 'C': 20}",0.473684,0.403509,0.421053,0.342105,0.353982,0.398867,0.04764,10
2,0.00417,0.000321,0.002206,0.000177,rbf,1,"{'kernel': 'rbf', 'C': 1}",0.850877,0.894737,0.929825,0.947368,0.938053,0.912172,0.035444,7
3,0.014773,0.00025,0.003377,8.7e-05,sigmoid,1,"{'kernel': 'sigmoid', 'C': 1}",0.54386,0.45614,0.464912,0.385965,0.451327,0.460441,0.050253,8
4,3.704324,1.232597,0.000632,0.000135,linear,10,"{'kernel': 'linear', 'C': 10}",0.938596,0.938596,0.973684,0.947368,0.964602,0.952569,0.0142,1


In [86]:
randomized_search_result = result[['param_C','param_kernel','mean_test_score']]

In [87]:
randomized_search_result

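|   | param_C | param_kernel | mean_test_score |
|---|---------|--------------|-----------------|
| 0 | 5 | poly | 0.922729 |
| 1 | 20 | sigmoid | 0.398867 |
| 2 | 1 | rbf | 0.912172 |
| 3 | 1 | sigmoid | 0.460441 |
| 4 | 10 | linear | 0.952569 |
| 5 | 5 | linear | 0.950815 |
| 6 | 20 | poly | 0.919221 |
| 7 | 10 | sigmoid | 0.402391 |
| 8 | 20 | linear | 0.949061 |
| 9 | 10 | poly | 0.920975 |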


Highest Accuracy = 95.3% (mean 5-fold cross-validation accuracy, 0.9526)

Best Parameters = {'C':10, 'kernel':'linear'}
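Both searches returned the same best combination here, but for the randomized search that was not guaranteed. As a rough worked calculation, assuming the 10 candidates are drawn uniformly without replacement from the 16 combinations, the chance that the single best combination is among them is:

```python
from math import comb

n_candidates = 16  # 4 kernels * 4 values of C
n_iter = 10        # combinations sampled by the randomized search

# Number of size-10 samples that contain the best candidate, divided by
# the total number of size-10 samples.
p_hit = comb(n_candidates - 1, n_iter - 1) / comb(n_candidates, n_iter)

print(p_hit)  # 0.625
```

So on a grid this small, a different random draw would miss the best combination in roughly three runs out of eight, although it would often still land on a close runner-up such as the `linear` kernel with `C=5`.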