Introduction to Hyperparameter Tuning

Data Science is made of mainly two parts. Data analytics and machine learning modeling

1. Data Analytics – Historical data has a lot to say, so to hear what it
has in store for us, we need to analyze it thoroughly. 

2. Machine Learning – Post doing data analytics, these insights should be used in the most sought-after way to predict the future values.

When a machine learns on its own based on data patterns from historical data, we get an output which is known as a machine learning model.

There is a list of different machine learning models. They all are different in some way or the other, but what makes them different is nothing but input
parameters for the model. These input parameters are named as Hyperparameters.

#### Hyperparameters: 
Define the architecture of the model, and the best part about these is that you get a choice to select these for your model. Of course, you must select from a specific list of hyperparameters for a given model as it varies from model to model.

#### Hyperparameter Tuning: 
is used to select optimal values for hyperparameters which would generate the best model output. So, what we tell the model is to explore and
select the optimal model architecture automatically. 

For every model, our goal is to minimize the
error or say to have predictions as close as possible to actual values. This is
one of the cores or say the major objective of hyperparameter tuning.

### Approaches to Hyperparameter tuning:

#### Manual Search: 

We select some hyperparameters for a model based on our gut feeling and experience. This might not be of much help as human judgment is biased, and here human experience is playing a significant role.

#### Random Search: 

Instead of doing multiple rounds of this process, it would be better to give multiple values for all the hyperparameters in one got to the model and let the model decide which one best suits.  Out of the values mentioned, the model randomly makes combinations of its own and tries to fit the dataset and test the accuracy. Here, chances are there to miss on a few combinations which could have been optimal ones. Although, random search consumes quite less
amount of time and most of the time it gives optimal solutions as well.

#### Grid Search: 

In this method, each combination of
hyperparameter value is tried. This makes the process time consuming, or in short, inefficient. This method is quite an expensive method in terms of computation power and time, but this is the most efficient method as there is the least possibility of missing out on an optimal solution for a model.

#### Cross Validation

While creating any machine learning models, we generally divide the dataset into train sets and test sets. The train set is used to make machines learn the
pattern and create a model for future prediction. The test dataset is used to test model performance such that it considers this data as unseen data. When
we use cross-validation, even the train set is divided into N partitions to make sure that our model is not overfitting.

### Hyperparameter Tuning Machine Learning

In [1]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

In [2]:
dataset = pd.read_csv('C:\\Users\\AFC 2\\Documents\\Data Science Files\\Jupyter notebooks for Data Science\\Social_Network_Ads.csv')

In [3]:
dataset.head() #record for a particular product

Unnamed: 0,User ID,Gender,Age,EstimatedSalary,Purchased
0,15624510,Male,19,19000,0
1,15810944,Male,35,20000,0
2,15668575,Female,26,43000,0
3,15603246,Female,27,57000,0
4,15804002,Male,19,76000,0


In [4]:
#To take a look at the age and estimated salary
X = dataset.iloc[:, [2, 3]]
Y = dataset.iloc[:, 4]

In [5]:
X

Unnamed: 0,Age,EstimatedSalary
0,19,19000
1,35,20000
2,26,43000
3,27,57000
4,19,76000
...,...,...
395,46,41000
396,51,23000
397,50,20000
398,36,33000


In [6]:
Y

0      0
1      0
2      0
3      0
4      0
      ..
395    1
396    1
397    1
398    0
399    1
Name: Purchased, Length: 400, dtype: int64

In [7]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   User ID          400 non-null    int64 
 1   Gender           400 non-null    object
 2   Age              400 non-null    int64 
 3   EstimatedSalary  400 non-null    int64 
 4   Purchased        400 non-null    int64 
dtypes: int64(4), object(1)
memory usage: 15.8+ KB


In [8]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = .25)

In [9]:
#Rescale the data
scalar = StandardScaler()
x_train_scaled = scalar.fit_transform(x_train)
x_test_scaled = scalar.transform(x_test)

In [10]:
#Train our model
clf = SVC()

In [11]:
#Fit the data
clf.fit(x_train_scaled, y_train)

SVC()

In [12]:
#Make prediction
y_pred = clf.predict(x_test_scaled)

In [13]:
#Take a look at the accuracy score
accuracy_score(y_test, y_pred)

0.89

In [14]:
#create a grid of all possible value
grid = {
        'C': [1, 10, 100, 1000],
        'kernel': ['rbf', 'linear'],
        'gama': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
}    

In [15]:
#create grid search CV
grid_search_cv = GridSearchCV(SVC(), param_grid = grid, scoring = 'accuracy', n_jobs = 1)

In [16]:
grid_search_cv.fit(x_train_scaled, y_train)
#grid search takes a while to load. What to do is to rescale the data 

ValueError: Invalid parameter gama for estimator SVC(C=1). Check the list of available parameters with `estimator.get_params().keys()`.

In [None]:
grid_search_cv.best_score_

In [None]:
grid_search_cv.best_params_

In [None]:
rdm_search_cv = RandomizedSearchCV(SVC(), param_distributions = grid, n_jobs = -1)

In [None]:
rdm_search_cv.fit(x_train_scaled, y_train)

In [None]:
rdm_search_cv.best_score_

In [None]:
rdm_search_cv.best_params_

#### Introduction to Grid Search

GridSearch is an optimization tool that we use when tuning
hyperparameters. We define the grid of parameters that we want
to search through, and we select the best combination of
parameters for our data.

##### What makes GridSearch so important?
GridSearch allows us to find the best model given a data set very easily. It actually makes the Machine Learning part of the Data Scientists role much easier by automating the search.

On the Machine Learning side, some things that still remain to be done is deciding on the right way to measure error, deciding on which models to try out and which hyperparameters to test for. And the most important part, the work on data preparation, is also left for the data scientist.

Thanks to the GridSearch approach, the Data Scientist can focus on the data wrangling work, while automating repetitive tasks of model comparison. This makes the work more interesting and allows the Data Scientist to add value where he’s most needed: working with data.

#### Grid Search

Allows us to scan through multiple free parameters with few lines of code.

Scikit-learn has a means of automatically iterating over these hyperparameters using cross-validation called Grid Search.

Grid Search takes the model or objects we will like to train and different values of the hyperparameters. It then calculates the mean square error or R-square for various hyperparameter values allowing you to choose the best values.

One of the advantages of Grid Search is how quickly we can test multiple parameters. 

#### Machine Learning Session – Grid Search

The process of choosing the optimum parameter is called hyper tuning.

In [17]:
from sklearn import svm, datasets
iris = datasets.load_iris()

In [18]:
import pandas as pd
df = pd.DataFrame(iris.data, columns = iris.feature_names)
df['flower'] = iris.target
df['flower'] = df['flower'].apply(lambda x: iris.target_names[x])
df[47:52]

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),flower
47,4.6,3.2,1.4,0.2,setosa
48,5.3,3.7,1.5,0.2,setosa
49,5.0,3.3,1.4,0.2,setosa
50,7.0,3.2,4.7,1.4,versicolor
51,6.4,3.2,4.5,1.5,versicolor


In [19]:
# Splitting our dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = 0.3)

In [20]:
#Using the SVM model
model = svm.SVC(kernel = 'rbf', C = 30, gamma = 'auto')
model.fit(X_train, y_train)
model.score(X_test, y_test)

0.9555555555555556

In [21]:
from sklearn.model_selection import cross_val_score

In [22]:
cross_val_score(svm.SVC(kernel = 'linear', C = 10, gamma = 'auto'), iris.data, iris.target, cv = 5)

array([1.        , 1.        , 0.9       , 0.96666667, 1.        ])

In [23]:
cross_val_score(svm.SVC(kernel = 'rbf', C = 10, gamma = 'auto'), iris.data, iris.target, cv = 5)

array([0.96666667, 1.        , 0.96666667, 0.96666667, 1.        ])

In [24]:
cross_val_score(svm.SVC(kernel = 'rbf', C = 20, gamma = 'auto'), iris.data, iris.target, cv = 5)

array([0.96666667, 1.        , 0.9       , 0.96666667, 1.        ])

In [25]:
# using a for loop
import numpy as np
kernels = ['rbf', 'linear']
C = [1, 10, 20]
avg_scores = {}
for kval in kernels:
    for cval in C:
        cv_scores = cross_val_score(svm.SVC(kernel = kval, C = cval, gamma = 'auto'), iris.data, iris.target, cv = 5)
        avg_scores[kval + '_' + str(cval)] = np.average(cv_scores)
avg_scores

{'rbf_1': 0.9800000000000001,
 'rbf_10': 0.9800000000000001,
 'rbf_20': 0.9666666666666668,
 'linear_1': 0.9800000000000001,
 'linear_10': 0.9733333333333334,
 'linear_20': 0.9666666666666666}

In [26]:
# The GridSearchCV will be used to do the exact same thing as seen in the code above for more convenience
from sklearn.model_selection import GridSearchCV

clf = GridSearchCV(svm.SVC(gamma = 'auto'), {
    'C': [1, 10, 20],
    'kernel': ['rbf', 'linear']
}, cv = 5, return_train_score = False)

clf.fit(iris.data, iris.target)
clf.cv_results_

{'mean_fit_time': array([0.00316734, 0.00052075, 0.00162015, 0.00324116, 0.00159826,
        0.00165749]),
 'std_fit_time': array([0.0025624 , 0.00104151, 0.0032403 , 0.00397026, 0.00319653,
        0.00331497]),
 'mean_score_time': array([5.09595871e-04, 1.18551254e-03, 9.41753387e-05, 0.00000000e+00,
        1.59850121e-03, 9.96589661e-05]),
 'std_score_time': array([0.00045831, 0.00145199, 0.00018835, 0.        , 0.003197  ,
        0.00019932]),
 'param_C': masked_array(data=[1, 1, 10, 10, 20, 20],
              mask=[False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['rbf', 'linear', 'rbf', 'linear', 'rbf', 'linear'],
              mask=[False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'C': 1, 'kernel': 'rbf'},
  {'C': 1, 'kernel': 'linear'},
  {'C': 10, 'kernel': 'rbf'},
  {'C': 10, 'kernel': 'linear'},
  {'C': 20, 'kernel': 'rbf'},
  {'C': 20

Gridsearchcv results are not easy to view. So, we download into a dataframe below

In [27]:
df = pd.DataFrame(clf.cv_results_)
df

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_C,param_kernel,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
0,0.003167,0.002562,0.00051,0.000458,1,rbf,"{'C': 1, 'kernel': 'rbf'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
1,0.000521,0.001042,0.001186,0.001452,1,linear,"{'C': 1, 'kernel': 'linear'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
2,0.00162,0.00324,9.4e-05,0.000188,10,rbf,"{'C': 10, 'kernel': 'rbf'}",0.966667,1.0,0.966667,0.966667,1.0,0.98,0.01633,1
3,0.003241,0.00397,0.0,0.0,10,linear,"{'C': 10, 'kernel': 'linear'}",1.0,1.0,0.9,0.966667,1.0,0.973333,0.038873,4
4,0.001598,0.003197,0.001599,0.003197,20,rbf,"{'C': 20, 'kernel': 'rbf'}",0.966667,1.0,0.9,0.966667,1.0,0.966667,0.036515,5
5,0.001657,0.003315,0.0001,0.000199,20,linear,"{'C': 20, 'kernel': 'linear'}",1.0,1.0,0.9,0.933333,1.0,0.966667,0.042164,6


In [28]:
df[['param_C', 'param_kernel', 'mean_test_score']]

Unnamed: 0,param_C,param_kernel,mean_test_score
0,1,rbf,0.98
1,1,linear,0.98
2,10,rbf,0.98
3,10,linear,0.973333
4,20,rbf,0.966667
5,20,linear,0.966667


In [29]:
dir(clf)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_check_is_fitted',
 '_check_n_features',
 '_check_refit_for_multimetric',
 '_estimator_type',
 '_format_results',
 '_get_param_names',
 '_get_tags',
 '_more_tags',
 '_pairwise',
 '_repr_html_',
 '_repr_html_inner',
 '_repr_mimebundle_',
 '_required_parameters',
 '_run_search',
 '_validate_data',
 'best_estimator_',
 'best_index_',
 'best_params_',
 'best_score_',
 'classes_',
 'cv',
 'cv_results_',
 'decision_function',
 'error_score',
 'estimator',
 'fit',
 'get_params',
 'inverse_transform',
 'multimetric_',
 'n_features_in_',
 'n_jobs',
 'n_splits

In [30]:
clf.best_score_

0.9800000000000001

In [31]:
clf.best_params_

{'C': 1, 'kernel': 'rbf'}

One issue that can happen with gridsearchcv is the computation cost. It can go very high. 

To tackle this computation problem, sklearn uses the Randomised search CV. This will try random combination of these parameter values and we can choose what those iterations will be.

In [35]:
#using the randomised search cv
from sklearn.model_selection import RandomizedSearchCV
rs = RandomizedSearchCV(svm.SVC(gamma = 'auto'), {
    'C': [1, 10, 20],
    'kernel': ['rbf', 'linear']
    },
    cv = 5,
    return_train_score = False,
    n_iter = 2     #trying only 2 combinations
)
rs.fit(iris.data, iris.target)
pd.DataFrame(rs.cv_results_)[['param_C', 'param_kernel', 'mean_test_score']]

Unnamed: 0,param_C,param_kernel,mean_test_score
0,10,rbf,0.98
1,20,rbf,0.966667


The code above produces 2 combinations initially. It randomly tries C values initially. 
Running it a second time will give a different C value.
It randomly tries the values for C and kernel and gives the best score.

#### How to choose the best model for a given problem

In [36]:
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

In [37]:
model_params = {
    'svm': {
        'model': svm.SVC(gamma = 'auto'),
        'params': {
            'C': [1, 10, 20],
            'kernel': ['rbf', 'linear']
        }
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params': {
            'n_estimators': [1, 5, 10]
        }
    },
    'logistic_regression' : {
        'model': LogisticRegression(solver ='liblinear', multi_class = 'auto'),
        'param': {
            'C': [1, 5, 10]
        }
    }
}

In [40]:
# We will use a for loop to go through each of the values as above
scores = []

for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'], cv = 5, return_train_score = False)
    clf.fit(iris.data, iris.target)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })

KeyError: 'params'

In [41]:
df = pd.DataFrame(scores, columns = ['model', 'best_score', 'best_params'])
df

Unnamed: 0,model,best_score,best_params
0,svm,0.98,"{'C': 1, 'kernel': 'rbf'}"
1,random_forest,0.96,{'n_estimators': 5}
