# Witful ML 09 - Hyperparameter Optimization
by Kaan Kabalak, Editor In Chief @ witfuldata.com

# What is hyperparameter optimization? 

To understand this we have to answer the following question: What is a parameter?

There are many fancy definitions out there. I want to explain what a parameter is through examples, within the context of machine learning.

Do you remember the K-Nearest Neighbors classifier? We used a parameter there, 'k'. We assigned a value to it so that the model could make its computations accordingly: the value of k determined how many of the closest data points counted as 'nearest' neighbors. <strong>A parameter can be defined as a part of a system (in this case, an algorithm) that affects how that system works</strong>. When k changes, the workflow of the KNN algorithm changes.

A more recent example comes from our Regularized Regression tutorial. There we assigned a value to the 'alpha' parameter of the Lasso and Ridge models. The models used this 'alpha' parameter to determine how strongly to penalize feature coefficients.

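Hyperparameters are the parameters we set before training our model, like 'k' for the Nearest Neighbors classifier and 'alpha' for regularized regression. They are different from the values a model learns from the data on its own, such as the feature coefficients.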
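Here is a small sketch of that difference. The data below is synthetic and the names `X_demo`, `y_demo`, `weak_penalty` and `strong_penalty` are made up for illustration; they are not part of the tutorial's dataset. We choose 'alpha' ourselves before fitting, while the coefficients in `coef_` are learned by the model.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# alpha is a hyperparameter: we pick its value before fitting
weak_penalty = Lasso(alpha=0.01).fit(X_demo, y_demo)
strong_penalty = Lasso(alpha=1.0).fit(X_demo, y_demo)

# coef_ holds the values the model learned from the data
print('alpha=0.01 coefficients:', weak_penalty.coef_)
print('alpha=1.0 coefficients:', strong_penalty.coef_)
```

With the larger alpha, the learned coefficients should shrink towards zero. Same data, different hyperparameter, different model.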

Hyperparameter optimization is about changing these parameters until we find the ones that work the best. Let's see how we can do that in Python. 

# What is Grid Search Cross Validation (And Why It's Not So Scary)

I know that this term seems super complicated and a bit scary. Think about how simply we went through all those ML concepts up to this point. A thing is complicated not by itself, but by its definition. With the right definition, everything can become understandable for everyone. 

So let's talk about Grid Search Cross Validation (GridSearch CV) and why it's a good way to learn and use hyperparameter optimization. 

There are several ways to optimize parameters. In this tutorial we are going to focus on GridSearchCV because it is a good choice for understanding the main logic of hyperparameter optimization. To understand how GridSearch CV works, we need to understand what K-Fold Cross Validation is and how it helps data scientists. 

## K-Fold Cross Validation

Think about train_test_split. We separated a certain part of the dataset, which gave us two sets of data: one to train our model and one to test it.

The point of Hyperparameter Optimization is, as the name suggests, to find the best option possible. If we go over different parameters and check how models perform with them on a dataset with only one train-test split, the results depend on that particular split. This may not come in very handy in real life, because the data could just as well be split in different ways. We need a method to make sure the parameters we are testing will perform well on datasets split in different ways. 
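To see the problem, here is a minimal sketch on synthetic data (the names `X_demo`, `y_demo` and `split_scores` are made up for illustration). The same model with the same 'alpha' is scored on five different random train-test splits.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=60)

# Same model, same alpha, but a different split each time
split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.2, random_state=seed
    )
    model = Lasso(alpha=0.1).fit(X_tr, y_tr)
    split_scores.append(model.score(X_te, y_te))  # R2 on the test part

print(split_scores)
```

The scores should differ from split to split, even though nothing about the model changed. A single split only shows one of these numbers.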

This is where Cross Validation comes to the rescue. It works like this:

* Split the main dataset into any number of parts (here, we are going to make it 5)
* Take one of these parts as the test set, and the others together as the training set
* Calculate the performance metric (R2, accuracy, etc.) with the defined parameters
* Repeat the same process, each time using a different one of these 5 parts
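
The steps above can be sketched by hand as follows. This is only an illustration on synthetic data (`X_demo`, `y_demo`, `fold_scores` and `test_sizes` are made-up names); it uses `KFold.split`, which gives us the row indices of the training and test parts for each round.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

# Step 1: split the dataset into 5 parts
kf_demo = KFold(n_splits=5, shuffle=True, random_state=4)

fold_scores = []
test_sizes = []
for train_idx, test_idx in kf_demo.split(X_demo):
    # Step 2: one part is the test set, the other four are the training set
    model = Lasso(alpha=0.1)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # Step 3: calculate the performance metric (R2 for regressors)
    fold_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))
    test_sizes.append(len(test_idx))
    # Step 4: the loop repeats this with a different part as the test set

print(fold_scores)
```

Every row ends up in the test set exactly once, so we get five scores instead of one.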


By the way, these split parts are called folds. Because there are various options about the number of these folds, we call it K-Fold, to highlight the fact that the number of folds can change.


At the end of this process, we gain a lot of information. We will know how a certain parameter performed with a certain model on a dataset that was split in 5 different ways. We will have a well-founded idea of how the parameter really affects the model performance in different scenarios.


Let's see how to perform cross validation with KFold. 

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso,Ridge
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, KFold

In [2]:
# Data Frame
house_df = pd.read_csv('housing_data.csv')

# Define X-y
X = house_df.drop('PRICE', axis = 1).values
y = house_df['PRICE'].values

# Instantiate KFold, set shuffle to True to shuffle the dataset before splitting it into folds.
kf = KFold(n_splits=5 , shuffle=True, random_state=4)

# Instantiate the model
lasso_model = Lasso(alpha=0.1)

# Assign the results to a variable
cv_res = cross_val_score (lasso_model, X, y, cv = kf)

In [3]:
print (cv_res)

[0.70499706 0.71285768 0.72769772 0.76137373 0.62810964]


These are the scores we have from the KFold cross validation.

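But how can we get an overall idea about it? By using NumPy! We can use NumPy's mean and std (standard deviation) functions to get a better idea about how the model would perform in different scenarios. The mean tells us the typical score, and the standard deviation tells us how much the score changes from one fold to another.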

In [4]:
print('This is the mean:', np.mean(cv_res))
print('This is the standard deviation:', np.std(cv_res))

This is the mean: 0.7070071657782433
This is the standard deviation: 0.04392473238019104


## Grid Search CV

The main goal of cross validation is to see how a model with a certain parameter would perform on different dataset split scenarios. The main goal of Grid Search CV is similar, but this time it isn't just about splitting data. It is also about trying out models with different parameter values (like 'alpha' for regularization or 'k' for neighbor models) on each of these data split scenarios and then comparing their performance with a metric (like R2).

In other words, Grid Search CV gives us a much bigger picture of our models and how they would perform in different scenarios. 
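
To make the logic concrete, here is a hand-written sketch of the same idea, assuming synthetic data and made-up names (`X_demo`, `y_demo`, `candidate_alphas`, `mean_scores`). It is not how GridSearchCV is implemented internally; it only shows the loop it saves us from writing: cross validate every candidate value, then keep the one with the best average score.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 4))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=80)

kf_demo = KFold(n_splits=5, shuffle=True, random_state=4)

# The "grid": every alpha value we want to try
candidate_alphas = [0.001, 0.01, 0.1, 1.0, 10.0]

# Cross validate each candidate and store its average score
mean_scores = {}
for alpha in candidate_alphas:
    scores = cross_val_score(Lasso(alpha=alpha), X_demo, y_demo, cv=kf_demo)
    mean_scores[alpha] = scores.mean()

# The best candidate is the one with the highest average score
best_alpha = max(mean_scores, key=mean_scores.get)
print('Mean score per alpha:', mean_scores)
print('Best alpha:', best_alpha)
```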

Let's see how we can perform Grid Search CV in Python:

In [5]:
from sklearn.model_selection import GridSearchCV

In [6]:
# Instantiate K-Fold
kf = KFold (n_splits = 5, shuffle = True, random_state = 4)

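# Set the range of parameters
# Note: with a step of 10, np.arange yields a single value here, [0.001],
# so this grid contains only one candidate for alpha
param_grid = {'alpha':np.arange(0.001,10,10)}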

# Instantiate a model
lasso = Lasso()

# Assign Grid Search to a variable
lasso_grid = GridSearchCV (lasso, param_grid, cv = kf)

# Fit GridSearch like you would do with a model
lasso_grid.fit(X,y)

GridSearchCV(cv=KFold(n_splits=5, random_state=4, shuffle=True),
             estimator=Lasso(), param_grid={'alpha': array([0.001])})

In [7]:
# Get the best parameters
print ('Best parameter:',lasso_grid.best_params_)

Best parameter: {'alpha': 0.001}
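
Notice that 0.001 was the only value in the grid, so it had no competition. A grid with several candidates makes the search more meaningful. The sketch below shows two possible ways to build one with NumPy; the names `linear_grid` and `log_grid` and the chosen ranges are assumptions for illustration, not values recommended by this tutorial.

```python
import numpy as np

# The grid used above: start=0.001, stop=10, step=10 leaves room for one value only
single_value_grid = np.arange(0.001, 10, 10)
print(single_value_grid)

# Option 1: ten evenly spaced values between 0.001 and 10
linear_grid = np.linspace(0.001, 10, 10)
print(linear_grid)

# Option 2: values spaced by powers of ten, from 10**-3 to 10**1
log_grid = np.logspace(-3, 1, 5)
print(log_grid)
```

Either array could be passed as the value of 'alpha' in `param_grid`. The scores and best parameter would then come from a real comparison between candidates.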


In [8]:
# Get the best score
print ('Best score:', lasso_grid.best_score_)

Best score: 0.7206190370699099


In [9]:
# Use the best estimator for prediction
estimator_pred = lasso_grid.best_estimator_.predict(X)

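The best_estimator_ attribute refers to the model with the best-performing parameters found by Grid Search CV. By default (refit=True), GridSearchCV refits this model on the whole dataset once the search is done. It can be used to make predictions just as you would do with any machine learning model. Keep in mind that X here is the same data that was used for the search, so these predictions are not an independent test of the model.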
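
If you want to look beyond the single best result, a fitted GridSearchCV also keeps the scores of every candidate in its `cv_results_` attribute. Here is a minimal sketch on synthetic data (the names `X_demo`, `y_demo` and `grid_demo` and the three alpha values are made up for illustration).

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data (an assumption for illustration, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(80, 4))
y_demo = X_demo @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.5, size=80)

grid_demo = GridSearchCV(
    Lasso(),
    {'alpha': [0.01, 0.1, 1.0]},
    cv=KFold(n_splits=5, shuffle=True, random_state=4),
)
grid_demo.fit(X_demo, y_demo)

# One row per candidate: its parameters, mean score and standard deviation over the folds
results = grid_demo.cv_results_
for params, mean, std in zip(
    results['params'], results['mean_test_score'], results['std_test_score']
):
    print(params, 'mean:', round(mean, 3), 'std:', round(std, 3))
```

This is the same mean and standard deviation idea we used with cross_val_score, repeated for every value in the grid.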