# Hyperparameter Tuning (Optimization)

A **hyperparameter**, such as $\alpha$ in ridge/lasso regression, $k$ in k-Nearest Neighbors, or $C$ in logistic regression, is a parameter whose value is not learned from the data but set before the model is fitted. The values of the other parameters, by contrast, are learned during training.
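
To make the distinction concrete, here is a minimal sketch. It uses synthetic data from `make_classification` (an assumption for illustration, not the diabetes dataset used later) and shows that `C` is chosen by us, while the coefficients and intercept are learned by `fit`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration; not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# C is a hyperparameter: we choose its value before fitting
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(X_demo, y_demo)

# coef_ and intercept_ are parameters: they are learned from the data
print("Hyperparameter C:", model.get_params()["C"])
print("Learned coefficients:", model.coef_)
print("Learned intercept:", model.intercept_)
```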

There are several approaches to choosing optimal hyperparameters. Some of them are listed below.
 - Grid search
 - Random search
 - Bayesian optimization
 - Gradient-based optimization
 - Evolutionary optimization


In [5]:
import pandas as pd
import numpy as np

In [6]:
df = pd.read_csv("datasets/diabetes.csv")

>Note: I'll keep the EDA short, since the focus of this analysis is not exploring the data.

In [9]:
df.head(3)

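| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |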


## Fitting a Model

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [14]:
X = df.drop('Outcome', axis = 1).values # features
y = df['Outcome'].values # target

### Using a Hold-out Set

Since we want to be confident in our model's ability to generalize to unseen data, we can use a training set to tune the model's hyperparameters and a **hold-out set** to assess its performance.

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state=42)

In [9]:
# Create the classifier
logreg = LogisticRegression()

# Fit the classifier to the training data
logreg.fit(X_train, y_train)

# Predict the labels of the test set
y_pred = logreg.predict(X_test)
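
The predictions on the hold-out set only become useful once they are compared with the true labels. The sketch below shows one way to do that with `accuracy_score`. It runs on synthetic data, and the names `X_demo`, `y_demo`, `clf` and `acc` are placeholders chosen for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)

# Same split proportions as above: 60% training, 40% hold-out
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42
)

# Fit on the training part only, then predict the hold-out part
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Accuracy = fraction of hold-out samples that were classified correctly
acc = accuracy_score(y_te, y_hat)
print("Hold-out accuracy: {}".format(acc))
```

For a classifier, `clf.score(X_te, y_te)` returns the same accuracy in a single call.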

## Hyperparameter tuning with Grid Search (GridSearchCV)

"Grid Search is an exhaustive searching through a manually specified subset of the hyperparameter space."

In [10]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

We'll find the optimal $C$ with GridSearchCV. This hyperparameter "controls the inverse of the regularization strength". It is worth noting that smaller $C$ means stronger regularization.
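
One way to see the effect of $C$ is to look at the size of the fitted coefficients. The sketch below fits the same model with three values of $C$ on synthetic data (an assumption for illustration) and prints the norm of the coefficient vector, which should shrink as $C$ gets smaller.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration; not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

norms = []
for C in [0.001, 1.0, 1000.0]:
    # Smaller C puts more weight on the penalty term
    clf = LogisticRegression(C=C, max_iter=5000).fit(X_demo, y_demo)
    norms.append(np.linalg.norm(clf.coef_))
    print("C = {:>8}: coefficient norm = {:.4f}".format(C, norms[-1]))
```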

In [11]:
# Setup the hyperparameter grid
c_space = np.logspace(-5, 8, 15) # 15 values spaced evenly on a log scale, from 1e-5 to 1e8
param_grid = {'C': c_space}

# Instantiate a logistic regression classifier
logreg = LogisticRegression()

# Instantiate the GridSearchCV object
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)

# Fit it to the data
logreg_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_)) 
print("Best score is {}".format(logreg_cv.best_score_))

Tuned Logistic Regression Parameters: {'C': 268.2695795279727}
Best score is 0.7708333333333334


Logistic regression also has a `penalty` hyperparameter in addition to $C$. It specifies whether to use 'l1' or 'l2' regularization.
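
With more than one hyperparameter, grid search tries every combination of the listed values. As a small sketch, `ParameterGrid` from scikit-learn can enumerate those combinations. The three values of `C` are picked only to keep the output short, and `demo_grid` is a name made up for this example.

```python
from sklearn.model_selection import ParameterGrid

# A small illustrative grid (values chosen only for this example)
demo_grid = {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}

# ParameterGrid yields one dict per combination: 3 values of C x 2 penalties
candidates = list(ParameterGrid(demo_grid))
print("Number of candidates:", len(candidates))
for params in candidates:
    print(params)
```

With `cv=5`, each of these 6 candidates is fitted 5 times, so the cost grows quickly as values or hyperparameters are added.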

In [12]:
# Create the hyperparameter grid
param_grid = {'C': c_space, 'penalty': ['l1', 'l2']}

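# Instantiate the GridSearchCV object
# (the 'l1' penalty needs a solver that supports it, such as 'liblinear';
# 'lbfgs', the default solver in recent scikit-learn versions, does not)
logreg_cv = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid, cv=5)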

# Fit it to the training data
logreg_cv.fit(X_train, y_train)

# Print the optimal parameters and best score
print("Tuned Logistic Regression Parameter: {}".format(logreg_cv.best_params_))
print("Tuned Logistic Regression Accuracy: {}".format(logreg_cv.best_score_))

Tuned Logistic Regression Parameter: {'C': 31.622776601683793, 'penalty': 'l2'}
Tuned Logistic Regression Accuracy: 0.7673913043478261
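
The score reported by `best_score_` is a cross-validated score on the training data. Because the search was fitted on the training set only, the hold-out set is still available for a final check. The sketch below illustrates the idea on synthetic data and, to stay short, tunes only `C`. The data and the names `search`, `X_tr` and `X_te` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42
)

# Tune C with 5-fold cross-validation on the training set only
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": np.logspace(-3, 3, 7)}, cv=5
)
search.fit(X_tr, y_tr)

# By default the best model is refit on the whole training set,
# so the search object can score the hold-out set directly
print("Best C: {}".format(search.best_params_["C"]))
print("Cross-validated accuracy: {}".format(search.best_score_))
print("Hold-out accuracy: {}".format(search.score(X_te, y_te)))
```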


Another type of regularized regression is the elastic net, whose penalty is a mix of the L1 and L2 penalties. The example below tunes `l1_ratio`, the hyperparameter that controls this mix, for an elastic net regressor.

In [None]:
# Import necessary modules
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, GridSearchCV

# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)

# Create the hyperparameter grid
l1_space = np.linspace(0, 1, 30)
param_grid = {'l1_ratio': l1_space}

# Instantiate the ElasticNet regressor
elastic_net = ElasticNet()

# Setup the GridSearchCV object
gm_cv = GridSearchCV(elastic_net, param_grid, cv=5)

# Fit it to the training data
gm_cv.fit(X_train, y_train)

# Predict on the test set and compute metrics
y_pred = gm_cv.predict(X_test)
r2 = gm_cv.score(X_test, y_test)
mse = mean_squared_error(y_test, y_pred)
print("Tuned ElasticNet l1 ratio: {}".format(gm_cv.best_params_))
print("Tuned ElasticNet R squared: {}".format(r2))
print("Tuned ElasticNet MSE: {}".format(mse))

## Hyperparameter tuning with Random Search (RandomizedSearchCV)

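When we are dealing with multiple hyperparameters, grid search may be computationally expensive, because the number of candidates is the product of the number of values tried for each hyperparameter. Random search instead evaluates a fixed number of randomly sampled candidates. For the same computational budget it can outperform grid search, mainly when the hyperparameter space is large.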
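
To get a feel for the difference in cost, the sketch below counts the candidates of a full grid and then draws a small random sample from the same space with `ParameterSampler`. The space mirrors the decision tree example that follows. The names `demo_space` and `demo_dist` and the choice of 10 samples are assumptions for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Full grid over the space: every combination is a candidate
demo_space = {"max_depth": [3, None],
              "max_features": list(range(1, 9)),
              "min_samples_leaf": list(range(1, 9)),
              "criterion": ["gini", "entropy"]}
n_grid = len(ParameterGrid(demo_space))
print("Grid search candidates:", n_grid)  # 2 * 8 * 8 * 2 combinations

# Random search: draw a fixed number of candidates from distributions
# (randint(1, 9) samples integers from 1 to 8 inclusive)
demo_dist = {"max_depth": [3, None],
             "max_features": randint(1, 9),
             "min_samples_leaf": randint(1, 9),
             "criterion": ["gini", "entropy"]}
sampled = list(ParameterSampler(demo_dist, n_iter=10, random_state=0))
print("Random search candidates:", len(sampled))
for params in sampled[:3]:
    print(params)
```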

In [23]:
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

In [25]:
# Setup the parameters and distributions to sample from
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiate a Decision Tree classifier
tree = DecisionTreeClassifier()

# Instantiate the RandomizedSearchCV object
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# Fit it to the data
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

Tuned Decision Tree Parameters: {'criterion': 'entropy', 'max_depth': 3, 'max_features': 6, 'min_samples_leaf': 5}
Best score is 0.7513020833333334


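When both search the same set of candidate values, RandomizedSearchCV cannot outperform GridSearchCV, because it evaluates only a sample of the candidates that the grid search tries exhaustively. Random search is valuable because it saves computation time.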
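
The computation budget of `RandomizedSearchCV` is set with `n_iter`, which defaults to 10 sampled candidates. The sketch below, on synthetic data (an assumption for illustration), raises it to 20 and fixes `random_state` so that the same candidates are drawn on every run.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 8 features (an assumption for illustration)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

demo_dist = {"max_depth": [3, None],
             "max_features": randint(1, 9),
             "min_samples_leaf": randint(1, 9),
             "criterion": ["gini", "entropy"]}

# n_iter sets how many candidates are sampled; random_state makes the draw repeatable
search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0), demo_dist,
                            n_iter=20, cv=5, random_state=0)
search.fit(X_demo, y_demo)

print("Candidates evaluated:", len(search.cv_results_["params"]))
print("Best parameters: {}".format(search.best_params_))
print("Best score is {}".format(search.best_score_))
```

A larger `n_iter` explores more of the space at a proportionally higher cost.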