# Import Libraries and Dataset

In [None]:
import pandas as pd
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = pd.read_csv(url, header=None)

In [None]:
dataframe.head()

In [None]:
dataframe.shape

This is the Sonar dataset: 208 rows, each with 60 numeric input features followed by a class label (`M` for mine, `R` for rock). It is a staple benchmark for binary classification.

In [None]:
data = dataframe.values
# split into input features (first 60 columns) and the class label (last column)
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
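Because the last column holds string labels, `dataframe.values` produces a NumPy array of dtype `object`. scikit-learn converts this on its own, but casting the features and encoding the labels explicitly makes the types obvious. A minimal sketch, using a small hypothetical frame laid out like `sonar.csv` in place of the downloaded data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical 4-row frame with the same layout as sonar.csv:
# numeric feature columns followed by a string label column ('M' or 'R').
frame = pd.DataFrame([
    [0.02, 0.37, 0.15, "R"],
    [0.45, 0.52, 0.09, "M"],
    [0.26, 0.58, 0.81, "M"],
    [0.10, 0.12, 0.22, "R"],
])

data = frame.values                  # mixed column types -> dtype=object array
X = data[:, :-1].astype("float32")   # cast the feature columns to floats
encoder = LabelEncoder()
y = encoder.fit_transform(data[:, -1].astype(str))  # alphabetical: 'M' -> 0, 'R' -> 1

print(X.dtype, X.shape)
print(encoder.classes_, y)
```

`encoder.inverse_transform` maps the integer predictions back to `M`/`R` if needed.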

# Define Model

In [None]:
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV

model = LogisticRegression()

# Summary of Searches

Both search classes evaluate each candidate hyperparameter configuration with cross-validation, hence the "CV" in `RandomizedSearchCV` and `GridSearchCV`. Their `cv` argument accepts either an integer number of folds or a configured cross-validation object. Here we pass a cross-validation object to gain more control over model evaluation and to make the evaluation procedure explicit.
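The two forms of `cv` can be compared with `cross_val_score`, which runs the same kind of cross-validated evaluation the search classes perform for each candidate. This sketch uses synthetic data from `make_classification` as an assumed stand-in for the sonar arrays:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the sonar data: 208 rows, 60 features, 2 classes.
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=1)
model_demo = LogisticRegression(max_iter=1000)

# Option 1: an integer; for a classifier this means stratified 5-fold CV (no repeats).
scores_int = cross_val_score(model_demo, X_demo, y_demo, cv=5, scoring="accuracy")

# Option 2: an explicit splitter object; 10 folds x 3 repeats = 30 scores.
cv_demo = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores_obj = cross_val_score(model_demo, X_demo, y_demo, cv=cv_demo, scoring="accuracy")

print(len(scores_int), len(scores_obj), cv_demo.get_n_splits())
print(f"mean accuracy: {scores_obj.mean():.3f} (std {scores_obj.std():.3f})")
```

The object form is what lets the notebook fix `n_repeats` and `random_state`, which an integer cannot express.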

Both classes also take a `scoring` argument: a string naming the metric to optimize. The search always maximizes this score, so better models must produce larger values; it never minimizes a loss function directly. For classification this could be `accuracy`; for regression it could be `neg_mean_absolute_error`, the negated mean absolute error, where values closer to zero represent less prediction error.
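To see the sign convention in action, the sketch below scores a plain linear regression with `neg_mean_absolute_error` and checks one fold by hand. The regression data from `make_regression` is an assumption for illustration only, since the sonar task is classification:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (an assumption; the sonar task itself is classification).
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)
cv_reg = KFold(n_splits=5, shuffle=True, random_state=1)

# The scorer returns -MAE so that "larger is better" holds for every scoring string.
neg_mae = cross_val_score(LinearRegression(), X_reg, y_reg, cv=cv_reg,
                          scoring="neg_mean_absolute_error")
print(neg_mae)          # every value is <= 0; closer to 0 means less error
print(-neg_mae.mean())  # flip the sign back to report an ordinary MAE

# Check the first fold by hand: the score equals minus the plain MAE.
train_idx, test_idx = next(cv_reg.split(X_reg))
fold_model = LinearRegression().fit(X_reg[train_idx], y_reg[train_idx])
manual_mae = mean_absolute_error(y_reg[test_idx], fold_model.predict(X_reg[test_idx]))
print(manual_mae, -neg_mae[0])
```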

# Random Search for Hyperparameter Tuning

`RandomizedSearchCV` needs a search space, defined as a dictionary whose keys are the model's hyperparameter argument names and whose values are either discrete lists or distributions to sample from during the random search. For evaluation, `RepeatedStratifiedKFold` repeats stratified k-fold cross-validation `n_repeats` times with a different randomization in each repetition; with 10 folds and 3 repeats, every candidate is scored on 30 train/test splits.
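scikit-learn's `ParameterSampler` draws candidates from a dictionary like this one, and it is the helper `RandomizedSearchCV` relies on for sampling. A minimal sketch with a reduced space (only `solver` and `C`, chosen here for brevity) shows what a handful of draws look like:

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Lists are sampled uniformly; distributions (objects with an .rvs method)
# are sampled by calling .rvs, so C spreads across orders of magnitude.
space_demo = {
    "solver": ["newton-cg", "lbfgs", "liblinear"],
    "C": loguniform(1e-5, 100),
}

# Draw 5 candidate configurations, as a search with n_iter=5 would.
candidates = list(ParameterSampler(space_demo, n_iter=5, random_state=1))
for params in candidates:
    print(params)
```

Sampling `C` log-uniformly gives each decade between 1e-5 and 100 roughly equal coverage, which a uniform distribution over the same range would not.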

In [None]:
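# define the evaluation: 10-fold stratified CV, repeated 3 times (30 splits)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define the search space
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
# None disables regularization (scikit-learn < 1.2 spelled it 'none').
# Not every solver supports every penalty (e.g. 'elasticnet' requires 'saga'),
# so unsupported combinations fail to fit and are scored NaN (default error_score).
space['penalty'] = [None, 'l1', 'l2', 'elasticnet']
# sample C log-uniformly between 1e-5 and 100
space['C'] = loguniform(1e-5, 100)
# define search: 500 random candidates, evaluated in parallel on all cores
search = RandomizedSearchCV(model,
                            space,
                            n_iter=500,
                            scoring='accuracy',
                            n_jobs=-1,
                            cv=cv,
                            random_state=1)
# execute search
result = search.fit(X, y)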

In [None]:
# Summarize Results
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)
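The search space above pairs solvers and penalties that `LogisticRegression` does not support together, so part of the budget is spent on fits that fail. One way to avoid this, sketched here on synthetic `make_classification` data standing in for the sonar arrays, is to pass a list of dictionaries: the search first picks one dictionary uniformly, then samples within it, so only the pairs listed together are tried. The specific grouping below is an assumption based on the documented solver/penalty support.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in data (the notebook itself uses the sonar X, y).
X_demo, y_demo = make_classification(n_samples=208, n_features=60, random_state=1)

# Each dict lists only combinations LogisticRegression accepts:
# liblinear handles l1 and l2; newton-cg and lbfgs handle l2 here.
space_valid = [
    {"solver": ["liblinear"], "penalty": ["l1", "l2"], "C": loguniform(1e-5, 100)},
    {"solver": ["newton-cg", "lbfgs"], "penalty": ["l2"], "C": loguniform(1e-5, 100)},
]

search_demo = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    space_valid,
    n_iter=20,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
    random_state=1,
    error_score="raise",  # fail loudly instead of silently scoring a bad pair as NaN
)
search_demo.fit(X_demo, y_demo)
print("Best Score: %s" % search_demo.best_score_)
print("Best Hyperparameters: %s" % search_demo.best_params_)
```

Setting `error_score="raise"` is a cheap check that the space really contains only valid combinations.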

# Grid Search for Hyperparameter Tuning

With `GridSearchCV`, the search space must be a discrete grid, and every combination in it is evaluated. Instead of a log-uniform distribution for `C`, we therefore specify discrete values on a log scale.
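Because a grid search fits every combination, its cost can be computed up front. A quick sketch using scikit-learn's `ParameterGrid`, with a grid mirroring the one in the cell below:

```python
from sklearn.model_selection import ParameterGrid, RepeatedStratifiedKFold

# Same discrete grid as the grid-search cell.
grid = {
    "solver": ["newton-cg", "lbfgs", "liblinear"],
    "penalty": [None, "l1", "l2", "elasticnet"],
    "C": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100],
}
cv_demo = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

n_candidates = len(ParameterGrid(grid))          # 3 * 4 * 8 = 96 combinations
n_fits = n_candidates * cv_demo.get_n_splits()   # each evaluated on 30 splits
print(n_candidates, n_fits)
```

By comparison, the random search with `n_iter=500` performs 500 × 30 = 15,000 fits, so here the grid is the cheaper of the two.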

In [None]:
model_2 = LogisticRegression()

# define evaluation: the same 30-split procedure as the random search
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define search space: every combination below is evaluated
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
# None disables regularization; unsupported solver/penalty pairs score NaN
space['penalty'] = [None, 'l1', 'l2', 'elasticnet']
# discrete values of C on a log scale
space['C'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
# define search
search = GridSearchCV(model_2, space, scoring='accuracy', n_jobs=-1, cv=cv)
# execute search
result = search.fit(X, y)

In [None]:
# Summarize the Results
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)