#### Hyperparameter Tuning 

CO2: Improve the performance of a multiple linear regression model using the grid search hyperparameter tuning technique (3 Hours)

1. Hyperparameters
2. Common hyperparameters with examples
3. Hyperparameter tuning techniques:
    - grid search
    - random search
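
To make the idea of a hyperparameter concrete, here is a minimal sketch in plain Python, assuming a one-feature linear model trained by gradient descent. The function name `fit_line` and the toy data are illustrative only. `learning_rate` and `epochs` are hyperparameters chosen before training, while `w` and `b` are parameters learned from the data.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=500):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: learned from the data
    n = len(xs)
    for _ in range(epochs):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# toy data generated from y = 2x + 1 (illustrative, not from the course dataset)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

# same data, same model, different hyperparameter values
w_good, b_good = fit_line(xs, ys, learning_rate=0.05, epochs=2000)
w_short, b_short = fit_line(xs, ys, learning_rate=0.05, epochs=5)

print("2000 epochs:", w_good, b_good)
print("   5 epochs:", w_short, b_short)
```

The two runs differ only in `epochs`, yet they end with different learned values of `w` and `b`. This is why hyperparameters are worth tuning.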

#### Grid Search   
**Use Keras Models in scikit-learn**: Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class from the SciKeras package (module scikeras.wrappers).   

Grid search is a model hyperparameter optimization technique.
In scikit-learn, this technique is provided in the GridSearchCV class.

When constructing this class, provide a dictionary of hyperparameters to evaluate in the param_grid argument.  
This dictionary maps each model parameter name to a list of values to try.   
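
As a rough illustration of what such a dictionary expands to, the sketch below enumerates every combination with `itertools.product`. The grid values are the ones used in the example later in this section. `expand_grid` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
from itertools import product


def expand_grid(param_grid):
    """Return one dict per combination of the values in param_grid."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


param_grid = {"batch_size": [10, 20, 40, 60, 80, 100], "epochs": [10, 50, 100]}
combinations = expand_grid(param_grid)

n_folds = 3
print(len(combinations))            # number of candidate settings (6 * 3)
print(len(combinations) * n_folds)  # number of model fits during cross-validation
```

The number of fits grows multiplicatively with each added hyperparameter. With the default refit=True, GridSearchCV also fits the best setting once more on the full dataset.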

By default, GridSearchCV optimises the estimator's own score method, which is accuracy for classifiers,  
but another metric can be specified in the scoring argument of the GridSearchCV constructor.  
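
A minimal sketch of choosing a different metric, assuming a ridge regression (a regularised multiple linear regression) on synthetic data. The dataset, the `alpha` values, and the choice of `neg_mean_squared_error` are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# synthetic multiple-regression data: 200 samples, 5 features (illustrative)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

demo_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
demo_search = GridSearchCV(
    estimator=Ridge(),
    param_grid=demo_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximises scores, so MSE is negated
    cv=5,
)
demo_search.fit(X_demo, y_demo)

print(demo_search.best_params_, demo_search.best_score_)
```

Because scikit-learn always maximises the score, error metrics are exposed in negated form. A best score closer to zero means a lower mean squared error.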

The GridSearchCV process constructs and evaluates one model for each combination of parameters.  
It uses cross-validation to evaluate each candidate model.
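
The procedure can be sketched from scratch in plain Python. This is a simplified illustration under stated assumptions, not how GridSearchCV is implemented: the model is a one-feature ridge regression without intercept, the folds are interleaved rather than shuffled, and all function names and data are hypothetical.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for the model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return mean((w * x - y) ** 2 for x, y in zip(xs, ys))


def grid_search(xs, ys, param_grid, k=3):
    """Cross-validate every parameter combination and return the best one."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, test in k_fold_indices(len(xs), k):
            # fit on the training folds, score on the held-out fold
            w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], **params)
            fold_scores.append(mse(w, [xs[i] for i in test], [ys[i] for i in test]))
        results.append((mean(fold_scores), params))
    best = min(results, key=lambda r: r[0])  # lowest mean error wins
    return best, results


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1, 9.0]  # roughly y = 2x

(best_score, best_params), results = grid_search(xs, ys, {"alpha": [0.0, 1.0, 10.0, 100.0]})
print(best_params, best_score)
```

Each candidate is scored only on data it was not trained on, and the candidate with the best mean score across folds is selected.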

The Pima Indians onset of diabetes classification dataset is used in the example below.  
It is a small, easy-to-use dataset in which all attributes are numerical.  

In [1]:
# Use scikit-learn to grid search the batch size and epochs
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier

import warnings
warnings.filterwarnings('ignore')

In [2]:
def create_model():
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In [3]:
seed = 7
tf.random.set_seed(seed)

dataset = np.loadtxt("../Data/pima-indians-diabetes.csv", delimiter=",")

X = dataset[:,0:8]
Y = dataset[:,8]

In [6]:
len(Y)

768

In [7]:
model = KerasClassifier(model=create_model, verbose=0)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)

In [8]:
param_grid

{'batch_size': [10, 20, 40, 60, 80, 100], 'epochs': [10, 50, 100]}

In [9]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)

In [10]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.704427 using {'batch_size': 10, 'epochs': 100}
0.554688 (0.086821) with: {'batch_size': 10, 'epochs': 10}
0.657552 (0.023939) with: {'batch_size': 10, 'epochs': 50}
0.704427 (0.039879) with: {'batch_size': 10, 'epochs': 100}
0.606771 (0.025780) with: {'batch_size': 20, 'epochs': 10}
0.669271 (0.012890) with: {'batch_size': 20, 'epochs': 50}
0.683594 (0.013902) with: {'batch_size': 20, 'epochs': 100}
0.519531 (0.032369) with: {'batch_size': 40, 'epochs': 10}
0.622396 (0.028764) with: {'batch_size': 40, 'epochs': 50}
0.674479 (0.030978) with: {'batch_size': 40, 'epochs': 100}
0.537760 (0.054532) with: {'batch_size': 60, 'epochs': 10}
0.652344 (0.043146) with: {'batch_size': 60, 'epochs': 50}
0.684896 (0.022628) with: {'batch_size': 60, 'epochs': 100}
0.550781 (0.072098) with: {'batch_size': 80, 'epochs': 10}
0.618490 (0.009744) with: {'batch_size': 80, 'epochs': 50}
0.627604 (0.035132) with: {'batch_size': 80, 'epochs': 100}
0.567708 (0.036828) with: {'batch_size': 100, 'epochs':

The same approach can be used to tune:   
1. Batch Size and Number of Epochs   
2. Training Optimization Algorithm - SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam   
3. Learning Rate and Momentum   
4. Network Weight Initialization   
5. Neuron Activation Function - softmax, softplus, softsign, relu, tanh, sigmoid, hard_sigmoid, linear   
6. Dropout Regularization   
7. Number of Neurons in the Hidden Layer   

#### Random Search   
Random search defines a search space as a bounded domain of hyperparameter values and randomly samples points in that domain, instead of evaluating every combination as grid search does.   
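
A minimal sketch of the idea in plain Python, assuming a log-uniform range for a regularisation strength `C` and a categorical `solver` choice. `toy_objective` is a hypothetical stand-in for a cross-validated score, and the helper names are illustrative only.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(objective, n_iter, seed=1):
    """Sample n_iter random settings and keep the one with the highest score."""
    rng = random.Random(seed)
    best_score, best_params = -math.inf, None
    for _ in range(n_iter):
        params = {
            "C": sample_loguniform(rng, 1e-5, 100),
            "solver": rng.choice(["newton-cg", "lbfgs", "liblinear"]),
        }
        score = objective(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


def toy_objective(params):
    # stand-in for a cross-validated score: highest when C is close to 1
    return -abs(math.log10(params["C"]))


best_score, best_params = random_search(toy_objective, n_iter=200)
print(best_score, best_params)
```

The number of evaluations is set by `n_iter`, independently of how many hyperparameters or candidate values the search space contains.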


##### Random Search for a Logistic Regression Model on the Sonar Dataset   

In [11]:
from scipy.stats import loguniform
from pandas import read_csv
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import RandomizedSearchCV

In [12]:
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
data = read_csv(url, header=None).values
X, y = data[:, :-1], data[:, -1]

In [13]:
model = LogisticRegression()

# define evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)

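# define search space
# Note: not every solver/penalty pair is valid. newton-cg and lbfgs accept only
# 'l2' or no penalty, liblinear accepts 'l1' or 'l2', and none of these solvers
# supports 'elasticnet'. Invalid pairs fail to fit and receive a NaN score.
# Recent scikit-learn releases expect None rather than the string 'none'.
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
space['penalty'] = ['none', 'l1', 'l2', 'elasticnet']
space['C'] = loguniform(1e-5, 100)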

# define search
search = RandomizedSearchCV(model, space, n_iter=500, scoring='accuracy', n_jobs=-1, cv=cv, random_state=1)

# execute search
result = search.fit(X, y)

# summarize result
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)

Best Score: 0.7881746031746033
Best Hyperparameters: {'C': 4.878363034905761, 'penalty': 'l2', 'solver': 'newton-cg'}
