## Hyperparameter Tuning

In this exercise you will build a neural network and tune its **Model Parameters** to find the values with which the model performs best.

You will use:

1. `Grid Search`
2. `Random Search`


### 1. Import the Packages

In [1]:
!pip install wrangle



In [32]:
import os
import pandas as pd
import tensorflow as tf  # needed for the tf.keras.* references used below
import wrangle as wr
from numpy import nan

from keras.utils import to_categorical
from keras.activations import *
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.preprocessing import StandardScaler


In [147]:
#Read the dataset with pandas
df = pd.read_csv("/content/data.csv")
df.head()

|  | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst | Unnamed: 32 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | NaN |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | NaN |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | NaN |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | NaN |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | NaN |


### 2. Basic Data Cleaning


1.   Drop the `Unnamed: 32` and `id` columns
2.   Consider the `diagnosis` column as the labels (y) and the rest of the columns as features (X)

**Note:**

Convert the labels to 0 and 1, where 1 corresponds to M and 0 corresponds to B



In [148]:
def breast_cancer(df):
    '''Load and preprocess(cleaning) the dataset
    Input: Dataframe
    Output: x,y
    x:Features
    y:Labels in form of 0 and 1
    '''
    x = df.drop(["Unnamed: 32", "id", "diagnosis"], axis = 1)
    y = df["diagnosis"].map({"M":1, "B": 0})

    return x, y

In [149]:
#Call the Datacleaning Function
x, y = breast_cancer(df)
# Normalize every feature in x to mean 0, std 1 with scikit-learn's StandardScaler
scaler = StandardScaler()
x = scaler.fit_transform(x)

#Initialise the input feature dimension
input_dim = x.shape[1]
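
To make the scaling step concrete, here is a minimal pure-Python sketch of what standardization does to a single feature column: subtract the column mean, then divide by the population standard deviation (dividing by n, which is the convention `StandardScaler` follows). The helper name `standardize` is hypothetical, and the sample values are the five `radius_mean` entries shown in the preview above.

```python
import math

def standardize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    # Population variance: divide by n, not n - 1.
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in column]

# First five radius_mean values from the dataset preview.
radius_mean = [17.99, 20.57, 19.69, 11.42, 20.29]
scaled = standardize(radius_mean)

# Verify the result: the mean should be ~0 and the standard deviation ~1.
scaled_mean = sum(scaled) / len(scaled)
scaled_std = math.sqrt(sum((v - scaled_mean) ** 2 for v in scaled) / len(scaled))
print(round(scaled_mean, 6), round(scaled_std, 6))
```

Putting all 30 features on the same scale keeps columns with large raw values, such as `area_mean`, from dominating the weight updates.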

In [150]:
x.shape


(569, 30)

### 3. Decide on the Parameters to be tuned and create the model
We will be creating a 2-layer Neural Network.

In this example we will tune only the model parameters; the hyperparameters can be tuned in a later exercise.


**Model Parameters to be tuned**
1. `first_neurons`: Number of neurons in the first layer.
2. `activation`: Activation function to be used in the first layer.
3. `kernel_initializer`: Initializer used in both layers.
4. `optimizer`: Optimizer to be used when compiling the model.


 
 **Hyper Parameters**
 1. `epochs`
 2. `batch_size`
 3. `dropout_rate`

 
----------------------------------------------------------------
**Create the array of values for each parameter**
1. first_neurons with values 8 and 9
2. activation with values relu and tanh
3. kernel_initializer with values uniform and he_uniform
4. optimizer with values Adam and SGD



**Note: Make sure to initialize the values in the same order**

In [151]:
# Model Design Components
first_neurons = [8, 9]
activation = [tf.keras.activations.relu, tf.keras.activations.tanh] 
kernel_initializer = [tf.keras.initializers.he_uniform, tf.keras.initializers.random_uniform]
optimizer = ["Adam", "SGD"]


# Hyperparameters
epochs = [10]
batch_size = [1024]
dropout_rate = [0.0]

### 4. Creating the Model

In [152]:


# Function to create model, required for KerasClassifier
def create_model(first_neuron=9, activation='relu',
                 kernel_initializer=tf.keras.initializers.he_uniform,
                 dropout_rate=0, optimizer='Adam'):
    '''
    Input: Model params and hyperparams to be tuned
    Output: Compiled model
    '''

    # 1. Create sequential model
    model = Sequential()

    # 2. Add the first Dense layer with first_neuron units;
    #    input_dim is the global feature count, while kernel_initializer
    #    and activation come from the function parameters
    model.add(Dense(units=first_neuron,
                    input_dim=input_dim,
                    kernel_initializer=kernel_initializer,
                    activation=activation))

    # 3. Add dropout with the rate from the function parameter dropout_rate
    model.add(Dropout(rate=dropout_rate))

    # 4. Add the second Dense layer with
    #    number of neurons = 1
    #    kernel_initializer from the function parameter
    #    activation = sigmoid
    model.add(Dense(1,
                    kernel_initializer=kernel_initializer,
                    activation="sigmoid"))

    # 5. Compile model with
    #    loss = 'binary_crossentropy'
    #    optimizer from the function parameter
    #    metrics = accuracy
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=["accuracy"])

    return model
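
As a quick sanity check on the size of this network, the arithmetic below counts its trainable parameters. It is a sketch under two assumptions: the first Dense layer has `first_neuron` units fed by the 30 input features (as the parameter list in section 3 describes), and the output layer is a single sigmoid unit. The helper `dense_param_count` is hypothetical. Dropout adds no trainable parameters.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

input_dim = 30  # number of feature columns after cleaning

for first_neuron in (8, 9):
    hidden = dense_param_count(input_dim, first_neuron)  # first Dense layer
    output = dense_param_count(first_neuron, 1)          # sigmoid output layer
    print(first_neuron, hidden + output)
# prints "8 257" and then "9 289"
```

Both candidate widths give a very small model, so each of the fits in the searches below is cheap.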

### 5. Create a Keras Classifier

In [153]:
model = KerasClassifier(build_fn=create_model)



### 6. Hyperparameter Tuning 1 - Grid Search
1. Create a GridSearchCV model with parameters
    - Estimator
    - Param_grid
    - n_jobs=-1
    - cv=3
    - verbose=2

2. Fit the model with x,y
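
To see where the number of fits in a grid search comes from, the sketch below enumerates every combination of the values chosen in section 3 with `itertools.product`. The dictionary `search_space` is a stand-in that uses plain strings instead of the Keras objects, so it runs without TensorFlow.

```python
from itertools import product

# String stand-ins for the values defined in section 3.
search_space = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh"],
    "kernel_initializer": ["he_uniform", "random_uniform"],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10],
    "batch_size": [1024],
    "dropout_rate": [0.0],
}

keys = list(search_space)
# A grid search tries every combination exactly once.
candidates = [dict(zip(keys, values)) for values in product(*search_space.values())]

cv = 3  # each candidate is trained once per fold
print(len(candidates), len(candidates) * cv)  # 16 48
```

Only the four two-valued parameters multiply the grid (2 x 2 x 2 x 2 = 16); the single-valued entries leave the count unchanged.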
  

In [156]:
#parameter grid
param_grid = dict(epochs=epochs, 
                  batch_size=batch_size, 
                  optimizer=optimizer,
                  dropout_rate=dropout_rate,
                  activation=activation,
                  kernel_initializer=kernel_initializer,
                  first_neuron=first_neurons)

In [157]:
param_grid

{'epochs': [10],
 'batch_size': [1024],
 'optimizer': ['Adam', 'SGD'],
 'dropout_rate': [0.0],
 'activation': [<function keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0)>,
  <function keras.activations.tanh(x)>],
 'kernel_initializer': [keras.initializers.initializers_v2.HeUniform,
  keras.initializers.initializers_v2.RandomUniform],
 'first_neuron': [8, 9]}
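
The grid mixes two kinds of keys: arguments of `create_model` (such as `first_neuron`) and training arguments (`epochs`, `batch_size`). The legacy `KerasClassifier` wrapper sorts them out, roughly speaking, by checking which function accepts each name. The sketch below imitates that routing in plain Python; `split_params`, `build`, and `fit` are hypothetical stand-ins, not Keras functions.

```python
import inspect

def build(first_neuron=9, activation="relu", kernel_initializer="he_uniform",
          dropout_rate=0.0, optimizer="Adam"):
    """Stand-in for create_model; only its signature matters here."""

def fit(x, y, epochs=1, batch_size=32):
    """Stand-in for the model's fit method."""

def split_params(params, build_fn, fit_fn):
    """Route each candidate setting to the function that accepts it."""
    build_args = inspect.signature(build_fn).parameters
    fit_args = inspect.signature(fit_fn).parameters
    build_kwargs = {k: v for k, v in params.items() if k in build_args}
    fit_kwargs = {k: v for k, v in params.items() if k in fit_args}
    return build_kwargs, fit_kwargs

# One candidate from the grid, with strings in place of the Keras objects.
candidate = {"epochs": 10, "batch_size": 1024, "optimizer": "Adam",
             "dropout_rate": 0.0, "activation": "relu",
             "kernel_initializer": "he_uniform", "first_neuron": 8}

build_kwargs, fit_kwargs = split_params(candidate, build, fit)
print(sorted(fit_kwargs))    # ['batch_size', 'epochs']
print(sorted(build_kwargs))
```

This is why a key in the grid must match an argument name exactly: a misspelled key would match neither function.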

In [158]:
#Create the GridSearchCV model
grid = GridSearchCV(estimator = model, 
                    param_grid = param_grid, 
                    n_jobs = -1, 
                    cv = 3, 
                    verbose = 2)

#Fit the model and return the result
grid_result = grid.fit(x, y)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [159]:
#Print the Best Params
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

#Explore the others
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.933241 using {'activation': <function relu at 0x7fcba3180560>, 'batch_size': 1024, 'dropout_rate': 0.0, 'epochs': 10, 'first_neuron': 8, 'kernel_initializer': <class 'keras.initializers.initializers_v2.RandomUniform'>, 'optimizer': 'Adam'}
0.636517 (0.171341) with: {'activation': <function relu at 0x7fcba3180560>, 'batch_size': 1024, 'dropout_rate': 0.0, 'epochs': 10, 'first_neuron': 8, 'kernel_initializer': <class 'keras.initializers.initializers_v2.HeUniform'>, 'optimizer': 'Adam'}
0.806451 (0.098392) with: {'activation': <function relu at 0x7fcba3180560>, 'batch_size': 1024, 'dropout_rate': 0.0, 'epochs': 10, 'first_neuron': 8, 'kernel_initializer': <class 'keras.initializers.initializers_v2.HeUniform'>, 'optimizer': 'SGD'}
0.933241 (0.027581) with: {'activation': <function relu at 0x7fcba3180560>, 'batch_size': 1024, 'dropout_rate': 0.0, 'epochs': 10, 'first_neuron': 8, 'kernel_initializer': <class 'keras.initializers.initializers_v2.RandomUniform'>, 'optimizer': 'Adam'}
0.
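
The reported best score is simply the highest mean cross-validated accuracy among the candidates. A minimal sketch, using only the three complete rows of the log above (the log is truncated) and shortened labels of my own:

```python
# (mean accuracy, standard deviation, short label) copied from the log above.
results = [
    (0.636517, 0.171341, "relu / HeUniform / Adam"),
    (0.806451, 0.098392, "relu / HeUniform / SGD"),
    (0.933241, 0.027581, "relu / RandomUniform / Adam"),
]

# Pick the row with the highest mean accuracy.
best_mean, best_std, best_label = max(results, key=lambda row: row[0])
print(best_label, best_mean)
```

The standard deviation is worth reading alongside the mean: the best row here also has the smallest spread across the three folds.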

### 7. Hyperparameter Tuning 2 - Randomized Search
1. Create a RandomizedSearchCV model with parameters
    - Estimator as model
    - param_dist
    - n_iter=10 (the default)
    - n_jobs=-1
    - cv=3
    - verbose=2

2. Fit the model with x,y
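
Random search evaluates only a sample of the grid instead of every combination. The recorded run in this section evaluates 10 candidates, which is scikit-learn's default `n_iter`. The sketch below mimics that with the standard library: when every parameter is given as a list, candidates are drawn without replacement, so `random.sample` is a reasonable stand-in. The dictionary `search_space` and the seed are assumptions for illustration.

```python
import random
from itertools import product

# String stand-ins for the values defined in section 3.
search_space = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh"],
    "kernel_initializer": ["he_uniform", "random_uniform"],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10],
    "batch_size": [1024],
    "dropout_rate": [0.0],
}

keys = list(search_space)
all_candidates = [dict(zip(keys, values)) for values in product(*search_space.values())]

n_iter = 10             # number of candidates to try
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = rng.sample(all_candidates, n_iter)  # no candidate is drawn twice

cv = 3
print(len(sampled), len(sampled) * cv)  # 10 30
```

With only 16 combinations the saving is small, but the gap widens quickly as more values are added to each list.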
  

In [160]:
param_dist = dict(epochs=epochs, 
                  batch_size=batch_size, 
                  optimizer=optimizer,
                  dropout_rate=dropout_rate,
                  activation=activation,
                  kernel_initializer=kernel_initializer,
                  first_neuron=first_neurons)

In [161]:
#Create the RandomizedSearchCV model
random_search = RandomizedSearchCV(estimator = model, 
                    param_distributions = param_dist, 
                    n_jobs = -1, 
                    cv = 3, 
                    verbose = 2)


#Fit the model
random_search.fit(x, y)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


RandomizedSearchCV(cv=3,
                   estimator=<keras.wrappers.scikit_learn.KerasClassifier object at 0x7fcb9ac97890>,
                   n_jobs=-1,
                   param_distributions={'activation': [<function relu at 0x7fcba3180560>,
                                                       <function tanh at 0x7fcba31808c0>],
                                        'batch_size': [1024],
                                        'dropout_rate': [0.0], 'epochs': [10],
                                        'first_neuron': [8, 9],
                                        'kernel_initializer': [<class 'keras.initializers.initializers_v2.HeUniform'>,
                                                               <class 'keras.initializers.initializers_v2.RandomUniform'>],
                                        'optimizer': ['Adam', 'SGD']},
                   verbose=2)

In [162]:
# Print the best params 
print("Best: %f using %s" % (random_search.best_score_, random_search.best_params_))


#Explore the others
means = random_search.cv_results_['mean_test_score']
stds = random_search.cv_results_['std_test_score']
params = random_search.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.927968 using {'optimizer': 'Adam', 'kernel_initializer': <class 'keras.initializers.initializers_v2.RandomUniform'>, 'first_neuron': 8, 'epochs': 10, 'dropout_rate': 0.0, 'batch_size': 1024, 'activation': <function tanh at 0x7fcba31808c0>}
0.539590 (0.219967) with: {'optimizer': 'Adam', 'kernel_initializer': <class 'keras.initializers.initializers_v2.HeUniform'>, 'first_neuron': 8, 'epochs': 10, 'dropout_rate': 0.0, 'batch_size': 1024, 'activation': <function tanh at 0x7fcba31808c0>}
0.650135 (0.052311) with: {'optimizer': 'Adam', 'kernel_initializer': <class 'keras.initializers.initializers_v2.HeUniform'>, 'first_neuron': 8, 'epochs': 10, 'dropout_rate': 0.0, 'batch_size': 1024, 'activation': <function relu at 0x7fcba3180560>}
0.718936 (0.121653) with: {'optimizer': 'Adam', 'kernel_initializer': <class 'keras.initializers.initializers_v2.HeUniform'>, 'first_neuron': 9, 'epochs': 10, 'dropout_rate': 0.0, 'batch_size': 1024, 'activation': <function relu at 0x7fcba3180560>}
0.912

### Save your answers by running the cell below

In [None]:
import pickle
with open('grid1.pkl', 'wb') as handle:
  pickle.dump(grid.param_grid, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('grid2.pkl', 'wb') as handle:
  pickle.dump(grid.n_jobs, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('grid3.pkl', 'wb') as handle:
  pickle.dump((grid.classes_).tolist(), handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('ran1.pkl', 'wb') as handle:
  pickle.dump(random_search.param_distributions, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('ran2.pkl', 'wb') as handle:
  pickle.dump(random_search.n_iter, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('ran3.pkl', 'wb') as handle:
  pickle.dump(random_search.n_splits_, handle, protocol=pickle.HIGHEST_PROTOCOL)

save_model=create_model()
save_model.save('model.h5')
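
If you want to confirm that a saved file holds what you expect, you can load it back and compare. The sketch below does a round trip in a temporary directory with a small stand-in dictionary (`settings` is hypothetical, not the real `param_grid`).

```python
import os
import pickle
import tempfile

# Small stand-in for a parameter dictionary.
settings = {"epochs": [10], "batch_size": [1024], "optimizer": ["Adam", "SGD"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "grid1.pkl")
    # Write the object, then read it back from the same file.
    with open(path, "wb") as handle:
        pickle.dump(settings, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == settings)  # True
```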


Don't stop learning! Tune more to explore more.

1. Tune the activations with other values such as 'sigmoid', 'hard_sigmoid', 'linear', etc.
2. Tune the kernel initializers with values such as normal and zero.
3. Tune the optimizers with RMSprop, Adamax, etc.
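
Before launching a larger search, it helps to estimate how many fits it implies. The sketch below assumes you add every value suggested in this list to the original two per parameter; `extended_space` is a hypothetical name and the strings are placeholders for the Keras objects.

```python
from math import prod

# Original values plus the suggested extras.
extended_space = {
    "first_neuron": [8, 9],
    "activation": ["relu", "tanh", "sigmoid", "hard_sigmoid", "linear"],
    "kernel_initializer": ["uniform", "he_uniform", "normal", "zero"],
    "optimizer": ["Adam", "SGD", "RMSprop", "Adamax"],
}

# Grid size is the product of the number of values per parameter.
n_candidates = prod(len(values) for values in extended_space.values())
cv = 3
print(n_candidates, n_candidates * cv)  # 160 480
```

A full grid grows from 16 to 160 candidates, which is where a randomized search with a fixed `n_iter` becomes the more practical choice.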

