

# Hyperparameter Tuning Practice

## *Data Science Unit 4 Sprint 2 Assignment 3*

# Gridsearch Hyperparameters

In the guided project, you learned how to use sklearn's `GridSearchCV` and the `keras-tuner` library to tune the hyperparameters of a neural network model. For your module project you'll continue using these two libraries, but we are going to make things a little more interesting for you. 

Continue to use TensorFlow Keras & a sample of the [Quickdraw dataset](https://github.com/googlecreativelab/quickdraw-dataset) to build a sketch classification model. The dataset has been sampled to only 10 classes and 10000 observations per class. 



**Don't forget to switch to GPU on Colab!**

## 0.1 Imports and installs

In [None]:
# standard library and general-purpose imports 
import math
from time import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# sklearn imports 
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

# keras imports 
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import relu, sigmoid
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.utils import get_file

# required for compatibility between sklearn and keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# install keras-tuner
!pip install keras-tuner
from kerastuner.tuners import RandomSearch, BayesianOptimization, Sklearn
from kerastuner.engine.hyperparameters import HyperParameters

Collecting keras-tuner
  Downloading keras_tuner-1.1.0-py3-none-any.whl (98 kB)
Collecting kt-legacy
  Downloading kt_legacy-1.0.4-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.1.0 kt-legacy-1.0.4




## 0.2 Load quickdraw data

In [None]:
def load_quickdraw10():
    """
    Fill out this doc string, and comment the code, for practice in writing the kind of code that will get you hired. 
    """
    
    URL_ = "https://github.com/LambdaSchool/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/quickdraw10.npz?raw=true"
    
    path_to_zip = get_file('./quickdraw10.npz', origin=URL_, extract=False)

    data = np.load(path_to_zip)
    
    # normalize your image data
    max_pixel_value = 255
    X = data['arr_0']/max_pixel_value
    Y = data['arr_1']
        
    return train_test_split(X, Y, shuffle=True)

In [None]:
X_train, X_test, y_train, y_test = load_quickdraw10()

Downloading data from https://github.com/LambdaSchool/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/quickdraw10.npz?raw=true


In [None]:
X_train.shape

In [None]:
y_train.shape

_____

# Experiment 1

## Tune Hyperparameters using Enhanced GridSearchCV 

We are going to use `GridSearchCV` again to tune a deep learning model. This time, however, we are going to add some functionality to our gridsearch. 

Specifically, we are going to automate the generation of how many nodes to use in a layer and how many layers to use in a model! 

By the way, yes, there is a function within a function. Try to not let that bother you. An alternative to this would be to create a class. If you're up for the challenge give it a shot. However, consider this a stretch goal that you come back to after you finish going through this assignment. 


### Objective 

The objective of this experiment is to show you how to automate the generation of layers and layer nodes for the purposes of gridsearch. <br>
Up until now, we've been manually selecting the number of layers and layer nodes.

In [None]:
# Function to create model, required for KerasClassifier
def create_model(n_layers, first_layer_nodes, last_layer_nodes, act_funct="relu", negative_node_incrementation=True):
    """
    Returns a compiled keras model 
    
    Parameters
    ----------
    n_layers: int 
        Number of hidden layers in model 
        To be clear, this excludes the input and output layer.
        
    first_layer_nodes: int
        Number of nodes in the first hidden layer 

    last_layer_nodes: int
        Number of nodes in the last hidden layer (this is the layer just prior to the output layer)
        
    act_funct: string 
        Name of activation function to use in hidden layers (this excludes the output layer)

    negative_node_incrementation: bool
        True if the number of nodes should decrease from the first to the last hidden layer, 
        False if it should increase
        
    Returns
    -------
    model: keras object 
    """
    
    def gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation=True):
        """
        Generates and returns the number of nodes in each hidden layer. 
        To be clear, this excludes the input and output layer. 

        Note
        ----
        Number of nodes in each layer is linearly incremented. 
        For example, gen_layer_nodes(5, 500, 100) will generate [500, 400, 300, 200, 100]

        Parameters
        ----------
        n_layers: int
            Number of hidden layers
            This value should be 2 or greater 

        first_layer_nodes: int

        last_layer_nodes: int

        Returns
        -------
        layers: list of ints
            Contains number of nodes for each layer 
        """

        # throws an error if n_layers is less than 2 
        assert n_layers >= 2, "n_layers must be 2 or greater"

        layers = []

        # PROTIP: IF YOU WANT THE NODE INCREMENTATION TO BE SPACED DIFFERENTLY
        # THEN YOU'LL NEED TO CHANGE THE WAY THAT IT'S CALCULATED - HAVE FUN!
        # NOTE: the sign of the increment comes from the order of first_layer_nodes and last_layer_nodes,
        # so both branches below use the same formula
        if negative_node_incrementation:
            # first_layer_nodes > last_layer_nodes, so the increment is negative 
            # and the number of nodes decreases for subsequent layers
            nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
            
        else:
            # first_layer_nodes < last_layer_nodes, so the increment is positive 
            # and the number of nodes increases for subsequent layers
            nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)

        nodes = first_layer_nodes

        for i in range(1, n_layers+1):

            layers.append(math.ceil(nodes))

            # increment nodes for next layer 
            nodes = nodes + nodes_increment

        return layers
    
    # create model
    model = Sequential()
    
    n_nodes = gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation)
    
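    # add one Dense layer per entry in n_nodes so that the model has n_layers hidden layers
    for i in range(1, n_layers + 1):
        if i == 1:
            # the first hidden layer also declares the input dimension
            model.add(Dense(first_layer_nodes, input_dim=X_train.shape[1], activation=act_funct))
        else:
            model.add(Dense(n_nodes[i-1], activation=act_funct))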
            
            
    # output layer 
    model.add(Dense(10, # 10 unit/neurons in output layer because we have 10 possible labels to predict  
                    activation='softmax')) # use softmax for a label set greater than 2            
    
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', 
                  optimizer='adam', # adam is a good default optimizer 
                  metrics=['accuracy'])
    
    # do not include model.fit() inside the create_model function
    # KerasClassifier is expecting a compiled model 
    return model


## 1.1 Explore `create_model`



The helper function `gen_layer_nodes()` which is contained inside `create_model()` <br>
returns a list containing the number of nodes for each successive layer.<br>

Let's check that `gen_layer_nodes()` behaves as expected. <br>
In other words, we'll perform a **Unit Test!**

In [None]:
def gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation=True):
    """
    Generates and returns the number of nodes in each hidden layer. 
    To be clear, this excludes the input and output layer. 

    Note
    ----
    Number of nodes in each layer is linearly incremented. 
    For example, gen_layer_nodes(5, 500, 100) will generate [500, 400, 300, 200, 100]

    Parameters
    ----------
    n_layers: int
        Number of hidden layers
        This value should be 2 or greater 

    first_layer_nodes: int

    last_layer_nodes: int

    Returns
    -------
    layers: list of ints
        Contains number of nodes for each layer 
    """

    # throws an error if n_layers is less than 2 
    assert n_layers >= 2, "n_layers must be 2 or greater"

    layers = []

    # PROTIP: IF YOU WANT THE NODE INCREMENTATION TO BE SPACED DIFFERENTLY
    # THEN YOU'LL NEED TO CHANGE THE WAY THAT IT'S CALCULATED - HAVE FUN!
    # NOTE: the sign of the increment comes from the order of first_layer_nodes and last_layer_nodes,
    # so both branches below use the same formula
    if negative_node_incrementation:
        # first_layer_nodes > last_layer_nodes, so the increment is negative 
        # and the number of nodes decreases for subsequent layers
        nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
        #print(f'nodes increment = {nodes_increment}')
        
    else:
        # first_layer_nodes < last_layer_nodes, so the increment is positive 
        # and the number of nodes increases for subsequent layers
        nodes_increment = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
        #print(f'nodes increment = {nodes_increment}')

    nodes = first_layer_nodes

    for i in range(1, n_layers+1):

        layers.append(math.ceil(nodes))

        # increment nodes for next layer 
        nodes = nodes + nodes_increment

    return layers




### `negative_node_incrementation = True`
For this case we want the number of nodes to _decrease_ by a constant number for successive layers. <br>So `first_layer_nodes` must be _larger_ than `last_layer_nodes` 

In [None]:
n_layers = 5
first_layer_nodes = 500
last_layer_nodes = 100
negative_node_incrementation = True
n_nodes = gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation)
print(f'Number of nodes in successive layers: {n_nodes}')

nodes increment = -100.0
Number of nodes in successive layers: [500, 400, 300, 200, 100]


### `negative_node_incrementation = False`
For this case we want the number of nodes to _increase_ by a constant number for successive layers. <br>So `first_layer_nodes` must be _smaller_ than `last_layer_nodes` 

In [None]:
n_layers = 5
first_layer_nodes = 100
last_layer_nodes = 500
negative_node_incrementation = False
n_nodes = gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation)
print(f'Number of nodes in successive layers: {n_nodes}')

nodes increment = 100.0
Number of nodes in successive layers: [100, 200, 300, 400, 500]


### OK, the Unit Test is passed!
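
The manual checks above can also be written as automated assertions. The following is a minimal sketch, assuming a hypothetical standalone function `linear_layer_nodes` that re-implements the same linear spacing for illustration; it is not the notebook's helper itself.

```python
import math

def linear_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes):
    """Hypothetical stand-in for gen_layer_nodes: linearly spaced node counts."""
    assert n_layers >= 2, "n_layers must be 2 or greater"
    # constant step between successive layers; negative when the node count shrinks
    step = (last_layer_nodes - first_layer_nodes) / (n_layers - 1)
    return [math.ceil(first_layer_nodes + i * step) for i in range(n_layers)]

# decreasing case
assert linear_layer_nodes(5, 500, 100) == [500, 400, 300, 200, 100]
# increasing case
assert linear_layer_nodes(5, 100, 500) == [100, 200, 300, 400, 500]
# one entry per hidden layer
assert len(linear_layer_nodes(10, 500, 100)) == 10
print("all checks passed")
```

An `assert` fails loudly as soon as the helper changes behavior, which is easier to trust than comparing printed lists by eye.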

### Let's build a few models<br> 
in order to understand how `create_model()` works in practice. 

### Build a model, setting `negative_node_incrementation = True` 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`


In [None]:
# use create_model to create a model 

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes decreases linearly from layer to layer. 
model.summary()

### Build a model, setting `negative_node_incrementation = False` 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 100`
- Set `last_layer_nodes = 500`
- Set `act_funct = "relu"`

In [None]:
# use create_model to create a model 

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes increases linearly from layer to layer. 
model.summary()

## 1.2 Create a grid search using `sklearn`

### Hyperparameter search

In [None]:
# define the grid search parameters
param_grid = {'n_layers': [2, 3],
              'epochs': [3], 
              "first_layer_nodes": [500, 300],
              "last_layer_nodes": [100, 50]
             }

In [None]:
model = KerasClassifier(create_model)

In [None]:
%%time
# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Fitting 3 folds for each of 8 candidates, totalling 24 fits


[Parallel(n_jobs=-2)]: Using backend SequentialBackend with 1 concurrent workers.




[Parallel(n_jobs=-2)]: Done  24 out of  24 | elapsed:  7.9min finished


Epoch 1/3
Epoch 2/3
Epoch 3/3
Best: 0.865880012512207 using {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 3}
Means: 0.8655733466148376, Stdev: 0.002064639243405658 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.8651600082715353, Stdev: 0.0021779715026343997 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 3}
Means: 0.8637866775194804, Stdev: 0.0016057747023210903 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 2}
Means: 0.865880012512207, Stdev: 0.005965598909352285 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 3}
Means: 0.8622133334477743, Stdev: 0.0006534564363964282 with: {'epochs': 3, 'first_layer_nodes': 300, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.8614266713460287, Stdev: 0.003285559739582114 with: {'epochs': 3, 'first_layer_nodes': 300, 'last_layer_nodes': 100, 'n_layers': 3}
Means: 0.860279

In [None]:
best_model = grid_result.best_estimator_

In [None]:
best_model.get_params()

Ok, now that we've played around a bit with `create_model`, let's build a simpler model that we'll use to run gridsearches. 

### Build model

Use `create_model` to build a model. 

- Set `n_layers = 2` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = True`

In [None]:
# use create_model to create a model 

###BEGIN SOLUTION
# use create_model to create a model 

###END SOLUTION

In [None]:
# run model.summary() and make sure that you understand the model architecture that you just built 
model.summary()

Model: "sequential_27"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_83 (Dense)             (None, 500)               392500    
_________________________________________________________________
dense_84 (Dense)             (None, 10)                5010      
Total params: 397,510
Trainable params: 397,510
Non-trainable params: 0
_________________________________________________________________


In [None]:
# define the grid search parameters
param_grid = {'n_layers': [2, 3],
              'epochs': [3], 
              "first_layer_nodes": [500, 300],
              "last_layer_nodes": [100, 50]
             }

In [None]:
model = KerasClassifier(create_model)

In [None]:
%%time
# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Fitting 3 folds for each of 8 candidates, totalling 24 fits


[Parallel(n_jobs=-2)]: Using backend SequentialBackend with 1 concurrent workers.




[Parallel(n_jobs=-2)]: Done  24 out of  24 | elapsed:  4.7min finished


Epoch 1/3
Epoch 2/3
Epoch 3/3
Best: 0.8687066634496053 using {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 2}
Means: 0.8660400112469991, Stdev: 0.0018495458387242343 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.8646533091862997, Stdev: 0.0064865267530943484 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 3}
Means: 0.8687066634496053, Stdev: 0.0008095208472150726 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 2}
Means: 0.8637599945068359, Stdev: 0.003709595009580496 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 3}
Means: 0.861133337020874, Stdev: 0.006221717108515905 with: {'epochs': 3, 'first_layer_nodes': 300, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.8648000160853068, Stdev: 0.00081583667085494 with: {'epochs': 3, 'first_layer_nodes': 300, 'last_layer_nodes': 100, 'n_layers': 3}
Means: 0.859893

In [None]:
best_model = grid_result.best_estimator_

In [None]:
best_model.get_params()

{'build_fn': <function __main__.create_model>,
 'epochs': 3,
 'first_layer_nodes': 500,
 'last_layer_nodes': 50,
 'n_layers': 2}

-----

# Experiment 2: Run the Gridsearch Algorithms 

In this section, we are going to use the same model and dataset in order to benchmark 3 different gridsearch approaches: 

- Gridsearch
- Random Search
- Bayesian Optimization. 


Our goal in this experiment is two-fold. We want to see which approach 

- Scores the highest accuracy
- Has the shortest run time 

We want to see how these 3 gridsearch approaches handle these trade-offs and to give you a sense of those trade-offs.

### Trade-offs

`Gridsearch` trains a model on every unique hyperparameter combination. This guarantees that you'll find the best-scoring combination within your parameter set, but the run time can be very long. 

`Random Search` randomly samples combinations from your parameter set. Depending on how many samples you draw, the run time can be cut down significantly, but you might or might not sample the combinations that correspond to the highest possible accuracies. 

`Bayesian Optimization` has a bit of intelligence built into its search algorithm, but you do need to manually select some parameters that may greatly influence the model learning outcomes. 
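
To make the difference between exhaustive and random search concrete, here is a small sketch that only enumerates hyperparameter combinations and trains nothing. It uses sklearn's `ParameterGrid` and `ParameterSampler`; the `search_space` dictionary is an assumption that mirrors the `units`, `learning_rate` and `activation` ranges defined later in this experiment.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# assumed search space, mirroring the hyperparameters used later in this experiment
search_space = {
    "units": list(range(32, 513, 32)),      # 16 values: 32, 64, ..., 512
    "learning_rate": [1e-1, 1e-2, 1e-3],    # 3 values
    "activation": ["relu", "sigmoid"],      # 2 values
}

# a grid search visits every combination
grid_combos = list(ParameterGrid(search_space))

# a random search visits only a sample of them (here: 25%)
n_samples = len(grid_combos) // 4
random_combos = list(ParameterSampler(search_space, n_iter=n_samples, random_state=1234))

print(f"grid search would train {len(grid_combos)} models per fold")
print(f"random search would train {len(random_combos)} models")
```

Because every entry in `search_space` is a list, `ParameterSampler` draws combinations without replacement, so no combination is trained twice.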

-------
### Build our model

In [None]:
# because gridsearching can take a lot of time and we are benchmarking 3 different approaches
# let's build a simple model to minimize run time 

def build_model(hp):
    
    """
    Returns a compiled keras model ready for keras-tuner gridsearch algorithms 
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units=hp.get('units'),activation=hp.get("activation")))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=Adam(hp.get('learning_rate')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model
  

In [None]:
# build out our hyperparameter dictionary 
hp = HyperParameters()
hp.Int('units', min_value=32, max_value=512, step=32)
hp.Choice('learning_rate',values=[1e-1, 1e-2, 1e-3])
hp.Choice('activation',values=["relu", "sigmoid"])

'relu'

---------
## 2.1 Gridsearch Optimization


### Populate a `sklearn` compatible parameter dictionary

In [None]:
# build out our hyperparameter dictionary 
hyper_parameters = {
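    # BUG Fix: cast array as list otherwise GridSearchCV will throw error
    # the stop value is exclusive, so use 512 + 32 to include 512 and match the keras-tuner search space
    "units": np.arange(32, 512 + 32, 32).tolist(),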
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "activation":["relu", "sigmoid"]
}

In [None]:
hyper_parameters

### Build a `sklearn` compatible model function

In [None]:
def build_sklearn_model(units, learning_rate, activation):
    
    """
    Returns a compiled keras model ready for sklearn's GridSearchCV 
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units, activation=activation))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=Adam(learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model

### Apply the "wrapper" to make the model compatible with `sklearn`

In [None]:
# a separate function name keeps the earlier build_model(hp) available for keras-tuner
model = KerasClassifier(build_fn=build_sklearn_model)

In [None]:
# save start time 
start = time()

# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=hyper_parameters, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# save end time 
end = time()

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

In [None]:
# total run time 
total_run_time_in_minutes = (end - start)/60
total_run_time_in_minutes

In [None]:
grid_result.best_params_

In [None]:
# because all other optimization approaches are reporting test set score
# let's calculate the test set score in this case 
best_model = grid_result.best_estimator_
test_acc = best_model.score(X_test, y_test)

In [None]:
test_acc

### Results

Identify and write the best performing hyperparameter combination and model score. 
 
 

YOUR ANSWER HERE

------
## 2.2 Random Search with `keras-tuner`

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `RandomSearch` tuner.

In [None]:
# how many unique hyperparameter combinations do we have? 
# HINT: take the product of the number of possible values for each hyperparameter 
# save your answer to n_unique_hparam_combos

# YOUR CODE HERE
n_unique_hparam_combos = 
raise NotImplementedError()

In [None]:
# how many of these do we want to randomly sample?
# let's pick 25% of n_unique_hparam_combos param combos to sample
# save this number to n_param_combos_to_sample

# YOUR CODE HERE
fraction_to_sample = 0.25
n_param_combos_to_sample = 
# raise NotImplementedError()

### Instantiate a `RandomSearch()` object for your grid search

In [None]:
random_tuner = RandomSearch(
            build_model,
            objective='val_accuracy',
            max_trials=n_param_combos_to_sample, # number of times to sample the parameter set and build a model 
            seed=1234,
            hyperparameters=hp, # pass in our hyperparameter dictionary
            directory='./keras-tuner-trial',
            project_name='random_search')

In [None]:
# take note of the total elapsed time in the print out -- it took ~10 minutes without a GPU
random_tuner.search(X_train, y_train,
                    epochs=3,
                    validation_data=(X_test, y_test))

Trial 20 Complete [00h 00m 34s]
val_accuracy: 0.8131999969482422

Best val_accuracy So Far: 0.8730000257492065
Total elapsed time: 00h 10m 09s
INFO:tensorflow:Oracle triggered exit


In [None]:
# identify the best score and hyperparameters (should be at the top since scores are ranked)
random_tuner.results_summary()

Results summary
Results in ./keras-tuner-trial/random_search
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
units: 384
learning_rate: 0.001
activation: relu
Score: 0.8730000257492065
Trial summary
Hyperparameters:
units: 352
learning_rate: 0.001
activation: relu
Score: 0.8729199767112732
Trial summary
Hyperparameters:
units: 448
learning_rate: 0.001
activation: sigmoid
Score: 0.8636400103569031
Trial summary
Hyperparameters:
units: 480
learning_rate: 0.001
activation: sigmoid
Score: 0.8621199727058411
Trial summary
Hyperparameters:
units: 416
learning_rate: 0.001
activation: sigmoid
Score: 0.8591200113296509
Trial summary
Hyperparameters:
units: 288
learning_rate: 0.001
activation: sigmoid
Score: 0.8572400212287903
Trial summary
Hyperparameters:
units: 224
learning_rate: 0.001
activation: sigmoid
Score: 0.8525599837303162
Trial summary
Hyperparameters:
units: 320
learning_rate: 0.01
activation: sigmoid
Score: 0.8378400206565857
Tri

### Results

Identify and write the best performing hyperparameter combination and model score. 
Note that because this is Random Search, multiple runs might have slightly different outcomes. 
 
 

YOUR ANSWER HERE

------
## 2.3 Bayesian Optimization with `keras-tuner`

![](https://upload.wikimedia.org/wikipedia/commons/0/02/GpParBayesAnimationSmall.gif)

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `BayesianOptimization` tuner.

Pay special attention to these `BayesianOptimization` parameters: `num_initial_points` and `beta`. 

`num_initial_points`: 

Number of randomly selected hyperparameter combinations to try before applying Bayesian probability to determine the likelihood of which param combo to try next, based on expected improvement.


`beta`: 

Larger values mean that the tuner is more willing to explore new hyperparameter combinations (analogous to searching for the global minimum in Gradient Descent); smaller values mean that it is less willing to try new hyperparameter combinations (analogous to getting stuck in a local minimum in Gradient Descent). 

As a start, err on the side of larger values. What defines a small or large value, you ask? That question would pull us into the mathematical intricacies of Bayesian Optimization and Gaussian Processes. For simplicity, notice that the default value is 2.6 and work from there. 
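
To build some intuition for `beta` without the full math, here is a toy sketch of an upper-confidence-bound style rule, where each candidate is scored as predicted mean plus `beta` times predicted uncertainty. This illustrates the exploration/exploitation idea only; it is not keras-tuner's actual implementation, and the numbers and the `pick_next` helper are made up for this example.

```python
import numpy as np

# hypothetical surrogate-model predictions for three untried hyperparameter combos
predicted_mean = np.array([0.86, 0.84, 0.80])  # expected val_accuracy
predicted_std = np.array([0.01, 0.03, 0.05])   # uncertainty of each prediction

def pick_next(beta):
    """Return the index of the candidate with the highest mean + beta * std."""
    score = predicted_mean + beta * predicted_std
    return int(np.argmax(score))

# a small beta trusts the predicted mean (exploitation)
print(pick_next(beta=0.5))
# a large beta favors the most uncertain candidate (exploration)
print(pick_next(beta=5.0))
```

With the small `beta` the rule picks the candidate with the best predicted score; with the large `beta` it picks the candidate the surrogate knows least about.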

In [None]:
# we know that 24 samples is 25% of the 96 possible hyper-parameter combos
# let's set up a run with the same parameters we used for RandomSearch() so the comparison will be apples-to-apples
# feel free to play with any of these numbers later
max_trials=24
num_initial_points=5
beta=5.0

#### Instantiate a `BayesianOptimization()` object for your grid search

In [None]:
bayesian_tuner = BayesianOptimization(
                    build_model,
                    objective='val_accuracy',
                    max_trials=max_trials,
                    hyperparameters=hp, # pass in our hyperparameter dictionary
                    num_initial_points=num_initial_points, 
                    beta=beta, 
                    seed=1234,
                    directory='./keras-tuner-trial',
                    project_name='bayesian_optimization_4')

INFO:tensorflow:Reloading Oracle from existing project ./keras-tuner-trial/bayesian_optimization_4/oracle.json
INFO:tensorflow:Reloading Tuner from ./keras-tuner-trial/bayesian_optimization_4/tuner0.json


In [None]:
bayesian_tuner.search(X_train, y_train,
               epochs=3,
               validation_data=(X_test, y_test))

Trial 9 Complete [00h 00m 21s]
val_accuracy: 0.8266800045967102

Best val_accuracy So Far: 0.875
Total elapsed time: 00h 03m 05s
INFO:tensorflow:Oracle triggered exit


In [None]:
bayesian_tuner.results_summary()

Results summary
Results in ./keras-tuner-trial/bayesian_optimization_4
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
units: 512
learning_rate: 0.001
activation: relu
Score: 0.875
Trial summary
Hyperparameters:
units: 352
learning_rate: 0.001
activation: relu
Score: 0.8678799867630005
Trial summary
Hyperparameters:
units: 480
learning_rate: 0.001
activation: sigmoid
Score: 0.860319972038269
Trial summary
Hyperparameters:
units: 256
learning_rate: 0.001
activation: sigmoid
Score: 0.8562399744987488
Trial summary
Hyperparameters:
units: 352
learning_rate: 0.01
activation: sigmoid
Score: 0.8393200039863586
Trial summary
Hyperparameters:
units: 512
learning_rate: 0.01
activation: sigmoid
Score: 0.8347200155258179
Trial summary
Hyperparameters:
units: 32
learning_rate: 0.001
activation: relu
Score: 0.8316400051116943
Trial summary
Hyperparameters:
units: 32
learning_rate: 0.001
activation: relu
Score: 0.83024001121521
Trial summary
Hype

### Results

Identify and write the best performing hyperparameter combination and model score. 
Note that because this is Bayesian Optimization, multiple runs might have slightly different outcomes. 
 
 

YOUR ANSWER HERE

We should point out that GridSearchCV split the training set internally into cross-validation folds and scored each model on the held-out fold, whereas keras-tuner allows us to pass in a test set. This means that the keras-tuner algorithms were scored on one test set and our sklearn GridSearchCV was scored on different held-out data, so this isn't an exact 1-to-1 comparison, but it'll have to do. To compensate for this, we scored the best GridSearchCV model on the same test set that keras-tuner used. 
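
One possible way to make `GridSearchCV` score against a fixed held-out set is sklearn's `PredefinedSplit`. The sketch below is only an illustration under stated assumptions: it uses `LogisticRegression` and random data as stand-ins so that it runs quickly without TensorFlow, and the names `X_tr`, `X_val`, `fold` and `split` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# synthetic stand-in data (hypothetical; not the Quickdraw sample)
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(80, 5)), rng.integers(0, 2, size=80)
X_val, y_val = rng.normal(size=(20, 5)), rng.integers(0, 2, size=20)

# stack both sets; -1 marks rows that are always used for training,
# 0 marks rows that belong to the single validation fold
X_all = np.vstack([X_tr, X_val])
y_all = np.concatenate([y_tr, y_val])
fold = np.concatenate([np.full(len(X_tr), -1), np.zeros(len(X_val), dtype=int)])
split = PredefinedSplit(test_fold=fold)

# every candidate is now scored on the same held-out rows
search = GridSearchCV(LogisticRegression(max_iter=200),
                      param_grid={"C": [0.1, 1.0, 10.0]},
                      cv=split)
search.fit(X_all, y_all)

print(split.get_n_splits(), search.best_params_)
```

With a single predefined fold there is no averaging across folds, so the scores are noisier than with `cv=3`, but every search method can then be evaluated on identical data.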

_______

# Conclusion

The spirit of this experiment is to expose you to the idea of benchmarking and comparing the trade-offs of various gridsearch approaches. 

Even if we did find a way to pass in the original test set into GridSearchCV, we can see that both Random Search and Bayesian Optimization are arguably better alternatives to a brute force grid search when we consider the trade-offs of run time and locating the best performing model. 

----

# Stretch Goals

- Feel free to run whatever gridsearch experiments on whatever models you like!

In [None]:
# this is your open playground - feel free to explore as you wish 