

# Tune Practice

## *Data Science Unit 4 Sprint 2 Assignment 3*

# Gridsearch Hyperparameters

In the guided project, you learned how to use sklearn's GridsearchCV and keras-tuner library to tune the hyperparamters of a neural network model. For your module project you'll continue using these two libraries however we are going to make things a little more interesting for you. 

Continue to use TensorFlow Keras & a sample of the [Quickdraw dataset](https://github.com/googlecreativelab/quickdraw-dataset) to build a sketch classification model. The dataset has been sampled to only 10 classes and 10000 observations per class. 



**Don't forgot to switch to GPU on Colab!**

In [1]:
# native python libraries imports 
import math
from time import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# sklearn imports 
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

# keras imports 
import keras
from keras import Sequential
from keras.layers import Dense
from kerastuner.tuners import RandomSearch, BayesianOptimization, Sklearn
from kerastuner.engine.hyperparameters import HyperParameters
from keras.activations import relu, sigmoid
from keras.optimizers import Adam, SGD
from keras.utils import get_file

# required for compatibility between sklearn and keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [2]:
def load_quickdraw10():
    """
    Loads a sample of the Quickdraw dataset, which can be found at:
    https://github.com/LambdaSchool/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/quickdraw10.npz?raw=true
    
    After loading the data, the data is normalized, and split into the feature matrix
    and target vector.
    
    The feature matrix and target vector then undergo shuffled splitting into a train
    and test data set.
    
    Returned:
    ---------
    normalized feature matrix train and test
    target vector train and test
    """
    
    # url where data can be found
    URL_ = "https://github.com/LambdaSchool/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/quickdraw10.npz?raw=true"
    
    # create path to the file
    path_to_zip = get_file('./quickdraw10.npz', origin=URL_, extract=False)

    # load in data
    data = np.load(path_to_zip)
    
    # normalize your image data
    max_pixel_value = 255
    X = data['arr_0']/max_pixel_value # create normalized feature matrix
    Y = data['arr_1'] # create target vector
        
    # split feature matrix and target vector in train and test data sets
    # shuffle the data before the split
    return train_test_split(X, Y, shuffle=True)

In [3]:
X_train, X_test, y_train, y_test = load_quickdraw10()

In [4]:
X_train.shape

(75000, 784)

In [5]:
y_train.shape

(75000,)

_____

# Experiment 1

## Tune Hyperperameters using Enhanced GridsearchCV 

We are going to use GridsearchCV again to tune a deep learning model however we are going to add some additional functionality to our gridsearch. Specifically, we are going to automate away the generation of how many nodes to use in a layer and how many layers to use in a model! 

By the way, yes, there is a function within a function. Try to not let that bother you. An alternative to this would be to create a class. If you're up for the challenge give it a shot. However, consider this a stretch goal that you come back to after you finish going through this assignment. 


### Objective 

The objective of this experiment is to show you how to automate the generation of layers and layer nodes for the purposes of gridsearch. Up until now, we've been manually selecting the number of layers and layer nodes.

In [6]:
# Function to create model, required for KerasClassifier
def create_model(n_layers,  first_layer_nodes, last_layer_nodes, act_funct ="relu", negative_node_incrementation=True):
    """"
    Returns a complied keras model 
    
    Parameters
    ----------
    n_layers: int 
        number of hidden layers in model 
        To be clear, this excludes the input and output layer.
        
    first_layer_nodes: int
        Number of nodes in the first hidden layer 

    last_layer_nodes: int
        Number of nodes in the last hidden layer (this is the layer just prior to the output layer)
        
     act_funct: string 
         Name of activation function to use in hidden layers (this excludes the output layler)
        
    Returns
    -------
    model: keras object 
    """
    
    def gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation=True):
        """
        Generates and returns the number of nodes in each hidden layer. 
        To be clear, this excludes the input and output layer. 

        Note
        ----
        Number of nodes in each layer is linearly incremented. 
        For example, gen_layer_nodes(5, 500, 100) will generate [500, 400, 300, 200, 100]

        Parameters
        ----------
        n_layers: int
            Number of hidden layers
            This values should be 2 or greater 

        first_layer_nodes: int

        last_layer_nodes: int

        Returns
        -------
        layers: list of ints
            Contains number of nodes for each layer 
        """

        # throws an error if n_layers is less than 2 
        assert n_layers >= 2, "n_layers must be 2 or greater"

        layers = []

        # PROTIP: IF YOU WANT THE NODE INCREMENTATION TO BE SPACED DIFFERENTLY
        # THEN YOU'LL NEED TO CHANGE THE WAY THAT IT'S CALCULATED - HAVE FUN!
        # when set to True number of nodes are decreased for subsequent layers 
        if negative_node_incrementation:
            # subtract this amount from previous layer's nodes in order to increment towards smaller numbers 
            nodes_increment = (last_layer_nodes - first_layer_nodes)/ (n_layers-1)
            
        # when set to False number of nodes are increased for subsequent layers
        else:
            # add this amount from previous layer's nodes in order to increment towards larger numbers 
            nodes_increment = (first_layer_nodes - last_layer_nodes)/ (n_layers-1)

        nodes = first_layer_nodes

        for i in range(1, n_layers+1):

            layers.append(math.ceil(nodes))

            # increment nodes for next layer 
            nodes = nodes + nodes_increment

        return layers
    
    # create model
    model = Sequential()
    
    n_nodes = gen_layer_nodes(n_layers, first_layer_nodes, last_layer_nodes, negative_node_incrementation)
    
    for i in range(1, n_layers):
        if i==1:
            model.add(Dense(first_layer_nodes, input_dim=X_train.shape[1], activation=act_funct))
        else:
            model.add(Dense(n_nodes[i-1], activation=act_funct))
            
            
    # output layer 
    model.add(Dense(10, # 10 unit/neurons in output layer because we have 10 possible labels to predict  
                    activation='softmax')) # use softmax for a label set greater than 2            
    
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', 
                  optimizer='adam', # adam is a good default optimizer 
                  metrics=['accuracy'])
    
    # do not include model.fit() inside the create_model function
    # KerasClassifier is expecting a complied model 
    return model


## Explore create_model

Let's build a few different models in order to understand how the above code works in practice. 

### Build model 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = True`

In [8]:
# use create_model to create a model 
model = create_model(n_layers=10, 
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct ="relu", 
                     negative_node_incrementation=True)

In [9]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes have been linearly incremented in decreasing values. 
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 500)               392500    
_________________________________________________________________
dense_1 (Dense)              (None, 456)               228456    
_________________________________________________________________
dense_2 (Dense)              (None, 412)               188284    
_________________________________________________________________
dense_3 (Dense)              (None, 367)               151571    
_________________________________________________________________
dense_4 (Dense)              (None, 323)               118864    
_________________________________________________________________
dense_5 (Dense)              (None, 278)               90072     
_________________________________________________________________
dense_6 (Dense)              (None, 234)               6

### Build model 

Use `create_model` to build a model. 

- Set `n_layers = 10` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = False`

In [10]:
# use create_model to create a model 
model = create_model(n_layers=10, 
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct ="relu", 
                     negative_node_incrementation=False)

In [11]:
# run model.summary() and make sure that you understand the model architecture that you just built 
# Notice in the model summary how the number of nodes have been linearly incremented in increasing values.
# The output layer must have 10 nodes because there are 10 labels to predict 
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 500)               392500    
_________________________________________________________________
dense_11 (Dense)             (None, 545)               273045    
_________________________________________________________________
dense_12 (Dense)             (None, 589)               321594    
_________________________________________________________________
dense_13 (Dense)             (None, 634)               374060    
_________________________________________________________________
dense_14 (Dense)             (None, 678)               430530    
_________________________________________________________________
dense_15 (Dense)             (None, 723)               490917    
_________________________________________________________________
dense_16 (Dense)             (None, 767)              

Ok, now that we've played around a bit with  `create_model` in order to understand how it works, let's build a much simpler model that we'll be running gridsearches. 

### Build model 

Use `create_model` to build a model. 

- Set `n_layers = 2` 
- Set `first_layer_nodes = 500`
- Set `last_layer_nodes = 100`
- Set `act_funct = "relu"`
- Make sure that `negative_node_incrementation = True`

In [15]:
# use create_model to create a model 
model = create_model(n_layers=2, 
                     first_layer_nodes=500,
                     last_layer_nodes=100,
                     act_funct ="relu", 
                     negative_node_incrementation=True)

In [16]:
# run model.summary() and make sure that you understand the model architecture that you just built 
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_22 (Dense)             (None, 500)               392500    
_________________________________________________________________
dense_23 (Dense)             (None, 10)                5010      
Total params: 397,510
Trainable params: 397,510
Non-trainable params: 0
_________________________________________________________________


In [14]:
# define the grid search parameters
param_grid = {'n_layers': [2, 3],
              'epochs': [3], 
              "first_layer_nodes": [500, 300],
              "last_layer_nodes": [100, 50]
             }

In [17]:
model = KerasClassifier(create_model)

In [18]:
# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid, 
                    n_jobs=-1, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Fitting 3 folds for each of 8 candidates, totalling 24 fits
Epoch 1/3
Epoch 2/3
Epoch 3/3
Best: 0.8660533428192139 using {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 2}
Means: 0.8648933172225952, Stdev: 0.003692923752483053 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.865933338801066, Stdev: 0.0009590753884723514 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 100, 'n_layers': 3}
Means: 0.8660533428192139, Stdev: 0.0024547819066268834 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 2}
Means: 0.8656399846076965, Stdev: 0.001301482958733984 with: {'epochs': 3, 'first_layer_nodes': 500, 'last_layer_nodes': 50, 'n_layers': 3}
Means: 0.8616000016530355, Stdev: 0.0010203041373970016 with: {'epochs': 3, 'first_layer_nodes': 300, 'last_layer_nodes': 100, 'n_layers': 2}
Means: 0.8633333245913187, Stdev: 0.0025667285290341692 with: {'epochs': 3, 'first_layer_nodes'

In [19]:
best_model = grid_result.best_estimator_

In [27]:
best_model.get_params()

{'epochs': 3,
 'first_layer_nodes': 500,
 'last_layer_nodes': 50,
 'n_layers': 2,
 'build_fn': <function __main__.create_model(n_layers, first_layer_nodes, last_layer_nodes, act_funct='relu', negative_node_incrementation=True)>}

-----

# Experiment 2

## Benchmark different Optimization Algorithms 

In this section, we are going to use the same model and dataset in order to benchmark 3 different gridsearch approaches: 

- Random Search
- Bayesian Optimization. 
- Brute Force Gridsearch

Our goal in this experiment is two-fold. We want to see which appraoch 

- Scores the highest accuracy
- Has the shortest run time 

We want to see how these 3 gridsearch approaches handle these trade-offs and to give you a sense of those trades offs.

### Trade-offs

`Brute Force Gridsearch` will train a model on every single unique hyperparameter combination, this guarantees that you'll get the highest possible accuracy from your parameter set but your gridsearch might have a very long run-time. 

`Random Search` will randomly sample from your parameter set which, depending on how many samples, the run-time might be significantly cut down but you might or might not sample the parameters that correspond to the heightest possible accuracies. 

`Bayesian Optimization` has a bit of intelligence built into it's search algorithm but you do need to manually select some parameters which greatly influence the model learning outcomes. 

-------
### Build our model

In [29]:
# because gridsearching can take a lot of time and we are bench marking 3 different approaches
# let's build a simple model to minimize run time 

def build_model(hp):
    
    """
    Returns a complied keras model ready for keras-tuner gridsearch algorithms 
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units=hp.get('units'),activation=hp.get("activation")))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=keras.optimizers.Adam(hp.get('learning_rate')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model
  

In [30]:
# build out our hyperparameter dictionary 
hp = HyperParameters()
hp.Int('units', min_value=32, max_value=512, step=32)
hp.Choice('learning_rate',values=[1e-1, 1e-2, 1e-3])
hp.Choice('activation',values=["relu", "sigmoid"])

'relu'

------
# Run the Gridsearch Algorithms 

### Random Search

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `RandomSearch` tuner.

In [54]:
# how many unique hyperparameter combinations do we have? 
# HINT: take the product of the number of possible values for each hyperparameter 
# save your answer to n_unique_hparam_combos
n_unique_hparam_combos = 16 * 3 * 2

In [55]:
# how many of these do we want to randomly sample?
# let's pick 25% of n_unique_hparam_combos param combos to sample
# save this number to n_param_combos_to_sample
n_param_combos_to_sample = n_unique_hparam_combos*0.25

In [56]:
random_tuner = RandomSearch(
            build_model,
            objective='val_accuracy',
            max_trials=n_param_combos_to_sample, # number of times to sample the parameter set and build a model 
            seed=1234,
            hyperparameters=hp, # pass in our hyperparameter dictionary
            directory='./keras-tuner-trial',
            project_name='random_search')

In [57]:
# take note of Total elapsed time in print out
random_tuner.search(X_train, y_train,
                    epochs=3,
                    validation_data=(X_test, y_test))

Trial 24 Complete [00h 00m 08s]
val_accuracy: 0.6904799938201904

Best val_accuracy So Far: 0.8688399791717529
Total elapsed time: 00h 03m 53s
INFO:tensorflow:Oracle triggered exit


In [58]:
# identify the best score and hyperparamter (should be at the top since scores are ranked)
random_tuner.results_summary()

Results summary
Results in ./keras-tuner-trial/random_search
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
units: 352
learning_rate: 0.001
activation: relu
Score: 0.8688399791717529
Trial summary
Hyperparameters:
units: 384
learning_rate: 0.001
activation: relu
Score: 0.863319993019104
Trial summary
Hyperparameters:
units: 448
learning_rate: 0.001
activation: sigmoid
Score: 0.860759973526001
Trial summary
Hyperparameters:
units: 480
learning_rate: 0.001
activation: sigmoid
Score: 0.8600800037384033
Trial summary
Hyperparameters:
units: 416
learning_rate: 0.001
activation: sigmoid
Score: 0.8595200181007385
Trial summary
Hyperparameters:
units: 288
learning_rate: 0.001
activation: sigmoid
Score: 0.8557199835777283
Trial summary
Hyperparameters:
units: 224
learning_rate: 0.001
activation: sigmoid
Score: 0.85343998670578
Trial summary
Hyperparameters:
units: 160
learning_rate: 0.01
activation: sigmoid
Score: 0.8360000252723694
Trial s

 ### Results
 
Identify and write the the best performing hyperparamter combination and model score. 
Note that because this is Random Search, multiple runs might have slighly different outcomes. 
 
 

**Best Performing Hyperparameter Combination--RandomSearch:**

units: 416

learning_rate: 0.001

activation: sigmoid


**Best Score:** 0.8595200181007385

------
### Bayesian Optimization

![](https://upload.wikimedia.org/wikipedia/commons/0/02/GpParBayesAnimationSmall.gif)

Be sure to check out the [**docs for Keras-Tuner**](https://keras-team.github.io/keras-tuner/documentation/tuners/). Here you can read about the input parameters for the `BayesianOptimization` tuner.

Pay special attention to these `BayesianOptimization` parameters: `num_initial_points` and `beta`. 

`num_initial_points`: 

Number of randomly selected hyperparameter combinations to try before applying bayesian probability to determine liklihood of which param combo to try next based on expected improvement


`beta`: 

Larger values means more willing to explore new hyperparameter combinations (analogous to searching for the global minimum in Gradient Descent), smaller values means that it is less willing to try new hyperparameter combinations (analogous to getting stuck in a local minimum in Gradient Descent). 

As a start, error on the side of larger values. What defines a small or large value you ask? That question would pull us into the mathematical intricacies of Bayesian Optimization and Gaussian Processes. For simplicity, notice that the default value is 2.6 and work from there. 

In [59]:
# we know that 24 samples is about 25% of 96 possible hyper-parameter combos
# because BO isn't random (after num_initial_points number of trails) let's see if 15 max trials gives good results
# feel free to play with any of these numbers
max_trials=15
num_initial_points=5
beta=5.0

In [60]:
bayesian_tuner = BayesianOptimization(
                    build_model,
                    objective='val_accuracy',
                    max_trials=max_trials,
                    hyperparameters=hp, # pass in our hyperparameter dictionary
                    num_initial_points=num_initial_points, 
                    beta=beta, 
                    seed=1234,
                    directory='./keras-tuner-trial',
                    project_name='bayesian_optimization_4')

In [61]:
bayesian_tuner.search(X_train, y_train,
               epochs=3,
               validation_data=(X_test, y_test))

Trial 15 Complete [00h 00m 11s]
val_accuracy: 0.8248400092124939

Best val_accuracy So Far: 0.8712800145149231
Total elapsed time: 00h 02m 31s
INFO:tensorflow:Oracle triggered exit


In [62]:
bayesian_tuner.results_summary()

Results summary
Results in ./keras-tuner-trial/bayesian_optimization_4
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
units: 512
learning_rate: 0.001
activation: relu
Score: 0.8712800145149231
Trial summary
Hyperparameters:
units: 352
learning_rate: 0.001
activation: relu
Score: 0.8693199753761292
Trial summary
Hyperparameters:
units: 192
learning_rate: 0.001
activation: relu
Score: 0.8649200201034546
Trial summary
Hyperparameters:
units: 480
learning_rate: 0.001
activation: sigmoid
Score: 0.8603600263595581
Trial summary
Hyperparameters:
units: 256
learning_rate: 0.001
activation: sigmoid
Score: 0.8510800004005432
Trial summary
Hyperparameters:
units: 256
learning_rate: 0.01
activation: sigmoid
Score: 0.8364400267601013
Trial summary
Hyperparameters:
units: 512
learning_rate: 0.01
activation: relu
Score: 0.8248400092124939
Trial summary
Hyperparameters:
units: 32
learning_rate: 0.001
activation: relu
Score: 0.8230000138282776
Tria

 ### Results
 
Identify and write the the best performing hyperparamter combination and model score. 
Note that because this is  Bayesian Optimization, multiple runs might have slighly different outcomes. 
 
 

**Best Performing Hyperparameter Combination--BayesianOptimization:**

units: 512

learning_rate: 0.001

activation: relu

**Best Score:** 0.8712800145149231

---------
## Brute Force Gridsearch Optimization


### Populate a Sklearn compatiable parameter dictionary

In [63]:
# build out our hyperparameter dictionary 
hyper_parameters = {
    # BUG Fix: cast array as list otherwise GridSearchCV will throw error
    "units": np.arange(32, 544, 32).tolist(),
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "activation":["relu", "sigmoid"]
}

In [64]:
hyper_parameters

{'units': [32,
  64,
  96,
  128,
  160,
  192,
  224,
  256,
  288,
  320,
  352,
  384,
  416,
  448,
  480,
  512],
 'learning_rate': [0.1, 0.01, 0.001],
 'activation': ['relu', 'sigmoid']}

### Build a Sklearn compatiable model function

In [65]:
def build_model(units, learning_rate, activation):
    
    """
    Returns a complie keras model ready for keras-tuner gridsearch algorithms 
    """
    
    model = Sequential()
    
    # hidden layer
    model.add(Dense(units, activation=activation))
    
    # output layer
    model.add(Dense(10, activation='softmax'))
    
    model.compile(
        optimizer=Adam(learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    
    return model

In [66]:
model = KerasClassifier(build_fn = build_model)

In [67]:
# save start time 
start = time()

# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=hyper_parameters, 
                    n_jobs=-2, 
                    verbose=1, 
                    cv=3)

grid_result = grid.fit(X_train, y_train)

# save end time 
end = time()

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Fitting 3 folds for each of 96 candidates, totalling 288 fits
Best: 0.8449599941571554 using {'activation': 'relu', 'learning_rate': 0.001, 'units': 480}
Means: 0.2900266647338867, Stdev: 0.01737683738467732 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 32}
Means: 0.25197334090868634, Stdev: 0.05959962968823266 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 64}
Means: 0.20678666730721793, Stdev: 0.03748037823677755 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 96}
Means: 0.2860266665617625, Stdev: 0.04897847250512216 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 128}
Means: 0.2573999911546707, Stdev: 0.03133106985928215 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 160}
Means: 0.23365333179632822, Stdev: 0.05492940521764439 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 192}
Means: 0.24677333235740662, Stdev: 0.06660981615580537 with: {'activation': 'relu', 'learning_rate': 0.1, 'units': 224}
Means: 0.298

In [68]:
# total run time 
total_run_time_in_miniutes = (end - start)/60
total_run_time_in_miniutes

9.250651001930237

In [69]:
grid_result.best_params_

{'activation': 'relu', 'learning_rate': 0.001, 'units': 480}

In [70]:
# because all other optimization approaches are reporting test set score
# let's calculate the test set score in this case 
best_model = grid_result.best_estimator_
test_acc = best_model.score(X_test, y_test)



In [71]:
test_acc

0.8508800268173218

 ### Results
 
Identify and write the the best performing hyperparamter combination and model score. 
 
 

**Best Performing Hyperparameter Combination--GridSearchCV:**

units: 480

learning_rate: 0.001

activation: relu

**Best Score:** 0.8508800268173218

_______

# Conclusion

The spirit of this experiment is to expose you to the idea of benchmarking and comparing the trade-offs of various gridsearch approaches. 

Even if we did find a way to pass in the original test set into GridSearchCV, we can see that both Random Search and Bayesian Optimization are arguably better alternatives to a brute force grid search when we consider the trade-offs of run time and locating the best performing model. 

----


# Stretch Goals

- Feel free to run whatever gridserach experiments on whatever models you like!

# Benchmark differenct Optimization Algorithms and different Model Architectures

## Create model

In [121]:
# create my own function to create a model
def create_model_k(hp_k):
    """"
    Returns a complied keras model 
    
    Parameters
    ----------
    hp_k: hyperparameter dictionary of HyperParameter class
        
    Returns
    -------
    model: keras object 
    """
    
    # create model
    model = Sequential()
    
    model.add(Dense(hp_k.get('first_layer_nodes'), input_dim=X_train.shape[1], activation='relu'))
            
    # output layer 
    model.add(Dense(10, # 10 unit/neurons in output layer because we have 10 possible labels to predict  
                    activation='softmax')) # use softmax for a label set greater than 2            
    
    # Compile model
    model.compile(loss='sparse_categorical_crossentropy', 
                  optimizer=keras.optimizers.Adam(hp.get('learning_rate')), # adam is a good default optimizer 
                  metrics=['accuracy'])
    
    # do not include model.fit() inside the create_model function
    # KerasClassifier is expecting a complied model 
    return model

In [156]:
# build my own hyperparameter dictionary 
hp_k = HyperParameters()
hp_k.Choice('first_layer_nodes', values=[275, 300, 450, 512, 550, 600])
hp_k.Choice('learning_rate',values=[0.1, 0.05, 0.001])

0.1

## Run Gridsearch Algorithms

### RandomSearch

In [157]:
# how many unique hyperparameter combinations do we have? 
# HINT: take the product of the number of possible values for each hyperparameter 
# save your answer to n_unique_hparam_combos
n_unique_hparam_combos = 6 * 3

In [158]:
# how many of these do we want to randomly sample?
# let's pick 25% of n_unique_hparam_combos param combos to sample
# save this number to n_param_combos_to_sample
n_param_combos_to_sample = n_unique_hparam_combos*0.5

In [145]:
random_tuner_k5 = RandomSearch(
            create_model_k,
            objective='val_accuracy',
            max_trials=n_param_combos_to_sample, # number of times to sample the parameter set and build a model 
            seed=567,
            hyperparameters=hp_k, # pass in our hyperparameter dictionary
            directory='./keras-tuner-trial-stretch-goal',
            project_name='random_search_5')

In [146]:
# take note of Total elapsed time in print out
random_tuner_k5.search(X_train, y_train,
                     epochs=5,
                     validation_data=(X_test, y_test))

Trial 18 Complete [00h 00m 18s]
val_accuracy: 0.2539199888706207

Best val_accuracy So Far: 0.32708001136779785
Total elapsed time: 00h 05m 35s
INFO:tensorflow:Oracle triggered exit


In [147]:
# identify the best score and hyperparamter (should be at the top since scores are ranked)
random_tuner_k5.results_summary()

Results summary
Results in ./keras-tuner-trial-stretch-goal/random_search_5
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
first_layer_nodes: 300
learning_rate: 0.1
Score: 0.32708001136779785
Trial summary
Hyperparameters:
first_layer_nodes: 512
learning_rate: 0.05
Score: 0.3051599860191345
Trial summary
Hyperparameters:
first_layer_nodes: 275
learning_rate: 0.1
Score: 0.3022400140762329
Trial summary
Hyperparameters:
first_layer_nodes: 600
learning_rate: 0.001
Score: 0.3012399971485138
Trial summary
Hyperparameters:
first_layer_nodes: 450
learning_rate: 0.075
Score: 0.2990800142288208
Trial summary
Hyperparameters:
first_layer_nodes: 550
learning_rate: 0.1
Score: 0.29820001125335693
Trial summary
Hyperparameters:
first_layer_nodes: 300
learning_rate: 0.001
Score: 0.2938399910926819
Trial summary
Hyperparameters:
first_layer_nodes: 275
learning_rate: 0.001
Score: 0.2937999963760376
Trial summary
Hyperparameters:
first_layer_nodes: 

### Results

**Attempt 1**

**Best Hyperparameters--RandomSearch:**

n_layers: 4

first_layer_nodes: 300

last_layer_nodes: 50

learning_rate: 0.01

activation: relu

**Best Score:** 0.24476000666618347

**Attempt 2**

- Increased number of epochs and trials

**Best Hyperparameters--RandomSearch:**

n_layers: 3

first_layer_nodes: 300

last_layer_nodes: 100

learning_rate: 0.1

activation: relu

**Best Score:** 0.2573600113391876

**Attempt 3**

- Removed activation hyperparameter. Set it to 'relu'
- Changed hyperparameters for n_layers, first_layer_nodes, learning_rate

**Best Hyperparameters--RandomSearch:**

n_layers: 2

first_layer_nodes: 350

last_layer_nodes: 50

learning_rate: 0.1

**Best Score:** 0.3113600015640259

**Attempt 4**

- Only searching first_layer_nodes and learning_rate
    - Using findings from non-stretch goal searches and stretch goal searches to pick hyperparameters for this attempt
- Keeping layers at 2

**Best Hyperparameters--RandomSearch:**

first_layer_nodes: 350

learning_rate: 0.001

**Best Score:** 0.2972399890422821

**Attempt 5**

- Switched up hyperparameter values

**Best Hyperparameters--RandomSearch:**

first_layer_nodes: 300

learning_rate: 0.1

**Best Score:** 0.32708001136779785

### Bayesian Optimization

In [151]:
max_trials=15
num_initial_points=5
beta=5.0

In [153]:
bayesian_tuner_k5_1 = BayesianOptimization(
                    create_model_k,
                    objective='val_accuracy',
                    max_trials=max_trials,
                    hyperparameters=hp_k, # pass in our hyperparameter dictionary
                    num_initial_points=num_initial_points, 
                    beta=beta, 
                    seed=567,
                    directory='./keras-tuner-trial-stretch-goal',
                    project_name='bayesian_optimization_5_1')

In [154]:
bayesian_tuner_k5_1.search(X_train, y_train,
               epochs=5,
               validation_data=(X_test, y_test))

Trial 15 Complete [00h 00m 25s]
val_accuracy: 0.29264000058174133

Best val_accuracy So Far: 0.35124000906944275
Total elapsed time: 00h 05m 38s
INFO:tensorflow:Oracle triggered exit


In [155]:
bayesian_tuner_k5_1.results_summary()

Results summary
Results in ./keras-tuner-trial-stretch-goal/bayesian_optimization_5_1
Showing 10 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
first_layer_nodes: 450
learning_rate: 0.001
Score: 0.35124000906944275
Trial summary
Hyperparameters:
first_layer_nodes: 450
learning_rate: 0.001
Score: 0.3463200032711029
Trial summary
Hyperparameters:
first_layer_nodes: 550
learning_rate: 0.1
Score: 0.32328000664711
Trial summary
Hyperparameters:
first_layer_nodes: 600
learning_rate: 0.1
Score: 0.31380000710487366
Trial summary
Hyperparameters:
first_layer_nodes: 300
learning_rate: 0.075
Score: 0.3067600131034851
Trial summary
Hyperparameters:
first_layer_nodes: 300
learning_rate: 0.001
Score: 0.3041599988937378
Trial summary
Hyperparameters:
first_layer_nodes: 450
learning_rate: 0.001
Score: 0.2951200008392334
Trial summary
Hyperparameters:
first_layer_nodes: 450
learning_rate: 0.001
Score: 0.29412001371383667
Trial summary
Hyperparameters:
first_l

 ### Results
  
 
**Attempt 1**

**Best Hyperparameters--Bayesian Optimization:**

n_layers: 3

first_layer_nodes: 300

last_layer_nodes: 100

learning_rate: 0.1

activation: relu

**Best Score:** 0.25172001123428345
 
 

**Attempt 2**

- Increased number of epochs and trials

**Best Hyperparameters--Bayesian Optimization:**

n_layers: 3

first_layer_nodes: 300

last_layer_nodes: 50

learning_rate: 0.1

activation: relu

**Best Score:** 0.24028000235557556

**Attempt 3**

- Changed beta to 5
- Removed activation hyperparameter. Set it to 'relu'
- Changed hyperparameters for n_layers, first_layer_nodes, learning_rate

**Best Hyperparameters--Bayesian Optimization:**

n_layers: 2

first_layer_nodes: 250

last_layer_nodes: 50

learning_rate: 0.05

**Best Score:** 0.33191999793052673

**Attempt 4**

- Only searching first_layer_nodes and learning_rate
    - Using findings from non-stretch goal searches and stretch goal searches to pick hyperparameters for this attempt
- Keeping layers at 2

**Best Hyperparameters--Bayesian Optimization:**

first_layer_nodes: 275

learning_rate: 0.075

**Best Score:** 0.3412800133228302

**Attempt 5**

- Switched up hyperparameter values

**Best Hyperparameters--Bayesian Optimization:**

first_layer_nodes: 450

learning_rate: 0.001

**Best Score:** 0.35124000906944275