<a href="https://colab.research.google.com/github/skhabiri/DS-Unit-4-Sprint-2-Neural-Networks/blob/main/module3-Tune/LS_DS17_423_Tune_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Train Practice

## *Data Science Unit 4 Sprint 2 Assignment 3*

Continue to use TensorFlow Keras and a sample of the [Quickdraw dataset](https://github.com/googlecreativelab/quickdraw-dataset) to build a sketch classification model. The dataset has been sampled down to 10 classes with 10,000 observations per class. Starting from yesterday's baseline model, tune its hyperparameters and report your highest validation accuracy. Your single goal today is to achieve the highest accuracy possible.

*Don't forget to switch to GPU on Colab!*

### Hyperparameters to Tune

At a minimum, tune each of these hyperparameters using any strategy we discussed during lecture today: 
- Optimizer
- Learning Rate
- Activation Function
  - At least 1 subparameter within the ReLU activation function
- Number of Neurons in Hidden Layers
- Number of Hidden Layers
- Weight Initialization
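
For the ReLU subparameter item, it can help to see what such parameters do to the function itself. The sketch below is plain Python, not Keras code. The parameter names `negative_slope`, `max_value` and `threshold` are chosen to mirror the arguments of the Keras `ReLU` layer, but that mapping is an assumption here, so check the layer documentation before relying on it.

```python
def relu(x, negative_slope=0.0, max_value=None, threshold=0.0):
    """Parameterised ReLU for a single float (illustrative sketch only)."""
    if max_value is not None and x >= max_value:
        return max_value                      # saturate large activations
    if x >= threshold:
        return x                              # identity on the active region
    return negative_slope * (x - threshold)   # "leaky" part below the threshold


print(relu(3.0))                        # plain ReLU, positive input
print(relu(-2.0, negative_slope=0.1))   # leaky ReLU, negative input
print(relu(10.0, max_value=6.0))        # capped ReLU
```

Each of these three arguments is a candidate subparameter to tune alongside the choice of activation function.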

In [2]:
pip install wget



In [3]:
import numpy as np
import tensorflow as tf
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import wget

In [4]:
def load_quickdraw10(path):
  wget.download(path)
  data = np.load('quickdraw10.npz')
  X = data['arr_0']
  y = data['arr_1']

  print(X.shape)
  print(y.shape)

  X, y = shuffle(X, y)
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  
  return X_train, y_train, X_test, y_test

In [5]:
path = 'https://github.com/skhabiri/DS-Unit-4-Sprint-2-Neural-Networks/raw/main/module1-Architect/quickdraw10.npz'
X_train, y_train, X_test, y_test = load_quickdraw10(path)

(100000, 784)
(100000,)


In [6]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, Adam

### Keras Tuner

In [7]:
!pip install keras-tuner


In [9]:
from tensorflow import keras
from tensorflow.keras import layers
import kerastuner.tuners as kt

"""
This model tunes:
- Number of neurons in each of the three hidden layers
- Learning rate
- Optimizer
- Activation function (second and third hidden layers)
"""

def build_model(hp):
    hp_units1 = hp.Int('units_1', min_value=32, max_value=128, step=32)
    hp_lrate = hp.Choice('lrate', values = [1e-3]) 
    hp_optimizer = hp.Choice('optimizer', ['adam', 'rmsprop']) # 'sgd', 

    hp_activation = hp.Choice('activationfn',
                            [
                              # 'softmax',
                              # 'softplus',
                              # 'softsign',
                              'relu',
                              # 'tanh',
                              'sigmoid',
                              # 'hard_sigmoid',
                              'linear'
                            ])

    model = keras.Sequential()
    model.add(layers.Dense(units=hp_units1, input_dim=784, activation='relu'))
    for i in range(2,4,1):
      hp_units = hp.Int(
          'units_' + str(i),
          min_value=8,
          max_value=64,
          step=8
      )
      model.add(layers.Dense(units=hp_units, activation=hp_activation))
    model.add(layers.Dense(10, activation='softmax'))

    
    # if hp_optimizer == 'adam':
    #   opt = keras.optimizers.Adam(learning_rate=hp_lrate)
    # elif hp_optimizer== 'sgd':
    #   opt = keras.optimizers.SGD(learning_rate=hp_lrate)
    # elif hp_optimizer== 'rmsprop':
    #   opt = keras.optimizers.RMSprop(learning_rate=hp_lrate)
    # else:
    #   raise ValueError(f'Unexpected optimizer: {hp_optimizer}')
    
    opt_dic = {'adam': keras.optimizers.Adam(learning_rate=hp_lrate), 
            'sgd': keras.optimizers.SGD(learning_rate=hp_lrate),
            'rmsprop': keras.optimizers.RMSprop(learning_rate=hp_lrate)
            }

    model.compile(optimizer=opt_dic[hp_optimizer], loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
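
To get a feel for how much of the search space a tuner can cover, the short sketch below counts the distinct configurations that `build_model` above can produce. It is plain Python and assumes that `hp.Int` includes `max_value` in its range; the `search_space` dictionary is a hand-copied stand-in, not something Keras Tuner exposes in this form.

```python
from math import prod

# Choices per hyperparameter, copied by hand from build_model above.
# Assumption: hp.Int(min_value, max_value, step) includes max_value.
search_space = {
    'units_1': list(range(32, 128 + 1, 32)),
    'lrate': [1e-3],
    'optimizer': ['adam', 'rmsprop'],
    'activationfn': ['relu', 'sigmoid', 'linear'],
    'units_2': list(range(8, 64 + 1, 8)),
    'units_3': list(range(8, 64 + 1, 8)),
}

n_combinations = prod(len(values) for values in search_space.values())
print(f'{n_combinations} possible configurations')
```

A random search with a small `max_trials` samples only a handful of these configurations, which is why results can vary noticeably from one search to the next.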


In [10]:
tuner_rs = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=8, 
                           executions_per_trial=5, directory='./kt-randomsearch', 
                           project_name='kt-RS')

In [11]:
tuner_rs.search_space_summary()

In [10]:
tuner_rs.search(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

INFO:tensorflow:Oracle triggered exit


In [11]:
tuner_rs.results_summary()

In [12]:
best_model = tuner_rs.get_best_models()[0]
# Evaluate the best model.
loss0, accuracy0 = best_model.evaluate(X_test, y_test)
print(f"""best accuracy: {accuracy0}""")
print("best parameters", tuner_rs.get_best_hyperparameters(num_trials=1)[0].values)

best accuracy: 0.8343499898910522
best parameters {'units_1': 96, 'lrate': 0.001, 'optimizer': 'rmsprop', 'activationfn': 'linear', 'units_2': 16, 'units_3': 16}


In [16]:
tuner_hb = kt.Hyperband(build_model,
                     objective = 'val_accuracy', 
                     max_epochs = 5,
                     factor = 3,
                     directory = './kt-hyperbrand',
                     project_name = 'kt-HB')  

In [17]:
tuner_hb.search_space_summary()

In [18]:
tuner_hb.search(X_train, y_train, epochs=5, validation_data=(X_test, y_test))

Epoch 1/2
Epoch 2/2


Epoch 1/2
Epoch 2/2


Epoch 1/2
Epoch 2/2


Epoch 1/2
Epoch 2/2


Epoch 1/2
Epoch 2/2


Epoch 3/5
Epoch 4/5
Epoch 5/5


Epoch 3/5
Epoch 4/5
Epoch 5/5


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


INFO:tensorflow:Oracle triggered exit
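
The log above shows five trials trained for 2 epochs, two trials resumed from epoch 3 up to epoch 5, and then three fresh trials trained for the full 5 epochs. The sketch below reproduces those counts with a successive-halving style schedule. The formulas are an assumption reconstructed to match this log, not code taken from Keras Tuner, and `hyperband_schedule` is a hypothetical helper.

```python
import math


def hyperband_schedule(max_epochs, factor):
    """Return (bracket, round, n_trials, epochs) tuples, largest bracket first.

    Illustrative reconstruction only; not the Keras Tuner implementation.
    """
    log_val = math.log(max_epochs, factor)
    num_brackets = int(log_val) + 1
    bracket0_end_size = math.ceil(1 + log_val)
    schedule = []
    for bracket in reversed(range(num_brackets)):
        end_size = bracket0_end_size / (bracket + 1)
        for rnd in range(bracket + 1):
            # Earlier rounds train more trials for fewer epochs
            n_trials = math.ceil(end_size * factor ** (bracket - rnd))
            epochs = math.ceil(max_epochs / factor ** (bracket - rnd))
            schedule.append((bracket, rnd, n_trials, epochs))
    return schedule


for row in hyperband_schedule(max_epochs=5, factor=3):
    print(row)
```

Under these assumptions only the two most promising of the first five trials receive the full epoch budget, which is where the time saving over a plain random search comes from.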


In [19]:
tuner_hb.get_best_hyperparameters(num_trials=1)[0].values

{'activationfn': 'tanh',
 'lrate': 0.0001,
 'tuner/bracket': 0,
 'tuner/epochs': 5,
 'tuner/initial_epoch': 0,
 'tuner/round': 0,
 'units_1': 96,
 'units_2': 56,
 'units_3': 48}

In [20]:
# Evaluate the best model.
print("best accuracy: ", tuner_hb.get_best_models()[0].evaluate(X_test, y_test)[1])
print("best parameters", tuner_hb.get_best_hyperparameters(num_trials=1)[0].values)

best accuracy:  0.7452499866485596
best parameters {'units_1': 96, 'lrate': 0.0001, 'activationfn': 'tanh', 'units_2': 56, 'units_3': 48, 'tuner/epochs': 5, 'tuner/initial_epoch': 0, 'tuner/bracket': 0, 'tuner/round': 0}


### Hyperband ran faster, but the RandomSearch tuner found better hyperparameters in these runs (accuracy of about 0.834 vs. 0.745 on the held-out data).

### Experiment Tracking Framework using TensorBoard

In [12]:
# Load an ipython extension
%load_ext tensorboard

In [13]:
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

import os
import datetime

In [20]:
# Define hyperparameters and score metrics
HP_NUM_UNITS1 = hp.HParam('num_units1', hp.Discrete([32,64]))
HP_NUM_UNITS2 = hp.HParam('num_units2', hp.Discrete([8,32]))

HP_LEARNING_RATE = hp.HParam('learning_rate', hp.RealInterval(0.001,.005))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam']))
HP_ACTIVATION = hp.HParam('actfn', hp.Discrete(['relu', 'sigmoid']))

METRIC_ACCURACY = 'accuracy'

# Write the experiment config to the same log directory that the runs and TensorBoard use
with tf.summary.create_file_writer('logs/hparams_tuning').as_default():
  hp.hparams_config(
      hparams=[HP_NUM_UNITS1, HP_NUM_UNITS2, HP_LEARNING_RATE, HP_OPTIMIZER, HP_ACTIVATION],
      metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')]
  )

In [21]:
# Adapt Model Function with HParams
def train_test_model(hparams):
  
  model = tf.keras.Sequential(
      [layers.Dense(units=hparams[HP_NUM_UNITS1], input_dim=784, activation='relu'),
       layers.Dense(units=hparams[HP_NUM_UNITS2], activation=hparams[HP_ACTIVATION]),
       layers.Dense(10, activation='softmax')
       ])

  # The optimizer needs the learning rate
  opt_name = hparams[HP_OPTIMIZER]
  lr = hparams[HP_LEARNING_RATE]

  if opt_name == 'adam':
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
  elif opt_name == 'sgd':
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
  else:
    raise ValueError(f'Unexpected optimizer: {opt_name}')

  # Compile defines optimizer, loss function and metric
  model.compile(
      optimizer=opt,
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy']
  )

  model.fit(X_train, y_train, epochs=5)
  _, accuracy = model.evaluate(X_test, y_test)

  return accuracy

In [25]:
# For each run, log an hparams set and final accuracy
def run(run_dir, hparams):
  with tf.summary.create_file_writer(run_dir).as_default():
    # hp.hparams is a function of the hp module
    hp.hparams(hparams)  # record the values used in this trial
    accuracy = train_test_model(hparams)
    tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

In [24]:
# Loop over parameters to create unique params set
session_num = 0

# Basically a grid search
for num_units1 in HP_NUM_UNITS1.domain.values:
  for num_units2 in HP_NUM_UNITS2.domain.values:
    for learning_rate in (HP_LEARNING_RATE.domain.min_value, HP_LEARNING_RATE.domain.max_value):
      for activation in HP_ACTIVATION.domain.values:
        for optimizer in HP_OPTIMIZER.domain.values:
          # create a dict with keys of type hp.HParam and one value per key (one grid point)
          hparams = {
            HP_NUM_UNITS1: num_units1,
            HP_NUM_UNITS2: num_units2,
            HP_LEARNING_RATE: learning_rate,
            HP_ACTIVATION: activation,
            HP_OPTIMIZER: optimizer
            }

          run_name = f'run-{session_num}'
          print(f'--- Starting trial: {run_name}')
          # Each key is an hp.HParam object (tensorboard.plugins.hparams.summary_v2.HParam);
          # param.name is its string name, e.g. 'num_units1'
          print({param.name: hparams[param] for param in hparams})
          run('logs/hparams_tuning/' + run_name, hparams)
          session_num += 1

--- Starting trial: run-0
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'linear', 'optimizer': 'adam'}
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Starting trial: run-1
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'linear', 'optimizer': 'sgd'}
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Starting trial: run-2
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'relu', 'optimizer': 'adam'}
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Starting trial: run-3
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'relu', 'optimizer': 'sgd'}
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Starting trial: run-4
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'sigmoid', 'optimizer': 'adam'}
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--- Starting trial: run-5
{'num_units1': 32, 'num_units2': 8, 'learning_rate': 0.001, 'actfn': 'sigmoid', 'optimizer': 'sgd'}
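
The nested loops above enumerate every combination of the hyperparameter values, which is a grid search. As a minimal sketch, the same enumeration can be written with `itertools.product`. Plain lists stand in for the `HParam` domains here; the values are copied from the definitions earlier and the `domains` dictionary is an illustrative name, not part of the TensorBoard API.

```python
from itertools import product

# Plain lists standing in for the HParam domains defined earlier (assumed values)
domains = {
    'num_units1': [32, 64],
    'num_units2': [8, 32],
    'learning_rate': [0.001, 0.005],   # only the interval end points are tried
    'actfn': ['relu', 'sigmoid'],
    'optimizer': ['adam'],
}

names = list(domains)
# One dict per grid point, in the same order as the nested loops
grid = [dict(zip(names, values)) for values in product(*domains.values())]

for session_num, point in enumerate(grid[:3]):
    print(f'run-{session_num}', point)
print(len(grid), 'runs in total')
```

Flattening the loops this way keeps the run counter and the hyperparameter dictionary in one place, and adding a new hyperparameter only requires one more entry in `domains`.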

In [26]:
# Visualization with tensorboard
%tensorboard --logdir logs/hparams_tuning

<IPython.core.display.Javascript object>

### Scikit-learn hyperparameter tuning tools: GridSearchCV, RandomizedSearchCV

In [None]:
# WARNING - may take a few minutes before any output is visible

import numpy
import pandas as pd
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# ReLU is cheaper to compute than sigmoid: it outputs 0 for negative inputs and x otherwise

# Function that builds and compiles the model. KerasClassifier takes it as build_fn
# so that it can create a fresh model for every fit.

def create_model(units1=32, units2=8, opti='rmsprop', learn_rate=0.001, init='glorot_uniform', actfn='relu'):
    # create model
    model = Sequential()
    # units are the number of hidden neurons
    model.add(Dense(units1, input_dim=784, kernel_initializer=init, activation='relu'))
    model.add(Dense(units2, kernel_initializer=init, activation=actfn))
    # 10 output labels
    model.add(Dense(10, kernel_initializer=init, activation='softmax'))
    # Build a fresh optimizer for each model so that the learning rate can be tuned
    optimizer_classes = {'rmsprop': RMSprop, 'adam': Adam}
    optimizer = optimizer_classes[opti](learning_rate=learn_rate)
    # Compile model
    # sparse_categorical_crossentropy takes integer class labels, so y does not need one-hot encoding
    model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# wrap the model builder in a scikit-learn compatible estimator
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
optimizers = ['rmsprop', 'adam']
lrates = [0.001]
inits = ['normal', 'uniform']
epochs = [5, 10]
batches = [32, 128]
acts = ['relu', 'sigmoid']
# param_grid = dict(opti=optimizers, epochs=epochs, batch_size=batches, init=inits, units1=[32, 64], units2=[8,32], actfn=acts)

param_grid = {'batch_size': batches,
              'epochs': epochs,
              'units1': [32, 64],
              'units2': [8, 32],
              'opti': optimizers,
              'learn_rate' : lrates,
              'init': inits
              }

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

# Cross validation parameters
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
# One mean, std and param set per grid point (parameter combination).
# mean_test_score and std_test_score are computed across the cross-validation folds of that grid point, not across epochs.
# The epoch log shown is from the final refit on the best parameter set (batch_size=32 in this case).
# GridSearchCV picks best_params_ by the highest mean_test_score; each fold is scored after its last training epoch.
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")
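
Before launching a grid search it is worth estimating how many models will be trained. The sketch below counts the fits for a grid like the one above and shows how a per-combination mean and standard deviation arise from fold scores. It is plain Python; the 5-fold setting, the population standard deviation and the fold accuracies are assumptions for illustration (the scores are made-up numbers, not results from this notebook).

```python
from math import prod
from statistics import mean, pstdev

# Grid values mirroring the cell above (two optimizers, one learning rate)
param_grid = {
    'batch_size': [32, 128],
    'epochs': [5, 10],
    'units1': [32, 64],
    'units2': [8, 32],
    'opti': ['rmsprop', 'adam'],
    'learn_rate': [0.001],
    'init': ['normal', 'uniform'],
}
n_folds = 5  # assumption: 5-fold cross-validation

n_combinations = prod(len(values) for values in param_grid.values())
n_fits = n_combinations * n_folds
print(f'{n_combinations} combinations x {n_folds} folds = {n_fits} fits')

# Hypothetical fold accuracies for ONE parameter combination (made-up numbers)
fold_scores = [0.80, 0.82, 0.81, 0.79, 0.83]
mean_score = mean(fold_scores)
std_score = pstdev(fold_scores)  # population standard deviation (assumed convention)
print(f'mean={mean_score:.3f}, std={std_score:.4f}')
```

Because every extra value multiplies the number of fits, `RandomizedSearchCV` with a fixed number of sampled combinations is often the more practical choice for larger grids.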

### Stretch Goals
- Implement Bayesian Hyper-parameter Optimization
- Select a new dataset and apply a neural network to it.
- Use a cloud-based experiment tracking framework such as Weights & Biases
- Research potential architecture ideas for this problem. Try Lenet-10 for example. 