<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_08_4_bayesian_hyperparameter_opt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part 8.4: Bayesian Hyperparameter Optimization for Keras

Bayesian hyperparameter optimization finds good hyperparameters more efficiently than a grid search. Because every candidate set of hyperparameters requires retraining the neural network, the number of candidate sets evaluated should be kept small. Bayesian optimization achieves this by fitting a surrogate model to the results observed so far and using it to predict which candidate sets are worth trying next.

Snoek, J., Larochelle, H., & Adams, R. P. (2012). [Practical bayesian optimization of machine learning algorithms](https://arxiv.org/pdf/1206.2944.pdf). In *Advances in neural information processing systems* (pp. 2951-2959).
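
To make the cost of an exhaustive grid concrete, the sketch below counts the training runs a grid search would need. The four hyperparameter names are the ones tuned later in this notebook; the resolution of 10 values per hyperparameter is an assumption chosen only for illustration.

```python
from itertools import product

# Assumed resolution: 10 values per hyperparameter (illustrative only).
values_per_param = 10
param_names = ["dropout", "lr", "neuronPct", "neuronShrink"]

# Every combination of grid indices corresponds to one full training run.
grid = list(product(range(values_per_param), repeat=len(param_names)))
grid_runs = len(grid)
print(grid_runs)  # 10 ** 4 = 10000 candidate sets
```

A Bayesian optimizer instead works to a budget chosen by the user, such as the `init_points` and `n_iter` arguments passed to `maximize` later in this notebook.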


Several Python packages support this kind of optimization:

* [bayesian-optimization](https://github.com/fmfn/BayesianOptimization)
* [hyperopt](https://github.com/hyperopt/hyperopt)
* [spearmint](https://github.com/JasperSnoek/spearmint)
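
The idea of "training a model to predict good candidates" can be sketched in a few lines of NumPy. The example below is a minimal illustration, not the implementation used by any of the packages listed above: it assumes a zero-mean Gaussian process with a squared-exponential kernel as the surrogate, an upper-confidence-bound rule for choosing the next candidate, and a hypothetical one-dimensional `toy_objective` standing in for an expensive training run.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)           # K^-1 k*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)      # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_objective(x):
    """Hypothetical stand-in for an expensive training run (peak at 0.3)."""
    return -(x - 0.3) ** 2

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 201)

# Start from a few random evaluations, like init_points.
x_obs = rng.uniform(0.0, 1.0, size=3)
y_obs = toy_objective(x_obs)

# Each iteration refits the surrogate and evaluates the most promising point.
for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + 2.0 * std                         # favor high mean or high uncertainty
    x_next = candidates[np.argmax(ucb)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_objective(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(best_x)
```

Only 13 evaluations of the objective are made in total; the surrogate, which is cheap to evaluate, does the rest of the searching.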

In [4]:
!nvidia-smi

Sat Sep 12 07:36:33 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                 ERR! |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [5]:
samplesizename = "100k"
epochcount = 30
logsfilename = samplesizename + str(epochcount) + "epoch"

In [6]:
import json
import numpy as np

In [7]:
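def getdata():
  # Mount Google Drive and copy the archived dataset into the Colab workspace.
  from google.colab import drive
  drive.mount('/content/drive')

  %cp "/content/drive/My Drive/teamcompanalyzer/100k.rar" /content/data.rar
  # Remove files left from an earlier extraction (rm reports an error on a fresh runtime).
  %rm data.json
  %rm labels.json
  !unrar x data.rar

  # Load features and labels as NumPy arrays.
  with open("data.json", "r") as f:
    x = np.array(json.load(f))

  with open("labels.json", "r") as f:
    y = np.array(json.load(f))
  return x, y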
x, y = getdata()

Mounted at /content/drive
rm: cannot remove 'data.json': No such file or directory
rm: cannot remove 'labels.json': No such file or directory

UNRAR 5.50 freeware      Copyright (c) 1993-2017 Alexander Roshal


Extracting from data.rar

Extracting  data.json                                                     98%  OK 
Extracting  labels.json                                                   99%  OK 
All OK
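
Before any tuning starts, it can help to confirm that the loaded arrays have the shape the network expects. The helper below is a hypothetical sketch: `check_dataset` is not part of the notebook, the width of 1500 is taken from the `input_shape=(1500,)` used in `generate_model`, and the assumption that labels are 0/1 follows from the single sigmoid output. It is demonstrated on a tiny synthetic array rather than the real data.

```python
import numpy as np

def check_dataset(features, labels, expected_width=1500):
    """Raise ValueError if the arrays do not look like a binary dataset."""
    if features.ndim != 2 or features.shape[1] != expected_width:
        raise ValueError(f"unexpected feature shape {features.shape}")
    if len(features) != len(labels):
        raise ValueError("features and labels differ in length")
    classes = set(np.unique(labels).tolist())
    if not classes <= {0, 1}:
        raise ValueError(f"labels are not binary: {sorted(classes)}")
    return features.shape[0]

# Tiny synthetic stand-in for the real data (assumption: labels are 0/1).
demo_x = np.zeros((4, 1500))
demo_y = np.array([0, 1, 1, 0])
n_rows = check_dataset(demo_x, demo_y)
print(n_rows)
```

On the real data the call would be `check_dataset(x, y)`.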


In [8]:
import pandas as pd
import os
import time
import tensorflow.keras.initializers
import statistics
import tensorflow.keras
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, InputLayer
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import StratifiedShuffleSplit
from tensorflow.keras.layers import LeakyReLU,PReLU
from tensorflow.keras.optimizers import Adam

def generate_model(dropout, neuronPct, neuronShrink):
    # We start with some percent of 5000 starting neurons on the first hidden layer.
    neuronCount = int(neuronPct * 5000)
    
    # Construct neural network
    # kernel_initializer = tensorflow.keras.initializers.he_uniform(seed=None)
    model = Sequential()

    # Keep adding hidden layers while the next layer would have more than
    # 25 neurons and fewer than 10 layers have been created.
    layer = 0
    while neuronCount>25 and layer<10:
        # Dense expects an integer width; neuronCount becomes a float
        # once it has been multiplied by neuronShrink.
        units = int(neuronCount)
        # The first (0th) layer must also declare the input shape.
        if layer==0:
            model.add(Dense(units,
                #input_dim=x.shape[1] 
                input_shape=(1500,), 
                activation=PReLU()))
                #activation="relu"))
        else:
            model.add(Dense(units, activation=PReLU())) 
        layer += 1

        # Add dropout after each hidden layer
        model.add(Dropout(dropout))

        # Shrink neuron count for each layer
        neuronCount = neuronCount * neuronShrink

    #model.add(Dense(y.shape[1],activation='softmax')) # Output
    model.add(Dense(1,activation='sigmoid'))
    return model

In [9]:
# Generate a model and see what the resulting structure looks like.
model = generate_model(dropout=0.2, neuronPct=0.1, neuronShrink=0.25)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 500)               751000    
_________________________________________________________________
dropout (Dropout)            (None, 500)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 125)               62750     
_________________________________________________________________
dropout_1 (Dropout)          (None, 125)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 31)                3937      
_________________________________________________________________
dropout_2 (Dropout)          (None, 31)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 3
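
The hidden-layer sizes and parameter counts in the summary can be reproduced without building a Keras model. The sketch below repeats the loop from `generate_model` in plain Python; `layer_widths` and `hidden_param_counts` are hypothetical helper names, and the parameter formula (weights, biases, plus one PReLU slope per unit) is inferred from the counts shown in the summary rather than taken from Keras documentation.

```python
def layer_widths(neuronPct, neuronShrink, start=5000, min_neurons=25, max_layers=10):
    """Return hidden-layer widths using the same loop as generate_model."""
    widths = []
    neuron_count = int(neuronPct * start)
    while neuron_count > min_neurons and len(widths) < max_layers:
        # The summary shows 31 units for a width of 31.25, i.e. truncation.
        widths.append(int(neuron_count))
        neuron_count = neuron_count * neuronShrink
    return widths

def hidden_param_counts(widths, input_width=1500):
    """Weights + biases + one PReLU slope per unit for each hidden layer."""
    counts = []
    fan_in = input_width
    for width in widths:
        counts.append(fan_in * width + width + width)
        fan_in = width
    return counts

widths = layer_widths(neuronPct=0.1, neuronShrink=0.25)
params = hidden_param_counts(widths)
print(widths)   # [500, 125, 31]
print(params)   # [751000, 62750, 3937]
```

Calling `layer_widths` with a candidate's `neuronPct` and `neuronShrink` gives a quick preview of how deep and wide the resulting network would be.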

In [10]:
def evaluate_network(dropout,lr,neuronPct,neuronShrink):
    SPLITS = 2

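    # Bootstrap-style resampling: repeated stratified random splits,
    # each holding out 10% of the data for validation.
    boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1)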

    # Track progress
    mean_benchmark = []
    epochs_needed = []
    num = 0
    

    # Loop through samples
    for train, test in boot.split(x, y):
        start_time = time.time()
        num+=1

        # Split train and test
        x_train = x[train]
        y_train = y[train]
        x_test = x[test]
        y_test = y[test]

        model = generate_model(dropout, neuronPct, neuronShrink)
        model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=lr), metrics=["accuracy"])
        # While epochcount is below the patience of 100, this callback
        # never stops training early.
        monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
                                patience=100, verbose=2, mode='auto',
                                restore_best_weights=True)

        # Train on the bootstrap sample
        model.fit(x_train,y_train, #batch_size = 16, 
                  validation_data=(x_test,y_test),
                  callbacks=[monitor],verbose=0,epochs=epochcount)
        epochs = monitor.stopped_epoch
        epochs_needed.append(epochs)

        # Predict on the out of boot (validation)
        pred = model.predict(x_test)
        # Measure this split's log loss. A ValueError (for example, NaN
        # predictions from a diverged model) yields a large penalty instead.
        try:
            score = metrics.log_loss(y_test, pred, eps=1e-7)
        except ValueError:
            return -100.0
        mean_benchmark.append(score)
        m1 = statistics.mean(mean_benchmark)
        m2 = statistics.mean(epochs_needed)
        mdev = statistics.pstdev(mean_benchmark)

        # Record this iteration
        time_took = time.time() - start_time
        
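    # Release Keras state before the next candidate is trained.
    tensorflow.keras.backend.clear_session()
    # The optimizer maximizes its target, so return the negated mean log loss.
    return (-m1)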
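
The score returned by `evaluate_network` is a negated log loss. As a rough illustration of what that number measures, the sketch below computes binary log loss by hand. It assumes 0/1 labels and one predicted probability per sample; `binary_log_loss` is a hypothetical helper meant to illustrate the metric, not to replace `metrics.log_loss`.

```python
import math

def binary_log_loss(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        prob = min(max(prob, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)

# Confident, correct predictions give a lower loss than hesitant ones.
confident = binary_log_loss([1, 0, 1], [0.9, 0.1, 0.8])
hesitant = binary_log_loss([1, 0, 1], [0.6, 0.4, 0.5])

# Negating the loss turns "lower is better" into "higher is better".
target_confident = -confident
target_hesitant = -hesitant
print(target_confident > target_hesitant)
```

Negating the loss matters because the optimizer used below searches for a maximum, so a smaller log loss has to appear as a larger target.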



In [11]:
#print(evaluate_network(
#    dropout=0.2,
#    lr=1e-3,
#    neuronPct=0.2,
#    neuronShrink=0.2))


In [None]:
!pip install bayesian-optimization
from bayes_opt import BayesianOptimization
import time
from bayes_opt.logger import JSONLogger
from bayes_opt.event import Events
from bayes_opt.util import load_logs

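class newJSONLogger(JSONLogger):
    # Calls the grandparent initializer directly and so bypasses
    # JSONLogger.__init__, presumably so that an existing log file at
    # this path is kept and new results are appended to it.
    def __init__(self, path):
        self._path = None
        super(JSONLogger, self).__init__()
        self._path = path if path[-5:] == ".json" else path + ".json"

# Suppress NaN warnings
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)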

# Bounded region of parameter space
pbounds = {'dropout': (0.0, 0.499),
           'lr': (0.0, 0.1),
           'neuronPct': (0.01, 1),
           'neuronShrink': (0.01, 1)
          }

optimizer = BayesianOptimization(
    f=evaluate_network,
    pbounds=pbounds,
    verbose=1,  # verbose = 1 prints only when a maximum 
    # is observed, verbose = 0 is silent
    random_state=1,
)
try:
  load_logs(optimizer, logs=["/content/drive/My Drive/teamcompanalyzer/" + logsfilename + ".json"]) #to load a previous save
except FileNotFoundError:
  pass

logger = newJSONLogger("/content/drive/My Drive/teamcompanalyzer/" + logsfilename + ".json") # to save progress
optimizer.subscribe(Events.OPTIMIZATION_STEP, logger)

#ScreenLogger(verbose=2)
#optimizer.subscribe()
start_time = time.time()
optimizer.maximize(init_points=10, n_iter=1000)
time_took = time.time() - start_time

print("Total runtime:" + str(time_took))
print(optimizer.max)

Collecting bayesian-optimization
  Downloading https://files.pythonhosted.org/packages/bb/7a/fd8059a3881d3ab37ac8f72f56b73937a14e8bb14a9733e68cc8b17dbe3c/bayesian-optimization-1.2.0.tar.gz
Building wheels for collected packages: bayesian-optimization
  Building wheel for bayesian-optimization (setup.py) ... done
  Created wheel for bayesian-optimization: filename=bayesian_optimization-1.2.0-cp36-none-any.whl size=11685 sha256=84bf81ce6a1af6bdd30c9af65cd7f6b5a0afe4d1aa84a68604a4eec61abaf7c7
  Stored in directory: /root/.cache/pip/wheels/5a/56/ae/e0e3c1fc1954dc3ec712e2df547235ed072b448094d8f94aec
Successfully built bayesian-optimization
Installing collected packages: bayesian-optimization
Successfully installed bayesian-optimization-1.2.0


Total runtime:4567.445779800415

{'target': -0.6932720859646797, 'params': {'dropout': 0.24858337004439982, 'lr': 0.004121493053087515, 'neuronPct': 0.33485495615382616, 'neuronShrink': 0.6423925330540974}}
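
One way to read this result is to compare it with a trivial baseline. A classifier that outputs 0.5 for every sample has a binary log loss of ln 2, whatever the labels are. The sketch below makes that comparison; the value of `best_log_loss` is simply the negated `target` printed above, and treating the constant-0.5 predictor as the reference point is a choice made for this illustration.

```python
import math

# Log loss of predicting 0.5 for every sample, regardless of the labels.
chance_log_loss = -math.log(0.5)            # equals ln 2
best_log_loss = 0.6932720859646797          # negated 'target' printed above

gap = best_log_loss - chance_log_loss
print(f"chance: {chance_log_loss:.6f}  best: {best_log_loss:.6f}  gap: {gap:.6f}")
```

The best loss found is within about 0.0002 of ln 2, which suggests that the best configuration in this run is predicting at roughly chance level; the data, the labels, or the training budget may be worth checking before the search space is tuned further.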


