# Part 5.5: Benchmarking Regularization Techniques

Quite a few hyperparameters have been introduced so far.  Tweaking each of these values can have an effect on the score obtained by your neural networks.  Some of the hyperparameters seen so far include:

* Number of layers in the neural network
* How many neurons in each layer
* What activation functions to use on each layer
* Dropout percent on each layer
* L1 and L2 values on each layer

To try out each of these hyperparameters, you will need to train neural networks with multiple settings for each hyperparameter.  However, you may have noticed that neural networks often produce somewhat different results when trained multiple times.  This is because the neural networks start with random weights.  Because of this, it is necessary to fit and evaluate a neural network multiple times to ensure that one set of hyperparameters is actually better than another.  Bootstrapping can be an effective means of benchmarking (comparing) two sets of hyperparameters.

Bootstrapping is similar to cross-validation.  Both go through a number of cycles/folds, providing validation and training sets.  However, bootstrapping can have an unlimited number of cycles.  Bootstrapping chooses a new train and validation split each cycle, with replacement.  Because each cycle is chosen with replacement, rows will often be repeated between cycles, unlike in cross-validation.  If you run the bootstrap for enough cycles, there will be duplicate cycles.
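
The difference between the two approaches can be seen by counting how often each row lands in a validation set. The following is a minimal sketch, assuming ten placeholder rows and five splits; the helper name `count_validation_uses` is hypothetical and not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

# Ten placeholder rows; only the row indices matter here.
rows = np.arange(10)

def count_validation_uses(splitter, data):
    """Count how many times each row lands in a validation set."""
    counts = np.zeros(len(data), dtype=int)
    for _, validation_idx in splitter.split(data):
        counts[validation_idx] += 1
    return counts

kfold_counts = count_validation_uses(KFold(n_splits=5), rows)
boot_counts = count_validation_uses(
    ShuffleSplit(n_splits=5, test_size=0.2, random_state=42), rows)

print("KFold       :", kfold_counts)   # every row is validated exactly once
print("ShuffleSplit:", boot_counts)    # rows may repeat or never appear
```

With **KFold**, each row is used for validation exactly once. With **ShuffleSplit**, every cycle is drawn independently, so some rows can show up in several validation sets while others may not show up at all.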

In this part, we will use bootstrapping for hyperparameter benchmarking.  We will train a neural network for a specified number of splits (denoted by the SPLITS constant).  For these examples, we use 50.  We will compare the average score at the end of the 50 splits.  By the end of the cycles, the mean score will have converged somewhat.  This ending score will be a much better basis of comparison than a single cross-validation.  Additionally, the average number of epochs will be tracked to give an idea of a possible optimal value.  Because the early stopping validation set is also used to evaluate the neural network, the score might be slightly optimistic.  This is because we are both stopping and evaluating on the same sample.  However, we are using the scores only as relative measures to determine the superiority of one set of hyperparameters to another, so this slight optimism should not present too much of a problem.
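
Once two configurations have each produced a list of per-split scores, the comparison can be sketched as follows. This is an illustrative helper, not part of the original notebook: the function names `summarize` and `compare` are hypothetical, and the score lists are made-up placeholders rather than results from this dataset. Because bootstrap splits overlap, the scores are not independent, so the standard error here should be read as a rough guide only.

```python
import math
import statistics

def summarize(scores):
    """Return the mean and the standard error of the mean for a list of scores."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem

def compare(scores_a, scores_b):
    """Difference in mean score (A minus B) with a rough standard error."""
    mean_a, sem_a = summarize(scores_a)
    mean_b, sem_b = summarize(scores_b)
    diff = mean_a - mean_b
    diff_sem = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return diff, diff_sem

# Made-up placeholder scores (lower is better), NOT results from this notebook.
config_a = [0.70, 0.68, 0.72, 0.69, 0.71]
config_b = [0.75, 0.74, 0.77, 0.73, 0.76]

diff, diff_sem = compare(config_a, config_b)
print(f"mean difference = {diff:.4f} +/- {diff_sem:.4f}")
```

A difference that is large relative to its standard error suggests that one configuration really is better; a difference of about the same size as the standard error suggests that more splits are needed before drawing a conclusion.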

Because we are benchmarking, we will display the amount of time taken for each cycle.  The following function can be used to nicely format a time span.

In [1]:
# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed/(60*60))
    m = int((sec_elapsed%(60*60))/60)
    s = sec_elapsed % 60
    return "{} : {:>02}:{:>05.2f}".format(h, m, s)
    

In [2]:
hms_string(86400)

'24 : 00:00.00'

### Bootstrapping for Regression

Regression bootstrapping uses the **ShuffleSplit** object to perform the splits.  This is similar to **KFold** for cross-validation in that no balancing takes place.  To demonstrate this technique, we will attempt to predict the age column of the jh-simple-dataset.  The following code loads this data.

In [3]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values = ['NA','?'])
df.info()

df = pd.concat([df, pd.get_dummies(df['job'], prefix='job')], axis = 1)
df.drop('job', axis = 1, inplace = True)

df = pd.concat([df, pd.get_dummies(df['area'], prefix='area')], axis = 1)
df.drop('area', axis = 1, inplace = True)

df = pd.concat([df, pd.get_dummies(df['product'], prefix='product')], axis = 1)
df.drop('product', axis = 1, inplace = True)

#missing values for income
df['income'] = df['income'].fillna(df['income'].median())

df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

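#convert to numpy for regression; cast to float32 because newer pandas
#versions emit boolean dummy columns, which would make the array dtype object
x_columns = df.columns.drop('age').drop('id')
x = df[x_columns].values.astype('float32')
y = df['age'].values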


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   id              2000 non-null   int64  
 1   job             2000 non-null   object 
 2   area            2000 non-null   object 
 3   income          1941 non-null   float64
 4   aspect          2000 non-null   float64
 5   subscriptions   2000 non-null   int64  
 6   dist_healthy    2000 non-null   float64
 7   save_rate       2000 non-null   int64  
 8   dist_unhealthy  2000 non-null   float64
 9   age             2000 non-null   int64  
 10  pop_dense       2000 non-null   float64
 11  retail_dense    2000 non-null   float64
 12  crime           2000 non-null   float64
 13  product         2000 non-null   object 
dtypes: float64(7), int64(4), object(3)
memory usage: 218.9+ KB


The following code performs the bootstrap.  The architecture of the neural network can be adjusted to compare many different configurations. 

In [8]:
import pandas as pd
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import ShuffleSplit

SPLITS = 50

#Bootstrap
boot = ShuffleSplit(n_splits = SPLITS, test_size = 0.1, random_state = 45)

#Track Progress
mean_benchmark = []
epochs_needed = []
num = 0

#loop through samples

for train, test in boot.split(x):
    start_time = time.time()
    num+=1
    
    #split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]
    
    #build neural network
    model = Sequential()
    model.add(Dense(20, input_dim=x.shape[1], activation = 'relu'))
    model.add(Dense(10, activation = 'relu'))
    model.add(Dense(1))
    model.compile(loss= 'mean_squared_error', optimizer = 'adam')
    monitor = EarlyStopping(monitor = 'val_loss', min_delta = 1e-3,
                           patience = 5, verbose = 0, mode = 'auto',
                           restore_best_weights = True)
    
    # train the bootstrap samples
    model.fit(x_train,y_train,validation_data=(x_test,y_test), callbacks=[monitor],verbose=0,epochs=1000)
    epochs = monitor.stopped_epoch
    epochs_needed.append(epochs)
    
    # predict on the out of boot (validation)
    pred = model.predict(x_test)
    
    #measure this bootstrap's RMSE
    score = np.sqrt(metrics.mean_squared_error(y_test, pred))
    mean_benchmark.append(score)
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # record this iteration
    time_took = time.time() - start_time
    print(f"#{num} : score = {score:.6f}, mean Score = {m1:.6f}, stdev = {mdev:.6f}, epochs = {epochs}, mean epochs = {int(m2)}, time = {hms_string(time_took)}")

#1 : score = 0.490907, mean Score = 0.490907, stdev = 0.000000, epochs = 126, mean epochs = 126, time = 0 : 00:07.54
#2 : score = 0.747860, mean Score = 0.619384, stdev = 0.128476, epochs = 114, mean epochs = 120, time = 0 : 00:06.53
#3 : score = 0.693089, mean Score = 0.643952, stdev = 0.110505, epochs = 88, mean epochs = 109, time = 0 : 00:05.03
#4 : score = 0.730178, mean Score = 0.665509, stdev = 0.102726, epochs = 83, mean epochs = 102, time = 0 : 00:04.66
#5 : score = 0.664435, mean Score = 0.665294, stdev = 0.091882, epochs = 129, mean epochs = 108, time = 0 : 00:07.05
#6 : score = 1.152424, mean Score = 0.746482, stdev = 0.199982, epochs = 97, mean epochs = 106, time = 0 : 00:05.49
#7 : score = 0.696688, mean Score = 0.739369, stdev = 0.185966, epochs = 101, mean epochs = 105, time = 0 : 00:05.80
#8 : score = 0.552896, mean Score = 0.716060, stdev = 0.184563, epochs = 119, mean epochs = 107, time = 0 : 00:06.56
#9 : score = 0.916639, mean Score = 0.738346, stdev = 0.185074, epo
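
To compare several configurations, the loop above can be wrapped in a function that takes a model factory. The following is a minimal sketch under stated assumptions: it uses scikit-learn's **Ridge** regressor as a fast stand-in for the Keras network so that it runs in seconds, the data is synthetic placeholder data (not the jh-simple-dataset), and the helper name `bootstrap_rmse` is hypothetical.

```python
import statistics
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit

def bootstrap_rmse(make_model, x, y, splits=10, seed=42):
    """Mean and standard deviation of validation RMSE over bootstrap splits."""
    boot = ShuffleSplit(n_splits=splits, test_size=0.1, random_state=seed)
    scores = []
    for train, test in boot.split(x):
        model = make_model()              # fresh model for every split
        model.fit(x[train], y[train])
        pred = model.predict(x[test])
        scores.append(float(np.sqrt(mean_squared_error(y[test], pred))))
    return statistics.mean(scores), statistics.pstdev(scores)

# Synthetic placeholder data, not the jh-simple-dataset.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 5))
y_demo = (x_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])
          + rng.normal(scale=0.1, size=200))

# Two hypothetical configurations that differ only in L2 strength (alpha).
results = {}
for alpha in (0.01, 100.0):
    mean_rmse, sd_rmse = bootstrap_rmse(lambda: Ridge(alpha=alpha),
                                        x_demo, y_demo)
    results[alpha] = mean_rmse
    print(f"alpha={alpha}: mean RMSE={mean_rmse:.4f}, stdev={sd_rmse:.4f}")
```

Passing the same `seed` for every configuration means that each one is scored on identical splits, so differences in the mean score come from the hyperparameters rather than from the luck of the split. The same pattern should carry over to the Keras model by having the factory build and compile a new network.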

### Bootstrapping for Classification

Classification bootstrapping uses the **StratifiedShuffleSplit** object to perform the splits.  This is similar to **StratifiedKFold** for cross-validation, as the classes are balanced so that the sampling has no effect on proportions.  To demonstrate this technique, we will attempt to predict the product column of the jh-simple-dataset.  The following code loads this data.

In [10]:
import pandas as pd
from scipy.stats import zscore

# Reading Datasets
df = pd.read_csv("https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Dummy value for job
df = pd.concat([df, pd.get_dummies(df['job'], prefix ="job")], axis = 1)
df.drop('job', axis = 1, inplace = True)

# Dummy value for area
df = pd.concat([df, pd.get_dummies(df['area'], prefix = "area")], axis = 1)
df.drop('area', axis = 1, inplace = True)

# Filling missing values for income
df['income'] = df['income'].fillna(df['income'].median())

# Standardize range
df['income'] = zscore(df['income'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions']) 
df['aspect'] = zscore(df['aspect'])
df['age'] = zscore(df['age'])

# Converting to numpy classifications
x_columns = df.columns.drop('product').drop('id')
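# Cast to float32: newer pandas versions emit boolean dummy columns
x = df[x_columns].values.astype('float32')
dummies = pd.get_dummies(df['product'])
products = dummies.columns
y = dummies.values.astype('float32')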
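
Before running the classification bootstrap, it may help to see what stratification does to the validation sets. The following is a minimal sketch using hypothetical, deliberately imbalanced labels (90 rows of one class and 10 of another) rather than the product column; the helper name `minority_counts` is an assumption made for illustration.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

# Hypothetical imbalanced labels: 90 rows of class "a", 10 rows of class "b".
labels = np.array(["a"] * 90 + ["b"] * 10)
features = np.zeros((len(labels), 1))   # placeholder features; only labels matter

def minority_counts(splitter):
    """Number of class "b" rows in each validation set."""
    return [int((labels[test] == "b").sum())
            for _, test in splitter.split(features, labels)]

plain = minority_counts(
    ShuffleSplit(n_splits=5, test_size=0.2, random_state=42))
stratified = minority_counts(
    StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42))

print("ShuffleSplit          :", plain)       # counts can vary between splits
print("StratifiedShuffleSplit:", stratified)  # 2 of 20 rows in every split
```

Each stratified validation set keeps the 10% share of the minority class, whereas the plain shuffle leaves that share to chance. This is why the class labels are passed as the second argument to `split` in the classification bootstrap.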

In [11]:
import pandas as pd
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import StratifiedShuffleSplit

SPLITS = 50

# Bootstrap
boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1, 
                                random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x,df['product']):
    start_time = time.time()
    num+=1

    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Construct neural network
    model = Sequential()
    model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
    model.add(Dense(25, activation='relu')) # Hidden 2
    model.add(Dense(y.shape[1],activation='softmax')) # Output
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=25, verbose=0, mode='auto', restore_best_weights=True)

    # Train on the bootstrap sample
    model.fit(x_train,y_train,validation_data=(x_test,y_test),
              callbacks=[monitor],verbose=0,epochs=1000)
    epochs = monitor.stopped_epoch
    epochs_needed.append(epochs)
    
    # Predict on the out of boot (validation)
    pred = model.predict(x_test)
  
    # Measure this bootstrap's log loss
    y_compare = np.argmax(y_test,axis=1) # For log loss calculation
    score = metrics.log_loss(y_compare, pred)
    mean_benchmark.append(score)
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f}, mean score={m1:.6f}," +\
          f"stdev={mdev:.6f}, epochs={epochs}, mean epochs={int(m2)}," +\
          f" time={hms_string(time_took)}")

#1: score=0.708463, mean score=0.708463,stdev=0.000000, epochs=55, mean epochs=55, time=0 : 00:03.67
#2: score=0.689820, mean score=0.699142,stdev=0.009321, epochs=56, mean epochs=55, time=0 : 00:03.71
#3: score=0.684167, mean score=0.694150,stdev=0.010380, epochs=49, mean epochs=53, time=0 : 00:03.16
#4: score=0.682796, mean score=0.691311,stdev=0.010246, epochs=70, mean epochs=57, time=0 : 00:04.02
#5: score=0.685340, mean score=0.690117,stdev=0.009471, epochs=78, mean epochs=61, time=0 : 00:04.61
#6: score=0.718882, mean score=0.694911,stdev=0.013772, epochs=51, mean epochs=59, time=0 : 00:03.08
#7: score=0.725180, mean score=0.699235,stdev=0.016576, epochs=85, mean epochs=63, time=0 : 00:04.85
#8: score=0.738616, mean score=0.704158,stdev=0.020249, epochs=55, mean epochs=62, time=0 : 00:03.14
#9: score=0.635029, mean score=0.696477,stdev=0.028921, epochs=64, mean epochs=62, time=0 : 00:03.93
#10: score=0.645151, mean score=0.691344,stdev=0.031463, epochs=85, mean epochs=64, time=0 