Benchmarking Regularization Techniques
Quite a few hyperparameters have been introduced so far. Tweaking each of these values can have an effect on the score obtained by your neural networks. Some of the hyperparameters seen so far include:

* Number of layers in the neural network
* How many neurons in each layer
* What activation functions to use on each layer
* Dropout percent on each layer
* L1 and L2 values on each layer

To try out each of these hyperparameters you will need to run train neural networks with multiple settings for each hyperparameter. However, you may have noticed that neural networks often produce somewhat different results when trained multiple times. This is because the neural networks start with random weights. Because of this it is necessary to fit and evaluate a neural network times to ensure that one set of hyperparameters are actually better than another. Bootstrapping can be an effective means of benchmarking (comparing) two sets of hyperparameters.

Bootstrapping is similar to cross-validation. Both go through a number of cycles/folds providing validation and training sets. However, bootstrapping can have an unlimited number of cycles. Bootstrapping chooses a new train and validation split each cycle, with replacement. The fact that each cycle is chosen with replacement means that, unlike cross validation, there will often be repeated rows selected between cycles. If you run the bootstrap for enough cycles, there will be duplicate cycles.

In this part we will use bootstrapping for hyperparameter benchmarking. We will train a neural network for a specified number of splits (denoted by the SPLITS constant). For these examples we use 100. We will compare the average score at the end of the 100. By the end of the cycles the mean score will have converged somewhat. This ending score will be a much better basis of comparison than a single cross-validation. Additionally, the average number of epochs will be tracked to give an idea of a possible optimal value. Because the early stopping validation set is also used to evaluate the the neural network as well, it might be slightly inflated. This is because we are both stopping and evaluating on the same sample. However, we are using the scores only as relative measures to determine the superiority of one set of hyperparameters to another, so this slight inflation should not present too much of a problem.

Because we are benchmarking, we will display the amount of time taken for each cycle. The following function can be used to nicely format a time span.



In [1]:
# Nicely formated time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int (sec_elapsed % (60 * 60))
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

# Bootstrapping for Regression
Regression bootstrapping uses the ShuffleSplit object to perform the splits. This is similar to KFold for cross validation, no balancing takes place. To demonstrate this technique we will attempt to predict the age column for the jh-simple-dataset this data is loaded by the following code.

In [2]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split
import os
import time

start_time = time.time()
# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'], prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

#Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

#Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product")],axis=1)
df.drop('product', axis=1, inplace=True)

#Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)


#Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

#Convert to numpy -Classification
x_columns = df.columns.drop('age').drop('id')
x = df[x_columns].values
y = df['age'].values

The following code performs the bootstrap. The architecture of the neural network can be adjusted to compare many different configurations.



In [3]:
import pandas as pd
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import ShuffleSplit

SPLITS = 50

# Bootstrap
boot = ShuffleSplit(n_splits=SPLITS, test_size=0.1, random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x):
    start_time = time.time()
    num+=1
    
    #Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]
    
    #Construct neural network
    model = Sequential()
    model.add(Dense(20, input_dim=x_train.shape[1], activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3,
                           patience=5, verbose=0, mode='auto', restore_best_weights=True)
    
    # Train on the bootstrap sample
    model.fit(x_train, y_train,validation_data=(x_test,y_test),
             callbacks=[monitor],verbose=0,epochs=1000)
    epochs = monitor.stopped_epoch
    epochs_needed.append(epochs)
    
    #predict
    pred = model.predict(x_test)
    
    #Measure this bootstrap's log loss
    score = np.sqrt(metrics.mean_squared_error(pred,y_test))
    mean_benchmark.append(score)
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"{num}: score={score:.6f},mean score={m1:.6f},stdev={mdev:.6f},epochs={epochs},mean epochs={int(m2)},time={hms_string(time_took)}")

1: score=0.749195,mean score=0.749195,stdev=0.000000,epochs=129,mean epochs=129,time=0:15:15.90
2: score=0.908850,mean score=0.829023,stdev=0.079828,epochs=109,mean epochs=119,time=0:11:11.71
3: score=0.543559,mean score=0.733868,stdev=0.149523,epochs=115,mean epochs=117,time=0:12:12.23
4: score=0.731385,mean score=0.733247,stdev=0.129495,epochs=135,mean epochs=122,time=0:14:14.40
5: score=0.768235,mean score=0.740245,stdev=0.116666,epochs=115,mean epochs=120,time=0:12:12.23
6: score=0.782542,mean score=0.747294,stdev=0.107661,epochs=172,mean epochs=129,time=0:18:18.26
7: score=0.548854,mean score=0.718946,stdev=0.121478,epochs=138,mean epochs=130,time=0:14:14.68
8: score=0.638537,mean score=0.708895,stdev=0.116703,epochs=106,mean epochs=127,time=0:11:11.43
9: score=0.629531,mean score=0.700077,stdev=0.112820,epochs=106,mean epochs=125,time=0:11:11.38
10: score=1.169910,mean score=0.747060,stdev=0.176981,epochs=95,mean epochs=122,time=0:10:10.18
11: score=1.110526,mean score=0.780102,s

The bootstrapping process for classification is similar and is presented in the next section.



## Bootstrapping for Classification

Regression bootstrapping uses the  *StratifiedShuffleSplit*  object to perform the splits. This is similar to *StratifiedKFold*  for cross validation, as the classes are balanced so that the sampling has no effect on proportions. To demonstrate this technique we will attempt to predict the product column for the jh-simple-dataset this data is loaded by the following code.

In [4]:
import pandas as pd
from scipy.stats import zscore

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['age'] = zscore(df['age'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
x_columns = df.columns.drop('product').drop('id')
x = df[x_columns].values
dummies = pd.get_dummies(df['product']) # Classification
products = dummies.columns
y = dummies.values

In [5]:
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import StratifiedShuffleSplit

SPLITS = 50

# Bootstrap
boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1, 
                                random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x,df['product']):
    start_time = time.time()
    num+=1

    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Construct neural network
    model = Sequential()
    model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
    model.add(Dense(25, activation='relu')) # Hidden 2
    model.add(Dense(y.shape[1],activation='softmax')) # Output
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=25, verbose=0, mode='auto', restore_best_weights=True)
    
    # Train on the bootstrap sample
    model.fit(x_train,y_train,validation_data=(x_test,y_test),
              callbacks=[monitor],verbose=0,epochs=1000)
    epochs = monitor.stopped_epoch
    epochs_needed.append(epochs)
    
    # Predict on the out of boot (validation)
    pred = model.predict(x_test)
  
    # Measure this bootstrap's log loss
    y_compare = np.argmax(y_test,axis=1) # For log loss calculation
    score = metrics.log_loss(y_compare, pred)
    mean_benchmark.append(score)
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f}, mean score={m1:.6f}," +\
          f"stdev={mdev:.6f}, epochs={epochs}, mean epochs={int(m2)}," +\
          f" time={hms_string(time_took)}")

#1: score=0.700584, mean score=0.700584,stdev=0.000000, epochs=65, mean epochs=65, time=0:07:07.80
#2: score=0.668703, mean score=0.684644,stdev=0.015941, epochs=57, mean epochs=61, time=0:06:06.91
#3: score=0.671223, mean score=0.680170,stdev=0.014472, epochs=50, mean epochs=57, time=0:06:06.11
#4: score=0.638795, mean score=0.669826,stdev=0.021864, epochs=119, mean epochs=72, time=0:13:13.97
#5: score=0.630965, mean score=0.662054,stdev=0.024981, epochs=78, mean epochs=73, time=0:09:09.38
#6: score=0.710793, mean score=0.670177,stdev=0.029155, epochs=68, mean epochs=72, time=0:08:08.12
#7: score=0.740103, mean score=0.680167,stdev=0.036432, epochs=41, mean epochs=68, time=0:05:05.06
#8: score=0.739089, mean score=0.687532,stdev=0.039257, epochs=50, mean epochs=66, time=0:06:06.13
#9: score=0.628604, mean score=0.680984,stdev=0.041386, epochs=74, mean epochs=66, time=0:08:08.88
#10: score=0.665996, mean score=0.679486,stdev=0.039519, epochs=69, mean epochs=67, time=0:08:08.25
#11: sco

# Benchmarking
Now that we've seen how to bootstrap with both classification and regression we can start to try to optimize the hyperparameters for the jh-simple-dataset data. For this example we will encode for classification of the product column. Evaluation will be in log loss.

In [6]:
import pandas as pd
from scipy.stats import zscore

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])
# generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'], prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],
               axis=1)
df.drop('area', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['age'] = zscore(df['age'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
x_columns = df.columns.drop('product').drop('id')
x = df[x_columns].values
dummies = pd.get_dummies(df['product']) # Classification
products = dummies.columns
y = dummies.values

In [7]:
import  pandas as pd
import os
import numpy as np
import time
import tensorflow.keras.initializers
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import StratifiedShuffleSplit
from tensorflow.keras.layers import LeakyReLU, PReLU, Dense, Activation, Dropout

SPLITS = 100


# Bootstrap
boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x,df['product']):
    start_time = time.time()
    num+=1
    
    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]
    
    # Construct neural network
    model = Sequential()
    model.add(Dense(100, input_dim=x.shape[1], activation=PReLU(), \
        kernel_regularizer=regularizers.l2(1e-4))) # Hidden 1
    model.add(Dropout(0.5))
    model.add(Dense(100, activation=PReLU(), \
        activity_regularizer=regularizers.l2(1e-4))) # Hidden 2
    model.add(Dropout(0.5))
    model.add(Dense(100, activation=PReLU(), \
        activity_regularizer=regularizers.l2(1e-4)
    )) # Hidden 3
#    model.add(Dropout(0.5)) - Usually better performance 
# without dropout on final layer
    model.add(Dense(y.shape[1],activation='softmax')) # Output
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, 
        patience=100, verbose=0, mode='auto', restore_best_weights=True)

    # Train on the bootstrap sample
    model.fit(x_train,y_train,validation_data=(x_test,y_test), \
              callbacks=[monitor],verbose=0,epochs=1000)
    epochs = monitor.stopped_epoch
    epochs_needed.append(epochs)
    
    # Predict on the out of boot (validation)
    pred = model.predict(x_test)
  
    # Measure this bootstrap's log loss
    y_compare = np.argmax(y_test,axis=1) # For log loss calculation
    score = metrics.log_loss(y_compare, pred)
    mean_benchmark.append(score)
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f},mean score{m1:.6f},stdev={mdev:.6f},epochs={epochs}, mean epochs={int(m2)},time={hms_string(time_took)}")

#1: score=0.634329,mean score0.634329,stdev=0.000000,epochs=180, mean epochs=180,time=0:31:31.50
#2: score=0.671775,mean score0.653052,stdev=0.018723,epochs=153, mean epochs=166,time=0:26:26.72
#3: score=0.681500,mean score0.662535,stdev=0.020336,epochs=135, mean epochs=156,time=0:24:24.31
#4: score=0.630569,mean score0.654543,stdev=0.022399,epochs=209, mean epochs=169,time=0:36:36.59
#5: score=0.627776,mean score0.649190,stdev=0.022716,epochs=236, mean epochs=182,time=0:41:41.06
#6: score=0.765292,mean score0.668540,stdev=0.047981,epochs=147, mean epochs=176,time=0:25:25.69
#7: score=0.755685,mean score0.680990,stdev=0.053882,epochs=147, mean epochs=172,time=0:25:25.96
#8: score=0.653674,mean score0.677575,stdev=0.051205,epochs=209, mean epochs=177,time=0:36:36.82
#9: score=0.670722,mean score0.676814,stdev=0.048324,epochs=220, mean epochs=181,time=0:38:38.60
#10: score=0.713250,mean score0.680457,stdev=0.047130,epochs=180, mean epochs=181,time=0:31:31.39
#11: score=0.604052,mean scor

#85: score=0.596459,mean score0.654464,stdev=0.052671,epochs=251, mean epochs=196,time=0:45:45.48
#86: score=0.593169,mean score0.653751,stdev=0.052775,epochs=247, mean epochs=196,time=0:42:42.53
#87: score=0.584291,mean score0.652953,stdev=0.052991,epochs=244, mean epochs=197,time=0:45:45.45
#88: score=0.684961,mean score0.653317,stdev=0.052798,epochs=244, mean epochs=197,time=0:43:43.17
#89: score=0.688087,mean score0.653707,stdev=0.052628,epochs=184, mean epochs=197,time=0:33:33.01
#90: score=0.581779,mean score0.652908,stdev=0.052875,epochs=183, mean epochs=197,time=0:33:33.06
#91: score=0.569617,mean score0.651993,stdev=0.053296,epochs=306, mean epochs=198,time=0:53:53.41
#92: score=0.721419,mean score0.652747,stdev=0.053492,epochs=147, mean epochs=198,time=0:27:27.06
#93: score=0.653426,mean score0.652755,stdev=0.053204,epochs=229, mean epochs=198,time=0:40:40.71
#94: score=0.719833,mean score0.653468,stdev=0.053366,epochs=141, mean epochs=197,time=0:25:25.38
#95: score=0.670793,

In [8]:
time_took = time.time() - start_time
print(f"time={hms_string(time_took)}")

time=0:26:26.89
