<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_5_bootstrap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 5: Regularization and Dropout**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 5 Material

* Part 5.1: Part 5.1: Introduction to Regularization: Ridge and Lasso [[Video]](https://www.youtube.com/watch?v=jfgRtCYjoBs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_1_reg_ridge_lasso.ipynb)
* Part 5.2: Using K-Fold Cross Validation with PyTorch [[Video]](https://www.youtube.com/watch?v=maiQf8ray_s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_2_kfold.ipynb)
* Part 5.3: Using L1 and L2 Regularization with PyTorch to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=JEWzWv1fBFQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_3_keras_l1_l2.ipynb)
* Part 5.4: Drop Out for PyTorch to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=bRyOi0L6Rs8&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_4_dropout.ipynb)
* **Part 5.5: Benchmarking PyTorch Deep Learning Regularization Techniques** [[Video]](https://www.youtube.com/watch?v=1NLBwPumUAs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/pytorch/t81_558_class_05_5_bootstrap.ipynb)


# Google CoLab Instructions

The following code ensures that Google CoLab is running and maps Google Drive if needed.

In [1]:
import torch

try:
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

import io
import copy

# L2 Normlization
def add_l2_norm_loss(model, l2_lambda = 0.001):
  l2_norm = sum(p.pow(2.0).sum()
    for p in model.parameters())
  return l2_lambda * l2_norm
  
# L1 Normlization
def add_l1_norm_loss(model, l1_lambda = 0.001):
  l1_norm = sum(p.abs().sum()
    for p in model.parameters())
  return l1_lambda * l1_norm

# Define class for early stopping. For more information, see module 3.4.
class EarlyStopping():
  def __init__(self, patience=5, min_delta=1e-4, restore_best_weights=True):
    self.patience = patience
    self.min_delta = min_delta
    self.restore_best_weights = restore_best_weights
    self.best_model = None
    self.best_loss = None
    self.counter = 0
    self.status = ""
    
  def __call__(self, model, val_loss):
    if self.best_loss == None:
      self.best_loss = val_loss
      self.best_model = copy.deepcopy(model)
    elif self.best_loss - val_loss > self.min_delta:
      self.best_loss = val_loss
      self.counter = 0
      self.best_model.load_state_dict(model.state_dict())
    elif self.best_loss - val_loss < self.min_delta:
      self.counter += 1
      if self.counter >= self.patience:
        self.status = f"Stopped on {self.counter}"
        if self.restore_best_weights:
          model.load_state_dict(self.best_model.state_dict())
        return True
    self.status = f"{self.counter}/{self.patience}"
    return False

# Make use of a GPU if one is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Note: using Google CoLab
Using device: cuda


# Part 5.5: Benchmarking Regularization Techniques

Quite a few hyperparameters have been introduced so far.  Tweaking each of these values can have an effect on the score obtained by your neural networks.  Some of the hyperparameters seen so far include:

* Number of layers in the neural network
* How many neurons in each layer
* What activation functions to use on each layer
* Dropout percent on each layer
* L1 and L2 values on each layer

To try out each of these hyperparameters you will need to run train neural networks with multiple settings for each hyperparameter.  However, you may have noticed that neural networks often produce somewhat different results when trained multiple times.  This is because the neural networks start with random weights.  Because of this it is necessary to fit and evaluate a neural network times to ensure that one set of hyperparameters are actually better than another.  Bootstrapping can be an effective means of benchmarking (comparing) two sets of hyperparameters.  

Bootstrapping is similar to cross-validation.  Both go through a number of cycles/folds providing validation and training sets.  However, bootstrapping can have an unlimited number of cycles.  Bootstrapping chooses a new train and validation split each cycle, with replacement.  The fact that each cycle is chosen with replacement means that, unlike cross validation, there will often be repeated rows selected between cycles.  If you run the bootstrap for enough cycles, there will be duplicate cycles.

In this part we will use bootstrapping for hyperparameter benchmarking.  We will train a neural network for a specified number of splits (denoted by the SPLITS constant).  For these examples we use 100.  We will compare the average score at the end of the 100.  By the end of the cycles the mean score will have converged somewhat.  This ending score will be a much better basis of comparison than a single cross-validation.  Additionally, the average number of epochs will be tracked to give an idea of a possible optimal value.  Because the early stopping validation set is also used to evaluate the the neural network as well, it might be slightly inflated.  This is because we are both stopping and evaluating on the same sample.  However, we are using the scores only as relative measures to determine the superiority of one set of hyperparameters to another, so this slight inflation should not present too much of a problem.

Because we are benchmarking, we will display the amount of time taken for each cycle.  The following function can be used to nicely format a time span.

In [2]:
# Nicely formatted time string
def hms_string(sec_elapsed):
    h = int(sec_elapsed / (60 * 60))
    m = int((sec_elapsed % (60 * 60)) / 60)
    s = sec_elapsed % 60
    return "{}:{:>02}:{:>05.2f}".format(h, m, s)

## Bootstrapping for Regression

Regression bootstrapping uses the **ShuffleSplit** object to perform the splits.  This technique is similar to **KFold** for cross-validation; no balancing occurs.  We will attempt to predict the age column for the **jh-simple-dataset**; the following code loads this data.

In [3]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product")],axis=1)
df.drop('product', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
x_columns = df.columns.drop('age').drop('id')
x = df[x_columns].values
y = df['age'].values

The following code performs the bootstrap.  The architecture of the neural network can be adjusted to compare many different configurations. 

In [4]:
import pandas as pd
import os
import numpy as np
import time
import statistics
import torch.nn as nn
import torch.nn.functional as F
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import ShuffleSplit
from torch.utils.data import DataLoader, TensorDataset


SPLITS = 50
BATCH_SIZE = 16

# Bootstrap
boot = ShuffleSplit(n_splits=SPLITS, test_size=0.1, random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x):
    start_time = time.time()
    num+=1

    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Define the PyTorch Neural Network
    class Net(nn.Module):
      def __init__(self, in_count, out_count):
          super(Net, self).__init__()
          self.fc1 = nn.Linear(in_count, 50)
          self.fc2 = nn.Linear(50, 25)
          self.fc3 = nn.Linear(25, out_count)
          self.softmax = nn.Softmax(dim=1)

      def forward(self, x):
          x = F.relu(self.fc1(x))
          x = F.relu(self.fc2(x))
          return self.softmax(self.fc3(x))

    # Numpy to PyTorch
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).float()

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).float().to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,\
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,\
      batch_size=BATCH_SIZE, shuffle=True)

    # Train the network
    model = Net(x.shape[1],1).to(device)

    # Define the loss function for regression
    loss_fn = nn.MSELoss()

    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<1000 and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      model.train()
      for i, (x_batch, y_batch) in steps:
        y_batch_pred = model(x_batch.to(device)).flatten()
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        loss += add_l2_norm_loss(model)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        loss, current = loss.item(), (i + 1)* len(x_batch)
        if i == len(steps)-1:
          model.eval()
          pred = model(x_test).flatten()
          vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
    
    model.eval()
    pred = model(x_test)
  
    # Measure this bootstrap's log loss
    score = np.sqrt(metrics.mean_squared_error(pred.cpu().detach(),y_test.cpu().detach()))
    mean_benchmark.append(float(score))
    epochs_needed.append(int(epoch))
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f}, mean score={m1:.6f},"
          f" stdev={mdev:.6f}", 
          f" epochs={epoch}, mean epochs={int(m2)}", 
          f" time={hms_string(time_took)}")

#1: score=45.117290, mean score=45.117290, stdev=0.000000  epochs=6, mean epochs=6  time=0:00:04.31
#2: score=44.898441, mean score=45.007866, stdev=0.109425  epochs=6, mean epochs=6  time=0:00:01.71
#3: score=45.204094, mean score=45.073275, stdev=0.128605  epochs=6, mean epochs=6  time=0:00:01.69
#4: score=45.553978, mean score=45.193451, stdev=0.236074  epochs=6, mean epochs=6  time=0:00:01.57
#5: score=44.616253, mean score=45.078011, stdev=0.312874  epochs=6, mean epochs=6  time=0:00:01.56
#6: score=44.611824, mean score=45.000313, stdev=0.334305  epochs=6, mean epochs=6  time=0:00:01.57
#7: score=44.856827, mean score=44.979815, stdev=0.313553  epochs=6, mean epochs=6  time=0:00:01.57
#8: score=45.120007, mean score=44.997339, stdev=0.296943  epochs=6, mean epochs=6  time=0:00:01.55
#9: score=45.166359, mean score=45.016119, stdev=0.284955  epochs=6, mean epochs=6  time=0:00:01.57
#10: score=44.665646, mean score=44.981072, stdev=0.290059  epochs=6, mean epochs=6  time=0:00:01.55

The bootstrapping process for classification is similar, and I present it in the next section.

## Bootstrapping for Classification

Regression bootstrapping uses the **StratifiedShuffleSplit** class to perform the splits.  This class is similar to **StratifiedKFold** for cross-validation, as the classes are balanced so that the sampling does not affect proportions.  To demonstrate this technique, we will attempt to predict the product column for the **jh-simple-dataset**; the following code loads this data.

In [5]:
import pandas as pd
from scipy.stats import zscore
from sklearn import preprocessing

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['age'] = zscore(df['age'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
le = preprocessing.LabelEncoder()
x_columns = df.columns.drop('product').drop('id')
x = df[x_columns].values
y = le.fit_transform(df['product'])
products = le.classes_

In [6]:
# Define the PyTorch Neural Network
class Net(nn.Module):
    def __init__(self, in_count, out_count):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(in_count, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, out_count)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.softmax(self.fc3(x))

We now run this data through a number of splits specified by the SPLITS variable. We track the average error through each of these splits.

In [7]:
import pandas as pd
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import StratifiedShuffleSplit
import tqdm

SPLITS = 50
BATCH_SIZE = 16

# Bootstrap
boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1, 
                                random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x,df['product']):
    start_time = time.time()
    num+=1

    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Numpy to PyTorch
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).long()

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).long().to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,\
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,\
      batch_size=BATCH_SIZE, shuffle=True)

    # Create model
    model = Net(x.shape[1],len(products)).to(device)

    loss_fn = nn.CrossEntropyLoss()# cross entropy loss

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<1000 and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      model.train()
      for i, (x_batch, y_batch) in steps:
        y_batch_pred = model(x_batch.to(device))
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss, current = loss.item(), (i + 1)* len(x_batch)
        if i == len(steps)-1:
          model.eval()
          pred = model(x_test)
          vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
    
    model.eval()
    pred = model(x_test)
    score = loss_fn(pred, y_test)
    mean_benchmark.append(float(score))
    epochs_needed.append(int(epoch))
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f}, mean score={m1:.6f}," +\
          f"stdev={mdev:.6f}, epochs={epoch}, mean epochs={int(m2)}," +\
          f" time={hms_string(time_took)}")

#1: score=1.685417, mean score=1.685417,stdev=0.000000, epochs=6, mean epochs=6, time=0:00:01.11
#2: score=1.501370, mean score=1.593394,stdev=0.092024, epochs=13, mean epochs=9, time=0:00:02.38
#3: score=1.513329, mean score=1.566705,stdev=0.084084, epochs=21, mean epochs=13, time=0:00:03.78
#4: score=1.506279, mean score=1.551599,stdev=0.077377, epochs=10, mean epochs=12, time=0:00:01.82
#5: score=1.641635, mean score=1.569606,stdev=0.078018, epochs=16, mean epochs=13, time=0:00:02.89
#6: score=1.502168, mean score=1.558367,stdev=0.075525, epochs=19, mean epochs=14, time=0:00:03.45
#7: score=1.536047, mean score=1.555178,stdev=0.070357, epochs=10, mean epochs=13, time=0:00:01.82
#8: score=1.449447, mean score=1.541962,stdev=0.074526, epochs=16, mean epochs=13, time=0:00:02.92
#9: score=1.684237, mean score=1.557770,stdev=0.083284, epochs=11, mean epochs=13, time=0:00:02.00
#10: score=1.481287, mean score=1.550122,stdev=0.082274, epochs=14, mean epochs=13, time=0:00:02.53
#11: score=1

## Benchmarking

Now that we've seen how to bootstrap with both classification and regression, we can start to try to optimize the hyperparameters for the **jh-simple-dataset** data.  For this example, we will encode for classification of the product column.  Evaluation will be in log loss.

In [8]:
import pandas as pd
from scipy.stats import zscore

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],
               axis=1)
df.drop('area', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['age'] = zscore(df['age'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
x_columns = df.columns.drop('product').drop('id')
x = df[x_columns].values
dummies = pd.get_dummies(df['product']) # Classification
products = dummies.columns
y = dummies.values

I performed some optimization, and the code has the best settings that I could determine. Later in this book, we will see how we can use an automatic process to optimize the hyperparameters.

In [9]:
# Define the PyTorch Neural Network
class Net(nn.Module):
  def __init__(self, in_count, out_count):
      super(Net, self).__init__()
      self.fc1 = nn.Linear(in_count, 50)
      self.fc2 = nn.Linear(50, 25)
      self.fc3 = nn.Linear(25, out_count)
      self.softmax = nn.Softmax(dim=1)

  def forward(self, x):
      x = F.relu(self.fc1(x))
      x = F.relu(self.fc2(x))
      self.dropout = nn.Dropout(0.25)
      return self.softmax(self.fc3(x))

In [10]:
import pandas as pd
import os
import numpy as np
import time
import statistics
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import StratifiedShuffleSplit

SPLITS = 50
BATCH_SIZE = 16

# Bootstrap
boot = StratifiedShuffleSplit(n_splits=SPLITS, test_size=0.1, 
                                random_state=42)

# Track progress
mean_benchmark = []
epochs_needed = []
num = 0

# Loop through samples
for train, test in boot.split(x,df['product']):
    start_time = time.time()
    num+=1

    # Split train and test
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Numpy to PyTorch
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).float()

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).float().to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,\
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,\
      batch_size=BATCH_SIZE, shuffle=True)

    # Train the network
    model = Net(x.shape[1],len(products)).to(device)

    # Define the loss function for classification
    loss_fn = nn.CrossEntropyLoss()# cross entropy loss

    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<1000 and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      model.train()
      for i, (x_batch, y_batch) in steps:
        y_batch_pred = model(x_batch.to(device))
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        loss, current = loss.item(), (i + 1)* len(x_batch)
        if i == len(steps)-1:
          model.eval()
          pred = model(x_test)
          vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
    
    model.eval()
    pred = model(x_test)
    score = loss_fn(pred, y_test)
    mean_benchmark.append(float(score))
    epochs_needed.append(int(epoch))
    m1 = statistics.mean(mean_benchmark)
    m2 = statistics.mean(epochs_needed)
    mdev = statistics.pstdev(mean_benchmark)
    
    # Record this iteration
    time_took = time.time() - start_time
    print(f"#{num}: score={score:.6f}, mean score={m1:.6f}," +\
          f"stdev={mdev:.6f}, epochs={epoch}, mean epochs={int(m2)}," +\
          f" time={hms_string(time_took)}")

#1: score=1.532517, mean score=1.532517,stdev=0.000000, epochs=7, mean epochs=7, time=0:00:01.38
#2: score=1.471129, mean score=1.501823,stdev=0.030694, epochs=9, mean epochs=8, time=0:00:01.77
#3: score=1.519374, mean score=1.507673,stdev=0.026392, epochs=11, mean epochs=9, time=0:00:02.16
#4: score=1.632895, mean score=1.538979,stdev=0.058843, epochs=17, mean epochs=11, time=0:00:03.31
#5: score=1.475250, mean score=1.526233,stdev=0.058479, epochs=12, mean epochs=11, time=0:00:02.33
#6: score=1.501268, mean score=1.522072,stdev=0.054188, epochs=26, mean epochs=13, time=0:00:05.07
#7: score=1.542092, mean score=1.524932,stdev=0.050655, epochs=10, mean epochs=13, time=0:00:02.03
#8: score=1.499199, mean score=1.521715,stdev=0.048142, epochs=7, mean epochs=12, time=0:00:01.38
#9: score=1.564127, mean score=1.526428,stdev=0.047305, epochs=6, mean epochs=11, time=0:00:01.17
#10: score=1.488320, mean score=1.522617,stdev=0.046311, epochs=10, mean epochs=11, time=0:00:01.95
#11: score=1.661