<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_2_kfold.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 5: Regularization and Dropout**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 5 Material

* Part 5.1: Introduction to Regularization: Ridge and Lasso [[Video]](https://www.youtube.com/watch?v=jfgRtCYjoBs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_1_reg_ridge_lasso.ipynb)
* **Part 5.2: Using K-Fold Cross Validation with Keras** [[Video]](https://www.youtube.com/watch?v=maiQf8ray_s&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_2_kfold.ipynb)
* Part 5.3: Using L1 and L2 Regularization with Keras to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=JEWzWv1fBFQ&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_3_keras_l1_l2.ipynb)
* Part 5.4: Drop Out for Keras to Decrease Overfitting [[Video]](https://www.youtube.com/watch?v=bRyOi0L6Rs8&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_4_dropout.ipynb)
* Part 5.5: Benchmarking Keras Deep Learning Regularization Techniques [[Video]](https://www.youtube.com/watch?v=1NLBwPumUAs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_05_5_bootstrap.ipynb)


# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab.

In [None]:
import os

try:
    import google.colab  # Only importable inside Google CoLab
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

Note: using Google CoLab


# Part 5.2: Using K-Fold Cross-validation with PyTorch

You can use cross-validation for a variety of purposes in predictive modeling:

* Generating out-of-sample predictions from a neural network
* Estimating a good number of epochs to train a neural network (early stopping)
* Evaluating the effectiveness of certain hyperparameters, such as activation functions, neuron counts, and layer counts

Cross-validation uses several folds and multiple models to provide each data segment a chance to serve as both the validation and training set. Figure 5.CROSS shows cross-validation.

**Figure 5.CROSS: K-Fold Cross-Validation**
![K-Fold Cross-Validation](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_1_kfold.png "K-Fold Cross-Validation")

It is important to note that each fold will have one model (neural network). To generate predictions for new data (not present in the training set), predictions from the fold models can be handled in several ways:

* Choose the model with the highest validation score as the final model.
* Present new data to the five models (one for each fold) and average the result (this is an [ensemble](https://en.wikipedia.org/wiki/Ensemble_learning)).
* Retrain a new model (using the same settings as the cross-validation) on the entire dataset. Train for the same number of epochs and with the same hidden layer structure.

Generally, I prefer the last approach and will retrain a model on the entire data set once I have selected hyperparameters. Of course, I will always set aside a final holdout set for model validation that I do not use in any aspect of the training process.
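
The second and third options in the list above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, and scikit-learn's `LinearRegression` stands in for the neural network so that the example stays short. The names `x_demo`, `y_demo`, `fold_models`, and `x_new` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data (assumption): y depends linearly on two features plus noise
rng = np.random.default_rng(42)
x_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * x_demo[:, 0] - 2.0 * x_demo[:, 1] + rng.normal(scale=0.1, size=100)

# Train one model per fold, as cross-validation does
fold_models = []
for train_idx, _ in KFold(5, shuffle=True, random_state=42).split(x_demo):
    fold_models.append(LinearRegression().fit(x_demo[train_idx], y_demo[train_idx]))

# New data that no fold model has seen
x_new = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Ensemble option: average the predictions of all fold models
fold_preds = np.stack([m.predict(x_new) for m in fold_models])  # shape (5, 3)
ensemble_pred = fold_preds.mean(axis=0)

# Retrain option: fit a single model on the entire dataset
final_model = LinearRegression().fit(x_demo, y_demo)
final_pred = final_model.predict(x_new)

print("Ensemble of fold models:", ensemble_pred)
print("Retrained on all data:  ", final_pred)
```

On a simple problem like this one, the two approaches should give very similar predictions; the ensemble costs five model evaluations per prediction, while the retrained model costs one.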

## Regression vs Classification K-Fold Cross-Validation

Regression and classification are handled somewhat differently concerning cross-validation. Regression is the simpler case where you can break up the data set into K folds with little regard for where each item lands. For regression, the data items should fall into the folds as randomly as possible. It is also important to remember that not every fold will necessarily have the same number of data items. It is not always possible for the data set to be evenly divided into K folds. For regression cross-validation, we will use the Scikit-Learn class **KFold**.
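
A small sketch can make the uneven-fold point concrete. It assumes a toy array of 11 rows (the name `x_demo` is hypothetical), which cannot be divided evenly into 5 folds, and prints the size of each validation fold that **KFold** produces.

```python
import numpy as np
from sklearn.model_selection import KFold

# 11 rows cannot be split evenly into 5 folds
x_demo = np.arange(11).reshape(-1, 1)

kf_demo = KFold(5, shuffle=True, random_state=42)

# Size of the validation portion of each fold
fold_sizes = [len(test_idx) for _, test_idx in kf_demo.split(x_demo)]
print("Validation fold sizes:", fold_sizes)

# Every row lands in exactly one validation fold
all_test = np.concatenate([test_idx for _, test_idx in kf_demo.split(x_demo)])
print("Rows covered:", sorted(all_test.tolist()))
```

The fold sizes differ by at most one row, and together the validation folds cover every row exactly once.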

Cross-validation for classification could also use the **KFold** object; however, this technique would not ensure that the class balance in each fold remains the same as in the original data set. The balance of classes that a model encounters in use should remain the same as (or similar to) the balance in its training set. Drift in this distribution is one of the most important things to monitor after a trained model has been placed into actual use. Because of this, we want to make sure that the cross-validation itself does not introduce an unintended shift. Preserving the class balance in each fold is called stratified sampling and is accomplished by using the Scikit-Learn object **StratifiedKFold** in place of **KFold** whenever you use classification. In summary, you should use the following two objects in Scikit-Learn:

* **KFold** When dealing with a regression problem.
* **StratifiedKFold** When dealing with a classification problem.
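
The difference between the two objects can be seen in a short sketch. It assumes synthetic labels with an 80/20 class imbalance (the names `labels`, `features`, and `minority_fraction` are hypothetical) and reports the fraction of the minority class in each validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic labels (assumption): 80 items of class 0 and 20 items of class 1
labels = np.array([0] * 80 + [1] * 20)
features = np.zeros((100, 1))  # placeholder features; only the labels matter here

def minority_fraction(splitter):
    """Return the fraction of class 1 in each validation fold."""
    return [float(labels[test_idx].mean())
            for _, test_idx in splitter.split(features, labels)]

plain = minority_fraction(KFold(5, shuffle=True, random_state=42))
stratified = minority_fraction(StratifiedKFold(5, shuffle=True, random_state=42))

print("KFold:          ", plain)
print("StratifiedKFold:", stratified)
```

With **StratifiedKFold**, every fold keeps the original 20% minority share, whereas the shares under plain **KFold** depend on how the shuffle happens to fall.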

The following two sections demonstrate cross-validation with classification and regression. 

## Out-of-Sample Regression Predictions with K-Fold Cross-Validation

The following code trains a neural network on the simple dataset using 5-fold cross-validation. The expected performance of a neural network of the type trained here would be the score for the generated out-of-sample predictions. We begin by preparing a feature vector using the **jh-simple-dataset** to predict age. This model is set up as a regression problem.

In [None]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cuda


In [None]:
import io
import copy

class EarlyStopping():
  def __init__(self, patience=5, min_delta=0, restore_best_weights=True):
    self.patience = patience
    self.min_delta = min_delta
    self.restore_best_weights = restore_best_weights
    self.best_model = None
    self.best_loss = None
    self.counter = 0
    self.status = ""
    
  def __call__(self, model, val_loss):
    if self.best_loss is None:
      self.best_loss = val_loss
      self.best_model = copy.deepcopy(model)
    elif self.best_loss - val_loss > self.min_delta:
      self.best_loss = val_loss
      self.counter = 0
      self.best_model.load_state_dict(model.state_dict())
    elif self.best_loss - val_loss < self.min_delta:
      self.counter += 1
      if self.counter >= self.patience:
        self.status = f"Stopped on {self.counter}"
        if self.restore_best_weights:
          model.load_state_dict(self.best_model.state_dict())
        return True
    self.status = f"{self.counter}/{self.patience}"
    return False
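
The patience logic of the `EarlyStopping` class above can be traced with a framework-free sketch. The helper `epochs_until_stop` is hypothetical and simplified: it tracks only the loss values, not the model weights, and it counts a loss exactly equal to the best as "no improvement". The validation losses are the Fold #1 values from the regression log shown later in this notebook.

```python
def epochs_until_stop(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None."""
    best = None
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or best - loss > min_delta:
            # New best validation loss: remember it and reset the counter
            best = loss
            counter = 0
        else:
            # No sufficient improvement: use up one unit of patience
            counter += 1
            if counter >= patience:
                return epoch
    return None

# Validation losses reported for Fold #1 of the regression run
fold1_vloss = [43.894638, 22.657480, 16.175705, 16.491512,
               19.750761, 16.859362, 17.102427, 17.794113]

stop_epoch = epochs_until_stop(fold1_vloss)
print(f"Training stops at epoch {stop_epoch}")
```

The best loss occurs at epoch 3, and five consecutive epochs without improvement follow, so the sketch stops at the same epoch at which the Fold #1 log ends.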

In [None]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product")],axis=1)
df.drop('product', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Regression
x_columns = df.columns.drop('age').drop('id')
x = df[x_columns].values
y = df['age'].values

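Now that the feature vector is created, a 5-fold cross-validation can be performed to generate out-of-sample predictions.  Although the code defines `EPOCHS=500`, the training loop does not use that constant; each fold trains for up to 1,000 epochs and relies on the **EarlyStopping** class defined above to halt once the validation loss stops improving.  Later we will see how we can estimate a better epoch count.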

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.autograd import Variable
from sklearn import preprocessing
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
from sklearn import metrics
import tqdm
import time

EPOCHS=500
BATCH_SIZE = 16

# Define the PyTorch Neural Network
class Net(nn.Module):
    def __init__(self, in_count, out_count):
        super(Net, self).__init__()
        # We must define each of the layers.
        self.fc1 = nn.Linear(in_count, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, 1)

    def forward(self, x):
        # In the forward pass, we must calculate all of the layers we 
        # previously defined.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

# Cross-Validate
kf = KFold(5, shuffle=True, random_state=42) # Use KFold for regression
oos_y_list = []
oos_pred_list = []

fold = 0
for train, test in kf.split(x):
    fold+=1
    print(f"Fold #{fold}")
        
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Numpy to PyTorch
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).float()

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).float().to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,\
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,\
      batch_size=BATCH_SIZE, shuffle=True)

    # Train the network
    model = Net(x.shape[1],1).to(device)

    # Define the loss function for regression
    loss_fn = nn.MSELoss()

    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<1000 and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      pbar = tqdm.tqdm(steps)
      model.train()
      for i, (x_batch, y_batch) in pbar:
        # Flatten the (N,1) output so that it matches the (N,) target;
        # otherwise MSELoss broadcasts the two shapes and warns.
        y_batch_pred = model(x_batch.to(device)).flatten()
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        loss, current = loss.item(), (i + 1)* len(x_batch)
        if i == len(steps)-1:
          model.eval()
          pred = model(x_test).flatten()
          vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
          pbar.set_description(f"Epoch: {epoch}, tloss: {loss}, vloss: {vloss:>7f}, EStop:[{es.status}]")
        else:
          pbar.set_description(f"Epoch: {epoch}, tloss {loss:}")
    
    pred = model(x_test).flatten()
    
    oos_y_list.append(y_test.cpu().detach())
    oos_pred_list.append(pred.cpu().detach())    

    # Measure this fold's RMSE
    score = np.sqrt(metrics.mean_squared_error(pred.cpu().detach(),y_test.cpu().detach()))
    print(f"Fold score (RMSE): {score}")

# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y_list)
oos_pred = np.concatenate(oos_pred_list)
score = np.sqrt(metrics.mean_squared_error(oos_pred,oos_y))
print(f"Final, out of sample score (RMSE): {score}")    
    
# Write the cross-validated prediction
oos_y = pd.DataFrame(oos_y)
oos_pred = pd.DataFrame(oos_pred)
oosDF = pd.concat( [df, oos_y, oos_pred],axis=1 )
#oosDF.to_csv(filename_write,index=False)


Fold #1


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)
Epoch: 1, tloss: 30.578550338745117, vloss: 43.894638, EStop:[0/5]: 100%|██████████| 100/100 [00:04<00:00, 22.60it/s]
Epoch: 2, tloss: 9.249143600463867, vloss: 22.657480, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 119.57it/s]
Epoch: 3, tloss: 14.337535858154297, vloss: 16.175705, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 103.78it/s]
Epoch: 4, tloss: 18.48504638671875, vloss: 16.491512, EStop:[1/5]: 100%|██████████| 100/100 [00:01<00:00, 79.80it/s]
Epoch: 5, tloss: 25.37423324584961, vloss: 19.750761, EStop:[2/5]: 100%|██████████| 100/100 [00:01<00:00, 73.26it/s]
Epoch: 6, tloss: 12.55465030670166, vloss: 16.859362, EStop:[3/5]: 100%|██████████| 100/100 [00:00<00:00, 116.70it/s]
Epoch: 7, tloss: 16.12624168395996, vloss: 17.102427, EStop:[4/5]: 100%|██████████| 100/100 [00:00<00:00, 125.50it/s]
Epoch: 8, tloss: 33.937992095947266, vloss: 17.794113

Fold score (RMSE): 4.177460670471191
Fold #2


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)
Epoch: 1, tloss: 46.080265045166016, vloss: 41.330215, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 121.87it/s]
Epoch: 2, tloss: 23.554439544677734, vloss: 19.077240, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 125.44it/s]
Epoch: 3, tloss: 9.904412269592285, vloss: 17.828743, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 144.42it/s]
Epoch: 4, tloss: 14.562386512756348, vloss: 19.205624, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 116.10it/s]
Epoch: 5, tloss: 15.748347282409668, vloss: 20.499165, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 149.69it/s]
Epoch: 6, tloss: 13.248491287231445, vloss: 16.023619, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 171.50it/s]
Epoch: 7, tloss: 13.56807804107666, vloss: 17.985609, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 174.91it/s]
Epoch: 8, tloss: 24.326189041137695, vloss: 16.

Fold score (RMSE): 3.90688157081604
Fold #3


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)
Epoch: 1, tloss: 50.78443145751953, vloss: 42.575474, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 166.35it/s]
Epoch: 2, tloss: 23.228158950805664, vloss: 16.819767, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 156.12it/s]
Epoch: 3, tloss: 35.030948638916016, vloss: 18.762680, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 168.18it/s]
Epoch: 4, tloss: 16.667007446289062, vloss: 15.578781, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.31it/s]
Epoch: 5, tloss: 25.617565155029297, vloss: 16.492470, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 166.26it/s]
Epoch: 6, tloss: 7.281801223754883, vloss: 14.466459, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 172.73it/s]
Epoch: 7, tloss: 15.833910942077637, vloss: 16.955782, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 162.36it/s]
Epoch: 8, tloss: 15.108352661132812, vloss: 15.

Fold score (RMSE): 3.9992218017578125
Fold #4


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)
Epoch: 1, tloss: 124.17198181152344, vloss: 56.555309, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 136.00it/s]
Epoch: 2, tloss: 20.612083435058594, vloss: 18.344934, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 149.57it/s]
Epoch: 3, tloss: 14.890791893005371, vloss: 15.424962, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.52it/s]
Epoch: 4, tloss: 8.32586669921875, vloss: 15.830782, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 163.84it/s]
Epoch: 5, tloss: 14.84170913696289, vloss: 14.789660, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 156.47it/s]
Epoch: 6, tloss: 7.293020725250244, vloss: 14.778517, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 153.11it/s]
Epoch: 7, tloss: 13.877546310424805, vloss: 14.837282, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 162.85it/s]
Epoch: 8, tloss: 12.859580993652344, vloss: 28.92

Fold score (RMSE): 3.5507962703704834
Fold #5


  return F.mse_loss(input, target, reduction=self.reduction)
  return F.mse_loss(input, target, reduction=self.reduction)
Epoch: 1, tloss: 25.368885040283203, vloss: 47.316936, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.82it/s]
Epoch: 2, tloss: 31.06095314025879, vloss: 15.951253, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 163.07it/s]
Epoch: 3, tloss: 19.316743850708008, vloss: 15.081687, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.57it/s]
Epoch: 4, tloss: 13.616600036621094, vloss: 15.750995, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 162.58it/s]
Epoch: 5, tloss: 64.75232696533203, vloss: 13.487203, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 159.97it/s]
Epoch: 6, tloss: 11.454826354980469, vloss: 13.751986, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 161.67it/s]
Epoch: 7, tloss: 25.83644676208496, vloss: 15.916903, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 170.17it/s]
Epoch: 8, tloss: 13.691373825073242, vloss: 14.8

Fold score (RMSE): 3.3854522705078125
Final, out of sample score (RMSE): 3.815183162689209


The progress output above shows the epoch at which early stopping halted each fold.  A common technique is to then train on the entire dataset for the average number of epochs required across the folds.
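
As a minimal sketch of that calculation, the snippet below averages per-fold stopping epochs. The values in `stop_epochs` are hypothetical and are not taken from the run above; in practice you would record the final value of `epoch` for each fold inside the cross-validation loop.

```python
import numpy as np

# Hypothetical stopping epochs for five folds (not taken from the run above)
stop_epochs = [8, 11, 11, 13, 10]

# Average the per-fold counts and round to a whole number of epochs
avg_epochs = int(round(np.mean(stop_epochs)))
print(f"Train the final model for about {avg_epochs} epochs")
```

The final model would then be trained on the entire dataset for roughly `avg_epochs` epochs, without a validation set to trigger early stopping.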

## Classification with Stratified K-Fold Cross-Validation

The following code trains a neural network on the **jh-simple-dataset** with cross-validation to generate out-of-sample predictions.  It also writes the out-of-sample (predictions on the test set) results.

It is good to perform stratified k-fold cross-validation with classification data.  This technique ensures that the percentages of each class remain the same across all folds.  Use the **StratifiedKFold** object instead of the **KFold** object used in the regression.
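
One practical detail is that **StratifiedKFold** stratifies on a single class label per row, while the network below is trained on one-hot (dummy) targets. The following sketch, which assumes synthetic data and the hypothetical names `y_onehot`, `y_labels`, and `x_demo`, recovers the labels with `np.argmax` for the split and still indexes the one-hot array for training.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic one-hot targets (assumption): 3 classes with 50, 30, and 20 rows
class_ids = np.repeat([0, 1, 2], [50, 30, 20])
y_onehot = np.eye(3)[class_ids]   # shape (100, 3), similar to dummies.values
x_demo = np.zeros((100, 4))       # placeholder features

# Recover one label per row from the one-hot rows for stratification
y_labels = np.argmax(y_onehot, axis=1)

skf = StratifiedKFold(5, shuffle=True, random_state=42)
fold_counts = []
for train_idx, test_idx in skf.split(x_demo, y_labels):
    # Count how many rows of each class landed in this validation fold
    fold_counts.append(np.bincount(y_labels[test_idx], minlength=3).tolist())
    # The one-hot array is still what the network would train on
    y_train_fold = y_onehot[train_idx]

print(fold_counts)
```

Each validation fold receives the same number of rows from every class, so the 50/30/20 balance is preserved across all five folds.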

In [None]:
import pandas as pd
from scipy.stats import zscore

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['age'] = zscore(df['age'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Classification
x_columns = df.columns.drop('product').drop('id')
x = df[x_columns].values
dummies = pd.get_dummies(df['product']) # Classification
products = dummies.columns
y = dummies.values

Although the code defines `EPOCHS=500`, the training loop does not use that constant; each fold trains for up to 1,000 epochs and stops early once the validation loss no longer improves.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.autograd import Variable
from sklearn import preprocessing
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
from sklearn import metrics
import tqdm
import time

EPOCHS=500
BATCH_SIZE = 16

# Define the PyTorch Neural Network
class Net(nn.Module):
    def __init__(self, in_count, out_count):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(in_count, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, out_count)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so a softmax layer here would be applied twice.
        return self.fc3(x)

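# Cross-Validate
from sklearn.model_selection import StratifiedKFold

# Stratify on the class labels so every fold keeps the original class balance
kf = StratifiedKFold(5, shuffle=True, random_state=42)
oos_y_list = []
oos_pred_list = []

fold = 0
for train, test in kf.split(x, df['product']):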
    fold+=1
    print(f"Fold #{fold}")
        
    x_train = x[train]
    y_train = y[train]
    x_test = x[test]
    y_test = y[test]

    # Numpy to PyTorch
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).float()

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).float().to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,\
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,\
      batch_size=BATCH_SIZE, shuffle=True)

    # Train the network
    model = Net(x.shape[1],len(products)).to(device)

    # Define the loss function for classification
    loss_fn = nn.CrossEntropyLoss()# cross entropy loss

    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<1000 and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      pbar = tqdm.tqdm(steps)
      model.train()
      for i, (x_batch, y_batch) in pbar:
        y_batch_pred = model(x_batch.to(device))
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        loss, current = loss.item(), (i + 1)* len(x_batch)
        if i == len(steps)-1:
          model.eval()
          pred = model(x_test)
          vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
          pbar.set_description(f"Epoch: {epoch}, tloss: {loss}, vloss: {vloss:>7f}, EStop:[{es.status}]")
        else:
          pbar.set_description(f"Epoch: {epoch}, tloss {loss:}")
    
    pred = model(x_test)
    
    oos_y_list.append(y_test.cpu().detach())
    oos_pred_list.append(pred.cpu().detach())    

    # Measure this fold's RMSE
    #score = np.sqrt(metrics.mean_squared_error(pred.cpu().detach(),y_test.cpu().detach()))
    #print(f"Fold score (RMSE): {score}")

    # Measure this fold's accuracy
    y_compare = np.argmax(y_test.cpu().detach(),axis=1) # For accuracy calculation
    pred = np.argmax(pred.cpu().detach(),axis=1) # For accuracy calculation
    score = metrics.accuracy_score(y_compare, pred)
    print(f"Fold score (accuracy): {score}")


Fold #1


Epoch: 1, tloss: 1.6087956428527832, vloss: 1.670450, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.36it/s]
Epoch: 2, tloss: 1.6654220819473267, vloss: 1.671296, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 166.64it/s]
Epoch: 3, tloss: 1.563202977180481, vloss: 1.652720, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 169.47it/s]
Epoch: 4, tloss: 1.727919340133667, vloss: 1.650997, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.07it/s]
Epoch: 5, tloss: 1.6640689373016357, vloss: 1.670205, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 158.33it/s]
Epoch: 6, tloss: 1.6031205654144287, vloss: 1.655255, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 163.95it/s]
Epoch: 7, tloss: 1.5405099391937256, vloss: 1.655346, EStop:[3/5]: 100%|██████████| 100/100 [00:00<00:00, 170.34it/s]
Epoch: 8, tloss: 1.7903709411621094, vloss: 1.666811, EStop:[4/5]: 100%|██████████| 100/100 [00:00<00:00, 165.79it/s]
Epoch: 9, tloss: 1.9779222011566162, vloss: 1.652718, ESto

Fold score (accuracy): 0.51
Fold #2


Epoch: 1, tloss: 1.6029226779937744, vloss: 1.705420, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 169.29it/s]
Epoch: 2, tloss: 1.6029179096221924, vloss: 1.705420, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 170.14it/s]
Epoch: 3, tloss: 1.4779222011566162, vloss: 1.705420, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 174.69it/s]
Epoch: 4, tloss: 1.7902865409851074, vloss: 1.705407, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.60it/s]
Epoch: 5, tloss: 1.7904220819473267, vloss: 1.705422, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 172.88it/s]
Epoch: 6, tloss: 1.7904222011566162, vloss: 1.705422, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 168.53it/s]
Epoch: 7, tloss: 1.6654222011566162, vloss: 1.705422, EStop:[3/5]: 100%|██████████| 100/100 [00:00<00:00, 169.81it/s]
Epoch: 8, tloss: 1.7279222011566162, vloss: 1.705422, EStop:[4/5]: 100%|██████████| 100/100 [00:00<00:00, 171.20it/s]
Epoch: 9, tloss: 1.7904222011566162, vloss: 1.705422, ES

Fold score (accuracy): 0.46
Fold #3


Epoch: 1, tloss: 1.718611240386963, vloss: 1.717940, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.68it/s]
Epoch: 2, tloss: 1.4582247734069824, vloss: 1.473347, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 167.74it/s]
Epoch: 3, tloss: 1.3773486614227295, vloss: 1.490890, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 172.32it/s]
Epoch: 4, tloss: 1.3661129474639893, vloss: 1.477760, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 171.39it/s]
Epoch: 5, tloss: 1.478950023651123, vloss: 1.488662, EStop:[3/5]: 100%|██████████| 100/100 [00:00<00:00, 174.45it/s]
Epoch: 6, tloss: 1.42991042137146, vloss: 1.473330, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.33it/s]
Epoch: 7, tloss: 1.4827299118041992, vloss: 1.467703, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.59it/s]
Epoch: 8, tloss: 1.4781827926635742, vloss: 1.457410, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 169.18it/s]
Epoch: 9, tloss: 1.3880345821380615, vloss: 1.488907, EStop:

Fold score (accuracy): 0.7125
Fold #4


Epoch: 1, tloss: 1.7097184658050537, vloss: 1.614560, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 164.01it/s]
Epoch: 2, tloss: 1.4826369285583496, vloss: 1.554938, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 173.54it/s]
Epoch: 3, tloss: 1.6032769680023193, vloss: 1.571010, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 167.88it/s]
Epoch: 4, tloss: 1.7871029376983643, vloss: 1.530774, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 166.12it/s]
Epoch: 5, tloss: 1.5062246322631836, vloss: 1.539827, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 170.83it/s]
Epoch: 6, tloss: 1.616584300994873, vloss: 1.710253, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 168.25it/s]
Epoch: 7, tloss: 1.4750868082046509, vloss: 1.528093, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 162.25it/s]
Epoch: 8, tloss: 1.5480645895004272, vloss: 1.565197, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 152.87it/s]
Epoch: 9, tloss: 1.708003044128418, vloss: 1.531778, ESto

Fold score (accuracy): 0.64
Fold #5


Epoch: 1, tloss: 1.731004238128662, vloss: 1.684747, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 167.78it/s]
Epoch: 2, tloss: 1.657412052154541, vloss: 1.536799, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 171.53it/s]
Epoch: 3, tloss: 1.7215187549591064, vloss: 1.510977, EStop:[0/5]: 100%|██████████| 100/100 [00:00<00:00, 165.47it/s]
Epoch: 4, tloss: 1.606339454650879, vloss: 1.515504, EStop:[1/5]: 100%|██████████| 100/100 [00:00<00:00, 165.22it/s]
Epoch: 5, tloss: 1.3746075630187988, vloss: 1.637032, EStop:[2/5]: 100%|██████████| 100/100 [00:00<00:00, 170.65it/s]
Epoch: 6, tloss: 1.4720460176467896, vloss: 1.513013, EStop:[3/5]: 100%|██████████| 100/100 [00:00<00:00, 173.38it/s]
Epoch: 7, tloss: 1.5504348278045654, vloss: 1.561378, EStop:[4/5]: 100%|██████████| 100/100 [00:00<00:00, 166.76it/s]
Epoch: 8, tloss: 1.6036622524261475, vloss: 1.544061, EStop:[Stopped on 5]: 100%|██████████| 100/100 [00:00<00:00, 177.18it/s]

Fold score (accuracy): 0.66





In [None]:
# Build the oos prediction list and calculate the error.
oos_y = np.concatenate(oos_y_list)
oos_pred = np.concatenate(oos_pred_list)
oos_y = np.argmax(oos_y,axis=1)
oos_pred = np.argmax(oos_pred,axis=1)
score = metrics.accuracy_score(oos_pred,oos_y)
print(f"Final OOS score (accuracy): {score}")

# Write the cross-validated prediction
oos_y_df = pd.DataFrame(oos_y)
oos_pred_df = pd.DataFrame(oos_pred)
oosDF = pd.concat( [df, oos_y_df, oos_pred_df],axis=1 )
#oosDF.to_csv(filename_write,index=False)

Final OOS score (accuracy): 0.5965


## Training with both a Cross-Validation and a Holdout Set

If you have a considerable amount of data, it is always valuable to set aside a holdout set before you cross-validate. This holdout set will be the final evaluation before using your model for its real-world use. Figure 5.HOLDOUT shows this division.

**Figure 5.HOLDOUT: Cross-Validation and a Holdout Set**
![Cross Validation and a Holdout Set](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/class_3_hold_train_val.png "Cross-Validation and a Holdout Set")
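
The division in the figure can be sketched with index arithmetic alone. The example assumes a toy dataset of 100 rows (the names `x_demo` and `y_demo` are hypothetical), holds out 10% with `train_test_split`, and then applies **KFold** only to the remaining rows.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Toy data (assumption): 100 rows, 2 features, targets 0..99
x_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

# Set aside the holdout first; it takes no part in cross-validation
x_main, x_holdout, y_main, y_holdout = train_test_split(
    x_demo, y_demo, test_size=0.10, random_state=42)

# Cross-validate only on the main partition
kf_demo = KFold(5, shuffle=True, random_state=42)
split_sizes = [(len(train_idx), len(val_idx))
               for train_idx, val_idx in kf_demo.split(x_main)]

print("Main rows:", len(x_main), "Holdout rows:", len(x_holdout))
print("(train, validation) sizes per fold:", split_sizes)
```

Because the folds are drawn from `x_main` only, no holdout row is ever used for training or validation during cross-validation.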

The following program uses a holdout set and then still cross-validates.  

In [None]:
import pandas as pd
from scipy.stats import zscore
from sklearn.model_selection import train_test_split

# Read the data set
df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/jh-simple-dataset.csv",
    na_values=['NA','?'])

# Generate dummies for job
df = pd.concat([df,pd.get_dummies(df['job'],prefix="job")],axis=1)
df.drop('job', axis=1, inplace=True)

# Generate dummies for area
df = pd.concat([df,pd.get_dummies(df['area'],prefix="area")],axis=1)
df.drop('area', axis=1, inplace=True)

# Generate dummies for product
df = pd.concat([df,pd.get_dummies(df['product'],prefix="product")],axis=1)
df.drop('product', axis=1, inplace=True)

# Missing values for income
med = df['income'].median()
df['income'] = df['income'].fillna(med)

# Standardize ranges
df['income'] = zscore(df['income'])
df['aspect'] = zscore(df['aspect'])
df['save_rate'] = zscore(df['save_rate'])
df['subscriptions'] = zscore(df['subscriptions'])

# Convert to numpy - Regression
x_columns = df.columns.drop('age').drop('id')
x = df[x_columns].values
y = df['age'].values

Now that the data has been preprocessed, we are ready to build the neural network.

In [None]:
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.metrics import accuracy_score
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torch.autograd import Variable
from sklearn import preprocessing
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import KFold
from sklearn import metrics
import tqdm
import time

# Keep a 10% holdout
x_main, x_holdout, y_main, y_holdout = train_test_split(    
    x, y, test_size=0.10) 

x_holdout = torch.Tensor(x_holdout).float().to(device)
#y_holdout = torch.Tensor(y_holdout).float().to(device)

EPOCHS=500
BATCH_SIZE = 16

# Define the PyTorch Neural Network
class Net(nn.Module):
    def __init__(self, in_count, out_count):
        super(Net, self).__init__()
        # We must define each of the layers.
        self.fc1 = nn.Linear(in_count, 50)
        self.fc2 = nn.Linear(50, 25)
        self.fc3 = nn.Linear(25, 1)

    def forward(self, x):
        # In the forward pass, we must calculate all of the layers we 
        # previously defined.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

# Cross-Validate
kf = KFold(5, shuffle=True, random_state=42) # Use KFold for regression
oos_y_list = []
oos_pred_list = []

fold = 0
for train, test in kf.split(x_main):
    fold+=1
    print(f"Fold #{fold}")
        
    x_train = x_main[train]
    y_train = y_main[train]
    x_test = x_main[test]
    y_test = y_main[test]

    # NumPy to PyTorch. Targets are reshaped to (N, 1) to match the model output.
    x_train = torch.Tensor(x_train).float()
    y_train = torch.Tensor(y_train).float().view(-1, 1)

    x_test = torch.Tensor(x_test).float().to(device)
    y_test = torch.Tensor(y_test).float().view(-1, 1).to(device)

    # Create datasets
    dataset_train = TensorDataset(x_train, y_train)
    dataloader_train = DataLoader(dataset_train,
      batch_size=BATCH_SIZE, shuffle=True)

    dataset_test = TensorDataset(x_test, y_test)
    dataloader_test = DataLoader(dataset_test,
      batch_size=BATCH_SIZE, shuffle=False)

    # Build a fresh network for this fold
    model = Net(x.shape[1],1).to(device)

    # Define the loss function for regression
    loss_fn = nn.MSELoss()

    # Define the optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    es = EarlyStopping()

    epoch = 0
    done = False
    while epoch<EPOCHS and not done:
      epoch += 1
      steps = list(enumerate(dataloader_train))
      pbar = tqdm.tqdm(steps)
      model.train()
      for i, (x_batch, y_batch) in pbar:
        y_batch_pred = model(x_batch.to(device))
        loss = loss_fn(y_batch_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss = loss.item()
        if i == len(steps)-1:
          # After the last batch of the epoch, score this fold's test rows
          model.eval()
          with torch.no_grad():
            pred = model(x_test)
            vloss = loss_fn(pred, y_test)
          if es(model,vloss): done = True
          pbar.set_description(f"Epoch: {epoch}, tloss: {loss}, vloss: {vloss:>7f}, EStop:[{es.status}]")
        else:
          pbar.set_description(f"Epoch: {epoch}, tloss: {loss}")

    # Predict this fold's test rows with the trained model
    model.eval()
    with torch.no_grad():
      pred = model(x_test)
    
    oos_y_list.append(y_test.cpu().detach())
    oos_pred_list.append(pred.cpu().detach())

    # Measure this fold's RMSE (scikit-learn expects y_true first, then y_pred)
    score = np.sqrt(metrics.mean_squared_error(y_test.cpu().detach(),pred.cpu().detach()))
    print(f"Fold score (RMSE): {score}")

Fold #1


Epoch: 1, tloss: 72.56485748291016, vloss: 54.645782, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 162.35it/s]
Epoch: 2, tloss: 19.881534576416016, vloss: 19.900444, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 159.16it/s]
Epoch: 3, tloss: 17.700916290283203, vloss: 18.652140, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 175.86it/s]
Epoch: 4, tloss: 11.553352355957031, vloss: 16.144112, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 181.55it/s]
Epoch: 5, tloss: 12.69009017944336, vloss: 15.238962, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 166.40it/s]
Epoch: 6, tloss: 12.362306594848633, vloss: 18.409935, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 180.71it/s]
Epoch: 7, tloss: 8.815723419189453, vloss: 16.545795, EStop:[2/5]: 100%|██████████| 90/90 [00:00<00:00, 177.16it/s]
Epoch: 8, tloss: 16.19245147705078, vloss: 14.816410, EStop:[0

Fold score (RMSE): 3.838550090789795
Fold #2


Epoch: 1, tloss: 65.76457977294922, vloss: 50.920387, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 154.95it/s]
Epoch: 2, tloss: 13.398838996887207, vloss: 16.759438, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 157.08it/s]
Epoch: 3, tloss: 7.2599968910217285, vloss: 16.348392, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 160.90it/s]
Epoch: 4, tloss: 33.01200866699219, vloss: 15.119592, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 164.66it/s]
Epoch: 5, tloss: 15.800431251525879, vloss: 14.899918, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 162.19it/s]
Epoch: 6, tloss: 18.202293395996094, vloss: 18.668779, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 171.22it/s]
Epoch: 7, tloss: 9.618451118469238, vloss: 24.711700, EStop:[2/5]: 100%|██████████| 90/90 [00:00<00:00, 172.80it/s]
Epoch: 8, tloss: 5.976446151733398, vloss: 14.655143, EStop:[0

Fold score (RMSE): 3.5167007446289062
Fold #3


Epoch: 1, tloss: 79.90404510498047, vloss: 54.838440, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 157.64it/s]
Epoch: 2, tloss: 36.34873580932617, vloss: 17.494644, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 156.38it/s]
Epoch: 3, tloss: 28.1075496673584, vloss: 19.427069, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 175.10it/s]
Epoch: 4, tloss: 17.659423828125, vloss: 15.360039, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 174.25it/s]
Epoch: 5, tloss: 17.803409576416016, vloss: 15.534598, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 170.95it/s]
Epoch: 6, tloss: 13.97917366027832, vloss: 14.996989, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 178.17it/s]
Epoch: 7, tloss: 25.034334182739258, vloss: 16.119177, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 174.24it/s]
Epoch: 8, tloss: 24.613719940185547, vloss: 20.997194, EStop:[2/5]:

Fold score (RMSE): 3.9091691970825195
Fold #4


Epoch: 1, tloss: 53.84135055541992, vloss: 48.789276, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 159.68it/s]
Epoch: 2, tloss: 12.208961486816406, vloss: 24.114445, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 161.54it/s]
Epoch: 3, tloss: 7.535606861114502, vloss: 22.149927, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 164.53it/s]
Epoch: 4, tloss: 14.461817741394043, vloss: 15.741406, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 169.88it/s]
Epoch: 5, tloss: 12.456652641296387, vloss: 16.828775, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 172.67it/s]
Epoch: 6, tloss: 28.93013572692871, vloss: 21.270340, EStop:[2/5]: 100%|██████████| 90/90 [00:00<00:00, 175.13it/s]
Epoch: 7, tloss: 11.464447975158691, vloss: 15.651596, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 168.37it/s]
Epoch: 8, tloss: 15.525167465209961, vloss: 19.280731, EStop:[

Fold score (RMSE): 3.7630841732025146
Fold #5


Epoch: 1, tloss: 83.92796325683594, vloss: 69.805984, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 153.55it/s]
Epoch: 2, tloss: 31.090370178222656, vloss: 24.205339, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 171.21it/s]
Epoch: 3, tloss: 19.13016128540039, vloss: 22.080797, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 170.25it/s]
Epoch: 4, tloss: 18.185937881469727, vloss: 18.302670, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 175.63it/s]
Epoch: 5, tloss: 8.021072387695312, vloss: 16.906092, EStop:[0/5]: 100%|██████████| 90/90 [00:00<00:00, 168.42it/s]
Epoch: 6, tloss: 15.051443099975586, vloss: 17.442562, EStop:[1/5]: 100%|██████████| 90/90 [00:00<00:00, 168.33it/s]
Epoch: 7, tloss: 11.39267635345459, vloss: 18.741539, EStop:[2/5]: 100%|██████████| 90/90 [00:00<00:00, 169.44it/s]
Epoch: 8, tloss: 15.929899215698242, vloss: 16.490429, EStop:[0

Fold score (RMSE): 4.034495830535889
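
One detail worth checking in a training loop like this is that predictions and targets have the same shape. A network ending in a single-unit layer returns shape `(N, 1)`; if the targets are a flat vector of shape `(N,)`, the subtraction inside a mean-squared-error loss broadcasts to an `N x N` matrix of all pairs. The sketch below uses NumPy as a stand-in for the tensor arithmetic (an assumption made to keep the example small; PyTorch follows similar broadcasting rules and warns about mismatched sizes in `MSELoss`). The toy values are made up for illustration.

```python
import numpy as np

# Toy values: the predictions are perfect, so the true MSE is 0.
pred = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like a one-unit output layer
y_flat = np.array([1.0, 2.0, 3.0])      # shape (3,), a flat target vector

# Mismatched shapes: broadcasting compares every prediction with every target.
diff_broadcast = pred - y_flat
mse_broadcast = float(np.mean(diff_broadcast ** 2))

# Matched shapes: reshaping the target to a column keeps the comparison row by row.
diff_matched = pred - y_flat.reshape(-1, 1)
mse_matched = float(np.mean(diff_matched ** 2))

print(diff_broadcast.shape, mse_broadcast)
print(diff_matched.shape, mse_matched)
```

With mismatched shapes the loss is nonzero even though every prediction is exact, which is why the targets should be a column of shape `(N, 1)` before they reach the loss function.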





In [None]:
# Combine the out-of-sample (OOS) predictions from all folds and calculate the error.
oos_y = np.concatenate(oos_y_list)
oos_pred = np.concatenate(oos_pred_list)
score = np.sqrt(metrics.mean_squared_error(oos_y,oos_pred))
print(f"Final, out of sample score (RMSE): {score}")

Final, out of sample score (RMSE): 3.816312551498413
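
The final score is computed on the pooled predictions, so it is not the plain average of the five fold scores. A short arithmetic check, using the fold RMSE values printed above and assuming all five test folds contain the same number of rows (an assumption; the fold sizes are not printed), shows the difference: pooling averages the squared errors first and takes the square root afterwards.

```python
import math

# Fold RMSE values copied from the cross-validation output above.
fold_rmse = [
    3.838550090789795,
    3.5167007446289062,
    3.9091691970825195,
    3.7630841732025146,
    4.034495830535889,
]

# Plain average of the fold scores.
mean_of_rmse = sum(fold_rmse) / len(fold_rmse)

# Pooled RMSE: average the fold MSEs, then take the square root.
# Assumes equally sized test folds; otherwise weight each MSE by its fold size.
pooled_rmse = math.sqrt(sum(r ** 2 for r in fold_rmse) / len(fold_rmse))

print(f"Mean of fold RMSEs: {mean_of_rmse:.4f}")
print(f"Pooled RMSE:        {pooled_rmse:.4f}")
```

Under that assumption the pooled value agrees with the reported final score of about 3.8163, while the plain average of the fold scores comes out slightly lower.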


In [None]:
# Score the holdout set with the network trained on the last fold
holdout_pred = model(x_holdout).cpu().detach()

score = np.sqrt(metrics.mean_squared_error(y_holdout,holdout_pred))
print(f"Holdout score (RMSE): {score}")

Holdout score (RMSE): 3.519663184152827
