# Summary

This notebook trains a deep learning (DL) model using PyTorch.  

The dataset is the same example as before: the California housing dataset from sklearn.

In [1]:
# imports
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
import torch
from torch import nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from typing import Tuple

# Dataset and Dataloader

PyTorch provides Dataset and DataLoader.  These classes help to decouple the dataset from the model training code.  

Create a Dataset and DataLoader using the published interface so that we can cleanly supply the California housing data to a PyTorch model.  

Create a train/test set as before and corresponding PyTorch Datasets.

In [2]:
housing = fetch_california_housing()

In [3]:
x_train, x_test, y_train, y_test = train_test_split(housing.data,
                                                    housing.target,
                                                    test_size=0.1,
                                                    random_state=66)

In [4]:
# reshape y values to add a column dimension
y_train = y_train.reshape(-1,1)
y_test = y_test.reshape(-1,1)

Standardizing the train and test data helps the model converge faster and reach a lower minimum of the loss.  It is a common misconception that neural networks do not need standardized numeric inputs, or that they benefit less from standardization than a statistical model does.  

In [5]:
# standardize x values to assist in convergence
# Compute mean and standard deviation
mean = np.mean(x_train, axis=0)
std = np.std(x_train, axis=0)

# Perform standardization
# When standardizing the test set it is important to use the training set mean/std  
#  otherwise information about your test data will bleed into your evaluation.
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
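
The same train-only standardization can also be sketched with scikit-learn's StandardScaler.  This is an alternative shown for comparison, not what the notebook uses, and the arrays below are small synthetic stand-ins (an assumption for illustration) rather than the housing data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for x_train / x_test (illustrative values only).
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
demo_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

# Manual standardization, using statistics from the training data only.
demo_mean = demo_train.mean(axis=0)
demo_std = demo_train.std(axis=0)
manual_train = (demo_train - demo_mean) / demo_std
manual_test = (demo_test - demo_mean) / demo_std

# StandardScaler: fit on the training data, then reuse the fitted
# mean/std to transform the test data.
scaler = StandardScaler()
scaled_train = scaler.fit_transform(demo_train)
scaled_test = scaler.transform(demo_test)

# Both approaches should agree.
print(np.allclose(manual_train, scaled_train))
print(np.allclose(manual_test, scaled_test))
```

Calling fit_transform on the training data and only transform on the test data keeps the test statistics out of the preprocessing step.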

To create a custom Dataset class, implement the interface provided by torch.utils.data.Dataset.  This includes the functions __init__, __len__, and __getitem__.

In [6]:
class CaliforniaHousingDataset(Dataset):

    def __init__(self, x, y):
        self.x = torch.tensor(data=x,
                              dtype=torch.float32)
        self.y = torch.tensor(data=y,
                              dtype=torch.float32)

    def __len__(self) -> int:
        return len(self.x)
    
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.x[index], self.y[index]


Create two instances of the CaliforniaHousingDataset class: one for the training data and one for the test data.

In [7]:
train_data = CaliforniaHousingDataset(x=x_train, y=y_train)
test_data = CaliforniaHousingDataset(x=x_test, y=y_test)

In [8]:
print(f'Train: {len(train_data)}')
print(f'Test: {len(test_data)}')

Train: 18576
Test: 2064
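
To see what the Dataset interface provides, the sketch below indexes a tiny dataset directly.  TinyDataset is a hypothetical stand-in that mirrors CaliforniaHousingDataset, and the values are made up so the block runs on its own.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TinyDataset(Dataset):
    """Hypothetical stand-in that mirrors CaliforniaHousingDataset."""

    def __init__(self, x, y):
        self.x = torch.tensor(x, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.y[index]


# 5 made-up samples with 8 features each, plus a column of targets.
demo_x = np.arange(40, dtype=np.float64).reshape(5, 8)
demo_y = np.arange(5, dtype=np.float64).reshape(-1, 1)
demo_data = TinyDataset(demo_x, demo_y)

# Indexing calls __getitem__ and returns one (features, target) pair.
features, target = demo_data[0]
print(len(demo_data))   # len() calls __len__
print(features.shape)   # one sample: 8 features
print(target.shape)     # one sample: 1 target value
```

Each item is a single sample; grouping samples into batches is left to the DataLoader.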


The DataLoader class is used to build training batches and optionally shuffle the data at each epoch.  
Use a large batch size for this trivial problem.

In [9]:
train_dataloader = DataLoader(train_data, batch_size=256, shuffle=True)
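
As a rough check of what the DataLoader yields, the sketch below uses zero-filled tensors with the same shape as the training set (a stand-in, not the real data) and inspects the batches.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Same number of samples and features as the training set above,
# filled with zeros because only the shapes matter here.
n_samples, n_features, batch_size = 18576, 8, 256
demo_data = TensorDataset(torch.zeros(n_samples, n_features),
                          torch.zeros(n_samples, 1))
demo_loader = DataLoader(demo_data, batch_size=batch_size, shuffle=True)

# len() of a DataLoader is the number of batches per epoch.
print(len(demo_loader), math.ceil(n_samples / batch_size))

# Each iteration yields one (inputs, targets) batch.
first_X, first_y = next(iter(demo_loader))
print(first_X.shape, first_y.shape)

# The last batch is smaller when batch_size does not divide the dataset.
batch_sizes = [len(X) for X, y in demo_loader]
print(batch_sizes[-1])
```

With 18576 samples and a batch size of 256 there are 73 batches per epoch, and the final batch holds the remaining 144 samples.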

# Model

The neural network model is defined as a subclass of torch.nn.Module.  
The __init__ function defines the network: the input and output shapes, the hidden layers, and the final layer.
Each node of a hidden layer computes a linear function followed by a non-linear activation function.  The network definition below lists each linear function and each activation function explicitly.

The code below creates a simple network with 2 hidden layers.  The inputs have 8 dimensions; the first hidden layer expands this to 32 nodes and the second hidden layer collapses it down to 4 nodes.  The final layer outputs a single value, the predicted y value MedHouseVal.

This network is probably more complex than this dataset needs; it is used here to demonstrate hidden layers.

In [10]:
class HousingNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.model_output = nn.Sequential(
            nn.Linear(8, 32),
            nn.ReLU(),
            nn.Linear(32, 4),
            nn.ReLU(),
            nn.Linear(4, 1),
        )

    def forward(self, x):
        return self.model_output(x)

Create an instance of HousingNeuralNetwork and print its structure.

In [11]:
model = HousingNeuralNetwork().to("cpu")
print(model)

HousingNeuralNetwork(
  (model_output): Sequential(
    (0): Linear(in_features=8, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=4, bias=True)
    (3): ReLU()
    (4): Linear(in_features=4, out_features=1, bias=True)
  )
)
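
One way to get a feel for the size of this network is to count its trainable parameters.  The sketch below rebuilds the same layer stack as a bare nn.Sequential (named demo_net here, a name chosen for illustration) so that it runs on its own.

```python
from torch import nn

# Same layer stack as HousingNeuralNetwork, rebuilt standalone.
demo_net = nn.Sequential(
    nn.Linear(8, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
)

# Each Linear layer has a weight matrix (out x in) and a bias vector (out).
for name, param in demo_net.named_parameters():
    print(name, tuple(param.shape), param.numel())

n_params = sum(p.numel() for p in demo_net.parameters())
print(n_params)
```

The total is (8*32 + 32) + (32*4 + 4) + (4*1 + 1) = 425 trainable parameters; the ReLU layers have none.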


# Optimization

A neural network needs a loss function and an optimizer in order to be trained.

The loss function is the value the neural network will work to minimize during training.  The optimizer represents the method used to learn optimal values of the trainable parameters.

For the housing dataset, the loss function is MSELoss (mean squared error) and the optimizer is SGD (stochastic gradient descent).  These choices are consistent with the Neural Network From Scratch.

In [12]:
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
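
To make the roles of the loss function and the optimizer concrete, here is a small worked sketch on made-up numbers.  The demo_ names and values are assumptions for illustration and are not part of the notebook's model.

```python
import torch
from torch import nn

# MSELoss averages the squared errors over all elements.
demo_loss_fn = nn.MSELoss()
demo_pred = torch.tensor([[2.0], [4.0]])
demo_target = torch.tensor([[1.0], [2.0]])
mse = demo_loss_fn(demo_pred, demo_target).item()
print(mse)  # ((2-1)^2 + (4-2)^2) / 2 = 2.5

# One SGD step on a single parameter w with loss (2w - 6)^2.
w = torch.tensor([1.0], requires_grad=True)
demo_optimizer = torch.optim.SGD([w], lr=1e-2)

loss = ((w * 2.0 - 6.0) ** 2).sum()
loss.backward()                 # d(loss)/dw = 2 * (2w - 6) * 2 = -16 at w = 1
grad = w.grad.item()

demo_optimizer.step()           # w <- w - lr * grad = 1 - 0.01 * (-16)
updated_w = w.item()
print(grad, updated_w)
```

Plain SGD moves each parameter a small step against its gradient, scaled by the learning rate; here w moves from 1.0 to about 1.16, toward the minimizer w = 3.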

# Train

Training iterates over the batches of data made available by the DataLoader.  
Each pass over the entire dataset is an epoch; the training loop is usually run for multiple epochs.  

Typically the loss values are logged per mini-batch.  This can be misleading for 2 reasons:  
1.  The loss will often oscillate from one mini-batch to the next due to variations in the training data.  When the loss is viewed this way, what matters is that it trends down.
2.  What matters most is that the loss decreases epoch over epoch.  As long as it does, training can continue (at the risk of overfitting).  A common practice is to stop training once the loss stops decreasing by more than a certain threshold (early stopping).
   
The example below trains for a large number of epochs (256).  The training loss continues to decrease and does not begin to become unstable until about epoch 180.  This instability is expected: once the optimizer has found a minimum of the loss function, subsequent updates cause the loss to oscillate around that minimum.  


In [13]:
def train_loop(dataloader, model, loss_fn, optimizer):
    model.train() # Puts model into train mode.
    epoch_loss = 0.0
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to("cpu"), y.to("cpu")

        optimizer.zero_grad()   # Zero out gradients before the next computation.

        # forward propagation
        pred = model(X)         # Get predicted values for inputs
        loss = loss_fn(pred, y) # Compute loss

        # Backpropagation
        loss.backward()         # Compute gradients
        optimizer.step()        # Update trainable parameters.

        # accumulate the batch loss (summed over the epoch)
        epoch_loss += loss.item()

    return epoch_loss
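
Note that train_loop returns the sum of the mini-batch losses, so its value scales with the number of batches.  A minimal sketch of a variant that returns the mean loss per sample is shown below; train_loop_mean and the synthetic data are hypothetical and used only for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_loop_mean(dataloader, model, loss_fn, optimizer):
    """Hypothetical variant of train_loop returning the mean loss per sample."""
    model.train()
    total_loss, n_seen = 0.0, 0
    for X, y in dataloader:
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
        # MSELoss averages within a batch, so weight by the batch size
        # to keep a smaller final batch from being over-counted.
        total_loss += loss.item() * len(X)
        n_seen += len(X)
    return total_loss / n_seen


# Synthetic regression problem (illustrative only).
torch.manual_seed(0)
demo_X = torch.randn(100, 8)
demo_y = demo_X.sum(dim=1, keepdim=True)
demo_loader = DataLoader(TensorDataset(demo_X, demo_y),
                         batch_size=32, shuffle=True)
demo_model = nn.Linear(8, 1)
demo_loss_fn = nn.MSELoss()
demo_opt = torch.optim.SGD(demo_model.parameters(), lr=1e-2)

mean_losses = [train_loop_mean(demo_loader, demo_model, demo_loss_fn, demo_opt)
               for _ in range(20)]
print(mean_losses[0], mean_losses[-1])
```

A per-sample mean is on the same scale as an MSE computed over a whole dataset in one call, which makes the training and test values easier to compare.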

In [14]:
epochs = 256
for t in range(epochs):
    epoch_loss = train_loop(train_dataloader, model, loss_fn, optimizer)
    print(f'Epoch {t+1} loss: {epoch_loss}')


Epoch 1 loss: 130.59871238470078
Epoch 2 loss: 60.54444885253906
Epoch 3 loss: 51.07842779159546
Epoch 4 loss: 47.814879179000854
Epoch 5 loss: 45.5601641535759
Epoch 6 loss: 43.62619835138321
Epoch 7 loss: 41.85950309038162
Epoch 8 loss: 40.26475903391838
Epoch 9 loss: 38.65514063835144
Epoch 10 loss: 37.1774580180645
Epoch 11 loss: 35.87482288479805
Epoch 12 loss: 34.72008156776428
Epoch 13 loss: 33.6549214720726
Epoch 14 loss: 32.65789410471916
Epoch 15 loss: 31.978035241365433
Epoch 16 loss: 31.32862663269043
Epoch 17 loss: 30.92395830154419
Epoch 18 loss: 30.562648057937622
Epoch 19 loss: 30.229025542736053
Epoch 20 loss: 29.957667380571365
Epoch 21 loss: 29.70040661096573
Epoch 22 loss: 29.447811484336853
Epoch 23 loss: 29.212144523859024
Epoch 24 loss: 29.13114669919014
Epoch 25 loss: 28.900121957063675
Epoch 26 loss: 28.77874591946602
Epoch 27 loss: 28.54921844601631
Epoch 28 loss: 28.51083305478096
Epoch 29 loss: 28.34859025478363
Epoch 30 loss: 28.246335357427597
Epoch 31 los
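
Early stopping, mentioned in the training notes above, can be sketched as a small helper that watches the per-epoch losses.  The function name, the min_delta and patience parameters, and the loss values are all assumptions for illustration; this notebook does not use early stopping.

```python
def early_stopping_epoch(losses, min_delta=0.1, patience=3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops after `patience` consecutive epochs that fail to
    improve on the best loss so far by more than `min_delta`.
    """
    best = float("inf")
    stale_epochs = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > min_delta:
            best = loss          # meaningful improvement: reset the counter
            stale_epochs = 0
        else:
            stale_epochs += 1    # no meaningful improvement this epoch
            if stale_epochs >= patience:
                return epoch
    return None


# Made-up loss curve that flattens out after the third epoch.
demo_losses = [10.0, 8.0, 7.0, 6.95, 6.94, 6.93, 6.0]
stop_epoch = early_stopping_epoch(demo_losses)
print(stop_epoch)

# A curve that keeps improving never triggers a stop.
never_stops = early_stopping_epoch([5.0, 4.0, 3.0, 2.0])
print(never_stops)
```

In practice the monitored value is often a validation loss rather than the training loss, since the training loss can keep falling while the model overfits.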

# Test

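The test dataset does not require a loop or mini-batches.  
Simply compute the test predictions in one call to the trained model.

The loss on a holdout test set is usually somewhat higher than the training loss, which is normal and expected.  The difference indicates variation between the datasets or some overfitting on the training data.  When comparing the two here, note that the training loop prints the sum of the mini-batch losses for each epoch, so that value must be divided by the number of mini-batches (73, for 18576 samples at a batch size of 256) before it is comparable to the test MSE.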


In [15]:
model.eval() # Put model in evaluation mode.

with torch.no_grad():
    test_predictions = model(test_data.x)
    test_loss = loss_fn(test_predictions, test_data.y).item()

print(f'Loss (MSE) on test dataset {test_loss}')

Loss (MSE) on test dataset 0.30753740668296814
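
To put the test MSE on the scale of the target, one option is to take its square root (RMSE).  The sketch below reuses the MSE value printed above and relies on sklearn's documentation that the target is expressed in units of $100,000.

```python
import math

# Test MSE copied from the output above.
test_mse = 0.30753740668296814

# RMSE is in the same units as the target (hundreds of thousands of dollars).
rmse = math.sqrt(test_mse)
print(f"RMSE: {rmse:.4f}")
print(f"Approximate typical error: ${rmse * 100_000:,.0f}")
```

This gives an RMSE of roughly 0.55, that is, a typical prediction error on the order of $55,000.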


Compute R2 on the test dataset for comparison to the baseline.

In [16]:
r2_test = r2_score(test_data.y.numpy(), test_predictions.numpy())

print(f"test set r2 score: {r2_test}")


test set r2 score: 0.7813434600830078
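
For reference, R2 is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values.  The sketch below computes it by hand on made-up numbers and compares the result with r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions (illustrative only).
demo_true = np.array([3.0, 2.0, 4.0, 5.0])
demo_pred = np.array([2.5, 2.0, 4.5, 5.0])

ss_res = np.sum((demo_true - demo_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((demo_true - demo_true.mean()) ** 2)   # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)
print(r2_score(demo_true, demo_pred))
```

Here ss_res is 0.5 and ss_tot is 5.0, so R2 is 0.9.  A model that always predicts the mean of the true values scores 0, and a perfect model scores 1.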


# Conclusion

This PyTorch neural network reports a good score on our chosen metric: an R2 of about 0.781 on the test set.  It beats our baseline and it beats the neural network from scratch (perceptron).  However, it does not beat the baseline improvement score (0.836).  

The purpose of this notebook was to demonstrate building and training a neural network with the PyTorch framework.  It was not necessarily to build the best possible PyTorch model.  It is possible that further experimentation with a neural network approach to this problem could beat our best model.  
