# A real NN, on data and overfitting
We'll see what overfitting means. On a simple dataset that we generate ourselves, we will watch the neural network fit the noise in the data, and we will see how early stopping counters this. In this exercise, we use PyTorch with Lightning to set up the neural network.

(c) Patrick van der Smagt, March 2023.  Please do not distribute this without Patrick's consent (he will give it).
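To see the effect in miniature, here is a sketch that swaps the neural network for least-squares polynomials of increasing degree. This is an illustration of our own, not part of the exercise. It generates data in the same way as below (a sampled sine plus Gaussian noise with standard deviation 0.2), but uses NumPy's `default_rng` generator, so the noise values differ from the notebook's.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Sampled sine plus Gaussian noise (std 0.2), mirroring the notebook's recipe.
rng = np.random.default_rng(0)
x_train = np.arange(30) / 3.0            # 30 inputs in [0, 10)
x_val = x_train + 1.0 / 6.0              # validation inputs halfway between them
y_train = np.sin(x_train) + rng.normal(0, 0.2, x_train.shape)
y_val = np.sin(x_val) + rng.normal(0, 0.2, x_val.shape)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

results = {}
for degree in (3, 9, 15):
    # least-squares fit; Polynomial.fit maps x to [-1, 1] internally for stability
    p = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(p(x_train), y_train), mse(p(x_val), y_val))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, "
          f"val MSE {results[degree][1]:.3f}")
```

Because each higher-degree model contains the lower-degree ones, the training error can only go down as the degree grows. The validation error does not have to follow: a model flexible enough to chase the noise typically does worse between (and beyond) the training points.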

In [None]:
import torch
from torch import optim, nn
from torch.utils.data import DataLoader
import lightning.pytorch as pl
from lightning.pytorch.callbacks import TQDMProgressBar, ModelCheckpoint
from lightning.pytorch.loggers import TensorBoardLogger

import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## data set
We create a simple artificial data set which is easy to plot: a sampled sine wave, to which Gaussian noise is added.
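Written out, the next cell generates the data as follows (with `n_train = n_val = n_test = 30`). The noise parameter `var = 0.2` is passed to `np.random.normal` as a standard deviation, so the noise variance is $0.2^2 = 0.04$:

$$
x^{\mathrm{train}}_i = \frac{i}{3}, \qquad x^{\mathrm{val}}_j = \frac{j}{3} + \frac{1}{6}, \qquad i, j = 0, \dots, 29,
$$

$$
y = \sin(x) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0,\, 0.2^2).
$$

The test inputs lie on the same grid as the validation inputs (with fresh noise), and before training all inputs are divided by 10, so that they lie in $[0, 1)$.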

In [None]:

# number of samples in the training, validation and test sets
n_train = 30
n_val = 30
n_test = 30
# standard deviation of the noise we put on top (np.random.normal takes the std, not the variance)
var = 0.2

np.random.seed(0) # everyone should have the same start
x_train = np.arange(n_train)/(n_train/10.)
y_train = np.vectorize(math.sin)(x_train)+np.random.normal(0,var,x_train.shape[0])

x_val = np.arange(n_val)/(n_val/10.) + 5./n_val
y_val = np.vectorize(math.sin)(x_val)+np.random.normal(0,var,x_val.shape[0])

x_test = np.arange(n_test)/(n_test/10.) + 5./n_test
y_test = np.vectorize(math.sin)(x_test)+np.random.normal(0,var,x_test.shape[0])

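# PyTorch layers use float32 by default (and some backends, e.g. Apple's MPS, do not support float64),
# so convert the data; the inputs are also scaled from [0, 10) to [0, 1)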
x_train = x_train.astype('float32')/10.
y_train = y_train.astype('float32')
x_val = x_val.astype('float32')/10.
y_val = y_val.astype('float32')
x_test = x_test.astype('float32')/10.
y_test = y_test.astype('float32')

## plot
Blue crosses are the validation data, green dots the training data.

In [None]:
plt.plot(x_train,y_train,'go')
plt.plot(x_val,y_val,'bx')


## neural network
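Set up the neural network and the PyTorch Lightning methods that define the training, validation and test steps and the optimizer.
The network consists of successive layers, each of which first computes a weighted sum of its inputs (`nn.Linear`) and then applies a nonlinear transfer function (`nn.Tanh`). It also includes a dropout layer (`nn.Dropout`), which during training randomly sets a fraction of the first hidden layer's outputs to zero. Its rate is 0.0 for now, so it has no effect.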
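The same architecture can be written out by hand. The NumPy sketch below uses random, untrained weights chosen purely for illustration, and leaves out dropout (its rate is 0.0 here). It computes the 1 → 50 → 50 → 1 forward pass and counts the trainable parameters. With 2701 parameters and only 30 training points, the network has far more freedom than it needs, which is what allows it to fit the noise.

```python
import numpy as np

# Hypothetical random weights; the layer sizes mirror the notebook: 1 -> 50 -> 50 -> 1.
# Weights are stored here as (inputs, outputs); nn.Linear stores the transpose.
rng = np.random.default_rng(0)
sizes = [1, 50, 50, 1]
weights = [rng.normal(0, 0.1, (n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """x has shape (batch,); returns shape (batch,)."""
    h = x.reshape(-1, 1)                        # (batch, 1): one input feature per sample
    for k, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b                           # weighted sum, as nn.Linear computes
        if k < len(weights) - 1:
            h = np.tanh(h)                      # nonlinearity on the hidden layers only
    return h[:, 0]                              # drop the feature axis, like [:, 0] in forward()

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)                                 # 2701
print(forward(np.linspace(0.0, 1.0, 5)).shape)  # (5,)
```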

In [None]:

# define the LightningModule
class NeuralNetwork(pl.LightningModule):
    def __init__(self, dropout = 0.0):
        super().__init__()
        self.save_hyperparameters()  # store `dropout` in checkpoints so load_from_checkpoint restores it
        self.dropout = dropout # store in the module
        self.neuralnet = nn.Sequential(
            nn.Linear(1, 50), nn.Tanh(),
            nn.Dropout(dropout),
            nn.Linear(50, 50), nn.Tanh(),
            nn.Linear(50, 1),
        )

    
    def forward(self, x):
        # reshape the batch of scalars from (batch,) to (batch, 1): nn.Linear expects the features in the last dimension
        x = x.view(x.size(0), -1)
        out = self.neuralnet(x)[:,0]  # drop the output feature dimension: (batch, 1) -> (batch,)
        return out

    def training_step(self, batch, batch_idx):
        x, y = batch
        z = self.forward(x)
        loss = nn.functional.mse_loss(z, y)
        # Logging to TensorBoard (if installed) by default
        self.log("train_loss", loss)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        z = self.forward(x)
        loss = nn.functional.mse_loss(z, y)
        self.log("val_loss", loss)
        return loss

    def test_step(self, batch, batch_idx):
        x, y = batch
        z = self.forward(x)
        loss = nn.functional.mse_loss(z, y)
        self.log("test_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters())
        return optimizer


nnn = NeuralNetwork(dropout = 0.0)


Next, we define the data loaders, which hand the training, validation and test data to Lightning's training, validation and test loops.

In [None]:
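from torch.utils.data import TensorDataset

# wrap the arrays so that every dataset item is one (x, y) pair; passing the tuple (x, y) directly
# would give a dataset of length 2 whose "items" are the whole x and y arrays
train_loader = DataLoader(TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train)), batch_size=10)
val_loader = DataLoader(TensorDataset(torch.from_numpy(x_val), torch.from_numpy(y_val)), batch_size=10)
test_loader = DataLoader(TensorDataset(torch.from_numpy(x_test), torch.from_numpy(y_test)), batch_size=10)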
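Conceptually, when a `DataLoader` with `batch_size=10` and no shuffling (its default) iterates over a dataset of 30 (x, y) pairs, it does roughly what this plain-Python sketch does: it cuts the data into consecutive slices of 10, so one epoch consists of 3 batches, i.e. 3 optimizer steps. The function `minibatches` and the stand-in values are hypothetical; the real `DataLoader` additionally collates each batch into tensors.

```python
def minibatches(xs, ys, batch_size):
    """Yield consecutive (x_batch, y_batch) slices, like a DataLoader without shuffling."""
    for start in range(0, len(xs), batch_size):
        yield xs[start:start + batch_size], ys[start:start + batch_size]

xs = [i / 30 for i in range(30)]   # stand-in inputs (hypothetical values)
ys = [0.0] * 30                    # stand-in targets
batches = list(minibatches(xs, ys, batch_size=10))
print(len(batches), [len(xb) for xb, _ in batches])   # 3 [10, 10, 10]
```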

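Now we set up the trainer and start training. The logger writes the losses to TensorBoard, and the checkpoint callback keeps the model with the lowest validation loss.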

In [None]:
logger = TensorBoardLogger("tensorlogs", name="sine")
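# keep the checkpoint with the lowest validation loss, plus the most recent one (for ckpt_path="last")
checkpoint_callback = ModelCheckpoint(dirpath="checkpoint", save_top_k=1, monitor="val_loss", save_last=True)
# refresh_rate=0 disables the progress bar
trainer = pl.Trainer(max_epochs=1000, callbacks=[TQDMProgressBar(refresh_rate=0), checkpoint_callback], logger=logger)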
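With `monitor="val_loss"` and `save_top_k=1`, the checkpoint callback in effect remembers the epoch with the lowest validation loss seen so far (its default `mode` is `"min"`). Below is a simplified plain-Python sketch of that bookkeeping which leaves out the writing of files; `track_best` and the loss curve are hypothetical.

```python
def track_best(val_losses):
    """Sketch of save_top_k=1, mode="min": remember the epoch with the lowest monitored loss."""
    best_epoch, best_loss = None, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:          # strictly lower: this checkpoint replaces the previous best
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss

# hypothetical validation curve: it falls, then rises once the network starts fitting noise
curve = [0.50, 0.30, 0.12, 0.08, 0.09, 0.11, 0.15]
print(track_best(curve))   # (3, 0.08)
```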


In [None]:
trainer.fit(nnn, train_dataloaders=train_loader, val_dataloaders=val_loader)

In [None]:
%load_ext tensorboard
%tensorboard --logdir tensorlogs

In [None]:
# continue training the same model; the trainer keeps its epoch count, so this runs up to 8000 epochs in total
trainer.fit_loop.max_epochs = 8000
trainer.fit(nnn, train_dataloaders=train_loader, val_dataloaders=val_loader)

In [None]:
# optional: report the MSE of the best and of the last checkpoint on the different data sets
#trainer.test(ckpt_path="best", dataloaders=train_loader)
#trainer.test(ckpt_path="best", dataloaders=val_loader)
#trainer.test(ckpt_path="last", dataloaders=val_loader)
#trainer.test(ckpt_path="last", dataloaders=test_loader)

When training is done, we plot the result. To do so, we evaluate the network on a dense grid of 200 inputs covering the whole input range and plot its output as a curve, together with the training data (green dots) and the validation data (blue crosses).

In [None]:
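def plot_result(model):
    # plot the fitted function on a dense input grid, plus the training and validation data
    n = 200
    x_plot = np.arange(n)/(n/10.)/10.   # same input scaling as the data: [0, 1)
    was_training = model.training
    model.eval()                        # switch dropout off for prediction
    with torch.no_grad():
        y_plot = model(torch.from_numpy(x_plot).float().to(model.device)).cpu().numpy()
    model.train(was_training)           # restore the previous mode
    plt.plot(x_plot, y_plot)
    plt.plot(x_train,y_train,'go')
    plt.plot(x_val,y_val,'bx')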

In [None]:
plot_result(nnn)

Now we load the model with the lowest validation loss, i.e. the checkpoint kept by `checkpoint_callback`. This is called "early stopping": we use the network as it was at the epoch where the error on the validation set was lowest, as if we had stopped training there.

In [None]:
best_xval_model = NeuralNetwork.load_from_checkpoint(checkpoint_callback.best_model_path)
plot_result(best_xval_model)
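Here we train for all epochs and pick the best checkpoint afterwards. A common variant actually halts training once the validation loss has not improved for `patience` epochs; Lightning provides this as the `EarlyStopping` callback (with `monitor`, `patience` and `mode` arguments). The plain-Python sketch below shows the rule itself; `early_stop`, the patience and the loss values are hypothetical.

```python
def early_stop(val_losses, patience):
    """Return (stop_epoch, best_epoch): stop once the loss has not improved for `patience` epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch            # training would halt here
    return len(val_losses) - 1, best_epoch      # rule never triggered: all epochs were run

curve = [0.50, 0.30, 0.12, 0.08, 0.09, 0.11, 0.15, 0.20]   # hypothetical values
print(early_stop(curve, patience=3))    # (6, 3)
```

Training stops `patience` epochs after the best one, so the best checkpoint still has to be kept separately, as `checkpoint_callback` does here.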

Now redo the above and set `dropout=0.5`.  Compare.

In [None]:
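nnnd = NeuralNetwork(dropout = 0.5)
# use a fresh trainer and checkpoint callback: the old trainer would continue counting epochs from where
# the previous run stopped, and would compare against the best checkpoint of the previous model
checkpoint_callback_d = ModelCheckpoint(dirpath="checkpoint_dropout", save_top_k=1, monitor="val_loss")
trainer_d = pl.Trainer(max_epochs=9000, callbacks=[TQDMProgressBar(refresh_rate=0), checkpoint_callback_d], logger=TensorBoardLogger("tensorlogs", name="sine_dropout"))
trainer_d.fit(nnnd, train_dataloaders=train_loader, val_dataloaders=val_loader)
plot_result(nnnd)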
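During training, PyTorch's `nn.Dropout` zeroes each unit with probability `p` and scales the surviving units by `1/(1-p)` ("inverted" dropout), so the expected activation is unchanged. In evaluation mode (`model.eval()`) it is the identity. Because the network cannot rely on any single hidden unit, it is harder for it to memorize the noise. The NumPy sketch below reimplements this idea for illustration; the function `dropout` and the activations are hypothetical, and this is not PyTorch's code.

```python
import numpy as np

def dropout(h, p, training, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return h                                # evaluation mode: identity
    keep = rng.random(h.shape) >= p             # True with probability 1 - p
    return h * keep / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((1000, 50))                         # hypothetical hidden activations
out = dropout(h, p=0.5, training=True, rng=rng)
print(np.unique(out))                           # [0. 2.]
print(out.mean())                               # roughly 1.0: expected activation unchanged
print(np.array_equal(dropout(h, 0.5, False, rng), h))  # True
```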