<a href="https://colab.research.google.com/github/sdgroeve/D012554_Machine_Learning_2023/blob/main/02_neural_networks_in_pytorch_lightning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [125]:
#@title
import requests
from pathlib import Path 

request = requests.get("https://raw.githubusercontent.com/sdgroeve/D012554_Machine_Learning_2023/main/utils/utils.py")
with open("utils.py", "wb") as f:
  f.write(request.content)

from utils import plot_decision_boundary

In [126]:
#@title
!pip install tqdm pytorch-lightning



# 2. Neural networks in PyTorch


In [127]:
import torch
from torch import nn 

torch.manual_seed(46)

# Check PyTorch version
torch.__version__

'1.13.1+cu116'

## Preparing the data

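The dataset for this notebook is in a flat file called `dataset_neural_networks.csv`.

We wrap it in a custom PyTorch `Dataset`, `XORDataset`, that reads the file into a Pandas DataFrame, separates the label column `y` from the features and converts both to `float32` tensors. A `LightningDataModule`, `XORDataModule`, then randomly splits the data into a training set (80%) and a validation set (20%) and serves both through `DataLoader`s.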

In [128]:
import pandas as pd
import torch
from torch.utils.data import Dataset

class XORDataset(Dataset):
    # Loads the CSV file and converts features and labels to float tensors
    def __init__(self):
        # load data
        fn = "https://raw.githubusercontent.com/sdgroeve/D012554_Machine_Learning_2023/main/datasets/dataset_neural_networks.csv"
        self.df = pd.read_csv(fn)
        # extract labels (double brackets keep the shape (N, 1))
        self.df_labels = self.df[['y']]
        self.df.pop('y')
        # convert to torch dtypes
        self.dataset = torch.tensor(self.df.to_numpy(), dtype=torch.float)
        self.labels = torch.tensor(self.df_labels.to_numpy(), dtype=torch.float)

    # Returns the total number of samples in the Dataset
    def __len__(self):
        return len(self.dataset)

    # Returns the idx-th sample and its label
    def __getitem__(self, idx):
        return self.dataset[idx], self.labels[idx]
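To see what the `Dataset` interface provides without downloading the CSV, here is a minimal sketch that builds the same kind of `Dataset` from a small in-memory DataFrame. The class name `InMemoryXORDataset` and the feature column names `x1`/`x2` are hypothetical; only the `y` label column mirrors the real file.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class InMemoryXORDataset(Dataset):
    """Hypothetical variant of XORDataset that takes a DataFrame instead of a URL."""

    def __init__(self, df: pd.DataFrame):
        # Separate the label column 'y' from the feature columns
        features = df.drop(columns=["y"])
        self.dataset = torch.tensor(features.to_numpy(), dtype=torch.float)
        # Double brackets keep the labels 2-D: shape (N, 1)
        self.labels = torch.tensor(df[["y"]].to_numpy(), dtype=torch.float)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx], self.labels[idx]


# The four XOR corner points (assumed column names x1, x2)
toy_df = pd.DataFrame({"x1": [0, 0, 1, 1], "x2": [0, 1, 0, 1], "y": [0, 1, 1, 0]})
toy_data = InMemoryXORDataset(toy_df)

x0, y0 = toy_data[0]
print(len(toy_data), x0.shape, y0.shape)  # 4 torch.Size([2]) torch.Size([1])

# A DataLoader stacks individual samples into batches
xb, yb = next(iter(DataLoader(toy_data, batch_size=3)))
print(xb.shape, yb.shape)  # torch.Size([3, 2]) torch.Size([3, 1])
```

`__getitem__` returns one sample at a time; batching is entirely the `DataLoader`'s job, which is why the same class works for any batch size.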

In [129]:
import torch.utils.data as data
from torch.utils.data import DataLoader
import pytorch_lightning as pl

class XORDataModule(pl.LightningDataModule):
    def __init__(self, batch_size: int = 32):
        super().__init__()
        self.batch_size = batch_size

    def setup(self, stage: str):
        self.data = XORDataset()

        # Random split
        self.train_set_size = int(len(self.data) * 0.8)
        self.valid_set_size = len(self.data) - self.train_set_size
        self.train_set, self.valid_set = data.random_split(self.data, [self.train_set_size, self.valid_set_size])

    def train_dataloader(self):
        # shuffle the training samples at every epoch
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.valid_set, batch_size=self.batch_size)
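`random_split` returns two `Subset` objects that share the underlying dataset and differ only in their `indices`. The sketch below uses a synthetic `TensorDataset` of 10 samples (an assumption, not the XOR file) to show the 80/20 sizes computed as in `setup()`, and how a seeded `torch.Generator` makes the split reproducible. `XORDataModule` itself passes no generator, so its split depends on the global random state.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for XORDataset: 10 samples with 2 features each
features = torch.arange(20, dtype=torch.float).reshape(10, 2)
labels = torch.zeros(10, 1)
full = TensorDataset(features, labels)

# Same 80/20 size computation as XORDataModule.setup()
train_size = int(len(full) * 0.8)
valid_size = len(full) - train_size

# Two splits with identically seeded generators give identical subsets
split_a = random_split(full, [train_size, valid_size], generator=torch.Generator().manual_seed(0))
split_b = random_split(full, [train_size, valid_size], generator=torch.Generator().manual_seed(0))

train_set, valid_set = split_a
print(len(train_set), len(valid_set))  # 8 2

# Subset.indices records which rows ended up in each part; together they cover every row once
print(sorted(list(train_set.indices) + list(valid_set.indices)) == list(range(10)))  # True
```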

## Building the model

We increase the complexity of our model by adding an additional linear layer to the **model architecture**.   
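Why does one extra layer help? A single linear layer followed by a sigmoid (logistic regression) can only draw a straight decision boundary, and no straight line separates XOR. The sketch below hand-sets the weights of a tiny 2-2-1 ReLU network so that it reproduces XOR exactly on the four corner points. These weights are chosen for illustration and are not learned; the notebook's model uses 6 hidden neurons and learns its weights.

```python
import torch
from torch import nn

# Hypothetical 2-2-1 network with hand-picked weights (not trained)
hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)
with torch.no_grad():
    # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
    hidden.weight.copy_(torch.tensor([[1.0, 1.0], [1.0, 1.0]]))
    hidden.bias.copy_(torch.tensor([0.0, -1.0]))
    # out = h1 - 2 * h2
    output.weight.copy_(torch.tensor([[1.0, -2.0]]))
    output.bias.zero_()

X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
with torch.no_grad():
    out = output(torch.relu(hidden(X))).squeeze(1)
print(out)  # tensor([0., 1., 1., 0.])
```

The non-linearity is essential: without the ReLU, the two linear layers would collapse into a single linear map and the network could again only draw a straight boundary.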

In [130]:
!pip install pytorch-lightning



In [131]:
import pytorch_lightning as pl


class NeuralNetwork(pl.LightningModule):
    def __init__(self, input_dim, output_dim):
        super().__init__()

        num_neurons_layer_2 = 6

        self.layer_1 = nn.Linear(in_features=input_dim, out_features=num_neurons_layer_2)
        self.layer_2 = nn.Linear(in_features=num_neurons_layer_2, out_features=output_dim)
        
        self.sigmoid = nn.Sigmoid()
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer_1(x))
        x = self.layer_2(x)
        x = self.sigmoid(x)
        return x

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        # gamma defaults to 0.1: the learning rate is divided by 10 after every epoch
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)  # probabilities, since forward() ends with a sigmoid
        loss = nn.functional.binary_cross_entropy(y_hat, y)
        self.log("train_loss", loss)
        print("train_loss = %f" % loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        val_loss = nn.functional.binary_cross_entropy(y_hat, y)
        self.log("val_loss", val_loss)
        print("val_loss = %f" % val_loss)
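`configure_optimizers()` pairs Adam with a `StepLR` scheduler. Because `gamma` is not passed, it keeps its default of 0.1, so with `step_size=1` the learning rate shrinks tenfold after every epoch (Lightning steps such schedulers once per epoch by default). With `max_epochs=1` in the training cell this has no effect, but the sketch below makes the schedule visible outside Lightning. The single dummy parameter is an assumption for illustration.

```python
import torch

# A single dummy parameter stands in for the model's parameters
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)  # gamma defaults to 0.1

lrs = []
for epoch in range(3):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # no gradients here, so Adam leaves the parameter unchanged
    scheduler.step()   # called once per epoch, after optimizer.step()

print(lrs)  # approximately [1e-3, 1e-4, 1e-5]
```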

### `__init__()`

Our neural network has two linear layers. The first layer, `layer_1`, takes `input_dim` input features (the number of features in our dataset), which form the **input layer**, and produces `num_neurons_layer_2` output features, which form the **hidden layer**. These hidden features are typically called **hidden neurons**.

The second layer, `layer_2`, takes the `num_neurons_layer_2` hidden neurons as input and produces `output_dim` output features (1 for binary classification), which form the **output layer**.
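The layer sizes fix the number of trainable parameters: an `nn.Linear(in_features, out_features)` layer stores an `out_features x in_features` weight matrix plus `out_features` biases. For `input_dim = 2`, `num_neurons_layer_2 = 6` and `output_dim = 1` that gives 2·6 + 6 = 18 parameters for `layer_1` and 6·1 + 1 = 7 for `layer_2`, 25 in total, the same numbers the Lightning model summary reports further down. A quick check with standalone layers of the same sizes (the helper `linear_param_count` is hypothetical):

```python
from torch import nn


def linear_param_count(in_features: int, out_features: int) -> int:
    # weight matrix (out_features x in_features) plus one bias per output
    return in_features * out_features + out_features


layer_1 = nn.Linear(in_features=2, out_features=6)
layer_2 = nn.Linear(in_features=6, out_features=1)

for layer in (layer_1, layer_2):
    counted = sum(p.numel() for p in layer.parameters())
    print(tuple(layer.weight.shape), counted, linear_param_count(layer.in_features, layer.out_features))
# (6, 2) 18 18
# (1, 6) 7 7
```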

An example of a similar architecture with `num_neurons_layer_2 = 6` hidden neurons can be explored [here](https://playground.tensorflow.org/#activation=sigmoid&batchSize=30&dataset=xor&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=6&seed=0.86658&showTestData=false&discretize=false&percTrainData=70&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false) (note that this playground configuration uses sigmoid activations in the hidden layer instead of ReLU).

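We use the Rectified Linear Unit (ReLU) activation function in the hidden layer. In the output layer we use the sigmoid function, which `forward()` applies explicitly, so the model outputs probabilities. The training and validation steps therefore use `binary_cross_entropy` rather than `BCEWithLogitsLoss`, which expects raw logits and applies the sigmoid itself (see the notebook about logistic regression). For the same reason, the model's predictions must not be passed through `torch.sigmoid` a second time.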
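Because `forward()` ends with a sigmoid, the model outputs probabilities and the matching loss is `binary_cross_entropy`; `binary_cross_entropy_with_logits` (the functional form of `BCEWithLogitsLoss`) expects the raw output of `layer_2` instead. The sketch below, on a few made-up logits and labels, checks that both routes give the same loss, and shows what goes wrong when the sigmoid is applied twice: every probability is pushed above 0.5, so rounding labels every sample as class 1.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[-2.0], [0.5], [3.0], [-0.3]])  # made-up outputs of layer_2
y = torch.tensor([[0.0], [1.0], [1.0], [1.0]])          # made-up labels

probs = torch.sigmoid(logits)  # what NeuralNetwork.forward() returns

loss_from_probs = F.binary_cross_entropy(probs, y)
loss_from_logits = F.binary_cross_entropy_with_logits(logits, y)
print(torch.allclose(loss_from_probs, loss_from_logits))  # True

# Correct labels: round the probabilities once
print(torch.round(probs).squeeze(1))  # tensor([0., 1., 1., 0.])

# Bug: a second sigmoid maps every probability above 0.5
print(torch.round(torch.sigmoid(probs)).squeeze(1))  # tensor([1., 1., 1., 1.])
```

The logits route is numerically more stable for extreme values, which is why `BCEWithLogitsLoss` is often preferred; the design here simply keeps the sigmoid inside the model.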

Next, we create an instance of the class `NeuralNetwork`.

In [132]:
# Two inputs x_1 and x_2
input_dim = 2  
# Single binary output 
output_dim = 1 

# Create an instance of the model (this is a subclass of nn.Module that contains nn.Parameter(s))
model = NeuralNetwork(input_dim, output_dim)

model.state_dict()

OrderedDict([('layer_1.weight', tensor([[ 0.2278, -0.6223],
                      [ 0.0246, -0.4814],
                      [ 0.3607,  0.4794],
                      [-0.6118, -0.0522],
                      [-0.4982, -0.1984],
                      [ 0.6120, -0.6830]])),
             ('layer_1.bias',
              tensor([ 0.6697, -0.1261,  0.5120,  0.5373,  0.2219,  0.4458])),
             ('layer_2.weight',
              tensor([[-0.0155,  0.1950, -0.3828,  0.1673,  0.1930, -0.3202]])),
             ('layer_2.bias', tensor([-0.2894]))])

In [133]:
print(model.layer_1.weight.dtype)

torch.float32


Let's plot the decision boundary of this initial neural network.

In [134]:
#plot_decision_boundary(model, X_train, y_train)
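`plot_decision_boundary` comes from the downloaded `utils.py`, whose implementation is not shown here. A common way to draw such a plot, and only an assumption about what that helper does, is to evaluate the model on a dense grid over the feature space and color each grid point by its predicted class. The hypothetical helper `grid_predictions` below computes that grid for a small stand-in model (not the notebook's `NeuralNetwork`); its output could be passed to matplotlib's `plt.contourf(xx, yy, probs)`.

```python
import torch
from torch import nn


def grid_predictions(model: nn.Module, X: torch.Tensor, steps: int = 100, margin: float = 0.5):
    """Hypothetical helper: evaluate `model` on a steps x steps grid covering X."""
    x_min, x_max = X[:, 0].min().item() - margin, X[:, 0].max().item() + margin
    y_min, y_max = X[:, 1].min().item() - margin, X[:, 1].max().item() + margin
    xx, yy = torch.meshgrid(
        torch.linspace(x_min, x_max, steps),
        torch.linspace(y_min, y_max, steps),
        indexing="xy",
    )
    grid = torch.stack([xx.ravel(), yy.ravel()], dim=1)  # (steps * steps, 2)
    model.eval()
    with torch.no_grad():
        probs = model(grid).reshape(xx.shape)  # assumes the model outputs probabilities
    return xx, yy, probs


# Stand-in model with the same shape as NeuralNetwork (2 -> 6 -> 1, sigmoid output)
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(2, 6), nn.ReLU(), nn.Linear(6, 1), nn.Sigmoid())
X_demo = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

xx, yy, probs = grid_predictions(demo_model, X_demo, steps=50)
labels = (probs > 0.5).float()  # predicted class per grid point
print(xx.shape, probs.shape)  # torch.Size([50, 50]) torch.Size([50, 50])
```

The decision boundary is the contour where `probs` crosses 0.5.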

In [135]:
import pandas as pd
trainer = pl.Trainer(limit_train_batches=100, max_epochs=1)
xor = XORDataModule()
trainer.fit(model,xor)
#trainer.validate(model,xor)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.callbacks.model_summary:
  | Name    | Type    | Params
------------------------------------
0 | layer_1 | Linear  | 18    
1 | layer_2 | Linear  | 7     
2 | sigmoid | Sigmoid | 0     
3 | relu    | ReLU    | 0     
------------------------------------
25        Trainable params
0         Non-trainable params
25        Total params
0.000     Total estimated model params size (MB)




val_loss = 0.672341
val_loss = 0.696593


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=1` reached.


val_loss = 0.665179
val_loss = 0.685039
val_loss = 0.700893
val_loss = 0.717619
val_loss = 0.784055
val_loss = 0.701019
val_loss = 0.817705


### `forward()`

The `forward()` method applies the neural network to the provided feature vectors. The data is first passed through `layer_1`, then through the ReLU activation, then through `layer_2`, and finally through the sigmoid, which turns the output into a probability between 0 and 1.
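Written out with explicit matrix products, `forward()` computes `sigmoid(relu(x @ W1.T + b1) @ W2.T + b2)`, where `W1, b1` and `W2, b2` are the weights and biases of `layer_1` and `layer_2`. The sketch below checks this formula against the layer-by-layer computation on freshly initialized stand-in layers with the same shapes; the random input batch is made up.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer_1 = nn.Linear(2, 6)
layer_2 = nn.Linear(6, 1)
x = torch.randn(5, 2)  # made-up batch of 5 feature vectors

with torch.no_grad():
    # Layer by layer, as in NeuralNetwork.forward()
    y_layers = torch.sigmoid(layer_2(torch.relu(layer_1(x))))

    # The same computation by hand (nn.Linear computes x @ W.T + b)
    W1, b1 = layer_1.weight, layer_1.bias
    W2, b2 = layer_2.weight, layer_2.bias
    hidden = torch.clamp(x @ W1.T + b1, min=0)  # ReLU
    y_manual = torch.sigmoid(hidden @ W2.T + b2)

print(y_layers.shape)                      # torch.Size([5, 1])
print(torch.allclose(y_layers, y_manual))  # True
```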

In [136]:
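from sklearn.metrics import accuracy_score

# Recover tensors from the train/validation split made by XORDataModule.setup();
# there is no separate test set, so the validation set serves as the test set
X_train = xor.data.dataset[xor.train_set.indices]
y_train = xor.data.labels[xor.train_set.indices].squeeze(1)
X_val = xor.data.dataset[xor.valid_set.indices]
y_val = xor.data.labels[xor.valid_set.indices].squeeze(1)
X_test, y_test = X_val, y_val

model.eval()
with torch.inference_mode():
    predictions = model(X_test)

# forward() already applies the sigmoid, so we only round the probabilities
predictions = torch.squeeze(torch.round(predictions)).detach().numpy()

print("test set accuracy: {}".format(accuracy_score(y_test, predictions)))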


## Training the model

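We use `BCELoss` as the loss function and SGD, `torch.optim.SGD(params, lr)`, as the optimizer. `BCELoss` is the right choice here because `forward()` already applies the sigmoid, so the model outputs probabilities; `BCEWithLogitsLoss` would apply a second sigmoid.

In [None]:
learning_rate = 0.005

# the loss function (expects probabilities)
loss_func = torch.nn.BCELoss()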

#the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
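With its defaults (no momentum, no weight decay), `torch.optim.SGD` updates every parameter as `p = p - lr * p.grad`. The sketch below verifies this rule on a single made-up parameter vector and made-up data, using the same `learning_rate = 0.005` and `BCELoss` on sigmoid outputs.

```python
import torch

learning_rate = 0.005
w = torch.nn.Parameter(torch.tensor([0.5, -1.0]))
x = torch.tensor([[1.0, 2.0], [3.0, -1.0]])  # made-up inputs
y = torch.tensor([1.0, 0.0])                 # made-up labels

optimizer = torch.optim.SGD([w], lr=learning_rate)
loss_func = torch.nn.BCELoss()

loss = loss_func(torch.sigmoid(x @ w), y)
optimizer.zero_grad()
loss.backward()

expected = (w - learning_rate * w.grad).detach()  # manual SGD update
optimizer.step()

print(torch.allclose(w.detach(), expected))  # True
```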

Now we can create and run our training and validation loop.



In [None]:
import matplotlib.pyplot as plt

# number of times we iterate through the training set
num_epochs = 8000

for epoch in range(num_epochs):

    # step 1: forward pass (the model outputs probabilities)
    predictions_train = torch.squeeze(model(X_train))

    # step 2: compute the loss
    loss = loss_func(predictions_train, y_train)

    # step 3: reset the gradients of the previous iteration
    optimizer.zero_grad()

    # step 4: backpropagate the loss
    loss.backward()

    # step 5: update the parameters
    optimizer.step()

    if epoch % 500 == 0:
        print("training loss: {}".format(loss.item()))
        model.eval()
        with torch.inference_mode():
            # forward() already applies the sigmoid, so we only round
            predictions_val = torch.squeeze(torch.round(model(X_val))).detach().numpy()
            print("validation accuracy: {}".format(accuracy_score(y_val, predictions_val)))
        model.train()
        plot_decision_boundary(model, X_train, y_train)
        plt.show()
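The five steps in the loop above form a generic full-batch training loop. As a self-contained sketch, the same steps below train a stand-in 2-6-1 network on synthetic XOR-like data (uniform points in [-1, 1] x [-1, 1], labeled 1 when the two coordinates have opposite signs). This data, the seed, the learning rate of 0.1 and the 300 epochs are assumptions, not the notebook's dataset or settings; the sketch only checks that the training loss goes down.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic XOR-like data: label 1 when x1 and x2 have opposite signs
X = torch.rand(200, 2) * 2 - 1
y = (X[:, 0] * X[:, 1] < 0).float()

model = nn.Sequential(nn.Linear(2, 6), nn.ReLU(), nn.Linear(6, 1), nn.Sigmoid())
loss_func = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(300):
    predictions = torch.squeeze(model(X), dim=1)  # step 1: forward pass
    loss = loss_func(predictions, y)              # step 2: loss
    optimizer.zero_grad()                         # step 3: reset gradients
    loss.backward()                               # step 4: backpropagation
    optimizer.step()                              # step 5: parameter update
    losses.append(loss.item())

print("first loss: {:.4f}, last loss: {:.4f}".format(losses[0], losses[-1]))
```

Squeezing with `dim=1` keeps the prediction shape equal to the label shape `(200,)`, which `BCELoss` requires.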


## Computing predictions and evaluating the model


In [None]:
model.eval()

with torch.inference_mode(): 
    predictions_test = model(X_test)

# forward() already applies the sigmoid, so we only round the probabilities
predictions_test = torch.round(torch.squeeze(predictions_test)).detach().numpy()

print("test set accuracy: {}".format(accuracy_score(y_test,predictions_test)))
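Turning probabilities into class labels amounts to thresholding at 0.5, and accuracy is the fraction of predicted labels that match the true ones. The sketch below spells this out in plain PyTorch on made-up probabilities and labels; it computes the same quantity as `accuracy_score`.

```python
import torch

probs = torch.tensor([0.91, 0.12, 0.50, 0.67, 0.30])  # made-up model outputs
y_true = torch.tensor([1.0, 0.0, 1.0, 1.0, 1.0])       # made-up labels

# torch.round sends values above 0.5 to 1 and exactly 0.5 to 0 (round half to even),
# so it agrees with the threshold "probs > 0.5"
predicted = torch.round(probs)
assert torch.equal(predicted, (probs > 0.5).float())

accuracy = (predicted == y_true).sum().item() / len(y_true)
print(accuracy)  # 0.6
```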

In [None]:
plot_decision_boundary(model, X_test, y_test)