<a href="https://colab.research.google.com/github/luimui/KI-2/blob/main/03-neural-networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural networks

In this exercise, we will look at two very simple examples of neural networks. If you are interested in neural networks beyond that, I recommend the course "AI 3: Artificial Neural Networks" in the summer semester.

If you are interested in the theory and mathematical details, I highly recommend the following book: https://www.deeplearningbook.org/ (by Ian Goodfellow, Yoshua Bengio, and Aaron Courville; Bengio, together with Geoffrey Hinton and Yann LeCun, received the 2018 Turing Award for work on deep learning).

*Task 1*: Explain the core idea and structure of neurons and neural networks!
* Slides Intermediate defense
* see MA

## Neural networks for regression

In this exercise we use `PyTorch` for training neural networks. You can find more information about this package here: https://pytorch.org

However, the examples we look at in the exercise should (hopefully) be fairly self-explanatory. In the example below, we describe the network structure in PyTorch. This network consists of a single linear layer with one node, and its activation function is the identity. We also specify the mean squared error as the loss function and Adam as the optimizer.
   
To define a neural network, we now need to write a class that inherits from the `torch.nn.Module` class. We have to provide an `__init__()` and a `forward()` method. In the `__init__()` method, we define the structure of our neural network. The `forward()` method defines how the so-called forward pass is calculated for our neural network (i.e. how the NN maps the input to the output).

*Task*: Get familiar with the code below, as it serves as a basis for the upcoming tasks!

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim

class LinearModel(nn.Module):
    """Class for our linear model. Here we override some of the generic functionality of PyTorch's nn.Module class.
    """
    def __init__(self):
        """This is basically how we define our model architecture.
        This function will be automatically called when we initialize an instance of our LinearModel class.
        """
        # execute initialization code for nn.Module()
        super(LinearModel, self).__init__()
        # Building block for our network architecture -> a linear layer
        self.linear = nn.Linear(1,          # number of inputs into the layer
                                1,          # number of units in the layer
                                bias=True)  # nn.Linear() has a bias term by default

        # Activation functions we want to use for our network
        self.identity = nn.Identity()       # Identity function

    def forward(self, x: torch.Tensor)-> torch.Tensor:
        """With this method we define how the forward pass of the model is computed.

        Args:
            x (torch.Tensor): Input to our network.

        Returns:
            torch.Tensor: Output of the network.
        """
        x = self.linear(x)      # feed input to linear layer
        x = self.identity(x)    # compute identity on the output of our linear layer

        return x

# initialize the model
model = LinearModel()

# define the loss function and optimizer for our network
criterion = nn.MSELoss()  # Mean Squared Error Loss
optimizer = optim.Adam(model.parameters(), lr=0.1) # Adam optimizer


To train and evaluate our model, we also need to define the corresponding functions.

*Task*: Get familiar with the code below, as it serves as a basis for the upcoming tasks!

In [2]:
from typing import Callable

def train_linear_model(model: nn.Module, X: torch.Tensor, y: torch.Tensor, criterion: Callable, optimizer: optim.Optimizer) -> None:
    """This is the function with which we can train our model.

    Args:
        model (nn.Module): The model we want to train.
        X (torch.Tensor): The input we want to use to train our model.
        y (torch.Tensor): The targets which belong to the inputs in X.
        criterion (Callable): Our loss function.
        optimizer (optim.Optimizer): The optimizer we want to use to train our model.
    """
    # compute predictions for all inputs
    y_pred = model(X)
    # compute average loss over all predictions
    loss = criterion(y_pred, y)
    # reset the gradients stored from the previous step
    optimizer.zero_grad()
    # compute the gradients
    loss.backward()
    # update the network parameters using the computed gradients
    optimizer.step()

def evaluate_linear_model(model: nn.Module, X: torch.Tensor, y: torch.Tensor, criterion: Callable) -> float:
    """This is the function with which we can evaluate our model.

    Args:
        model (nn.Module): The model we want to evaluate.
        X (torch.Tensor): The input we want to use to evaluate our model.
        y (torch.Tensor): The targets which belong to the inputs in X.
        criterion (Callable): Our loss function.

    Returns:
        float: The average loss for the test dataset.
    """
    # disable gradient tracking during evaluation -> better performance
    with torch.no_grad():
        # compute predictions for all inputs
        y_pred = model(X)
        # compute average loss over all predictions
        loss = criterion(y_pred, y).item()
        return loss
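
To make the three optimizer calls in `train_linear_model` more tangible, here is a minimal sketch of one training step on a toy problem that can be checked by hand. It is an illustration under assumptions, not part of the exercise: the names `X_toy`, `y_toy`, `toy_model`, `toy_criterion` and `toy_optimizer` are made up, the parameters are set to zero on purpose, and plain SGD replaces Adam so that the update is simply `parameter - lr * gradient`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny data set that can be checked by hand: y = 2 * x
X_toy = torch.tensor([[1.0], [2.0]])
y_toy = torch.tensor([[2.0], [4.0]])

# One linear unit, with weight and bias set to zero for easy arithmetic
toy_model = nn.Linear(1, 1)
with torch.no_grad():
    toy_model.weight.fill_(0.0)
    toy_model.bias.fill_(0.0)

toy_criterion = nn.MSELoss()
# Assumption: plain SGD instead of Adam, so the update rule is easy to verify
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

# Forward pass: both predictions are 0, so the loss is ((0-2)^2 + (0-4)^2) / 2 = 10
toy_loss = toy_criterion(toy_model(X_toy), y_toy)

toy_optimizer.zero_grad()   # clear gradients from any earlier step
toy_loss.backward()         # fill .grad of every parameter
grad_w = toy_model.weight.grad.item()  # mean(2 * (pred - y) * x) = -10
grad_b = toy_model.bias.grad.item()    # mean(2 * (pred - y))     = -6
toy_optimizer.step()        # parameter <- parameter - lr * gradient

print(toy_loss.item(), grad_w, grad_b)
print(toy_model.weight.item(), toy_model.bias.item())  # about 1.0 and 0.6
```

Adam rescales each gradient before the update, so the numbers after `step()` would differ with the optimizer used in the notebook; the order of calls stays the same.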



Now we have everything we need to train our first neural network. In the example below, we repeat the training over 5 iterations, training 25 epochs in each iteration (this means with each iteration we go through the complete training dataset 25 times).    

*Task*: Execute the training.

In [6]:
from sklearn.model_selection import train_test_split
from tqdm import tqdm
import pandas as pd

from google.colab import drive
drive.mount('/content/drive')

# load the iris dataset into a dataframe
df = pd.read_csv('/content/drive/MyDrive/KI2WS202324/iris.csv')


# get features X and targets y and convert them into tensors
X = torch.tensor(df.SepalWidth.values, dtype=torch.float32).view(-1, 1)
y = torch.tensor(df.PetalLength.values, dtype=torch.float32).view(-1, 1)
# split X and y into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)

# set the number of epochs and iterations
epochs = 25
iterations = 5
num_epochs = 0

# execute the training and evaluation
for i in range(iterations):
    with tqdm(total=epochs, desc=f'Iteration {i+1}/{iterations}', unit='epoch') as pbar:
        # training of the neural network
        for epoch in range(epochs):
            train_linear_model(model, X_train, y_train, criterion, optimizer)
            pbar.update(1)
        # evaluation of the neural network
        test_loss = evaluate_linear_model(model, X_test, y_test, criterion)
        num_epochs = num_epochs+epochs
        print(f'\nLoss after {num_epochs} epochs: {test_loss}')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Iteration 1/5: 100%|██████████| 25/25 [00:00<00:00, 1204.22epoch/s]



Loss after 25 epochs: 2.7031302452087402


Iteration 2/5: 100%|██████████| 25/25 [00:00<00:00, 1075.73epoch/s]



Loss after 50 epochs: 2.7176971435546875


Iteration 3/5: 100%|██████████| 25/25 [00:00<00:00, 1234.46epoch/s]



Loss after 75 epochs: 2.7248427867889404


Iteration 4/5: 100%|██████████| 25/25 [00:00<00:00, 957.15epoch/s]



Loss after 100 epochs: 2.7327001094818115


Iteration 5/5: 100%|██████████| 25/25 [00:00<00:00, 1179.12epoch/s]


Loss after 125 epochs: 2.738478183746338





In [None]:
X;

*Task*: What does our input $X$ look like?
* $X$ is a tensor (similar to an `np.array`) of floats between roughly 2 and 4, shaped as an $n \times 1$ matrix.

*Task*: What kind of function does this network represent?   
* $\hat{y} = \phi(\beta_0 + \beta_1 x_1)$, where $\phi$ is the identity. This is a linear function: the input is an intercept plus one feature in the design matrix, and the activation function is just the identity (instead of, for example, a sigmoid function that squashes the output into $(0, 1)$).

*Task*: What happens if the input is of the form $x = [x_1, x_2, ..., x_n]^T$? What does our weight matrix $W$ look like? What is the equation for $\hat{y}$?
* $W$ is a $1 \times n$ row vector.
* $\hat{y} = \phi(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)$

*Task*: What happens if we additionally extend our layer to k units/neurons? What does our weight matrix $W$ look like? What is the equation for $\hat{y}$?
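* $W$ is a $k \times n$ matrix (one row per unit).
* $\mathbf{\hat{y}} = \phi(W\mathbf{x} + \mathbf{b}) = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_k]^T$, with $\phi$ applied element-wise.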

*Exam-relevant.*

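
The shape of the weight matrix for a layer with $n$ inputs and $k$ units can be checked directly in PyTorch. The following is a minimal sketch; the sizes `n = 4` and `k = 3` are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, k = 4, 3  # hypothetical sizes: n input features, k units
layer = nn.Linear(n, k)

print(layer.weight.shape)  # torch.Size([3, 4]), i.e. k x n
print(layer.bias.shape)    # torch.Size([3]), one bias per unit

# The layer output equals W x + b computed by hand
x = torch.randn(n)
y_hat = layer(x)
manual = layer.weight @ x + layer.bias
print(y_hat.shape)         # torch.Size([3]), one output per unit
print(torch.allclose(y_hat, manual, atol=1e-6))
```
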
*Task*: Display the parameters of the model.


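Design matrix with an intercept column for $m$ samples and $n$ features (superscripts index the samples):

$$
\begin{bmatrix}
1 & x_1^1 & \dots & x_n^1\\
1 & x_1^2 & \dots & x_n^2\\
\vdots & \vdots & \ddots & \vdots\\
1 & x_1^m & \dots & x_n^m
\end{bmatrix}
$$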
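
The connection between this design matrix and a linear layer can be sketched as follows: multiplying the design matrix by the vector $[\beta_0, \beta_1, \dots, \beta_n]^T$ gives the same result as the layer, with the bias playing the role of the intercept $\beta_0$. The sizes `m = 5` and `n = 3` and all variable names are assumptions made for this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, n = 5, 3  # hypothetical: m samples, n features
X_feat = torch.randn(m, n)
layer = nn.Linear(n, 1)

# Design matrix: a leading column of ones, followed by the n feature columns
X_design = torch.cat([torch.ones(m, 1), X_feat], dim=1)  # shape (m, n + 1)

# beta = [beta_0, beta_1, ..., beta_n]^T, built from the layer's bias and weights
beta = torch.cat([layer.bias.detach(), layer.weight.detach().flatten()]).view(-1, 1)

y_design = X_design @ beta         # shape (m, 1)
y_layer = layer(X_feat).detach()   # shape (m, 1)
print(torch.allclose(y_design, y_layer, atol=1e-6))
```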

In [7]:
for name, param in model.named_parameters():
  print(f'{name}: {param.data}')

linear.weight: tensor([[-1.8722]])
linear.bias: tensor([9.4385])


*Task*: Compare the results with the linear regression from exercise 2. Implement another neural network with 3 layers, where the first layer should have 10 neurons, the second 5 neurons and the third 1 neuron. In addition, use the `ReLU()` (rectified linear unit) function as the activation function of the first layer.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim

class MLP(nn.Module):
    """Class for our multilayer perceptron (MLP). Here we override some of the generic functionality of PyTorch's nn.Module class.
    """
    def __init__(self):
        """This is basically how we define our model architecture.
        This function will be automatically called when we initialize an instance of our MLP class.
        """
        # execute initialization code for nn.Module()
        super(MLP, self).__init__()
        # building blocks for our network architecture: three linear layers
        self.linear1 = nn.Linear(1,           # number of inputs into the layer
                                 10,          # number of units in the layer
                                 bias=True)   # nn.Linear() has a bias term by default

        self.linear2 = nn.Linear(10,          # number of inputs into the layer
                                 5,           # number of units in the layer
                                 bias=True)   # nn.Linear() has a bias term by default

        self.linear3 = nn.Linear(5,           # number of inputs into the layer
                                 1,           # number of units in the layer
                                 bias=True)   # nn.Linear() has a bias term by default

        # activation functions for our network
        self.relu = nn.ReLU()                 # ReLU function
        self.identity = nn.Identity()         # Identity function


    def forward(self, x: torch.Tensor)-> torch.Tensor:
        """With this method we define how the forward pass of the model is computed.

        Args:
            x (torch.Tensor): Input to our network.

        Returns:
            torch.Tensor: Output of the network.
        """
        x = self.linear1(x)     # feed input x to our first linear layer
        x = self.relu(x)        # compute ReLU on the output of linear layer 1
        x = self.linear2(x)     # feed x to our second linear layer
        x = self.relu(x)        # compute ReLU on the output of linear layer 2
        x = self.linear3(x)     # feed x to our third linear layer
        x = self.identity(x)    # compute identity on the output of linear layer 3
        return x

# initialize the model
model = MLP()

# define the loss function and optimizer for our network
criterion = nn.MSELoss()  # Mean Squared Error Loss
optimizer = optim.Adam(model.parameters(), lr=0.001) # Adam optimizer


Now train your new model! Display the parameters of the model and interpret them! What are the dimensions of the individual parameters of your network?

In [16]:
from sklearn.model_selection import train_test_split
from tqdm import tqdm
X = torch.tensor(df.SepalWidth.values, dtype=torch.float32).view(-1, 1)
y = torch.tensor(df.PetalLength.values, dtype=torch.float32).view(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2)

iterations = 5
epochs = 25
num_epochs = 0

for i in range(iterations):
    with tqdm(total=epochs, desc=f'Iteration {i+1}/{iterations}', unit='epoch') as pbar:
        for epoch in range(epochs):
            train_linear_model(model, X_train, y_train, criterion, optimizer)
            test_loss = evaluate_linear_model(model, X_test, y_test, criterion)
            pbar.update(1)
        num_epochs = num_epochs+epochs
        print(f'\nLoss after {num_epochs} epochs: {test_loss}')

Iteration 1/5: 100%|██████████| 25/25 [00:00<00:00, 199.83epoch/s]



Loss after 25 epochs: 4.141355514526367


Iteration 2/5: 100%|██████████| 25/25 [00:00<00:00, 194.40epoch/s]



Loss after 50 epochs: 4.152929306030273


Iteration 3/5: 100%|██████████| 25/25 [00:00<00:00, 128.44epoch/s]



Loss after 75 epochs: 4.152390003204346


Iteration 4/5: 100%|██████████| 25/25 [00:00<00:00, 297.06epoch/s]



Loss after 100 epochs: 4.148686408996582


Iteration 5/5: 100%|██████████| 25/25 [00:00<00:00, 430.95epoch/s]


Loss after 125 epochs: 4.144994735717773





In [17]:
for name, param in model.named_parameters():
  print(f'{name}: {param.data}')

linear1.weight: tensor([[ 0.9242],
        [-0.7345],
        [ 0.8187],
        [ 0.1533],
        [-0.0849],
        [ 0.1012],
        [ 0.6942],
        [-0.9916],
        [-0.8407],
        [ 1.1007]])
linear1.bias: tensor([ 0.2010, -0.3640,  0.5897,  0.8945, -0.8068, -0.8127,  0.6193, -0.9725,
        -0.5255,  0.0542])
linear2.weight: tensor([[-0.0091, -0.0985,  0.1333, -0.3442,  0.2090,  0.0814, -0.0436,  0.1068,
          0.1633,  0.0403],
        [ 0.6790, -0.0046,  0.5983,  0.3062, -0.1705, -0.0454,  0.4906, -0.2882,
          0.2402,  0.5087],
        [ 0.0695, -0.2384, -0.2948, -0.0587,  0.0665, -0.1551, -0.2556, -0.1682,
          0.0724,  0.0822],
        [ 0.2578, -0.2512,  0.2247, -0.1631,  0.2784, -0.1273, -0.2225,  0.2579,
         -0.0564, -0.2591],
        [-0.2330, -0.2989,  0.2117, -0.0733, -0.1431, -0.0815, -0.1492,  0.0847,
         -0.0137,  0.1147]])
linear2.bias: tensor([-0.3402,  0.7645, -0.2807, -0.0897, -0.1791])
linear3.weight: tensor([[-0.1550,  0.4306,
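
Since the printed parameters above are cut off, the dimensions asked for in the task can also be read off without looking at the values. The sketch below rebuilds the same layer sizes (1, 10, 5, 1) with `nn.Sequential`; this is an assumption made for brevity, and the name `sketch` is hypothetical. It is not the `MLP` class from the notebook, so parameter names differ (`0.weight` instead of `linear1.weight`).

```python
import torch.nn as nn

# Same layer sizes as the MLP above, written compactly
sketch = nn.Sequential(
    nn.Linear(1, 10), nn.ReLU(),
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 1),
)

# Each weight matrix has shape (units, inputs); each bias has one entry per unit
shapes = {name: tuple(p.shape) for name, p in sketch.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# Total number of trainable parameters: (10 + 10) + (50 + 5) + (5 + 1) = 81
total = sum(p.numel() for p in sketch.parameters())
print(total)
```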

## Neural networks for classification
Next, let's look at a much more complicated data set: MNIST contains 28x28-pixel images of handwritten digits. The task is to develop a classifier for these digits.

First, we load the training and test data sets. Both data sets are then loaded into so-called data loaders.    

*Task*: What is a DataLoader used for? Find out for yourself how to load the data into the `DataLoader()` class and set the batch size of `train_loader` to 32.

In [None]:
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

train_data = datasets.MNIST(root=r'data', train=True, download=True,
                                               transform=transform)
test_data = datasets.MNIST(root=r'data', train=False, download=True,
                                               transform=transform)

# initialize dataLoader for train set
train_loader = 'TODO'

# initialize dataLoader for test set
test_loader = 'TODO'
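
A `DataLoader` wraps a dataset and yields it in mini-batches, optionally shuffling the order in every pass. The following is a minimal sketch on stand-in data with MNIST-like shapes, so it runs without a download. The names `fake_images`, `fake_labels`, `fake_data` and `sketch_loader` are made up, and `shuffle=True` is an assumption that is common for training data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes (100 "images", labels 0..9); not the real MNIST set
fake_images = torch.randn(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_data = TensorDataset(fake_images, fake_labels)

# Batches of 32 samples, reshuffled at the start of every pass over the data
sketch_loader = DataLoader(fake_data, batch_size=32, shuffle=True)

print(len(sketch_loader))  # number of batches: ceil(100 / 32) = 4
X_batch, y_batch = next(iter(sketch_loader))
print(X_batch.shape, y_batch.shape)
```

The same constructor call should presumably work with `train_data` and `test_data` in place of `fake_data`.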


### Attention!
The following tasks require research on your part. Find out for yourself how you can solve the individual tasks!

*Task a)*: Access the data point with the index 42 in the `train_loader`. What can you tell me about the general data structure of a data point in the `train_loader`?

*Task b)*: How do you access the features $X$ and labels/targets $y$ of the individual data points? What data types and dimensions do $X$ and $y$ have? Display the label for index 42! Which data format and which dimensions do the features have?    

*Task c)*: What is the data format of a batch in `train_loader`? What dimensions do the features and labels of a batch have?    
(Hint: Use `next(iter(train_loader))` to access the first batch in `train_loader`).

In [None]:
# Task a):
datapoint_type = 'TODO'                                                             # get type for datapoint at index 42
print(f'The data structure of index 42 is {datapoint_type}.\n')

# Task b):
y_label = 'TODO'                                                                    # get label y
print(f'Index 42 contains the number {y_label}.')

x_type = 'TODO'                                                                     # get type of x
x_dim = 'TODO'                                                                      # get dimensions of x
print(f'The features have the data format {x_type} and the shape {x_dim}.\n')

# Task c):
batch = 'TODO'                                                                      # get first batch
batch_type = 'TODO'                                                                 # get type of batch
print(f'A batch has the data type {batch_type}.')

batch_X_dim = 'TODO'                                                                # get dimensions of X
batch_y_dim = 'TODO'                                                                # get dimensions of y
print(f'The features X of a batch have the dimension {batch_X_dim}.')
print(f'The labels y of a batch have the dimension {batch_y_dim}.')
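
One detail that may help with these tasks: a `DataLoader` can be iterated over but not indexed, so single data points are usually read from the underlying dataset via the loader's `dataset` attribute. The sketch below shows this on stand-in data (an assumption; `fake_data` and `sketch_loader` are hypothetical names). With the real MNIST dataset the label of a single sample is a plain Python `int` rather than a tensor.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes; not the real MNIST set
fake_data = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
sketch_loader = DataLoader(fake_data, batch_size=32)

# A DataLoader is iterable but not indexable; single samples come from its dataset
sample = sketch_loader.dataset[42]
print(type(sample))            # a tuple (features, label)
x42, y42 = sample
print(type(x42), x42.shape)    # a tensor of shape (1, 28, 28)
print(int(y42))                # the label as an integer

# A batch holds two tensors: stacked features and stacked labels
batch = next(iter(sketch_loader))
print(type(batch), len(batch))
print(batch[0].shape, batch[1].shape)
```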

*Task*: Create a very simple network with only one linear layer, which has 10 neurons and uses the identity as activation function. Make sure you specify the correct number of inputs for the linear layer (note: we use an image with 28x28 pixels as input).    

*Task*: Which error function should we choose for the problem at hand?

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

class MNISTNet(nn.Module):
    """Class for our MNIST classifier. Here we override some of the generic functionality of PyTorch's nn.Module class.
    """
    def __init__(self):
        """This is basically how we define our model architecture.
        This function will be automatically called when we initialize an instance of our MNISTNet class.
        """
        # execute initialization code for nn.Module()
        super(MNISTNet, self).__init__()
        # TODO: Building blocks for our network architecture
        self.linear = 'TODO'

        # TODO: define the activation functions for our network
        self.identity = 'TODO'         # Identity function


    def forward(self, x: torch.Tensor)-> torch.Tensor:
        """With this method we define how the forward pass of the model is computed.

        Args:
            x (torch.Tensor): Input to our network.

        Returns:
            torch.Tensor: Output of the network.
        """
        x = 'TODO'    # feed input x to our linear layer
        x = 'TODO'    # compute identity on the output of our linear layer
        return x

# initialize the model
model = MNISTNet()

# define the loss function and optimizer for our network
criterion = 'TODO'
optimizer = optim.Adam(model.parameters(), lr=0.001, eps=1e-07) # Adam optimizer
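
For multi-class classification, cross-entropy is a common choice of loss. Whether it is the intended answer to the task above is an assumption. The sketch below shows that `nn.CrossEntropyLoss` expects raw scores (logits) and integer class labels, and that it applies the log-softmax itself, which fits a network whose last activation is the identity. The tensors `logits` and `labels` are made-up examples.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples and 10 classes
labels = torch.tensor([3, 0, 9, 1])    # integer class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()
loss_builtin = ce(logits, labels)

# Same quantity by hand: mean negative log-softmax probability of the true class
log_probs = torch.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), labels].mean()

print(loss_builtin.item(), loss_manual.item())
```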

As you may have noticed, the dimensions of our features in $X$ (1x28x28) do not match the dimensions of the input (784) of our network that we have just defined. What we can do now is to transform the features of each data point into a tensor with only one dimension.    

*Task*: Find out how you can transform a tensor with the dimensions nx1x28x28 into a tensor with the dimensions nx784, where n stands for the size or the number of elements of a batch. Insert your solution at the appropriate place in the code below!

In [None]:
from typing import Callable, Tuple
from tqdm import tqdm

def train_model(model: nn.Module, dataloader: torch.utils.data.DataLoader, criterion: Callable, optimizer: optim.Optimizer) -> None:
    """This is the function with which we can train our model.

    Args:
        model (nn.Module): The model we want to train.
        dataloader (torch.utils.data.DataLoader): The dataloader which contains all the data.
        criterion (Callable): Our loss function.
        optimizer (optim.Optimizer): The optimizer we want to use to train our model.
    """
    with tqdm(total=len(dataloader), desc='\t Training: ', unit=' batches') as pbar:
        for X, y in dataloader:
            X = 'TODO'
            # compute predictions for all inputs
            y_pred = model(X)
            # compute average loss over all predictions
            loss = criterion(y_pred, y)
            # reset the gradients stored from the previous step
            optimizer.zero_grad()
            # compute the gradients
            loss.backward()
            # update the network parameters using the computed gradients
            optimizer.step()
            pbar.update(1)

def evaluate_model(model: nn.Module, dataloader: torch.utils.data.DataLoader, criterion: Callable) -> Tuple[float, float]:
    """This is the function with which we can evaluate our model.

    Args:
        model (nn.Module): The model we want to evaluate.
        dataloader(torch.utils.data.DataLoader): The dataloader which contains all the data.
        criterion (Callable): Our loss function.

    Returns:
        Tuple[float, float]: Average loss and accuracy on test data.
    """
    with tqdm(total=len(dataloader), desc='\t Test: ', unit=' batches') as pbar:
        # disable gradient tracking during evaluation -> better performance
        with torch.no_grad():
            loss = 0
            correct_predictions = 0
            total_samples = 0
            for X, y in dataloader:
                X = 'TODO'
                # compute predictions for all inputs
                y_pred = model(X)
                # compute accumulated loss over all predictions
                batch_loss = criterion(y_pred, y).item()
                loss += batch_loss
                # compute accumulated accuracy
                _, predicted = torch.max(y_pred, 1)
                correct_predictions += (predicted == y).sum().item()
                total_samples += y.size(0)
                pbar.update(1)
    # compute avg loss and accuracy
    loss = loss / len(dataloader)
    accuracy = correct_predictions / total_samples
    return loss, accuracy
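
The reshaping asked for before the two functions, from n x 1 x 28 x 28 to n x 784, can be sketched in two equivalent ways. The tensor `X_batch` is a stand-in for one batch of images; which variant to put in place of the `'TODO'` lines is left open.

```python
import torch

X_batch = torch.randn(32, 1, 28, 28)   # stand-in for one batch of images

# Variant 1: keep the batch dimension and merge all remaining dimensions
flat_view = X_batch.view(X_batch.size(0), -1)

# Variant 2: flatten everything from dimension 1 onwards
flat_fn = torch.flatten(X_batch, start_dim=1)

print(flat_view.shape)                  # torch.Size([32, 784])
print(torch.equal(flat_view, flat_fn))  # both variants give the same tensor
```

Using `X.size(0)` rather than a fixed 32 matters because the last batch of a data set is often smaller than the others.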

Now train your model. What accuracy do you achieve in the end?

In [None]:
# set the number of epochs
epochs = 1
num_epochs = 0

# execute the training and evaluation
for epoch in range(epochs):
    print(f'Epoch {epoch+1}:')
    train_model(model, train_loader, criterion, optimizer)
    # evaluation of the neural network
    test_loss, test_accuracy = evaluate_model(model, test_loader, criterion)
    num_epochs += 1  # one more epoch completed
    print(f'\t Loss: {test_loss}')
    print(f'\t Accuracy: {test_accuracy}')