<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> PyTorch: Simple Neural Network Example

## <center>  with a Perceptron

<hr style="border:2px solid gray"></hr>

This lecture will parallel the perceptron example written using NumPy, allowing you to compare the approaches directly.

<br>

<center><img src="00_images/31_machine_learning/nn_perceptron_example.png" alt="nn_perceptron" style="width: 1000px;"/></center>

<br>

The neural network (NN) will be written in two ways:
1. Basic - to explicitly show every step of training a neural network
2. Advanced - to show the basics of how most PyTorch code is actually written

<hr style="border:2px solid gray"></hr>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim

Create a helper function that allows us to investigate the different arrays that are used below:

In [2]:
def print_array_specs(in_arrays: dict):
    ''' Helper function for nicely printing NumPy and
        PyTorch arrays.

        Print: shape, data type and values.
    '''
    for key, value in in_arrays.items():
        print(f'{key}:\n{value.shape}, {value.dtype}')
        print(f'{value}\n')

## Basic Example

#### Define the toy data (input values, target values and initial weights):

##### A reminder from the NumPy lecture

A random **seed** will be **explicitly set**, allowing for **reproducible results** (i.e., for teaching purposes). The first epoch data generated below should correspond to the numeric values given in the figure above.

The object naming will also be done to parallel the figure above.

Random Number Generator in NumPy:
- `np.random.default_rng`: https://numpy.org/doc/stable/reference/random/generator.html
- `numpy.random.Generator.normal`: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html

<br>

**Important Note**: Normally with <font color='dodgerblue'>real-world data</font>, one often should <font color='dodgerblue'>normalize</font> (e.g., **rescale** the data to a range [0, 1]) the <font color='dodgerblue'>input data</font>. This helps the mathematics when different input features have **large magnitude differences** (e.g., 1.5 and 2.5e6).
- https://en.wikipedia.org/wiki/Normalization_(statistics)
- `sklearn.preprocessing.normalize`: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

In this example, we don't need to worry about this due to how we generate the toy data.
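As an illustration of the rescaling idea, here is a minimal sketch of min-max normalization in NumPy. The `features` array is hypothetical and is not part of the toy data used in this lecture.

```python
import numpy as np

# Hypothetical data: 3 samples, 2 features with very different magnitudes
features = np.array([[1.5, 2.5e6],
                     [2.0, 1.0e6],
                     [0.5, 4.0e6]])

# Min-max scaling per feature (column): x' = (x - min) / (max - min)
col_min = features.min(axis=0)
col_max = features.max(axis=0)
features_scaled = (features - col_min) / (col_max - col_min)

print(features_scaled)
```

After scaling, every column spans the range [0, 1], so no single feature dominates because of its units.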

In [3]:
rng = np.random.default_rng(seed=12345)

input_X1_np = rng.normal(size=(2, 10))
target_Y2_np = rng.normal(size=(2, 1))

weight_W1_np = rng.normal(size=(10, 3))
weight_W2_np = rng.normal(size=(3, 1))

Examine the different NumPy arrays:
- shapes (important for matrix multiplication)
- data types (need to be same types)
- values

In [None]:
objects_ini = {'input_X1': input_X1_np, 'target_Y2': target_Y2_np,
               'weight_W1': weight_W1_np, 'weight_W2': weight_W2_np}

print_array_specs(in_arrays=objects_ini)

#### Initialize important parameters

**Neural Network Architecture**
- <font color='dodgerblue'>input_size</font>: how many **features** (i.e., nodes) are in the **input layer**; each of the 2 samples (rows) in `input_X1` has 10 features
- <font color='dodgerblue'>hidden_size</font>: how many **nodes** are in the **hidden layer**
- <font color='dodgerblue'>output_size</font>: how many **nodes** are in the **output layer** (i.e., predicted values per sample)

**Training Parameters**
- <font color='dodgerblue'>learning_rate</font>: **step size** for **gradient descent**
- <font color='dodgerblue'>num_epochs</font>: how many **training epochs** to **run** (instead of having a convergence cutoff criteria)

In [None]:
input_size = 10
hidden_size = 3
output_size = 1  # one output node: matches target_Y2 (2, 1) and weight_W2 (3, 1)

learning_rate = 1e-3
num_epochs = 50

### Now Focus on PyTorch

##### Prepare data

- The <font color='dodgerblue'>NumPy-generated input</font> and initial data values need to be <font color='dodgerblue'>converted to torch tensors</font> using **`torch.from_numpy()`**.

- We can also <font color='dodgerblue'>improve upon</font> the original NumPy model by <font color='dodgerblue'>including biases</font>. These will be used in the <font color='dodgerblue'>linear transform</font> (e.g., **`torch.mm(input_X1, weight_W1) + bias_B1`**).

- Care must be given to specify that **`autograd`** should <font color='dodgerblue'>record operations</font> for the <font color='dodgerblue'>weights and biases</font> (i.e., calculation history), using **`requires_grad_(requires_grad=True)`**.
    - Reminder: <font color='dodgerblue'>only the weights and biases</font> need to be <font color='dodgerblue'>updated</font> based on the <font color='dodgerblue'>loss gradient</font>.
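The conversion and gradient-tracking steps can be sketched in isolation. The arrays below are hypothetical placeholders (filled with ones) and are not the lecture's toy data.

```python
import numpy as np
import torch

array_np = np.ones((2, 3))             # NumPy default dtype: float64
tensor = torch.from_numpy(array_np)    # keeps float64 and shares memory with array_np

# Only parameters that will be updated need gradient tracking
weights = torch.from_numpy(np.ones((3, 1))).requires_grad_(requires_grad=True)

array_np[0, 0] = 5.0                   # the change is visible in the tensor as well

print(tensor.dtype, tensor.requires_grad)
print(weights.dtype, weights.requires_grad, weights.is_leaf)
print(tensor[0, 0])
```

Because `torch.from_numpy()` shares memory with the source array, modifying one also modifies the other.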

In [None]:
input_X1 = torch.from_numpy(input_X1_np)
target_Y2 = torch.from_numpy(target_Y2_np)

weight_W1 = torch.from_numpy(weight_W1_np).requires_grad_(requires_grad=True)
weight_W2 = torch.from_numpy(weight_W2_np).requires_grad_(requires_grad=True)

bias_B1 = torch.zeros(hidden_size, requires_grad=True)
bias_B2 = torch.zeros(output_size, requires_grad=True)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2,
               'weight_W1': weight_W1, 'bias_B1': bias_B1,
               'weight_W2': weight_W2, 'bias_B2': bias_B2}

print_array_specs(in_arrays=objects_ini)

#### Model Training

**Multiplying two matrices** (dot product/matrix multiplication):
- `torch.mm(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.mm.html
    - <font color='dodgerblue'>2-D tensors</font> as inputs

- `torch.matmul(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul
    - <font color='dodgerblue'>more versatile</font>: matrix x matrix, matrix x vector and vector x vector operations
        - (see `broadcasting` for more info: https://www.geeksforgeeks.org/understanding-broadcasting-in-pytorch/)

Both functions are equivalent below in ***this*** particular example. 
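A small sketch of the difference between the two functions, using hypothetical random tensors whose shapes mirror those in this lecture:

```python
import torch

torch.manual_seed(0)
mat_a = torch.randn(2, 10)
mat_b = torch.randn(10, 3)
vector = torch.randn(10)

# For two 2-D tensors, torch.mm and torch.matmul give the same result
same = torch.allclose(torch.mm(mat_a, mat_b), torch.matmul(mat_a, mat_b))

# torch.matmul also accepts a 1-D tensor (matrix x vector);
# torch.mm only accepts 2-D tensors and would raise an error here
mat_vec = torch.matmul(mat_a, vector)

print(same, mat_vec.shape)
```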

<br>

**Element-wise Multiplication** (e.g., <font color='dodgerblue'>multiplying a float</font> and a <font color='dodgerblue'>matrix</font>):
- `torch.mul(input, other)`
    - https://pytorch.org/docs/stable/generated/torch.mul.html
    - `input`: tensor
    - `other`: tensor or number

- Could also use `*`

Both forms (`torch.mul` and `*`) are demonstrated below.

<br>

**Further Explanations**
- `activation = torch.nn.ReLU()`: specify a <font color='dodgerblue'>**callable**</font> for the <font color='dodgerblue'>ReLU</font> activation function
    - https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

<br>

- `torch.autograd.backward` computes the **gradient** (<font color='dodgerblue'>backward pass</font>) in the entire neural network for objects that have **`requires_grad=True`**
    - https://www.geeksforgeeks.org/python-pytorch-backward-function/

<br>

- `with torch.no_grad()`: disables gradient tracking inside the block; required here because the weights and biases have `requires_grad=True` and are updated in place
    - https://pytorch.org/docs/stable/generated/torch.no_grad.html
    - <font color='dodgerblue'>Reduces memory consumption</font>, since no calculation history is recorded for the operations inside the block
    - If you tried to update `weight_W1`, `bias_B1`, `weight_W2` and `bias_B2` in place without this `with torch.no_grad()`, you would obtain the following error:
        - `RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.`
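The role of `torch.no_grad()` can be seen in a minimal sketch with a single hypothetical parameter tensor named `weight`:

```python
import torch

weight = torch.ones(3, requires_grad=True)
loss = (weight ** 2).sum()
loss.backward()                  # weight.grad = 2 * weight

# Without torch.no_grad(): an in-place update of a leaf tensor that requires grad fails
try:
    weight -= 0.1 * weight.grad
    error_raised = False
except RuntimeError as error:
    error_raised = True
    print(f'RuntimeError: {error}')

# With torch.no_grad(): the update is not recorded by autograd and succeeds
with torch.no_grad():
    weight -= 0.1 * weight.grad

print(weight)
```

The tensor still has `requires_grad=True` afterwards, so the next forward pass is tracked as usual.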

<br>

- `torch.Tensor.zero_`: fills a given tensor with zeros
    - https://pytorch.org/docs/stable/generated/torch.Tensor.zero_.html
    - If this was **not done**, the gradients <font color='dodgerblue'>would be accumulated</font> (summed) on every call to `.backward()`, which would not be correct for each <font color='dodgerblue'>forward pass evaluation</font>
    - The trailing **`_`** indicates an **in-place** operation (the same concept as `inplace=True` in pandas)
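Gradient accumulation is easy to verify on a tiny example. This is a sketch with a hypothetical two-element parameter tensor, not the network used in this lecture.

```python
import torch

weight = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) = [3., 3.]
(3 * weight).sum().backward()
grad_first = weight.grad.clone()

# Second backward pass without a reset: the new gradient is added to the old one
(3 * weight).sum().backward()
grad_accumulated = weight.grad.clone()

# Reset in place, then a fresh backward pass gives the single-pass gradient again
weight.grad.zero_()
(3 * weight).sum().backward()
grad_after_reset = weight.grad.clone()

print(grad_first, grad_accumulated, grad_after_reset)
```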

In [None]:
# Define the activation function once, outside of the training loop
# activation = torch.nn.LeakyReLU(0.1)
activation = torch.nn.ReLU()

for epoch in range(num_epochs):
    # Forward pass
    X2 = torch.mm(input_X1, weight_W1) + bias_B1
    Y1 = activation(X2)
    output_Y2 = torch.matmul(Y1, weight_W2) + bias_B2

    loss = torch.mean(torch.square(torch.subtract(output_Y2, target_Y2)))  # mean( (output_Y2 - target_Y2)^2 )

    # Backward pass
    loss.backward()

    # Optimization: update weights and biases
    with torch.no_grad():
        # torch.mul(input, other): tensor first, then tensor or number
        weight_W1 -= torch.mul(weight_W1.grad, learning_rate)
        bias_B1 -= torch.mul(bias_B1.grad, learning_rate)
        weight_W2 -= learning_rate * weight_W2.grad
        bias_B2 -= learning_rate * bias_B2.grad
        # loss is not a leaf tensor, so it has no stored .grad to print
        print(f'bias 1: {bias_B1}, grad: {bias_B1.grad}, step: {learning_rate * bias_B1.grad}')
        print(f'bias 2: {bias_B2}, grad: {bias_B2.grad}, step: {learning_rate * bias_B2.grad}')

        # Reset the gradients to zero
        weight_W1.grad.zero_()
        bias_B1.grad.zero_()
        weight_W2.grad.zero_()
        bias_B2.grad.zero_()

    # print(weight_W1.grad) # visual proof that they are zero

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

    objects_ini = {'weight_W1': weight_W1, 'bias_B1': bias_B1,
                   'weight_W2': weight_W2, 'bias_B2': bias_B2}
    print()
    print_array_specs(in_arrays=objects_ini)

In [11]:
# Adding arrays: NumPy broadcasts the (10,) array across each row of the (2, 10) array
display(input_X1_np)

weight_example = np.full(10, 10)

input_X1_np + weight_example

array([[-1.42382504,  1.26372846, -0.87066174, -0.25917323, -0.07534331,
        -0.74088465, -1.3677927 ,  0.6488928 ,  0.36105811, -1.95286306],
       [ 2.34740965,  0.96849691, -0.75938718,  0.90219827, -0.46695317,
        -0.06068952,  0.78884434, -1.25666813,  0.57585751,  1.39897899]])

array([[ 8.57617496, 11.26372846,  9.12933826,  9.74082677,  9.92465669,
         9.25911535,  8.6322073 , 10.6488928 , 10.36105811,  8.04713694],
       [12.34740965, 10.96849691,  9.24061282, 10.90219827,  9.53304683,
         9.93931048, 10.78884434,  8.74333187, 10.57585751, 11.39897899]])
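The same broadcasting rules apply when a bias is added to the output of a linear transform in PyTorch. The sketch below uses hypothetical tensors to show why the bias length should equal the number of nodes in the layer and not the number of samples.

```python
import torch

linear_out = torch.ones(2, 1)    # 2 samples, 1 output node

bias_per_node = torch.zeros(1)   # one bias value per output node
bias_wrong = torch.zeros(2)      # length equals the number of samples instead

shape_ok = (linear_out + bias_per_node).shape      # stays (2, 1)
shape_broadcast = (linear_out + bias_wrong).shape  # silently becomes (2, 2)

print(shape_ok, shape_broadcast)
```

No error is raised in the second case, so checking the shapes of intermediate results is a useful habit.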

#### Summary of Basic Example:
- <font color='dodgerblue'>Tensor creation</font>: Using PyTorch's `from_numpy()` and `zeros()`
- <font color='dodgerblue'>Autograd</font>: Utilizing `requires_grad_()` for automatic differentiation
- Matrix operations: <font color='dodgerblue'>Matrix multiplication</font> (`torch.mm` and `torch.matmul`).
- <font color='dodgerblue'>Activation functions</font>: Implementing a **ReLU** activation function
- <font color='dodgerblue'>Gradients</font>: All computed in **one function call** of `backward()`
- <font color='dodgerblue'>Loss function</font>: Calculating **mean squared error loss**
- <font color='dodgerblue'>Optimization</font>: Performing **manual gradient descent**
- <font color='dodgerblue'>Reset</font> the weight and bias <font color='dodgerblue'>gradients</font>: PyTorch's `.grad.zero_()`

<hr style="border:2px solid gray"></hr>

## Advanced Example

Create the same neural network, but now make it even better (readable, K.I.S.S., reusable) using PyTorch:

- uses `torch.nn`: **modules/functions** for **building** **neural networks**
    - https://pytorch.org/docs/stable/nn.html

<br>

- Uses a class
    - the NN is defined as a subclass of **`nn.Module`**: the <font color='dodgerblue'>base class</font> for all <font color='dodgerblue'>neural network modules</font>
        - https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
        - Enables **easier organization** and **management** of **layers** and **parameters**
    - classes are basically a <font color='dodgerblue'>blueprint</font> that can be <font color='dodgerblue'>reused</font>
        - contains a collection of related functions
        - **Personal Opinion**: they are **often unnecessary** - one must have a good reason to implement them

<br>

- `torch.nn.Linear`: applies a <font color='dodgerblue'>linear transformation</font> to the <font color='dodgerblue'>incoming data</font>
    - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
    - below, `fc1` and `fc2` represent **"fully connected"** <font color='dodgerblue'>layers</font>
    - **weights** and **biases** are <font color='dodgerblue'>**automatically initialized**</font>

<br>

- `torch.nn.ReLU`: **ReLU** activation function
    -  https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

<br>

- use a **built-in optimizer**

#### Define the neural network

In [None]:
class Net(torch.nn.Module):
    ''' A perceptron with one hidden layer:
        linear -> ReLU -> linear.
    '''
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()

        # Fully connected layers (weights and biases are initialized automatically)
        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, output_size)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)

        return x

#### Revisiting the toy data
PyTorch layers such as `torch.nn.Linear` create `float32` parameters by default (GPUs are optimized for these), and the data passed to them must have the same `dtype`. Our above `input_X1` and `target_Y2` tensors have numbers that are `float64`.
- `to(torch.float32)`: changes the tensor's `dtype`
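A minimal sketch of why the conversion is needed, using a hypothetical all-ones input and a freshly created `torch.nn.Linear` layer:

```python
import numpy as np
import torch

tensor_64 = torch.from_numpy(np.ones((2, 10)))  # NumPy default dtype: float64
layer = torch.nn.Linear(10, 3)                   # parameters default to float32

# Passing float64 data to a float32 layer is expected to fail with a dtype mismatch
try:
    layer(tensor_64)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# After the conversion, the data and the layer parameters share the same dtype
tensor_32 = tensor_64.to(torch.float32)
output = layer(tensor_32)

print(mismatch_raised, tensor_64.dtype, tensor_32.dtype, output.dtype)
```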

In [None]:
# Prepare data
input_X1 = input_X1.to(torch.float32)
target_Y2 = target_Y2.to(torch.float32)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2}

print_array_specs(in_arrays=objects_ini)

#### Model, Loss and Optimizer
- create the NN model

<br>

- define the **loss function** to use
    - `torch.nn.MSELoss`: mean squared error (i.e., squared L2 norm; often called L2 loss)
        - https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss

<br>

- define the **optimizing function** (i.e., `optim.SGD`) for adjusting the weights and biases
    - Optimization overview: https://pytorch.org/docs/stable/optim.html#module-torch.optim
    - Available algorithms: https://pytorch.org/docs/stable/optim.html#algorithms
        - **gradient descent**: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
     
##### Coding concept: assigning functions to variables
For example: `loss_function = torch.nn.MSELoss()`

Why do this?
- Changing code's behavior: reassign the variable to a different function (e.g., explore different ideas)
- Abstraction: abstract away the specific implementation details
    - more readable
    - more modular
    - easier to maintain
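A short sketch of this idea: the calling code stays the same while the loss is swapped by reassigning one variable. The `prediction` and `target` tensors are hypothetical, and `torch.nn.L1Loss` (mean absolute error) is used only as a second example.

```python
import torch

prediction = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.0, 2.0, 2.0])

# The line that computes the loss does not change when the loss function does
for loss_function in (torch.nn.MSELoss(), torch.nn.L1Loss()):
    loss = loss_function(prediction, target)
    print(f'{type(loss_function).__name__}: {loss.item():.3f}')
```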

In [None]:
model = Net(input_size, hidden_size, output_size)

loss_function = torch.nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

### Model Training

- `zero_grad()`: reset the gradients of all optimized tensors
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
    - this is the same concept as above when we used `torch.Tensor.zero_` in the basic example
        - this is necessary since `.backward()` accumulates the gradients each time it is called
 
- `torch.optim.Optimizer.step`: perform an **optimization step** based on the **current gradients** (stored in `.grad`), which come from **`.backward()`**
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html
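To connect this with manual gradient descent, the sketch below checks that one `optimizer.step()` of plain `SGD` (no momentum) gives the same result as `weight - learning_rate * weight.grad`. The single parameter tensor and the learning rate of 0.1 are hypothetical.

```python
import torch

weight = torch.tensor([1.0, -2.0], requires_grad=True)
learning_rate = 0.1

optimizer = torch.optim.SGD([weight], lr=learning_rate)

optimizer.zero_grad()            # clear any old gradients
loss = (weight ** 2).sum()
loss.backward()                  # weight.grad = 2 * weight

# Manual gradient descent rule, computed before the optimizer changes the weight
expected = weight.detach() - learning_rate * weight.grad

optimizer.step()                 # the optimizer applies the same rule in place

print(weight, expected)
```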

In [None]:
for epoch in range(num_epochs):
    # Forward pass
    output_Y2 = model(input_X1)

    loss = loss_function(output_Y2, target_Y2)

    # Backward pass
    optimizer.zero_grad()

    loss.backward()

    # Optimization: update weights and biases
    optimizer.step()
    
# Final outputs, weights and biases
print(f'\nFinal Output: \n {output_Y2}\n')
objects_ini = model.state_dict()

print_array_specs(in_arrays=objects_ini)

Notice the shapes of the weights: `torch.nn.Linear` stores them as `(out_features, in_features)`, i.e., transposed relative to `weight_W1` and `weight_W2` in the basic example above.
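The weight layout can be checked directly. This is a sketch with a hypothetical standalone layer and random input, showing that `torch.nn.Linear` computes `x @ weight.T + bias`:

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(in_features=10, out_features=3)
x = torch.randn(2, 10)

print(layer.weight.shape)   # stored as (out_features, in_features)
print(layer.bias.shape)

# Reproduce the layer by hand: transpose the stored weight before multiplying
manual = torch.mm(x, layer.weight.T) + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)

print(matches)
```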

#### Summary of Advanced Example:
- A class (like a blueprint) and `nn.Module`: a structured PyTorch approach for defining a neural network (e.g., better organization and code reusability)
- Built-in Activation: `torch.nn.ReLU`
- Built-in Loss: `torch.nn.MSELoss` for mean squared error (i.e., L2 loss)
- Built-in Optimizer: `optim.SGD` for gradient descent and usage of `.step()`

In [None]:
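# R2 score (coefficient of determination) of the final predictions, using torcheval
# (torchmetrics.functional.regression.r2_score is an alternative)
from torcheval.metrics import R2Score

metric = R2Score()
# update(input, target): predictions first, then targets; their shapes must match
metric.update(output_Y2.detach(), target_Y2)
metric.compute()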