<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> PyTorch: Simple Neural Network Example

## <center>  with a Perceptron

<hr style="border:2px solid gray"></hr>

This lecture will parallel the perceptron example written using NumPy, allowing you to compare the approaches directly.

<br>

<center><img src="00_images/31_machine_learning/nn_perceptron_example.png" alt="nn_percepton" style="width: 800px;"/></center>

<br>

The neural network (NN) will be written in two ways:
1. Basic - to explicitly show all of the steps in training a neural network
2. Advanced - to show the basics of how most PyTorch code is actually written

<hr style="border:2px solid gray"></hr>

#### New Term:
**An Epoch**: one complete pass of the entire training dataset through the neural network (forward and backward propagation).

<hr style="border:2px solid gray"></hr>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim

Create a helper function that allows us to investigate the different arrays that are used below:

In [2]:
def print_array_specs(in_arrays: dict):
    ''' Helper function for nicely printing NumPy and
        PyTorch arrays.

        Print: shape, data type and values.
    '''
    for key, value in in_arrays.items():
        print(f'{key}:\n{value.shape}, {value.dtype}')
        print(f'{value}\n')

## Basic Example

#### Define the toy data (input values, target values and initial weights):

##### A reminder from the NumPy lecture

A random **seed** will be **explicitly set**, allowing for **reproducible results** (i.e., for teaching purposes). The first epoch data generated below should correspond to the numeric values given in the figure above.

The object naming will also be done to parallel the figure above.

Random Number Generator in NumPy:
- `np.random.default_rng`: https://numpy.org/doc/stable/reference/random/generator.html
- `numpy.random.Generator.normal`: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html

<br>

**Important Note**: Normally with <font color='dodgerblue'>real-world data</font>, one should often <font color='dodgerblue'>normalize</font> (e.g., **rescale** the data to the range [0, 1]) the <font color='dodgerblue'>input data</font>. This helps the mathematics when different input features have **large magnitude differences** (e.g., 1.5 and 2.5e6).
- https://en.wikipedia.org/wiki/Normalization_(statistics)
- `sklearn.preprocessing.normalize`: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

In this example, we don't need to worry about this due to how we generate the toy data.
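
As a minimal sketch of the normalization idea, the following rescales each feature (column) to the range [0, 1] using min-max scaling. The `features` array is hypothetical and made up for illustration; it is not part of the lecture's toy data. Note that min-max scaling is only one common choice, and it differs from `sklearn.preprocessing.normalize`, which rescales each sample to unit norm.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) and 2 features (columns)
# whose magnitudes differ greatly
features = np.array([[1.5, 2.5e6],
                     [2.0, 1.0e6],
                     [0.5, 4.0e6]])

# Min-max scaling, done separately for each feature (i.e., each column)
col_min = features.min(axis=0)
col_max = features.max(axis=0)
features_scaled = (features - col_min) / (col_max - col_min)

print(features_scaled)
```

After scaling, both columns span the same range, so neither feature dominates simply because of its units.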

In [3]:
rng = np.random.default_rng(seed=12345)

input_X1_np = rng.normal(size=(2, 10))
target_Y2_np = rng.normal(size=(2, 1))

weight_W1_np = rng.normal(size=(10, 3))
weight_W2_np = rng.normal(size=(3, 1))

Examine the different NumPy arrays:
- shapes (important for matrix multiplication)
- data types (need to be same types)
- values

In [4]:
objects_ini = {'input_X1': input_X1_np, 'target_Y2': target_Y2_np,
               'weight_W1': weight_W1_np, 'weight_W2': weight_W2_np}

print_array_specs(in_arrays=objects_ini)

input_X1:
(2, 10), float64
[[-1.42382504  1.26372846 -0.87066174 -0.25917323 -0.07534331 -0.74088465
  -1.3677927   0.6488928   0.36105811 -1.95286306]
 [ 2.34740965  0.96849691 -0.75938718  0.90219827 -0.46695317 -0.06068952
   0.78884434 -1.25666813  0.57585751  1.39897899]]

target_Y2:
(2, 1), float64
[[ 1.32229806]
 [-0.29969852]]

weight_W1:
(10, 3), float64
[[ 0.90291934 -1.62158273 -0.15818926]
 [ 0.44948393 -1.34360107 -0.08168759]
 [ 1.72473993  2.61815943  0.77736134]
 [ 0.8286332  -0.95898831 -1.20938829]
 [-1.41229201  0.54154683  0.7519394 ]
 [-0.65876032 -1.22867499  0.25755777]
 [ 0.31290292 -0.13081169  1.26998312]
 [-0.09296246 -0.06615089 -1.10821447]
 [ 0.13595685  1.34707776  0.06114402]
 [ 0.0709146   0.43365454  0.27748366]]

weight_W2:
(3, 1), float64
[[0.53025239]
 [0.53672097]
 [0.61835001]]



#### Initialize important parameters

**Neural Network Architecture**
- <font color='dodgerblue'>input_size</font>: the number of **features** (i.e., nodes) in the **input layer** (here, the 10 columns of `input_X1`)
- <font color='dodgerblue'>hidden_size</font>: the number of **nodes** in the **hidden layer**
- <font color='dodgerblue'>output_size</font>: the number of **nodes** in the **output layer**

**Training Parameters**
- <font color='dodgerblue'>learning_rate</font>: **step size** for **gradient descent**
- <font color='dodgerblue'>num_epochs</font>: how many **training epochs** to **run** (instead of having a convergence cutoff criterion)

In [15]:
input_size = 10
hidden_size = 3
output_size = 1

learning_rate = 1e-3
num_epochs = 50

### Now Focus on PyTorch

##### Prepare data

- The <font color='dodgerblue'>NumPy-generated input</font> and initial data values need to be <font color='dodgerblue'>converted to torch tensors</font> using **`torch.from_numpy()`**.

- We can also <font color='dodgerblue'>improve upon</font> the original NumPy model by <font color='dodgerblue'>including biases</font>. These will be used in the <font color='dodgerblue'>linear transform</font> (e.g., **`torch.mm(input_X1, weight_W1) + bias_B1`**).

- Care must be given to specify that **`autograd`** should <font color='dodgerblue'>record operations</font> for the <font color='dodgerblue'>weights and biases</font> (i.e., calculation history), using **`requires_grad_(requires_grad=True)`**.
    - Reminder: <font color='dodgerblue'>only the weights and biases</font> need to be <font color='dodgerblue'>updated</font> based on the <font color='dodgerblue'>loss gradient</font>.
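
One detail worth knowing about `torch.from_numpy()` is that the resulting tensor shares its memory with the original NumPy array, so in-place updates made during training would also change the original NumPy arrays. The sketch below illustrates this with a small, hypothetical array (`example_np` and the other names are made up for this illustration), along with how `requires_grad_()` switches on gradient tracking.

```python
import numpy as np
import torch

# Hypothetical small array, used only for this illustration
example_np = np.array([[1.0, 2.0], [3.0, 4.0]])

# from_numpy() keeps the dtype (float64) and shares memory with the array
example_t = torch.from_numpy(example_np)
print(example_t.dtype)

# An in-place change of the tensor is also visible in the NumPy array
example_t[0, 0] = 10.0
print(example_np[0, 0])

# Converting a copy of the array avoids this coupling
independent_t = torch.from_numpy(example_np.copy())
independent_t[0, 0] = -1.0
print(example_np[0, 0])

# requires_grad_() switches on gradient tracking in place
independent_t.requires_grad_(requires_grad=True)
print(independent_t.requires_grad)
```

If the original NumPy values need to stay untouched (e.g., to rerun a cell from the same starting weights), converting a copy is one possible approach.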

In [16]:
input_X1 = torch.from_numpy(input_X1_np)
target_Y2 = torch.from_numpy(target_Y2_np)

weight_W1 = torch.from_numpy(weight_W1_np).requires_grad_(requires_grad=True)
weight_W2 = torch.from_numpy(weight_W2_np).requires_grad_(requires_grad=True)

bias_B1 = torch.zeros(hidden_size, requires_grad=True)
bias_B2 = torch.zeros(output_size, requires_grad=True)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2,
               'weight_W1': weight_W1, 'input_B1': bias_B1,
               'weight_W2': weight_W2, 'input_B2': bias_B2}

print_array_specs(in_arrays=objects_ini)

input_X1:
torch.Size([2, 10]), torch.float64
tensor([[-1.4238,  1.2637, -0.8707, -0.2592, -0.0753, -0.7409, -1.3678,  0.6489,
          0.3611, -1.9529],
        [ 2.3474,  0.9685, -0.7594,  0.9022, -0.4670, -0.0607,  0.7888, -1.2567,
          0.5759,  1.3990]], dtype=torch.float64)

target_Y2:
torch.Size([2, 1]), torch.float64
tensor([[ 1.3223],
        [-0.2997]], dtype=torch.float64)

weight_W1:
torch.Size([10, 3]), torch.float64
tensor([[ 0.8021, -1.6216, -0.2190],
        [ 0.4079, -1.3436, -0.1068],
        [ 1.7574,  2.6182,  0.7970],
        [ 0.7899, -0.9590, -1.2327],
        [-1.3922,  0.5415,  0.7640],
        [-0.6562, -1.2287,  0.2591],
        [ 0.2790, -0.1308,  1.2496],
        [-0.0390, -0.0662, -1.0757],
        [ 0.1112,  1.3471,  0.0462],
        [ 0.0108,  0.4337,  0.2413]], dtype=torch.float64, requires_grad=True)

input_B1:
torch.Size([3]), torch.float32
tensor([0., 0., 0.], requires_grad=True)

weight_W2:
torch.Size([3, 1]), torch.float64
tensor([[0.1698],
   

#### Model Training

**Multiplying two matrices** (dot product/matrix multiplication):
- `torch.mm(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.mm.html
    - <font color='dodgerblue'>2-D tensors</font> as inputs

- `torch.matmul(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul
    - <font color='dodgerblue'>more versatile</font>: matrix x matrix, matrix x vector and vector x vector operations
        - (see `broadcasting` for more info: https://www.geeksforgeeks.org/understanding-broadcasting-in-pytorch/)

Both functions are equivalent below in ***this*** particular example. 
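
A short sketch comparing the two functions, using randomly generated tensors whose names (`mat_a`, `mat_b`, `vec`) are made up for this illustration. It shows that both give the same result for 2-D inputs, while only `torch.matmul` accepts a 1-D vector.

```python
import torch

torch.manual_seed(0)
mat_a = torch.randn(2, 10)
mat_b = torch.randn(10, 3)
vec = torch.randn(10)

# For two 2-D tensors, both functions give the same result
same_result = torch.allclose(torch.mm(mat_a, mat_b),
                             torch.matmul(mat_a, mat_b))
print(same_result)

# matmul also handles matrix x vector
mat_vec = torch.matmul(mat_a, vec)
print(mat_vec.shape)

# mm only accepts 2-D tensors, so a 1-D vector raises an error
mm_failed = False
try:
    torch.mm(mat_a, vec)
except RuntimeError as error:
    mm_failed = True
    print(f'torch.mm failed: {error}')
```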

<br>

**Element-wise Multiplication** (e.g., <font color='dodgerblue'>multiplying a float</font> and a <font color='dodgerblue'>matrix</font>):
- `torch.mul(input, other)`
    - https://pytorch.org/docs/stable/generated/torch.mul.html
    - `input`: tensor
    - `other`: tensor or number

- Could also use `*`

Both functions are demonstrated below. 
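
As a small illustration (with a hypothetical tensor `grad_like` standing in for a gradient, and `step` standing in for a learning rate), `torch.mul` and `*` scale every element in the same way. Multiplying two same-shaped tensors with `*` is also element-wise, which is different from matrix multiplication.

```python
import torch

# Hypothetical values, standing in for a gradient and a learning rate
grad_like = torch.tensor([[1.0, -2.0], [0.5, 4.0]])
step = 1e-3

scaled_mul = torch.mul(grad_like, step)
scaled_star = step * grad_like
print(torch.equal(scaled_mul, scaled_star))

# Element-wise product of two tensors (not a matrix product)
elementwise = grad_like * grad_like
matrix_product = torch.mm(grad_like, grad_like)
print(elementwise)
print(matrix_product)
```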

<br>

**Further Explanations**
- `activation = torch.nn.ReLU()`: specify a <font color='dodgerblue'>**callable**</font> for the <font color='dodgerblue'>ReLU</font> activation function
    - https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

<br>

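- `loss.backward()` (which relies on `torch.autograd.backward`) computes the **gradient** (<font color='dodgerblue'>backward pass</font>) of the loss through the entire neural network for objects that have **`requires_grad=True`**, storing each result in the object's `.grad` attribute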
    - https://www.geeksforgeeks.org/python-pytorch-backward-function/

<br>

- `with torch.no_grad()`: required because the weights and biases have `requires_grad=True`, and their updates should not be recorded by autograd
    - https://pytorch.org/docs/stable/generated/torch.no_grad.html
    - <font color='dodgerblue'>Reduces memory consumption</font>, since the computations inside the block are not tracked for gradients
    - If you tried to update `weight_W1`, `bias_B1`, `weight_W2` and `bias_B2` in place without `with torch.no_grad()`, you would obtain the following error:
        - `RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.`
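
The following minimal sketch reproduces this behavior on a small, hypothetical tensor (`weight` here is made up for illustration and is unrelated to `weight_W1`): the in-place update fails outside of `torch.no_grad()` and succeeds inside of it.

```python
import torch

# Hypothetical parameter and loss, used only for this illustration
weight = torch.ones(3, requires_grad=True)
loss = torch.sum(weight ** 2)
loss.backward()    # weight.grad is now 2 * weight

# An in-place update outside of no_grad() raises a RuntimeError
update_failed = False
try:
    weight -= 0.1 * weight.grad
except RuntimeError as error:
    update_failed = True
    print(f'RuntimeError: {error}')

# The same update inside of no_grad() is allowed
with torch.no_grad():
    weight -= 0.1 * weight.grad

print(weight)
print(weight.requires_grad)
```

The tensor still has `requires_grad=True` afterwards; only the update itself was left out of the recorded history.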

<br>

- `torch.Tensor.zero_`: fills a given tensor with zeros
    - https://pytorch.org/docs/stable/generated/torch.Tensor.zero_.html
    - If this was **not done**, the gradients <font color='dodgerblue'>would be accumulated</font> over successive `.backward()` calls, which would not be correct for each <font color='dodgerblue'>forward pass evaluation</font>
    - The **`_`** indicates an **`inplace`** operation (like what we know from Pandas)
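
A minimal sketch of the accumulation behavior, using a hypothetical two-element tensor (`weight` is made up for this illustration): calling `.backward()` twice without a reset sums the gradients, while `.grad.zero_()` restores the expected value.

```python
import torch

# Hypothetical parameter, used only for this illustration
weight = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * weight) is 3 for each element
torch.sum(3.0 * weight).backward()
first_grad = weight.grad.clone()

# Second backward pass without a reset: the gradients are added together
torch.sum(3.0 * weight).backward()
accumulated_grad = weight.grad.clone()

# Reset in place, then repeat: the gradient is correct again
weight.grad.zero_()
torch.sum(3.0 * weight).backward()
fresh_grad = weight.grad.clone()

print(first_grad, accumulated_grad, fresh_grad)
```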

In [14]:
for epoch in range(num_epochs):
    # Forward pass
    X2 = torch.mm(input_X1, weight_W1) + bias_B1
    
    # activation = torch.nn.LeakyReLU(0.1)
    activation = torch.nn.ReLU()
    Y1 = activation(X2)
    
    output_Y2 = torch.matmul(Y1, weight_W2) + bias_B2

    loss = torch.mean(torch.square(torch.subtract(output_Y2, target_Y2))) # mean( (Y2-y_target)^2 )

    # Backward pass
    loss.backward()

    # Optimization: update weights and biases
    with torch.no_grad():
        weight_W1 -= torch.mul(learning_rate, weight_W1.grad)
        bias_B1 -= torch.mul(learning_rate, bias_B1.grad)
        weight_W2 -= learning_rate * weight_W2.grad
        bias_B2 -= learning_rate * bias_B2.grad

        # Reset the gradients to zero
        weight_W1.grad.zero_()
        bias_B1.grad.zero_()
        weight_W2.grad.zero_()
        bias_B2.grad.zero_()

    # print(weight_W1.grad) # visual proof that they are zero

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

objects_ini = {'weight_W1': weight_W1, 'bias_B1': bias_B1,
               'weight_W2': weight_W2, 'bias_B2': bias_B2}
print()
print_array_specs(in_arrays=objects_ini)

Epoch 1: Loss = 1.550
Epoch 2: Loss = 1.530
Epoch 3: Loss = 1.512
Epoch 4: Loss = 1.499
Epoch 5: Loss = 1.488
Epoch 6: Loss = 1.477
Epoch 7: Loss = 1.466
Epoch 8: Loss = 1.455
Epoch 9: Loss = 1.445
Epoch 10: Loss = 1.435
Epoch 11: Loss = 1.425
Epoch 12: Loss = 1.415
Epoch 13: Loss = 1.405
Epoch 14: Loss = 1.396
Epoch 15: Loss = 1.387
Epoch 16: Loss = 1.378
Epoch 17: Loss = 1.369
Epoch 18: Loss = 1.361
Epoch 19: Loss = 1.352
Epoch 20: Loss = 1.344
Epoch 21: Loss = 1.336
Epoch 22: Loss = 1.328
Epoch 23: Loss = 1.320
Epoch 24: Loss = 1.313
Epoch 25: Loss = 1.305
Epoch 26: Loss = 1.298
Epoch 27: Loss = 1.290
Epoch 28: Loss = 1.283
Epoch 29: Loss = 1.276
Epoch 30: Loss = 1.270
Epoch 31: Loss = 1.263
Epoch 32: Loss = 1.256
Epoch 33: Loss = 1.250
Epoch 34: Loss = 1.243
Epoch 35: Loss = 1.237
Epoch 36: Loss = 1.231
Epoch 37: Loss = 1.225
Epoch 38: Loss = 1.219
Epoch 39: Loss = 1.213
Epoch 40: Loss = 1.207
Epoch 41: Loss = 1.202
Epoch 42: Loss = 1.196
Epoch 43: Loss = 1.191
Epoch 44: Loss = 1.1

#### Summary of Basic Example:
- <font color='dodgerblue'>Tensor creation</font>: Using PyTorch's `from_numpy()` and `zeros()`
- <font color='dodgerblue'>Autograd</font>: Utilizing `requires_grad_()` for automatic differentiation
- Matrix operations: <font color='dodgerblue'>Matrix multiplication</font> (`torch.mm` and `torch.matmul`).
- <font color='dodgerblue'>Activation functions</font>: Implementing a **ReLU** activation function
- <font color='dodgerblue'>Gradients</font>: All computed in **one function call** of `backward()`
- <font color='dodgerblue'>Loss function</font>: Calculating **mean squared error loss**
- <font color='dodgerblue'>Optimization</font>: Performing **manual gradient descent**
- <font color='dodgerblue'>Reset</font> the weight and bias <font color='dodgerblue'>gradients</font>: PyTorch's `.grad.zero_()`

<hr style="border:2px solid gray"></hr>

## Advanced Example

Create the same neural network, but now make it even better (readable, K.I.S.S., reusable) using PyTorch:

- uses `torch.nn`: **modules/functions** for **building** **neural networks**
    - https://pytorch.org/docs/stable/nn.html

<br>

- Uses a class
    - the NN is defined as a subclass of **`nn.Module`**: the <font color='dodgerblue'>base class</font> for all <font color='dodgerblue'>neural network modules</font>
        - https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
        - Enables **easier organization** and **management** of **layers** and **parameters**
    - classes are basically a <font color='dodgerblue'>blueprint</font> that can be <font color='dodgerblue'>reused</font>
        - contains a collection of related functions
        - **Personal Opinion**: they are **often unnecessary** - must have a good reason to implement

<br>

- `torch.nn.Linear`: applies a <font color='dodgerblue'>linear transformation</font> to the <font color='dodgerblue'>incoming data</font>
    - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
    - below, `fc1` and `fc2` represent **"fully connected"** <font color='dodgerblue'>layers</font>
    - **weights** and **biases** are <font color='dodgerblue'>**automatically initialized**</font>

<br>

- `torch.nn.ReLU`: **ReLU** activation function
    -  https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

<br>

- use a **built-in optimizer**

#### Define the neural network

In [8]:
class Net(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()

        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, output_size)
        self.ReLU = torch.nn.ReLU()



    def forward(self, x):
        x = self.fc1(x)
        x = self.ReLU(x)
        x = self.fc2(x)

        return x

#### Revisiting the toy data
Some of PyTorch's functions require the numbers to be `float32` (GPUs are optimized for these). Our above `input_X1` and `target_Y2` tensors have numbers that are `float64`.
- `to(torch.float32)`: changes the tensor's `dtype`
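
As a sketch of why the conversion matters, the example below passes a hypothetical `float64` tensor (`data64`, made up for this illustration) to a `torch.nn.Linear` layer, whose parameters are `float32` by default. It assumes PyTorch's default dtype has not been changed.

```python
import torch

# Layer parameters are float32 by default
layer = torch.nn.Linear(10, 3)

# Hypothetical float64 input, used only for this illustration
data64 = torch.ones(2, 10, dtype=torch.float64)

# Mixing float64 data with float32 parameters raises an error
dtype_mismatch = False
try:
    layer(data64)
except RuntimeError as error:
    dtype_mismatch = True
    print(f'RuntimeError: {error}')

# to() returns a new tensor with the requested dtype
data32 = data64.to(torch.float32)
output = layer(data32)
print(data32.dtype, output.shape)
```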

In [9]:
# Prepare data
input_X1 = input_X1.to(torch.float32)
target_Y2 = target_Y2.to(torch.float32)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2}

print_array_specs(in_arrays=objects_ini)

input_X1:
torch.Size([2, 10]), torch.float32
tensor([[-1.4238,  1.2637, -0.8707, -0.2592, -0.0753, -0.7409, -1.3678,  0.6489,
          0.3611, -1.9529],
        [ 2.3474,  0.9685, -0.7594,  0.9022, -0.4670, -0.0607,  0.7888, -1.2567,
          0.5759,  1.3990]])

target_Y2:
torch.Size([2, 1]), torch.float32
tensor([[ 1.3223],
        [-0.2997]])



#### Model, Loss and Optimizer
- create the NN model

<br>

- define the **loss function** to use
    - `torch.nn.MSELoss`: mean squared error (a.k.a. squared L2 loss)
        - https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss

<br>

- define the **optimizing function** (i.e., `optim.SGD`) for adjusting the weights and biases
    - Optimization overview: https://pytorch.org/docs/stable/optim.html#module-torch.optim
    - Available algorithms: https://pytorch.org/docs/stable/optim.html#algorithms
        - **gradient descent**: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
     
##### Coding concept: assigning functions to variables
For example: `loss_function = torch.nn.MSELoss()`

Why do this?
- Changing code's behavior: reassign the variable to a different function (e.g., explore different ideas)
- Abstraction: abstract away the specific implementation details
    - more readable
    - more modular
    - easier to maintain
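
A minimal sketch of this idea, using hypothetical `prediction` and `target` tensors made up for this illustration: the line that computes the loss stays the same, and only the object assigned to `loss_function` changes. `torch.nn.L1Loss` (mean absolute error) is used here purely as an example of an alternative.

```python
import torch

# Hypothetical values, used only for this illustration
prediction = torch.tensor([[1.0], [0.0]])
target = torch.tensor([[1.5], [-1.0]])

# Mean squared error: mean(0.5**2, 1.0**2) = 0.625
loss_function = torch.nn.MSELoss()
mse = loss_function(prediction, target)

# Reassign the variable to explore a different loss;
# the calling line below does not change
loss_function = torch.nn.L1Loss()
mae = loss_function(prediction, target)

print(mse.item(), mae.item())
```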

In [10]:
model = Net(input_size, hidden_size, output_size)

loss_function = torch.nn.MSELoss()

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

### Model Training

- `zero_grad()`: reset the gradients of all optimized tensors
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
    - this is the same concept as above when we used `torch.Tensor.zero_` in the basic example
        - this is necessary since `.backward()` accumulates the gradients each time it is called
 
- `torch.optim.Optimizer.step`: perform an **optimization step** based on the **current gradients** (stored in `.grad`), which come from **`.backward()`** 
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html
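
To connect this to the basic example, the sketch below checks that one `optimizer.step()` of plain `optim.SGD` (no momentum or weight decay) gives the same result as the manual rule `weight - learning_rate * weight.grad`. The tensor `weight` and the learning rate of 0.1 are hypothetical values chosen for illustration.

```python
import torch
import torch.optim as optim

# Hypothetical parameter and learning rate, used only for this illustration
weight = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([weight], lr=0.1)

# Forward and backward pass: the gradient of sum(weight**2) is 2 * weight
loss = torch.sum(weight ** 2)
optimizer.zero_grad()
loss.backward()

# What the manual gradient descent rule would give
expected = (weight - 0.1 * weight.grad).detach().clone()

# Let the optimizer perform the same update
optimizer.step()

print(weight)
print(expected)
```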

In [11]:
for epoch in range(num_epochs):
    # Forward pass
    output_Y2 = model(input_X1)

    loss = loss_function(output_Y2, target_Y2)

    # Backward pass
    optimizer.zero_grad()

    loss.backward()

    # Optimization: update weights and biases
    optimizer.step()

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

# Final outputs, weights and biases
print(f'\nFinal Output: \n {output_Y2}\n')
objects_ini = model.state_dict()

print_array_specs(in_arrays=objects_ini)

Epoch 1: Loss = 0.857
Epoch 2: Loss = 0.842
Epoch 3: Loss = 0.828
Epoch 4: Loss = 0.814
Epoch 5: Loss = 0.801
Epoch 6: Loss = 0.787
Epoch 7: Loss = 0.774
Epoch 8: Loss = 0.761
Epoch 9: Loss = 0.749
Epoch 10: Loss = 0.737
Epoch 11: Loss = 0.724
Epoch 12: Loss = 0.713
Epoch 13: Loss = 0.701
Epoch 14: Loss = 0.690
Epoch 15: Loss = 0.679
Epoch 16: Loss = 0.668
Epoch 17: Loss = 0.657
Epoch 18: Loss = 0.646
Epoch 19: Loss = 0.636
Epoch 20: Loss = 0.626
Epoch 21: Loss = 0.616
Epoch 22: Loss = 0.606
Epoch 23: Loss = 0.597
Epoch 24: Loss = 0.588
Epoch 25: Loss = 0.578
Epoch 26: Loss = 0.569
Epoch 27: Loss = 0.561
Epoch 28: Loss = 0.552
Epoch 29: Loss = 0.543
Epoch 30: Loss = 0.535
Epoch 31: Loss = 0.527
Epoch 32: Loss = 0.519
Epoch 33: Loss = 0.511
Epoch 34: Loss = 0.503
Epoch 35: Loss = 0.496
Epoch 36: Loss = 0.488
Epoch 37: Loss = 0.481
Epoch 38: Loss = 0.474
Epoch 39: Loss = 0.467
Epoch 40: Loss = 0.460
Epoch 41: Loss = 0.453
Epoch 42: Loss = 0.447
Epoch 43: Loss = 0.440
Epoch 44: Loss = 0.4

Notice the shapes of the weights: `torch.nn.Linear` stores each weight matrix as `(out_features, in_features)`, i.e., transposed relative to the weight matrices used in the above basic example.
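
A short sketch of the weight layout, using a standalone `torch.nn.Linear` layer and a random input `x` (both created only for this illustration). It checks that the layer's output matches a manual `torch.mm` with the transposed weight matrix plus the bias.

```python
import torch

torch.manual_seed(0)

# Standalone layer, created only for this illustration
layer = torch.nn.Linear(in_features=10, out_features=3)
print(layer.weight.shape)
print(layer.bias.shape)

# The layer computes x @ weight.T + bias
x = torch.randn(2, 10)
manual = torch.mm(x, layer.weight.T) + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-6)
print(matches)
```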

#### Summary of Advanced Example:
- A class (like a blueprint) and `nn.Module`: a structured PyTorch approach for defining a neural network (e.g., better organization and code reusability)
- Built-in Activation: `torch.nn.ReLU`
- Built-in Loss: `torch.nn.MSELoss` for mean squared error (i.e., squared L2 loss)
- Built-in Optimizer: `optim.SGD` for gradient descent and usage of `.step()`

In [12]:
from torcheval.metrics import R2Score

# Assumes that the `torcheval` package is installed
metric = R2Score()

# Compare the model's final predictions to the target values
metric.update(output_Y2.detach(), target_Y2)
metric.compute()