<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> PyTorch: Simple Neural Network Example

## <center>  with a Perceptron

<hr style="border:2px solid gray"></hr>

This lecture will parallel the perceptron example written using NumPy, allowing you to compare the approaches directly.

<br>

<center><img src="00_images/31_machine_learning/nn_perceptron_example.png" alt="nn_percepton" style="width: 1000px;"/></center>

<br>

The neural network (NN) will be written in two ways:
1. <font color='dodgerblue'>Basic</font> - to explicitly show all of the steps in a neural network training
2. <font color='dodgerblue'>Advance</font> - to show the typical way that PyTorch is implemented

<hr style="border:2px solid gray"></hr>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim

Create a helper function that allows us to investigate the different arrays that are used below:

In [2]:
def print_array_specs(in_arrays: dict):
    ''' Helper function for nicely printing NumPy and
        PyTorch arrays.

        Print: shape, data type and values.
    '''
    for key, value in in_arrays.items():
        print(f'{key}:\n{value.shape}, {value.dtype}')
        print(f'{value}\n')

## Basic Example

#### Define the toy data (input values, target values and initial weights):

##### A reminder from the NumPy lecture

A random **seed** will be **explicitly set**, allowing for **reproducible results** (i.e., for teaching purposes). The first epoch data generated below should correspond to the numeric values given in the figure above.

The object naming will also be done to parallel the figure above.

Random Number Generator in NumPY:
- `np.random.default_rng`: https://numpy.org/doc/stable/reference/random/generator.html
- `numpy.random.Generator.normal`: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html

<br>

**Important Note**: Normally with <font color='dodgerblue'>real-world data</font>, one often should <font color='dodgerblue'>normalize</font> (e.g., **transpose** the date to a range [0, 1]) the <font color='dodgerblue'>input data</font>. This helps the mathematics when different input features have **large magnitude differences** (e.g., 1.5 and 2.5e6).
- https://en.wikipedia.org/wiki/Normalization_(statistics)
- `sklearn.preprocessing.normalize`: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

In this example, we don't need to worry about this due to how we generate the toy data.

In [3]:
rng = np.random.default_rng(seed=12345)

input_X1_np = rng.normal(size=(2, 10))
target_Y2_np = rng.normal(size=(2, 1))

weight_W1_np = rng.normal(size=(10, 3))
weight_W2_np = rng.normal(size=(3, 1))

Examine the different NumPy arrays:
- shapes (important for matrix multiplication)
- data types (need to be same types)
- values

In [4]:
objects_ini = {'input_X1': input_X1_np, 'target_Y2': target_Y2_np,
               'weight_W1': weight_W1_np, 'weight_W2': weight_W2_np}

print_array_specs(in_arrays=objects_ini)

input_X1:
(2, 10), float64
[[-1.42382504  1.26372846 -0.87066174 -0.25917323 -0.07534331 -0.74088465
  -1.3677927   0.6488928   0.36105811 -1.95286306]
 [ 2.34740965  0.96849691 -0.75938718  0.90219827 -0.46695317 -0.06068952
   0.78884434 -1.25666813  0.57585751  1.39897899]]

target_Y2:
(2, 1), float64
[[ 1.32229806]
 [-0.29969852]]

weight_W1:
(10, 3), float64
[[ 0.90291934 -1.62158273 -0.15818926]
 [ 0.44948393 -1.34360107 -0.08168759]
 [ 1.72473993  2.61815943  0.77736134]
 [ 0.8286332  -0.95898831 -1.20938829]
 [-1.41229201  0.54154683  0.7519394 ]
 [-0.65876032 -1.22867499  0.25755777]
 [ 0.31290292 -0.13081169  1.26998312]
 [-0.09296246 -0.06615089 -1.10821447]
 [ 0.13595685  1.34707776  0.06114402]
 [ 0.0709146   0.43365454  0.27748366]]

weight_W2:
(3, 1), float64
[[0.53025239]
 [0.53672097]
 [0.61835001]]



#### Initialize import parameters

**Neural Network Architecture**
- <font color='dodgerblue'>input_size</font>: how many **data points** are in each **feature** (i.e., each node) within the **input layer**
- <font color='dodgerblue'>hidden_size</font>: how many **data points** are in **each node** within the **hidden layer**
- <font color='dodgerblue'>output_size</font>: how many **data points** are in **each node** within the **output layer**

**Training Parameters**
- <font color='dodgerblue'>learning_rate</font>: **step size** for **gradient descent**
- <font color='dodgerblue'>num_epochs</font>: how many **training epochs** to **run** (instead of having a convergence cutoff criteria)

In [5]:
input_size = 10
hidden_size = 3
output_size = 2

learning_rate = 1e-3
num_epochs = 50

### Now Focus on PyTorch

##### Prepare data

- The <font color='dodgerblue'>NumPy-generated input</font> and initial data values need to be <font color='dodgerblue'>converted to torch arrays</font> using **`torch.from_numpy()`**.

- We can also <font color='dodgerblue'>improve upon</font> the original Numpy model by <font color='dodgerblue'>including biases</font>. These will be used in the <font color='dodgerblue'>linear transform</font> (e.g., **`torch.mm(input_X1, weight_W1) + bias_B1`**).

- Care must be given to specify that **`torch.autograd.backwards()`** (done below) should <font color='dodgerblue'>record operations</font> for the <font color='dodgerblue'>weights and biases</font> (i.e., **calculation history**), using **`requires_grad_(requires_grad=True)`**.
    - Reminder: <font color='dodgerblue'>only the weights and biases</font> need to be <font color='dodgerblue'>updated</font> based on the <font color='dodgerblue'>loss gradient</font>.

In [6]:
input_X1 = torch.from_numpy(input_X1_np)
target_Y2 = torch.from_numpy(target_Y2_np)

weight_W1 = torch.from_numpy(weight_W1_np).requires_grad_(requires_grad=True)
weight_W2 = torch.from_numpy(weight_W2_np).requires_grad_(requires_grad=True)

bias_B1 = torch.zeros(hidden_size, requires_grad=True)
bias_B2 = torch.zeros(output_size, requires_grad=True)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2,
               'weight_W1': weight_W1, 'input_B1': bias_B1,
               'weight_W2': weight_W2, 'input_B2': bias_B2}

print_array_specs(in_arrays=objects_ini)

input_X1:
torch.Size([2, 10]), torch.float64
tensor([[-1.4238,  1.2637, -0.8707, -0.2592, -0.0753, -0.7409, -1.3678,  0.6489,
          0.3611, -1.9529],
        [ 2.3474,  0.9685, -0.7594,  0.9022, -0.4670, -0.0607,  0.7888, -1.2567,
          0.5759,  1.3990]], dtype=torch.float64)

target_Y2:
torch.Size([2, 1]), torch.float64
tensor([[ 1.3223],
        [-0.2997]], dtype=torch.float64)

weight_W1:
torch.Size([10, 3]), torch.float64
tensor([[ 0.9029, -1.6216, -0.1582],
        [ 0.4495, -1.3436, -0.0817],
        [ 1.7247,  2.6182,  0.7774],
        [ 0.8286, -0.9590, -1.2094],
        [-1.4123,  0.5415,  0.7519],
        [-0.6588, -1.2287,  0.2576],
        [ 0.3129, -0.1308,  1.2700],
        [-0.0930, -0.0662, -1.1082],
        [ 0.1360,  1.3471,  0.0611],
        [ 0.0709,  0.4337,  0.2775]], dtype=torch.float64, requires_grad=True)

input_B1:
torch.Size([3]), torch.float32
tensor([0., 0., 0.], requires_grad=True)

weight_W2:
torch.Size([3, 1]), torch.float64
tensor([[0.5303],
   

#### Model Training

**Multiplying two matrices** (dot product/matrix multiplication):
- `torch.mm(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.mm.html
    - <font color='dodgerblue'>2-D tensors</font> as inputs

- `torch.matmul(mat1, mat2)`
    - https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul
    - <font color='dodgerblue'>more versatile</font>: matrix x matrix, matrix x vector and vector x vector operations
        - (see `broadcasting` for more info: https://www.geeksforgeeks.org/understanding-broadcasting-in-pytorch/)

Both functions are equivalent below in ***this*** particular example. 

<br>

**Element-wise Multiplication** (e.g., <font color='dodgerblue'>multiplying a float</font> and a <font color='dodgerblue'>matrix</font>):
- `torch.mul(input, other)`
    - https://pytorch.org/docs/stable/generated/torch.mul.html
    - `input`: tensor
    - `other`: tensor or number

- Could also use `*`

Both functions are demonstrated below. 

<br>

**Further Explanations**
- `activation = torch.nn.ReLU()`: specify a <font color='dodgerblue'>**callable**</font> for the <font color='dodgerblue'>ReLU</font> activation function
    - https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

<br>

- `torch.autograd.backward` (i.e., `loss.backward()`):
    - https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html
    - https://www.geeksforgeeks.org/python-pytorch-backward-function/
    - a major <font color='dodgerblue'>**workhorse**</font> in PyTorch
    - computes the **gradient** (<font color='dodgerblue'>in the backward pass</font>) in the **entire neural network** for objects that have **`requires_grad=True`**

<br>

- `with torch.no_grad()`: required because the weights and biases require grad
    - https://pytorch.org/docs/stable/generated/torch.no_grad.html
    - <font color='dodgerblue'>Reduce memory consumption</font> for computations versus those that `requires_grad=True` 
    - If you tried to assign `weight_W1`, `bias_B1`, `weight_W2` and `bias_B2` without this `with torch.no_grad()` you would obtain the following error:
        - `RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.`

<br>

- `torch.Tensor.zero_`: fills a given tensor with zeros
    - https://pytorch.org/docs/stable/generated/torch.Tensor.zero_.html
    - If this was **not done**, the gradients <font color='dodgerblue'>would be accumulated</font> during `.backwards()`, which would not be correct for each <font color='dodgerblue'>forward pass evaluation</font>
    - The **`_`** indicates an **`inplace`** operation (like what we know from Pandas)

In [7]:
for epoch in range(num_epochs):
    # Forward pass
    X2 = torch.mm(input_X1, weight_W1) + bias_B1
    
    # activation = torch.nn.LeakyReLU(0.1)
    activation = torch.nn.ReLU()
    Y1 = activation(X2)
    
    output_Y2 = torch.matmul(Y1, weight_W2) + bias_B2

    loss = torch.mean(torch.square(torch.subtract(output_Y2, target_Y2))) # mean( (Y2-y_target)^2 )

    # Backward pass
    loss.backward()

    # Optimization: update weights and biases
    with torch.no_grad():
        weight_W1 -= torch.mul(learning_rate, weight_W1.grad)
        bias_B1 -= torch.mul(learning_rate, bias_B1.grad)
        weight_W2 -= learning_rate * weight_W2.grad
        bias_B2 -= learning_rate * bias_B2.grad

        # Reset the gradients to zero
        weight_W1.grad.zero_()
        bias_B1.grad.zero_()
        weight_W2.grad.zero_()
        bias_B2.grad.zero_()

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

    # objects_ini = {'weight_W1': weight_W1, 'bias_B1': bias_B1,
    #            'weight_W2': weight_W2, 'bias_B2': bias_B2}
    # print()
    # print_array_specs(in_arrays=objects_ini)

Epoch 1: Loss = 3.320
Epoch 2: Loss = 3.225
Epoch 3: Loss = 3.135
Epoch 4: Loss = 3.049
Epoch 5: Loss = 2.968
Epoch 6: Loss = 2.890
Epoch 7: Loss = 2.817
Epoch 8: Loss = 2.746
Epoch 9: Loss = 2.679
Epoch 10: Loss = 2.615
Epoch 11: Loss = 2.554
Epoch 12: Loss = 2.495
Epoch 13: Loss = 2.439
Epoch 14: Loss = 2.385
Epoch 15: Loss = 2.334
Epoch 16: Loss = 2.285
Epoch 17: Loss = 2.238
Epoch 18: Loss = 2.193
Epoch 19: Loss = 2.149
Epoch 20: Loss = 2.107
Epoch 21: Loss = 2.067
Epoch 22: Loss = 2.032
Epoch 23: Loss = 2.007
Epoch 24: Loss = 1.982
Epoch 25: Loss = 1.958
Epoch 26: Loss = 1.935
Epoch 27: Loss = 1.912
Epoch 28: Loss = 1.890
Epoch 29: Loss = 1.869
Epoch 30: Loss = 1.848
Epoch 31: Loss = 1.828
Epoch 32: Loss = 1.809
Epoch 33: Loss = 1.790
Epoch 34: Loss = 1.771
Epoch 35: Loss = 1.754
Epoch 36: Loss = 1.736
Epoch 37: Loss = 1.719
Epoch 38: Loss = 1.703
Epoch 39: Loss = 1.687
Epoch 40: Loss = 1.671
Epoch 41: Loss = 1.656
Epoch 42: Loss = 1.641
Epoch 43: Loss = 1.627
Epoch 44: Loss = 1.6

In [8]:
## Adding arrays
display(input_X1_np)

weight_example = np.full((10), 10)

input_X1_np + weight_example

array([[-1.42382504,  1.26372846, -0.87066174, -0.25917323, -0.07534331,
        -0.74088465, -1.3677927 ,  0.6488928 ,  0.36105811, -1.95286306],
       [ 2.34740965,  0.96849691, -0.75938718,  0.90219827, -0.46695317,
        -0.06068952,  0.78884434, -1.25666813,  0.57585751,  1.39897899]])

array([[ 8.57617496, 11.26372846,  9.12933826,  9.74082677,  9.92465669,
         9.25911535,  8.6322073 , 10.6488928 , 10.36105811,  8.04713694],
       [12.34740965, 10.96849691,  9.24061282, 10.90219827,  9.53304683,
         9.93931048, 10.78884434,  8.74333187, 10.57585751, 11.39897899]])

#### Summary of Basic Example:
- <font color='dodgerblue'>Tensor creation</font>: Using PyTorch's `from_numpy()` and `zeros()`
- <font color='dodgerblue'>Autograd</font>: Utilizing `requires_grad_()` for automatic differentiation
- Matrix operations: <font color='dodgerblue'>Matrix multiplication</font> (`torch.mm` and `torch.matmul`).
- <font color='dodgerblue'>Activation functions</font>: Implementing a **ReLU** activation function
- <font color='dodgerblue'>Gradients</font>: All computed in **one function call** of `backward()`
- <font color='dodgerblue'>Loss function</font>: Calculating **mean squared error loss**
- <font color='dodgerblue'>Optimization</font>: Performing **manual gradient descent**
- <font color='dodgerblue'>Reset</font> the weight and bias <font color='dodgerblue'>gradients</font>: PyTorch's `.grad.zero_()`

<hr style="border:2px solid gray"></hr>

## Advance Example

Create the same neural network, but now make it even better (readable, K.I.S.S., reusable) using PyTorch:

- uses `torch.nn`: **modules/functions** for **building** a **neural networks**
    - https://pytorch.org/docs/stable/nn.html

<br>

- uses a class
    - the NN is defined as a subclass of **`nn.Module`**: the <font color='dodgerblue'>base class</font> for all <font color='dodgerblue'>neural network modules</font>
        - https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
        - Enables **easier organization** and **management** of **layers** and **parameters**
    - classes are basically a <font color='dodgerblue'>blueprint</font> that can be <font color='dodgerblue'>reused</font>
        - contains a collection of related functions
        - **Personal Opinon**: they are **often unnecessary** - must have a good reason to implement

<br>

- `torch.nn.Linear`: applies a <font color='dodgerblue'>linear transformation</font> to the <font color='dodgerblue'>incoming data</font>
    - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
    - below, `fc1` and `fc2` represent **"<font color='dodgerblue'>f</font>ully <font color='dodgerblue'>c</font>onnected"** <font color='dodgerblue'>layers</font> <font color='dodgerblue'>**1**</font> and <font color='dodgerblue'>**2**</font>
    - **weights** and **biases** are <font color='dodgerblue'>**automatically initialized**</font>

<br>

- `torch.nn.ReLU`: **ReLU** activation function
    -  https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

<br>

- use a **built-in optimizer**

#### Define the neural network

In [9]:
class Net(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()

        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, output_size)
        self.ReLU = torch.nn.ReLU()


    def forward(self, x):
        x = self.fc1(x)
        x = self.ReLU(x)
        x = self.fc2(x)

        return x

#### Revisiting the toy data
Some of PyTorch's functions require the numbers to be **`float32`** (GPUs are optimized for these). Our above **`input_X1`** and **`input_Y2`** tensors have numbers that are **`float64`**.

- `to(torch.float32)`: changes the tensor item's **type** (`dtype`)

Alter the existing data type:

In [10]:
input_X1 = input_X1.to(torch.float32)
target_Y2 = target_Y2.to(torch.float32)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2}
print_array_specs(in_arrays=objects_ini)

input_X1:
torch.Size([2, 10]), torch.float32
tensor([[-1.4238,  1.2637, -0.8707, -0.2592, -0.0753, -0.7409, -1.3678,  0.6489,
          0.3611, -1.9529],
        [ 2.3474,  0.9685, -0.7594,  0.9022, -0.4670, -0.0607,  0.7888, -1.2567,
          0.5759,  1.3990]])

target_Y2:
torch.Size([2, 1]), torch.float32
tensor([[ 1.3223],
        [-0.2997]])



#### Model, Loss and Optimizer
- create the <font color='dodgerblue'>NN model</font>


<br>

- define the **optimizing function** (i.e., `optim.SGD`) for adjusting the **weights** and **biases**
    - Optimization overview: https://pytorch.org/docs/stable/optim.html#module-torch.optim
    - **Available algorithms**: https://pytorch.org/docs/stable/optim.html#algorithms
        - **gradient decent**: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
     
<br>

- define the **loss function** to use
    - `torch.nn.MSELoss`: <font color='dodgerblue'>mean squared error</font> (a.k.a., Loss2; L2)
        - https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss

<hr style="border:2px solid gray"></hr>

##### Sidenote: Coding concept concerning the assignment of a function to a variable/object

<br>

For example: `loss_function = torch.nn.MSELoss()` that is given in the next code cell

<br>

Why do this?

- Quickly and easily change an overall code's behavior: **reassign** the **variable** to a **different function**

    - <font color='dodgerblue'>explore different ideas</font>

- **Abstraction**: abstract away the specific implementation details
    - Idea: <font color='dodgerblue'>**Focus** on the **what**, **not** the **how**</font>
        - more <font color='dodgerblue'>readable</font>
        - easier to understand **concepts** (e.g., the <font color='dodgerblue'>science</font>) - don't get lost in the details
        - easier to <font color='dodgerblue'>maintain</font>

    - Related terms:
        - <font color='dodgerblue'>encapsulation</font>: **grouping data** (information) and the **methods** (functions) that are **related** within a single unit (e.g. a class)
        - <font color='dodgerblue'>modularity/decomposition</font>: **breaking down** a **large program** into **smaller**, **independent** components (e.g., **functions**)

<hr style="border:2px solid gray"></hr>

In [11]:
model = Net(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

optimizer = optim.SGD(params=model.parameters(), lr=learning_rate)

loss_function = torch.nn.MSELoss()

### Model Training

- `zero_grad()`: **set/reset** the **gradients** of all **optimized tensors** (i.e, for the **weights** and **biases**)
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
    - this is the same concept as above when we used `torch.Tensor.zero` in the basic example
        - this is necessary since <font color='dodgerblue'>`.backward()` accumulates the gradients</font> **each time** it is **called**

<br>

- `torch.optim.Optimizer.step`: perform an **optimization step** based on the **current gradients** (stored in `.grad`), which is coming from **`.backward()`** 
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html

In [12]:
for epoch in range(num_epochs):
    # Forward pass
    output_Y2 = model(input_X1)

    loss = loss_function(output_Y2, target_Y2)

    # Backward pass
    optimizer.zero_grad()

    loss.backward()

    # Optimization: update weights and biases
    optimizer.step()
    
    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

Epoch 1: Loss = 0.805
Epoch 2: Loss = 0.802
Epoch 3: Loss = 0.799
Epoch 4: Loss = 0.797
Epoch 5: Loss = 0.794
Epoch 6: Loss = 0.792
Epoch 7: Loss = 0.789
Epoch 8: Loss = 0.787
Epoch 9: Loss = 0.784
Epoch 10: Loss = 0.782
Epoch 11: Loss = 0.779
Epoch 12: Loss = 0.777
Epoch 13: Loss = 0.774
Epoch 14: Loss = 0.772
Epoch 15: Loss = 0.769
Epoch 16: Loss = 0.767
Epoch 17: Loss = 0.764
Epoch 18: Loss = 0.762
Epoch 19: Loss = 0.759
Epoch 20: Loss = 0.757
Epoch 21: Loss = 0.754
Epoch 22: Loss = 0.752
Epoch 23: Loss = 0.749
Epoch 24: Loss = 0.747
Epoch 25: Loss = 0.744
Epoch 26: Loss = 0.742
Epoch 27: Loss = 0.739
Epoch 28: Loss = 0.737
Epoch 29: Loss = 0.735
Epoch 30: Loss = 0.732
Epoch 31: Loss = 0.730
Epoch 32: Loss = 0.727
Epoch 33: Loss = 0.725
Epoch 34: Loss = 0.722
Epoch 35: Loss = 0.720
Epoch 36: Loss = 0.718
Epoch 37: Loss = 0.715
Epoch 38: Loss = 0.713
Epoch 39: Loss = 0.710
Epoch 40: Loss = 0.708
Epoch 41: Loss = 0.706
Epoch 42: Loss = 0.703
Epoch 43: Loss = 0.701
Epoch 44: Loss = 0.6

  return F.mse_loss(input, target, reduction=self.reduction)


In [13]:
model.state_dict()

OrderedDict([('fc1.weight',
              tensor([[ 0.1087, -0.0106, -0.1440,  0.0698, -0.3013, -0.2559,  0.0785,  0.1367,
                       -0.0565, -0.1980],
                      [ 0.2036,  0.2056, -0.2069, -0.2334, -0.1308, -0.2825,  0.2414, -0.0818,
                       -0.1652,  0.1382],
                      [-0.0404, -0.1710,  0.0843, -0.1609, -0.0976, -0.0028, -0.1329, -0.0250,
                       -0.0731,  0.2081]])),
             ('fc1.bias', tensor([-0.0226,  0.0600,  0.2099])),
             ('fc2.weight',
              tensor([[ 0.3153, -0.5392,  0.0384],
                      [ 0.2865, -0.1018,  0.4779]])),
             ('fc2.bias', tensor([-0.0148,  0.1339]))])

In [14]:
# Final outputs, weights and biases
print(f'\nFinal Output: \n {output_Y2}\n')
objects_ini = model.state_dict()

print_array_specs(in_arrays=objects_ini)


Final Output: 
 tensor([[ 0.1333,  0.2687],
        [-0.5959,  0.0544]], grad_fn=<AddmmBackward0>)

fc1.weight:
torch.Size([3, 10]), torch.float32
tensor([[ 0.1087, -0.0106, -0.1440,  0.0698, -0.3013, -0.2559,  0.0785,  0.1367,
         -0.0565, -0.1980],
        [ 0.2036,  0.2056, -0.2069, -0.2334, -0.1308, -0.2825,  0.2414, -0.0818,
         -0.1652,  0.1382],
        [-0.0404, -0.1710,  0.0843, -0.1609, -0.0976, -0.0028, -0.1329, -0.0250,
         -0.0731,  0.2081]])

fc1.bias:
torch.Size([3]), torch.float32
tensor([-0.0226,  0.0600,  0.2099])

fc2.weight:
torch.Size([2, 3]), torch.float32
tensor([[ 0.3153, -0.5392,  0.0384],
        [ 0.2865, -0.1018,  0.4779]])

fc2.bias:
torch.Size([2]), torch.float32
tensor([-0.0148,  0.1339])



### Creating a customized activation function

In [15]:
class modified_relu(torch.nn.Module):
    ''' Modified ReLU activation function.

        Class that implements a modified ReLU function that adds
        1.0 to the input. 
        
        Attributes:
            input (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: modified ReLU activation output
    ''' 
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the modified_relu activation function.

        Args:
            input (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output tensor after applying the modified
            ReLU activation.
        """
        if not isinstance(input, torch.Tensor):
            raise TypeError("Input must be a torch.Tensor") 
        else:
            mod_relu = torch.maximum(input+1.0, torch.zeros_like(input))
    
            ## Comment out to see how mod_relu operates on the input
            # print(input, "\n", mod_relu)
            return mod_relu

In [16]:
class modified_nn(torch.nn.Module):
    """ Modified Neural Network.

        This class defines a simple feedforward neural network with 
        one hidden layer and uses the ModifiedReLU activation function.

        Attributes:
            input_size (int): Size of the input layer.
            hidden_size (int): Size of the hidden layer.
            output_size (int): Size of the output layer.
    """
    def __init__(self, input_size, hidden_size, output_size):
        super(modified_nn, self).__init__()
        
        if not all(isinstance(param, int) for param in [input_size, hidden_size, output_size]):
            raise TypeError("All input parameters must be an integer")
        else:
            self.fc1 = torch.nn.Linear(input_size, hidden_size)
            self.fc2 = torch.nn.Linear(hidden_size, output_size)
            self.modified_relu = modified_relu()


    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """ Forward pass of the ModifiedNN.

            Args:
                x: Input data tensor (i.e., features)

            Returns:
                torch.Tensor: Output data tensor after neural network forward pass
        """
        if not isinstance(x, torch.Tensor):
            raise TypeError("Input must be a torch.Tensor")
        else:
            x = self.fc1(x)
            x = self.modified_relu(x)
            x = self.fc2(x)

            return x

1. Create a new model using the modified_nn architecture
2. pass it to the gradient descent optimizer

In [17]:
new_model = modified_nn(input_size=input_size, hidden_size=hidden_size, output_size=output_size)
optimizer = optim.SGD(params=new_model.parameters(), lr=learning_rate)

In [18]:
for epoch in range(num_epochs):
    output_Y2 = new_model(input_X1)

    loss = loss_function(output_Y2, target_Y2)
    
    optimizer.zero_grad()

    loss.backward()

    optimizer.step()
    
    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')
    
print(f'\nFinal Output: \n {output_Y2}\n')

objects_ini = model.state_dict()
print_array_specs(in_arrays=objects_ini)

Epoch 1: Loss = 1.706
Epoch 2: Loss = 1.679
Epoch 3: Loss = 1.652
Epoch 4: Loss = 1.626
Epoch 5: Loss = 1.600
Epoch 6: Loss = 1.575
Epoch 7: Loss = 1.550
Epoch 8: Loss = 1.526
Epoch 9: Loss = 1.502
Epoch 10: Loss = 1.479
Epoch 11: Loss = 1.456
Epoch 12: Loss = 1.433
Epoch 13: Loss = 1.410
Epoch 14: Loss = 1.388
Epoch 15: Loss = 1.367
Epoch 16: Loss = 1.346
Epoch 17: Loss = 1.325
Epoch 18: Loss = 1.304
Epoch 19: Loss = 1.284
Epoch 20: Loss = 1.264
Epoch 21: Loss = 1.245
Epoch 22: Loss = 1.225
Epoch 23: Loss = 1.206
Epoch 24: Loss = 1.188
Epoch 25: Loss = 1.169
Epoch 26: Loss = 1.151
Epoch 27: Loss = 1.134
Epoch 28: Loss = 1.116
Epoch 29: Loss = 1.099
Epoch 30: Loss = 1.082
Epoch 31: Loss = 1.065
Epoch 32: Loss = 1.049
Epoch 33: Loss = 1.033
Epoch 34: Loss = 1.017
Epoch 35: Loss = 1.001
Epoch 36: Loss = 0.986
Epoch 37: Loss = 0.971
Epoch 38: Loss = 0.956
Epoch 39: Loss = 0.941
Epoch 40: Loss = 0.927
Epoch 41: Loss = 0.912
Epoch 42: Loss = 0.898
Epoch 43: Loss = 0.884
Epoch 44: Loss = 0.8

Notice the shapes of the weights - they are not yet transposed as done in the above basic example.

Once a neural network model has been trained, it is ready to be used (i.e., to make predictions).

Reusability: how can others use the trained model? What is required is:
1. the neural network architecture
    - number of layers and number of nodes in each layer
    - how the nodes are connected
    - the activation functions and their placement within the network
3. the optimized parameters
    - the optimized weights
    - the optimized biases
4. other parameters (i.e., called hyperparameters)
    - learning rate
    - optimization cutoff thresholds or maximum number of epochs

   
<hr style="border:2px solid gray"></hr>

#### Summary of Advance Example:
- A <font color='dodgerblue'>class</font> (like a blueprint) and <font color='dodgerblue'>`nn.Module`</font>: a structured PyTorch approach for **defining a neural network**
    - e.g., architecture, activation functions
    - allows for easy/better organization and code reusability
- Built-in <font color='dodgerblue'>Activation</font>: `torch.nn.ReLU`
- Built-in <font color='dodgerblue'>Loss</font>: `torch.nn.MSELoss` for mean squared loss (i.e., Loss2; L2)
- All <font color='dodgerblue'>gradients</font> needed in backward propagation done using `autograd.backwards()`
- Built-in <font color='dodgerblue'>Optimizer</font>: `optim.SGD` for gradient descent and usage of `.step()`

In [19]:
from torchmetrics.functional.regression import r2_score
from torcheval.metrics import R2Score
metric = R2Score()
metric.update(input, target)
metric.compute()

ModuleNotFoundError: No module named 'torchmetrics'