<div class="alert block alert-info alert">

# <center> Scientific Programming in Python

## <center>Karl N. Kirschner<br>Bonn-Rhein-Sieg University of Applied Sciences<br>Sankt Augustin, Germany

# <center> PyTorch: Simple Neural Network Example

## <center>  with a Perceptron

<hr style="border:2px solid gray"></hr>

In [6]:
import matplotlib # for the version
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.optim as optim

In [7]:
print(f'matplotlib v.: {matplotlib.__version__}')
print(f'NumPy v.: {np.__version__}')
print(f'Torch v.: {torch.__version__}')

matplotlib v.: 3.9.2
NumPy v.: 1.24.3
Torch v.: 2.5.1


This lecture will parallel the perceptron example written using NumPy, allowing you to compare the approaches directly.

<br>
<center><img src="00_images/31_machine_learning/nn_perceptron_example_nodes.png" alt="nn_percepton" style="width: 500px;"/></center>

<center><img src="00_images/31_machine_learning/nn_perceptron_example.png" alt="nn_percepton" style="width: 1000px;"/></center>

<br>

#### Terminology for describing a neural network:
- <font color='dodgerblue'>**Width**</font>: number of nodes in a specific layer
- <font color='dodgerblue'>**Depth**</font>: number of layers in a neural network
- <font color='dodgerblue'>**Architecture**</font>: specific arrangement of the layers and nodes within the network, and their connectivity.

<br>

<hr style="border:2px solid gray"></hr>

#### Normalization

With <font color='dodgerblue'>real-world data</font>, one should often <font color='dodgerblue'>normalize</font> (e.g., **transform** the data to a range [0, 1]) or <font color='dodgerblue'>scale</font> the <font color='dodgerblue'>input data</font>. This helps the mathematics when different input features have **large magnitude differences** (e.g., 1.5 and 2.5e6).
- https://en.wikipedia.org/wiki/Normalization_(statistics)
- `sklearn.preprocessing.normalize`: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.normalize.html

The goal is to ensure that no single feature dominates the others due to its magnitude.

**Normalizing** transforms the data to a standard scale, typically between 0 and 1.
- adjust each feature's values based on their minimum and maximum values.
- mathematically, there are multiple approaches for this
    - <font color='dodgerblue'>Minimum-Maximum</font> (a.k.a. rescaling): $x' = \frac{x - x_{min}}{x_{max} - x_{min}}$
    - <font color='dodgerblue'>Absolute Maximum</font>: $x' = \frac{x}{\max(|x|)}$
    - <font color='dodgerblue'>Mean</font>: $x' = \frac{x - \bar{x}}{x_{max} - x_{min}}$ centers the data about the mean, with values within [-1, 1].
    - <font color='dodgerblue'>Z-score</font> (a.k.a. Standardization): $x' = \frac{x - \bar{x}}{\sigma}$ ($\sigma$ is the standard deviation); well suited when the original data follows a normal distribution
    - <font color='dodgerblue'>Log</font>: used to reduce the effects of extreme values

<br>

**Sources**:
- https://www.geeksforgeeks.org/normalization-and-scaling/
- https://en.wikipedia.org/wiki/Feature_scaling
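
The formulas listed above can be sketched directly in NumPy. The array `x` and the variable names below are illustrative assumptions, not part of the lecture's data; the point is only to show how each approach changes the range of the values.

```python
import numpy as np

# Hypothetical feature column (illustrative values only)
x = np.array([100.0, 110.0, 120.0, 130.0, 140.0])

min_max = (x - x.min()) / (x.max() - x.min())      # range [0, 1]
abs_max = x / np.max(np.abs(x))                    # largest magnitude becomes 1
mean_norm = (x - x.mean()) / (x.max() - x.min())   # centered on 0
z_score = (x - x.mean()) / x.std()                 # mean 0, standard deviation 1

results = {'min-max': min_max, 'abs-max': abs_max,
           'mean': mean_norm, 'z-score': z_score}

for name, values in results.items():
    print(f'{name:>8}: {np.round(values, 3)}')
```

Note that `x.std()` uses the population standard deviation (NumPy's default, `ddof=0`); a sample standard deviation would give slightly different z-scores.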

In [None]:
from sklearn import preprocessing

Generate some data that we can normalize/scale:

In [None]:
example_data = np.linspace(100.0, 140.0, 5)
print(example_data.shape)
example_data

This is a 1D array. However, `preprocessing.normalize` requires 2D arrays.

`reshape` the array:
- https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
    - `-1`: the value is inferred from the length of the array and remaining dimensions.

In [None]:
example_data = example_data.reshape(-1, 1)
print(example_data.shape)
example_data

In [None]:
example_data_norm = preprocessing.normalize(example_data, norm='max', axis=0)
print(example_data_norm.shape)
example_data_norm

We can now
- reproduce what `preprocessing.normalize` does manually, and
- reverse the data back to the unnormalized values

In [None]:
print(example_data/np.max(example_data))
print()
print(example_data_norm*np.max(example_data))

<hr style="border:2px solid gray"></hr>

Create a helper function that allows us to investigate the different arrays that are used below:

In [None]:
def print_array_specs(in_arrays: dict):
    ''' Helper function for nicely printing NumPy and
        PyTorch arrays.

        Print: shape, data type and values.
    '''
    for key, value in in_arrays.items():
        print(f'{key}:\n{value.shape}, {value.dtype}')
        print(f'{value}\n')

## Basic PyTorch Example (showing some of the details)

#### Define the toy data (input values, target values and initial weights):

##### A reminder from the NumPy lecture

A random **seed** will be **explicitly set**, allowing for **reproducible results** (i.e., for teaching purposes). The first epoch data generated below should correspond to the numeric values given in the figure above.

The object naming will also be done to parallel the figure above.

Random Number Generator in NumPy:
- `np.random.default_rng`: https://numpy.org/doc/stable/reference/random/generator.html
- `numpy.random.Generator.normal`: https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html

<font color='dodgerblue'>Should the data be normalized?</font>

In this example, normalization is not needed: all of the toy data are drawn from the same distribution using NumPy's random number generator, so every feature has a similar magnitude.

- `input_X1_np = rng.normal(size=(2, 10))`
    - `2`: **number** of <font color='dodgerblue'>**input**</font> data **samples** (e.g., 2 houses)
    - `10`: **number of features** (width) that describe each sample (e.g., number of rooms, size, etc.)

<br>

- `target_Y2_np = rng.normal(size=(2, 1))`
    - `2`: **number** of <font color='dodgerblue'>**output** data **samples**</font> (i.e., **every input sample needs an output**)
    - `1`: number of **predicted features** (width) (e.g., house price)

In [None]:
rng = np.random.default_rng(seed=12345)

input_X1_np = rng.normal(size=(2, 10))
target_Y2_np = rng.normal(size=(2, 1))

weight_W1_np = rng.normal(size=(10, 3))
weight_W2_np = rng.normal(size=(3, 1))

Examine the different NumPy arrays:
- shapes (important for matrix multiplication)
- data types (need to be same types)
- values

In [None]:
objects_ini = {'input_X1': input_X1_np, 'target_Y2': target_Y2_np,
               'weight_W1': weight_W1_np, 'weight_W2': weight_W2_np}

print_array_specs(in_arrays=objects_ini)
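
Why the shapes matter can be checked with a short sketch. It regenerates arrays with the same shapes as above (the values will differ from the lecture's, because the arrays are drawn in a different order) and chains the two matrix multiplications; activation and biases are left out on purpose.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)

input_X1_np = rng.normal(size=(2, 10))   # 2 samples, 10 features
weight_W1_np = rng.normal(size=(10, 3))  # 10 inputs -> 3 hidden nodes
weight_W2_np = rng.normal(size=(3, 1))   # 3 hidden nodes -> 1 output

# Inner dimensions must match: (2, 10) @ (10, 3) -> (2, 3)
hidden = input_X1_np @ weight_W1_np
# (2, 3) @ (3, 1) -> (2, 1): one prediction per sample
output = hidden @ weight_W2_np

print(hidden.shape, output.shape)
```

The final shape `(2, 1)` matches the target array, which is what allows a loss to be computed sample by sample.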

#### Initialize important parameters

**Neural Network Specification**
- <font color='dodgerblue'>input_width</font>: how many **features** (i.e., nodes) are in each **data sample**  within the **input layer**
    - 10 features that describe the input data 
- <font color='dodgerblue'>hidden_width</font>: how many **learned features** are within the **hidden layer**
    - 3 learned features
- <font color='dodgerblue'>output_width</font>: how many **features** are within the **output layer**
    - 1 feature that is predicted from the 10 input features

**Training Parameters**
- <font color='dodgerblue'>learning_rate</font>: **step size** for **gradient descent**
- <font color='dodgerblue'>num_epochs</font>: how many **training epochs** to **run** (instead of having a convergence cutoff criterion)

In [None]:
input_width = 10
hidden_width = 3
output_width = 1

learning_rate = 1e-3
num_epochs = 50

### Now Focus on PyTorch

##### Prepare data

- The <font color='dodgerblue'>NumPy-generated input</font> arrays need to be <font color='dodgerblue'>converted to torch tensors</font> using **`torch.from_numpy()`**.

- <font color='dodgerblue'>Including biases</font> - these will be used in the <font color='dodgerblue'>linear transform</font> (e.g., **`torch.matmul(input_X1, weight_W1) + bias_B1`**).

- Care must be taken to tell autograd to <font color='dodgerblue'>record the operations</font> for the <font color='dodgerblue'>weights and biases</font> (i.e., the **calculation history** that **`torch.autograd.backward()`** later uses), using **`requires_grad_(requires_grad=True)`**.
    - Reminder: <font color='dodgerblue'>only the weights and biases</font> need to be <font color='dodgerblue'>updated</font>, which is done based on the <font color='dodgerblue'>loss gradient</font>.

In [None]:
input_X1 = torch.from_numpy(input_X1_np)
target_Y2 = torch.from_numpy(target_Y2_np)

weight_W1 = torch.from_numpy(weight_W1_np).requires_grad_(requires_grad=True)
weight_W2 = torch.from_numpy(weight_W2_np).requires_grad_(requires_grad=True)

bias_B1 = torch.zeros(hidden_width, requires_grad=True)
bias_B2 = torch.zeros(output_width, requires_grad=True)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2,
               'weight_W1': weight_W1, 'bias_B1': bias_B1,
               'weight_W2': weight_W2, 'bias_B2': bias_B2}

print_array_specs(in_arrays=objects_ini)
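
Two details of the conversion are easy to miss: `torch.from_numpy()` keeps NumPy's `float64` and shares memory with the source array, while `torch.zeros()` defaults to `float32`. The small sketch below uses made-up arrays (the names are illustrative, and it assumes PyTorch's default dtype has not been changed).

```python
import numpy as np
import torch

array_np = np.array([[1.0, 2.0], [3.0, 4.0]])   # NumPy default: float64
tensor = torch.from_numpy(array_np)              # keeps float64, shares memory
bias = torch.zeros(2)                            # PyTorch default: float32

print(tensor.dtype, bias.dtype)

# Changing the NumPy array also changes the tensor (shared memory)
array_np[0, 0] = -1.0
print(tensor[0, 0].item())

# Mixed dtypes are promoted to the wider type in element-wise addition
summed = tensor + bias
print(summed.dtype)
```

This is why the addition of a `float32` bias to a `float64` matrix product works in the basic example: the result is promoted to `float64`.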

#### Model Training

**Multiplying two matrices** (dot product/matrix multiplication):
- `torch.matmul(mat1, mat2)`
    - <font color='dodgerblue'>versatile</font>: $[matrix]\times[matrix]$, $[matrix]\times(vector)$, and $(vector)\times(vector)$ operations
        - (advanced: see `broadcasting` for more info - https://www.geeksforgeeks.org/understanding-broadcasting-in-pytorch)
    - https://pytorch.org/docs/stable/generated/torch.matmul.html#torch.matmul

<br>

**Element-wise Multiplication** (e.g., <font color='dodgerblue'>multiplying a float</font> and a <font color='dodgerblue'>matrix</font>):
- `torch.mul(input, other)`
    - `input`: tensor
    - `other`: tensor or number
    - https://pytorch.org/docs/stable/generated/torch.mul.html

- Could also use `*`

Both functions are demonstrated below. 
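
As a quick standalone illustration of the difference, the sketch below applies both operations to two small, made-up matrices (`A` and `B` are hypothetical and not part of the network).

```python
import torch

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

matrix_product = torch.matmul(A, B)   # rows of A dotted with columns of B
elementwise = torch.mul(A, B)         # entry-by-entry product, same as A * B
scaled = torch.mul(A, 0.5)            # tensor times a plain number

print(matrix_product)
print(elementwise)
print(scaled)
```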

<br>

**Further Explanations**
- `activation = torch.nn.ReLU()`:
    - specify a <font color='dodgerblue'>**callable object**</font> (i.e., `activation`) for the <font color='dodgerblue'>ReLU</font> activation function
    - https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html

<br>

- `torch.autograd.backward`:
    - a major <font color='dodgerblue'>**workhorse**</font> in PyTorch
    - computes the **gradient** (<font color='dodgerblue'>during the backward pass</font>) in the **entire neural network** for objects that have **`requires_grad=True`**
    - https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html
    - https://www.geeksforgeeks.org/python-pytorch-backward-function

<br>

- `with torch.no_grad()`:
    - required because the weights and biases require grad
    - <font color='dodgerblue'>Reduce memory consumption</font> for computations versus those that have `requires_grad=True` 
    - If you tried to assign `weight_W1`, `bias_B1`, `weight_W2` and `bias_B2` without using `with torch.no_grad()`, the following error would occur:
        - `RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.`
    - https://pytorch.org/docs/stable/generated/torch.no_grad.html

<br>

- `.grad.zero_()` (i.e., `torch.Tensor.zero_()` applied to a tensor's `.grad`):
    - fills a tensor with zeros
    - If this was **not done**, the gradients <font color='dodgerblue'>would be accumulated</font> during `.backward()`
        - <font color='dodgerblue'>Why would this be incorrect?</font> The **previous** iterations' gradients would be **added** to the **current** computed **gradients**.
    - The **`_`** indicates an **`inplace`** operation (like what we know from Pandas)
    - https://pytorch.org/docs/stable/generated/torch.Tensor.zero_.html
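
The accumulation behavior can be seen in isolation with a single parameter. This is a minimal sketch with a made-up tensor `w` and the loss $w^2$ (gradient $2w$), not the network's actual weights.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4
loss = (w ** 2).sum()
loss.backward()
first_grad = w.grad.clone()

# Second backward pass without a reset: the new gradient is added
loss = (w ** 2).sum()
loss.backward()
accumulated_grad = w.grad.clone()

# Reset in place, then compute again
w.grad.zero_()
loss = (w ** 2).sum()
loss.backward()
reset_grad = w.grad.clone()

print(first_grad, accumulated_grad, reset_grad)
```

Without the reset, the second gradient is twice the correct value, so a gradient descent step would be twice as large as intended.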

In [None]:
# Create the activation function once, outside of the training loop
activation = torch.nn.ReLU()

for epoch in range(num_epochs):
    # Forward pass
    X2 = torch.matmul(input_X1, weight_W1) + bias_B1

    Y1 = activation(X2)

    output_Y2 = torch.matmul(Y1, weight_W2) + bias_B2

    loss = torch.mean(torch.square(torch.subtract(output_Y2, target_Y2))) # mean( (Y2 - y_target)^2 )

    # Backward pass
    loss.backward()

    # Optimization: update weights and biases, don't record operations
    with torch.no_grad():
        weight_W1 -= torch.mul(learning_rate, weight_W1.grad)
        bias_B1 -= torch.mul(learning_rate, bias_B1.grad)
        weight_W2 -= learning_rate * weight_W2.grad
        bias_B2 -= learning_rate * bias_B2.grad

        # Reset the gradients to zero
        weight_W1.grad.zero_()
        bias_B1.grad.zero_()
        weight_W2.grad.zero_()
        bias_B2.grad.zero_()

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

    # objects_ini = {'weight_W1': weight_W1, 'bias_B1': bias_B1,
    #            'weight_W2': weight_W2, 'bias_B2': bias_B2}
    # print()
    # print_array_specs(in_arrays=objects_ini)
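
The role of `torch.no_grad()` in the update step can also be checked in isolation. The sketch below uses a made-up two-element tensor `w` and an arbitrary learning rate; it first attempts the in-place update without the context manager, then repeats it inside `torch.no_grad()`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()            # w.grad = 2w = [2, 4]

learning_rate = 0.1        # arbitrary value for this illustration

# Outside torch.no_grad(): an in-place update of a leaf tensor is rejected
try:
    w -= learning_rate * w.grad
    error_raised = False
except RuntimeError as error:
    error_raised = True
    print(f'RuntimeError: {error}')

# Inside torch.no_grad(): the update is applied and not recorded
with torch.no_grad():
    w -= learning_rate * w.grad

print(w)
```

After the update, `w` still has `requires_grad=True`, so it continues to be tracked in the next forward pass.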

#### Summary of Basic Example:
- <font color='dodgerblue'>Tensor creation</font>: Using PyTorch's `from_numpy()` and `zeros()`
- <font color='dodgerblue'>backward (autograd)</font>: Will know what differentiation to compute based on objects with `requires_grad_()`
- Matrix operations:
    - <font color='dodgerblue'>Matrix multiplication</font> - (`torch.matmul`)
    - <font color='dodgerblue'>Element-wise Multiplication</font> (e.g., multiplying a float and a matrix) - (`torch.mul`)
- <font color='dodgerblue'>Activation functions</font>: Implementing a **ReLU** activation function
- <font color='dodgerblue'>Gradients</font>: **All** computed in **one function call** of `backward()`
- <font color='dodgerblue'>Loss function</font>: Calculating **mean squared error loss** (manually encoded)
- <font color='dodgerblue'>Optimization</font>: Performing **manual gradient descent** (manually encoded)
- <font color='dodgerblue'>Reset</font> the weight and bias <font color='dodgerblue'>gradients</font>: PyTorch's `.grad.zero_()`

<hr style="border:2px solid gray"></hr>

## Advanced Example (How it is actually done)

Create the same neural network, but now make it even better (readable, K.I.S.S., reusable) using PyTorch:

**Note**: Because of the NumPy neural network **lecture** and the **above example**, we can **understand** what is happening **"under-the-hood"** in the following.

- uses `torch.nn`: **modules/functions** for **building** **neural networks**
    - https://pytorch.org/docs/stable/nn.html

<br>

- uses a <font color='dodgerblue'>**class**</font>
    - the NN is defined as a subclass of **`nn.Module`**: the <font color='dodgerblue'>base class</font> for all <font color='dodgerblue'>neural network modules</font>
        - https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module
        - Enables **easier organization** and **management** of **layers** and **parameters**
    - <font color='dodgerblue'>**classes**</font> are basically a <font color='dodgerblue'>blueprint</font> that can be <font color='dodgerblue'>reused</font>
        - contains a collection of related functions
        - **Personal Opinion**: they are **often unnecessary** - must have a good reason to implement

<br>

- `torch.nn.Linear`: applies a <font color='dodgerblue'>linear transformation</font> to the <font color='dodgerblue'>incoming data</font>
    - https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear
    - below, `fc1` and `fc2` represent **"<font color='dodgerblue'>f</font>ully <font color='dodgerblue'>c</font>onnected"** <font color='dodgerblue'>layers</font> <font color='dodgerblue'>**1**</font> and <font color='dodgerblue'>**2**</font>
    - **weights** and **biases** are <font color='dodgerblue'>**automatically initialized**</font>

<br>

- `torch.nn.ReLU`: **ReLU** activation function
    -  https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU

<br>

- use a **built-in optimizer**

#### Define the neural network

##### <font color='dodgerblue'>Classes</font>

Coding Convention:
- **PascalCase** naming convention
- type hinting
- <font color='dodgerblue'>docstrings</font>
- isinstance
- **methods** separated by single blank lines
    - <font color='dodgerblue'>methods</font> are <font color='dodgerblue'>functions</font> that <font color='dodgerblue'>**belong to a class**</font>
    - versus **functions** (e.g., user-defined functions) that are **entirely independent**

Brief description:
- `__init__` ("Constructor"): initialize the attributes (variables) of the class's (e.g., `SimpleNN`) object
- `super(SimpleNN, self).__init__()`: calls the `__init__` method (for properly initializing `SimpleNN`) of the parent class (i.e., `torch.nn.Module`).
    - `super()` is a Python built-in function: https://docs.python.org/3/library/functions.html#super
- `self.fc1` and `self.fc2`: create the fully linearly connected layers
- `self.activate_function = torch.nn.ReLU()`: creates an instance of the ReLU
- `def forward`: function for PyTorch's forward pass mechanic

In [None]:
class SimpleNN(torch.nn.Module):
    ''' This class defines a simple feedforward perceptron neural
        network with one input, one hidden and one output layer,
        making use of a ReLU activation function.

        Attributes:
            input_width (int): number of nodes in the input layer
            hidden_width (int): number of nodes in the hidden layer
            output_width (int): number of nodes in the output layer
    '''
    def __init__(self, input_width: int, hidden_width: int, output_width: int):
        ''' Initialize the attributes (i.e. variables).

            Defines fully connected layers 1 and 2's input and output width,
            and the activation function.
        '''
        if not all(isinstance(param, int) for param in [input_width, hidden_width, output_width]):
            raise TypeError("All input parameters must be an integer")
        else:
            super(SimpleNN, self).__init__()
    
            self.fc1 = torch.nn.Linear(input_width, hidden_width)
            self.fc2 = torch.nn.Linear(hidden_width, output_width)
            self.activate_function = torch.nn.ReLU()

    def forward(self, in_data: torch.Tensor) -> torch.Tensor:
        ''' Forward pass of the SimpleNN.

            Args:
                in_data: Input data tensor (i.e., feature data)

            Returns:
                forward_data: Output data tensor after neural network forward pass
        '''
        if not isinstance(in_data, torch.Tensor):
            raise TypeError("Input must be a torch.Tensor")
        else:
            forward_data = self.fc1(in_data)
            forward_data = self.activate_function(forward_data)
            forward_data = self.fc2(forward_data)

            return forward_data
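
To connect `torch.nn.Linear` with the manual `torch.matmul(...) + bias` of the basic example, the following sketch creates a single layer and reproduces its output by hand. The seed and the input `x` are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)   # arbitrary seed, only for repeatable output

fc1 = torch.nn.Linear(in_features=10, out_features=3)

# Parameters are created automatically; the weight is stored as (out, in)
print(fc1.weight.shape, fc1.bias.shape)

x = torch.randn(2, 10)  # 2 samples, 10 features

# The layer computes x @ W^T + b
manual = torch.matmul(x, fc1.weight.T) + fc1.bias
matches = torch.allclose(fc1(x), manual, atol=1e-6)

print(matches)
```

Note that the stored weight has shape `(3, 10)`, the transpose of the `(10, 3)` array `weight_W1` used in the basic example.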

#### Revisiting the toy data
Some of **PyTorch's functions require** the numbers to be **`float32`** (<font color='dodgerblue'>GPUs are optimized for these</font>).

Our above **`input_X1`** and **`target_Y2`** tensors have numbers that are **`float64`**.
- `to(torch.float32)`: changes the tensor item's **type** (i.e., `dtype`)

Alter the existing data type:

In [None]:
input_X1 = input_X1.to(torch.float32)
target_Y2 = target_Y2.to(torch.float32)

objects_ini = {'input_X1': input_X1, 'target_Y2': target_Y2}
print_array_specs(in_arrays=objects_ini)
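
What happens without the conversion can be sketched with a single layer. The layer and input below are made up for illustration; the exact error message may vary between PyTorch versions, so only the error type is relied on here.

```python
import torch

layer = torch.nn.Linear(10, 3)                 # parameters default to float32
x64 = torch.randn(2, 10, dtype=torch.float64)  # like data coming from NumPy

# Passing float64 data to float32 parameters fails
try:
    layer(x64)
    mismatch_error = False
except RuntimeError as error:
    mismatch_error = True
    print(f'RuntimeError: {error}')

# After converting the input, the forward pass works
out = layer(x64.to(torch.float32))
print(out.dtype, out.shape)
```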

#### Model, Loss and Optimizer
- create the <font color='dodgerblue'>NN model</font>

<br>

- define the **optimizing function** (i.e., `optim.SGD`) for adjusting the **weights** and **biases**
    - Optimization overview: https://pytorch.org/docs/stable/optim.html#module-torch.optim
    - **Available algorithms**: https://pytorch.org/docs/stable/optim.html#algorithms
        - **gradient descent**: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD
        - **adam**: https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam
     
<br>

- define the **loss function** to use
    - `torch.nn.MSELoss`: <font color='dodgerblue'>mean squared error</font> (a.k.a. L2 loss)
        - https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss

<hr style="border:2px solid gray"></hr>

##### Sidenote
<font color='dodgerblue'>**Coding concept**</font>: **assigning** a **function** to a **variable**

<br>

For example:

`loss_function = torch.nn.MSELoss()` that is given in the next code cell

`self.activate_function = torch.nn.ReLU()` given in the above code

<br>

**Why do this?**

- Quickly and easily change an overall code's behavior: **reassign** the **variable** to a **different function**

    - <font color='dodgerblue'>explore different ideas</font> (e.g., different loss functions)

<br>

- **Abstraction**: abstract away the specific implementation details
    - Idea: <font color='dodgerblue'>**Focus** on the **what**, **not** the **how**</font>
        - more <font color='dodgerblue'>readable</font>
        - easier to understand **concepts** (e.g., the <font color='dodgerblue'>science</font>) - don't get lost in the details
        - easier to <font color='dodgerblue'>maintain</font>
        - can be **harder** to **understand** the **details**
 
    - Related terms:
        - <font color='dodgerblue'>**encapsulation**</font>: **grouping data** (information) and the **methods** (functions) that are **related** within a single unit (e.g. a class)
        - <font color='dodgerblue'>**modularity/decomposition**</font>: **breaking down** a **large program** into **smaller**, **independent** components (e.g., **functions**)
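
A minimal sketch of this idea: the same variable name is reassigned to a different loss, and the line that computes the loss does not change. The prediction and target values are made up, and `torch.nn.L1Loss` (mean absolute error) is used here only as an example of an alternative.

```python
import torch

prediction = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.0, 1.0, 1.0])

loss_function = torch.nn.MSELoss()    # mean squared error
mse = loss_function(prediction, target)

loss_function = torch.nn.L1Loss()     # mean absolute error
mae = loss_function(prediction, target)

print(f'MSE: {mse.item():.3f}, MAE: {mae.item():.3f}')
```

Strictly speaking, `loss_function` holds a callable object (an instance of a class) rather than a plain function, but it is used in the same way.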

<hr style="border:2px solid gray"></hr>

In [None]:
model = SimpleNN(input_width=input_width, hidden_width=hidden_width, output_width=output_width)

optimizer = optim.SGD(params=model.parameters(), lr=learning_rate)

loss_function = torch.nn.MSELoss()

### Model Training

- `zero_grad()`: **set/reset** the **gradients** of all **optimized tensors** (i.e, for the **weights** and **biases**)
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
    - this is the <font color='dodgerblue'>same concept as above</font> when we used `torch.Tensor.zero_` in the basic example
        - this is necessary since <font color='dodgerblue'>`.backward()` accumulates the gradients</font> **each time** it is **called**

<br>

- `torch.optim.Optimizer.step`: perform an **optimization step** based on the **current gradients** (i.e., those stored in `.grad`), which is coming from **`.backward()`** 
    - https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html

In [None]:
for epoch in range(num_epochs):
    # Forward pass
    output_Y2 = model(input_X1)

    loss = loss_function(output_Y2, target_Y2)

    # Backward pass
    optimizer.zero_grad()

    loss.backward()

    # Optimization: update weights and biases
    optimizer.step()

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')
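
To relate `optimizer.step()` to the manual update of the basic example, the sketch below applies both to two identical, made-up tensors. It assumes plain `optim.SGD` without momentum or weight decay, as used in this lecture.

```python
import torch

learning_rate = 0.1   # arbitrary value for this illustration

w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD(params=[w_optim], lr=learning_rate)

# Same loss for both tensors: sum(w^2), so the gradient is 2w
(w_manual ** 2).sum().backward()
(w_optim ** 2).sum().backward()

# Manual gradient descent step and gradient reset
with torch.no_grad():
    w_manual -= learning_rate * w_manual.grad
w_manual.grad.zero_()

# The same step and reset using the optimizer
optimizer.step()
optimizer.zero_grad()

print(w_manual, w_optim)
```

Both tensors end up with the same values, which suggests that for this simple setting the optimizer is a packaged version of the manual update.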

The final parameters can be given as a dict using `state_dict()`:

In [None]:
model.state_dict()

Nicely printed out, including the tensor sizes and the `dtype`:

In [None]:
objects_ini = model.state_dict()

print_array_specs(in_arrays=objects_ini)

#### Using the trained (optimized) model

Once a neural network model has been well-trained, it is ready to be used (i.e., to make predictions).

**Important for science!**

<font color='dodgerblue'>**Usability**</font> and <font color='dodgerblue'>**Reproducibility**</font>: how can others use the trained model?

(In other words, **users** must perform a **forward pass** on your **network** to make a **prediction**.)

<br>

What is required is (must be given - e.g., on a GitHub repo, in a paper, etc.):
1. the neural network architecture
    - <font color='dodgerblue'>number of layers</font> and <font color='dodgerblue'>number of nodes</font> in each layer
    - <font color='dodgerblue'>how the nodes are connected</font> (linearly)
    - the <font color='dodgerblue'>activation functions</font> and their placement within the network
    - the <font color='dodgerblue'>optimizer</font>
2. the optimized parameters
    - the <font color='dodgerblue'>optimized weights</font>
    - the <font color='dodgerblue'>optimized biases</font>
3. other parameters (i.e., hyperparameters)
    - <font color='dodgerblue'>learning rate</font>
    - optimization cutoff thresholds or maximum number of epochs
    - model-specific unique details


**Note**: (natural) scientists often **publish** their neural network research and **release** their **code**, but **do not include** the optimized **weights**, **biases**, and **hyperparameter** details. Consequently, anyone who wants to use their models or reproduce their work must **redo the training** (expensive).
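
One possible way to release the optimized parameters is to save the model's `state_dict()` and let users load it into the same architecture. The sketch below uses a stand-in `torch.nn.Sequential` model with the same layer widths (an assumption for brevity) and an in-memory buffer in place of a file on disk.

```python
import io
import torch

# Stand-in for a trained model with the same layer widths (10 -> 3 -> 1)
trained = torch.nn.Sequential(torch.nn.Linear(10, 3),
                              torch.nn.ReLU(),
                              torch.nn.Linear(3, 1))

# Save only the parameters; an in-memory buffer replaces a file on disk
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# A user rebuilds the same architecture, then loads the parameters
restored = torch.nn.Sequential(torch.nn.Linear(10, 3),
                               torch.nn.ReLU(),
                               torch.nn.Linear(3, 1))
restored.load_state_dict(torch.load(buffer, weights_only=True))

x = torch.randn(5, 10)
same_output = torch.equal(trained(x), restored(x))
print(same_output)
```

The saved parameters are only usable together with the architecture definition, which is why both need to be published.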

<br>

##### Create random new data:
- 5 new samples
    - **Important note**: one would never train on such a small number of data samples (2) and then make predictions - it is done here to simplify the teaching example.
- 10 features in each new sample (as required by the model specification)

In [None]:
new_data = torch.randn(5, 10)

Make predictions by passing the new input data to the trained model:

In [None]:
model(new_data)
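
When only predictions are needed, a common pattern is to switch the model to evaluation mode and disable gradient tracking. The sketch below uses a stand-in `torch.nn.Sequential` model named `demo_model` (hypothetical, with the same layer widths) so that it runs on its own.

```python
import torch

# Stand-in for a trained model (10 -> 3 -> 1)
demo_model = torch.nn.Sequential(torch.nn.Linear(10, 3),
                                 torch.nn.ReLU(),
                                 torch.nn.Linear(3, 1))
new_data = torch.randn(5, 10)

tracked = demo_model(new_data)          # output carries autograd history

demo_model.eval()                       # no effect for these layers, but a common habit
with torch.no_grad():
    prediction = demo_model(new_data)   # plain values, no history recorded

print(tracked.requires_grad, prediction.requires_grad, prediction.shape)
```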

<hr style="border:2px solid gray"></hr>

### Advanced Concept: Creating a customized activation function

- Create a **class** (a subclass of `torch.nn.Module`) for the new activation function; it must contain a **method** named `forward` (as needed by PyTorch).

In [None]:
class ModifiedRelu(torch.nn.Module):
    ''' Modified ReLU activation function.

        Class that implements a modified ReLU function that adds
        1.0 to the input. 

        Attributes:
            input (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: modified ReLU activation output
    ''' 
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        """
        Forward pass of the modified_relu activation function.

        Args:
            input (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: Output tensor after applying the modified
            ReLU activation.
        """
        if not isinstance(input, torch.Tensor):
            raise TypeError("Input must be a torch.Tensor") 
        else:
            mod_relu = torch.maximum(input+1.0, torch.zeros_like(input))

            ## Uncomment to see how mod_relu operates on the input
            # print(f'{input}\n{mod_relu}')

            return mod_relu
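
The effect of this activation can be checked on a few hand-picked values. The sketch below repeats the core expression outside of the class; the input `x` is made up, and the comparison with `torch.nn.functional.relu` is only added as a cross-check.

```python
import torch

x = torch.tensor([-2.0, -1.0, -0.5, 0.0, 1.0])

# Same expression as in the forward method: max(x + 1, 0)
mod_relu = torch.maximum(x + 1.0, torch.zeros_like(x))

# Equivalent formulation: a standard ReLU applied to the shifted input
shifted_relu = torch.nn.functional.relu(x + 1.0)

print(mod_relu)
print(torch.equal(mod_relu, shifted_relu))
```

In other words, the function is a ReLU whose kink is moved from 0 to -1.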

Specify our neural network architecture
- layers
- modified activation function

In [None]:
class ModifiedNN(torch.nn.Module):
    """ Modified Neural Network.

        This class defines a simple feedforward neural network with 
        one hidden layer and uses the ModifiedRelu activation function.

        Attributes:
            input_width (int): number of nodes in the input layer.
            hidden_width (int): number of nodes in the hidden layer.
            output_width (int): number of nodes in the output layer.
    """
    def __init__(self, input_width: int, hidden_width: int, output_width: int):
        ''' Initialize the attributes (i.e. variables).

            Defines fully connected layers 1 and 2's input and output width,
            and the modified activation function.
        '''        
        if not all(isinstance(param, int) for param in [input_width, hidden_width, output_width]):
            raise TypeError("All input parameters must be an integer")
        else:
            super(ModifiedNN, self).__init__()
            
            self.fc1 = torch.nn.Linear(input_width, hidden_width)
            self.fc2 = torch.nn.Linear(hidden_width, output_width)
            self.modified_relu = ModifiedRelu()

    def forward(self, in_data: torch.Tensor) -> torch.Tensor:
        """ Forward pass of the ModifiedNN.

            Args:
                in_data: Input data tensor (i.e., features)

            Returns:
                forward_data: Output data tensor after neural network forward pass
        """
        if not isinstance(in_data, torch.Tensor):
            raise TypeError("Input must be a torch.Tensor")
        else:
            forward_data = self.fc1(in_data)
            forward_data = self.modified_relu(forward_data)
            forward_data = self.fc2(forward_data)

            return forward_data

1. Create a new model using the `ModifiedNN` architecture
2. Specify the type of optimizer to use

In [None]:
new_model = ModifiedNN(input_width=input_width, hidden_width=hidden_width, output_width=output_width)
optimizer = optim.SGD(params=new_model.parameters(), lr=learning_rate)

Train the model:

In [None]:
for epoch in range(num_epochs):
    output_Y2 = new_model(input_X1)

    loss = loss_function(output_Y2, target_Y2)

    optimizer.zero_grad()
    loss.backward()

    optimizer.step()

    print(f'Epoch {epoch + 1}: Loss = {loss.item():.3f}')

print(f'\nFinal Output: \n {output_Y2}\n')

objects_ini = new_model.state_dict()
print_array_specs(in_arrays=objects_ini)

#### Training and Testing
Recall that in our shallow learning lecture, we discussed the concept of splitting a dataset into a training dataset and a test dataset. The same idea is still used for neural networks.
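
One way such a split might look with PyTorch's own utilities is sketched below. The data are random and hypothetical (10 samples), the 80/20 split and the seed are arbitrary choices, and other tools (e.g., scikit-learn) could be used instead.

```python
import torch
from torch.utils.data import TensorDataset, random_split

features = torch.randn(10, 10)   # hypothetical: 10 samples, 10 features
targets = torch.randn(10, 1)     # one target value per sample

dataset = TensorDataset(features, targets)

generator = torch.Generator().manual_seed(0)   # arbitrary seed for repeatability
train_set, test_set = random_split(dataset, [8, 2], generator=generator)

print(len(train_set), len(test_set))

# Each item is a (features, target) pair
sample_features, sample_target = train_set[0]
print(sample_features.shape, sample_target.shape)
```

The model would then be trained only on `train_set`, with `test_set` held back to judge how well it generalizes.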

<hr style="border:2px solid gray"></hr>

#### Summary of PyTorch Example:
- A <font color='dodgerblue'>class</font> (like a blueprint) and <font color='dodgerblue'>`nn.Module`</font>: a structured PyTorch approach for **defining a neural network**
    - e.g., architecture, activation functions
    - allows for easy/better organization and code reusability
    - method vs. function
    - PEP8 rules for naming and blank lines
- Built-in <font color='dodgerblue'>Activation</font>: `torch.nn.ReLU`
- Built-in <font color='dodgerblue'>Loss</font>: `torch.nn.MSELoss` for mean squared error loss (i.e., L2 loss)
- All <font color='dodgerblue'>gradients</font> needed in backward propagation are computed using `autograd.backward()`
- Built-in <font color='dodgerblue'>Optimizer</font>: `optim.SGD` for gradient descent and usage of `.step()`
- Model training and using
- Create a <font color='dodgerblue'>customized activation</font> function and implement it