# Feed-Forward Neural Network
> Building and training a feed-forward neural network
>
> **Table of Contents**
>> 1. Import Libraries
>>
>> 2. Configure Device
>>
>> 3. Load Dataset
>>
>> 4. Define Model
>>
>> 5. Define Loss Function and Optimizer
>>
>> 6. Train Model
>>
>> 7. Test Model
>>
>> 8. Save Model

## Import Libraries

> Before beginning, we must first import all libraries needed for the program

In [8]:
#import torch for machine learning capabilities
import torch

#import torchvision for image dataset (MNIST) and image transformations
import torchvision

## Configure Device

> Since we plan to use a neural network, we can use the GPU (via CUDA) if it is available

In [9]:
#Set the default device to be the CPU
device = torch.device("cpu")

#If CUDA is available...
if(torch.cuda.is_available()):
    #...utilize the GPU by setting the device to CUDA
    device = torch.device("cuda")

## Load Dataset

> First we must download and make a reference to the training and testing data from the MNIST dataset

In [10]:
#Download and make a reference to the training data
training_dataset = torchvision.datasets.MNIST(root="./data",
                                              train=True,
                                              transform=torchvision.transforms.ToTensor(),
                                              download=True)

#Download and make a reference to the testing data
testing_dataset = torchvision.datasets.MNIST(root="./data",
                                              train=False,
                                              transform=torchvision.transforms.ToTensor(),
                                              download=True)

> After downloading and referencing the data, we must specify the input pipeline via the DataLoader
>
> As part of the specification of loading the data, we will need to define the hyperparameter for the batch size

In [11]:
#Specify the hyperparameter for the batch size
#Batch size: the number of samples processed before each update of the model parameters
batch_size = 100

#Instantiate a DataLoader object to specify how to load the training data
#Shuffling the data and not dropping the last batch (last batch size <= batch size)
training_loader = torch.utils.data.DataLoader(dataset=training_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              drop_last=False)

#Instantiate a DataLoader object to specify how to load the testing data
#Not shuffling the data and not dropping the last batch (last batch size <= batch size)
testing_loader = torch.utils.data.DataLoader(dataset=testing_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             drop_last=False)
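
> As a small illustration of how `batch_size` and `drop_last=False` interact, the sketch below uses a synthetic dataset rather than MNIST. The names `toy_dataset` and `toy_loader` and the size of 250 samples are assumptions made for this example only. Because 250 is not a multiple of 100, the last batch is smaller than the others instead of being dropped.

```python
import torch

# Hypothetical stand-in for a dataset: 250 random "images" with integer labels
features = torch.randn(250, 28 * 28)
labels = torch.randint(0, 10, (250,))
toy_dataset = torch.utils.data.TensorDataset(features, labels)

# Same loader settings as the testing loader above, applied to the toy data
toy_loader = torch.utils.data.DataLoader(dataset=toy_dataset,
                                         batch_size=100,
                                         shuffle=False,
                                         drop_last=False)

# Collect the number of samples in each batch
toy_batch_sizes = [batch_features.shape[0] for batch_features, _ in toy_loader]
print(toy_batch_sizes)  # [100, 100, 50]
```

> With `drop_last=True`, the final batch of 50 samples would be skipped and the loader would yield only two batches.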

## Define Model

> With the data formatted by the DataLoader object, the next step is to define our model
>
> We will first need to build our feed-forward neural network class by combining modules together

In [12]:
#Create a neural network class, derived from the torch.nn.Module class
class NeuralNetwork(torch.nn.Module):
    #In the constructor, specify the neural network architecture by integers in an iterable object
    def __init__(self, layers_nn):
        #First, we need to call the constructor of the parent class for the current instance of the class
        super(NeuralNetwork, self).__init__()
        #Then define a list of modules that can be appended to
        self.module_list = list()
        #Then iterate through the number of nodes in each layer, connecting them together and applying a ReLU non-linear function
        #Only apply this pattern up to the second to last layer
        for i in range(len(layers_nn)-2):
            #Connect one layer to the next
            self.module_list.append(torch.nn.Linear(layers_nn[i], layers_nn[i+1]))
            #And then apply a ReLU activation function
            self.module_list.append(torch.nn.ReLU())
        #For the final layer, we will only map the second to last layer to the last (no ReLU function needed as it will be replaced by softmax)
        self.module_list.append(torch.nn.Linear(layers_nn[-2], layers_nn[-1]))
        #Log-Softmax used for numerical stability
        self.module_list.append(torch.nn.LogSoftmax(dim=1))
        #Lastly, convert the list object of modules into a ModuleList object (used so that PyTorch can detect the modules/parameters contained in the list)
        self.layers = torch.nn.ModuleList(self.module_list)
    
    #Define how to apply forward propagation to the neural network
    #The method must be named forward so that calling model(x) dispatches to it
    def forward(self, x):
        #Define the output of each layer which will be propagated from one layer to the next
        out = x
        #Iterate through each layer, propagating the output as we go
        for layer in self.layers:
            out = layer(out)
        #No explicit requires_grad_() call is needed: autograd already tracks the output through the layer parameters
        #Return the output
        return out


> Now that we have defined the neural network class, we can instantiate the class to define our model
>
> We will need to specify the layer sizes as a hyperparameter for our neural network model. This includes the input size, the hidden layer sizes, and the number of classes (output size)

In [13]:
#Define the hyperparameter for the neural network layer sizes
input_size = 28*28 #For 28x28 image
num_classes = 10 #For decision between 10 digits (0-9)
layers = [input_size, 500, 100, num_classes] #2 hidden layers

#We will then instantiate the model
model = NeuralNetwork(layers)

#Last, we will send the model to the appropriate device (CPU or GPU (via CUDA))
model = model.to(device)
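
> The MNIST loaders yield image batches of shape `[batch, 1, 28, 28]`, while the first linear layer expects `28*28 = 784` features per sample, so each batch has to be flattened before it is passed to the model. The sketch below illustrates this with a random tensor in place of real images and a `torch.nn.Sequential` stand-in that has the same layer sizes as our model. The names `stand_in_model` and `dummy_images` are assumptions for this example only.

```python
import torch

# Stand-in with the same layer sizes as the model above (784 -> 500 -> 100 -> 10)
stand_in_model = torch.nn.Sequential(
    torch.nn.Linear(28 * 28, 500), torch.nn.ReLU(),
    torch.nn.Linear(500, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 10), torch.nn.LogSoftmax(dim=1),
)

# Random tensor shaped like a batch of 4 MNIST images
dummy_images = torch.rand(4, 1, 28, 28)

# Flatten each image into a vector of 784 features
flat_images = dummy_images.reshape(-1, 28 * 28)

# Forward pass: one row of 10 log-probabilities per image
log_probs = stand_in_model(flat_images)
print(log_probs.shape)  # torch.Size([4, 10])

# Exponentiating the log-probabilities gives probabilities that sum to 1 per row
prob_sums = log_probs.exp().sum(dim=1)
```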

## Define Loss Function and Optimizer

> With the model defined, we must now define the loss function and optimizer that let the model learn from data
>
> First is the loss function, which depends on our implementation
>
> Since we apply LogSoftmax as the last layer of the model, its outputs are log-probabilities, so our loss function will be the Negative Log Likelihood (NLL) loss

In [14]:
#Define the loss function to use when training the model
lossFunction = torch.nn.NLLLoss()
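
> To make the pairing of LogSoftmax and NLL loss concrete, the sketch below uses random logits and targets (illustrative values, not model outputs). It checks that `NLLLoss` applied to log-probabilities matches both a manual computation and `torch.nn.CrossEntropyLoss` applied directly to the raw logits.

```python
import torch

torch.manual_seed(0)

# Illustrative raw scores (logits) for 5 samples and 10 classes, with random targets
logits = torch.randn(5, 10)
targets = torch.randint(0, 10, (5,))

# Path used in this notebook: LogSoftmax followed by NLLLoss
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll = torch.nn.NLLLoss()(log_probs, targets)

# Manual NLL: negative log-probability of the correct class, averaged over the batch
manual = -log_probs[torch.arange(5), targets].mean()

# Alternative path: CrossEntropyLoss applies log-softmax internally to the logits
cross_entropy = torch.nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, manual), torch.allclose(nll, cross_entropy))  # True True
```

> The two paths are interchangeable as long as the model's last layer and the loss function are chosen consistently.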

> After defining the loss function, we will then define the optimizer to update the weights
>
> Our optimizer will be Adam (adaptive moment estimation), an extension of stochastic gradient descent that combines the benefits of Momentum (moving quickly toward the general region of the minimum) and RMSprop (taking a more direct path toward the minimum) (https://www.youtube.com/watch?v=Syom0iwanHo & https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/)
>
> For this optimizer, the main hyperparameter that needs to be specified is the learning rate (step size) while all others are left at their recommended/default values (https://www.youtube.com/watch?v=JXQT_vxqwIs)

In [None]:
#Specify the hyperparameter for the learning rate (step size)
learning_rate = 0.001

#Define the optimizer to use when training this model (Adaptive Moment Estimation (Adam) optimizer)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
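
> To show how Adam combines the two ideas, here is a minimal pure-Python sketch of its update rule for a single scalar parameter. It is not the `torch.optim.Adam` implementation. The function `adam_minimize`, the toy objective `(w - 3)^2`, and the larger learning rate of 0.1 are assumptions for this example, while `beta1`, `beta2`, and `eps` use the same default values as `torch.optim.Adam`.

```python
import math

def adam_minimize(grad_fn, w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Hypothetical helper: run Adam updates on one scalar parameter w."""
    m = 0.0  # first moment: running average of gradients (the Momentum part)
    v = 0.0  # second moment: running average of squared gradients (the RMSprop part)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v being initialized at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step along the averaged gradient, scaled by its recent magnitude
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

> In the notebook itself, `optimizer.step()` performs this kind of update for every model parameter, using the gradients computed by backpropagation.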

## Train Model

## Test Model

## Save Model