# Feed-Forward Neural Network
> Building and training a feed-forward neural network
>
> **Table of Contents**
>> 1. Import Libraries
>>
>> 2. Load Dataset
>>
>> 3. Define Model
>>
>> 4. Configure Device to Execute Model
>>
>> 5. Define Loss Function and Optimizer
>>
>> 6. Train Model
>>
>> 7. Test Model
>>
>> 8. Save Model

## Import Libraries

> Before beginning, we must first import all libraries needed for the program

In [23]:
#import torch for machine learning capabilities
import torch

#import torchvision for image dataset (MNIST) and image transformations
import torchvision

## Load Dataset

> First we must download and make a reference to the training and testing data from the MNIST dataset

In [24]:
#Download and make a reference to the training data
training_dataset = torchvision.datasets.MNIST(root="./data",
                                              train=True,
                                              transform=torchvision.transforms.ToTensor(),
                                              download=True)

#Download and make a reference to the testing data
testing_dataset = torchvision.datasets.MNIST(root="./data",
                                             train=False,
                                             transform=torchvision.transforms.ToTensor(),
                                             download=True)

> After downloading and referencing the data, we must specify the input pipeline via the DataLoader
>
> As part of the specification of loading the data, we will need to define the hyperparameter for the batch size

In [25]:
#Specify the hyperparameter for the batch size
#Batch size: the number of data points processed before each update of the model parameters
batch_size = 100

#Instantiate a DataLoader object to specify how to load the training data
#Shuffling the data and not dropping the last batch (last batch size <= batch size)
training_loader = torch.utils.data.DataLoader(dataset=training_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              drop_last=False)

#Instantiate a DataLoader object to specify how to load the testing data
#Not shuffling the data and not dropping the last batch (last batch size <= batch size)
testing_loader = torch.utils.data.DataLoader(dataset=testing_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             drop_last=False)
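
> As a quick check on how `batch_size` and `drop_last` interact, here is a minimal sketch on a small synthetic dataset. The 250 random samples and the names `toy_dataset`, `keep_loader`, and `drop_loader` are assumptions made for illustration only; they are not part of the MNIST pipeline above

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 250 fake "images" with 784 features each
features = torch.randn(250, 28 * 28)
targets = torch.randint(0, 10, (250,))
toy_dataset = TensorDataset(features, targets)

# drop_last=False keeps the final, smaller batch (250 = 100 + 100 + 50)
keep_loader = DataLoader(toy_dataset, batch_size=100, shuffle=True, drop_last=False)
keep_sizes = [batch_features.shape[0] for batch_features, _ in keep_loader]

# drop_last=True discards the incomplete batch
drop_loader = DataLoader(toy_dataset, batch_size=100, shuffle=True, drop_last=True)
drop_sizes = [batch_features.shape[0] for batch_features, _ in drop_loader]

print(keep_sizes)
print(drop_sizes)
```

> For MNIST itself, the 60,000 training images divide evenly into 600 batches of 100, so `drop_last` makes no difference there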

## Define Model

> With the data formatted by the DataLoader object, the next step is to define our model
>
> We will first need to build our feed-forward neural network class by combining modules together

In [26]:
#Create a neural network class, derived from the torch.nn.Module class
class NeuralNetwork(torch.nn.Module):
    #In the constructor, specify the neural network architecture by integers in an iterable object
    def __init__(self, layers_nn):
        #First, we need to call the constructor of the parent class for the current instance of the class
        super(NeuralNetwork, self).__init__()
        #Then define a list of modules that can be appended to
        self.module_list = list()
        #Then iterate through the number of nodes in each layer, connecting them together with a ReLU non-linear function in between
        #Only perform pattern up to second to last layer
        for i in range(len(layers_nn)-2):
            #Connect one layer to the next
            self.module_list.append(torch.nn.Linear(layers_nn[i], layers_nn[i+1]))
            #And then apply a ReLU activation function
            self.module_list.append(torch.nn.ReLU())
        #For the final layer, we will only map the second to last layer to the last (no ReLU function needed as it will be replaced by softmax)
        self.module_list.append(torch.nn.Linear(layers_nn[-2], layers_nn[-1]))
        #Log-Softmax used for numerical stability
        self.module_list.append(torch.nn.LogSoftmax(dim=1))
        #Lastly, convert the list object of modules into a ModuleList object (used so that PyTorch can detect the modules/parameters contained in the list)
        self.layers = torch.nn.ModuleList(self.module_list)
    
    #Define how to apply forward propagation to the neural network
    def forward(self, x):
        #Define the output of each layer, which will be propagated from one layer to the next
        out = x
        #Iterate through each layer, propagating the output as we go
        for layer in self.layers:
            out = layer(out)
        #No explicit gradient flag is needed here: autograd tracks operations on the model parameters automatically
        #Return the output
        return out
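
> To illustrate why the list of modules is wrapped in a `ModuleList`, the sketch below compares two hypothetical toy classes (`PlainListNet` and `ModuleListNet`, invented for this example). With a plain Python list, PyTorch does not register the layers, so the model reports no parameters for an optimizer to update

```python
import torch

class PlainListNet(torch.nn.Module):
    # Hypothetical counter-example: layers kept in a plain Python list
    def __init__(self):
        super().__init__()
        self.layers = [torch.nn.Linear(4, 3), torch.nn.Linear(3, 2)]

class ModuleListNet(torch.nn.Module):
    # Same layers, but wrapped in a ModuleList so PyTorch registers them
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Linear(4, 3), torch.nn.Linear(3, 2)])

# Count the parameter tensors each model exposes
plain_count = len(list(PlainListNet().parameters()))
registered_count = len(list(ModuleListNet().parameters()))
print(plain_count, registered_count)
```

> The registered model exposes one weight and one bias tensor per linear layer, while the plain-list model exposes none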


> Now that we have defined the neural network class, we can instantiate the class to define our model
>
> We will need to specify the layer sizes as a hyperparameter for our neural network model. This includes the input size, the hidden layer sizes, and the number of classes (output size)

In [27]:
#Define the hyperparameter for the neural network layer sizes
input_size = 28*28 #For 28x28 image
num_classes = 10 #For decision between 10 digits (0-9)
layers = [input_size, 500, 100, num_classes] #2 hidden layers

#We will then instantiate the model
model = NeuralNetwork(layers)
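
> To get a feel for the size of this architecture, the following sketch counts the weights and biases implied by the layer sizes. The helper `count_parameters` is a hypothetical name introduced here; it assumes every pair of consecutive sizes is joined by a fully connected layer with a bias, as in the class above

```python
def count_parameters(layer_sizes):
    """Count weights and biases of fully connected layers between consecutive sizes."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total

# Same layer sizes as the model above: 784 inputs, two hidden layers, 10 classes
layer_sizes = [28 * 28, 500, 100, 10]
total_parameters = count_parameters(layer_sizes)
print(total_parameters)
```

> Since ReLU and LogSoftmax have no trainable parameters, this count should agree with `sum(p.numel() for p in model.parameters())` on the instantiated model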

## Configure Device to Execute Model

> Neural networks can run on either the CPU or the GPU (via CUDA, if it is available), so we select which device to use
>
> The first step is to determine which device to use. We default to the CPU, but use the GPU (via CUDA) if it is available
>
> **What is CUDA?**
>> CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model for general-purpose computing on NVIDIA GPUs. It provides a compiler, a runtime, and libraries that Machine Learning (ML), Deep Learning (DL), and High Performance Computing (HPC) applications use to run code directly on the GPU. This matters for such applications because the many cores present on the GPU allow for parallel computation, which is especially useful when performing matrix arithmetic
>>
>> In this sense, CUDA can be thought of as a set of libraries together with the compiler and runtime needed to turn code into programs that run efficiently on the GPU
>>
>> CUDA is programmed primarily through C/C++; however, wrappers around that interface enable its use from other programming languages such as Python

In [28]:
#Set the default device to be the CPU
device = torch.device("cpu")

#If CUDA is available...
if(torch.cuda.is_available()):
    #...utilize the GPU by setting the device to CUDA
    device = torch.device("cuda")

> Now that the device has been selected, we can configure the model to use that device

In [29]:
#Set the model to use the resources of the specified device (CPU or GPU (via CUDA))
model = model.to(device)
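
> The device selection is often written as a single expression. The sketch below shows that form together with the rule that a model and its inputs must live on the same device. `toy_model` and `toy_batch` are hypothetical stand-ins, not the MNIST model or data

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model and batch, used only to show the device handling
toy_model = torch.nn.Linear(784, 10).to(device)
toy_batch = torch.randn(8, 784).to(device)

# Both now live on the same device, so the forward pass works
toy_output = toy_model(toy_batch)
print(toy_output.device.type, tuple(toy_output.shape))
```

> If the batch were left on the CPU while the model sat on the GPU, the forward pass would raise a device-mismatch error, which is why the training and testing loops move every batch with `.to(device)`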

## Define Loss Function and Optimizer

> With the model defined, we must now define the Loss function and optimizer to support the model in learning from data
>
> First will be the Loss function, which depends on our implementation
>
> Since we are applying LogSoftmax as the last layer of the model, its outputs are log-probabilities, so our loss function will be the Negative Log Likelihood (NLL) Loss function, which expects log-probabilities as input

In [30]:
#Define the loss function to use when training the model
lossFunction = torch.nn.NLLLoss()
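
> As a sanity check on this pairing, the sketch below compares LogSoftmax followed by NLLLoss against `torch.nn.CrossEntropyLoss` applied to the raw scores, and against a manual computation. The random `logits` and `labels` are hypothetical values chosen only for the comparison

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 10)            # hypothetical raw scores: 5 samples, 10 classes
labels = torch.randint(0, 10, (5,))    # hypothetical class labels

# Route 1: LogSoftmax in the model, NLLLoss as the loss (the approach used here)
log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll = torch.nn.NLLLoss()(log_probs, labels)

# Route 2: raw scores passed straight to CrossEntropyLoss
cross_entropy = torch.nn.CrossEntropyLoss()(logits, labels)

# Route 3: by hand, the mean negative log-probability of the correct class
manual = -log_probs[torch.arange(5), labels].mean()

print(nll.item(), cross_entropy.item(), manual.item())
```

> All three routes give the same value, so the choice between them is mostly a matter of where the LogSoftmax step lives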

> After defining the loss function, we will then define the optimizer to update the weights
>
> Our optimizer will be Adam (adaptive moment estimation), an extension of stochastic gradient descent that combines the benefits of Momentum (moving quickly toward the general region of the minimum) and RMSprop (taking a more direct path toward the minimum) (https://www.youtube.com/watch?v=Syom0iwanHo & https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-on-deep-learning-optimizers/)
>
> For this optimizer, the main hyperparameter that needs to be specified is the learning rate (step size) while all others are left at their recommended/default values (https://www.youtube.com/watch?v=JXQT_vxqwIs)

In [31]:
#Specify the hyperparameter for the learning rate (step size)
learning_rate = 0.001

#Define the optimizer to use when training this model (Adaptive Moment Estimation (Adam) optimizer)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
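
> To make the Momentum and RMSprop ingredients concrete, here is a minimal pure-Python sketch of the standard Adam update rule on a one-dimensional toy problem. The function `adam_minimize`, the objective f(x) = (x - 3)^2, and the step size of 0.1 are assumptions chosen for illustration; this is not how `torch.optim.Adam` is implemented internally

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with the Adam update rule; returns the visited points."""
    x, m, v = x0, 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (Momentum-like average)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSprop-like average)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(x)
    return history

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3), starting from x = 0
history = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(history[1], history[-1])
```

> Note that the very first step has a size of roughly `lr` regardless of how large the gradient is, because the bias-corrected first moment is divided by the square root of the bias-corrected second moment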

## Train Model

> Before we begin training the model, one last step is to define the final hyperparameter: the number of epochs, which specifies how many times to pass through the training data before we stop training the model (i.e., updating the model parameters)
>
> After this, we now have the data loaded, the model defined, and all hyperparameters specified; so the last thing to do is to train the model on our training data

In [32]:
#Define num_epochs hyperparameter (i.e., the number of times to loop through the training data for the training process)
num_epochs = 5

#Now to begin the training process
#Loop through the training data for as many epochs as we specified
for epoch in range(num_epochs):
    #Loop through each batch
    for i, (images, labels) in enumerate(training_loader):
        #Reshape images in the batch into a feature matrix (image pixel data on one row and different images on different rows)
        images = images.reshape(-1, 28*28) #last batch size can be <= batch_size so -1 is a variable number of rows; 28*28 columns for 28x28 pixels in image
        #Move the images tensor to the same device that the model is configured to
        images = images.to(device)
        #Move the labels tensor to the same device that the model is configured to
        labels = labels.to(device)

        #Perform forward propagation on the model from the data in the current batch
        outputs = model(images)

        #Compute the current loss of the model
        loss = lossFunction(outputs, labels)

        #Reset the gradient to prepare for backward propagation
        optimizer.zero_grad()
        #Perform backward propagation to get the gradient vector
        loss.backward()
        #Take a step using the specified optimizer algorithm (Adam: Adaptive Moment Estimation), which moves the parameters against the gradient
        optimizer.step()

        #Every 100 batches...
        if((i+1)%100 == 0):
            #print the current epoch, batch (step), and current model loss
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(training_loader)}], Loss: {loss.item():.4f}")

Epoch [1/5], Step [100/600], Loss: 0.3002
Epoch [1/5], Step [200/600], Loss: 0.3072
Epoch [1/5], Step [300/600], Loss: 0.2662
Epoch [1/5], Step [400/600], Loss: 0.2975
Epoch [1/5], Step [500/600], Loss: 0.1537
Epoch [1/5], Step [600/600], Loss: 0.1433
Epoch [2/5], Step [100/600], Loss: 0.2455
Epoch [2/5], Step [200/600], Loss: 0.0337
Epoch [2/5], Step [300/600], Loss: 0.1128
Epoch [2/5], Step [400/600], Loss: 0.1420
Epoch [2/5], Step [500/600], Loss: 0.0640
Epoch [2/5], Step [600/600], Loss: 0.0612
Epoch [3/5], Step [100/600], Loss: 0.0295
Epoch [3/5], Step [200/600], Loss: 0.0888
Epoch [3/5], Step [300/600], Loss: 0.1330
Epoch [3/5], Step [400/600], Loss: 0.0888
Epoch [3/5], Step [500/600], Loss: 0.0898
Epoch [3/5], Step [600/600], Loss: 0.1092
Epoch [4/5], Step [100/600], Loss: 0.0884
Epoch [4/5], Step [200/600], Loss: 0.0600
Epoch [4/5], Step [300/600], Loss: 0.0283
Epoch [4/5], Step [400/600], Loss: 0.0196
Epoch [4/5], Step [500/600], Loss: 0.0406
Epoch [4/5], Step [600/600], Loss:
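
> The training loop resets the gradient before every backward pass because PyTorch adds new gradients to the stored ones rather than overwriting them. The sketch below shows this on a single hypothetical weight (`weight` is invented for this example and is not part of the model)

```python
import torch

# Hypothetical single weight, used only to show how gradients accumulate
weight = torch.tensor(1.0, requires_grad=True)

(3.0 * weight).backward()
first_grad = weight.grad.item()        # d(3w)/dw = 3

(3.0 * weight).backward()              # no reset: the new gradient is added to the old one
accumulated_grad = weight.grad.item()

# Clear the stored gradient, which is the effect optimizer.zero_grad() has on each parameter
# (recent PyTorch versions set the gradient to None by default rather than to zero)
weight.grad.zero_()
(3.0 * weight).backward()
reset_grad = weight.grad.item()

print(first_grad, accumulated_grad, reset_grad)
```

> Without the reset, each optimizer step would act on the sum of the gradients from all previous batches instead of the current batch alone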

## Test Model

> With the model now properly trained, we can test the model's performance on testing data (i.e., data that have never been seen/learned by the model)

In [33]:
#For memory efficiency, perform computations without tracking the gradient
with torch.no_grad():
    #Initialize counters for the total tests and total correct
    total = 0
    correct = 0
    #Loop through each batch in the testing dataset
    for images, labels in testing_loader:
        #Reshape the images in the batch to create a feature matrix (with a variable number of rows)
        images = images.reshape(-1, 28*28)

        #Move the images tensor to the device configured for the model
        images = images.to(device)
        #Move the labels tensor to the device configured for the model
        labels = labels.to(device)

        #Perform forward propagation of the testing data on the model
        outputs = model(images)

        #Get the highest log-probability (the model ends in LogSoftmax) and the corresponding class prediction for each image
        max_log_probabilities, class_predictions = torch.max(outputs, dim=1)

        #Add the total number of predictions made on this batch
        total += labels.size(0)
        #Add the total number of correct predictions made on this batch
        correct += (class_predictions == labels).sum().item()

    #Print the total accuracy of the trained model on the testing data
    print(f"Accuracy on 10,000 test images: {100*correct/total}%")

Accuracy on 10,000 test images: 98.06%
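
> To show what the prediction step computes, here is a small sketch on hand-made values. The three rows of `log_probs` and the `labels` are hypothetical; they mimic LogSoftmax output for three images and two classes

```python
import torch

# Hypothetical log-probabilities for 3 samples over 2 classes (as LogSoftmax would produce)
log_probs = torch.log(torch.tensor([[0.9, 0.1],
                                    [0.2, 0.8],
                                    [0.6, 0.4]]))
labels = torch.tensor([0, 1, 1])

# torch.max along dim=1 returns the largest log-probability and its index (the predicted class)
max_log_probs, class_predictions = torch.max(log_probs, dim=1)

# Exponentiating recovers the actual probabilities
max_probs = torch.exp(max_log_probs)

# Count matches between predictions and labels, then convert to a percentage
correct = (class_predictions == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(class_predictions.tolist(), correct)
```

> Because the logarithm is monotonic, taking the maximum over log-probabilities selects the same class as taking it over probabilities, so no conversion is needed just to make a prediction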


## Save Model

> Now, if we are satisfied with the accuracy of the model on the testing data, we can save the trained model for reloading and use on other machines/applications

In [34]:
#Save the model to a checkpoint file
torch.save(model.state_dict(), "model.ckpt")
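
> Since only the `state_dict` (the parameters) is saved, reloading requires rebuilding the same architecture first and then loading the weights into it. The sketch below shows that round trip. `build_model` is a hypothetical small stand-in for the network above, and a temporary directory is used so the example leaves no files behind

```python
import os
import tempfile
import torch

def build_model():
    # Hypothetical small stand-in; in practice, rebuild the same architecture that was saved
    return torch.nn.Sequential(
        torch.nn.Linear(784, 16),
        torch.nn.ReLU(),
        torch.nn.Linear(16, 10),
        torch.nn.LogSoftmax(dim=1),
    )

trained_model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "model.ckpt")
    torch.save(trained_model.state_dict(), checkpoint_path)

    # Recreate the architecture, then load the saved weights into it
    restored_model = build_model()
    state_dict = torch.load(checkpoint_path, map_location=torch.device("cpu"))
    restored_model.load_state_dict(state_dict)

# Both models should now produce identical outputs for the same input
restored_model.eval()
sample = torch.randn(4, 784)
with torch.no_grad():
    outputs_match = torch.equal(trained_model(sample), restored_model(sample))
print(outputs_match)
```

> Passing `map_location` lets a checkpoint that was saved from a GPU be loaded on a machine that only has a CPU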