# Convolutional Neural Network
> Building and training a convolutional neural network
>
> **Table of Contents**
>> 1. Import Libraries
>>
>> 2. Load Dataset
>>
>> 3. Define Model
>>
>> 4. Configure Device to Execute Model
>>
>> 5. Define Loss Function and Optimizer
>>
>> 6. Train Model
>>
>> 7. Test Model
>>
>> 8. Save Model

# Import Libraries

> Import libraries needed for the program

In [71]:
#import torch for machine learning capabilities
import torch

#import torchvision for image dataset (MNIST) and image transformations
import torchvision

# Load Dataset

> With the libraries imported, we will now load the data to be used for training

In [72]:
#Download and reference the training dataset
training_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=torchvision.transforms.ToTensor(),
                                           download=True)

#Download and reference the testing dataset
testing_dataset = torchvision.datasets.MNIST(root='./data',
                                             train=False,
                                             transform=torchvision.transforms.ToTensor(),
                                             download=True)

> After loading the dataset, the next step is to define a DataLoader for each dataset so that the data is loaded in batches
>
> Because the data will be loaded in batches, we define a hyperparameter for the batch size

In [73]:
#Define the hyperparameter for the batch size (i.e., the number of samples loaded per batch)
batch_size = 100

#Define the DataLoader for the training dataset
training_loader = torch.utils.data.DataLoader(dataset=training_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              drop_last=False)

#Define the DataLoader for the testing dataset
testing_loader = torch.utils.data.DataLoader(dataset=testing_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             drop_last=False)
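To see how batch_size and drop_last interact, here is a minimal sketch on a small synthetic TensorDataset. The 250-sample dataset is an assumption, chosen so that the last batch is incomplete. The sketch also shows why the training log reports 600 steps per epoch.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset of 250 samples, so 250 / 100 leaves an incomplete last batch
data = TensorDataset(torch.arange(250).float(), torch.zeros(250, dtype=torch.long))

loader_keep = DataLoader(data, batch_size=100, shuffle=False, drop_last=False)
loader_drop = DataLoader(data, batch_size=100, shuffle=False, drop_last=True)

keep_sizes = [x.shape[0] for x, _ in loader_keep]
drop_sizes = [x.shape[0] for x, _ in loader_drop]
print(keep_sizes)                          # [100, 100, 50]: the partial batch is kept
print(drop_sizes)                          # [100, 100]: the partial batch is discarded
print(len(loader_keep), len(loader_drop))  # 3 2

# For MNIST: 60,000 training and 10,000 testing images at 100 per batch
print(math.ceil(60000 / 100), math.ceil(10000 / 100))  # 600 100
```

Because 60,000 and 10,000 are both multiples of 100, drop_last has no effect on MNIST with this batch size.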

# Define Model

> We will now specify a convolutional neural network that takes in a batch of images, applies convolution operations, and outputs a vector of log-probabilities (one per class) for each image
>
> When defining the model, we will use torch.nn.Sequential (a container that chains modules so that each one's output feeds the next) to group the operations of each convolutional layer, and torch.nn.ModuleList (a list whose modules PyTorch registers as submodules) to hold the layers in order
>
> The Convolutional Neural Network (CNN) consists of a pattern of different operations:
>> **Convolution Operation**: slide (convolve) a learned filter over the image (https://www.youtube.com/watch?v=eMXuk97NeSI&list=PLZDCDMGmelH-pHt-Ij0nImVrOmj8DYKbB)
>>> **in_channels**: the number of channels (features) per pixel in the input (for the first layer, usually 1 for grayscale images such as MNIST or 3 for RGB images)
>>>
>>> **out_channels**: the number of channels (features) per pixel in the output, i.e., the number of filters the layer learns (a common choice is to start with a small power of 2 such as 8 or 16 and double it in each subsequent layer, up to around 512 or 1024)
>>>
>>> **kernel_size**: the size of the filter; 3x3 is the usual choice, although the first layer sometimes uses a larger 5x5 or 7x7 filter
>>>
>>> **stride**: how many pixels the filter moves at each step. To avoid skipping input pixels entirely, keep $\text{stride} \leq \text{kernel\_size}$
>>>
>>> **padding**: the number of 0-valued pixels added around the border of the input. Padding lets the filter cover the edge pixels as fully as the interior ones; with stride 1, a padding of (kernel_size - 1)/2 keeps the output the same height and width as the input (e.g., kernel_size=5 with padding=2 in the first layer below)
>>
>> **Batch Normalization**: normalize each channel's activations inside the network using the mean and variance of the current batch (followed by a learned scale and shift), which typically stabilizes and speeds up training (https://www.baeldung.com/cs/batch-normalization-cnn)
>>
>> **Non-Linear Activation Function (ReLU)**: apply the non-linear activation ReLU(x) = max(0, x). Without non-linearities, a stack of layers collapses into a single linear function, so non-linear activations are what allow the network to approximate complex functions (https://www.quora.com/Why-do-we-need-non-linear-activation-functions-in-a-neural-network)
>>
>> **Max Pooling**: downsample each feature map by keeping only the maximum value in each window (here 2x2 with stride 2, which halves the height and width), reducing the computation of later layers. A convolution with a larger stride can achieve a similar downsampling effect (https://stats.stackexchange.com/questions/288261/why-is-max-pooling-necessary-in-convolutional-neural-networks)
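The kernel_size, stride, and padding values determine each layer's output size. The sketch below applies the standard output-size formula for Conv2d and MaxPool2d to the layers defined in the next cell, assuming dilation=1: out = floor((in + 2*padding - kernel_size) / stride) + 1. The helper name conv_output_size is hypothetical. It shows where the 7*7*32 input size of the linear layer comes from.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of Conv2d or MaxPool2d along one dimension (dilation=1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                      # MNIST images are 28x28
size = conv_output_size(size, kernel_size=5, stride=1, padding=2)  # layer1 conv -> 28
size = conv_output_size(size, kernel_size=2, stride=2)             # layer1 pool -> 14
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # layer2 conv -> 14
size = conv_output_size(size, kernel_size=2, stride=2)             # layer2 pool -> 7

flattened = size * size * 32                                   # 32 output channels in layer2
print(size, flattened)                                         # 7 1568
```

The value 1568 matches the in_features of the Linear layer printed when the model is set to evaluation mode.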

In [74]:
#Define the Convolutional Neural Network (CNN) Class
class CNN(torch.nn.Module):
    #CNN constructor function
    def __init__(self, num_classes):
        #First, call the constructor of the parent class
        super(CNN, self).__init__()
        #Specify the module list for the sequence of layers in the CNN
        self.module_list = list()
        #Define the first convolutional layer
        layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            torch.nn.BatchNorm2d(16),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #Append the convolutional layer to the module list
        self.module_list.append(layer1)
        #Define the second convolutional layer
        layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #Append the convolutional layer to the module list
        self.module_list.append(layer2)
        #Flatten the output into a single vector for input into a linear model
        #By default, Flatten leaves the first dimension untouched (i.e., the batch) and flattens everything else (i.e., the features)
        self.module_list.append(torch.nn.Flatten())
        #Append the linear layer to the model
        #Each of the two max-pooling layers halves the 28x28 image (28 -> 14 -> 7), leaving 32 channels of 7x7 features
        self.module_list.append(torch.nn.Linear(7*7*32, num_classes))
        #Log-Softmax is used instead of Softmax for numerical stability
        self.module_list.append(torch.nn.LogSoftmax(dim=1))
        #Lastly, convert the list object of modules into a ModuleList object (used so that PyTorch can detect the modules/parameters contained in the list)
        self.layers = torch.nn.ModuleList(self.module_list)

    #Define how the model applies forward propagation
    def forward(self, x):
        #Initially have the output be the passed data
        out = x
        #Loop through each layer in the model
        for layer in self.layers:
            #get the output for each layer, passing it to the next layer
            out = layer(out)
        #return the output
        return(out)
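As a sanity check on the architecture, the same stack of layers can also be written as a single torch.nn.Sequential and run on a random batch. This is a sketch, not a replacement for the CNN class above. make_cnn is a hypothetical helper, and the random 2-image batch stands in for MNIST data.

```python
import torch

def make_cnn(num_classes=10):
    # Same layer configuration as the CNN class, expressed as one nn.Sequential
    return torch.nn.Sequential(
        torch.nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
        torch.nn.BatchNorm2d(16),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=2, stride=2),
        torch.nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
        torch.nn.BatchNorm2d(32),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=2, stride=2),
        torch.nn.Flatten(),
        torch.nn.Linear(7 * 7 * 32, num_classes),
        torch.nn.LogSoftmax(dim=1),
    )

torch.manual_seed(0)
net = make_cnn()
dummy = torch.randn(2, 1, 28, 28)        # fake batch: 2 grayscale 28x28 images
out = net(dummy)
print(out.shape)                         # torch.Size([2, 10])
print(out.exp().sum(dim=1))              # exp of log-softmax: each row sums to ~1

# Trainable parameters (BatchNorm running statistics are buffers, not parameters)
n_params = sum(p.numel() for p in net.parameters())
print(n_params)
```

Because every layer simply feeds the next, nn.Sequential is enough here. A ModuleList with an explicit loop, as in the class above, becomes useful when forward needs custom logic such as skip connections.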

> With the CNN model now defined, we instantiate the class before training it
>
> For this model, all we need to specify is the number of output classes (10, one for each digit 0-9)

In [75]:
#Define hyperparameter for the output size
num_classes = 10

#Instantiate the model
model = CNN(num_classes)

# Configure Device to Execute Model

> With the model defined, we can select the device to train the model on
>
> We will opt for the GPU via the CUDA library, but if it is unavailable, we will use the CPU

In [76]:
#Specify the device for the model based on availability
device = torch.device("cuda" if(torch.cuda.is_available()) else "cpu")

> Now that the device is specified, we will move our instantiated model to that device

In [77]:
#Move the instantiated model to the best available device
model = model.to(device)
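The device check above only considers CUDA. One possible extension, assuming PyTorch 1.12 or later on Apple-silicon hardware where the MPS backend exists, is to try MPS before falling back to the CPU. pick_device is a hypothetical helper name.

```python
import torch

def pick_device():
    # Prefer CUDA, then Apple-silicon MPS (guarded for PyTorch versions without it), else CPU
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(3).to(device)   # inputs must live on the same device as the model
print(device, x.device)
```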

# Define Loss Function and Optimizer

> With the data acquired and the model specified, the next step is to define the loss function that we wish to minimize
>
> Since the model returns the log-softmax, we will use the NLL loss function

In [78]:
#Will use the NLL loss function since the model returns the log-softmax vector
lossFunction = torch.nn.NLLLoss()
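Pairing LogSoftmax with NLLLoss is mathematically equivalent to applying torch.nn.CrossEntropyLoss directly to raw scores (logits), because CrossEntropyLoss combines the two steps. The sketch below checks this on random logits and labels, both made up for illustration. It also computes the NLL loss by hand as the batch mean of the negative log-probability of each correct class.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples over 10 classes
labels = torch.tensor([3, 0, 9, 1])    # integer class index per sample

log_probs = torch.nn.LogSoftmax(dim=1)(logits)
nll = torch.nn.NLLLoss()(log_probs, labels)
ce = torch.nn.CrossEntropyLoss()(logits, labels)

# NLLLoss (default reduction="mean") averages -log_probs[i, labels[i]] over the batch
manual = -log_probs[torch.arange(4), labels].mean()
print(nll.item(), ce.item(), manual.item())   # all three values agree
```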

> Next, we will define the optimizer algorithm which defines how we will take steps to minimize the loss function
>
> Our optimizer will be the Adam optimizer
>
> For this optimizer, a hyperparameter we need to specify is the learning rate, which is essentially the step-size for the optimizer

In [79]:
#Specify the hyperparameter for the learning rate (step size)
learning_rate = 0.001

#Use the Adam optimizer as the optimizer for minimizing the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
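For reference, this is the standard Adam update that torch.optim.Adam applies to each parameter $\theta$ at step $t$. It uses PyTorch's defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$, and $\alpha$ = learning_rate, with $L$ the loss:

$$
\begin{aligned}
g_t &= \nabla_\theta L(\theta_{t-1}) \\
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

The ratio $\hat{m}_t / \sqrt{\hat{v}_t}$ is typically of order 1 in magnitude, so each parameter moves by roughly $\alpha$ per step. This is why the learning rate is described as a step size.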

# Train Model

> Now, with our model defined, we can train it on our training data
>
> For training, we need to define one last hyperparameter: the number of epochs (full passes over the training data)

In [80]:
#Define the hyperparameter for the number of epochs
num_epochs = 5

#Specify the number of steps total in each epoch
total_steps = len(training_loader)

#Loop through each epoch
for epoch in range(num_epochs):
    #loop through each batch within each epoch
    for i, (images, labels) in enumerate(training_loader):
        #move images to configured device for best performance
        images = images.to(device)
        #move labels to configured device for best performance
        labels = labels.to(device)

        #perform a forward propagation on the model using the data in the batch
        outputs = model(images)
        #compute the current loss of the model using the outputs from the forward propagation
        loss = lossFunction(outputs, labels)

        #reset the gradients stored from the previous step
        optimizer.zero_grad()
        #perform backward propagation to compute the gradient
        loss.backward()
        #perform a single step in accordance with our hyperparameter (learning rate = 0.001) and optimizer algorithm (Adam)
        optimizer.step()

        #print the loss of the current batch after every 100 steps
        if((i+1)%100 == 0):
            #print relevant information about the current progress made during training to the console
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{total_steps}], Loss: {loss.item():4f}")

Epoch [1/5], Step [100/600], Loss: 0.114906
Epoch [1/5], Step [200/600], Loss: 0.039810
Epoch [1/5], Step [300/600], Loss: 0.062347
Epoch [1/5], Step [400/600], Loss: 0.070658
Epoch [1/5], Step [500/600], Loss: 0.048792
Epoch [1/5], Step [600/600], Loss: 0.070185
Epoch [2/5], Step [100/600], Loss: 0.056447
Epoch [2/5], Step [200/600], Loss: 0.013132
Epoch [2/5], Step [300/600], Loss: 0.018252
Epoch [2/5], Step [400/600], Loss: 0.047630
Epoch [2/5], Step [500/600], Loss: 0.034676
Epoch [2/5], Step [600/600], Loss: 0.050215
Epoch [3/5], Step [100/600], Loss: 0.005318
Epoch [3/5], Step [200/600], Loss: 0.009604
Epoch [3/5], Step [300/600], Loss: 0.065311
Epoch [3/5], Step [400/600], Loss: 0.040965
Epoch [3/5], Step [500/600], Loss: 0.017391
Epoch [3/5], Step [600/600], Loss: 0.009358
Epoch [4/5], Step [100/600], Loss: 0.019390
Epoch [4/5], Step [200/600], Loss: 0.028005
Epoch [4/5], Step [300/600], Loss: 0.099064
Epoch [4/5], Step [400/600], Loss: 0.023662
Epoch [4/5], Step [500/600], Los
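The training loop calls optimizer.zero_grad() before loss.backward() because PyTorch adds new gradients to whatever is already stored in each parameter's .grad. The sketch below first shows this accumulation on a single scalar. It then runs the same zero_grad, backward, step pattern on a hypothetical toy regression problem (fitting y = 2x + 1), chosen only because its answer is known.

```python
import torch

# Part 1: gradients accumulate across backward() calls until they are reset
w = torch.tensor(2.0, requires_grad=True)
(3 * w).backward()
print(w.grad)        # tensor(3.)
(3 * w).backward()
print(w.grad)        # tensor(6.): added to the previous gradient, not replaced
w.grad.zero_()
print(w.grad)        # tensor(0.)

# Part 2: the zero_grad -> backward -> step pattern on a hypothetical toy problem
torch.manual_seed(0)
x = torch.linspace(-1, 1, 64).unsqueeze(1)   # 64 inputs, shape (64, 1)
y = 2 * x + 1                                # targets on the line with slope 2, intercept 1
toy_model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.1)

for step in range(300):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = loss_fn(toy_model(x), y)          # forward pass
    loss.backward()                          # backward pass: compute gradients
    optimizer.step()                         # update the weight and bias

print(toy_model.weight.item(), toy_model.bias.item())  # close to 2 and 1
```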

# Test Model

> Now that we have trained our model, the next step is to test the model against data it has never seen before (i.e., the testing data)
>
> Before testing, set the model to evaluation mode
>
> Evaluation mode changes the behavior of layers that act differently during training and testing
>
> Evaluation mode disables dropout (the random deactivation of neurons during training; this model has no dropout layers). It also makes batch normalization use the running mean and variance accumulated during training instead of the statistics of each batch
>
> To return to training mode, use model.train()
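The batch-normalization difference can be seen directly. This sketch uses synthetic input and a single BatchNorm2d layer. In training mode, the layer normalizes with the current batch's statistics and moves its running averages by momentum=0.1. In evaluation mode, it uses those running averages instead, so the same input produces different outputs.

```python
import torch

torch.manual_seed(0)
bn = torch.nn.BatchNorm2d(num_features=3)   # running_mean starts at 0, running_var at 1
x = torch.randn(8, 3, 4, 4) * 5 + 2         # synthetic batch with mean ~2 and std ~5

bn.train()
y_train = bn(x)   # normalized with this batch's mean/variance; also updates the running stats
bn.eval()
y_eval = bn(x)    # normalized with running_mean/running_var instead

print(y_train.mean(dim=(0, 2, 3)))      # close to 0 for every channel
print(bn.running_mean)                  # moved only 10% of the way toward the batch mean
print(torch.allclose(y_train, y_eval))  # False
```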

In [81]:
#Set the model to evaluation mode
model.eval()

CNN(
  (layers): ModuleList(
    (0): Sequential(
      (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (1): Sequential(
      (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (2): Flatten(start_dim=1, end_dim=-1)
    (3): Linear(in_features=1568, out_features=10, bias=True)
    (4): LogSoftmax(dim=1)
  )
)

> Evaluation mode does not turn off gradient computation, so we also disable it with torch.no_grad() while testing, which saves memory and increases testing speed
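The difference between model.eval() and torch.no_grad() can be checked on a single layer. This is a minimal sketch with a small hypothetical Linear layer and random input.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

layer.eval()
y_eval = layer(x)
print(y_eval.requires_grad)       # True: eval() alone still builds the autograd graph

with torch.no_grad():
    y_untracked = layer(x)
print(y_untracked.requires_grad)  # False: no graph is recorded inside no_grad

print(torch.allclose(y_eval, y_untracked))  # True: same values either way
```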

In [82]:
#Turn gradient computation off while testing (gradients are not needed, so computing them is wasted work)
with torch.no_grad():
    #initialize counter variables for the total and correct number of predictions
    total_predictions = 0
    correct_predictions = 0
    
    #loop through each batch and make predictions according to the trained model
    for images, labels in testing_loader:
        #move images to configured device for best performance
        images = images.to(device)
        #move labels to configured device for best performance
        labels = labels.to(device)

        #perform forward propagation using the testing data on the model
        outputs = model(images)

        #the class with the highest (log-)probability is the model's classification
        log_probability, predicted_class = torch.max(outputs, dim=1)

        #add to the total predictions counter
        total_predictions += labels.size(dim=0)
        #add to the number of correct predictions
        #the number of correct predictions is the sum of a boolean tensor that is True where the predicted class matches the label
        correct_predictions += (predicted_class == labels).sum().item()

    #lastly, print the accuracy of the model
    print(f"Test accuracy of the model on the 10,000 images: {100*correct_predictions/total_predictions}%")

Test accuracy of the model on the 10,000 images: 98.81%
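The prediction and counting logic can be traced by hand on a tiny example. The log-probabilities and labels below are made up for illustration. Because log is monotonic, taking the maximum over log-probabilities picks the same class as taking it over probabilities.

```python
import torch

# Hypothetical log-probabilities for 4 images over 3 classes
outputs = torch.log(torch.tensor([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]))
labels = torch.tensor([0, 1, 1, 0])

max_log_prob, predicted_class = torch.max(outputs, dim=1)
print(predicted_class)                         # tensor([0, 1, 2, 0])

correct = (predicted_class == labels).sum().item()
accuracy = 100 * correct / labels.size(dim=0)
print(correct, accuracy)                       # 3 75.0
```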


# Save Model

> Now that we have trained and tested the model, the final step is to save the trained model to load and run at another time or on another device

In [83]:
#Save the trained model checkpoint
torch.save(model.state_dict(), "model.ckpt")
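To use the checkpoint later, you must first rebuild the architecture and then load the saved state_dict into it. The sketch below demonstrates the round trip with a small hypothetical stand-in model, using an in-memory buffer instead of model.ckpt. For the CNN above, the same steps would be CNN(num_classes) followed by load_state_dict(torch.load("model.ckpt")). To load a checkpoint saved on a GPU onto a CPU-only machine, pass map_location="cpu" to torch.load.

```python
import io

import torch

def make_model():
    # Hypothetical small model standing in for the CNN
    return torch.nn.Sequential(torch.nn.Linear(4, 3), torch.nn.BatchNorm1d(3))

original = make_model()
buffer = io.BytesIO()                       # in-memory stand-in for "model.ckpt"
torch.save(original.state_dict(), buffer)

buffer.seek(0)                              # rewind before reading
restored = make_model()                     # rebuild the same architecture first
restored.load_state_dict(torch.load(buffer))
restored.eval()                             # switch to evaluation mode before inference
```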