# Convolutional Neural Network
> Building and training a convolutional neural network
>
> **Table of Contents**
>> 1. Import Libraries
>>
>> 2. Load Dataset
>>
>> 3. Define Model
>>
>> 4. Configure Device to Execute Model
>>
>> 5. Define Loss Function and Optimizer
>>
>> 6. Train Model
>>
>> 7. Test Model
>>
>> 8. Save Model

# Import Libraries

> Import libraries needed for the program

In [1]:
#import torch for machine learning capabilities
import torch

#import torchvision for image dataset (MNIST) and image transformations
import torchvision

# Load Dataset

> With the libraries imported, we will now load the data to be used for training and testing

In [2]:
#Download and reference the training dataset
training_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=torchvision.transforms.ToTensor(),
                                           download=True)

#Download and reference the testing dataset
testing_dataset = torchvision.datasets.MNIST(root='./data',
                                             train=False,
                                             transform=torchvision.transforms.ToTensor(),
                                             download=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz
Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz
Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw



> After loading the dataset, the next step is to define a DataLoader for each split so that the data is loaded in batches
>
> Because the data is loaded in batches, we will define a hyperparameter for the batch size

In [3]:
#Define the hyperparameter for the batch size (i.e., chunk size of the data being loaded)
batch_size = 100

#Define the DataLoader for the training dataset
training_loader = torch.utils.data.DataLoader(dataset=training_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              drop_last=False)

#Define the DataLoader for the testing dataset
testing_loader = torch.utils.data.DataLoader(dataset=testing_dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             drop_last=False)
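
> As a minimal sketch of how batching behaves, the example below runs a DataLoader over a small synthetic dataset instead of MNIST. The 250 random samples and the names `fake_images`, `fake_labels`, `demo_dataset`, and `demo_loader` are assumptions made for illustration only

```python
import torch

# Hypothetical stand-in for MNIST: 250 random 1x28x28 "images" with labels 0-9
fake_images = torch.randn(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
demo_dataset = torch.utils.data.TensorDataset(fake_images, fake_labels)

# Same DataLoader settings as above, applied to the synthetic dataset
demo_loader = torch.utils.data.DataLoader(dataset=demo_dataset,
                                          batch_size=100,
                                          shuffle=True,
                                          drop_last=False)

# Collect the number of samples in each batch
batch_sizes = [images.shape[0] for images, labels in demo_loader]
print(batch_sizes)  # [100, 100, 50]
```

> With drop_last=False the final, smaller batch is kept. Setting drop_last=True would discard it instead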

# Define Model

> We will now specify a convolutional neural network that takes in a batch of images, applies convolution operations, and outputs a vector of log-probabilities for each image
>
> When defining the model, we will use a module list (ModuleList, a list of modules) to define the order of the layers and a sequential container (Sequential, a chain of modules applied one after another) to define each convolutional layer
>
> **There are additional hyperparameters for the CNN that I do not yet understand, such as kernel size and stride. I will need to investigate them and why they are used differently in the Conv2d and MaxPool2d layers**
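>
> As a starting point for that investigation, here is a small sketch of the standard output-size formula for convolution and pooling layers (with dilation 1). The kernel size is the width of the sliding window and the stride is how far the window moves at each step. The helper name `output_size` is hypothetical and not part of PyTorch

```python
def output_size(n, kernel_size, stride, padding=0):
    """Spatial output size of a conv or pooling layer, assuming dilation 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

size = 28  # MNIST images are 28x28
size = output_size(size, kernel_size=5, stride=1, padding=2)  # Conv2d keeps 28
size = output_size(size, kernel_size=2, stride=2)             # MaxPool2d halves to 14
size = output_size(size, kernel_size=5, stride=1, padding=2)  # Conv2d keeps 14
size = output_size(size, kernel_size=2, stride=2)             # MaxPool2d halves to 7
print(size)  # 7
```

> Under these settings, the convolutions (stride 1, padding 2) preserve the spatial size, while the pooling layers (kernel size 2, stride 2) halve it. This is where the 7*7*32 input size of the linear layer comes from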

In [4]:
#Define the Convolutional Neural Network (CNN) class
class CNN(torch.nn.Module):
    #CNN constructor function
    def __init__(self, num_classes):
        #First, call the constructor of the parent class
        super(CNN, self).__init__()
        #Specify the module list for the sequence of layers in the CNN
        self.module_list = list()
        #Define the first convolutional layer (1x28x28 input -> 16x14x14 output)
        layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            torch.nn.BatchNorm2d(16),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #Append the convolutional layer to the module list
        self.module_list.append(layer1)
        #Define the second convolutional layer (16x14x14 input -> 32x7x7 output)
        layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            torch.nn.BatchNorm2d(32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        #Append the convolutional layer to the module list
        self.module_list.append(layer2)
        #Flatten each 32x7x7 feature map into a vector of length 7*7*32, as the linear layer expects
        self.module_list.append(torch.nn.Flatten())
        #Append the linear layer to the model
        self.module_list.append(torch.nn.Linear(7*7*32, num_classes))
        #Log-Softmax used for numerical stability
        self.module_list.append(torch.nn.LogSoftmax(dim=1))
        #Lastly, convert the list object of modules into a ModuleList object (used so that PyTorch can detect the modules/parameters contained in the list)
        self.layers = torch.nn.ModuleList(self.module_list)

    #Define how the model applies forward propagation
    def forward(self, x):
        #Initially have the output be the passed data
        out = x
        #Loop through each layer in the model
        for layer in self.layers:
            #get the output for each layer, passing it to the next layer
            out = layer(out)
        #Autograd already tracks gradients through the layers for backward propagation
        #return the output
        return out
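
> To illustrate why the list of layers is wrapped in a ModuleList, the sketch below compares two toy models. The class names `PlainListModel` and `ModuleListModel` are hypothetical and exist only for this comparison

```python
import torch

class PlainListModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list: PyTorch does not register the layer inside it
        self.layers = [torch.nn.Linear(4, 2)]

class ModuleListModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A ModuleList: the layer and its parameters are registered
        self.layers = torch.nn.ModuleList([torch.nn.Linear(4, 2)])

# Count the parameter tensors (weight and bias) each model exposes
plain_count = len(list(PlainListModel().parameters()))
registered_count = len(list(ModuleListModel().parameters()))
print(plain_count, registered_count)  # 0 2
```

> Parameters that are not registered are invisible to model.parameters(), so an optimizer built from it would never update them, and model.to(device) would not move them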

> With the CNN model now defined, we must instantiate the class before it can be trained
>
> For this model, all we need to specify is the output size

In [5]:
#Define hyperparameter for the output size
num_classes = 10

#Instantiate the model
model = CNN(num_classes)
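
> A quick way to sanity-check a freshly instantiated model is to pass a dummy batch through it and inspect the output shape. The sketch below uses a standalone stand-in, `demo_model`, with the same layer sizes and an explicit Flatten before the linear layer. The stand-in and the random `dummy_batch` are assumptions for illustration, not the model instance above

```python
import torch

# Stand-in network with the same layer sizes, written as a single Sequential
demo_model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(kernel_size=2, stride=2),
    torch.nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
    torch.nn.BatchNorm2d(32),
    torch.nn.ReLU(),
    torch.nn.MaxPool2d(kernel_size=2, stride=2),
    torch.nn.Flatten(),
    torch.nn.Linear(7 * 7 * 32, 10),
    torch.nn.LogSoftmax(dim=1),
)

# Four random "images" shaped like MNIST inputs: (batch, channels, height, width)
dummy_batch = torch.randn(4, 1, 28, 28)
with torch.no_grad():
    log_probs = demo_model(dummy_batch)

print(log_probs.shape)  # torch.Size([4, 10])

# Exponentiating log-probabilities should give rows that sum to 1
prob_sums = log_probs.exp().sum(dim=1)
```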

# Configure Device to Execute Model

> With the model defined, we can select the device to train the model on
>
> We will opt for the GPU via the CUDA library, but if it is unavailable, we will use the CPU

In [6]:
#Specify the device for the model based on availability
device = torch.device("cuda" if(torch.cuda.is_available()) else "cpu")

> Now that the device is specified, we will move our instantiated model to that device

In [7]:
#Move the instantiated model to the best available device
model = model.to(device)
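
> Moving the model is only half of the job: every input batch must be moved to the same device before it is passed to the model. A minimal sketch, using a hypothetical `demo_layer` and random `demo_input` in place of the CNN and MNIST data

```python
import torch

# Pick the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small layer standing in for the full model
demo_layer = torch.nn.Linear(8, 2).to(device)

# The input must live on the same device as the layer's parameters
demo_input = torch.randn(3, 8).to(device)
demo_output = demo_layer(demo_input)

print(demo_output.device.type == device.type)  # True
```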

# Define Loss Function and Optimizer

> With the data acquired and the model specified, the next step is to define the loss function that we wish to minimize
>
> Since the model returns the log-softmax, we will use the NLL loss function

In [8]:
#Will use the NLL loss function since the model returns the log-softmax vector
lossFunction = torch.nn.NLLLoss()
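
> The pairing of log-softmax with NLL loss can be checked numerically: applying LogSoftmax followed by NLLLoss gives the same value as applying CrossEntropyLoss directly to the raw scores. The random `logits` and `targets` below are made-up inputs for illustration

```python
import torch

torch.manual_seed(0)

# Made-up raw scores for 5 samples over 10 classes, with random target labels
logits = torch.randn(5, 10)
targets = torch.randint(0, 10, (5,))

# Route 1: log-softmax followed by NLL loss
nll = torch.nn.NLLLoss()(torch.nn.LogSoftmax(dim=1)(logits), targets)

# Route 2: cross-entropy loss applied directly to the raw scores
ce = torch.nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))  # True
```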

> Next, we will define the optimizer, the algorithm that determines how we take steps to minimize the loss function
>
> Our optimizer will be the Adam optimizer
>
> For this optimizer, a hyperparameter we need to specify is the learning rate, which is essentially the step size for the optimizer

In [9]:
#Specify the hyperparameter for the learning rate (step size)
learning_rate = 0.001

#Use the Adam optimizer as the optimizer for minimizing the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
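
> To make the "step size" interpretation concrete, the sketch below takes a single Adam step on a toy problem. The parameter `weight` and the loss weight² are assumptions chosen only because the result is easy to check by hand

```python
import torch

# Toy parameter starting at 1.0, optimized with the same learning rate as above
weight = torch.nn.Parameter(torch.tensor([1.0]))
demo_optimizer = torch.optim.Adam([weight], lr=0.001)

# Toy loss: weight squared, whose gradient at 1.0 is 2.0
loss = (weight ** 2).sum()

demo_optimizer.zero_grad()  # clear any old gradients
loss.backward()             # compute the gradient of the loss
demo_optimizer.step()       # update the parameter

print(weight.item())  # approximately 0.999
```

> On its first step, Adam moves each parameter by roughly the learning rate in the direction that lowers the loss, largely independent of the gradient's magnitude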

# Train Model
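
> A minimal sketch of what a training loop for this kind of setup could look like. Everything here is an assumption made to keep the example self-contained: the random data, the tiny stand-in `demo_model`, and the value of `num_epochs`. It is not the notebook's final training code

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: random "images" and labels instead of MNIST
demo_images = torch.randn(200, 1, 28, 28)
demo_labels = torch.randint(0, 10, (200,))
demo_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(demo_images, demo_labels),
    batch_size=100, shuffle=True)

# Tiny stand-in classifier that returns log-probabilities
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
demo_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 10),
    torch.nn.LogSoftmax(dim=1),
).to(device)

demo_loss_function = torch.nn.NLLLoss()
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)

num_epochs = 2  # assumed value, chosen only to keep the example fast
losses = []
for epoch in range(num_epochs):
    demo_model.train()  # enable training behavior (e.g., for BatchNorm)
    for images, labels in demo_loader:
        # Move the batch to the same device as the model
        images, labels = images.to(device), labels.to(device)
        # Forward pass and loss computation
        outputs = demo_model(images)
        loss = demo_loss_function(outputs, labels)
        # Backward pass and parameter update
        demo_optimizer.zero_grad()
        loss.backward()
        demo_optimizer.step()
        losses.append(loss.item())
    print(f"Epoch {epoch + 1}/{num_epochs}, last batch loss: {losses[-1]:.4f}")
```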

# Test Model
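
> A minimal sketch of an evaluation loop that measures classification accuracy. As before, the random data and the untrained stand-in `demo_model` are assumptions for illustration, so the printed accuracy itself is meaningless

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: random "images" and labels instead of the MNIST test set
demo_images = torch.randn(100, 1, 28, 28)
demo_labels = torch.randint(0, 10, (100,))
demo_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(demo_images, demo_labels),
    batch_size=50, shuffle=False)

# Untrained stand-in classifier that returns log-probabilities
demo_model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 10),
    torch.nn.LogSoftmax(dim=1),
)

demo_model.eval()  # switch layers such as BatchNorm to evaluation behavior
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed when only evaluating
    for images, labels in demo_loader:
        outputs = demo_model(images)
        # The predicted class is the one with the highest log-probability
        predictions = outputs.argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)

accuracy = correct / total
print(f"Accuracy: {accuracy:.2%}")
```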

# Save Model
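
> A minimal sketch of one common way to save and restore a model: storing its state dictionary with torch.save and loading it back with load_state_dict. The stand-in `demo_model`, the temporary directory, and the file name `cnn_demo.pt` are assumptions for illustration

```python
import os
import tempfile

import torch

# Hypothetical small model standing in for the trained CNN
demo_model = torch.nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cnn_demo.pt")  # assumed file name
    # Save only the learned parameters (the state dictionary)
    torch.save(demo_model.state_dict(), path)

    # Restore into a new model with the same architecture
    restored_model = torch.nn.Linear(4, 2)
    restored_model.load_state_dict(torch.load(path))

# The restored parameters should match the originals exactly
weights_match = torch.equal(demo_model.weight, restored_model.weight)
print(weights_match)  # True
```

> Saving the state dictionary rather than the whole model object means the class definition must be available when loading, but it keeps the saved file independent of how the code is organized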