<b>The Dataset </b>

The [CIFAR10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) consists of 60000 32x32 color images split evenly across 10 different categories:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck

The training set has 50000 images, 5000 from each class. The testing set has 10000 images, 1000 from each class. Our goal is to train a neural network to correctly classify images into these 10 categories.

In [None]:
#We will be using PyTorch to construct our neural net
import torch
from torch import nn
from torch.utils.data import DataLoader
#the CIFAR10 dataset is included in a torchvision module
from torchvision import datasets
from torchvision.transforms import ToTensor, Normalize, Compose
#we can use os to make os-independent file paths to save the model
import os

In [None]:
"""
For computer vision tasks, it is generally a good idea to normalize the images. We will specifically use a normalization
method called standardization, in which, for every x in a set, we subtract the mean of the set and divide by the standard
deviation. For images, the set we calculate the mean and standard deviation of is the training data images, and each color
channel is standardized separately. Note that this same standardization transform (using the training-set statistics)
must be applied to all other data that goes through the model.
"""
#First, we load the training images.
image_set = datasets.CIFAR10(root='CIFAR10data', train = True, transform = ToTensor(), download = True)

"""
For small datasets, we could just immediately calculate the mean and standard deviation using built in PyTorch methods,
but here we will use a more general method that could be used for larger datasets without ever running into memory problems.
"""
#We create a DataLoader object for batch processing
image_loader = DataLoader(image_set, batch_size = 128)

#each channel contains 50000 32x32 images, so total number of pixels (set_size) is:
set_size = len(image_set.data )*32*32

#We need to loop through the images once to calculate the means for each channel
means = torch.zeros(3)
for batch, _ in image_loader:
    #sum all pixel values in all images of the three color channels, then add them to means
    means+=batch.sum([0,2,3])
means /= set_size
print(f'Means: {means}')

#These means can be used to calculate the standard deviations (abbreviated stds)
stds = torch.zeros(3)

#Makes a tensor of compatible dimension to be broadcast up to the size of each batch for elementwise subtraction in the next loop
broadcastable_means = torch.clone(means).unsqueeze(1).unsqueeze(2)

for batch, _ in image_loader:
    #subtract the channel means (unsqueezed above so that they broadcast across the batch)
    batch = batch-broadcastable_means
    #.mul() is elementwise multiplication, so this is just elementwise squaring
    batch = torch.mul(batch,batch)
    stds += batch.sum([0,2,3])
#now we just divide by the number of elements in each set, and take the square root of each of the three values
stds /= set_size
stds = torch.sqrt(stds)
print(f'Standard Deviations: {stds}')

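As a sanity check on the two-pass approach above, the following minimal sketch runs the same batched computation on a small random tensor and compares the result with statistics computed directly on the whole tensor. The synthetic data, the `fake_images` name, and the batch size are assumptions for illustration and are not part of the CIFAR10 pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical stand-in for an image set: 200 "images", 3 channels, 32x32 pixels
fake_images = torch.rand(200, 3, 32, 32)
fake_labels = torch.zeros(200, dtype=torch.long)
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64)

pixels_per_channel = fake_images.shape[0] * 32 * 32

# Pass 1: accumulate per-channel sums, then divide to get the means
means = torch.zeros(3)
for batch, _ in loader:
    means += batch.sum([0, 2, 3])
means /= pixels_per_channel

# Pass 2: accumulate per-channel squared deviations from the means
stds = torch.zeros(3)
for batch, _ in loader:
    stds += ((batch - means.view(3, 1, 1)) ** 2).sum([0, 2, 3])
stds = torch.sqrt(stds / pixels_per_channel)

# Reference values computed in one shot on the full tensor
flat = fake_images.permute(1, 0, 2, 3).reshape(3, -1)
ref_means = flat.mean(dim=1)
ref_stds = ((flat - ref_means.unsqueeze(1)) ** 2).mean(dim=1).sqrt()

print(torch.allclose(means, ref_means, atol=1e-5))
print(torch.allclose(stds, ref_stds, atol=1e-5))
```

Both loops divide by N rather than N-1, so this is the population standard deviation; with 50000 images the difference from the sample version is negligible.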

In [None]:
#Now we load the training and testing sets, standardizing both of them with the training-set statistics
transform_chain = Compose([ToTensor(), Normalize(means, stds)])

training_data = datasets.CIFAR10(root = 'CIFAR10data', train = True, transform = transform_chain, download = True)
testing_data = datasets.CIFAR10(root = 'CIFAR10data', train = False, transform = transform_chain, download = True)

#Next we create DataLoader objects for the data so we can easily process the data in batches.
#pin_memory=True speeds the transfer of data from CPU to GPU
training_loader = DataLoader(training_data, batch_size = 128, pin_memory=True, shuffle = True)
testing_loader = DataLoader(testing_data, batch_size = 128, pin_memory=True, shuffle = False)


<b>Model Architecture </b>

For image classification tasks we will use a Convolutional Neural Network. This will make it easier to pick up on features which are characteristic of each image type than if we simply flattened the inputs and used fully-connected layers. Our proposed model has the following layers:

1.  - Input Dimension: 3x32x32 (32x32 images with 3 color channels R,G, and B)
    - Layer Specifications: convolutional kernel with size 11x11, zero padding of 5 on all sides
    - Activation Function: ReLU
    - Output Dimensions: 8x32x32 
2.  - Input Dimension: 8x32x32
    - Layer Specifications: convolutional kernel with size 7x7, zero padding of 3 on all sides
    - Activation Function: ReLU
    - Output Dimensions: 16x32x32
3.  - Input Dimension: 16x32x32
    - Layer Specifications: convolutional kernel with size 5x5, zero padding of 2 on all sides
    - Activation Function: ReLU
    - Output Dimensions: 16x32x32
4.  - Input Dimension: 16x32x32
    - Layer Specifications: convolutional kernel with size 5x5
    - Activation Function: ReLU
    - Output Dimensions: 16x28x28
5.  - Input Dimension: 16x28x28
    - Layer Specifications: MaxPooling with 2x2 kernel
    - Activation Function: N/A
    - Output Dimensions: 16x14x14
6.  - Input Dimension: 16x14x14
    - Layer Specifications: convolutional kernel with size 5x5
    - Activation Function: ReLU
    - Output Dimensions: 16x10x10
7.  - Input Dimension: 16x10x10
    - Layer Specifications: MaxPooling with 2x2 kernel
    - Activation Function: N/A
    - Output Dimensions: 16x5x5
8.  - Input Dimension: 16x5x5
    - Layer Specifications: convolutional kernel with size 5x5
    - Activation Function: ReLU
    - Output Dimensions: 160x1x1 (Flatten to 1D)
9.  - Input Dimension: 160
    - Layer Specifications: fully-connected layer
    - Activation Function: ReLU
    - Output Dimensions: 160
10. - Input Dimension: 160
    - Layer Specifications: Dropout layer, zeros 50% of nodes during training (to prevent overfitting). This layer does nothing when the model is in evaluation mode (`model.eval()`)
    - Activation Function: N/A
    - Output Dimensions: 160
11. - Input Dimension: 160
    - Layer Specifications: fully-connected layer
    - Activation Function: Softmax
    - Output Dimensions: 10

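The output dimensions listed above follow from the standard size formula for a convolution or pooling layer, out = floor((in + 2*padding - kernel) / stride) + 1. The short sketch below applies that formula to each spatial layer in turn. The helper name `out_size` and the `layers` table are illustrative assumptions; the strides (1 for convolutions, 2 for the 2x2 pooling) are the PyTorch defaults for `nn.Conv2d` and `nn.MaxPool2d(kernel_size = 2)`. The final 1x1 spatial size is why the output of layer 8 can be flattened directly into a 160-element vector.

```python
def out_size(in_size, kernel, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer (square inputs and kernels)."""
    return (in_size + 2 * padding - kernel) // stride + 1

# (description, kernel, padding, stride) for the spatial layers 1-8 listed above
layers = [
    ("conv 11x11, pad 5", 11, 5, 1),
    ("conv 7x7, pad 3", 7, 3, 1),
    ("conv 5x5, pad 2", 5, 2, 1),
    ("conv 5x5", 5, 0, 1),
    ("maxpool 2x2", 2, 0, 2),
    ("conv 5x5", 5, 0, 1),
    ("maxpool 2x2", 2, 0, 2),
    ("conv 5x5", 5, 0, 1),
]

size = 32  # CIFAR10 images are 32x32
sizes = []
for name, kernel, padding, stride in layers:
    size = out_size(size, kernel, padding, stride)
    sizes.append(size)
    print(f"{name:<18} -> {size}x{size}")
```
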
In [None]:
#Here we will define our Neural Network using the architecture described above
class CNN(nn.Module):
  def __init__(self):
    super(CNN,self).__init__()
    self.layer1 = nn.Conv2d(in_channels = 3, out_channels = 8, kernel_size = 11, padding = 5)
    self.layer2 = nn.Conv2d(in_channels = 8, out_channels = 16, kernel_size = 7, padding = 3)
    self.layer3 = nn.Conv2d(in_channels = 16, out_channels = 16, kernel_size = 5, padding = 2)
    self.layer4 = nn.Conv2d(in_channels = 16, out_channels = 16, kernel_size = 5)
    self.layer5 = nn.MaxPool2d(kernel_size = 2)
    self.layer6 = nn.Conv2d(in_channels = 16, out_channels = 16, kernel_size = 5)
    self.layer7 = nn.MaxPool2d(kernel_size = 2)
    self.layer8 = nn.Conv2d(in_channels = 16, out_channels = 160, kernel_size = 5)
    self.flatten = nn.Flatten()
    self.layer9 = nn.Linear(in_features = 160, out_features = 160)
    self.layer10 = nn.Dropout(p = 0.5)
    self.layer11 = nn.Linear(in_features = 160, out_features = 10)

  def forward(self, x):
    x = nn.functional.relu(self.layer1(x))
    x = nn.functional.relu(self.layer2(x))
    x = nn.functional.relu(self.layer3(x))
    x = nn.functional.relu(self.layer4(x))
    x = self.layer5(x)
    x = nn.functional.relu(self.layer6(x))
    x = self.layer7(x)
    x = nn.functional.relu(self.layer8(x))
    x = self.flatten(x)
    x = nn.functional.relu(self.layer9(x))
    x = self.layer10(x)
    x = nn.functional.softmax(self.layer11(x), dim = 1)
    return x

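The dropout layer in the model above only zeros activations while the module is in training mode; in evaluation mode it passes its input through unchanged. Calling `.train()` or `.eval()` on a network switches every dropout layer it contains. Below is a minimal sketch of that difference, using a standalone `nn.Dropout` rather than the full network (the all-ones input is an arbitrary choice for illustration):

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 160)

# Training mode: each element is zeroed with probability 0.5 and the
# survivors are scaled by 1 / (1 - p) = 2 so the expected value is unchanged
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is the identity
dropout.eval()
eval_out = dropout(x)

print(train_out.unique())
print(torch.equal(eval_out, x))
```
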
In [None]:
#This is a convolutional network with significantly more parameters than the first.
class HeavyCNN(nn.Module):
  def __init__(self):
    super(HeavyCNN,self).__init__()
    self.layer1 = nn.Conv2d(in_channels = 3, out_channels = 8, kernel_size = 11, padding = 5)
    self.layer2 = nn.Conv2d(in_channels = 8, out_channels = 16, kernel_size = 7, padding = 3)
    self.layer3 = nn.Conv2d(in_channels = 16, out_channels = 128, kernel_size = 5)
    self.layer4 = nn.MaxPool2d(kernel_size = 2)
    self.layer5 = nn.Conv2d(in_channels = 128, out_channels = 256, kernel_size = 5)
    self.layer6 = nn.MaxPool2d(kernel_size = 2)
    self.layer7 = nn.Conv2d(in_channels = 256, out_channels = 512, kernel_size = 5)
    self.flatten = nn.Flatten()
    self.layer8 = nn.Linear(in_features = 512, out_features = 160)
    self.layer9 = nn.Dropout(p = 0.5)
    self.layer10 = nn.Linear(in_features = 160, out_features = 10)

  def forward(self, x):
    x = nn.functional.relu(self.layer1(x))
    x = nn.functional.relu(self.layer2(x))
    x = nn.functional.relu(self.layer3(x))
    x = self.layer4(x)
    x = nn.functional.relu(self.layer5(x))
    x = self.layer6(x)
    x = nn.functional.relu(self.layer7(x))
    x = self.flatten(x)
    x = nn.functional.relu(self.layer8(x))
    x = self.layer9(x)
    x = nn.functional.softmax(self.layer10(x), dim = 1)
    return x

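To put a number on "significantly more parameters", the sketch below counts weights and biases from the layer shapes alone. A convolution with a k x k kernel has (k*k*in_channels + 1)*out_channels parameters, and a fully-connected layer has (in_features + 1)*out_features. The helper names are illustrative, and the final layer of `CNN` is assumed to have the 10 outputs given in the architecture list. Pooling, flatten, and dropout layers have no parameters, so they are omitted.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a Conv2d layer with a square kernel."""
    return (kernel * kernel * in_channels + 1) * out_channels

def linear_params(in_features, out_features):
    """Weights plus biases of a Linear layer."""
    return (in_features + 1) * out_features

cnn_total = sum([
    conv_params(3, 8, 11),
    conv_params(8, 16, 7),
    conv_params(16, 16, 5),
    conv_params(16, 16, 5),
    conv_params(16, 16, 5),
    conv_params(16, 160, 5),
    linear_params(160, 160),
    linear_params(160, 10),  # assumes 10 output classes
])

heavy_total = sum([
    conv_params(3, 8, 11),
    conv_params(8, 16, 7),
    conv_params(16, 128, 5),
    conv_params(128, 256, 5),
    conv_params(256, 512, 5),
    linear_params(512, 160),
    linear_params(160, 10),
])

print(f"CNN:      {cnn_total:,} parameters")
print(f"HeavyCNN: {heavy_total:,} parameters")
print(f"ratio:    {heavy_total / cnn_total:.1f}x")
```
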
In [None]:
#This loop will be used to actually train the model
def train_loop(dataloader, model, loss_fn, optimizer, device):
    #size will be used later to display progress to the user
    size = len(dataloader.dataset)
    
    #X is a batch of inputs, y are the associated ground-truths
    for batch, (X, y) in enumerate(dataloader):
        #First we push X and y to the GPU (if available)
        X = X.to(device)
        y = y.to(device)

        #generate the model's predictions based on input X
        predictions = model(X)
        #calculate how wrong those predictions are
        loss = loss_fn(predictions, y)

        #Use backpropagation to tweak the model, hopefully increasing accuracy
        #First we clear the gradients left over from the previous iteration
        optimizer.zero_grad()
        #Actual backpropagation occurs here
        loss.backward()
        #Take one step in a direction informed by the partial derivatives calculated in the backpropagation
        optimizer.step()

        #Show the user the model's progress
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn, device):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    #do not need to calculate the gradient when testing since we will not update model parameters here
    with torch.no_grad():
        #X is a batch of input data, y is the corresponding ground-truths
        for X, y in dataloader:
            #first we push X and y to GPU
            X = X.to(device)
            y = y.to(device)
            #calculate the model's predicted outputs
            predictions = model(X)
            #calculate the loss generated by predictions and add it to test_loss
            test_loss += loss_fn(predictions, y).item()
            
            """
            predictions consists of length 10 vectors of probabilities. The value in the
            ith position of a prediction is to be roughly interpreted as the confidence that our model
            would have classifying the corresponding image as a member of the ith category. For example,
            if the seventh component of a prediction is .95, then the model is "95% confident" that the
            corresponding image is of the seventh type (a frog). So we say the network makes a correct guess
            when its highest confidence guess is the correct category.
            """
            correct += (predictions.argmax(1) == y).type(torch.float).sum().item()

    #output information for the user
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

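The accuracy bookkeeping in `test_loop` can be checked by hand on a toy batch. The sketch below uses three made-up prediction vectors over three classes (rather than ten) and hypothetical labels, so that the expected count of correct guesses is easy to verify by eye.

```python
import torch

predictions = torch.tensor([
    [0.10, 0.70, 0.20],  # highest score at index 1
    [0.80, 0.10, 0.10],  # highest score at index 0
    [0.30, 0.30, 0.40],  # highest score at index 2
])
y = torch.tensor([1, 0, 0])  # hypothetical ground-truth labels

# argmax(1) picks the highest-scoring class per row; comparing with y gives
# a boolean tensor whose sum is the number of correct guesses
guesses = predictions.argmax(1)
correct = (guesses == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(guesses.tolist(), correct, accuracy)
```
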
In [None]:
#optionally, switch to model = HeavyCNN() to use the model with more parameters
model = CNN()
#determine if there is a GPU available for use
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#if GPU is available, push model to it.
model.to(device)
learning_rate = .01

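#cross entropy is a standard loss function choice for classification. Note that nn.CrossEntropyLoss
#applies log-softmax internally and expects raw scores, so with the softmax at the end of our models
#the outputs are effectively normalized twice. Training still runs, but this likely slows learning.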
loss_fn = nn.CrossEntropyLoss()
#SGD without momentum was not improving fast enough for my liking. Adagrad rescales the learning rate
#of each parameter using the accumulated squares of its past gradients.
optimizer = torch.optim.Adagrad(model.parameters(), lr = learning_rate)

epochs = 50
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
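    #Dropout only zeros nodes in training mode, so we switch modes explicitly around each loop
    model.train()
    train_loop(training_loader, model, loss_fn, optimizer, device)
    model.eval()
    test_loop(testing_loader, model, loss_fn, device)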
print("Done!")
#This saves the model for later use
save_path = os.path.join(os.path.curdir, 'CIFAR10classifier.pt')
torch.save(model, save_path)

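Since both models above end in a softmax and are trained with `nn.CrossEntropyLoss`, it is worth checking what that loss expects as input. PyTorch's cross-entropy combines log-softmax and negative log-likelihood, so it is meant to receive raw scores (logits). The sketch below, on made-up scores for a single three-class example, compares the loss on logits with the loss obtained when a softmax has already been applied.

```python
import torch
from torch import nn
import torch.nn.functional as F

loss_fn = nn.CrossEntropyLoss()

logits = torch.tensor([[4.0, 0.0, 0.0]])  # made-up raw scores, confident in class 0
target = torch.tensor([0])

# Cross-entropy on logits equals the negative log-likelihood of the log-softmax
loss_on_logits = loss_fn(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Feeding probabilities instead means softmax is effectively applied twice.
# Probabilities lie in [0, 1], so this loss cannot approach zero even for a
# confident, correct prediction.
probs = F.softmax(logits, dim=1)
loss_on_probs = loss_fn(probs, target)

print(f"loss on logits:           {loss_on_logits.item():.4f}")
print(f"manual log-softmax + NLL: {manual.item():.4f}")
print(f"loss on softmax output:   {loss_on_probs.item():.4f}")
```
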
<b>Results </b>

Neither network architecture performs spectacularly (CNN averages about 50% accuracy, HeavyCNN averages about 57%). This is not a problem, however, as optimizing performance was not the goal; the goal was to figure out the basics of PyTorch. I would consider that goal achieved, and feel far more confident now moving on to harder tasks.