# Introduction

This tutorial demonstrates the basic usage of PyTorch. PyTorch is an open-source machine learning library for Python that has gained tremendous popularity due to its shallow learning curve, even for programmers with little Python experience. PyTorch provides several modules that make it easy to define computational graphs, take gradients and optimize neural networks. At the time of writing, it is among the fastest-growing frameworks, both in papers published at top conferences and in industry.

PyTorch is available for all popular operating systems. We recommend installing it locally using [Conda](https://conda.io/docs/) (a package manager) without GPU support, unless you know for sure that your system supports GPU acceleration and you know how to install CUDA and cuDNN (neither recommended nor necessary for this course). Make sure you install both `pytorch` and `torchvision` using the commands on the official [PyTorch](https://pytorch.org/) site.

In [1]:
import torch
import torchvision
import torchvision.transforms as transforms

PyTorch comes with several datasets ready for use in the `torchvision` package. This feature is highly useful, since obtaining and preprocessing datasets can be tedious and time consuming. The following commands will download the famous [MNIST](http://yann.lecun.com/exdb/mnist/) dataset to your computer. Notice the parameters: `root` specifies the folder into which the dataset is downloaded, and `train` selects between two different datasets, one for training and one for testing. Since neural networks can easily learn complex functions, we need to test the generalization capabilities of our network using data it has not seen before. Most metrics use the test data to measure how well the network performs. The `batch_size` parameter of `DataLoader` determines the number of images and corresponding labels in each batch. For example, with `batch_size=4`, each batch contains 4 images and their 4 labels.

In [2]:
# transforms is a useful module containing many operations on images.
# Since the MNIST dataset is loaded as PIL images, we need to transform them into tensors
transform = transforms.Compose([transforms.ToTensor()])

mnist_dataset_train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_trainloader = torch.utils.data.DataLoader(mnist_dataset_train, batch_size=4, shuffle=True, num_workers=2)
mnist_dataset_test = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
mnist_testloader = torch.utils.data.DataLoader(mnist_dataset_test, batch_size=4, shuffle=True, num_workers=2)

classes = list(range(10))  # Digit labels 0-9; kept as a variable for easy code recycling

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Processing...
Done!
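
To make the effect of `batch_size` concrete, here is a minimal sketch that needs no download. It assumes a small synthetic dataset in place of MNIST; the names `fake_images`, `fake_labels`, `fake_dataset` and `fake_loader` are hypothetical and used only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 12 random 1x28x28 "images" with labels 0-9.
fake_images = torch.rand(12, 1, 28, 28)
fake_labels = torch.randint(0, 10, (12,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# batch_size=4 groups the samples into batches of 4 images and 4 labels.
fake_loader = DataLoader(fake_dataset, batch_size=4, shuffle=True)

batch_images, batch_labels = next(iter(fake_loader))
print(batch_images.shape)  # torch.Size([4, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([4])
print(len(fake_loader))    # 3 batches cover all 12 samples
```

The first dimension of every batch is the batch size, so the MNIST loaders above should yield image tensors of shape `[4, 1, 28, 28]`.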


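We will use a simple function to display the images in our dataset. Since PyTorch stores images in a channels-first format (channels, height, width), we need to rearrange the tensor into the channels-last layout that matplotlib expects using the `np.transpose()` function. The following code snippet displays one batch of training images together with their ground-truth labels.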

In [3]:
import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    
dataiter = iter(mnist_trainloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
imshow(torchvision.utils.make_grid(images))
print('Ground Truth:',' '.join('%5s' % classes[labels[j]] for j in range(4)))

Ground Truth:     8     7     1     3
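
The axis rearrangement performed inside `imshow` can be checked on its own. The sketch below uses a hypothetical all-zeros array (`chw`) with a non-square shape, so that the three axes are easy to tell apart.

```python
import numpy as np

# Hypothetical channels-first image: 3 channels, 32 rows, 36 columns.
chw = np.zeros((3, 32, 36), dtype=np.float32)

# Move the channel axis to the end: (C, H, W) -> (H, W, C).
hwc = np.transpose(chw, (1, 2, 0))
print(chw.shape)  # (3, 32, 36)
print(hwc.shape)  # (32, 36, 3)
```

The tuple `(1, 2, 0)` lists, for each output axis, which input axis it is taken from.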


In [None]:
import torch.nn as nn
import torch.nn.functional as F

class MNIST_Net(nn.Module):
    '''
    Every network in PyTorch can (and should) be defined as a class. 
    Every class should have an init method containing the layers and
    a forward method that defines how the layers are connected. 
    This is often called the architecture of the network. 
    PyTorch handles all backpropagation automatically.
    
    Define the basic architecture of your network. For now, you should only use fully connected layers.
    In PyTorch, fully connected layers are called Linear. You can read about them at:
    https://pytorch.org/docs/stable/nn.html#linear-layers
    
    '''
    def __init__(self):
        super(MNIST_Net, self).__init__()
        #############################################################################
        # TO DO:                                                                    #
        # Define the basic architecture of your network. For now, only use fully    #
        # connected layers. Read about calling fully connected layers at:           #
        # https://pytorch.org/docs/stable/nn.html#linear-layers                     #
        # In this function, you should only define the layers you intend to use.    #
        # Save each layer as a different *self variable*.                           #
        # This function has no return value.                                        #
        #############################################################################
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

        
    def forward(self, x):
        
        #############################################################################
        # TO DO:                                                                    #
        # Define the forward propagation of your network. Connect each layer to     #
        # the next and experiment with different activations, number of parameters  #
        # and depths. You can read about different activations in PyTorch at        #
        # https://pytorch.org/docs/stable/nn.html#non-linear-activation-functions   #
        # Return a single tensor after passing it through your network.             #
        # Hint: Shaping a multidimensional tensor into a vector can be achieved by: #
        # the method x.view()                                                       #
        #############################################################################
        pass
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################
mnist_net = MNIST_Net() # This line should instantiate your network as an object
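
Before filling in the class, it may help to see how `x.view()` and `nn.Linear` behave on a dummy batch. This is only a sketch of the building blocks, not a reference solution: the layer sizes (784, 32, 10) and the names `fc1`, `fc2`, `flat`, `hidden` and `scores` are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A batch of 4 fake MNIST-sized images: (batch, channels, height, width).
x = torch.rand(4, 1, 28, 28)

# Flatten every image into a vector of 1 * 28 * 28 = 784 values.
flat = x.view(x.size(0), -1)
print(flat.shape)  # torch.Size([4, 784])

# Two fully connected layers; the hidden size 32 is an arbitrary choice.
fc1 = nn.Linear(784, 32)
fc2 = nn.Linear(32, 10)

# ReLU is applied element-wise and keeps the shape of its input.
hidden = F.relu(fc1(flat))
scores = fc2(hidden)
print(scores.shape)  # torch.Size([4, 10])
```

Inside a real `nn.Module`, the layers would be created in `__init__` and connected in `forward`, as the docstring above describes.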

In [None]:
import torch.optim as optim

#############################################################################
# TO DO:                                                                    #
# Pick a loss function and optimizer for your network. Start with a cross-  #
# entropy loss and stochastic gradient descent with 0.001 learning rate and #
# test the effect of different learning rates and momentum.                 #
# Use the documentation and create variables to hold the loss function and  #
# optimizer. The model will take them as inputs for the training process    #
#############################################################################
criterion = None
optimizer = None
#############################################################################
#                             END OF YOUR CODE                              #
#############################################################################
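
As a rough illustration of how a loss function and optimizer are created and called, consider the sketch below. It assumes a single linear layer (`toy_net`) as a stand-in for your network; all `toy_` names, the random inputs and the target labels are hypothetical.

```python
import torch
import torch.nn as nn
import torch.optim as optim

toy_net = nn.Linear(784, 10)  # hypothetical stand-in for your network

# Cross-entropy loss expects raw scores (logits) and integer class labels.
toy_criterion = nn.CrossEntropyLoss()
# SGD with the suggested learning rate; momentum=0.9 is one value to try.
toy_optimizer = optim.SGD(toy_net.parameters(), lr=0.001, momentum=0.9)

logits = toy_net(torch.rand(4, 784))   # scores for a batch of 4 samples
targets = torch.tensor([3, 1, 4, 1])   # made-up labels
loss = toy_criterion(logits, targets)
print(loss.item())  # a single positive number
```

Note that `nn.CrossEntropyLoss` applies the softmax internally, so the network should return raw scores rather than probabilities.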

In [None]:
def train_network(net, criterion, optimizer, trainloader):
    #############################################################################
    # TO DO:                                                                    #
    # Train your network. Use the train loader to fetch a batch of data and     #
    # labels. Then, zero the parameter gradients by using optimizer.zero_grad() #
    # and perform a forward propagation and calculate the loss. Afterwards,     #
    # calculate the gradients and backprop using loss.backward(), and perform   #
    # the optimization step by using optimizer.step(). Use the provided         #
    # statistics function to print useful information during training.          #
    # Two iterations over the entire dataset (2 epochs) should be enough.       #
    # Print the loss every 2000 batches to babysit the learning process.        #
    #############################################################################
    pass
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
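
The sequence of steps described in the comments can be sketched on a toy problem. This is one possible shape for such a loop, not the expected solution: the data is random, the model is a single linear layer, and every `toy_` name as well as the learning rate of 0.1 is an assumption made for this example.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy problem: 64 random 20-dimensional samples, 2 classes.
toy_inputs = torch.randn(64, 20)
toy_targets = (toy_inputs[:, 0] > 0).long()
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_targets),
                        batch_size=4, shuffle=True)

toy_net = nn.Linear(20, 2)
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_net.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for inputs, targets in toy_loader:
        toy_optimizer.zero_grad()               # reset gradients from the previous step
        outputs = toy_net(inputs)               # forward propagation
        loss = toy_criterion(outputs, targets)  # compute the loss
        loss.backward()                         # backpropagation
        toy_optimizer.step()                    # parameter update
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(toy_loader))
    print('epoch %d, mean loss %.3f' % (epoch + 1, epoch_losses[-1]))
```

On this easy, linearly separable problem the mean loss should fall from epoch to epoch, which is the same behaviour to look for when training on MNIST.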

In [None]:
%%time
train_network(mnist_net, criterion, optimizer, mnist_trainloader)

If everything works, the loss of the network should decrease over time as it learns to classify hand-written digits.
To measure how well the network performs, we need the test dataset.
We classify each image in the test dataset (which the network has never seen) and calculate the accuracy of the network. A good model should generalize and perform well even on data that was not seen during training.

In [None]:
def calc_net_accuracy(net, testloader):
    with torch.no_grad():
        correct = 0
        total = 0
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
    
calc_net_accuracy(mnist_net, mnist_testloader)
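
The accuracy computation relies on `torch.max` returning both the maximum values and their indices. A small sketch with made-up scores (3 samples, 4 classes, all values hypothetical) shows how the predicted class and the number of correct predictions are obtained.

```python
import torch

# Hypothetical scores for a batch of 3 images over 4 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2, 0.1, 0.0],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 0, 2])

# torch.max along dim=1 returns (max values, indices of the max values).
_, predicted = torch.max(outputs, 1)
print(predicted)  # tensor([1, 0, 3])

# Comparing with the labels gives a boolean tensor; its sum counts the hits.
correct = (predicted == labels).sum().item()
print(correct, 'of', labels.size(0), 'correct')  # 2 of 3 correct
```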

A simple network can achieve an accuracy of over 95%. Try several network architectures until you reach at least 93% accuracy.

We can also take a batch and visualize the predictions:

In [None]:
dataiter = iter(mnist_testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('Ground Truth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

outputs = mnist_net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted   : ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))

We can also check the individual classification scores for each class in the dataset. 

In [None]:
def calc_class_accuracy(net, testloader):
    class_correct = list(0. for i in range(10))
    class_total = list(0. for i in range(10))
    with torch.no_grad():
        for data in testloader:
            images, labels = data
            outputs = net(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels)
            for i in range(labels.size(0)):  # works for any batch size
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1
    for i in range(10):
        print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))
        
calc_class_accuracy(mnist_net, mnist_testloader)

We have implemented a simple neural network in PyTorch that classifies hand-written digits and reaches over 95% accuracy in a matter of seconds. Next, we try the same network on more complicated data. This time, we will use [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html). This dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.

# CIFAR-10

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

cifar_trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar_trainloader = torch.utils.data.DataLoader(cifar_trainset, batch_size=4, shuffle=True, num_workers=2)

cifar_testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
cifar_testloader = torch.utils.data.DataLoader(cifar_testset, batch_size=4, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [None]:
# functions to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    
# get some random training images
dataiter = iter(cifar_trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%10s' % classes[labels[j]] for j in range(4)))
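
The `Normalize` transform and the "unnormalize" line in `imshow` are inverses of each other. `transforms.Normalize(mean, std)` computes `(x - mean) / std` per channel; the sketch below reproduces that arithmetic with plain tensor operations on a few hypothetical pixel values instead of calling torchvision.

```python
import torch

# Pixel values after ToTensor() lie in [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize with mean 0.5 and std 0.5: (x - mean) / std maps [0, 1] to [-1, 1].
normalized = (pixels - 0.5) / 0.5
print(normalized.tolist())  # [-1.0, -0.5, 0.0, 1.0]

# imshow undoes this with x / 2 + 0.5 before plotting.
restored = normalized / 2 + 0.5
print(torch.allclose(restored, pixels))  # True
```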

We will try to use the same network we defined earlier. Although both MNIST and CIFAR-10 have 10 classes, so the network outputs a vector of size 10 in both cases, we need to adjust the input of the network, since MNIST images are 1x28x28 and CIFAR-10 images are 3x32x32. The optimizer and loss function are the same.
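
For a fully connected network, the change in input size comes down to simple arithmetic on the image shapes. The helper `flat_size` below is a hypothetical name introduced only for this illustration.

```python
# Flattened input size = channels * height * width.
mnist_shape = (1, 28, 28)
cifar_shape = (3, 32, 32)

def flat_size(shape):
    """Return the number of values in one image of the given (C, H, W) shape."""
    channels, height, width = shape
    return channels * height * width

print(flat_size(mnist_shape))  # 784
print(flat_size(cifar_shape))  # 3072
```

If the first layer of the MNIST network is fully connected, it is the number of input features of that layer that would change from 784 to 3072.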

In [None]:
class CIFAR_Net(nn.Module):
#############################################################################
# TO DO:                                                                    #
# Copy the architecture you used for the MNIST dataset. Change the size     #
# of the input to support images from the CIFAR-10 dataset.                 #
#############################################################################
    pass
#############################################################################
#                             END OF YOUR CODE                              #
#############################################################################

    
cifar_net = CIFAR_Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(cifar_net.parameters(), lr=0.001, momentum=0.9)

In [None]:
train_network(cifar_net, criterion, optimizer, cifar_trainloader)

Let's check the results on the entire CIFAR-10 test dataset.

In [None]:
calc_net_accuracy(cifar_net, cifar_testloader)

In [None]:
dataiter = iter(cifar_testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%7s' % classes[labels[j]] for j in range(4)))

outputs = cifar_net(images)
_, predicted = torch.max(outputs, 1)

print('Predicted:   ', ' '.join('%7s' % classes[predicted[j]] for j in range(4)))

In [None]:
calc_class_accuracy(cifar_net, cifar_testloader)

While we can classify MNIST digits with 95%+ accuracy, the same architecture correctly classifies less than half of the CIFAR-10 images. The reason is that CIFAR-10 data is more complicated, and we need a network that can better capture spatially invariant features.