# Week 0: Introduction to Deep Learning Frameworks

## Notebook 2: MNIST Classification with a Convolutional Neural Network on PyTorch

Welcome to the second notebook of deep learning frameworks week! In this notebook, we are going to build a convolutional neural network in PyTorch to classify MNIST images. The objective of this week is to get you acquainted with PyTorch basics.

## 0. Problem Definition

In this notebook, once again we are classifying handwritten digits with the MNIST dataset! This time, however, we are going to be using a convolutional neural network (CNN) instead of a fully connected one as in the previous notebook.

Let's begin by installing PyTorch:

## 1. Install PyTorch

Follow the [official guidelines](https://pytorch.org/get-started/locally/) to install PyTorch.

The page lists up-to-date instructions for the latest version. If you wish to install an older version, you can also find instructions for that on the website.
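
Once the installation finishes, a quick sanity check can confirm that PyTorch is importable. The snippet below is a minimal sketch; the variable names `version` and `cuda_available` are our own, and a GPU is not required for this notebook.

```python
import torch

# Version string of the installed PyTorch build.
version = torch.__version__
print("PyTorch version:", version)

# True only if a CUDA build of PyTorch and a compatible GPU are present.
# The rest of this notebook runs on the CPU either way.
cuda_available = torch.cuda.is_available()
print("CUDA available:", cuda_available)
```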

## 2. Imports

Let's start by importing the necessary modules:

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

## 3. Data Preparation

We are going to use `torchvision.datasets` directly to obtain the MNIST data quickly. This package contains many datasets that are ready to use. Feel free to go through its documentation.

First, we define the necessary transformations as preprocessing steps. Namely, we convert the images to tensors (which scales pixel values to the range [0, 1]) and normalize them with mean 0.1307 and standard deviation 0.3081. These are the commonly used mean and standard deviation of the MNIST training pixels.

In [2]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
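
To get a feeling for what the normalization does, here is a small worked example in plain Python. It is only a sketch of the per-pixel formula `(pixel - mean) / std`; the helper `normalize` is our own illustrative function, not part of torchvision.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize above


def normalize(pixel):
    """Apply the per-pixel formula (pixel - mean) / std."""
    return (pixel - MEAN) / STD


# After ToTensor, a black pixel is 0.0 and a white pixel is 1.0.
black = normalize(0.0)
white = normalize(1.0)
print(f"black -> {black:.4f}, white -> {white:.4f}")
# black -> -0.4242, white -> 2.8215
```

So after the transform, the background of each image sits slightly below zero and the pen strokes take positive values.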

Let's get the MNIST data by applying our transforms. Setting `download=True` will download the dataset if it's not downloaded already.

In [3]:
dataset1 = datasets.MNIST('../data', train=True, download=True,
                   transform=transform)
dataset2 = datasets.MNIST('../data', train=False,
                   transform=transform)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data\MNIST\raw\train-images-idx3-ubyte.gz


9913344it [00:03, 2498471.69it/s]                             


Extracting ../data\MNIST\raw\train-images-idx3-ubyte.gz to ../data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data\MNIST\raw\train-labels-idx1-ubyte.gz


29696it [00:00, 9921463.40it/s]          


Extracting ../data\MNIST\raw\train-labels-idx1-ubyte.gz to ../data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data\MNIST\raw\t10k-images-idx3-ubyte.gz


1649664it [00:00, 2647786.22it/s]                             


Extracting ../data\MNIST\raw\t10k-images-idx3-ubyte.gz to ../data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data\MNIST\raw\t10k-labels-idx1-ubyte.gz


5120it [00:00, ?it/s]                   

Extracting ../data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ../data\MNIST\raw






Below we prepare the data loaders to be used in training:

In [4]:
train_loader = torch.utils.data.DataLoader(dataset1, batch_size=128)
test_loader = torch.utils.data.DataLoader(dataset2, batch_size=1000)
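
The batch sizes above determine how many iterations one epoch takes. The arithmetic below is a sketch that assumes the standard MNIST split (60,000 training and 10,000 test images) and the `DataLoader` default of keeping the last, smaller batch.

```python
import math

train_size, train_batch = 60_000, 128  # training images, batch_size used above
test_size, test_batch = 10_000, 1_000  # test images, batch_size used above

# Number of batches per epoch when the last partial batch is kept.
train_batches = math.ceil(train_size / train_batch)
test_batches = math.ceil(test_size / test_batch)

# Size of the final, incomplete training batch.
last_train_batch = train_size - (train_batches - 1) * train_batch

print(train_batches, last_train_batch, test_batches)
# 469 96 10
```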

## 4. Model Creation

Now let's define our convolutional neural network using an object-oriented approach. Custom networks in PyTorch are written as classes derived from `nn.Module`. Below we see an example of that:

In [5]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Conv2d(in_channels, out_channels, kernel_size, stride)
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        # nn.Dropout(p): probability of zeroing each element during training
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        # 9216 = 64 channels * 12 * 12 spatial positions after pooling
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

In PyTorch, we only write the `__init__` method and the `forward` method. We do not have to write the backward pass operations since PyTorch supports automatic differentiation.
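
To see automatic differentiation in isolation, here is a tiny sketch on a single scalar. The function `y = x**2 + 2*x` is just a made-up example, not part of our model.

```python
import torch

# A scalar that PyTorch should track gradients for.
x = torch.tensor(3.0, requires_grad=True)

# The forward computation is recorded as it runs.
y = x ** 2 + 2 * x

# backward() computes dy/dx and stores it in x.grad.
y.backward()
grad = x.grad.item()

print(grad)  # dy/dx = 2x + 2, which is 8 at x = 3
```

Calling `loss.backward()` on a network works the same way, only with many parameters instead of one scalar.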

Our model contains two convolutional layers followed by a max pooling operation, then two fully connected layers, each preceded by dropout. If you look at the `forward` method, you can clearly see the sequence of operations applied in the neural network.
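
You may wonder where the input size 9216 of `fc1` comes from. The sketch below traces the spatial size of a 28x28 MNIST image through the layers, using the usual output-size formula for a convolution without dilation. The helper `conv_out` is our own illustrative function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                        # MNIST images are 28x28
size = conv_out(size, kernel=3)  # conv1: 28 -> 26
size = conv_out(size, kernel=3)  # conv2: 26 -> 24
size = size // 2                 # max_pool2d with window 2: 24 -> 12

# conv2 produces 64 channels, each 12x12 after pooling.
flat_features = 64 * size * size
print(flat_features)  # 9216
```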

Next up, we create our model, optimizer, and scheduler. The optimizer starts with a learning rate of 1.0, and the scheduler multiplies the learning rate by 0.7 after each epoch.

In [6]:
model = Net()

optimizer = optim.Adadelta(model.parameters(), lr=1.0)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
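
With `step_size=1` and `gamma=0.7`, the learning rate used in epoch `k` is `1.0 * 0.7 ** (k - 1)`. The plain-Python sketch below lists these values, assuming the 14 epochs we train for in the next section and one `scheduler.step()` call at the end of each epoch.

```python
initial_lr, gamma, num_epochs = 1.0, 0.7, 14

# Learning rate in effect during each epoch (epochs are numbered from 1).
lrs = [initial_lr * gamma ** (epoch - 1) for epoch in range(1, num_epochs + 1)]

for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

By the last epoch the learning rate has dropped below 0.01, so late updates only fine-tune the weights.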

## 5. Train and Evaluate

Below we train our model for 14 epochs and evaluate the accuracy on the test set after each epoch. Notice how we switch the mode of the model with `model.train()` and `model.eval()`: dropout is active only in training mode, so we turn it off during evaluation. We use the train data loader for training and the test data loader for testing.

One thing to note is that `F.nll_loss()` computes the negative log-likelihood loss and expects log-probabilities as input. This is why we applied `log_softmax` at the end of our convolutional neural network.
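
As a side note, `log_softmax` followed by `nll_loss` gives the same value as applying `F.cross_entropy` to the raw scores. The sketch below checks this on made-up random scores and labels, which are purely illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # made-up raw scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])  # made-up labels

# Two steps, as in our model and training loop.
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# One step, directly on the raw scores.
loss_one_step = F.cross_entropy(logits, target)

print(torch.allclose(loss_two_step, loss_one_step))  # True
```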

In [7]:
log_interval = 10
num_epochs = 14
for epoch in range(1, num_epochs + 1):
    
    # Train for one epoch
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            
    
    # Evaluate after the epoch
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    scheduler.step()


Test set: Average loss: 0.0491, Accuracy: 9835/10000 (98%)


Test set: Average loss: 0.0385, Accuracy: 9867/10000 (99%)


Test set: Average loss: 0.0332, Accuracy: 9885/10000 (99%)


Test set: Average loss: 0.0296, Accuracy: 9903/10000 (99%)


Test set: Average loss: 0.0288, Accuracy: 9902/10000 (99%)


Test set: Average loss: 0.0281, Accuracy: 9908/10000 (99%)


Test set: Average loss: 0.0281, Accuracy: 9909/10000 (99%)


Test set: Average loss: 0.0275, Accuracy: 9905/10000 (99%)


Test set: Average loss: 0.0279, Accuracy: 9907/10000 (99%)


Test set: Average loss: 0.0275, Accuracy: 9910/10000 (99%)


Test set: Average loss: 0.0272, Accuracy: 9914/10000 (99%)


Test set: Average loss: 0.0269, Accuracy: 9914/10000 (99%)


Test set: Average loss: 0.0272, Accuracy: 9911/10000 (99%)


Test set: Average loss: 0.0270, Accuracy: 9915/10000 (99%)
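
The training loop above switches between `model.train()` and `model.eval()`. To see why this matters for a model with dropout, here is a small sketch on a standalone `nn.Dropout` layer. The input of all ones is a made-up example.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability 0.5,
# and the survivors are scaled by 1 / (1 - 0.5) = 2.
drop.train()
train_out = drop(x)

# Evaluation mode: dropout does nothing and returns the input unchanged.
drop.eval()
eval_out = drop(x)

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))  # True
```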



Congratulations on finishing this notebook. In our next notebook, we will look at our final framework, Keras, and use it to classify CIFAR-10 images.

**Bonus - Try to:**

- Get a test image
- Plot the image
- Make a model prediction on the image
- Print the predicted label and the actual label!