# CNN on MNIST Data 
---

**Best Test Accuracy: 99.50%** (epoch 17)

**Final Model Test Accuracy: 99.48%** (epoch 20)

### Importing libraries
---

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

### Creating model architecture
---

The model consists of two convolution blocks and one output block:

1) **Block 1**: Two 3x3 convolution layers with 8 filters each. Each convolution is followed by ReLU, batch normalization, and dropout of 0.1. A max pooling layer closes the block.

2) **Block 2**: Two 3x3 convolution layers with 16 filters each. Each convolution is followed by ReLU, batch normalization, and dropout of 0.1. A max pooling layer closes the block.

3) **Output Block**: Two 3x3 convolution layers with 16 and 32 filters respectively, then a 1x1 convolution that reduces the number of channels to 10, followed by a final 3x3 convolution with 10 filters that produces the class scores.

In [0]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # block 1
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1, bias=False)    # Input : 28*28*1 - Output : 28*28*8 - RF : 3*3
        self.bn1 = nn.BatchNorm2d(8)
        self.dropout1 = nn.Dropout(0.1)
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1, bias=False)    # Input : 28*28*8 - Output : 28*28*8 - RF : 5*5
        self.bn2 = nn.BatchNorm2d(8)
        self.dropout2 = nn.Dropout(0.1)
        self.pool1 = nn.MaxPool2d(2, 2)                           # Input : 28*28*8 - Output : 14*14*8 - RF : 6*6

        # block 2
        self.conv3 = nn.Conv2d(8, 16, 3, padding=1, bias=False)   # Input : 14*14*8 - Output : 14*14*16 - RF : 10*10
        self.bn3 = nn.BatchNorm2d(16)
        self.dropout3 = nn.Dropout(0.1)
        self.conv4 = nn.Conv2d(16, 16, 3, padding=1, bias=False)  # Input : 14*14*16 - Output : 14*14*16 - RF : 14*14
        self.bn4 = nn.BatchNorm2d(16)
        self.dropout4 = nn.Dropout(0.1)
        self.pool2 = nn.MaxPool2d(2, 2)                           # Input : 14*14*16 - Output : 7*7*16 - RF : 16*16

        # Output block
        self.conv5 = nn.Conv2d(16, 16, 3, bias=False)             # Input : 7*7*16 - Output : 5*5*16 - RF : 24*24
        self.bn5 = nn.BatchNorm2d(16)
        self.conv6 = nn.Conv2d(16, 32, 3, bias=False)             # Input : 5*5*16 - Output : 3*3*32 - RF : 32*32
        self.conv7 = nn.Conv2d(32, 10, 1, bias=False)             # Input : 3*3*32 - Output : 3*3*10 - RF : 32*32

        # Output Layer
        self.conv8 = nn.Conv2d(10, 10, 3, bias=False)             # Input : 3*3*10 - Output : 1*1*10 - RF : 40*40

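    def forward(self, x):
        # block 1: conv -> ReLU -> batch norm -> dropout (twice), then max pool
        x = self.dropout1(self.bn1(F.relu(self.conv1(x))))
        x = self.dropout2(self.bn2(F.relu(self.conv2(x))))
        x = self.pool1(x)

        # block 2: same pattern with 16 filters
        x = self.dropout3(self.bn3(F.relu(self.conv3(x))))
        x = self.dropout4(self.bn4(F.relu(self.conv4(x))))
        x = self.pool2(x)

        # output block: no dropout, and no ReLU after the last two convolutions
        x = self.bn5(F.relu(self.conv5(x)))
        x = F.relu(self.conv6(x))
        x = self.conv8(self.conv7(x))

        x = x.view(-1, 10)              # flatten 1*1*10 to (batch, 10)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes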
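
The output size and receptive field (RF) of each layer in `Net` can be worked out with a little arithmetic. The sketch below is plain Python, so it runs without PyTorch; `LAYERS` and `trace_layers` are illustrative names, not part of the notebook. It assumes the usual formulas: output size is `(n + 2p - k) // s + 1`, the RF grows by `(k - 1) * jump`, and the jump is multiplied by the stride.

```python
# Each entry: (name, kernel, stride, padding, out_channels or None to keep channels)
LAYERS = [
    ("conv1", 3, 1, 1, 8),
    ("conv2", 3, 1, 1, 8),
    ("pool1", 2, 2, 0, None),
    ("conv3", 3, 1, 1, 16),
    ("conv4", 3, 1, 1, 16),
    ("pool2", 2, 2, 0, None),
    ("conv5", 3, 1, 0, 16),
    ("conv6", 3, 1, 0, 32),
    ("conv7", 1, 1, 0, 10),
    ("conv8", 3, 1, 0, 10),
]


def trace_layers(layers, size=28, channels=1):
    """Return (name, spatial size, channels, receptive field) per layer."""
    rf, jump = 1, 1
    rows = []
    for name, k, s, p, out_ch in layers:
        size = (size + 2 * p - k) // s + 1   # spatial size after this layer
        rf += (k - 1) * jump                 # RF grows with kernel and current jump
        jump *= s                            # stride widens the step between outputs
        channels = out_ch if out_ch is not None else channels
        rows.append((name, size, channels, rf))
    return rows


TRACE = trace_layers(LAYERS)
for name, size, channels, rf in TRACE:
    print(f"{name}: {size}*{size}*{channels}, RF {rf}*{rf}")
```

The last row is the 1*1*10 output that `forward` flattens to ten class scores.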

#### Steps taken to arrive at the final model
---

1) Starting from the base model, I removed the ReLU from the output layer because it was preventing negative values from reaching the prediction layer. This improved the accuracy to around 96%.

2) I then reduced the number of filters, trying different configurations until the parameter count fell below 20k, while keeping the framework of two blocks of two convolution layers each. This gave an accuracy of around 98.6% with about 12k parameters.

3) To further increase the accuracy, I added data augmentation, which raised the validation accuracy to around 99.34%.

4) I noticed a considerable gap between the train and test loss, so I added dropout of 0.1 and batch normalization. The final accuracy was 99.48%.
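
The parameter budget from step 2 can be checked by hand. The sketch below is plain Python and needs no PyTorch. It assumes bias-free convolutions, as in `Net`, and two learnable parameters (scale and shift) per batch-norm channel. The helper names are illustrative only.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weight count of a bias-free 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel


def bn_params(channels):
    """Learnable scale and shift, one pair per channel."""
    return 2 * channels


counts = {
    "conv1": conv_params(1, 8, 3),   "bn1": bn_params(8),
    "conv2": conv_params(8, 8, 3),   "bn2": bn_params(8),
    "conv3": conv_params(8, 16, 3),  "bn3": bn_params(16),
    "conv4": conv_params(16, 16, 3), "bn4": bn_params(16),
    "conv5": conv_params(16, 16, 3), "bn5": bn_params(16),
    "conv6": conv_params(16, 32, 3),
    "conv7": conv_params(32, 10, 1),
    "conv8": conv_params(10, 10, 3),
}

total = sum(counts.values())
print(total)  # 12364, i.e. about 12k and well under the 20k budget
```

Dropout and max pooling have no learnable parameters, so they do not appear in the count.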



### View model summary
---

In [0]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              72
       BatchNorm2d-2            [-1, 8, 28, 28]              16
           Dropout-3            [-1, 8, 28, 28]               0
            Conv2d-4            [-1, 8, 28, 28]             576
       BatchNorm2d-5            [-1, 8, 28, 28]              16
           Dropout-6            [-1, 8, 28, 28]               0
         MaxPool2d-7            [-1, 8, 14, 14]               0
            Conv2d-8           [-1, 16, 14, 14]           1,152
       BatchNorm2d-9           [-1, 16, 14, 14]              32
          Dropout-10           [-1, 16, 14, 14]               0
           Conv2d-11           [-1, 16, 14, 14]           2,304
      BatchNorm2d-12           [-1, 16, 14, 14]              32
          Dropout-13           [-1, 16, 14, 14]               0
        MaxPool2d-14             [-1, 1



### Create Data Loaders with Augmentation
---

In [0]:
# set seed
torch.manual_seed(1)
batch_size = 128

# creating train data loader with transformations
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        #transforms.ToPILImage(),
                        transforms.RandomAffine(degrees=20, translate=(0.1,0.1), scale=(0.9, 1.1)),
                        transforms.ColorJitter(brightness=0.2, contrast=0.2),
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)

# creating test data loader with transformations
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw
Processing...
Done!
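
Two numbers in the loaders are worth unpacking: the batch count per epoch and the effect of `Normalize`. The sketch below is plain Python and assumes the standard MNIST split of 60,000 training and 10,000 test images. The `normalize` function is an illustrative stand-in for what `transforms.Normalize((0.1307,), (0.3081,))` computes per pixel, namely `(x - mean) / std`.

```python
import math

TRAIN_SIZE, TEST_SIZE, BATCH_SIZE = 60_000, 10_000, 128
MEAN, STD = 0.1307, 0.3081

# DataLoader keeps the final partial batch by default, hence the ceiling.
train_batches = math.ceil(TRAIN_SIZE / BATCH_SIZE)
test_batches = math.ceil(TEST_SIZE / BATCH_SIZE)


def normalize(pixel):
    """Shift and scale a pixel in [0, 1], as Normalize does after ToTensor."""
    return (pixel - MEAN) / STD


print(train_batches, test_batches)      # 469 79
print(normalize(0.0), normalize(1.0))   # black maps below zero, white above
```

The 469 training batches match the `469/469` shown by the progress bar during training.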


In [0]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc= f'Epoch={epoch} loss={loss.item()} batch_id={batch_idx}')


def test(epoch, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.02f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
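
The model returns log-probabilities (`F.log_softmax`) and both loops score them with `F.nll_loss`; together these amount to cross-entropy. The sketch below reproduces that calculation for a single sample in plain Python. The logits are made-up values for illustration, and the helper functions are hypothetical stand-ins, not the PyTorch implementations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]


logits = [2.0, -1.0, 0.5]   # made-up scores for three classes
target = 0

log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target)
pred = max(range(len(log_probs)), key=log_probs.__getitem__)  # argmax, as in test()

print(f"loss={loss:.4f} pred={pred} correct={pred == target}")
```

In `test`, the same loss is summed over all samples with `reduction='sum'` and then divided by the dataset size, which gives the per-sample average reported in the output.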

### Training and Testing of the model
---

In [0]:
# Instantiate the model
model = Net().to(device)

# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs = 20

for epoch in range(1, epochs+1):
    print(f'\nEpoch {epoch}/{epochs}')
    train(model, device, train_loader, optimizer, epoch)
    test(epoch, model, device, test_loader)

Epoch 1/20
Epoch=1 loss=0.23246441781520844 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.02it/s]
Test set: Average loss: 0.0646, Accuracy: 9789/10000 (97.89%)

Epoch 2/20
Epoch=2 loss=0.08990276604890823 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.47it/s]
Test set: Average loss: 0.0331, Accuracy: 9900/10000 (99.00%)

Epoch 3/20
Epoch=3 loss=0.13187465071678162 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.38it/s]
Test set: Average loss: 0.0301, Accuracy: 9901/10000 (99.01%)

Epoch 4/20
Epoch=4 loss=0.10323920845985413 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.07it/s]
Test set: Average loss: 0.0278, Accuracy: 9904/10000 (99.04%)

Epoch 5/20
Epoch=5 loss=0.07981861382722855 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.75it/s]
Test set: Average loss: 0.0265, Accuracy: 9914/10000 (99.14%)

Epoch 6/20
Epoch=6 loss=0.2029215544462204 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.38it/s]
Test set: Average loss: 0.0242, Accuracy: 9924/10000 (99.24%)

Epoch 7/20
Epoch=7 loss=0.07452548295259476 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.62it/s]
Test set: Average loss: 0.0208, Accuracy: 9928/10000 (99.28%)

Epoch 8/20
Epoch=8 loss=0.04572306200861931 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.31it/s]
Test set: Average loss: 0.0244, Accuracy: 9929/10000 (99.29%)

Epoch 9/20
Epoch=9 loss=0.03571651875972748 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.65it/s]
Test set: Average loss: 0.0225, Accuracy: 9933/10000 (99.33%)

Epoch 10/20
Epoch=10 loss=0.05741700530052185 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 16.98it/s]
Test set: Average loss: 0.0234, Accuracy: 9924/10000 (99.24%)

Epoch 11/20
Epoch=11 loss=0.028517575934529305 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.42it/s]
Test set: Average loss: 0.0243, Accuracy: 9923/10000 (99.23%)

Epoch 12/20
Epoch=12 loss=0.07614418119192123 batch_id=468: 100%|██████████| 469/469 [00:28<00:00, 16.66it/s]
Test set: Average loss: 0.0209, Accuracy: 9925/10000 (99.25%)

Epoch 13/20
Epoch=13 loss=0.054434746503829956 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.48it/s]
Test set: Average loss: 0.0192, Accuracy: 9935/10000 (99.35%)

Epoch 14/20
Epoch=14 loss=0.05982019379734993 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 16.75it/s]
Test set: Average loss: 0.0213, Accuracy: 9934/10000 (99.34%)

Epoch 15/20
Epoch=15 loss=0.030997460708022118 batch_id=468: 100%|██████████| 469/469 [00:26<00:00, 17.51it/s]
Test set: Average loss: 0.0195, Accuracy: 9932/10000 (99.32%)

Epoch 16/20
Epoch=16 loss=0.11825542896986008 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 14.40it/s]
Test set: Average loss: 0.0183, Accuracy: 9945/10000 (99.45%)

Epoch 17/20
Epoch=17 loss=0.0329522080719471 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.34it/s]
Test set: Average loss: 0.0163, Accuracy: 9950/10000 (99.50%)

Epoch 18/20
Epoch=18 loss=0.03957851231098175 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.04it/s]
Test set: Average loss: 0.0187, Accuracy: 9939/10000 (99.39%)

Epoch 19/20
Epoch=19 loss=0.016662806272506714 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.16it/s]
Test set: Average loss: 0.0190, Accuracy: 9943/10000 (99.43%)

Epoch 20/20
Epoch=20 loss=0.03207915648818016 batch_id=468: 100%|██████████| 469/469 [00:27<00:00, 17.14it/s]
Test set: Average loss: 0.0163, Accuracy: 9948/10000 (99.48%)
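
The headline accuracies can be read off the per-epoch test results above. The sketch below does this in plain Python, with the correct-prediction counts copied from the log; the variable names are illustrative.

```python
# Correct predictions out of 10,000 test images, epochs 1 to 20 (from the log).
correct_per_epoch = [
    9789, 9900, 9901, 9904, 9914, 9924, 9928, 9929, 9933, 9924,
    9923, 9925, 9935, 9934, 9932, 9945, 9950, 9939, 9943, 9948,
]
TEST_SIZE = 10_000

best_correct = max(correct_per_epoch)
best_epoch = correct_per_epoch.index(best_correct) + 1   # epochs are 1-based
best_accuracy = 100.0 * best_correct / TEST_SIZE
final_accuracy = 100.0 * correct_per_epoch[-1] / TEST_SIZE

print(f"best: {best_accuracy:.2f}% at epoch {best_epoch}")
print(f"final: {final_accuracy:.2f}% at epoch {len(correct_per_epoch)}")
```

The training loop as written keeps only the weights from the last epoch. Keeping the epoch-17 weights would need an extra step, for example saving `model.state_dict()` whenever the accuracy improves, which in turn assumes `test` is changed to return its accuracy.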

