

```
Target:
        2) Try to achieve accuracy consistently in the 99.2-99.3 range for at least a few epochs.
        3) Can go up to 20 epochs for that, but will look at only 15 epochs here.
        4) Will also add GAP to reduce the params further.
Result:
        1) Best Training Accuracy: 99.14
        2) Best Test Accuracy: 99.36
        3) Params: 9k
Analysis:
        1) GAP and BatchNorm alone did not help much with this model; accuracy was not that good (around 99.1).
        So I added dropout, and accuracy improved noticeably. Dropout made our model more robust: each kernel
        learns features that are strong on their own, so the prediction depends less on any other kernel being correct.
        So I will keep it.
        2) Because the model is small, GAP is not helping much; I could remove it and add a few more kernels to make the model more efficient.
        3) I should add data augmentation and LR scheduling to our model to make it more robust.
```


In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [2]:
class Net(nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.convolution1 = nn.Sequential(
        nn.Conv2d(1, 10, 3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(10),
        nn.Conv2d(10, 20, 3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(20),
        nn.MaxPool2d(2,2),
        nn.Dropout(0.15)
    )
    self.convolution2 = nn.Sequential(
        nn.Conv2d(20, 16, 3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(16),
        nn.Conv2d(16, 12, 3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(12),
        nn.MaxPool2d(2,2),
        nn.Dropout(0.15)
    )
    self.convolution3 = nn.Sequential(
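        nn.Conv2d(12, 12, 3),  # no padding: 7x7 -> 5x5
        nn.Conv2d(12, 10, 3),  # no padding: 5x5 -> 3x3, one channel per class
    )
    self.gap = nn.Sequential(
        nn.AvgPool2d(kernel_size=3)  # averages each 3x3 map to 1x1, so it acts as GAP here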
    )
  def forward(self, x):
    x = self.convolution1(x)
    x = self.convolution2(x)
    x = self.convolution3(x)
    x = self.gap(x)
    x = x.view(-1,10)
    # Pass dim explicitly; the implicit-dim form of log_softmax is deprecated.
    return F.log_softmax(x, dim=1)
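The `gap` layer only behaves as global average pooling because the feature map entering it is exactly 3x3. A small sketch of that equivalence follows; `nn.AdaptiveAvgPool2d(1)` is a swapped-in alternative that does not depend on the input size, not what this notebook uses, and the random input is only a stand-in for the output of `convolution3`.

```python
import torch
import torch.nn as nn

# Stand-in for the output of convolution3: batch of 2, 10 channels, 3x3 map.
x = torch.randn(2, 10, 3, 3)

fixed = nn.AvgPool2d(kernel_size=3)(x)      # what the notebook's gap layer does
adaptive = nn.AdaptiveAvgPool2d(1)(x)       # size-independent alternative (assumption)
manual = x.mean(dim=(2, 3), keepdim=True)   # plain mean over the spatial dimensions

# Flatten to one score per class, as forward() does with x.view(-1, 10).
logits = adaptive.view(-1, 10)
print(fixed.shape, logits.shape)
```

All three results agree on a 3x3 input; only the fixed-kernel version would stop being a global average if the input resolution changed.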

In [3]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
              ReLU-2           [-1, 10, 28, 28]               0
       BatchNorm2d-3           [-1, 10, 28, 28]              20
            Conv2d-4           [-1, 20, 28, 28]           1,820
              ReLU-5           [-1, 20, 28, 28]               0
       BatchNorm2d-6           [-1, 20, 28, 28]              40
         MaxPool2d-7           [-1, 20, 14, 14]               0
           Dropout-8           [-1, 20, 14, 14]               0
            Conv2d-9           [-1, 16, 14, 14]           2,896
             ReLU-10           [-1, 16, 14, 14]               0
      BatchNorm2d-11           [-1, 16, 14, 14]              32
           Conv2d-12           [-1, 12, 14, 14]           1,740
             ReLU-13           [-1, 12, 14, 14]               0
      BatchNorm2d-14           [-1, 12,
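The summary output above is cut off, but the per-layer counts can be recomputed from the layer definitions in `Net`. The sketch below does that arithmetic in plain Python; the helper names `conv2d_params` and `batchnorm2d_params` are hypothetical and not part of torchsummary.

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights (in_ch * out_ch * k * k) plus one bias per output channel."""
    return in_ch * out_ch * k * k + out_ch

def batchnorm2d_params(ch):
    """One learnable scale and one learnable shift per channel."""
    return 2 * ch

# Parameterised layers of Net, in order; every convolution uses a 3x3 kernel.
layers = [
    ("conv", 1, 10), ("bn", 10),
    ("conv", 10, 20), ("bn", 20),
    ("conv", 20, 16), ("bn", 16),
    ("conv", 16, 12), ("bn", 12),
    ("conv", 12, 12),
    ("conv", 12, 10),
]

total = 0
for layer in layers:
    if layer[0] == "conv":
        total += conv2d_params(layer[1], layer[2], 3)
    else:
        total += batchnorm2d_params(layer[1])

print(total)  # 9070
```

The first rows match the visible part of the summary (100, 20, 1,820, ...), and the total is consistent with the "Params: 9k" noted in the results.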



In [4]:
torch.manual_seed(1)
batch_size=128
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

train_loader = torch.utils.data.DataLoader(
                datasets.MNIST("../data", train=True, download=True,
                               transform = transforms.Compose([
                                            transforms.ToTensor(),
                                            transforms.Normalize((0.1307,), (0.3081,))
                               ])),
                batch_size=batch_size, shuffle=True, **kwargs
              )
test_loader = torch.utils.data.DataLoader(
              datasets.MNIST("../data", train=False, download=True,
                             transform = transforms.Compose([
                                          transforms.ToTensor(),
                                          transforms.Normalize((0.1307,),(0.3081,))
                             ])),
              batch_size=batch_size, shuffle=True, **kwargs
              )

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw



In [5]:
from tqdm import tqdm
train_loss = []
train_acc = []
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    correct=0
    processed=0
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        train_loss.append(loss.item())  # store a float, not the tensor with its autograd graph
        loss.backward()
        optimizer.step()
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()
        processed += len(data)

        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx} Accuracy={100*correct/processed}')
        train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
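The `{:.0f}` format in the print above rounds the percentage to a whole number, so every epoch from 99.0 to 99.4 prints as 99%. A small sketch of the difference, using the 9936/10000 count that appears in this run's log; switching to `{:.2f}` is a suggestion, not what the notebook does.

```python
correct, total = 9936, 10000
accuracy = 100.0 * correct / total

rounded = "{:.0f}%".format(accuracy)   # format used in test()
precise = "{:.2f}%".format(accuracy)   # suggested alternative (assumption)

print(rounded, precise)  # 99% 99.36%
```

With two decimals, the 99.2-99.3 target can be read directly from the per-epoch output instead of being recomputed from the raw counts.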

In [6]:
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(0, 15):
  print (f"epoch : {epoch}")
  train(model, device, train_loader, optimizer, epoch)
  test(model, device, test_loader)

epoch : 0


loss=0.06128153204917908 batch_id=468 Accuracy=91.99: 100%|██████████| 469/469 [01:29<00:00,  5.26it/s]



Test set: Average loss: 0.0645, Accuracy: 9789/10000 (98%)

epoch : 1


loss=0.08107100427150726 batch_id=468 Accuracy=97.725: 100%|██████████| 469/469 [01:23<00:00,  5.58it/s]



Test set: Average loss: 0.0407, Accuracy: 9859/10000 (99%)

epoch : 2


loss=0.06796777248382568 batch_id=468 Accuracy=98.27166666666666: 100%|██████████| 469/469 [01:24<00:00,  5.57it/s]



Test set: Average loss: 0.0340, Accuracy: 9888/10000 (99%)

epoch : 3


loss=0.012733998708426952 batch_id=468 Accuracy=98.41166666666666: 100%|██████████| 469/469 [01:23<00:00,  5.65it/s]



Test set: Average loss: 0.0331, Accuracy: 9894/10000 (99%)

epoch : 4


loss=0.01742406375706196 batch_id=468 Accuracy=98.60333333333334: 100%|██████████| 469/469 [01:22<00:00,  5.68it/s]



Test set: Average loss: 0.0268, Accuracy: 9916/10000 (99%)

epoch : 5


loss=0.10884016007184982 batch_id=468 Accuracy=98.67666666666666: 100%|██████████| 469/469 [01:23<00:00,  5.62it/s]



Test set: Average loss: 0.0271, Accuracy: 9913/10000 (99%)

epoch : 6


loss=0.0036987075582146645 batch_id=468 Accuracy=98.835: 100%|██████████| 469/469 [01:23<00:00,  5.62it/s]



Test set: Average loss: 0.0294, Accuracy: 9904/10000 (99%)

epoch : 7


loss=0.014194909483194351 batch_id=468 Accuracy=98.85666666666667: 100%|██████████| 469/469 [01:23<00:00,  5.60it/s]



Test set: Average loss: 0.0243, Accuracy: 9925/10000 (99%)

epoch : 8


loss=0.004705922212451696 batch_id=468 Accuracy=98.95166666666667: 100%|██████████| 469/469 [01:23<00:00,  5.63it/s]



Test set: Average loss: 0.0255, Accuracy: 9921/10000 (99%)

epoch : 9


loss=0.11769477277994156 batch_id=468 Accuracy=99.03: 100%|██████████| 469/469 [01:23<00:00,  5.64it/s]



Test set: Average loss: 0.0231, Accuracy: 9925/10000 (99%)

epoch : 10


loss=0.02261842042207718 batch_id=468 Accuracy=98.97333333333333: 100%|██████████| 469/469 [01:23<00:00,  5.62it/s]



Test set: Average loss: 0.0221, Accuracy: 9928/10000 (99%)

epoch : 11


loss=0.012839148752391338 batch_id=468 Accuracy=99.07166666666667: 100%|██████████| 469/469 [01:23<00:00,  5.60it/s]



Test set: Average loss: 0.0207, Accuracy: 9931/10000 (99%)

epoch : 12


loss=0.023574301972985268 batch_id=468 Accuracy=99.08666666666667: 100%|██████████| 469/469 [01:23<00:00,  5.62it/s]



Test set: Average loss: 0.0247, Accuracy: 9923/10000 (99%)

epoch : 13


loss=0.018996460363268852 batch_id=468 Accuracy=99.13166666666666: 100%|██████████| 469/469 [01:23<00:00,  5.64it/s]



Test set: Average loss: 0.0228, Accuracy: 9936/10000 (99%)

epoch : 14


loss=0.001450885902158916 batch_id=468 Accuracy=99.145: 100%|██████████| 469/469 [01:23<00:00,  5.64it/s]



Test set: Average loss: 0.0249, Accuracy: 9917/10000 (99%)
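Because the log prints every epoch as 99%, checking the target means recomputing accuracy from the raw counts. The sketch below parses the "Test set" lines copied from the output above; the 99.2 threshold comes from the target, while the variable names are hypothetical.

```python
import re

# "Test set" lines copied from the training output above, epochs 0 to 14.
log = """
Test set: Average loss: 0.0645, Accuracy: 9789/10000 (98%)
Test set: Average loss: 0.0407, Accuracy: 9859/10000 (99%)
Test set: Average loss: 0.0340, Accuracy: 9888/10000 (99%)
Test set: Average loss: 0.0331, Accuracy: 9894/10000 (99%)
Test set: Average loss: 0.0268, Accuracy: 9916/10000 (99%)
Test set: Average loss: 0.0271, Accuracy: 9913/10000 (99%)
Test set: Average loss: 0.0294, Accuracy: 9904/10000 (99%)
Test set: Average loss: 0.0243, Accuracy: 9925/10000 (99%)
Test set: Average loss: 0.0255, Accuracy: 9921/10000 (99%)
Test set: Average loss: 0.0231, Accuracy: 9925/10000 (99%)
Test set: Average loss: 0.0221, Accuracy: 9928/10000 (99%)
Test set: Average loss: 0.0207, Accuracy: 9931/10000 (99%)
Test set: Average loss: 0.0247, Accuracy: 9923/10000 (99%)
Test set: Average loss: 0.0228, Accuracy: 9936/10000 (99%)
Test set: Average loss: 0.0249, Accuracy: 9917/10000 (99%)
"""

# Extract (correct, total) pairs and convert them to percentages.
pattern = re.compile(r"Accuracy: (\d+)/(\d+)")
accs = [100.0 * int(c) / int(n) for c, n in pattern.findall(log)]

# Epoch with the highest test accuracy, and epochs at or above the 99.2 target.
best_epoch = max(range(len(accs)), key=accs.__getitem__)
in_target = [epoch for epoch, acc in enumerate(accs) if acc >= 99.2]

print(best_epoch, in_target)  # 13 [7, 8, 9, 10, 11, 12, 13]
```

Test accuracy stays at or above 99.2 from epoch 7 through epoch 13, peaks at 99.36 in epoch 13, and drops to 99.17 in the final epoch.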

