<a href="https://colab.research.google.com/github/gkdivya/EVA/blob/main/4_ArchitecturalBasics/Experiments/MNIST_Exp1_WithLessParams.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Experiment - 1

**Objective**: Train an MNIST model with fewer parameters.

The [MNIST reference model](https://colab.research.google.com/drive/1uJZvJdi5VprOQHROtJIHy0mnY2afjNlx) has 6,379,786 parameters and reaches 98% accuracy in 2 epochs, simply by removing the ReLU that was applied before the `conv7` layer.
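One plausible reason the final ReLU matters: the output of `conv7` is fed straight into `log_softmax` as the class scores, and a ReLU in front of it would clamp every negative score to 0. The sketch below is plain Python; the `log_softmax`/`relu` helpers and the 3-class logits are illustrative assumptions, not code from the reference notebook.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: shift by the max before exponentiating
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

def relu(values):
    # Clamp every negative value to 0, as F.relu does
    return [max(0.0, v) for v in values]

# Hypothetical raw scores from the last conv layer (3 classes for brevity)
logits = [2.0, -1.0, -3.0]

raw = log_softmax(logits)            # scores passed straight to log_softmax
clamped = log_softmax(relu(logits))  # ReLU first: -1.0 and -3.0 both become 0.0

print("without ReLU:", [round(math.exp(v), 3) for v in raw])
print("with ReLU:   ", [round(math.exp(v), 3) for v in clamped])
```

With the ReLU, the two negative classes collapse to the same probability and the correct class ends up less confident, so the network loses the ability to push wrong classes below zero.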

In [None]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [None]:
!pip install torchsummary
from torchsummary import summary



In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)  # Input: 28*28*1  Output: 28*28*32  GRF: 3*3
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv4 = nn.Conv2d(128, 256, 3, padding=1)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv5 = nn.Conv2d(256, 512, 3)
        self.conv6 = nn.Conv2d(512, 1024, 3)
        self.conv7 = nn.Conv2d(1024, 10, 3)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
        x = F.relu(self.conv6(F.relu(self.conv5(x))))
        x = self.conv7(x)
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes

In [None]:
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
            Conv2d-2           [-1, 64, 28, 28]          18,496
         MaxPool2d-3           [-1, 64, 14, 14]               0
            Conv2d-4          [-1, 128, 14, 14]          73,856
            Conv2d-5          [-1, 256, 14, 14]         295,168
         MaxPool2d-6            [-1, 256, 7, 7]               0
            Conv2d-7            [-1, 512, 5, 5]       1,180,160
            Conv2d-8           [-1, 1024, 3, 3]       4,719,616
            Conv2d-9             [-1, 10, 1, 1]          92,170
Total params: 6,379,786
Trainable params: 6,379,786
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.51
Params size (MB): 24.34
Estimated Total Size (MB): 25.85
-------------------------------------



### Iteration 1

MNIST images are only 28×28 pixels, so training an MNIST model with 6,379,786 parameters is overkill.

The number of parameters is reduced by using convolution blocks with fewer output channels: instead of 32, 64, 128, 256, 512 and 1024 channels, every layer now outputs just 10 channels. The intuition behind using 10 channels is to let one channel represent each digit (0 to 9).
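Because a `Conv2d` layer with bias has `out_channels × (in_channels × k × k + 1)` parameters, the saving can be checked by hand. The sketch below is plain Python; `conv2d_params` is a hypothetical helper (not a PyTorch API), and the channel lists are read off the reference `Net` and the 10-channel `Net` in this notebook. Its totals should match the `torchsummary` reports.

```python
def conv2d_params(in_ch, out_ch, k=3, bias=True):
    # out_ch filters of shape in_ch x k x k, plus one bias per filter
    return out_ch * (in_ch * k * k + (1 if bias else 0))

# (in_channels, out_channels) for conv1..conv7
reference = [(1, 32), (32, 64), (64, 128), (128, 256),
             (256, 512), (512, 1024), (1024, 10)]
ten_channel = [(1, 10)] + [(10, 10)] * 6

ref_total = sum(conv2d_params(i, o) for i, o in reference)
small_total = sum(conv2d_params(i, o) for i, o in ten_channel)
print(ref_total, small_total)  # 6379786 5560
```

Most of the reference model's budget sits in `conv6` alone (1024 × (512 × 9 + 1) = 4,719,616 parameters), which is why shrinking the channel counts has such a large effect.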

In [14]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 3, padding=1)       # Input: 28*28*1   Output: 28*28*10  GRF: 3*3
        self.conv2 = nn.Conv2d(10, 10, 3, padding=1)      # Input: 28*28*10  Output: 28*28*10  GRF: 5*5
        self.pool1 = nn.MaxPool2d(2, 2)                   # Input: 28*28*10  Output: 14*14*10  GRF: 6*6
        self.conv3 = nn.Conv2d(10, 10, 3, padding=1)      # Input: 14*14*10  Output: 14*14*10  GRF: 10*10
        self.conv4 = nn.Conv2d(10, 10, 3, padding=1)      # Input: 14*14*10  Output: 14*14*10  GRF: 14*14
        self.pool2 = nn.MaxPool2d(2, 2)                   # Input: 14*14*10  Output: 7*7*10    GRF: 16*16
        self.conv5 = nn.Conv2d(10, 10, 3)                 # Input: 7*7*10    Output: 5*5*10    GRF: 24*24
        self.conv6 = nn.Conv2d(10, 10, 3)                 # Input: 5*5*10    Output: 3*3*10    GRF: 32*32
        self.conv7 = nn.Conv2d(10, 10, 3)                 # Input: 3*3*10    Output: 1*1*10    GRF: 40*40

    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
        x = F.relu(self.conv6(F.relu(self.conv5(x))))
        x = self.conv7(x)
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes

In [15]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
            Conv2d-2           [-1, 10, 28, 28]             910
         MaxPool2d-3           [-1, 10, 14, 14]               0
            Conv2d-4           [-1, 10, 14, 14]             910
            Conv2d-5           [-1, 10, 14, 14]             910
         MaxPool2d-6             [-1, 10, 7, 7]               0
            Conv2d-7             [-1, 10, 5, 5]             910
            Conv2d-8             [-1, 10, 3, 3]             910
            Conv2d-9             [-1, 10, 1, 1]             910
Total params: 5,560
Trainable params: 5,560
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.17
Params size (MB): 0.02
Estimated Total Size (MB): 0.20
-----------------------------------------------
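The output shapes in the summary above can be reproduced, and each layer's receptive field estimated, with the standard recurrences `out = (in + 2*padding - kernel) // stride + 1`, `rf_out = rf_in + (kernel - 1) * jump_in` and `jump_out = jump_in * stride`. This is a plain-Python sketch; the `trace` helper and the `(name, kernel, stride, padding)` tuple layout are illustrative, not part of the notebook.

```python
def trace(layers, size=28):
    """Return (name, output_size, receptive_field) for each layer in order."""
    rf, jump = 1, 1
    rows = []
    for name, k, s, p in layers:
        size = (size + 2 * p - k) // s + 1  # spatial output size
        rf = rf + (k - 1) * jump            # grows by (k - 1) input-pixel steps
        jump = jump * s                     # stride compounds the step size
        rows.append((name, size, rf))
    return rows

# Layer hyperparameters copied from the 10-channel Net
net = [
    ("conv1", 3, 1, 1), ("conv2", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv5", 3, 1, 0), ("conv6", 3, 1, 0), ("conv7", 3, 1, 0),
]

for name, size, rf in trace(net):
    print(f"{name}: output {size}x{size}, RF {rf}x{rf}")
```

The final 1×1 output of `conv7` sees a 40×40 region, which is larger than the 28×28 input, so every output value can depend on the whole digit.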



### Iteration 2

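Removing the bias terms (`bias=False`) from every convolution removes 70 more parameters from the network (one bias per output channel: 7 layers × 10 channels), bringing the total from 5,560 down to 5,490.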
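A minimal pure-Python check of that count. The helper `conv2d_params` is hypothetical, and the layer list mirrors the 10-channel `Net` (conv1 maps 1 to 10 channels, the other six layers map 10 to 10).

```python
def conv2d_params(in_ch, out_ch, k=3, bias=True):
    # Weights: out_ch * in_ch * k * k; a bias adds one parameter per output channel
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

layers = [(1, 10)] + [(10, 10)] * 6

with_bias = sum(conv2d_params(i, o, bias=True) for i, o in layers)
without_bias = sum(conv2d_params(i, o, bias=False) for i, o in layers)
print(with_bias, without_bias, with_bias - without_bias)  # 5560 5490 70
```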

In [20]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 3, padding=1, bias=False)   # Input: 28*28*1   Output: 28*28*10  GRF: 3*3
        self.conv2 = nn.Conv2d(10, 10, 3, padding=1, bias=False)  # Input: 28*28*10  Output: 28*28*10  GRF: 5*5
        self.pool1 = nn.MaxPool2d(2, 2)                           # Input: 28*28*10  Output: 14*14*10  GRF: 6*6
        self.conv3 = nn.Conv2d(10, 10, 3, padding=1, bias=False)  # Input: 14*14*10  Output: 14*14*10  GRF: 10*10
        self.conv4 = nn.Conv2d(10, 10, 3, padding=1, bias=False)  # Input: 14*14*10  Output: 14*14*10  GRF: 14*14
        self.pool2 = nn.MaxPool2d(2, 2)                           # Input: 14*14*10  Output: 7*7*10    GRF: 16*16
        self.conv5 = nn.Conv2d(10, 10, 3, bias=False)             # Input: 7*7*10    Output: 5*5*10    GRF: 24*24
        self.conv6 = nn.Conv2d(10, 10, 3, bias=False)             # Input: 5*5*10    Output: 3*3*10    GRF: 32*32
        self.conv7 = nn.Conv2d(10, 10, 3, bias=False)             # Input: 3*3*10    Output: 1*1*10    GRF: 40*40

    def forward(self, x):
        x = self.pool1(F.relu(self.conv2(F.relu(self.conv1(x)))))
        x = self.pool2(F.relu(self.conv4(F.relu(self.conv3(x)))))
        x = F.relu(self.conv6(F.relu(self.conv5(x))))
        x = self.conv7(x)
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 classes

In [21]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]              90
            Conv2d-2           [-1, 10, 28, 28]             900
         MaxPool2d-3           [-1, 10, 14, 14]               0
            Conv2d-4           [-1, 10, 14, 14]             900
            Conv2d-5           [-1, 10, 14, 14]             900
         MaxPool2d-6             [-1, 10, 7, 7]               0
            Conv2d-7             [-1, 10, 5, 5]             900
            Conv2d-8             [-1, 10, 3, 3]             900
            Conv2d-9             [-1, 10, 1, 1]             900
Total params: 5,490
Trainable params: 5,490
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.17
Params size (MB): 0.02
Estimated Total Size (MB): 0.19
-----------------------------------------------



In [22]:
torch.manual_seed(1)
batch_size = 128

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz




Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz




Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz




Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz




Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw

Processing...
Done!




In [23]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [24]:

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 10):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

loss=0.3089008033275604 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.59it/s]


Test set: Average loss: 0.3309, Accuracy: 8982/10000 (90%)



loss=0.09540484100580215 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 44.62it/s]


Test set: Average loss: 0.1103, Accuracy: 9661/10000 (97%)



loss=0.028697634115815163 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.45it/s]


Test set: Average loss: 0.0803, Accuracy: 9765/10000 (98%)



loss=0.11284641176462173 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.39it/s]


Test set: Average loss: 0.0614, Accuracy: 9789/10000 (98%)



loss=0.007477357983589172 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.49it/s]


Test set: Average loss: 0.0548, Accuracy: 9815/10000 (98%)



loss=0.06160100921988487 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.23it/s]


Test set: Average loss: 0.0514, Accuracy: 9828/10000 (98%)



loss=0.03173014521598816 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.12it/s]


Test set: Average loss: 0.0477, Accuracy: 9840/10000 (98%)



loss=0.12536950409412384 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.44it/s]


Test set: Average loss: 0.0449, Accuracy: 9858/10000 (99%)



loss=0.05408497154712677 batch_id=468: 100%|██████████| 469/469 [00:10<00:00, 43.57it/s]



Test set: Average loss: 0.0390, Accuracy: 9865/10000 (99%)



# Summary

The model was trained on normalized input images, with reduced channel counts and no bias terms.

With just **5,490 params**, the MNIST model reaches **98.65% test accuracy** (9865/10000) in 9 epochs (`range(1, 10)`).

The difference between train and validation accuracy is also only **0.2**.

