Target:

Build a basic neural network model for MNIST without adding any regularization methods.

Results:
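The basic model overfits: train accuracy keeps climbing to 99.27% while test accuracy fluctuates between roughly 98.5% and 99.0% from epoch 3 onward. We will therefore have to apply regularization and model optimization methods to improve the test accuracy.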


*   Parameters: 13,160
*   Best Train Accuracy: 99.27% (epoch 14)
*   Best Test Accuracy: 99.00% (epoch 10); test accuracy at the final epoch: 98.55%

Analysis:


# Import Libraries

Let's first import all the necessary libraries.

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

# Let's visualize some of the images
%matplotlib inline
import matplotlib.pyplot as plt

## Defining the Model




In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()


        # Block 1
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=0, bias=False),    # 28x28 input - 26x26 output, RF 3x3
            nn.ReLU(),

            nn.Conv2d(8, 16, 3, padding=0, bias=False),   # 26x26 input - 24x24 output, RF 5x5
            nn.ReLU(),

            nn.Conv2d(16, 16, 3, padding=0, bias=False),  # 24x24 input - 22x22 output, RF 7x7
            nn.ReLU(),
        )

        # Transition Block (1x1 + MaxPool)
        self.trans1 = nn.Sequential(

            # 1x1 convolution reduces channels from 16 to 8
            nn.Conv2d(16, 8, 1, bias=False),  # 22x22 input - 22x22 output, RF 7x7
            nn.ReLU(),

            nn.MaxPool2d(2, 2),  # 22x22 input - 11x11 output, RF 8x8
        )

        # Block 2
        self.conv2 = nn.Sequential(

            nn.Conv2d(8, 16, 3, padding=0, bias=False),   # 11x11 input - 9x9 output, RF 12x12
            nn.ReLU(),

            nn.Conv2d(16, 16, 3, padding=0, bias=False),  # 9x9 input - 7x7 output, RF 16x16
            nn.ReLU(),

            nn.Conv2d(16, 16, 3, padding=0, bias=False),  # 7x7 input - 5x5 output, RF 20x20
            nn.ReLU(),

            nn.Conv2d(16, 16, 3, padding=0, bias=False),  # 5x5 input - 3x3 output, RF 24x24
            nn.ReLU(),

            nn.Conv2d(16, 10, 3, padding=0, bias=False),  # 3x3 input - 1x1 output, RF 28x28
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.trans1(x)
        x = self.conv2(x)
        x = x.view(-1,10)

        return F.log_softmax(x,dim=1)
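
The output sizes and receptive fields (RF) of this architecture can be checked by hand. The following is a minimal sketch in plain Python, assuming no padding and dilation 1 as in `Net`; the helper name `trace_shapes` and the layer labels are illustrative and are not part of the notebook.

```python
# Each entry: (layer label, kernel size, stride). Padding is 0 throughout,
# matching the Conv2d/MaxPool2d arguments used in Net.
layers = [
    ("conv1.0", 3, 1), ("conv1.2", 3, 1), ("conv1.4", 3, 1),
    ("trans1.0 (1x1)", 1, 1), ("trans1.2 (maxpool)", 2, 2),
    ("conv2.0", 3, 1), ("conv2.2", 3, 1), ("conv2.4", 3, 1),
    ("conv2.6", 3, 1), ("conv2.8", 3, 1),
]


def trace_shapes(input_size, layers):
    """Return a list of (label, output_size, receptive_field) per layer."""
    size, rf, jump = input_size, 1, 1
    rows = []
    for label, k, s in layers:
        size = (size - k) // s + 1   # output size with no padding
        rf = rf + (k - 1) * jump     # RF growth is scaled by the accumulated stride
        jump = jump * s              # accumulated stride after this layer
        rows.append((label, size, rf))
    return rows


trace = trace_shapes(28, layers)
for label, size, rf in trace:
    print(f"{label:20s} output {size}x{size}  RF {rf}x{rf}")
```

Under these assumptions the last layer produces a 1x1 map per class with a receptive field covering the whole 28x28 image, which is why `x.view(-1, 10)` in `forward` yields one 10-dimensional vector per image.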

In [3]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 26, 26]              72
              ReLU-2            [-1, 8, 26, 26]               0
            Conv2d-3           [-1, 16, 24, 24]           1,152
              ReLU-4           [-1, 16, 24, 24]               0
            Conv2d-5           [-1, 16, 22, 22]           2,304
              ReLU-6           [-1, 16, 22, 22]               0
            Conv2d-7            [-1, 8, 22, 22]             128
              ReLU-8            [-1, 8, 22, 22]               0
         MaxPool2d-9            [-1, 8, 11, 11]               0
           Conv2d-10             [-1, 16, 9, 9]           1,152
             ReLU-11             [-1, 16, 9, 9]               0
           Conv2d-12             [-1, 16, 7, 7]           2,304
             ReLU-13             [-1, 16, 7, 7]               0
           Conv2d-14             [-1, 1

## The Model

In [4]:
model.eval()

Net(
  (conv1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): ReLU()
    (2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (3): ReLU()
    (4): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (5): ReLU()
  )
  (trans1): Sequential(
    (0): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): ReLU()
    (2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (3): ReLU()
    (4): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (5): ReLU()
    (6): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (7): ReLU()
    (8): Conv2d(16, 10, kernel_size=(3, 3), stride=(1, 1), bias=False)
  )
)

## Model Parameters

In [5]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f'The model has {count_parameters(model):,} trainable parameters')

The model has 13,160 trainable parameters
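
The total can be cross-checked by hand: a bias-free convolution has `in_channels * out_channels * kernel * kernel` weights. The sketch below lists the nine `Conv2d` layers of `Net` in order; the helper `conv_params` is a hypothetical name used only for this illustration.

```python
# (in_channels, out_channels, kernel_size) for each Conv2d in Net, in order.
convs = [
    (1, 8, 3), (8, 16, 3), (16, 16, 3),   # conv1
    (16, 8, 1),                           # trans1 (1x1)
    (8, 16, 3), (16, 16, 3), (16, 16, 3), (16, 16, 3), (16, 10, 3),  # conv2
]


def conv_params(c_in, c_out, k, bias=False):
    """Number of trainable parameters in a square-kernel 2D convolution."""
    weights = c_in * c_out * k * k
    return weights + (c_out if bias else 0)


per_layer = [conv_params(*c) for c in convs]
total = sum(per_layer)
print(per_layer)
print(f"{total:,}")
```

The per-layer values agree with the `Param #` column printed by `torchsummary` for the layers shown there.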


## Load and Prepare Dataset

MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels.

We load the PIL images using torchvision.datasets.MNIST. While loading, we transform each image to a tensor and normalize it with the mean (0.1307) and standard deviation (0.3081) of the MNIST images.
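
To make the normalization step concrete, here is a small sketch of the per-pixel arithmetic that `transforms.Normalize` applies after `ToTensor` has scaled pixels to the range [0, 1]. The function name `normalize` is illustrative only.

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # the values passed to transforms.Normalize


def normalize(pixel):
    """Normalize one pixel value in [0, 1]: subtract the mean, divide by the std."""
    return (pixel - MNIST_MEAN) / MNIST_STD


darkest = normalize(0.0)    # black background pixel
brightest = normalize(1.0)  # brightest stroke pixel
print(round(darkest, 4), round(brightest, 4))
```

So the network sees inputs roughly in the range -0.42 to 2.82 rather than 0 to 1.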

In [6]:
torch.manual_seed(1)
batch_size = 128

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
# Named *_dataset so they are not shadowed by the train()/test() functions defined later.
train_dataset = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ]))

test_dataset = datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ]))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, **kwargs)
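
With a batch size of 128 the loaders do not divide the datasets evenly. The sketch below works out how many batches each loader yields, assuming the `DataLoader` default of keeping the final, smaller batch; `batch_layout` is a hypothetical helper name.

```python
import math


def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the final batch) when no batch is dropped."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


train_layout = batch_layout(60_000, 128)
test_layout = batch_layout(10_000, 128)
print(train_layout, test_layout)
```

The 469 training batches match the total shown by the progress bar in the training log.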


## Training and Testing Loop

In [9]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    epoch_loss=0
    correct = 0
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        epoch_loss += loss.item()
        loss.backward()
        optimizer.step()

        pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

        pbar.set_description(desc= f'epoch={epoch} Loss={loss.item()} batch_id={batch_idx:05d}')


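    # Note: epoch_loss sums per-batch *mean* losses, so dividing by the number of
    # samples gives a value roughly batch_size times smaller than the per-sample
    # average that test() reports; the two losses are not directly comparable.
    train_loss = epoch_loss / len(train_loader.dataset)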

    print('Train set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        train_loss, correct, len(train_loader.dataset),
        100. * correct / len(train_loader.dataset)))
    train_acc=100.*correct/len(train_loader.dataset)
    return train_loss,train_acc


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    test_acc=100. * correct / len(test_loader.dataset)
    return test_loss,test_acc
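
The two functions average the loss differently: `train` adds up per-batch mean losses and divides by the number of samples, while `test` sums per-sample losses (`reduction='sum'`) before dividing. The sketch below illustrates the effect with made-up numbers; the function names and the constant loss of 0.05 are assumptions for illustration, not values from the notebook.

```python
def reported_train_loss(batch_means, num_samples):
    """What train() computes: sum of per-batch mean losses over the sample count."""
    return sum(batch_means) / num_samples


def per_sample_loss(batch_means, batch_sizes):
    """Weight each batch mean by its batch size to recover the per-sample mean."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: 468 batches of 128 samples plus one of 96, each with mean loss 0.05.
batch_sizes = [128] * 468 + [96]
batch_means = [0.05] * 469

reported = reported_train_loss(batch_means, sum(batch_sizes))
per_sample = per_sample_loss(batch_means, batch_sizes)
print(f"reported={reported:.6f}  per-sample={per_sample:.6f}")
```

In this example the reported figure is about 128 times smaller than the per-sample mean, which would explain why the train losses in the log look far smaller than the test losses. Dividing `epoch_loss` by `len(train_loader)` instead would be one way to put the two on a similar scale.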

## Let's train and test the model

In [10]:
model =  Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.015, momentum=0.9)
EPOCHS = 15
for epoch in range(EPOCHS):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

EPOCH: 0
epoch=0 Loss=0.40941500663757324 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.54it/s]
Train set: Average loss: 0.0115, Accuracy: 28492/60000 (47.49%)
Test set: Average loss: 0.2381, Accuracy: 9263/10000 (92.63%)

EPOCH: 1
epoch=1 Loss=0.0890578106045723 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 41.17it/s]
Train set: Average loss: 0.0011, Accuracy: 57372/60000 (95.62%)
Test set: Average loss: 0.0919, Accuracy: 9708/10000 (97.08%)

EPOCH: 2
epoch=2 Loss=0.02848839946091175 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.57it/s]
Train set: Average loss: 0.0007, Accuracy: 58425/60000 (97.38%)
Test set: Average loss: 0.0626, Accuracy: 9804/10000 (98.04%)

EPOCH: 3
epoch=3 Loss=0.0331714004278183 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.67it/s]
Train set: Average loss: 0.0005, Accuracy: 58811/60000 (98.02%)
Test set: Average loss: 0.0484, Accuracy: 9851/10000 (98.51%)

EPOCH: 4
epoch=4 Loss=0.02803972363471985 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 41.31it/s]
Train set: Average loss: 0.0004, Accuracy: 59004/60000 (98.34%)
Test set: Average loss: 0.0549, Accuracy: 9829/10000 (98.29%)

EPOCH: 5
epoch=5 Loss=0.029558664187788963 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.41it/s]
Train set: Average loss: 0.0004, Accuracy: 59117/60000 (98.53%)
Test set: Average loss: 0.0449, Accuracy: 9855/10000 (98.55%)

EPOCH: 6
epoch=6 Loss=0.043841246515512466 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.71it/s]
Train set: Average loss: 0.0003, Accuracy: 59192/60000 (98.65%)
Test set: Average loss: 0.0413, Accuracy: 9880/10000 (98.80%)

EPOCH: 7
epoch=7 Loss=0.04157417640089989 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.43it/s]
Train set: Average loss: 0.0003, Accuracy: 59280/60000 (98.80%)
Test set: Average loss: 0.0433, Accuracy: 9864/10000 (98.64%)

EPOCH: 8
epoch=8 Loss=0.0003563484351616353 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.85it/s]
Train set: Average loss: 0.0003, Accuracy: 59315/60000 (98.86%)
Test set: Average loss: 0.0498, Accuracy: 9855/10000 (98.55%)

EPOCH: 9
epoch=9 Loss=0.007792545482516289 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 41.44it/s]
Train set: Average loss: 0.0003, Accuracy: 59369/60000 (98.95%)
Test set: Average loss: 0.0490, Accuracy: 9849/10000 (98.49%)

EPOCH: 10
epoch=10 Loss=0.00971674919128418 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.96it/s]
Train set: Average loss: 0.0002, Accuracy: 59403/60000 (99.00%)
Test set: Average loss: 0.0328, Accuracy: 9900/10000 (99.00%)

EPOCH: 11
epoch=11 Loss=0.0062074740417301655 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 41.07it/s]
Train set: Average loss: 0.0002, Accuracy: 59450/60000 (99.08%)
Test set: Average loss: 0.0391, Accuracy: 9876/10000 (98.76%)

EPOCH: 12
epoch=12 Loss=0.021829964593052864 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.22it/s]
Train set: Average loss: 0.0002, Accuracy: 59513/60000 (99.19%)
Test set: Average loss: 0.0357, Accuracy: 9883/10000 (98.83%)

EPOCH: 13
epoch=13 Loss=0.003929933067411184 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.88it/s]
Train set: Average loss: 0.0002, Accuracy: 59551/60000 (99.25%)
Test set: Average loss: 0.0430, Accuracy: 9861/10000 (98.61%)

EPOCH: 14
epoch=14 Loss=0.052546340972185135 batch_id=00468: 100%|██████████| 469/469 [00:11<00:00, 40.49it/s]
Train set: Average loss: 0.0002, Accuracy: 59561/60000 (99.27%)
Test set: Average loss: 0.0481, Accuracy: 9855/10000 (98.55%)
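
One way to summarize the log is to look at the best test epoch and the train/test gap at the end of training. The sketch below uses the accuracies copied from the per-epoch `Train set` and `Test set` lines above; the variable names are illustrative.

```python
# Accuracies (%) per epoch, copied from the training log above.
train_acc = [47.49, 95.62, 97.38, 98.02, 98.34, 98.53, 98.65, 98.80,
             98.86, 98.95, 99.00, 99.08, 99.19, 99.25, 99.27]
test_acc = [92.63, 97.08, 98.04, 98.51, 98.29, 98.55, 98.80, 98.64,
            98.55, 98.49, 99.00, 98.76, 98.83, 98.61, 98.55]

# Epoch with the highest test accuracy (first one wins on ties).
best_epoch = max(range(len(test_acc)), key=test_acc.__getitem__)

# Gap between train and test accuracy at the final epoch.
final_gap = round(train_acc[-1] - test_acc[-1], 2)

print(f"best test accuracy {test_acc[best_epoch]:.2f}% at epoch {best_epoch}")
print(f"final train/test gap {final_gap:.2f} percentage points")
```

Train accuracy rises at every epoch while test accuracy peaks at epoch 10 and then drifts, which is consistent with the overfitting noted in the results.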



## Observation:



