<a href="https://colab.research.google.com/github/rssubramaniyan1/EVA8/blob/main/EVA_8_Assignment_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Problem Statement**

The goal of this assignment is to reach 99.4% test accuracy on MNIST with fewer than 10k parameters, using at most 3 attempts.

**Each of the 3 attempts is described in its own block below.**

> Attempt 2 meets the 99.4% test accuracy target using 10,114 parameters (slightly over the 10k budget), without GAP/FC

> Attempt 1 sets up the base architecture with ~8k parameters and reaches 99.35% accuracy without GAP/FC

> Attempt 3 was fun: ~6k parameters with GAP/FC and an accuracy of 99.26%

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchsummary import summary
from torch.optim import lr_scheduler


In [2]:

torch.manual_seed(1)
batch_size = 64
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.RandomRotation((-7.0, 7.0), fill=(1,)),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)

from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.4f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))




#**Attempt 1**:

The objective is to set up a base network from which 99.4% accuracy can be reached within 15 epochs. In attempt 1 the goal is not to hit 99.4% but to get as close as possible to the target with an architecture that uses as few parameters as possible, leaving scope for optimization in the two subsequent attempts.

**Input Data**

Includes normalization and random rotation (±7°)
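
As a rough illustration of what `transforms.Normalize((0.1307,), (0.3081,))` does to each pixel once `ToTensor()` has scaled it to [0, 1], the sketch below applies the same formula, (x - mean) / std, in plain Python. The helper name `normalize_pixel` is made up for this example and is not part of the notebook.

```python
# Hypothetical helper: mirrors the per-pixel formula applied by transforms.Normalize.
MNIST_MEAN = 0.1307  # mean passed to Normalize in the data loaders
MNIST_STD = 0.3081   # standard deviation passed to Normalize in the data loaders

def normalize_pixel(x, mean=MNIST_MEAN, std=MNIST_STD):
    """Return (x - mean) / std for a pixel value x in [0, 1]."""
    return (x - mean) / std

black = normalize_pixel(0.0)  # background pixel
white = normalize_pixel(1.0)  # fully "on" stroke pixel
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

After normalization the mostly black background sits slightly below zero and stroke pixels sit well above it.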

**Architecture**:

> **4 convolution blocks, 2 max pool, NO GAP, NO FC**

>> **Convolution Block 1** - Two 3×3 convolutions, each with ReLU, batch norm and dropout (0.1)

>> **Max pool layer 1**

>> **Convolution Block 2** - Two convolution layers:

>> 1.   First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>> 2.   Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>> **Max pool layer 2**

>> **Convolution Block 3** - Two convolution layers:

>> 1.   First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>> 2.   Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>> **Convolution Block 4** (Output Layer) -

>> 1.   10 output channels using kernel size = 5
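
The receptive field (RF) of this stack can be checked with the usual recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride and j the cumulative stride ("jump"). The sketch below is a minimal helper written for this example (the name `receptive_field` and the layer list are assumptions, with kernel sizes and strides transcribed from the `Net` class of this attempt); it is not part of the original notebook.

```python
def receptive_field(layers):
    """Return the RF after each (name, kernel, stride) layer, starting from a single pixel."""
    rf, jump = 1, 1
    trace = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the RF by (k - 1) input-space steps
        jump *= stride              # striding spreads later steps further apart
        trace.append((name, rf))
    return trace

# Attempt 1 layers as (name, kernel size, stride); padding does not affect the RF.
attempt1_layers = [
    ("block1 conv 3x3", 3, 1),
    ("block1 conv 3x3", 3, 1),
    ("max pool 1", 2, 2),
    ("block2 conv 1x1", 1, 1),
    ("block2 conv 3x3", 3, 1),
    ("max pool 2", 2, 2),
    ("block3 conv 1x1", 1, 1),
    ("block3 conv 3x3", 3, 1),
    ("block4 conv 5x5", 5, 1),
]

rf_trace = receptive_field(attempt1_layers)
for name, rf in rf_trace:
    print(f"{name:18s} RF = {rf}")
```

Under these assumptions the final RF comes out at 36, which already covers the full 28×28 input.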

**TOTAL NUMBER OF PARAMETERS**

>> **Total params: 8,010**
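
The total can be reproduced by hand. The sketch below counts parameters layer by layer, assuming the channel sizes of the `Net` class for this attempt; the helper names `conv_params` and `bn_params` are invented for this example.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Weights (k*k*c_in*c_out) plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def bn_params(channels):
    """BatchNorm2d learns one scale and one shift per channel."""
    return 2 * channels

# Attempt 1, layer by layer (ReLU, dropout and max pool have no parameters).
attempt1 = [
    conv_params(1, 8, 3), bn_params(8),      # block 1
    conv_params(8, 16, 3), bn_params(16),
    conv_params(16, 8, 1), bn_params(8),     # block 2
    conv_params(8, 16, 3), bn_params(16),
    conv_params(16, 8, 1), bn_params(8),     # block 3
    conv_params(8, 16, 3), bn_params(16),
    conv_params(16, 10, 5),                  # block 4 (output layer)
]
total_params = sum(attempt1)
print(total_params)  # 8010
```

Roughly half of the budget (4,010 of 8,010 parameters) sits in the final 5×5 convolution.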


**RESULTS**


1.   Test accuracy stays at or above 99.20% from epoch 5 onward
2.   Highest accuracy: 99.35% at epoch 13


**Learnings for next attempt**


1.   Increase the parameters in convolution blocks 1-3 by adding one more convolution, so that each block has one 1×1 convolution and two 3×3 convolutions
2.   Shift the max pool layer
3.   No GAP or FC layer















In [9]:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1),  # 28x28x1 -> 28x28x8, RF = 3
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(0.1),

            nn.Conv2d(8, 16, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(0.1),
        ) # output_size = 28, RF = 5
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 14, RF = 6
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 8, 1),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(0.1),

            nn.Conv2d(8, 16, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(0.1),
        ) # output_size = 14, RF = 10
        self.pool2 = nn.MaxPool2d(2, 2) # output_size = 7, RF = 12
        self.conv3 = nn.Sequential(
            nn.Conv2d(16, 8, 1),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(0.1),

            nn.Conv2d(8, 16, 3),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(0.1),
        ) # output_size = 5, RF = 20
        self.conv4 = nn.Sequential(
            nn.Conv2d(16, 10, 5),
        ) # output_size = 1, RF = 36

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = x.view(-1, 10)
        return F.log_softmax(x,dim=1)

model = Net().to(device)
summary(model, input_size=(1, 28, 28))


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              80
              ReLU-2            [-1, 8, 28, 28]               0
       BatchNorm2d-3            [-1, 8, 28, 28]              16
           Dropout-4            [-1, 8, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           1,168
              ReLU-6           [-1, 16, 28, 28]               0
       BatchNorm2d-7           [-1, 16, 28, 28]              32
           Dropout-8           [-1, 16, 28, 28]               0
         MaxPool2d-9           [-1, 16, 14, 14]               0
           Conv2d-10            [-1, 8, 14, 14]             136
             ReLU-11            [-1, 8, 14, 14]               0
      BatchNorm2d-12            [-1, 8, 14, 14]              16
          Dropout-13            [-1, 8, 14, 14]               0
           Conv2d-14           [-1, 16,

In [10]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 15):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)



EPOCH: 1


loss=0.010431552305817604 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.79it/s]



Test set: Average loss: 0.0469, Accuracy: 9851/10000 (98.5100%)

EPOCH: 2


loss=0.12349642068147659 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.04it/s]



Test set: Average loss: 0.0467, Accuracy: 9850/10000 (98.5000%)

EPOCH: 3


loss=0.040266796946525574 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.76it/s]



Test set: Average loss: 0.0372, Accuracy: 9873/10000 (98.7300%)

EPOCH: 4


loss=0.009504757821559906 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.60it/s]



Test set: Average loss: 0.0294, Accuracy: 9904/10000 (99.0400%)

EPOCH: 5


loss=0.038410209119319916 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.50it/s]



Test set: Average loss: 0.0268, Accuracy: 9921/10000 (99.2100%)

EPOCH: 6


loss=0.007584562059491873 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.25it/s]



Test set: Average loss: 0.0243, Accuracy: 9925/10000 (99.2500%)

EPOCH: 7


loss=0.009892045520246029 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.43it/s]



Test set: Average loss: 0.0254, Accuracy: 9925/10000 (99.2500%)

EPOCH: 8


loss=0.0025617980863898993 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.50it/s]



Test set: Average loss: 0.0250, Accuracy: 9928/10000 (99.2800%)

EPOCH: 9


loss=0.05960209667682648 batch_id=937: 100%|██████████| 938/938 [00:40<00:00, 23.07it/s]



Test set: Average loss: 0.0217, Accuracy: 9930/10000 (99.3000%)

EPOCH: 10


loss=0.06370197236537933 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.47it/s]



Test set: Average loss: 0.0219, Accuracy: 9924/10000 (99.2400%)

EPOCH: 11


loss=0.225072979927063 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.65it/s]



Test set: Average loss: 0.0228, Accuracy: 9929/10000 (99.2900%)

EPOCH: 12


loss=0.06641525775194168 batch_id=937: 100%|██████████| 938/938 [00:40<00:00, 23.05it/s]



Test set: Average loss: 0.0251, Accuracy: 9920/10000 (99.2000%)

EPOCH: 13


loss=0.028001688420772552 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.59it/s]



Test set: Average loss: 0.0212, Accuracy: 9935/10000 (99.3500%)

EPOCH: 14


loss=0.061014123260974884 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.22it/s]



Test set: Average loss: 0.0220, Accuracy: 9930/10000 (99.3000%)



#**Attempt 2**

The objective is to set up a network based on the learnings from attempt 1, along with some additional tweaks, to ensure the accuracy target of 99.40% is met.

**The new architecture has consistent accuracy > 99% right from epoch 1 and hit 99.44% in epoch 12.** **Objective of the assignment met in attempt 2.**

As part of the tweaks to improve accuracy in this attempt, a scheduler was introduced and dropout/batch norm were removed from the first conv block.
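
To see what the scheduler does over this run, the sketch below reproduces a StepLR-style rule in plain Python: the learning rate is multiplied by `gamma` once every `step_size` calls to `scheduler.step()`. The function name `step_lr` is made up for this illustration; the values (`lr=0.01`, `step_size=16`, `gamma=0.1`, 14 epochs) are the ones used in the training cell for this attempt.

```python
def step_lr(base_lr, step_size, gamma, epoch_index):
    """Learning rate after `epoch_index` scheduler steps under a StepLR-style rule."""
    return base_lr * gamma ** (epoch_index // step_size)

# Values from the Attempt 2 training cell: 14 epochs, one scheduler.step() per epoch.
schedule = [step_lr(0.01, step_size=16, gamma=0.1, epoch_index=e) for e in range(14)]
print(schedule)

# The first decay would only happen once 16 scheduler steps have been taken.
lr_after_16_steps = step_lr(0.01, step_size=16, gamma=0.1, epoch_index=16)
print(lr_after_16_steps)
```

With these settings every epoch of the 14-epoch run uses lr = 0.01, so under this reading the scheduler does not change the learning rate during the run; a `step_size` smaller than the number of epochs would be needed for the decay to take effect.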


**Input Data**

Includes normalization and random rotation (±7°)

**Architecture:**

>**4 convolution blocks, 2 max pool, NO GAP, NO FC**

>**Convolution Block 1:** Three 3×3 convolutions with ReLU, no batch norm and no dropout

>**Max pool layer 1**

>**Convolution Block 2:** Three convolution layers:

>>1. First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>>2. Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)
>>3. Third layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>**Max pool layer 2**

>**Convolution Block 3:** Two convolution layers:

>>1. First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>>2. Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>**Convolution Block 4 (Output Layer)**

>>1. 10 output channels using kernel size = 5
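
The spatial sizes in this architecture follow from the standard convolution/pooling formula out = floor((in + 2p - k) / s) + 1. A minimal sketch, assuming the kernel sizes and paddings of the `Net` class for this attempt (the helper `out_size` and the layer tuples are written for this example):

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

# Attempt 2 layers as (name, kernel, stride, padding).
attempt2_layers = [
    ("block1 conv 3x3", 3, 1, 1),
    ("block1 conv 3x3", 3, 1, 1),
    ("block1 conv 3x3", 3, 1, 1),
    ("max pool 1", 2, 2, 0),
    ("block2 conv 1x1", 1, 1, 0),
    ("block2 conv 3x3", 3, 1, 1),
    ("block2 conv 3x3", 3, 1, 1),
    ("max pool 2", 2, 2, 0),
    ("block3 conv 1x1", 1, 1, 0),
    ("block3 conv 3x3", 3, 1, 0),
    ("block4 conv 5x5", 5, 1, 0),
]

size = 28  # MNIST images are 28x28
sizes = []
for name, kernel, stride, padding in attempt2_layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name:18s} -> {size}x{size}")
```

The last 5×5 convolution collapses the 5×5 map to 1×1, which is why `x.view(-1, 10)` yields one 10-way score vector per image without GAP or FC.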

**TOTAL NUMBER OF PARAMETERS**

Total params: 10,114

**RESULTS**

1. Test accuracy above 99% from epoch 1 onward
2. Highest accuracy: 99.44% at epoch 12

**Learnings for next attempt**

> 1. What is the highest accuracy that can be attained with the fewest parameters?
> 2. Try to drop 50% of the parameters from attempt 2 and check the highest accuracy achieved

In [17]:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 10, 3, padding=1),  # 28x28x1 -> 28x28x10, RF = 3
            nn.ReLU(),
            

            nn.Conv2d(10, 16, 3, padding=1),
            nn.ReLU(),
            

            nn.Conv2d(16, 16, 3, padding=1),
            nn.ReLU(),
           
        )

        self.pool1 = nn.MaxPool2d(2, 2)

        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 10, 1),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),

            nn.Conv2d(10, 10, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),

            nn.Conv2d(10, 16, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(0.1),
        )

        self.pool2 = nn.MaxPool2d(2, 2)  # output_size = 7, RF = 18

        self.conv3 = nn.Sequential(
            nn.Conv2d(16, 10, 1),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),

            nn.Conv2d(10, 10, 3, padding=0),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),
        )
        self.conv4 = nn.Sequential(
            nn.Conv2d(10, 10, 5, padding=0),
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=1)


model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
              ReLU-2           [-1, 10, 28, 28]               0
            Conv2d-3           [-1, 16, 28, 28]           1,456
              ReLU-4           [-1, 16, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           2,320
              ReLU-6           [-1, 16, 28, 28]               0
         MaxPool2d-7           [-1, 16, 14, 14]               0
            Conv2d-8           [-1, 10, 14, 14]             170
              ReLU-9           [-1, 10, 14, 14]               0
      BatchNorm2d-10           [-1, 10, 14, 14]              20
          Dropout-11           [-1, 10, 14, 14]               0
           Conv2d-12           [-1, 10, 14, 14]             910
             ReLU-13           [-1, 10, 14, 14]               0
      BatchNorm2d-14           [-1, 10,

In [19]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)


scheduler = lr_scheduler.StepLR(optimizer, step_size=16, gamma=0.1)

for epoch in range(1, 15):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
    scheduler.step()

EPOCH: 1


loss=0.002721893135458231 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 21.91it/s]



Test set: Average loss: 0.0188, Accuracy: 9933/10000 (99.3300%)

EPOCH: 2


loss=0.01180062722414732 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.32it/s]



Test set: Average loss: 0.0204, Accuracy: 9938/10000 (99.3800%)

EPOCH: 3


loss=0.007989857345819473 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.43it/s]



Test set: Average loss: 0.0240, Accuracy: 9919/10000 (99.1900%)

EPOCH: 4


loss=0.009211044758558273 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.26it/s]



Test set: Average loss: 0.0216, Accuracy: 9925/10000 (99.2500%)

EPOCH: 5


loss=0.018710672855377197 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 21.98it/s]



Test set: Average loss: 0.0182, Accuracy: 9939/10000 (99.3900%)

EPOCH: 6


loss=0.0380767323076725 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.11it/s]



Test set: Average loss: 0.0206, Accuracy: 9935/10000 (99.3500%)

EPOCH: 7


loss=0.004930506460368633 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 21.82it/s]



Test set: Average loss: 0.0181, Accuracy: 9938/10000 (99.3800%)

EPOCH: 8


loss=0.09994055330753326 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 21.97it/s]



Test set: Average loss: 0.0202, Accuracy: 9937/10000 (99.3700%)

EPOCH: 9


loss=0.0036679140757769346 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.28it/s]



Test set: Average loss: 0.0186, Accuracy: 9938/10000 (99.3800%)

EPOCH: 10


loss=0.09479501843452454 batch_id=937: 100%|██████████| 938/938 [00:41<00:00, 22.35it/s]



Test set: Average loss: 0.0184, Accuracy: 9935/10000 (99.3500%)

EPOCH: 11


loss=0.0029055357445031404 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.42it/s]



Test set: Average loss: 0.0190, Accuracy: 9932/10000 (99.3200%)

EPOCH: 12


loss=0.0197509303689003 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.22it/s]



Test set: Average loss: 0.0163, Accuracy: 9944/10000 (99.4400%)

EPOCH: 13


loss=0.005630031228065491 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.20it/s]



Test set: Average loss: 0.0200, Accuracy: 9930/10000 (99.3000%)

EPOCH: 14


loss=0.010522186756134033 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.49it/s]



Test set: Average loss: 0.0199, Accuracy: 9932/10000 (99.3200%)



#**Attempt 3**

The objective is to set up a network based on the learnings from attempts 1 and 2 that brings the number of parameters below 7k while achieving the highest possible accuracy.

**Removed the scheduler and increased the learning rate to 0.02 (from 0.01 in attempt 2)**

**Achieved highest accuracy of 99.26% in epoch 14 with ~6k parameters**

**Input Data**

Includes normalization and random rotation (±7°)

**Architecture:**

>**3 convolution blocks, 2 max pool, 1 GAP, 1 FC**

>**Convolution Block 1:** Three 3×3 convolutions with ReLU, no batch norm and no dropout

>**Max pool layer 1**

>**Convolution Block 2:** Three convolution layers:

>>1. First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>>2. Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)
>>3. Third layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>**Max pool layer 2**

>**Convolution Block 3:** Two convolution layers:

>>1. First layer: a 1×1 convolution with ReLU, batch norm and dropout (0.1)
>>2. Second layer: a 3×3 convolution with ReLU, batch norm and dropout (0.1)

>**GAP layer:** average pooling with kernel size = 5

>**FC layer:** 10 inputs to 10 outputs

**TOTAL NUMBER OF PARAMETERS**

>Total params: 6,086
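
Much of the saving relative to attempt 2 comes from the output head. As a rough check, the 5×5 output convolution of attempt 2 can be compared with the GAP + FC head used here. The helper names below are invented for this example; the layer shapes are taken from the two `Net` definitions.

```python
def conv_params(c_in, c_out, k):
    """Conv2d parameter count with bias: k*k*c_in*c_out weights + c_out biases."""
    return k * k * c_in * c_out + c_out

def linear_params(n_in, n_out):
    """Linear layer parameter count with bias."""
    return n_in * n_out + n_out

conv_head = conv_params(10, 10, 5)       # attempt 2: Conv2d(10, 10, 5)
gap_fc_head = 0 + linear_params(10, 10)  # attempt 3: AvgPool2d has no weights, then Linear(10, 10)

print(conv_head, gap_fc_head, conv_head - gap_fc_head)
```

Under this count the head shrinks from 2,510 to 110 parameters, about 2,400 of the roughly 4,000 parameters saved between attempts 2 and 3 (10,114 vs 6,086).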

**RESULTS**

1. Test accuracy above 99% from epoch 5 onward
2. Highest accuracy: 99.26% at epoch 14

**To discuss for better understanding**

>1. What is the impact of the learning rate and step size?
>2. Does the decrease in the number of parameters affect the choice of LR and step size?

In [24]:

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 10, 3, padding=1),  # 28x28x1 -> 28x28x10, RF = 3
            nn.ReLU(),


            nn.Conv2d(10, 10, 3, padding=1),
            nn.ReLU(),


            nn.Conv2d(10, 16, 3, padding=1),
            nn.ReLU(),

        )

        self.pool1 = nn.MaxPool2d(2, 2)

        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 8, 1),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(0.1),

            nn.Conv2d(8, 10, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),

            nn.Conv2d(10, 16, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(0.1),
        )

        self.pool2 = nn.MaxPool2d(2, 2)  # output_size = 7, RF = 18

        self.conv3 = nn.Sequential(
            nn.Conv2d(16, 10, 1),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1),

            nn.Conv2d(10, 10, 3, padding=0),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout(0.1)
        )

        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=5)
        )

        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = self.gap(x)
        x = x.view(-1, 10)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)

model = Net().to(device)
summary(model, input_size=(1, 28, 28))


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
              ReLU-2           [-1, 10, 28, 28]               0
            Conv2d-3           [-1, 10, 28, 28]             910
              ReLU-4           [-1, 10, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           1,456
              ReLU-6           [-1, 16, 28, 28]               0
         MaxPool2d-7           [-1, 16, 14, 14]               0
            Conv2d-8            [-1, 8, 14, 14]             136
              ReLU-9            [-1, 8, 14, 14]               0
      BatchNorm2d-10            [-1, 8, 14, 14]              16
          Dropout-11            [-1, 8, 14, 14]               0
           Conv2d-12           [-1, 10, 14, 14]             730
             ReLU-13           [-1, 10, 14, 14]               0
      BatchNorm2d-14           [-1, 10,

In [26]:
optimizer = optim.SGD(model.parameters(), lr=0.02, momentum=0.9)


#scheduler = lr_scheduler.StepLR(optimizer, step_size=16, gamma=0.1)

for epoch in range(1, 15):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
    #scheduler.step()

EPOCH: 1


loss=0.1724795401096344 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.47it/s]



Test set: Average loss: 0.0371, Accuracy: 9878/10000 (98.7800%)

EPOCH: 2


loss=0.17590738832950592 batch_id=937: 100%|██████████| 938/938 [00:46<00:00, 20.17it/s]



Test set: Average loss: 0.0382, Accuracy: 9872/10000 (98.7200%)

EPOCH: 3


loss=0.07141156494617462 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.36it/s]



Test set: Average loss: 0.0345, Accuracy: 9889/10000 (98.8900%)

EPOCH: 4


loss=0.06804748624563217 batch_id=937: 100%|██████████| 938/938 [00:46<00:00, 20.02it/s]



Test set: Average loss: 0.0428, Accuracy: 9861/10000 (98.6100%)

EPOCH: 5


loss=0.04055921733379364 batch_id=937: 100%|██████████| 938/938 [00:44<00:00, 21.32it/s]



Test set: Average loss: 0.0277, Accuracy: 9906/10000 (99.0600%)

EPOCH: 6


loss=0.14137597382068634 batch_id=937: 100%|██████████| 938/938 [00:43<00:00, 21.43it/s]



Test set: Average loss: 0.0264, Accuracy: 9912/10000 (99.1200%)

EPOCH: 7


loss=0.004985594190657139 batch_id=937: 100%|██████████| 938/938 [00:45<00:00, 20.69it/s]



Test set: Average loss: 0.0293, Accuracy: 9904/10000 (99.0400%)

EPOCH: 8


loss=0.04956419765949249 batch_id=937: 100%|██████████| 938/938 [00:46<00:00, 19.98it/s]



Test set: Average loss: 0.0255, Accuracy: 9912/10000 (99.1200%)

EPOCH: 9


loss=0.03941202908754349 batch_id=937: 100%|██████████| 938/938 [00:45<00:00, 20.45it/s]



Test set: Average loss: 0.0274, Accuracy: 9901/10000 (99.0100%)

EPOCH: 10


loss=0.22508695721626282 batch_id=937: 100%|██████████| 938/938 [00:46<00:00, 20.19it/s]



Test set: Average loss: 0.0278, Accuracy: 9901/10000 (99.0100%)

EPOCH: 11


loss=0.06795432418584824 batch_id=937: 100%|██████████| 938/938 [00:48<00:00, 19.15it/s]



Test set: Average loss: 0.0263, Accuracy: 9905/10000 (99.0500%)

EPOCH: 12


loss=0.05637720972299576 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 21.98it/s]



Test set: Average loss: 0.0284, Accuracy: 9907/10000 (99.0700%)

EPOCH: 13


loss=0.004001237917691469 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.14it/s]



Test set: Average loss: 0.0239, Accuracy: 9917/10000 (99.1700%)

EPOCH: 14


loss=0.005103596951812506 batch_id=937: 100%|██████████| 938/938 [00:42<00:00, 22.10it/s]



Test set: Average loss: 0.0224, Accuracy: 9926/10000 (99.2600%)

