### Target:
-  Add batch normalization to improve model efficiency.

### Results:
-   Parameters: 5,088
-   Best Train Accuracy: 99.02%
-   Best Test Accuracy: 99.03%

### Analysis:
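-   There is a slight increase in the number of parameters, because each batch norm layer adds a learnable scale and shift per channel (two parameters per channel). The running mean and variance it tracks are stored as non-trainable buffers and are not counted as parameters.
-   The overfitting problem is rectified to an extent, but we have not yet reached the target test accuracy of 99.40%.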
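
To see where the extra parameters come from, it can help to inspect a single batch norm layer on its own. The snippet below is a minimal sketch, not part of the notebook's `model.py`: it builds one `nn.BatchNorm2d(8)` (the channel count of the first block) and lists its learnable parameters separately from its buffers.

```python
import torch.nn as nn

# One batch norm layer for an 8-channel feature map (illustrative only).
bn = nn.BatchNorm2d(8)

# Learnable parameters: a per-channel scale (weight) and shift (bias).
bn_params = {name: p.numel() for name, p in bn.named_parameters()}

# Non-trainable buffers: running statistics used in evaluation mode.
bn_buffers = {name: b.numel() for name, b in bn.named_buffers()}

print("parameters:", bn_params)
print("buffers:   ", bn_buffers)
print("trainable total:", sum(bn_params.values()))  # 16 for 8 channels
```

The trainable total of 16 matches the `BatchNorm2d-2` row of the model summary further down.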


# Install Libraries


In [None]:
!pip install torchsummary

# Import Libraries

Let's first import all the necessary libraries

In [None]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

from tqdm import tqdm
from torchsummary import summary

# for visualization
%matplotlib inline
import matplotlib.pyplot as plt

from model import Model_3, download_model_data, create_data_loader, train_and_predict
from utils import get_correct_prediction_count, get_device, plot_metrics

# Model_3 - in model.py
 

# Model Summary
View the model summary to understand its trainable parameters.
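
The per-layer counts that `summary` reports can also be checked by hand. The following is a minimal sketch in plain Python, assuming the layer shapes shown in the model printout (3x3 and 1x1 convolutions with `bias=False`, no padding). The helper names are made up for this illustration, and only the layers visible in the summary output are checked.

```python
def conv2d_params(in_ch, out_ch, kernel, bias=False):
    """Weights of a square-kernel Conv2d, plus one bias per output channel if enabled."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


def conv2d_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


layer_params = {
    "Conv2d-1": conv2d_params(1, 8, 3),
    "BatchNorm2d-2": batchnorm2d_params(8),
    "Conv2d-4": conv2d_params(8, 16, 3),
    "BatchNorm2d-5": batchnorm2d_params(16),
    "Conv2d-8": conv2d_params(16, 8, 1),
    "BatchNorm2d-9": batchnorm2d_params(8),
    "Conv2d-11": conv2d_params(8, 10, 3),
    "BatchNorm2d-12": batchnorm2d_params(10),
}
for name, count in layer_params.items():
    print(f"{name:>15}: {count:,}")

# Spatial size through the first layers: 28 -> 26 -> 24 -> (max pool) 12 -> 12 -> 10
size = conv2d_out_size(28, 3)      # Conv2d-1
size = conv2d_out_size(size, 3)    # Conv2d-4
size = size // 2                   # MaxPool2d-7 (kernel 2, stride 2)
size = conv2d_out_size(size, 1)    # Conv2d-8 (1x1 keeps the size)
size = conv2d_out_size(size, 3)    # Conv2d-11
print("spatial size after Conv2d-11:", size)
```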

In [None]:

device = get_device()
model = Model_3().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 26, 26]              72
       BatchNorm2d-2            [-1, 8, 26, 26]              16
              ReLU-3            [-1, 8, 26, 26]               0
            Conv2d-4           [-1, 16, 24, 24]           1,152
       BatchNorm2d-5           [-1, 16, 24, 24]              32
              ReLU-6           [-1, 16, 24, 24]               0
         MaxPool2d-7           [-1, 16, 12, 12]               0
            Conv2d-8            [-1, 8, 12, 12]             128
       BatchNorm2d-9            [-1, 8, 12, 12]              16
             ReLU-10            [-1, 8, 12, 12]               0
           Conv2d-11           [-1, 10, 10, 10]             720
      BatchNorm2d-12           [-1, 10, 10, 10]              20
             ReLU-13           [-1, 10, 10, 10]               0
           Conv2d-14             [-1, 1

# The Model


In [None]:
model.eval()

Net(
  (conv1): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
  )
  (trans1): Sequential(
    (0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (1): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU()
  )
  (conv2): Sequential(
    (0): Conv2d(8, 10, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (1): BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(10, 16, kernel_size=(3, 3), stride=(1, 1), bias=False)
    (4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, tr

## Load and Prepare Dataset

MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels.

We load the PIL images using torchvision.datasets.MNIST. While loading, we transform the data to tensors and normalize the images with the mean and standard deviation of the MNIST images.

Data tasks:
- transforms
- data download
- train and test split

In [None]:
torch.manual_seed(1)
batch_size = 128
# CUDA?
use_cuda = torch.cuda.is_available()
print("CUDA Available?", use_cuda)

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
ds_train, ds_test = download_model_data()
train_loader = torch.utils.data.DataLoader(ds_train, batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(ds_test, batch_size=batch_size, shuffle=False, **kwargs)


## Train and test -  Model_3

Let's train and test our model

In [None]:
train_losses = []
test_losses = []
train_acc = []
test_acc = []

# load model to device and start the training 
model =  Model_3().to(device)
epochs = 15
train_losses3, train_acc3, test_losses3, test_acc3 = train_and_predict(model, device,
                                                                   train_loader=train_loader,
                                                                   test_loader=test_loader, 
                                                                   num_epochs=epochs, lr=0.01)
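
`train_and_predict` is defined in `model.py` and is not shown in this notebook. As a rough guide to what one epoch of such a helper typically involves, here is a hypothetical sketch. The function name `run_epoch`, the cross-entropy loss, the SGD optimizer with momentum, and the toy model and random data are all assumptions made for illustration, not the notebook's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, device, loader, optimizer=None):
    """One pass over loader: trains if an optimizer is given, otherwise evaluates."""
    training = optimizer is not None
    model.train(training)  # batch norm uses batch statistics only in training mode
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss = F.cross_entropy(output, target)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * data.size(0)
            correct += output.argmax(dim=1).eq(target).sum().item()
            seen += data.size(0)
    # Mean loss per sample and accuracy in percent.
    return total_loss / seen, 100.0 * correct / seen


# Tiny stand-in model and random data so the sketch runs without MNIST.
toy_device = torch.device("cpu")
toy_model = nn.Sequential(
    nn.Conv2d(1, 4, 3, bias=False), nn.BatchNorm2d(4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 10),
).to(toy_device)
toy_data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)

toy_train_loss, toy_train_acc = run_epoch(toy_model, toy_device, toy_loader, toy_optimizer)
toy_test_loss, toy_test_acc = run_epoch(toy_model, toy_device, toy_loader)
print(f"train loss {toy_train_loss:.4f}, train accuracy {toy_train_acc:.2f}%")
print(f"test loss {toy_test_loss:.4f}, test accuracy {toy_test_acc:.2f}%")
```

Switching between `model.train()` and `model.eval()` matters for this model in particular, because batch norm normalizes with batch statistics during training and with its running statistics during evaluation.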

### Plot the train and test losses and accuracies for Model_3

In [None]:
plot_metrics(train_losses3, train_acc3, test_losses3, test_acc3)