<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Batch-Normalization-layer" data-toc-modified-id="Batch-Normalization-layer-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Batch Normalization layer</a></span></li><li><span><a href="#LeNet-5-architecture" data-toc-modified-id="LeNet-5-architecture-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>LeNet-5 architecture</a></span></li></ul></div>

In [1]:
## Batch Normalization

In [2]:
import torch.nn as nn
import matplotlib.pyplot as plt
import torch.nn.functional as f
from torch.utils.data import DataLoader,TensorDataset
import torch
import torch.utils.data as Data
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import numpy as np
%matplotlib inline

# Loading the data

In [3]:
# MNIST Dataset
train_dataset = datasets.MNIST(root='../data/',train=True,transform=transforms.ToTensor(),download=True)
test_dataset = datasets.MNIST(root='../data/',train=False,transform=transforms.ToTensor())

In [4]:
batch_size=265
train_loader=DataLoader(dataset=train_dataset,batch_size=batch_size,shuffle=True)
test_loader=DataLoader(dataset=test_dataset,batch_size=batch_size,shuffle=True)


## Batch Normalization layer

Batch normalization layer is <b>used before the activation layer </b> (according to the authors' original paper), instead of after activation layer.


## LeNet-5 architecture
we are going to integrate batch normalization into the LeNet-5 architecture displayed below
<img src='../images/lenet5.jpg'>


<i> (source: Hands-On Computer Vision with TensorFlow 2 (Leverage deep learning to create powerful image processing apps with TensorFlow 2.0 and Keras) by Benjamin Planche Eliot Andres page 94)</i>
  

In [7]:
class Lenet(nn.Module):
    def __init__(self):
        super().__init__()
        self.cov1=nn.Conv2d(1,6,kernel_size=5,padding=2)
        self.b2_1=nn.BatchNorm2d(6)
        self.max=nn.MaxPool2d(kernel_size=2,stride=2)
        
        self.cov2=nn.Conv2d(6,16,kernel_size=5,padding=2)
        self.b2_2=nn.BatchNorm2d(16)
        
        self.f=nn.Flatten()
        self.l1=nn.Linear(16*7*7,120)
        self.b1_1=nn.BatchNorm1d(120)
        
        self.l2=nn.Linear(120,84)
        self.b1_2=nn.BatchNorm1d(84)
        
        self.l3=nn.Linear(84,10)
        self.relu=nn.ReLU()
        
        
    def forward(self,x):
        #x=x.reshape(-1,1, 28, 28)
        h1=self.max(self.relu(self.b2_1(self.cov1(x))))
        h2=self.max(self.relu(self.b2_2(self.cov2(h1))))
        f=self.f(h2)   
        h3=self.relu(self.b1_1(self.l1(f)))
        h4=self.relu(self.b1_2(self.l2(h3)))
        output=self.l3(h4)
        return output
lenet=Lenet()

In [8]:
def evaluate_accuracy(net,data_iterator):
    pred_correct = 0
    for  data,label in data_iterator:
        output=net(data)
        pred = output.argmax(dim=1)
        pred_correct += (pred==label).float().sum().item()
        return pred_correct/len(data)

In [9]:
evaluate_accuracy(lenet,test_loader)

0.07547169811320754

In [10]:
loss_fn=torch.nn.CrossEntropyLoss()
opt=torch.optim.SGD(lenet.parameters(),lr=0.01)

In [11]:
num_epochs = 4
for epoch in range(num_epochs+1):
    test_acc,train_acc=0,0
    for X,y in train_loader:
        lenet.train()
        y_hat=lenet(X)
        l=loss_fn(y_hat,y)
        opt.zero_grad() 
        l.backward() 
        opt.step() 
    acc_tr=evaluate_accuracy(lenet,train_loader)
    lenet.eval()
    acc_te=evaluate_accuracy(lenet,test_loader)
    test_acc+=acc_te
    train_acc+=acc_tr
    print('epoch %d, loss %f, train acc %f, test acc %f'%(epoch,l,train_acc,test_acc))

epoch 0, loss 0.354123,train acc 0.947170,test acc 0.966038
epoch 1, loss 0.259338,train acc 0.947170,test acc 0.962264
epoch 2, loss 0.151144,train acc 0.969811,test acc 0.984906
epoch 3, loss 0.080265,train acc 0.992453,test acc 0.984906
epoch 4, loss 0.140203,train acc 0.973585,test acc 0.984906
