<h3 id="case-study-lenet">Case Study: LeNet</h3>

<p>Let’s put together everything we’ve learnt and analyze one of the earliest successes<span id="lenet" class="margin-toggle sidenote-number"></span><span class="sidenote">LeNet was published in 1998! CNNs are not exactly new.</span> of convolutional networks: LeNet. This is the <em>architecture</em> of LeNet:</p>

<p><span class="marginnote">
    <strong>Figure</strong>: LeNet architecture
    <a href="http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf">Source</a>.
</span>
<img src="../../images/lenet.png" />

<p>Let’s go over each of the component layers of LeNet:
<span id="lenet" class="margin-toggle sidenote-number"></span>
<span class="sidenote">
    I actually describe a slightly modified version of LeNet.
</span></p>

<ul>
  <li><strong>Input</strong>: Grayscale image of size 32 x 32.</li>
  <li><strong>C1</strong>: Convolutional layer of 6 feature maps, kernel size (5, 5) and stride 1. The output size is therefore 6 x 28 x 28. Number of trainable parameters is $(5*5 + 1) * 6 = 156$.</li>
  <li><strong>S2</strong>: Pooling/subsampling layer with kernel size (2, 2) and stride 2. Output size is 6 x 14 x 14. Number of trainable parameters is 0.</li>
  <li><strong>C3</strong>: Convolutional layer of 16 feature maps. Each feature map is connected to all 6 feature maps from the previous layer. Kernel size and stride are the same as in C1. Output size is 16 x 10 x 10. Number of trainable parameters is $(6 * 5 * 5 + 1) * 16 = 2416$.</li>
  <li><strong>S4</strong>: Pooling layer with the same <em>hyperparameters</em> as S2. Output size is 16 x 5 x 5.</li>
  <li><strong>C5</strong>: Convolutional layer of 120 feature maps and kernel size (5, 5). Because the kernel covers the whole 5 x 5 input, this amounts to a <em>full connection</em> with the outputs of the previous layer, and the output size is 120 x 1 x 1. Number of trainable parameters is $(16 * 5 * 5 + 1)*120 = 48120$.</li>
  <li><strong>F6</strong>: <em>Fully connected layer</em> of 84 units, i.e., every unit in this layer is connected to all of the previous layer’s outputs<span id="fc" class="margin-toggle sidenote-number"></span><span class="sidenote">This is the same as the layers in the MLP we’ve seen before.</span>. Number of trainable parameters is $(120 + 1)*84 = 10164$.</li>
  <li><strong>Output</strong>: Fully connected layer of 10 units with softmax activation<span id="out" class="margin-toggle sidenote-number"></span><span class="sidenote">Ignore ‘Gaussian connections’. They belong to an older loss function that is no longer in use.</span>.</li>
</ul>
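
<p>The parameter counts in the list above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper names <code>conv_params</code> and <code>fc_params</code> are made up for illustration, and the count for the output layer (not stated in the list) assumes the same "weights plus one bias per unit" rule used for F6.</p>

```python
def conv_params(in_maps, out_maps, k):
    """Trainable parameters of a conv layer: k*k weights per input map, plus one bias, per output map."""
    return (in_maps * k * k + 1) * out_maps


def fc_params(in_units, out_units):
    """Trainable parameters of a fully connected layer: one weight per input, plus one bias, per unit."""
    return (in_units + 1) * out_units


# Layer names follow the list above; pooling layers have no trainable parameters.
layer_params = {
    "C1": conv_params(1, 6, 5),
    "C3": conv_params(6, 16, 5),
    "C5": conv_params(16, 120, 5),
    "F6": fc_params(120, 84),
    "Output": fc_params(84, 10),  # assumption: same rule as F6
}

for name, count in layer_params.items():
    print(name, count)
print("Total", sum(layer_params.values()))
```

<p>Under these assumptions, C5 alone accounts for most of the trainable parameters of the network.</p>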

<p>The dataset used was MNIST, which has 60,000 training images and 10,000 test images.</p>

In [69]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import torchvision
import torchvision.transforms as transforms

import visdom


In [70]:
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

PyTorch Version:  0.4.1
Torchvision Version:  0.2.1


In [71]:
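def conv_out(W, K, S, P):
    # Output width of a convolution: input width W, kernel size K, stride S, padding P
    return (W - K + 2 * P) / (S) + 1

def pool_out(W, K, S):
    # Output width of a pooling layer: input width W, kernel size K, stride S
    return np.floor((W - K) / (S)) + 1

In [72]:
out = conv_out(32, 5, 1, 0)
out

28.0

In [73]:
out = pool_out(out, 2, 2)
out

14.0

In [74]:
out = conv_out(14, 5, 1, 0)
out

10.0

In [75]:
out = pool_out(out, 2, 2)
out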

5.0

In [76]:
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()

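        # Shapes in the comments are channels x height x width for a 32 x 32 input.
        self.convnet = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5),     # C1: 6 x 28 x 28
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),        # S2: 6 x 14 x 14 (stride defaults to kernel_size)
                nn.Conv2d(6, 16, kernel_size=5),    # C3: 16 x 10 x 10
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),        # S4: 16 x 5 x 5
                nn.Conv2d(16, 120, kernel_size=5),  # C5: 120 x 1 x 1
                nn.ReLU())

        self.fc = nn.Sequential(
                nn.Linear(120, 84),                 # F6
                nn.ReLU(),
                nn.Linear(84, 10),                  # Output
                nn.LogSoftmax(dim=1))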

    def forward(self, img):
        output = self.convnet(img)
        output = output.view(-1, 120)
        output = self.fc(output)
        return output
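
<p>The model above ends in <code>nn.LogSoftmax</code>, while the training cell further down passes its output to <code>nn.CrossEntropyLoss</code>, which applies a log-softmax of its own before taking the negative log-likelihood. The extra layer is redundant but harmless, because log-softmax applied to values that are already log-probabilities leaves them unchanged. The following is a small pure-Python sketch of that property; the <code>log_softmax</code> helper and the example logits are made up for illustration and are not part of the model code.</p>

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax of a list of floats."""
    m = max(xs)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_sum_exp for x in xs]


logits = [2.0, -1.0, 0.5]   # arbitrary example scores
once = log_softmax(logits)
twice = log_softmax(once)   # what a second log-softmax would compute

# The exponentials of `once` already sum to 1, so the second pass changes nothing
# (up to floating-point rounding).
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(once, twice))
print(same)
```

<p>If the redundancy bothers you, one option is to drop the final <code>nn.LogSoftmax</code>, another is to keep it and use <code>nn.NLLLoss</code> as the criterion.</p>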

In [77]:
viz = visdom.Visdom()

# Hyper parameters
num_epochs = 5

# MNIST dataset; images are resized to the 32 x 32 input size described above
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor()])

train_dataset = torchvision.datasets.MNIST(root='../../data', train=True, transform=transform, download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data', train=False, transform=transform)

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=100, shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=100, shuffle=False)

In [81]:
net = LeNet5()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=2e-3)

cur_batch_win = None
cur_batch_win_opts = {
    'title': 'Epoch Loss Trace',
    'xlabel': 'Batch Number',
    'ylabel': 'Loss',
    'width': 600,
    'height': 400,
}


def train(epoch):
    global cur_batch_win
    net.train()
    loss_list, batch_list = [], []
    for i, (images, labels) in enumerate(train_loader):
        # Since PyTorch 0.4, tensors work with autograd directly; no Variable wrapper is needed.

        optimizer.zero_grad()

        output = net(images)

        loss = criterion(output, labels)

        loss_list.append(loss.item())
        batch_list.append(i+1)

        if i % 10 == 0:
            print('Train - Epoch %d, Batch: %d, Loss: %f' % (epoch, i, loss.item()))

        # Update Visualization
        if viz.check_connection():
            cur_batch_win = viz.line(torch.FloatTensor(loss_list), torch.FloatTensor(batch_list),
                                     win=cur_batch_win, name='current_batch_loss',
                                     update=(None if cur_batch_win is None else 'replace'),
                                     opts=cur_batch_win_opts)

        loss.backward()
        optimizer.step()


def test():
    net.eval()
    total_correct = 0
    total_loss = 0.0
    with torch.no_grad():  # no gradients are needed for evaluation
        for images, labels in test_loader:
            output = net(images)
            # criterion returns the mean loss of the batch, so weight it by the batch size
            total_loss += criterion(output, labels).item() * labels.size(0)
            pred = output.argmax(dim=1)
            total_correct += (pred == labels).sum().item()

    # Average over the number of test examples, not the number of batches
    avg_loss = total_loss / len(test_dataset)
    accuracy = float(total_correct) / len(test_dataset)
    print('Test Avg. Loss: %f, Accuracy: %f' % (avg_loss, accuracy))


def train_and_test(epoch):
    train(epoch)
    test()


def main():
    for e in range(1, num_epochs + 1):
        train_and_test(e)


if __name__ == '__main__':
    main()

Train - Epoch 1, Batch: 0, Loss: 2.296790
Train - Epoch 1, Batch: 10, Loss: 1.978823
Train - Epoch 1, Batch: 20, Loss: 0.880726
Train - Epoch 1, Batch: 30, Loss: 0.727522
Train - Epoch 1, Batch: 40, Loss: 0.525052
Train - Epoch 1, Batch: 50, Loss: 0.490573
Train - Epoch 1, Batch: 60, Loss: 0.320637
Train - Epoch 1, Batch: 70, Loss: 0.273253
Train - Epoch 1, Batch: 80, Loss: 0.251605
Train - Epoch 1, Batch: 90, Loss: 0.254625
Train - Epoch 1, Batch: 100, Loss: 0.239103
Train - Epoch 1, Batch: 110, Loss: 0.311514
Train - Epoch 1, Batch: 120, Loss: 0.326570
Train - Epoch 1, Batch: 130, Loss: 0.209927
Train - Epoch 1, Batch: 140, Loss: 0.217677
Train - Epoch 1, Batch: 150, Loss: 0.110118
Train - Epoch 1, Batch: 160, Loss: 0.184218
Train - Epoch 1, Batch: 170, Loss: 0.158139
Train - Epoch 1, Batch: 180, Loss: 0.111515
Train - Epoch 1, Batch: 190, Loss: 0.076582
Train - Epoch 1, Batch: 200, Loss: 0.417697
Train - Epoch 1, Batch: 210, Loss: 0.183179
Train - Epoch 1, Batch: 220, Loss: 0.200185
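
<p>For the evaluation step in <code>test()</code>, it matters what the totals are divided by. With its default settings, <code>nn.CrossEntropyLoss</code> returns the mean loss over a batch, so a per-example average needs each batch mean weighted by its batch size, and accuracy is the number of correct predictions divided by the number of examples, not the number of batches. A minimal pure-Python sketch of that bookkeeping follows; the <code>aggregate</code> helper and the numbers are made up for illustration.</p>

```python
def aggregate(batch_stats):
    """Combine per-batch results into dataset-level metrics.

    batch_stats: list of (mean_loss, num_correct, batch_size) tuples.
    """
    total_examples = sum(size for _, _, size in batch_stats)
    avg_loss = sum(loss * size for loss, _, size in batch_stats) / total_examples
    accuracy = sum(correct for _, correct, _ in batch_stats) / total_examples
    return avg_loss, accuracy


# Three hypothetical batches; the last one is smaller, as often happens at the end of a dataset.
stats = [(0.5, 90, 100), (0.25, 95, 100), (1.0, 40, 50)]
avg_loss, accuracy = aggregate(stats)
print(avg_loss, accuracy)
```

<p>Simply averaging the three batch means would overweight the small last batch, which is why the sketch weights by batch size.</p>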
