# 2. Convolutional Neural Networks (LeNet)

In [2]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn

## LeNet-5

At a high level, **`LeNet`** (LeNet-5) consists of two parts: 

1. a **convolutional encoder** consisting of **two convolutional layers**

2. a **dense block** consisting of **three fully connected layers**

The architecture of LeNet is shown below:

![](http://d2l.ai/_images/lenet.svg)

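The input is a **$28 \times 28$ image of a handwritten digit** and the output is a **probability distribution** over the 10 possible digit classes. (The implementation below ends in a plain linear layer, so it emits 10 raw scores; a softmax turns these into probabilities.)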

The basic units in each **convolutional block** are:

1. a convolutional layer (with a $5 \times 5$ kernel)
2. a sigmoid activation function 
3. a subsequent average pooling operation ($2 \times 2$, stride $=2$)

(Note: **ReLU** and **max-pooling** work better, but they had not yet been adopted at the time of LeNet.)

The convolutional layers map spatially arranged inputs to a number of **2-dimensional feature maps**, typically increasing the **number of channels**. Each pooling operation then **reduces dimensionality** by a factor of $4$ via spatial downsampling: a $2 \times 2$ window with stride $2$ halves both the height and the width.
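
The spatial sizes can be checked by hand with the standard output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$, where $n$ is the input size, $k$ the window size, $p$ the padding, and $s$ the stride. The following is a minimal sketch in plain Python; the helper name `conv_output_size` is hypothetical, and the padding of $2$ on the first convolution is taken from the implementation later in this section.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Spatial size after a convolution or pooling window:
    floor((n + 2 * padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (description, kernel, padding, stride) for each spatial layer of the encoder.
# padding=2 on the first convolution is an assumption taken from the
# implementation later in this section.
layers = [
    ("conv 5x5, padding 2", 5, 2, 1),
    ("avg pool 2x2, stride 2", 2, 0, 2),
    ("conv 5x5", 5, 0, 1),
    ("avg pool 2x2, stride 2", 2, 0, 2),
]

size = 28  # height (= width) of the input image
sizes = []
for name, kernel, padding, stride in layers:
    size = conv_output_size(size, kernel, padding, stride)
    sizes.append(size)
    print(f"{name}: {size} x {size}")

# sizes is now [28, 14, 10, 5]
```

Each pooling step halves the side length ($28 \to 14$ and $10 \to 5$), which is where the factor of $4$ per pooling layer comes from.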

The output of the **convolutional block** is a **4-dimensional** tensor (batch size, channels, height, width). To pass it to the **dense block**, we need to **flatten** it into a **2-dimensional** tensor (sample index in the minibatch, flattened feature vector).
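
The flattening step can be sketched as follows. The tensor sizes here are illustrative stand-ins for the encoder output, not values produced by a trained network.

```python
import torch
import torch.nn as nn

# Stand-in for the convolutional block's output:
# (batch size, channels, height, width). The sizes are illustrative.
features = torch.rand(2, 16, 5, 5)

# nn.Flatten keeps the batch dimension and merges all remaining dimensions.
flat = nn.Flatten()(features)
print(flat.shape)  # torch.Size([2, 400])

# The same result with an explicit reshape: one row per sample.
same = features.reshape(features.shape[0], -1)
```

Each row of `flat` holds the $16 \times 5 \times 5 = 400$ features of one sample, which is the input size the first fully connected layer has to accept.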

There are **3 fully connected layers** in the dense block of LeNet, with $120$, $84$, and $10$ outputs, respectively.

A simplified representation of the LeNet:

![](http://d2l.ai/_images/lenet-vert.svg)

## Implementing LeNet

In the following implementation, we omit the **Gaussian activation** used in the original LeNet-5 architecture:

In [3]:
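net = nn.Sequential(
    # Convolutional block 1: 1x28x28 -> 6x28x28 (padding=2 keeps the size) -> 6x14x14
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    # Convolutional block 2: 6x14x14 -> 16x10x10 -> 16x5x5
    nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    # Flatten (batch, 16, 5, 5) to (batch, 400) for the dense block
    nn.Flatten(),
    # Dense block: 400 -> 120 -> 84 -> 10
    nn.Linear(in_features=16*5*5, out_features=120), nn.Sigmoid(),
    nn.Linear(in_features=120, out_features=84), nn.Sigmoid(),
    nn.Linear(in_features=84, out_features=10),
)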

In [3]:
X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)

for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape: \t', X.shape)

Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])
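
The last `Linear` layer produces 10 unnormalized scores rather than probabilities. As a minimal sketch (assuming the same layer configuration as above, with untrained random weights), a softmax over the class dimension turns these scores into a probability distribution:

```python
import torch
import torch.nn as nn

# Same layer configuration as the network above, rebuilt here so the
# example is self-contained. The weights are random (untrained).
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)

with torch.no_grad():                     # inference only, no gradients needed
    logits = net(X)                       # raw scores, shape (1, 10)
    probs = torch.softmax(logits, dim=1)  # normalize across the 10 classes

print(probs.shape)       # torch.Size([1, 10])
print(probs.sum(dim=1))  # each row sums to 1 (up to floating-point error)
```

When training with `nn.CrossEntropyLoss`, the raw scores are passed to the loss directly, since that loss applies the log-softmax internally; an explicit softmax is only needed when the probabilities themselves are of interest.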
