# Tutorial 7 - CNN & Residual Network


## Outline

+ Convolutional Neural Network (CNN):
    + Hyperparameters in CNN: channels, padding, stride, dilation
    + Pooling
    + CNN in PyTorch
+ Residual Network
+ Batch Normalization

# HW6 - Helper function


You can use the following decorator to report how long a function call takes:

In [13]:
import time

def timeit(f):

    def timed(*args, **kw):

        ts = time.time()
        result = f(*args, **kw)
        te = time.time()

        print(f'func:{f.__name__} took: {te-ts:.4f} sec')
        return result

    return timed

@timeit
def sleep(sec):
    return time.sleep(sec)

sleep(0.1)

func:sleep took: 0.1002 sec
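A small variant of the decorator above, offered as a sketch rather than a required change. `functools.wraps` copies the wrapped function's name and docstring onto the wrapper (without it, `sleep.__name__` would be `'timed'`), and `time.perf_counter` is a monotonic clock intended for timing short intervals. The function `add` is a hypothetical example.

```python
import functools
import time

def timeit(f):
    """Print the wall-clock time of each call to f."""
    @functools.wraps(f)  # copy f's __name__ and __doc__ onto the wrapper
    def timed(*args, **kw):
        ts = time.perf_counter()  # monotonic, high-resolution clock
        result = f(*args, **kw)
        te = time.perf_counter()
        print(f'func:{f.__name__} took: {te - ts:.4f} sec')
        return result
    return timed

@timeit
def add(a, b):
    """Add two numbers."""
    return a + b

print(add.__name__)  # add (it would be timed without functools.wraps)
print(add(2, 3))     # prints the timing line, then 5
```

Running this in the notebook redefines `timeit` with the same printed output format, so the `sleep` example keeps working.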


## Convolutional Neural Network (CNN)

### General CNN architecture
![](https://cdn-images-1.medium.com/max/800/1*lvvWF48t7cyRWqct13eU0w.jpeg)  


### Convolution filters help extract features
![](https://qph.fs.quoracdn.net/main-qimg-50915e66f98186a786b3d0344eea9aba-pjlq)  

### Calculating convolution output shape
Here is a [visualization](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) of padding, stride, and dilation. The output height of a convolution layer is:

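$$H_{\text {out }}=\left\lfloor\frac{H_{\text {in }}+2 \times \text { padding }-\operatorname{dilation} \times(\text { kernel size }-1)-1}{\text { stride }}+1\right\rfloor$$

where $\lfloor\cdot\rfloor$ rounds down. The output width $W_{\text{out}}$ is computed the same way from $W_{\text{in}}$ and the width-wise kernel size, padding, stride, and dilation.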

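A worked example of the formula, using layer sizes from the LeNet architecture later in this tutorial: a 32x32 input through a 5x5 convolution (padding 0, stride 1, dilation 1), then a 2x2 pooling window with stride 2. PyTorch's `MaxPool2d` and `AvgPool2d` (with the default `ceil_mode=False`) follow the same formula, with dilation 1 for average pooling. The last two lines show a dilated kernel and a case where the floor actually rounds down.

$$\left\lfloor\frac{32+2\times 0-1\times(5-1)-1}{1}+1\right\rfloor=28,\qquad \left\lfloor\frac{28+2\times 0-1\times(2-1)-1}{2}+1\right\rfloor=14$$
$$\text{kernel }3,\ \text{dilation }2,\ H_{\text{in}}=7:\quad \left\lfloor\frac{7+2\times 0-2\times(3-1)-1}{1}+1\right\rfloor=3$$
$$\text{kernel }3,\ \text{padding }1,\ \text{stride }2,\ H_{\text{in}}=32:\quad \left\lfloor\frac{32+2\times 1-1\times(3-1)-1}{2}+1\right\rfloor=\lfloor 16.5\rfloor=16$$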

In [14]:
import pickle
import torch
import torch.nn as nn

In [15]:
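# init a Conv2d layer: 1 input channel, 1 output channel, 2x2 kernel
conv = nn.Conv2d(1, 1, kernel_size=2)
# unbatched input of shape (C, H, W) = (1, 2, 2)
data = torch.rand(1, 2, 2)
print(conv(data))
print(conv.weight.data)
# the single output value equals sum(weight * input) + bias
print(torch.sum(conv.weight.data * data) + conv.bias.data)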

tensor([[[0.1888]]], grad_fn=<SqueezeBackward1>)
tensor([[[[ 0.2187, -0.3329],
          [ 0.1750, -0.0343]]]])
tensor([0.1888])
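The cell above shows that one output value of `nn.Conv2d` is the sum of the element-wise product of the filter and an input patch, plus the bias. Below is a naive loop version of the same operation, checked against `torch.nn.functional.conv2d` with several channels, stride 2, and padding 1. The helper `naive_conv2d` is hypothetical and for illustration only. Note that PyTorch's convolution layers compute a cross-correlation: the kernel is not flipped.

```python
import torch
import torch.nn.functional as F

def naive_conv2d(x, w, b, stride=1, padding=0):
    """Naive 2-D cross-correlation for a single image.

    x: (C_in, H, W), w: (C_out, C_in, kH, kW), b: (C_out,)
    """
    x = F.pad(x, (padding, padding, padding, padding))  # zero-pad width, then height
    c_out, _, kh, kw = w.shape
    h_out = (x.shape[1] - kh) // stride + 1
    w_out = (x.shape[2] - kw) // stride + 1
    out = torch.empty(c_out, h_out, w_out)
    for o in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
                # one output value = sum(filter * patch) + bias, over all input channels
                out[o, i, j] = (w[o] * patch).sum() + b[o]
    return out

torch.manual_seed(0)
x = torch.rand(3, 6, 6)      # 3 input channels, 6x6 image
w = torch.rand(4, 3, 3, 3)   # 4 filters of shape 3x3x3
b = torch.rand(4)
ref = F.conv2d(x.unsqueeze(0), w, b, stride=2, padding=1).squeeze(0)
mine = naive_conv2d(x, w, b, stride=2, padding=1)
print(mine.shape, torch.allclose(mine, ref, atol=1e-5))  # torch.Size([4, 3, 3]) True
```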


In [16]:
# init a MaxPool layer
max_pool = nn.MaxPool2d(2)
max_pool

MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)

In [17]:
# init an average pooling layer
avg_pool = nn.AvgPool2d(2)
avg_pool

AvgPool2d(kernel_size=2, stride=2, padding=0)
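A small example contrasting the two pooling layers on a 4x4 input. Both use non-overlapping 2x2 windows here (the stride defaults to the kernel size, as the printed representations show) and neither has learnable parameters: max pooling keeps the largest value in each window, average pooling its mean.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float).reshape(1, 1, 4, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11],
#  [12, 13, 14, 15]]
max_out = nn.MaxPool2d(2)(x)  # largest value in each 2x2 window
avg_out = nn.AvgPool2d(2)(x)  # mean of each 2x2 window
print(max_out.squeeze())      # values [[5, 7], [13, 15]]
print(avg_out.squeeze())      # values [[2.5, 4.5], [10.5, 12.5]]
```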

In [18]:
# output size along one spatial dimension, following the formula above
def out_dim(in_dim, kernel_size, padding, stride, dilation):
    return (in_dim + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# batched input, shape: (N, C, H, W)
data = torch.rand(1, 1, 2, 2)
conv(data)

tensor([[[[0.3643]]]], grad_fn=<ConvolutionBackward0>)
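`out_dim` is defined above but not exercised. A sketch that checks it against the output height of `nn.Conv2d` over a small grid of kernel sizes, paddings, strides, and dilations (the grid values and the 16x16 input size are arbitrary choices):

```python
import itertools
import torch
import torch.nn as nn

def out_dim(in_dim, kernel_size, padding, stride, dilation):
    return (in_dim + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Compare the formula with the actual output height of nn.Conv2d
x = torch.rand(1, 1, 16, 16)
for k, p, s, d in itertools.product([1, 3, 5], [0, 1, 2], [1, 2, 3], [1, 2]):
    conv = nn.Conv2d(1, 1, kernel_size=k, padding=p, stride=s, dilation=d)
    h_out = conv(x).shape[2]
    assert h_out == out_dim(16, k, p, s, d), (k, p, s, d)
print("formula matches nn.Conv2d for all combinations")
```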

### LeNet architecture
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. *Proceedings of the IEEE*, 86(11), 2278-2324.

|Layer No.|Layer type|#channels/#features|Kernel size|Stride|Activation|
|---|---|---|---|---|---|
|1|2D Convolution|6|5|1|tanh|
|2|Average pooling|6|2|2|-|
|3|2D Convolution|16|5|1|tanh|
|4|Average pooling|16|2|2|-|
|5|2D Convolution|120|5|1|tanh|
|6|Flatten|-|-|-|-|
|7|Fully connected|84|-|-|tanh|
|8|Fully connected|10|-|-|softmax|

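The table does not state the input size, but its layer sizes are consistent with the 32x32 input used in the summary below. A sketch tracing the spatial size through the five conv/pool layers with the output-size formula (`LENET_LAYERS` and `lenet_trace` are hypothetical helpers). A 28x28 image leaves the third convolution with no valid output unless the first convolution is padded, e.g. with `padding=2`. Whether the MNIST pickle loaded below stores 28x28 or 32x32 images is not stated here.

```python
def out_dim(in_dim, kernel_size, padding=0, stride=1, dilation=1):
    return (in_dim + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# (name, kernel_size, stride) for the conv/pool layers in the table
LENET_LAYERS = [("conv1", 5, 1), ("pool1", 2, 2), ("conv2", 5, 1),
                ("pool2", 2, 2), ("conv3", 5, 1)]

def lenet_trace(h, first_padding=0):
    """Spatial size after each conv/pool layer; padding applies to conv1 only."""
    sizes = []
    for idx, (_, k, s) in enumerate(LENET_LAYERS):
        p = first_padding if idx == 0 else 0
        h = out_dim(h, k, padding=p, stride=s)
        sizes.append(h)
    return sizes

print(lenet_trace(32))                   # [28, 14, 10, 5, 1]
print(lenet_trace(28))                   # [24, 12, 8, 4, 0]: conv3 has no valid output
print(lenet_trace(28, first_padding=2))  # [28, 14, 10, 5, 1]
```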
In [19]:
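def load_dataset(path):
    # the pickle holds (train_data, test_data); each is an (images, labels) pair
    with open(path, 'rb') as f:
        train_data, test_data = pickle.load(f)

    # unsqueeze(1) inserts a dimension at index 1, e.g. images (N, H, W) -> (N, 1, H, W)
    X_train = torch.tensor(train_data[0], dtype=torch.float).unsqueeze(1)
    y_train = torch.tensor(train_data[1], dtype=torch.long).unsqueeze(1)
    X_test = torch.tensor(test_data[0], dtype=torch.float).unsqueeze(1)
    y_test = torch.tensor(test_data[1], dtype=torch.long).unsqueeze(1)
    return X_train, y_train, X_test, y_test

# forward slashes avoid the invalid "\m" escape sequence and also work on Windows
X_train, y_train, X_test, y_test = load_dataset("Datasets/mnist.pkl")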


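A sketch of how the loaded tensors could be batched for training with `TensorDataset` and `DataLoader`. Random tensors stand in for the real data, and the 32x32 image size is an assumption matching the LeNet input. The stand-in labels mimic `load_dataset`'s `unsqueeze(1)`, which yields shape (N, 1), while `nn.CrossEntropyLoss` expects class-index targets of shape (N,), hence the `squeeze(1)`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins shaped like load_dataset's output
X_train = torch.rand(100, 1, 32, 32)        # (N, 1, H, W) float images
y_train = torch.randint(0, 10, (100, 1))    # (N, 1) labels, as produced by unsqueeze(1)

# Drop the extra label dimension so targets have shape (N,)
train_set = TensorDataset(X_train, y_train.squeeze(1))
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)  # torch.Size([32, 1, 32, 32]) torch.Size([32])
print(len(train_loader))   # 4 batches: 32 + 32 + 32 + 4 samples

logits = torch.randn(32, 10)  # stand-in for model outputs (raw scores, no softmax)
loss = nn.CrossEntropyLoss()(logits, yb)
```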

In [20]:
X = torch.rand(10, 32, 32)
print(X.shape)
X = X.unsqueeze(1)
print(X.shape)

torch.Size([10, 32, 32])
torch.Size([10, 1, 32, 32])


In [21]:
class LeNet(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv = nn.ModuleList([
            nn.Conv2d(in_channels, 6, kernel_size=5, stride=1),
            nn.Conv2d(6, 16, kernel_size=5, stride=1),
            nn.Conv2d(16, 120, kernel_size=5, stride=1)
        ])
        self.pool = nn.AvgPool2d(2)
        self.activation = nn.Tanh()
        self.fc = nn.ModuleList([
            nn.Linear(120, 84),
            nn.Linear(84, 10)
        ])
    
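    def forward(self, x):
        # expects 32x32 inputs: conv 28 -> pool 14 -> conv 10 -> pool 5 -> conv 1
        for i in range(2):
            x = self.pool(self.activation(self.conv[i](x)))
        x = torch.flatten(self.activation(self.conv[2](x)), 1)  # (N, 120, 1, 1) -> (N, 120)
        x = self.activation(self.fc[0](x))
        # softmax as in the table; nn.CrossEntropyLoss applies log-softmax itself,
        # so return the raw logits instead when training with that loss
        x = torch.softmax(self.fc[1](x), dim=-1)
        return x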

net = LeNet()
net

LeNet(
  (conv): ModuleList(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (2): Conv2d(16, 120, kernel_size=(5, 5), stride=(1, 1))
  )
  (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (activation): Tanh()
  (fc): ModuleList(
    (0): Linear(in_features=120, out_features=84, bias=True)
    (1): Linear(in_features=84, out_features=10, bias=True)
  )
)

In [22]:
# Use torchsummary to print the architecture
# ! pip install torch-summary
from torchsummary import summary

s = summary(net, (1, 32, 32))

Layer (type:depth-idx)                   Output Shape              Param #
├─ModuleList: 1                          []                        --
|    └─Conv2d: 2-1                       [-1, 6, 28, 28]           156
├─Tanh: 1-1                              [-1, 6, 28, 28]           --
├─AvgPool2d: 1-2                         [-1, 6, 14, 14]           --
├─ModuleList: 1                          []                        --
|    └─Conv2d: 2-2                       [-1, 16, 10, 10]          2,416
├─Tanh: 1-3                              [-1, 16, 10, 10]          --
├─AvgPool2d: 1-4                         [-1, 16, 5, 5]            --
├─ModuleList: 1                          []                        --
|    └─Conv2d: 2-3                       [-1, 120, 1, 1]           48,120
├─Tanh: 1-5                              [-1, 120, 1, 1]           --
├─ModuleList: 1                          []                        --
|    └─Linear: 2-4                       [-1, 84]                  10,164
├─T
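The parameter counts in the summary can be checked by hand: a `Conv2d` layer has `c_out * (c_in * k * k + 1)` parameters (one k x k weight per input/output channel pair plus one bias per output channel), and a `Linear` layer has `n_out * (n_in + 1)`. The helper names below are hypothetical; the sketch compares them with `numel()` of the actual PyTorch parameters.

```python
import torch.nn as nn

def conv2d_params(c_in, c_out, k):
    # weights: c_out x c_in x k x k, plus one bias per output channel
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    # weight matrix: n_out x n_in, plus one bias per output feature
    return n_out * (n_in + 1)

by_hand = [
    conv2d_params(1, 6, 5),     # 156
    conv2d_params(6, 16, 5),    # 2,416
    conv2d_params(16, 120, 5),  # 48,120
    linear_params(120, 84),     # 10,164
    linear_params(84, 10),      # 850
]
layers = [nn.Conv2d(1, 6, 5), nn.Conv2d(6, 16, 5), nn.Conv2d(16, 120, 5),
          nn.Linear(120, 84), nn.Linear(84, 10)]
from_torch = [sum(p.numel() for p in layer.parameters()) for layer in layers]
print(by_hand == from_torch, sum(by_hand))  # True 61706
```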

In [25]:
#net(X_train[:10]).shape

## Residual Network (ResNet)


An example of a residual block:

<img src="https://miro.medium.com/v2/resize:fit:868/format:webp/0*sGlmENAXIZhSqyFZ" width="400" />

In [26]:
class ResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim)])
        self.activation = nn.ReLU()
    
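    def forward(self, x):
        # residual branch F(x): fc -> ReLU -> fc
        out = self.activation(self.fc[0](x))
        out = self.fc[1](out)
        # skip connection: add the input x back, then apply the final ReLU
        out += x
        out = self.activation(out)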
        return out
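One common intuition for skip connections: if the residual branch $\mathcal{F}(x)$ outputs zeros, the block reduces to `ReLU(x)`, so an added block can easily leave its (non-negative) input unchanged. The sketch re-declares the same `ResBlock` so it runs on its own, zeroes the second linear layer, and checks that the block returns its input exactly. The dimension 8 and batch size 4 are arbitrary.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Same fully connected residual block as above."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim)])
        self.activation = nn.ReLU()

    def forward(self, x):
        out = self.activation(self.fc[0](x))
        out = self.fc[1](out)
        return self.activation(out + x)  # skip connection

block = ResBlock(8)
# Make the residual branch output zeros: F(x) = 0, so block(x) = ReLU(x)
with torch.no_grad():
    block.fc[1].weight.zero_()
    block.fc[1].bias.zero_()

x = torch.rand(4, 8)              # non-negative inputs, so ReLU(x) == x
print(torch.equal(block(x), x))   # True: the block acts as the identity here
```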
    

In [30]:
# Let's modify LeNet by adding a skip connection around the first fc layer
class LeNetRes(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv = nn.ModuleList([
            nn.Conv2d(in_channels, 6, kernel_size=5, stride=1),
            nn.Conv2d(6, 16, kernel_size=5, stride=1),
            nn.Conv2d(16, 120, kernel_size=5, stride=1)
        ])
        # Batch norm after each of the first two convolutions
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(6), nn.BatchNorm2d(16)])
        self.pool = nn.AvgPool2d(kernel_size=2)
        self.activation = nn.Tanh()
        self.fc = nn.ModuleList([
            nn.Linear(120, 120),  # input and output sizes match, so x + fc(x) is valid
            nn.Linear(120, 84),
            nn.Linear(84, 10)
        ])

    def forward(self, x):
        for i in range(2):
            x = self.pool(self.activation(self.bn1[i](self.conv[i](x))))

        x = torch.flatten(self.activation(self.conv[2](x)), 1)
        x = self.activation(x + self.fc[0](x))  # skip connection: x + F(x)
        x = self.activation(self.fc[1](x))

        # output layer: 84 -> 10
        x = torch.softmax(self.fc[2](x), dim=-1)
        return x
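The residual blocks above use fully connected layers. For images, ResNet (He et al., 2016) builds residual blocks from convolutions. The sketch below follows the common "basic block" layout: two 3x3 convolutions with batch norm, and a shortcut that is the identity when shapes match and a 1x1 convolution otherwise. The class name `BasicBlock`, the channel counts, and the use of `bias=False` before batch norm are illustrative choices, not part of this tutorial.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-style basic block: two 3x3 convs with BN, plus a skip connection."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU()
        # Identity shortcut when shapes match; otherwise a 1x1 conv projection
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # add the (projected) input back

x = torch.rand(2, 16, 32, 32)
print(BasicBlock(16, 16)(x).shape)            # torch.Size([2, 16, 32, 32])
print(BasicBlock(16, 32, stride=2)(x).shape)  # torch.Size([2, 32, 16, 16])
```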

## Batch Normalization (BN)

Consider a 4-D input $X$ with shape $(N, C, H, W)$. For each channel $j$, the data is normalized by:

$$\hat{X}_{ijkl}=\frac{X_{ijkl}-\mathrm{mean}(X_j)}{\sqrt{\mathrm{var}(X_j)+\epsilon}} \cdot \gamma_j + \beta_j$$

where the statistics are computed over the batch and both spatial dimensions:

$$\mathrm{mean}(X_j)=\frac{1}{NHW}\sum_{i=1}^{N}\sum_{k=1}^{H}\sum_{l=1}^{W} X_{ijkl}$$
$$\mathrm{var}(X_j)=\frac{1}{NHW}\sum_{i=1}^{N}\sum_{k=1}^{H}\sum_{l=1}^{W} \left(X_{ijkl}-\mathrm{mean}(X_j)\right)^2$$

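$\epsilon$ is a small constant (PyTorch's default is $10^{-5}$) that avoids division by zero when a channel's variance is close to zero. $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are learnable per-channel scale and shift parameters. These batch statistics are used in training mode; in evaluation mode (after `model.eval()`), `nn.BatchNorm2d` normalizes with running estimates of the mean and variance accumulated during training.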

In [31]:
batch_norm = nn.BatchNorm2d(120)
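To check the formulas above against PyTorch, the sketch below normalizes a random batch by hand and compares the result with `nn.BatchNorm2d` in training mode. The input shape and the non-default $\gamma$/$\beta$ values are arbitrary choices for the check. It also inspects `running_mean`, which starts at zero and, after one training-mode step with the default `momentum=0.1`, equals 0.1 times the batch mean.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)   # (N, C, H, W) with C = 3 channels

bn = nn.BatchNorm2d(3)        # gamma (bn.weight) starts at 1, beta (bn.bias) at 0
with torch.no_grad():         # arbitrary gamma/beta so they affect the check
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-1.0, 1.0)

bn.train()                    # training mode: normalize with the batch statistics
y = bn(x)

# Per-channel mean and biased variance (1/(NHW)) over the N, H, W dimensions
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)
y_manual = (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta

print(torch.allclose(y, y_manual, atol=1e-4))  # True

# running_mean = (1 - momentum) * running_mean + momentum * batch_mean
print(torch.allclose(bn.running_mean, 0.1 * mean.flatten(), atol=1e-6))  # True
```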
