# Classic CNNs


## LeNet-5

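number of parameters: ~60 K (weights only, biases ignored)

`5*5*1*6 + 5*5*6*16 + 400*120 + 120*84 + 84*10 = 61,470`

A CONV followed by a POOL is counted as one layer, since the pooling step has no trainable parameters.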

```code
 layer         |   dimension       | next operation
 X:INPUT       |   32 x 32 x 1     | CONV, f:5, s:1, #:6
 C1            |   28 x 28 x  6    | POOL:AVG, f:2, s:2 
 S2            |   14 x 14 x  6    | CONV, f:5, s:1, #:16
 C3            |   10 x 10 x 16    | POOL:AVG, f:2, s:2  
 S4            |    5 x  5 x 16    | FC, 400 -> 120
 C5            |  120              | FC, 120 ->  84
 F6            |   84              | SOFTMAX, 84 ->  10
 Y             |   10              |
```
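The weights-only total quoted above can be reproduced from the table. This is a minimal sketch; the helper names `conv_weights` and `fc_weights` are illustrative, and biases are left out to match the formula in these notes.

```python
def conv_weights(f, c_in, c_out):
    """Weights in a conv layer with c_out filters of size f x f x c_in."""
    return f * f * c_in * c_out


def fc_weights(n_in, n_out):
    """Weights in a fully connected layer."""
    return n_in * n_out


# Each entry is named after the layer it produces in the table above.
lenet5 = [
    ("C1", conv_weights(5, 1, 6)),
    ("C3", conv_weights(5, 6, 16)),
    ("C5", fc_weights(5 * 5 * 16, 120)),
    ("F6", fc_weights(120, 84)),
    ("Y", fc_weights(84, 10)),
]

total = sum(n for _, n in lenet5)
for name, n in lenet5:
    print(f"{name}: {n:,}")
print(f"total: {total:,}")  # total: 61,470
```

Almost 80% of the weights sit in the first fully connected layer (400 -> 120), not in the convolutions.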

## AlexNet

number of parameters: ~60 M (about 62.4 M weights for the ungrouped layout in the table below, biases ignored)

`11*11*3*96 + 5*5*96*256 + 3*3*256*384 + 3*3*384*384 + 3*3*384*256 + 6*6*256*4096 + 4096*4096 + 4096*1000 = 62,367,776`

```code
X:INPUT    |  227 x 227 x   3   | CONV     ~ f:11, s:4
C1         |   55 x  55 x  96   | POOL-MAX ~ f:3, s:2 
P2         |   27 x  27 x  96   | CONV     ~ f:5, SAME
C3         |   27 x  27 x 256   | POOL-MAX ~ f:3, s:2 
P4         |   13 x  13 x 256   | CONV     ~ f:3, SAME
C5         |   13 x  13 x 384   | CONV     ~ f:3, SAME
C6         |   13 x  13 x 384   | CONV     ~ f:3, SAME
C7         |   13 x  13 x 256   | POOL-MAX ~ f:3, s:2
P8         |    6 x   6 x 256   | FC       ~ 9216 -> 4096
F9         | 4096               | FC       ~ 4096 -> 4096
F10        | 4096               | SOFTMAX  ~ 4096 -> 1000
Y          | 1000               |
```
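The spatial sizes in the table follow from the usual output-size formula `floor((n + 2p - f) / s) + 1`. The sketch below traces them; it assumes that `SAME` means stride 1 with padding `(f - 1) / 2`, and the helper names are illustrative.

```python
def conv_out(n, f, s=1, p=0):
    """Output size of a conv or pool: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def same_pad(f):
    """Padding that preserves the size at stride 1 for an odd filter size f."""
    return (f - 1) // 2


# (name of produced layer, filter size, stride, padding)
layers = [
    ("C1", 11, 4, 0),
    ("P2", 3, 2, 0),
    ("C3", 5, 1, same_pad(5)),
    ("P4", 3, 2, 0),
    ("C5", 3, 1, same_pad(3)),
    ("C6", 3, 1, same_pad(3)),
    ("C7", 3, 1, same_pad(3)),
    ("P8", 3, 2, 0),
]

n = 227
trace = [n]
for name, f, s, p in layers:
    n = conv_out(n, f, s, p)
    trace.append(n)
    print(f"{name}: {n} x {n}")

flattened = n * n * 256  # input size of the first FC layer
print(flattened)  # 9216
```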

## VGG-16 Network

number of parameters: ~138 M

```code
C: CONV, f:3, SAME
P: MAX POOL, f:2, s:2

C#64 x 2 : apply CONV with 64 filters twice in a row

X:INPUT   |  224 x 224 x   3   | C#64 x 2
C3        |  224 x 224 x  64   | P
P4        |  112 x 112 x  64   | C#128 x 2 
C6        |  112 x 112 x 128   | P
P7        |   56 x  56 x 128   | C#256 x 3
C10       |   56 x  56 x 256   | P
P11       |   28 x  28 x 256   | C#512 x 3
C14       |   28 x  28 x 512   | P
P15       |   14 x  14 x 512   | C#512 x 3
C18       |   14 x  14 x 512   | P
P19       |    7 x   7 x 512   | FC
F20       | 4096               | FC
F21       | 4096               | SOFTMAX
Y         | 1000               |
```
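The compact table can be expanded into a layer list and used to check the ~138 M figure. This is a sketch that counts both weights and biases; the `VGG16_CFG` list and the function name are illustrative.

```python
# Integers are 3x3 SAME convs with that many filters; "P" is a 2x2 max pool, stride 2.
VGG16_CFG = [64, 64, "P", 128, 128, "P", 256, 256, 256, "P",
             512, 512, 512, "P", 512, 512, 512, "P"]
FC_SIZES = [4096, 4096, 1000]


def count_vgg16(size=224, channels=3):
    """Return (weights, biases, shape of the last conv volume)."""
    weights = biases = 0
    for item in VGG16_CFG:
        if item == "P":
            size //= 2  # pooling halves height and width, no parameters
        else:
            weights += 3 * 3 * channels * item
            biases += item
            channels = item
    n_in = size * size * channels  # flatten before the first FC layer
    for n_out in FC_SIZES:
        weights += n_in * n_out
        biases += n_out
        n_in = n_out
    return weights, biases, (size, size, channels)


weights, biases, last_volume = count_vgg16()
print(last_volume)               # (7, 7, 512)
print(f"{weights + biases:,}")   # 138,357,544
```

The first fully connected layer alone (7*7*512 -> 4096) holds about 103 M of these parameters.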

## Residual block

$$
\begin{aligned}
a^{[l+2]} & = g \big( z^{[l+2]} + \underbrace{a^{[l]}}_{\text{skip connection}} \big) \\
& = g \big( w^{[l+2]} \ a^{[l+1]} + b^{[l+2]} + a^{[l]} \big)
\end{aligned}
$$

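Without the skip connection, if $ w^{[l+2]}, b^{[l+2]} $ shrink toward 0 (for example under weight decay), the optimizer has a hard time recovering the identity $ a^{[l]} $. With the skip connection, $ w^{[l+2]} = 0, b^{[l+2]} = 0 $ gives $ a^{[l+2]} = g(a^{[l]}) = a^{[l]} $ when $ g $ is ReLU (since $ a^{[l]} \ge 0 $), so the identity mapping is easy for the block to learn.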
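The identity argument can be checked numerically. This is a minimal sketch on plain Python lists, assuming fully connected layers and ReLU for $ g $; the function names and the example weights are made up for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def affine(W, a, b):
    """z = W a + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, a)) + b_i for row, b_i in zip(W, b)]


def residual_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(affine(W1, a_l, b1))                 # a^{[l+1]}
    z_l2 = affine(W2, a_l1, b2)                      # z^{[l+2]}
    return relu([z + a for z, a in zip(z_l2, a_l)])  # g(z^{[l+2]} + a^{[l]})


def plain_block(a_l, W1, b1, W2, b2):
    a_l1 = relu(affine(W1, a_l, b1))
    return relu(affine(W2, a_l1, b2))                # no skip connection


a_l = [0.5, 0.0, 2.0]  # non-negative, as if produced by an earlier ReLU
W1 = [[0.3, -0.2, 0.1], [0.0, 0.4, -0.5], [0.7, 0.1, 0.2]]
b1 = [0.1, 0.0, -0.1]
zero_W = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

with_skip = residual_block(a_l, W1, b1, zero_W, zero_b)
without_skip = plain_block(a_l, W1, b1, zero_W, zero_b)
print(with_skip)     # [0.5, 0.0, 2.0]  (identity recovered)
print(without_skip)  # [0.0, 0.0, 0.0]  (signal lost)
```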

## One by one convolution

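Convolve with filters of size $ 1 \times 1 \times n_c $. At each spatial position, every filter takes a weighted sum across the $ n_c $ channels, so the layer changes the number of channels while leaving height and width unchanged.

Also called network in network.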
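A 1x1 convolution is just a per-pixel dot product over channels. The sketch below shows this on nested lists; the function name and the toy values are illustrative, and the nonlinearity that normally follows is omitted.

```python
def conv1x1(volume, filters):
    """volume: H x W x C nested lists; filters: K rows of C weights.

    Returns an H x W x K volume: each output channel is a weighted
    sum of the input channels at the same pixel.
    """
    return [[[sum(w * x for w, x in zip(f, pixel)) for f in filters]
             for pixel in row]
            for row in volume]


volume = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [0, 1, 0]]]   # 2 x 2 x 3
filters = [[1, 0, 0],               # copies channel 0
           [1, 1, 1]]               # sums all channels

out = conv1x1(volume, filters)      # 2 x 2 x 2
print(out)  # [[[1, 6], [4, 15]], [[7, 24], [0, 1]]]
```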

## Inception Network

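Apply several different kinds of CONV (and pooling) operations in parallel, each designed to produce the same output size $ \big( n_h, n_w, ? \big) $, then stack the individual results along the channel dimension to form the output of the layer.

The problem is computational cost: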

```
INPUT   28 x 28 x 192
   >>> CONV f:5, #:32, SAME >>>
OUTPUT  28 x 28 x  32
```

This single convolution costs `28 * 28 * 32 * (5 * 5 * 192)` = 120,422,400, or about 120M multiplications.

## Bottleneck layer

```
INPUT  28 x 28 x 192
    >>> CONV f:1, #:16, SAME
C1     28 x 28 x 16            <<< Bottleneck Layer
    >>> CONV f:5, #:32, SAME
C2     28 x 28 x 32
```

The number of multiplications then becomes:

```
(1*1*192) * (28*28*16) + (5*5*16) * (28*28*32) = 12,443,648   (~12.4M)
```

With the bottleneck layer, the multiplication count drops by roughly a factor of 10 (from ~120M to ~12.4M).
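Both counts can be reproduced with one small helper. This is a sketch that assumes SAME padding and stride 1, so the output keeps the 28 x 28 size; `conv_mults` is an illustrative name.

```python
def conv_mults(h, w, c_in, f, c_out):
    """Multiplications for an f x f conv that outputs an h x w x c_out volume."""
    return h * w * c_out * (f * f * c_in)


# Direct 5x5 convolution: 28x28x192 -> 28x28x32
direct = conv_mults(28, 28, 192, 5, 32)

# Bottleneck: 1x1 down to 16 channels, then 5x5 up to 32 channels
bottleneck = conv_mults(28, 28, 192, 1, 16) + conv_mults(28, 28, 16, 5, 32)

print(f"{direct:,}")                    # 120,422,400
print(f"{bottleneck:,}")                # 12,443,648
print(round(direct / bottleneck, 2))    # 9.68
```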


## Inception module

An inception network is built by repeating inception modules.  
A single inception module is composed as follows:

```
                   ------------------> CONV 1x1 #64 --> 28x28x 64
Prev Activations:  --> CONV 1x1#96 --> CONV 3x3#128 --> 28x28x128
28 x 28 x 192      --> CONV 1x1#16 --> CONV 5x5# 32 --> 28x28x 32
                   --> MAXP 3x3    --> CONV 1x1# 32 --> 28x28x 32
                   
Stack the results along the channel axis: 28x28x(64+128+32+32) = 28x28x256
```
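The output depth and the per-branch cost of this module can be tallied as follows. This is a sketch that assumes every operation uses SAME padding with stride 1 (including the max pool); the `BRANCHES` table and the function name are illustrative.

```python
# Each branch is a list of (op, filter size, output channels).
# For "maxpool" the channel count is unchanged, so out channels is None.
BRANCHES = {
    "1x1": [("conv", 1, 64)],
    "3x3": [("conv", 1, 96), ("conv", 3, 128)],
    "5x5": [("conv", 1, 16), ("conv", 5, 32)],
    "pool": [("maxpool", 3, None), ("conv", 1, 32)],
}


def module_summary(h, w, c_in):
    """Return ({branch: output channels}, {branch: multiplications})."""
    out_channels, mults = {}, {}
    for name, ops in BRANCHES.items():
        c, total = c_in, 0
        for op, f, c_out in ops:
            if op == "conv":
                total += h * w * c_out * (f * f * c)
                c = c_out
        out_channels[name] = c
        mults[name] = total
    return out_channels, mults


channels, mults = module_summary(28, 28, 192)
depth = sum(channels.values())
print(channels)  # {'1x1': 64, '3x3': 128, '5x5': 32, 'pool': 32}
print(depth)     # 256
```

The `5x5` branch is exactly the bottleneck example from the previous section, so its count is again 12,443,648 multiplications.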



## Transfer learning

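Many mature networks are open source, so their architecture and pretrained weights can be reused directly for transfer learning.

This usually gives better results than training from scratch.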

## Common Data Augmentation

- Mirroring (horizontal flip)
- Random crop (not a perfect method, but it usually helps)
- Rotation
- Shearing
- Local warping
- Color shifting (e.g. R:+20, G:-20, B:+20), which simulates the color differences caused by changes in lighting.
- PCA color augmentation (from AlexNet paper)
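Three of the simpler augmentations above can be sketched on an image stored as nested lists of `(R, G, B)` tuples. This is an illustration only; the function names are made up here, and a real pipeline would normally use an image library instead.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def color_shift(img, shift):
    """Add a per-channel offset and clamp the result to [0, 255]."""
    return [[tuple(min(255, max(0, c + d)) for c, d in zip(px, shift))
             for px in row]
            for row in img]


# Toy 4 x 4 image: red encodes the position, green and blue are constant.
img = [[(10 * r + c, 100, 250) for c in range(4)] for r in range(4)]

flipped = hflip(img)
crop = random_crop(img, 3, random.Random(0))
shifted = color_shift(img, (20, -20, 20))
print(shifted[0][0])  # (20, 80, 255): blue is clamped at 255
```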

In practice, one CPU thread (A) can run data augmentation (distortion) while a separate CPU thread (B) runs mini-batch training in parallel.
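The two-thread arrangement is a producer/consumer pipeline. The sketch below uses the standard `queue` and `threading` modules; `augment` and `train` are hypothetical stand-ins for real distortion and training code.

```python
import queue
import threading


def augment(batch):
    """Placeholder distortion: mirror every sample in the batch."""
    return [sample[::-1] for sample in batch]


def producer(batches, q):
    """Thread A: augment batches and hand them to the trainer."""
    for batch in batches:
        q.put(augment(batch))  # blocks while the queue is full
    q.put(None)                # sentinel: no more data


def train(q):
    """Thread B: consume batches; a real loop would run a gradient step."""
    seen = []
    while True:
        batch = q.get()
        if batch is None:
            break
        seen.append(batch)
    return seen


raw_batches = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 1, 2]]]
q = queue.Queue(maxsize=2)  # small buffer keeps the producer just ahead
worker = threading.Thread(target=producer, args=(raw_batches, q))
worker.start()
trained_on = train(q)
worker.join()
print(trained_on)  # [[[3, 2, 1], [6, 5, 4]], [[9, 8, 7], [2, 1, 0]]]
```

In CPython, threads share the global interpreter lock, so CPU-heavy augmentation is often moved to worker processes instead; the queue pattern stays the same.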

## Tips for doing well on benchmarks

- Train several networks independently and average their outputs $ \hat{y} $
- Multi-crop at test time: run the classifier on multiple crops of each test image and average the outputs, e.g. "10-crop"
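Both tips come down to averaging output vectors. This is a sketch assuming the common 10-crop layout (four corners and the center, plus their mirrors); `toy_classifier` is a hypothetical stand-in for a trained network.

```python
def average_outputs(outputs):
    """Element-wise mean of several output vectors (ensemble or multi-crop)."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]


def ten_crop(img, size):
    """Four corner crops and the center crop, plus their horizontal mirrors."""
    h, w = len(img), len(img[0])
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [[row[left:left + size] for row in img[top:top + size]]
             for top, left in offsets]
    crops += [[row[::-1] for row in crop] for crop in crops]
    return crops


def toy_classifier(crop):
    """Hypothetical model: a two-class 'probability' from the mean pixel value."""
    pixels = [p for row in crop for p in row]
    p = sum(pixels) / len(pixels) / 15
    return [p, 1 - p]


img = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4 x 4 grayscale
crops = ten_crop(img, 2)
prediction = average_outputs([toy_classifier(c) for c in crops])
print(len(crops))  # 10
print(average_outputs([[0.25, 0.75], [0.5, 0.5]]))  # [0.375, 0.625]
```

The same `average_outputs` call works for an ensemble: pass one output vector per independently trained network instead of one per crop.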