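From this point on, I assume you have already read all the earlier introductory notebooks, so most code comments are removed to keep the layout clean.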

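The next important milestone after AlexNet was VGG16. Going "deeper" was still one of the main directions for exploring neural networks: VGG16 increases the number of filters as the output feature maps shrink, and it relies on many 3x3 filters throughout to keep the parameter count down (although there are still a lot of parameters).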
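
To make the parameter saving from 3x3 filters concrete, here is a minimal sketch that compares one 5x5 convolution with two stacked 3x3 convolutions, which cover the same 5x5 receptive field. The helper `conv_params` and the channel count of 64 are my own assumptions for illustration; they are not part of the original notebook.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a square conv kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

channels = 64  # illustrative channel count (assumption)

# One 5x5 convolution vs. two stacked 3x3 convolutions (same receptive field).
one_5x5 = conv_params(5, channels, channels)
two_3x3 = 2 * conv_params(3, channels, channels)

print(one_5x5, two_3x3)
```

The stacked 3x3 version needs fewer parameters and also gets an extra non-linearity between the two layers.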

I have translated the VGG16 paper; you can also refer to the [related document](https://hackmd.io/@shaoeChen/SyjI6W2zB/https%3A%2F%2Fhackmd.io%2F%40shaoeChen%2FBJ2DMA7QU).

First, load the required packages.

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [2]:
tf.__version__

'2.1.0'

Specify the hardware resources.

In [3]:
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
if gpus:  # only restrict visibility when at least one GPU is present
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')

The model is trained on ImageNet, but here I only provide the [dataset link](http://www.image-net.org/), since actually running the full training would take far too long.

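From the paper we know that:
* The input dimension of VGG16 is 224x224x3
* The filter size is 3x3
* Every convolution in every block uses same padding
* The pooling at the end of each block halves the spatial dimensions, while the number of filters doubles from block to block (64, 128, 256, 512) and stays at 512 in the last block
* Every pooling layer is max pooling with a 2x2 window and stride=2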
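
The points above can be traced with a small, plain-Python sketch that follows the output shape through the five blocks. Since same padding with stride 1 leaves height and width unchanged, only the pooling layers alter the spatial size. The variable names and the cap at 512 filters are written by me to mirror the model defined below; treat this as an illustration rather than code from the paper.

```python
size, filters = 224, 64  # input resolution and block-1 filter count
shapes = []

for block in range(1, 6):
    size //= 2  # 2x2 max pooling with stride 2 halves height and width
    shapes.append((size, size, filters))
    filters = min(filters * 2, 512)  # filters double per block, capped at 512

for block, shape in enumerate(shapes, start=1):
    print(f"block-{block} output: {shape}")
```

The last shape, 7 x 7 x 512, is what gets flattened and fed into the fully connected layers.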

Build the model with the standard Keras Sequential API.

In [4]:
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),   
    # block-1 
    # filter:64, same padding
    tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    # pooling: maxpooling 2x2, stride=2, output 112 x 112 x 64
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    # block-2
    # filter:128, same padding
    tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    # pooling: maxpooling 2x2, stride=2, output 56 x 56 x 128
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),
    # block-3
    # filter:256, same padding
    tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    # pooling: maxpooling 2x2, stride=2, output 28 x 28 x 256
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)),    
    # block-4
    # filter:512, same padding
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    # pooling: maxpooling 2x2, stride=2, output 14 x 14 x 512
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)), 
    # block-5
    # filter:512, same padding
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), strides=(1, 1), padding='same', activation='relu'),
    # pooling: maxpooling 2x2, stride=2, output 7 x 7 x 512
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)), 
    
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4096, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1000, activation='softmax')    
])

Check the model.

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 224, 224, 64)      1792      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 224, 224, 64)      36928     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 112, 112, 128)     73856     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 112, 112, 128)     147584    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 128)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 56, 56, 256)       2

Compile the model.

In [6]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['accuracy']
)
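
Because the loss chosen here is the sparse variant, the labels are expected to be integer class indices rather than one-hot vectors. As a rough sketch of what this loss computes for a single sample, it is the negative log of the probability the model assigns to the true class. The function `sparse_cce_single` and the three-class probabilities are hypothetical and for illustration only; the Keras implementation additionally clips probabilities and works on batches.

```python
import math

def sparse_cce_single(label, probs):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probs[label])

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output over 3 classes
label = 0                # integer class index, not a one-hot vector

loss = sparse_cce_single(label, probs)
print(loss)
```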

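In practice, training VGG is truly painful because the number of parameters is frighteningly large. It is, however, a traditional architecture that carries on the AlexNet way of thinking, so `tf.keras.models.Sequential` is all we need to build it. Architectures that came after VGG generally cannot be built this way. As for VGG19, according to Andrew Ng's lectures its performance is not far from that of VGG16, so most people still choose VGG16; if you are interested, you can verify this yourself.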
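
To put a number on "frighteningly large", the sketch below counts the trainable parameters of the model defined above by hand, using weights plus biases for every convolution and dense layer. The helper names `conv2d_param_count` and `dense_param_count` are my own; the layer sizes are taken from the Sequential definition in this notebook.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size=3):
    """Kernel weights plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def dense_param_count(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

conv_filters = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
dense_units = [4096, 4096, 1000]

total_conv, in_channels = 0, 3  # the input image has 3 channels
for out_channels in conv_filters:
    total_conv += conv2d_param_count(in_channels, out_channels)
    in_channels = out_channels

total_dense, in_units = 0, 7 * 7 * 512  # flattened output of block-5
for out_units in dense_units:
    total_dense += dense_param_count(in_units, out_units)
    in_units = out_units

total = total_conv + total_dense
print(total_conv, total_dense, total)
```

Most of the parameters sit in the fully connected layers, in particular the first one that takes the flattened 7 x 7 x 512 output.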