From this point on, I assume you have read all of the earlier introductory notebooks, so most comments are omitted to keep the layout clean.

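Inception is built on a different idea from AlexNet and VGG: instead of going deeper, it goes wider and lets the model learn for itself how to extract features. It also achieves better results with fewer parameters.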

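We will not walk through every version from v1 to v3; implementing the latest one is enough. We stop at v3 rather than jumping straight to v4 because v4 brings in ideas from ResNet, and it is easier to understand these concepts by implementing them one step at a time.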

I have also translated the Inception paper; see the [related document](https://hackmd.io/@shaoeChen/SyjI6W2zB/https%3A%2F%2Fhackmd.io%2F%40shaoeChen%2FrkIGBzWEI).

First, load the required packages.

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [2]:
tf.__version__

'2.1.0'

Specify the hardware resources.

In [3]:
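gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
# Only restrict visibility when a GPU is present; gpus[0] raises IndexError on a CPU-only machine
if gpus:
    tf.config.experimental.set_visible_devices(devices=gpus[0], device_type='GPU')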

The model is meant to be trained on ImageNet, but I only provide the [dataset link](http://www.image-net.org/) here, since actually training on it would take too long.

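From the paper we know:
* Each Inception block is built by stacking several branches side by side, so the network learns for itself which filters to use for feature extraction
* Every convolution in Inception is followed by BN (conv -> bn), so in the implementation we wrap this pair in a function to reduce duplicated code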

Because its branches run in parallel, Inception can no longer be built layer by layer with a plain `tf.keras.models.Sequential`. We need another approach: a class or a function.

We build the model with the standard Keras functional style, following the example in [keras application](https://github.com/keras-team/keras-applications/blob/master/keras_applications/inception_v3.py).

In [4]:
def conv_bn(x, 
            filter_num, 
            filter_row, 
            filter_col,
            padding='same', 
            strides=1, 
            name=None):
    """conv layer
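    Every convolution is followed by BN and then ReLU.
    
    parameters:
        x: input tensor
        filter_num: number of filters
        filter_row: filter height (rows)
        filter_col: filter width (columns)
        padding: padding mode
        strides: convolution stride; the same value is used in both directions
        name: name of this layer; set it whenever possible so the model is easier to read when it is plotted later
    
    return:
        the result after convolution, BN, and ReLU
        
    remark:
        to expose more settings, add the corresponding parameters to the signature and to the layer calls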
    """
    if name is not None:
        bn_name = name + '_bn'
        conv_name = name + '_conv'
    else:
        bn_name = None
        conv_name = None
    
    x = tf.keras.layers.Conv2D(
            filters=filter_num,
            kernel_size=(filter_row, filter_col),
            strides=strides,
            padding=padding,
            name=conv_name)(x)
    x = tf.keras.layers.BatchNormalization(name=bn_name)(x)
    x = tf.keras.layers.Activation('relu', name=name)(x)
    return x
    

The code above is straightforward: the input goes through conv, BN, and ReLU, and the result is returned.
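
To get a feel for what one `conv_bn` call costs, the parameter counts of its two trainable layers can be worked out by hand. The sketch below is plain Python and does not call TensorFlow; the helper names `conv2d_params` and `batchnorm_params` are made up for this illustration. It assumes the Keras defaults used by `conv_bn` (a bias term in `Conv2D`, and four per-channel vectors in `BatchNormalization`).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Weights of a 2D convolution: one kernel per filter plus an optional bias."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


# First stem layer: 3x3 kernel, 3 input channels (RGB), 32 filters
print(conv2d_params(3, 3, 3, 32))  # 896
print(batchnorm_params(32))        # 128
```

The ReLU activation has no weights, so these two numbers are the whole cost of the first `conv_bn` call.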

Now let's start building Inception.

In [5]:
def inception():
    # If needed, the input shape could be made a parameter,
    # but your project has presumably fixed its dimensions already, so it is hard-coded here
    
    img_input = tf.keras.layers.Input(shape=(299, 299, 3))
    
    # First, a standard CNN stem with two MaxPooling layers shrinks the spatial dimensions
    x = conv_bn(img_input, 32, 3, 3, padding='valid', strides=2, name='layer_1')
    x = conv_bn(x, 32, 3, 3, padding='valid', name='layer_2')
    x = conv_bn(x, 64, 3, 3, name='layer_3')
    x = tf.keras.layers.MaxPool2D((3, 3), strides=(2, 2))(x)
    
    x = conv_bn(x, 80, 1, 1, padding='valid', name='layer_4')
    x = conv_bn(x, 192, 3, 3, padding='valid', name='layer_5')
    x = tf.keras.layers.MaxPool2D((3, 3), strides=(2, 2))(x)
    
    # First block
    # The block contains four feature-extraction branches
    # All four branches take x as their input,
    # so the model extracts several kinds of features from the same input x
    # First, the 1x1 branch
    branch1x1 = conv_bn(x, 64, 1, 1, name='b1_1x1')
    
    # Next, the 5x5 branch
    branch5x5 = conv_bn(x, 48, 1, 1, name='b1_5x5_1')
    branch5x5 = conv_bn(branch5x5, 64, 5, 5, name='b1_5x5_2')
    
    # Then the 3x3 branch
    branch3x3 = conv_bn(x, 64, 1, 1, name='b1_3x3_1')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b1_3x3_2')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b1_3x3_3')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                   strides=(1, 1),
                                                   padding='same')(x)
    
    branch_pool = conv_bn(branch_pool, 64, 1, 1, name='b1_pool')
    
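    # Concatenate the outputs of the four branches as the block output,
    # which now holds four flavors of features
    # axis=3 means we stack along the channel axis
    # The most important thing when concatenating is checking the dimensions
    # In this block the final h x w is 35x35, so every branch must output 35x35, otherwise an error is raised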
    x = tf.keras.layers.concatenate(
        [branch1x1, branch5x5, branch3x3, branch_pool],
        axis=3,
        name='mixed1'
    )
    
    
    model = tf.keras.models.Model(img_input, x, name='inception_v3')
    return model

We can stack a single block first and then use `summary` to check that the model dimensions match what we expect.

In [6]:
model = inception()

In [7]:
model.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
layer_1_conv (Conv2D)           (None, 149, 149, 32) 896         input_1[0][0]                    
__________________________________________________________________________________________________
layer_1_bn (BatchNormalization) (None, 149, 149, 32) 128         layer_1_conv[0][0]               
__________________________________________________________________________________________________
layer_1 (Activation)            (None, 149, 149, 32) 0           layer_1_bn[0][0]                 
_______________________________________________________________________________________

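From the model summary we can see that the final output is 35x35x288. After the first block we therefore have 288 feature maps of size 35x35, and these 288 feature maps were learned by four different feature-extraction branches.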
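
The 35x35x288 shape can also be checked by hand before running anything. The sketch below uses plain Python and assumes the usual Keras output-size rules: `'valid'` gives `floor((n - k) / s) + 1` and `'same'` gives `ceil(n / s)`. The helper name `out_size` is made up for this illustration.

```python
import math


def out_size(size, kernel, stride=1, padding='same'):
    """Output length along one spatial axis under the 'same'/'valid' rules."""
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


# (kernel, stride, padding) for each layer of the stem, in order
stem = [
    (3, 2, 'valid'),  # layer_1
    (3, 1, 'valid'),  # layer_2
    (3, 1, 'same'),   # layer_3
    (3, 2, 'valid'),  # first MaxPool2D
    (1, 1, 'valid'),  # layer_4
    (3, 1, 'valid'),  # layer_5
    (3, 2, 'valid'),  # second MaxPool2D
]

size = 299
sizes = [size]
for kernel, stride, padding in stem:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [299, 149, 147, 147, 73, 73, 71, 35]

# Output channels of the four branches of the first block: 1x1, 5x5, 3x3, pool
block1_channels = sum([64, 64, 96, 64])
print(block1_channels)  # 288
```

All four branches use `padding='same'` with stride 1, so each keeps the 35x35 size and only the channel counts add up in the concatenation.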

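The advantage of this approach is that even though the network is deep, its total parameter count is far smaller than that of VGG16 or AlexNet, while its accuracy is no worse.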

Let's finish building the whole model.

In [8]:
def inception():
    # If needed, the input shape could be made a parameter,
    # but your project has presumably fixed its dimensions already, so it is hard-coded here
    
    img_input = tf.keras.layers.Input(shape=(299, 299, 3))
    
    # First, a standard CNN stem with two MaxPooling layers shrinks the spatial dimensions
    x = conv_bn(img_input, 32, 3, 3, padding='valid', strides=2, name='layer_1')
    x = conv_bn(x, 32, 3, 3, padding='valid', name='layer_2')
    x = conv_bn(x, 64, 3, 3, name='layer_3')
    x = tf.keras.layers.MaxPool2D((3, 3), strides=(2, 2))(x)
    
    x = conv_bn(x, 80, 1, 1, padding='valid', name='layer_4')
    x = conv_bn(x, 192, 3, 3, padding='valid', name='layer_5')
    x = tf.keras.layers.MaxPool2D((3, 3), strides=(2, 2))(x)
    
    # First block 35x35x288
    # The block contains four feature-extraction branches
    # All four branches take x as their input,
    # so the model extracts several kinds of features from the same input x
    # First, the 1x1 branch
    branch1x1 = conv_bn(x, 64, 1, 1, name='b1_1x1')
    
    # Next, the 5x5 branch
    branch5x5 = conv_bn(x, 48, 1, 1, name='b1_5x5_1')
    branch5x5 = conv_bn(branch5x5, 64, 5, 5, name='b1_5x5_2')
    
    # Then the 3x3 branch
    branch3x3 = conv_bn(x, 64, 1, 1, name='b1_3x3_1')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b1_3x3_2')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b1_3x3_3')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                   strides=(1, 1),
                                                   padding='same')(x)
    
    branch_pool = conv_bn(branch_pool, 64, 1, 1, name='b1_pool')
    
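    # Concatenate the outputs of the four branches as the block output,
    # which now holds four flavors of features
    # axis=3 means we stack along the channel axis
    # The most important thing when concatenating is checking the dimensions
    # In this block the final h x w is 35x35, so every branch must output 35x35, otherwise an error is raised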
    x = tf.keras.layers.concatenate(
        [branch1x1, branch5x5, branch3x3, branch_pool],
        axis=3,
        name='mixed1'
    )
    
    # Second block 35x35x288
    # This could in fact be written in a loop together with the first block
    # First, the 1x1 branch
    branch1x1 = conv_bn(x, 64, 1, 1, name='b2_1x1')
    
    # Next, the 5x5 branch
    branch5x5 = conv_bn(x, 48, 1, 1, name='b2_5x5_1')
    branch5x5 = conv_bn(branch5x5, 64, 5, 5, name='b2_5x5_2')
    
    # Then the 3x3 branch
    branch3x3 = conv_bn(x, 64, 1, 1, name='b2_3x3_1')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b2_3x3_2')
    branch3x3 = conv_bn(branch3x3, 96, 3, 3, name='b2_3x3_3')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                   strides=(1, 1),
                                                   padding='same')(x)
    
    branch_pool = conv_bn(branch_pool, 64, 1, 1, name='b2_pool')
    
    # Concatenate the outputs of the four branches as the block output
    x = tf.keras.layers.concatenate(
        [branch1x1, branch5x5, branch3x3, branch_pool],
        axis=3,
        name='mixed2'
    )        
    
    # Third block 17x17x768
    # First, a 3x3 branch
    branch3x3_1 = conv_bn(x, 384, 3, 3, strides=2, padding='valid', name='b3_3x3_1')
    
    # Next, a second 3x3 branch
    branch3x3_2 = conv_bn(x, 64, 1, 1, name='b3_3x3_2_1')
    branch3x3_2 = conv_bn(branch3x3_2, 96, 3, 3, name='b3_3x3_2_2')
    branch3x3_2 = conv_bn(branch3x3_2, 96, 3, 3, strides=2, padding='valid', name='b3_3x3_2_3')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.MaxPooling2D((3, 3),
                                               strides=(2, 2))(x)
    
    # Concatenate the outputs of the three branches as the block output
    x = tf.keras.layers.concatenate(
        [branch3x3_1, branch3x3_2, branch_pool],
        axis=3,
        name='mixed3'
    )   
    
    # Fourth block 17x17x768
    # First, the 1x1 branch
    branch1x1 = conv_bn(x, 192, 1, 1, name='b4_1x1')
    
    # Next, the 7x7 branch, factorized into 1x7 and 7x1
    branch7x7_1 = conv_bn(x, 128, 1, 1, name='b4_7x7_1_1')
    branch7x7_1 = conv_bn(branch7x7_1, 128, 1, 7, name='b4_7x7_1_2')
    branch7x7_1 = conv_bn(branch7x7_1, 192, 7, 1, name='b4_7x7_1_3')
    
    # Then the second (double) 7x7 branch
    branch7x7_2 = conv_bn(x, 128, 1, 1, name='b4_7x7_2_1')
    branch7x7_2 = conv_bn(branch7x7_2, 128, 7, 1, name='b4_7x7_2_2')
    branch7x7_2 = conv_bn(branch7x7_2, 128, 1, 7, name='b4_7x7_2_3')
    branch7x7_2 = conv_bn(branch7x7_2, 128, 7, 1, name='b4_7x7_2_4')
    branch7x7_2 = conv_bn(branch7x7_2, 192, 1, 7, name='b4_7x7_2_5')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                   strides=(1, 1),
                                                   padding='same')(x)
    
    branch_pool = conv_bn(branch_pool, 192, 1, 1, name='b4_pool')
    
    # Concatenate the outputs of the four branches as the block output
    x = tf.keras.layers.concatenate(
        [branch1x1, branch7x7_1, branch7x7_2, branch_pool],
        axis=3,
        name='mixed4'
    ) 
    
    # Fifth and sixth blocks 17x17x768
    for i in range(2):
        # First, the 1x1 branch
        branch1x1 = conv_bn(x, 192, 1, 1, name='b' + str(5 + i) + '_1x1')

        # Next, the 7x7 branch, factorized into 1x7 and 7x1
        branch7x7_1 = conv_bn(x, 128, 1, 1, name='b' + str(5 + i) + '_7x7_1_1')
        branch7x7_1 = conv_bn(branch7x7_1, 128, 1, 7, name='b' + str(5 + i) + '_7x7_1_2')
        branch7x7_1 = conv_bn(branch7x7_1, 192, 7, 1, name='b' + str(5 + i) + '_7x7_1_3')

        # Then the second (double) 7x7 branch
        branch7x7_2 = conv_bn(x, 128, 1, 1, name='b' + str(5 + i) + '_7x7_2_1')
        branch7x7_2 = conv_bn(branch7x7_2, 128, 7, 1, name='b' + str(5 + i) + '_7x7_2_2')
        branch7x7_2 = conv_bn(branch7x7_2, 128, 1, 7, name='b' + str(5 + i) + '_7x7_2_3')
        branch7x7_2 = conv_bn(branch7x7_2, 192, 7, 1, name='b' + str(5 + i) + '_7x7_2_4')
        branch7x7_2 = conv_bn(branch7x7_2, 192, 1, 7, name='b' + str(5 + i) + '_7x7_2_5')

        # Finally, the pooling branch
        branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                       strides=(1, 1),
                                                       padding='same')(x)

        branch_pool = conv_bn(branch_pool, 192, 1, 1, name='b' + str(5 + i) + '_pool') 
        
        # Concatenate the outputs of the four branches as the block output
        x = tf.keras.layers.concatenate(
            [branch1x1, branch7x7_1, branch7x7_2, branch_pool],
            axis=3,
            name='mixed' + str(5 + i)
        )   
    
    # Seventh block 17x17x768
    # First, the 1x1 branch
    branch1x1 = conv_bn(x, 192, 1, 1, name='b7_1x1')
    
    # Next, the 7x7 branch, factorized into 1x7 and 7x1
    branch7x7_1 = conv_bn(x, 192, 1, 1, name='b7_7x7_1_1')
    branch7x7_1 = conv_bn(branch7x7_1, 192, 1, 7, name='b7_7x7_1_2')
    branch7x7_1 = conv_bn(branch7x7_1, 192, 7, 1, name='b7_7x7_1_3')
    
    # Then the second (double) 7x7 branch
    branch7x7_2 = conv_bn(x, 192, 1, 1, name='b7_7x7_2_1')
    branch7x7_2 = conv_bn(branch7x7_2, 192, 7, 1, name='b7_7x7_2_2')
    branch7x7_2 = conv_bn(branch7x7_2, 192, 1, 7, name='b7_7x7_2_3')
    branch7x7_2 = conv_bn(branch7x7_2, 192, 7, 1, name='b7_7x7_2_4')
    branch7x7_2 = conv_bn(branch7x7_2, 192, 1, 7, name='b7_7x7_2_5')
    
    # Finally, the pooling branch
    branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                   strides=(1, 1),
                                                   padding='same')(x)

    branch_pool = conv_bn(branch_pool, 192, 1, 1, name='b7_pool')
    
    # Concatenate the outputs of the four branches as the block output
    x = tf.keras.layers.concatenate(
        [branch1x1, branch7x7_1, branch7x7_2, branch_pool],
        axis=3,
        name='mixed7' 
    )  
        
        
    # Eighth block 8x8x1280
    # First, the 3x3 branch
    branch3x3 = conv_bn(x, 192, 1, 1, name='b8_3x3_1')
    branch3x3 = conv_bn(branch3x3, 320, 3, 3, strides=2, padding='valid', name='b8_3x3_2')
    
    # Next, the 7x7 branch, followed by a strided 3x3
    branch7x7 = conv_bn(x, 192, 1, 1, name='b8_7x7_1')
    branch7x7 = conv_bn(branch7x7, 192, 1, 7, name='b8_7x7_2')
    branch7x7 = conv_bn(branch7x7, 192, 7, 1, name='b8_7x7_3')
    branch7x7 = conv_bn(branch7x7, 192, 3, 3, strides=2, padding='valid', name='b8_7x7_4')
    
    # Finally, the pooling branch; the three branch outputs are then concatenated
    branch_pool = tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
    x = tf.keras.layers.concatenate(
        [branch3x3, branch7x7, branch_pool],
        axis=3,
        name='mixed8'
    )
    
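    # Ninth and tenth blocks 8x8x2048
    for i in range(2):
        # First, the 1x1 branch
        branch1x1 = conv_bn(x, 320, 1, 1, name='b' + str(9 + i) + '_1x1')
        
        # Next, the first 3x3 branch
        branch3x3_1 = conv_bn(x, 384, 1, 1, name='b' + str(9 + i) + '_3x3_1')
        # Note: the output of the 1x1 conv above goes through a 1x3 and a 3x1 conv separately, and the two results are concatenated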
        branch3x3_1_1 = conv_bn(branch3x3_1, 384, 1, 3, name='b' + str(9 + i) + '_3x3_1_1')
        branch3x3_1_2 = conv_bn(branch3x3_1, 384, 3, 1, name='b' + str(9 + i) + '_3x3_1_2')
        branch3x3_1 = tf.keras.layers.concatenate(
            [branch3x3_1_1, branch3x3_1_2],
            axis=3,
            name='mixed' + str(9 + i ) + '_1_' + str(i)
        )

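        # Then the second 3x3 branch
        branch3x3_2 = conv_bn(x, 448, 1, 1, name='b' + str(9 + i) + '_3x3_2_1')
        # 3x3 kernel here, matching the referenced keras_applications implementation
        branch3x3_2 = conv_bn(branch3x3_2, 384, 3, 3, name='b' + str(9 + i) + '_3x3_2_2')
        # Note: the output of the 3x3 conv above goes through a 1x3 and a 3x1 conv separately, and the two results are concatenated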
        branch3x3_2_1 = conv_bn(branch3x3_2, 384, 1, 3, name='b' + str(9 + i) + '_3x3_21')
        branch3x3_2_2 = conv_bn(branch3x3_2, 384, 3, 1, name='b' + str(9 + i) + '_3x3_22')
        branch3x3_2 = tf.keras.layers.concatenate(
            [branch3x3_2_1, branch3x3_2_2],
            axis=3,
            name='mixed' + str(9 + i ) + '_2_' + str(i)
        )
        
        # Finally, the pooling branch
        branch_pool = tf.keras.layers.AveragePooling2D((3, 3),
                                                       strides=(1, 1),
                                                       padding='same')(x)

        branch_pool = conv_bn(branch_pool, 192, 1, 1, name='b' + str(9 + i) + '_pool') 
        
        # Concatenate the outputs of the four branches as the block output
        x = tf.keras.layers.concatenate(
            [branch1x1, branch3x3_1, branch3x3_2, branch_pool],
            axis=3,
            name='mixed' + str(9 + i)
        )         
        
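    # Then global average pooling (global max pooling is an alternative)
    # Inception has no large hidden fully connected layers, only the final classification Dense layer,
    # which greatly reduces the parameter count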
    x = tf.keras.layers.GlobalAveragePooling2D(name='g_avg_pool')(x)
    x = tf.keras.layers.Dense(1000, activation='softmax', name='predictions')(x)
    
    model = tf.keras.models.Model(img_input, x, name='inception_v3')
    return model

In [9]:
model = inception()
model.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
layer_1_conv (Conv2D)           (None, 149, 149, 32) 896         input_2[0][0]                    
__________________________________________________________________________________________________
layer_1_bn (BatchNormalization) (None, 149, 149, 32) 128         layer_1_conv[0][0]               
__________________________________________________________________________________________________
layer_1 (Activation)            (None, 149, 149, 32) 0           layer_1_bn[0][0]                 
_______________________________________________________________________________________
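
Since the printed summary is long, it can help to check the channel count of each block by hand. The sketch below is plain Python; the numbers are read off the `conv_bn` calls in the cell above, and the dictionary name `blocks` is made up for this illustration. A max-pooling branch has no convolution, so it passes its input channels through unchanged.

```python
# Output channels of every branch that enters each final concatenate layer
blocks = {
    'mixed1': [64, 64, 96, 64],
    'mixed2': [64, 64, 96, 64],
    'mixed3': [384, 96, 288],        # last entry: max-pool of the 288-channel input
    'mixed4': [192, 192, 192, 192],
    'mixed7': [192, 192, 192, 192],
    'mixed8': [320, 192, 768],       # last entry: max-pool of the 768-channel input
    'mixed9': [320, 384 + 384, 384 + 384, 192],  # two branches are themselves concatenations
}

channels = {name: sum(branches) for name, branches in blocks.items()}
for name, total in channels.items():
    print(name, total)

# Spatial size after the two stride-2 'valid' reductions (3x3 kernel)
after_mixed3 = (35 - 3) // 2 + 1
after_mixed8 = (after_mixed3 - 3) // 2 + 1
print(after_mixed3, after_mixed8)  # 17 8
```

If any total or spatial size here disagrees with the `Output Shape` column of the summary, a branch was most likely given the wrong filter count, stride, or padding.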

Compile the model.

In [10]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['accuracy']
)
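
The loss chosen above expects integer class indices as labels rather than one-hot vectors. As a rough illustration of what it computes, here is a NumPy sketch under the assumption that the model already outputs softmax probabilities; it is not the Keras implementation (which, among other things, clips the probabilities for numerical stability), and the function name is reused here only for readability.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs):
    """Per-sample loss: negative log of the probability assigned to the true class."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)


# Two samples, three classes; labels are class indices, not one-hot rows
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = [0, 1]
losses = sparse_categorical_crossentropy(labels, probs)

# The same value computed from one-hot labels, as categorical crossentropy would
one_hot = np.eye(3)[labels]
losses_one_hot = -(one_hot * np.log(probs)).sum(axis=1)
print(np.allclose(losses, losses_one_hot))  # True
```

With 1000 ImageNet classes, integer labels avoid building a 1000-wide one-hot vector for every sample.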

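Inception_v3 clearly counts as a deep network, with ten blocks, yet it has only about 20 million parameters. Compared with roughly 60 million for AlexNet and roughly 130 million for VGG16, that is far fewer.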

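More importantly, Inception brought a new way of thinking about model architecture: depth is not the only thing to focus on, width matters too. By stacking filters of different sizes side by side, the model decides for itself during training how to extract features.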

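In fact, Inception uses 1x1 filters in many places to reduce the number of channels before applying a larger convolution, which helps keep the parameter count under control. On top of that, the final layers use global pooling instead of large fully connected layers, which reduces the parameter count again.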
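
The saving from a 1x1 reduction can be made concrete with the 5x5 branch of the first block, whose input has 192 channels. The sketch below is plain Python and counts only convolution weights and biases (BN parameters are left out); `conv_params` is a helper name made up for this illustration, and the "direct" variant is a hypothetical alternative that does not appear in the model.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Convolution weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


in_channels = 192  # channels entering the first block

# Hypothetical: apply the 5x5 convolution directly to all 192 channels
direct = conv_params(5, 5, in_channels, 64)

# As in the model: 1x1 down to 48 channels first, then the 5x5 convolution
reduced = conv_params(1, 1, in_channels, 48) + conv_params(5, 5, 48, 64)

print(direct)   # 307264
print(reduced)  # 86128
```

Both variants produce 64 output channels, but the reduced one needs well under a third of the parameters.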

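Global average pooling and global max pooling reduce each feature map to a single value by taking its mean or its maximum. The 8x8x2048 output of the last block is therefore condensed into 2048 values. Each value is determined directly by its feature map rather than by an extra nonlinear computation in a fully connected layer, which makes the final output easier to interpret.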
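
A small NumPy sketch of the pooling step, using random numbers in place of real activations. It only illustrates the arithmetic and is not how Keras implements the layer. The last two lines compare the classifier size with a hypothetical Flatten + Dense(1000) head, which does not appear in the model.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((1, 8, 8, 2048))  # (batch, height, width, channels)

# Collapse the two spatial axes: one value per feature map
gap = features.mean(axis=(1, 2))  # global average pooling
gmp = features.max(axis=(1, 2))   # global max pooling
print(gap.shape, gmp.shape)  # (1, 2048) (1, 2048)

# Parameters of the 1000-class classifier on top (weights + biases)
flatten_dense = 8 * 8 * 2048 * 1000 + 1000
gap_dense = 2048 * 1000 + 1000
print(flatten_dense)  # 131073000
print(gap_dense)      # 2049000
```

Pooling first shrinks the final layer by a factor of about 64, which is the 8x8 spatial area.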