# LeNet

在3.DL基础里面我们构造了一个含单隐藏层的MLP对Fashion-MNIST中的图像进行分裂。每张图像的高和宽为28像素。我们将图像中的像素Flatten，得到长度为784的向量，并输入FC层中。然而这种方法是有局限性的:
1. 图像在同一列邻近的像素在向量中会相聚较远，构成的模式很难被模型识别
2. 对于大尺寸的输入图像，使用FC层很容易导致模型较大。假设输入的是高和宽均为1000像素的彩色(RGB)图片。即使全连接层的输出仍为256个，但是该层的权重参数为1000x1000x3x256，大约为3G的内存。这会带来过于复杂的模型和过高的储存开销。

而卷积层解决了这两个问题。一方面，卷积层保留输入数据的形状，使图像的像素在高和宽两个方向上的相关性均能被有效识别；另一方面，军基层通过滑动窗口将统一卷积核与不同位置的输入重复计算，从而避免参数尺寸过大。

## 介绍

LeNet分为卷积层块和全连接层块两个部分。下面我们分别介绍这两个模块。

卷积层块里的基本单位是卷积层后接最大池化层：卷积层用来识别图像里的空间模式，如线条和物体局部，之后的最大池化层则用来降低卷积层对位置的敏感性。卷积层块由两个这样的基本单位重复堆叠构成。在卷积层块中，每个卷积层都使用$5\times 5$的窗口，并在输出上使用sigmoid激活函数。第一个卷积层输出通道数为6，第二个卷积层输出通道数则增加到16。这是因为第二个卷积层比第一个卷积层的输入的高和宽要小，所以增加输出通道使两个卷积层的参数尺寸类似。卷积层块的两个最大池化层的窗口形状均为$2\times 2$，且步幅为2。由于池化窗口与步幅形状相同，池化窗口在输入上每次滑动所覆盖的区域互不重叠。

卷积层块的输出形状为(批量大小, 高, 宽, 通道)。当卷积层块的输出传入全连接层块时，全连接层块会将小批量中每个样本变平（flatten）。也就是说，全连接层的输入形状将变成二维，其中第一维是小批量中的样本，第二维是每个样本变平后的向量表示，且向量长度为通道、高和宽的乘积。全连接层块含3个全连接层。它们的输出个数分别是120、84和10，其中10为输出的类别个数。

<img src="img/class7_1.1.png" style="zoom:100%">

**总结:（Conv+MaxPool)\*2 + FC\*3**

## 实现

In [None]:
MaxPool2D()

In [1]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

In [16]:
# 构建网络
LeNet = tf.keras.Sequential([
    Conv2D(filters=6, kernel_size=(5,5), activation='sigmoid', input_shape=(28,28,1)),
    MaxPool2D(pool_size=(2,2), strides=(2,2)),
    Conv2D(filters=16, kernel_size=(5,5), activation='sigmoid'),
    MaxPool2D(pool_size=(2,2), strides=(2,2)),
    Flatten(),
    Dense(120, activation='sigmoid'),
    Dense(84, activation='sigmoid'),
    Dense(10, activation='sigmoid')
])

LeNet.summary()

Model: "sequential_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_171 (Conv2D)          (None, 24, 24, 6)         156       
_________________________________________________________________
max_pooling2d_39 (MaxPooling (None, 12, 12, 6)         0         
_________________________________________________________________
conv2d_172 (Conv2D)          (None, 8, 8, 16)          2416      
_________________________________________________________________
max_pooling2d_40 (MaxPooling (None, 4, 4, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 256)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 120)               30840     
_________________________________________________________________
dense_4 (Dense)              (None, 84)              

In [17]:
# 获取数据
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# 修改数据shape (channel=1)
train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))

In [18]:
# 训练模型
LeNet.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.9, momentum=0.0, nesterov=False),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
LeNet.fit(train_images, train_labels, epochs=5, validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fdbe8a8ec50>

In [19]:
# 测试集检验
LeNet.evaluate(test_images, test_labels, verbose=2)

10000/10000 - 1s - loss: 0.5056 - accuracy: 0.8125


[0.5056428252696991, 0.8125]

# AlexNet

## 介绍

相比于LeNet，AlexNet包含8层转换，即5层卷积、2层FC隐层、1层FC输出层。

第一层为卷积且shape=(11,11), 因为ImageNet的图像像素比MNIST大10倍以上，所以使用更大的卷积层来捕获物体。第二层中的卷积窗口为(5,5)，之后的卷积全使用(3,3)。此外，第一、第二、第五卷积层之后都是用了shape=(3,3) & strides=(2,2)的最大池化层。并且其中使用的channels数量为LeNet中的数十倍。

在最后一个卷积层之后是两个FC层。AlexNet中的所有激活函数全部为ReLU。

网络中使用了DropOut来控制FC层中的模型复杂度。

AlexNet中引入了大量的图像augment，从而进一步扩大数据集来缓解过拟合。

<img src="img/class7_2.1.png" style="zoom:100%">

**总结: (Conv+MaxPool)\*2 + (Conv\*3 + MaxPool) + (FC + DropOut)\*2 + FC**

## 实现

In [20]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

In [21]:
# 构建网络

AlexNet = tf.keras.models.Sequential([
    Conv2D(filters=96,kernel_size=11,strides=4,activation='relu', input_shape=(224,224,1)),
    MaxPool2D(pool_size=3, strides=2),
    Conv2D(filters=256,kernel_size=5,padding='same',activation='relu'),
    MaxPool2D(pool_size=3, strides=2),
    Conv2D(filters=384,kernel_size=3,padding='same',activation='relu'),
    Conv2D(filters=384,kernel_size=3,padding='same',activation='relu'),
    Conv2D(filters=256,kernel_size=3,padding='same',activation='relu'),
    MaxPool2D(pool_size=3, strides=2),
    Flatten(),
    Dense(4096,activation='relu'),
    Dropout(0.5),
    Dense(4096,activation='relu'),
    Dropout(0.5),
    Dense(10,activation='sigmoid')
])
AlexNet.summary()

Model: "sequential_19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_173 (Conv2D)          (None, 54, 54, 96)        11712     
_________________________________________________________________
max_pooling2d_41 (MaxPooling (None, 26, 26, 96)        0         
_________________________________________________________________
conv2d_174 (Conv2D)          (None, 26, 26, 256)       614656    
_________________________________________________________________
max_pooling2d_42 (MaxPooling (None, 12, 12, 256)       0         
_________________________________________________________________
conv2d_175 (Conv2D)          (None, 12, 12, 384)       885120    
_________________________________________________________________
conv2d_176 (Conv2D)          (None, 12, 12, 384)       1327488   
_________________________________________________________________
conv2d_177 (Conv2D)          (None, 12, 12, 256)     

In [22]:
# 获取处理后数据

def get_data():
    # 获取数据
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # 修改数据shape (channel=1)
    train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
    test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))

    # pad图像shape至244,244
    train_images = tf.image.resize_with_pad(train_images, 224, 224, )
    test_images = tf.image.resize_with_pad(test_images, 224, 224, )
    
    # 标准化图像
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    
    
    return train_images, train_labels, test_images, test_labels

train_images, train_labels, test_images, test_labels = get_data()

In [23]:
# 训练

AlexNet.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

# AlexNet.fit(x=train_images,
#             y=train_labels,
#             batch_size=128,
#             epochs=5,
#             verbose=2)

# 因为CPU训练要很久，所以用GPU训练完了把weights拿过来
AlexNet.load_weights('files/class7_2_weights.h5')

# 查看test集前2000的准确率
AlexNet.evaluate(test_images[:2000], test_labels[:2000], verbose=2)

2000/2000 - 10s - loss: 0.3125 - accuracy: 0.8845


[0.31246792006492613, 0.8845]

# VGG

## 介绍

VGG名字来源于其论文作者所在实验室为Visual Geometry Group。VGG提出了可以通过重复使用简单的基础块来构建深度模型的思路。

**总结: VGG_block\*n + (FC + DropOut)\*2 + FC**

**VGG_block = Conv\*n + MaxPool**

## 实现

In [24]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout

VGG块的组成规律是：连续使用数个相同的填充为1、窗口形状为3×3的卷积层后接上一个步幅为2、窗口形状为2×2的最大池化层。卷积层保持输入的高和宽不变，而池化层则对其减半。我们使用`vgg_block`函数来实现这个基础的VGG块，它可以指定卷积层的数量`num_convs`和输出通道数`num_filters`。

与AlexNet和LeNet一样，VGG网络由卷积层模块后接全连接层模块构成。卷积层模块串联数个`vgg_block`，其超参数由变量`conv_arch`定义。该变量指定了每个VGG块里卷积层个数和输出通道数。全连接模块则跟AlexNet中的一样。

In [25]:
# VGG块
def vgg_block(num_convs, num_filters):
    '''
    num_convs: 卷积层的个数
    filters: 卷积层内filters的数量
    '''
    blk = tf.keras.models.Sequential()
    for _ in range(num_convs):
        blk.add(Conv2D(num_filters, kernel_size=3, padding='same', activation='relu'))
    
    blk.add(MaxPool2D(pool_size=2, strides=2))
    return blk



现在我们构造一个VGG网络。它有5个卷积块，前2块使用单卷积层，而后3块使用双卷积层。第一块的输出通道是64，之后每次对输出通道数翻倍，直到变为512。因为这个网络使用了8个卷积层和3个全连接层，所以经常被称为VGG-11。

In [26]:
# VGG网络
def vgg_net(conv_arch):
    net = tf.keras.models.Sequential()
    for (num_convs, filters) in conv_arch:
        net.add(vgg_block(num_convs, filters))
    net.add(tf.keras.models.Sequential([
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(10, activation='sigmoid')
    ]))
    return net

# 构造网络
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
VGG = vgg_net(conv_arch)

In [28]:
# 构造一个高和宽为244的单通道数据样本观察每一层的形状

X = tf.random.uniform((1,224,224,1))
for blk in VGG.layers:
    X = blk(X)
    print(blk.name, 'output shape:\t', X.shape)

sequential_21 output shape:	 (1, 112, 112, 64)
sequential_22 output shape:	 (1, 56, 56, 128)
sequential_23 output shape:	 (1, 28, 28, 256)
sequential_24 output shape:	 (1, 14, 14, 512)
sequential_25 output shape:	 (1, 7, 7, 512)
sequential_26 output shape:	 (1, 10)


In [29]:
# 获取数据与训练模型

def get_data():
    # 获取数据
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # 修改数据shape (channel=1)
    train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
    test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))

    # pad图像shape至244,244
    train_images = tf.image.resize_with_pad(train_images, 224, 224, )
    test_images = tf.image.resize_with_pad(test_images, 224, 224, )
    
    # 标准化图像
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    
    return train_images, train_labels, test_images, test_labels

train_images, train_labels, test_images, test_labels = get_data()

In [33]:
# 训练网络

VGG.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.0, nesterov=False),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# VGG.fit(x=train_images,
#         y=train_labels,
#         batch_size=128,
#         epochs=5,
#         verbose=2)

# 因为CPU训练要很久，所以用GPU训练完了把weights拿过来
VGG.load_weights('files/class7_3_weights.h5')

# 查看test集前2000的准确率
VGG.evaluate(test_images[:2000], test_labels[:2000], verbose=2)

2000/2000 - 91s - loss: 0.2166 - accuracy: 0.9210


[0.21661576175689698, 0.921]

# NiN

## 介绍

本part前几节中介绍的LeNet、AlexNet、VGG在设计上的共同之处是: 先以军基层构成的模块充分抽取空间特征，再以全连接层构成的模块来输出分类结果。其中AlexNet和VGG对LeNet的改进主要在与如何对这个两个模块加宽和加深。

本节介绍的NiN，它提出了另一个思路，即串联多个由卷积层和FC层构成的小网络来构建一个深层网络

**总结: (NiN_block+ MaxPool)\*n + DropOut + NiN_block + Global_Avg_Pool + Flatten**

**NiN_block = Conv + 1x1Conv + 1x1Conv**

## 实现

In [34]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, GlobalAveragePooling2D

NiN块是NiN中的基础块。卷积层的输入和输出通常是四维数组(样本, 高, 宽, 通道数), 而FC层的输入和输出通常是二维数组(样本, 特征)。如果想在FC层后再接上卷积层，则需要将FC层的输出变化为四维。在class6_4中，介绍过1x1卷积层，可以将其看为FC层。因此NiN中使用1x1卷积层来代替FC层，从而使空间信息能够自然传递到后面的层中。

In [35]:
# NiN块
def nin_block(num_filters, kernel_size, strides, padding):
    blk = tf.keras.models.Sequential([
        Conv2D(filters=num_filters, kernel_size=kernel_size, 
               strides=strides, padding=padding, activation='relu'),
        Conv2D(filters=num_filters, kernel_size=1, activation='relu'),
        Conv2D(filters=num_filters, kernel_size=1, activation='relu')
    ])
    return blk

除了NiN块之外，NiN去掉了AlexNet最后的三个FC层，取而代之的，NiN使用了输出通道数等于标签类别数的NiN块，然后使用全局平均池化层对每个同道中人的所有元素求平均并直接用于分类。这里的全局平均池化层即窗口形状等于输入空间维形状的平均池化层。NiN的这个设计的好处是可以显著减小模型参数尺寸，从而缓解过拟合。然而，该设计优势会造成获得有效模型的训练时间增加。

In [36]:
# NiN网络
def nin():
    net = tf.keras.models.Sequential([
        nin_block(num_filters=96, kernel_size=11, strides=4, padding='valid'),
        MaxPool2D(pool_size=3, strides=2),
        nin_block(num_filters=256, kernel_size=5, strides=1, padding='same'),
        MaxPool2D(pool_size=3, strides=2),
        nin_block(num_filters=384, kernel_size=3, strides=1, padding='same'),
        MaxPool2D(pool_size=3, strides=2),
        Dropout(0.5),
        nin_block(num_filters=10, kernel_size=3, strides=1, padding='same'),
        GlobalAveragePooling2D(),
        Flatten()
    ])
    return net
NiN = nin()

In [37]:
# 构造一个高和宽为244的单通道数据样本观察每一层的形状

X = tf.random.uniform((1,224,224,1))
for blk in NiN.layers:
    X = blk(X)
    print(blk.name, 'output shape:\t', X.shape)

sequential_27 output shape:	 (1, 54, 54, 96)
max_pooling2d_49 output shape:	 (1, 26, 26, 96)
sequential_28 output shape:	 (1, 26, 26, 256)
max_pooling2d_50 output shape:	 (1, 12, 12, 256)
sequential_29 output shape:	 (1, 12, 12, 384)
max_pooling2d_51 output shape:	 (1, 5, 5, 384)
dropout_4 output shape:	 (1, 5, 5, 384)
sequential_30 output shape:	 (1, 5, 5, 10)
global_average_pooling2d_3 output shape:	 (1, 10)
flatten_3 output shape:	 (1, 10)


In [38]:
# 获取数据与训练模型

def get_data():
    # 获取数据
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # 修改数据shape (channel=1)
    train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
    test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))

    # pad图像shape至244,244
    train_images = tf.image.resize_with_pad(train_images, 224, 224, )
    test_images = tf.image.resize_with_pad(test_images, 224, 224, )
    
    # 标准化图像
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    
    return train_images, train_labels, test_images, test_labels

train_images, train_labels, test_images, test_labels = get_data()

In [41]:
# 训练网络

NiN.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-7),
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# NiN.fit(x=train_images,
#         y=train_labels,
#         batch_size=128,
#         epochs=5,
#         verbose=2)

# 因为CPU训练要很久，所以用GPU训练完了把weights拿过来
NiN.load_weights('files/class7_4_weights.h5')

# 查看test集前2000的准确率
NiN.evaluate(test_images[:2000], test_labels[:2000], verbose=2)

2000/2000 - 9s - loss: 2.5901 - accuracy: 0.1475


[2.5901342182159426, 0.1475]

# GoogLeNet

## 介绍

本节介绍的是GoogLeNet第一版。其基础卷积块叫做Inception.下图为Inception块中的结构。

Inception块里有4条并行的线路。前3条线路使用窗口大小分别是$1\times 1$、$3\times 3$和$5\times 5$的卷积层来抽取不同空间尺寸下的信息，其中中间2个线路会对输入先做$1\times 1$卷积来减少输入通道数，以降低模型复杂度。第四条线路则使用$3\times 3$最大池化层，后接$1\times 1$卷积层来改变通道数。4条线路都使用了合适的填充来使输入与输出的高和宽一致。最后我们将每条线路的输出在通道维上连结，并输入接下来的层中去。

<img src="img/class7_5.1.svg" style="zoom:100%">

GoogLeNet跟VGG一样，在主体卷积部分中使用5个模块（block），每个模块之间使用步幅为2的$3\times 3$最大池化层来减小输出高宽。第一模块使用一个64通道的$7\times 7$卷积层。

总结:

1. Inception块相当于一个有4条线路的自网络。它通过不同窗口形状的Conv2D和MaxPool来并行抽取信息，并使用1x1卷积层减少通道数从而降低模型复杂度
2. GoogLeNet将多个设计惊喜的Inception块和其他层串联起来。其中Inception块内的通道数分配之比是在ImageNet数据集上通过大量实验得出的
3. GoogLeNet和它的升级版是非常高效的模型，在类似的精度测试下，它们的计算复杂度往往更低

## 实现

In [42]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, GlobalAveragePooling2D

In [43]:
# Inception块
class Inception(tf.keras.layers.Layer):
    def __init__(self, c1, c2, c3, c4):
        super().__init__()
        # 线路1, 单1x1卷积层
        self.p1_1 = Conv2D(c1, kernel_size=1, padding='same', activation='relu')
        
        # 线路2, 1x1卷积层 后接 3x3卷积层
        self.p2_1 = Conv2D(c2[0], kernel_size=1, padding='same', activation='relu')
        self.p2_2 = Conv2D(c2[1], kernel_size=3, padding='same', activation='relu')
        
        # 线路3, 1x1卷积层 后接 5x5卷积层
        self.p3_1 = Conv2D(c3[0], kernel_size=1, padding='same', activation='relu')
        self.p3_2 = Conv2D(c3[1], kernel_size=5, padding='same', activation='relu')
        
        # 线路4, 3x3最大池化 后接 1x1卷积层
        self.p4_1 = MaxPool2D(pool_size=3, padding='same', strides=1)
        self.p4_2 = Conv2D(c4, kernel_size=1, padding='same', activation='relu')
    
    def call(self, x):
        # 其实就相当于, x为这个块的输入
        # 然后四条path对x输入都有不同的输出
        # 最终的正向传播的结果是把这四条path的结果给concat起来
        # 具体原理参见5.DL计算
        # 其中p1-4 为 第1-4条path输出的结果
        p1 = self.p1_1(x)
        p2 = self.p2_2(self.p2_1(x))
        p3 = self.p3_2(self.p3_1(x))
        p4 = self.p4_2(self.p4_1(x))
        return tf.concat([p1, p2, p3, p4], axis=-1)  # 在通道维上连结输出

In [44]:
# GoogLeNet模型

def googlenet():
    
    # block1 ~ Conv2D + MaxPool
    b1 = tf.keras.models.Sequential([
        Conv2D(filters=64, kernel_size=7, strides=2, padding='same', activation='relu'),
        MaxPool2D(pool_size=3, strides=2, padding='same')
    ])
    
    # block2 ~ Conv2D + Conv2D + MaxPool
    b2 = tf.keras.models.Sequential([
        Conv2D(filters=64, kernel_size=1, strides=1, padding='same', activation='relu'),
        Conv2D(filters=192, kernel_size=3, strides=1, padding='same', activation='relu'),
        MaxPool2D(pool_size=3, strides=2, padding='same')
    ])
    
    # block3 ~ Inception*2 + MaxPool
    b3 = tf.keras.models.Sequential([
        Inception(64, (96, 128), (16, 32), 32),
        Inception(128, (128, 192), (32, 96), 64),
        MaxPool2D(pool_size=3, strides=2, padding='same')
    ])
    
    # block4 ~ Inception*5 + MaxPool
    b4 = tf.keras.models.Sequential([
        Inception(192, (96, 208), (16, 48), 64),
        Inception(160, (112, 224), (24, 64), 64),
        Inception(128, (128, 256), (24, 64), 64),
        Inception(112, (144, 288), (32, 64), 64),
        Inception(256, (160, 320), (32, 128), 128),
        MaxPool2D(pool_size=3, strides=2, padding='same')
    ])
    
    # block5 ~ Inception*2 + GlobalAvgPool
    b5 = tf.keras.models.Sequential([
        Inception(256, (160, 320), (32, 128), 128),
        Inception(384, (192, 384), (48, 128), 128),
        GlobalAveragePooling2D() # 使每个通道的高和宽变为1，其值为该通道的平均值
    ])
    
    # output ~ FC
    net = tf.keras.models.Sequential([
        b1, b2, b3, b4, b5, Dense(10)
    ])
    
    return net

In [45]:
# 构造一个高和宽为96的单通道数据样本观察每一层的形状

GoogLeNet = googlenet()
X = tf.random.uniform(shape=(1, 96, 96, 1))
for layer in GoogLeNet.layers:
    X = layer(X)
    print(GoogLeNet.name, 'output shape:\t', X.shape)

sequential_37 output shape:	 (1, 24, 24, 64)
sequential_37 output shape:	 (1, 12, 12, 192)
sequential_37 output shape:	 (1, 6, 6, 480)
sequential_37 output shape:	 (1, 3, 3, 832)
sequential_37 output shape:	 (1, 1024)
sequential_37 output shape:	 (1, 10)


In [46]:
# 获取数据与训练模型

def get_data():
    # 获取数据
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # 修改数据shape (channel=1)
    train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
    test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))

    # pad图像shape至244,244
    train_images = tf.image.resize_with_pad(train_images, 224, 224, )
    test_images = tf.image.resize_with_pad(test_images, 224, 224, )
    
    # 标准化图像
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    
    return train_images, train_labels, test_images, test_labels

train_images, train_labels, test_images, test_labels = get_data()

In [49]:
# 训练网络

GoogLeNet.compile(optimizer=tf.keras.optimizers.Adam(lr=1e-7),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
  
# GoogLeNet.fit(x=train_images,
#               y=train_labels,
#               batch_size=128,
#               epochs=5,
#               verbose=2)

# 因为CPU训练要很久，所以用GPU训练完了把weights拿过来
GoogLeNet.load_weights('files/class7_5_weights.h5')

# 查看test集前2000的准确率
GoogLeNet.evaluate(test_images[:2000], test_labels[:2000], verbose=2)

2000/2000 - 22s - loss: 1.7944 - accuracy: 0.3060


[1.7944151592254638, 0.306]

# 重点总结

- LeNet: (Conv + MaxPool) $\times$ 2 + Flatten + Dense $\times$ 3
- AlexNet: (Conv + MaxPool) $\times$ 2 + (Conv $\times$ 3 + MaxPool) + Flatten + (Dense + DropOut) $\times$ 2 + Dense
- VGG: VGG_block $\times$ N + Flatten + (Dense + DropOut) $\times$ 2 + Dense
    - VGG_block: Conv $\times$ n + MaxPool
- NiN: (NiN_block + MaxPool) $\times$ 3 + DropOut + NiN_block + GlobalAvgPool + Flatten
    - NiN_block: Conv + 1$\times$1Conv + 1$\times$1Conv
- GoogLeNet: Conv + MaxPool + Conv $\times$ 2 + MaxPool + Inception $\times$ 2 + MaxPool + Inception $\times$ 5 + MaxPool + Inception $\times$ 2 + GlobalAvgPool
    - Inception_block: 
        1. 1$\times$1 Conv
        2. 1$\times$1 Conv + Conv
        3. 1$\times$1 Conv + Conv
        4. MaxPool + 1$\times$1 Conv