#### NiN: Network in Network

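NiN uses convolutions with a $1 \times 1$ kernel.

The code below defines such a block: an ordinary convolutional layer followed by two convolutional layers with $1 \times 1$ kernels. The latter two play the role of two fully connected layers, applied independently at each spatial position.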

In [1]:
from mxnet.gluon import nn

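def mlpconv(channels, kernel_size, padding, strides=1, max_pooling=True):
    """Build one NiN block: a regular convolution followed by two 1x1 convolutions."""
    out = nn.Sequential()
    out.add(
        # Regular convolution with the requested kernel size
        nn.Conv2D(channels=channels, kernel_size=kernel_size, strides=strides, padding=padding, activation='relu'),
        # Two 1x1 convolutions that act like fully connected layers at each pixel
        nn.Conv2D(channels=channels, kernel_size=1, padding=0, strides=1, activation='relu'),
        nn.Conv2D(channels=channels, kernel_size=1, padding=0, strides=1, activation='relu')
    )
    if max_pooling:
        # Optional 3x3 max pooling with stride 2 to downsample the feature map
        out.add(nn.MaxPool2D(pool_size=3, strides=2))
    return out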
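
To see why the two $1 \times 1$ layers behave like fully connected layers, here is a minimal NumPy sketch (a stand-in, not MXNet code). The helper names `conv1x1` and `dense_per_pixel` are hypothetical and introduced only for illustration. Assuming an NCHW layout and no activation, it checks that a $1 \times 1$ convolution gives the same result as applying one dense layer to the channel vector at every pixel.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution on an NCHW tensor.

    x: (batch, in_channels, height, width)
    weight: (out_channels, in_channels), bias: (out_channels,)
    """
    # Contract over the input-channel axis only; spatial positions stay independent.
    out = np.einsum('oc,nchw->nohw', weight, x)
    return out + bias[None, :, None, None]

def dense_per_pixel(x, weight, bias):
    """Apply the same dense layer to the channel vector at every pixel."""
    n, c, h, w = x.shape
    # Move channels last and flatten so each row is one pixel's channel vector.
    pixels = x.transpose(0, 2, 3, 1).reshape(-1, c)
    out = pixels @ weight.T + bias
    # Restore the NCHW layout.
    return out.reshape(n, h, w, -1).transpose(0, 3, 1, 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4, 4))
weight = rng.standard_normal((5, 3))
bias = rng.standard_normal(5)

# True when both computations agree up to floating-point tolerance.
same = np.allclose(conv1x1(x, weight, bias), dense_per_pixel(x, weight, bias))
print(same)
```

The weights are shared across all pixels, which is what distinguishes this from a single fully connected layer over the flattened image.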

In [4]:
from mxnet import nd

blk = mlpconv(64, 3, 0)
blk.initialize()

x = nd.random.uniform(shape=(32, 3, 16, 16))
y = blk(x)
y.shape

(32, 64, 6, 6)
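
The spatial size of 6 can be checked by hand. The sketch below uses the usual output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ and assumes the layers round down, which is consistent with the shape printed above. `out_size` is a hypothetical helper, not part of MXNet.

```python
def out_size(size, kernel, padding=0, stride=1):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 16
size = out_size(size, kernel=3, padding=0)   # 3x3 convolution, no padding
size = out_size(size, kernel=1)              # first 1x1 convolution
size = out_size(size, kernel=1)              # second 1x1 convolution
size = out_size(size, kernel=3, stride=2)    # 3x3 max pooling with stride 2
print(size)
```

The $1 \times 1$ convolutions leave the spatial size unchanged, so only the first convolution and the pooling layer shrink the feature map.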

NiN's convolutional layers use parameters similar to AlexNet's, with three different settings:

- kernel: $11 \times 11$, channels: 96
- kernel: $5 \times 5$, channels: 256
- kernel: $3 \times 3$, channels: 384

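Besides using $1 \times 1$ convolutions, NiN does not end with fully connected layers. Instead, it uses an `mlpconv` block whose number of channels equals the number of output classes, followed by an average pooling layer that averages the values in each channel into a single scalar.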
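
As a rough illustration of that final averaging step, here is a minimal NumPy sketch (a stand-in, not the Gluon layer). `global_avg_pool` is a hypothetical helper, and the input values are made up. It returns one number per channel, which corresponds to global average pooling followed by flattening.

```python
import numpy as np

def global_avg_pool(x):
    """Average each channel of an NCHW tensor over its spatial axes."""
    return x.mean(axis=(2, 3))

# Hypothetical class-score maps: batch of 2, 10 classes, 5x5 spatial grid.
scores = np.arange(2 * 10 * 5 * 5, dtype=np.float64).reshape(2, 10, 5, 5)
pooled = global_avg_pool(scores)
print(pooled.shape)
```

Each row of `pooled` can then be read as the per-class scores for one example, without any fully connected layer.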

In [5]:
net = nn.Sequential()
with net.name_scope():
    net.add(
        mlpconv(96, 11, 0, strides=4),
        mlpconv(256, 5, 2),
        mlpconv(384, 3, 1),
        nn.Dropout(.5),
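        # 10 target classes
        mlpconv(10, 3, 1, max_pooling=False),
        # The input here is batch_size * 10 * 5 * 5; average pooling turns it
        # into batch_size * 10 * 1 * 1.
        # We could use nn.AvgPool2D(pool_size=5), but global pooling is more
        # convenient because pool_size does not have to be worked out by hand.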
        nn.GlobalAvgPool2D(),
        nn.Flatten()
    )
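
The 5 x 5 size mentioned in the comments can be traced with a small sketch. It assumes a 224 x 224 input (the `resize` value used in the training cell) and that every layer rounds its output size down. `out_size`, `blocks`, and `trace` are hypothetical names, not part of MXNet.

```python
def out_size(size, kernel, padding=0, stride=1):
    """Output length along one spatial axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, padding, stride, max_pooling) for the four mlpconv blocks in the network.
blocks = [
    (11, 0, 4, True),
    (5, 2, 1, True),
    (3, 1, 1, True),
    (3, 1, 1, False),
]

def trace(size):
    """Return the spatial size after each mlpconv block."""
    sizes = []
    for kernel, padding, stride, max_pooling in blocks:
        # The two 1x1 convolutions keep the size, so only the first one matters.
        size = out_size(size, kernel, padding, stride)
        if max_pooling:
            size = out_size(size, kernel=3, stride=2)
        sizes.append(size)
    return sizes

print(trace(224))
```

A different input resolution changes the last entry, which is the reason global pooling is more convenient than a fixed `pool_size`.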

#### Loading the data and training

In [None]:
import sys
sys.path.append('..')
import utils
from mxnet import gluon, init

train_data, test_data = utils.load_data_fashion_mnist(batch_size=64, resize=224)

ctx = utils.try_gpu()
net.initialize(ctx=ctx, init=init.Xavier())

loss = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

utils.train(train_data, test_data, net, loss, trainer, ctx, num_epochs=1)

Start training on  cpu(0)
