# ResNet
Researchers at Microsoft Research Asia designed ResNet, a network that is deeper yet structurally simpler, and used it to win the 2015 ImageNet competition.

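ResNet effectively addresses the difficulty of training deep neural networks, making it possible to train convolutional networks with as many as 1000 layers. Deep networks are hard to train because of vanishing gradients: during backpropagation, the farther a layer is from the loss function, the smaller its gradient and the harder it is to update, and the problem gets worse as the number of layers grows. Two approaches were commonly used to work around this:

1. Train layer by layer: train the shallower layers first, then keep adding layers. This does not work particularly well and is cumbersome.

2. Use wider layers, or more output channels, instead of making the network deeper. This kind of architecture also tends to give poor results.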

ResNet solves the problem of gradients vanishing during backpropagation by introducing cross-layer (skip) connections.

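The comparison is between ordinary network connections and cross-layer residual connections. With ordinary connections, the gradient from the upper layers has to be passed back one layer at a time. A residual connection effectively adds a shorter path in the middle, and the gradient can flow back along this shorter path, which keeps it from becoming too small.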
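
To make the "shorter path" argument concrete, here is a toy sketch rather than a model of a real network. It assumes a chain of scalar linear layers with a hypothetical weight of 0.1 each and ignores nonlinearities and normalization. `chain_gradient` is an illustrative helper, not part of the notebook.

```python
def chain_gradient(weights, residual=False):
    """Gradient of the output w.r.t. the input for a chain of scalar layers.

    Plain layer:    y = w * x      ->  dy/dx = w
    Residual layer: y = x + w * x  ->  dy/dx = 1 + w
    """
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w) if residual else w
    return grad


weights = [0.1] * 10  # hypothetical: 10 layers, each with weight 0.1

plain_grad = chain_gradient(weights)
residual_grad = chain_gradient(weights, residual=True)

print('plain:    {:.3e}'.format(plain_grad))
print('residual: {:.3e}'.format(residual_grad))
```

In the plain chain the gradient is a product of small factors and collapses toward zero. With the skip connection every factor contains the identity term, so the product stays of order one in this toy setting.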

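Suppose the input to a group of layers is x and the desired output is H(x). If we pass the input x straight through to the output as an initial result, we get a shallower network that is easier to train. Whatever this shallower network has not learned can be fitted by the deeper branch F(x), which makes training easier. The branch ultimately has to fit F(x) = H(x) - x, and this is the residual structure.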
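
Written out, the residual formulation and its Jacobian look as follows. This is a short derivation sketch in which $I$ denotes the identity matrix, a symbol introduced here for illustration:

$$
H(x) = F(x) + x
\quad\Longrightarrow\quad
\frac{\partial H}{\partial x} = \frac{\partial F}{\partial x} + I
$$

Even when $\partial F / \partial x$ is small, the identity term lets the gradient pass through the block unchanged, which is the short path described above.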

A residual network is simply a stack of such residual blocks. Let's implement a residual block.

In [1]:
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow as tf
from utils import cifar10_input



In [2]:
batch_size = 64

train_imgs, train_labels, val_imgs, val_labels = cifar10_input.load_data(image_size=96)

In [3]:
import tensorflow.contrib.slim as slim

### First, define a subsampling function

In [4]:
def subsample(x, factor, scope=None):
    if factor == 1:
        return x
    return slim.max_pool2d(x, [1, 1], factor, scope=scope)
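
A max pool with a 1x1 window and stride `factor` does no real pooling: it just keeps every `factor`-th pixel. The NumPy sketch below illustrates this behaviour for an NHWC array. `subsample_np` is a hypothetical stand-in for illustration, not a replacement for the slim call.

```python
import numpy as np


def subsample_np(x, factor):
    """Keep every `factor`-th pixel of an NHWC array (1x1 window, stride `factor`)."""
    if factor == 1:
        return x
    return x[:, ::factor, ::factor, :]


x = np.arange(2 * 6 * 6 * 3, dtype=np.float32).reshape(2, 6, 6, 3)
y = subsample_np(x, 2)
print(y.shape)  # (2, 3, 3, 3)
```

Because nothing is learned here, the shortcut stays a pure identity mapping on the pixels it keeps.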

### Define a `residual_block`

In [5]:
def residual_block(x, bottleneck_depth, out_depth, stride=1, scope='residual_block'):
    in_depth = x.get_shape().as_list()[-1]
    with tf.variable_scope(scope):
        # If the number of channels is unchanged, subsample to change the spatial size of the input
        if in_depth == out_depth:
            shortcut = subsample(x, stride, 'shortcut')
        # If it changes, use a 1x1 convolution to change both the channels and the spatial size
        else:
            shortcut = slim.conv2d(x, out_depth, [1, 1], stride=stride, activation_fn=None, scope='shortcut')

        residual = slim.conv2d(x, bottleneck_depth, [1, 1], stride=1, scope='conv1')
        residual = slim.conv2d(residual, bottleneck_depth, 3, stride, scope='conv2')
        residual = slim.conv2d(residual, out_depth, [1, 1], stride=1, activation_fn=None, scope='conv3')

        # Add the shortcut and the residual branch, then apply ReLU
        output = tf.nn.relu(shortcut + residual)

        return output

### Build the overall `resnet` architecture

In [6]:
def resnet(inputs, num_classes, reuse=None, is_training=None, verbose=False):
    with tf.variable_scope('resnet', reuse=reuse):
        net = inputs
        
        if verbose:
            print('input: {}'.format(net.shape))
        
        with slim.arg_scope([slim.batch_norm], is_training=is_training):
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], padding='SAME'):
                with tf.variable_scope('block1'):
                    net = slim.conv2d(net, 32, [5, 5], stride=2, scope='conv_5x5')

                    if verbose:
                        print('block1: {}'.format(net.shape))
                    
                with tf.variable_scope('block2'):
                    net = slim.max_pool2d(net, [3, 3], 2, scope='max_pool')
                    net = residual_block(net, 32, 128, scope='residual_block1')
                    net = residual_block(net, 32, 128, scope='residual_block2')

                    if verbose:
                        print('block2: {}'.format(net.shape))
                    
                with tf.variable_scope('block3'):
                    net = residual_block(net, 64, 256, stride=2, scope='residual_block1')
                    net = residual_block(net, 64, 256, scope='residual_block2')

                    if verbose:
                        print('block3: {}'.format(net.shape))
                    
                with tf.variable_scope('block4'):
                    net = residual_block(net, 128, 512, stride=2, scope='residual_block1')
                    net = residual_block(net, 128, 512, scope='residual_block2')

                    if verbose:
                        print('block4: {}'.format(net.shape))
                
                with tf.variable_scope('classification'):
                    net = tf.reduce_mean(net, [1, 2], name='global_pool', keep_dims=True)
                    net = slim.flatten(net, scope='flatten')
                    net = slim.fully_connected(net, num_classes, activation_fn=None, normalizer_fn=None, scope='logit')

                    if verbose:
                        print('classification: {}'.format(net.shape))
                    
                return net

In [7]:
with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu, normalizer_fn=slim.batch_norm) as sc:
    conv_scope = sc

In [8]:
is_training = tf.placeholder(tf.bool, name='is_training')

with slim.arg_scope(conv_scope):
    train_out = resnet(train_imgs, 10, is_training=is_training, verbose=True)
    val_out = resnet(val_imgs, 10, is_training=is_training, reuse=True)

input: (64, 96, 96, 3)
block1: (64, 48, 48, 32)
block2: (64, 24, 24, 128)
block3: (64, 12, 12, 256)
block4: (64, 6, 6, 512)
Instructions for updating:
keep_dims is deprecated, use keepdims instead
classification: (64, 10)
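
The printed shapes can be checked by hand. With `padding='SAME'`, a layer with stride `s` maps a spatial size `n` to `ceil(n / s)`. The sketch below replays the strides read off the `resnet` definition. The `stages` table and the helper names are written out here for illustration and are not part of the notebook.

```python
import math


def same_out_size(size, stride):
    """Spatial output size of a conv/pool layer with padding='SAME'."""
    return int(math.ceil(size / float(stride)))


# (block name, strides of the layers inside it, output channels)
stages = [
    ('block1', [2], 32),         # 5x5 conv, stride 2
    ('block2', [2, 1, 1], 128),  # 3x3 max pool (stride 2) + two residual blocks
    ('block3', [2, 1], 256),     # first residual block has stride 2
    ('block4', [2, 1], 512),     # first residual block has stride 2
]


def trace_shapes(batch, size, stages):
    """Return a list of (block name, NHWC shape) pairs."""
    shapes = []
    for name, strides, depth in stages:
        for stride in strides:
            size = same_out_size(size, stride)
        shapes.append((name, (batch, size, size, depth)))
    return shapes


shapes = trace_shapes(64, 96, stages)
for name, shape in shapes:
    print('{}: {}'.format(name, shape))
```

Each stride-2 layer halves the 96x96 input, giving 48, 24, 12 and 6, which matches the verbose output above.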


### Build the training ops

In [9]:
with tf.variable_scope('loss'):
    train_loss = tf.losses.sparse_softmax_cross_entropy(labels=train_labels, logits=train_out, scope='train')
    val_loss = tf.losses.sparse_softmax_cross_entropy(labels=val_labels, logits=val_out, scope='val')
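
As a sanity check on the loss values, here is a NumPy sketch of what a sparse softmax cross-entropy computes, averaged over the batch. The function name is hypothetical, and the example assumes an untrained 10-class model that outputs uniform logits.

```python
import numpy as np


def sparse_softmax_cross_entropy_np(labels, logits):
    """Mean cross-entropy between integer labels and unnormalized logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


labels = np.array([3, 7, 0, 9])
uniform_logits = np.zeros((4, 10))  # every class equally likely
loss = sparse_softmax_cross_entropy_np(labels, uniform_logits)
print(loss)  # about 2.3026, i.e. ln(10)
```

A 10-class model that has learned nothing should therefore start near a loss of ln(10), roughly 2.30. That is a useful reference point for the first logged training step.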

In [10]:
with tf.name_scope('accuracy'):
    with tf.name_scope('train'):
        train_acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(train_out, axis=-1, output_type=tf.int32), train_labels), tf.float32))
    with tf.name_scope('val'):
        val_acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(val_out, axis=-1, output_type=tf.int32), val_labels), tf.float32))
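
The accuracy ops above take the arg-max class per example, compare it with the label, and average the matches. Here is a NumPy sketch of the same computation on a made-up batch of 4 examples with 3 classes. `batch_accuracy` is an illustrative name.

```python
import numpy as np


def batch_accuracy(logits, labels):
    """Fraction of examples whose arg-max logit equals the label."""
    predictions = np.argmax(logits, axis=-1)
    return float(np.mean(predictions == labels))


logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0],
                   [0.9, 0.8, 0.1]])
labels = np.array([1, 0, 2, 1])
acc = batch_accuracy(logits, labels)
print(acc)  # 0.75
```

Since accuracy is computed per batch, with `batch_size = 64` it can only take values that are multiples of 1/64.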

In [11]:
lr = 0.01

opt = tf.train.MomentumOptimizer(lr, momentum=0.9)
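
For reference, the classic momentum update keeps a running accumulator of gradients and steps along it. The sketch below applies that rule to a toy objective f(w) = w**2, chosen purely for illustration. `momentum_step` is a hypothetical helper, not the optimizer's internal code.

```python
def momentum_step(var, accum, grad, lr=0.01, momentum=0.9):
    """One step of SGD with momentum.

    accum <- momentum * accum + grad
    var   <- var - lr * accum
    """
    accum = momentum * accum + grad
    var = var - lr * accum
    return var, accum


# Minimize f(w) = w ** 2, whose gradient is 2 * w.
w, accum = 1.0, 0.0
for _ in range(200):
    w, accum = momentum_step(w, accum, 2.0 * w)

print('w after 200 steps: {:.2e}'.format(w))
```

The accumulator smooths successive gradients, so consistent directions build up speed while oscillating ones partly cancel.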

In [12]:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = opt.minimize(train_loss)
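
The control dependency matters because batch normalization keeps moving averages of the mean and variance for use at inference time, and those updates live in the `UPDATE_OPS` collection. Here is a sketch of the moving-average rule, assuming a decay of 0.999 and a hypothetical constant batch mean of 2.0:

```python
def update_moving_average(moving, batch_value, decay=0.999):
    """Exponential moving average, as used for batch-norm statistics."""
    return decay * moving + (1.0 - decay) * batch_value


moving_mean = 0.0  # moving mean starts at zero
for _ in range(5000):
    # hypothetical: every batch reports a mean of 2.0
    moving_mean = update_moving_average(moving_mean, 2.0)

print(moving_mean)  # approaches 2.0 as more steps are run
```

If the update ops were never run, the moving statistics would stay at their initial values, and evaluation with `is_training=False` would normalize with the wrong mean and variance.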

### Start training

In [13]:
from utils.learning import train_with_bn

In [14]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

train_with_bn(sess, train_op, train_loss, train_acc, val_loss, val_acc, 20000, is_training)

sess.close()

[train]: step 0 loss = 2.2766 acc = 0.1562 (0.0166 / batch)
[val]: step 0 loss = 2.2960 acc = 0.1250
[train]: step 1000 loss = 1.6274 acc = 0.4531 (0.0833 / batch)
[train]: step 2000 loss = 2.4385 acc = 0.4375 (0.0828 / batch)
[train]: step 3000 loss = 0.6490 acc = 0.7656 (0.0824 / batch)
[train]: step 4000 loss = 1.1083 acc = 0.6562 (0.0826 / batch)
[val]: step 4000 loss = 1.4970 acc = 0.5625
[train]: step 5000 loss = 0.9370 acc = 0.7031 (0.0828 / batch)
[train]: step 6000 loss = 0.7490 acc = 0.7344 (0.0826 / batch)
[train]: step 7000 loss = 0.4196 acc = 0.8281 (0.0827 / batch)
[train]: step 8000 loss = 0.4825 acc = 0.7969 (0.0827 / batch)
[val]: step 8000 loss = 1.3932 acc = 0.7031
[train]: step 9000 loss = 0.3133 acc = 0.8750 (0.0829 / batch)
[train]: step 10000 loss = 0.3113 acc = 0.8906 (0.0827 / batch)
[train]: step 11000 loss = 0.1614 acc = 0.9531 (0.0825 / batch)
[train]: step 12000 loss = 0.1955 acc = 0.8906 (0.0827 / batch)
[val]: step 12000 loss = 2.0709 acc = 0.6719
[train]