## Algorithm Overview

  
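  VGG16 and VGG19 showed that convolutional neural networks can be made quite deep, but as networks grow deeper they run into the vanishing-gradient and exploding-gradient problems. With vanishing gradients, the gradients computed while training a deep network become smaller and smaller, so the weights are barely updated and training stalls. Exploding gradients are the opposite case: the gradients grow larger and larger during training, the weights are updated by huge steps, training fails to converge, and the model becomes unusable. These problems can largely be handled with ReLU activations, normalization layers such as batch normalization, and similar techniques. Yet when the network is made deeper still, the training accuracy starts to drop. This drop in accuracy on the training data, which is not caused by overfitting, is called degradation.  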
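
To make the vanishing and exploding cases concrete, here is a minimal sketch in plain Python. It is a toy model and not part of the original notebook: it assumes a chain of scalar sigmoid layers whose pre-activations all sit at `z = 0`, so each layer scales the backpropagated gradient by the same factor. The function names are illustrative.

```python
import math

def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid at pre-activation z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_scale(depth, weight, z=0.0):
    """Size of the backpropagated gradient after `depth` layers of a toy chain.

    Toy assumption: each layer is a = sigmoid(weight * a_prev) with a single
    scalar weight, and every pre-activation is held at z, so each layer
    multiplies the gradient by |weight| * sigmoid'(z).
    """
    return (abs(weight) * sigmoid_derivative(z)) ** depth

for depth in (5, 20, 50):
    small = gradient_scale(depth, weight=1.0)   # factor 0.25 per layer: shrinks
    large = gradient_scale(depth, weight=8.0)   # factor 2.0 per layer: grows
    print(f"depth={depth:2d}  weight=1: {small:.3e}  weight=8: {large:.3e}")
```

With a per-layer factor below 1 the gradient decays geometrically with depth, and with a factor above 1 it grows geometrically, which is the behaviour the paragraph above describes.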
  
  ![fig1](fig1.png)  
  
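  As the figure above shows, the 56-layer plain convolutional network has a higher error than the 20-layer network on both the training set and the test set, which is a typical case of degradation.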
  
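  The residual network (ResNet) addresses degradation with the residual block. A residual block adds a shortcut, also called a skip connection, around a few layers, so that the layers spanned by the shortcut can easily learn an identity function. As a result, making the network deeper should at least not make training worse. The basic structure of a residual block is shown below:  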
  
  ![fig2](fig2.png)  
  
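  The residual block above spans two layers. The input x passes through two layers of weighting and activation to produce F(x), exactly as in a plain convolutional network. What the residual block adds is a shortcut from the input x to the output of the second layer, so information at the input can reach the output directly. The value fed into the final ReLU is therefore no longer F(x) but F(x)+x. Stacking many blocks of this kind yields a residual network.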
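
The effect of the shortcut can be sketched with NumPy. This is an illustration under simplifying assumptions, not the notebook's implementation: it uses dense layers instead of convolutions and omits biases and batch normalization, and `plain_block` and `residual_block` are hypothetical helper names. When the weights are all zero, the plain block outputs zeros, while the residual block passes its (non-negative) input through unchanged.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_block(x, w1, w2):
    """Two-layer plain block: relu(F(x)), where F(x) = relu(x @ w1) @ w2."""
    return relu(relu(x @ w1) @ w2)

def residual_block(x, w1, w2):
    """Same two layers, but the input is added back before the final ReLU."""
    f = relu(x @ w1) @ w2
    return relu(f + x)

# Non-negative input, as it would be after a preceding ReLU.
x = np.array([[1.0, 2.0, 3.0]])
w_zero = np.zeros((3, 3))

print(plain_block(x, w_zero, w_zero))     # [[0. 0. 0.]]
print(residual_block(x, w_zero, w_zero))  # [[1. 2. 3.]]
```

Driving F(x) towards zero is enough for the residual block to behave as an identity mapping, which is why adding such blocks should not, in principle, make a network worse.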
  
  The figure below compares the structure of a plain deep network with that of a residual deep network:
  
  ![fig3](fig3.png)  
  

## Implementation

  
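  The key to implementing a residual block is the skip connection. Depending on whether the block's input and output have the same shape, there are two variants: the Identity Block, used when the shapes match, and the Convolutional Block, used when they do not. In the latter case, a convolution is applied on the skip connection so that the shortcut matches the shape of the main path's output.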
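
The shape-matching role of the convolution on the shortcut can be sketched in NumPy. This is a simplified illustration: the tensor sizes are made up, `projection_shortcut` is a hypothetical helper, and bias and batch normalization are omitted. A 1x1 convolution with stride `s` and no padding is equivalent to keeping every `s`-th spatial position and multiplying the channel vector at each position by a `(C_in, C_out)` matrix.

```python
import numpy as np

def projection_shortcut(x, w, stride):
    """1x1 convolution with the given stride on an NHWC tensor (no bias)."""
    x_sub = x[:, ::stride, ::stride, :]   # spatial subsampling
    return x_sub @ w                      # channel projection: C_in -> C_out

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 56, 56, 256))          # block input (illustrative size)
main_path = rng.standard_normal((1, 28, 28, 512))  # stand-in for the main branch output
w = rng.standard_normal((256, 512))                # 1x1 kernel as a matrix

shortcut = projection_shortcut(x, w, stride=2)
print(shortcut.shape)                # (1, 28, 28, 512)
print((main_path + shortcut).shape)  # (1, 28, 28, 512)
```

Without the projection, `x` and `main_path` differ in both spatial size and channel count and could not be added element-wise.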
  
  The structure of the Identity Block is shown below:  
  
  ![fig4](fig4.png) 
  
  Following this diagram, a Keras implementation looks like this:

In [4]:
import keras

def identity_block(x, f, filters, stage, block):
    '''
    Identity block: used when the input and output shapes match.
    :param x: input tensor (n, H, W, C)
    :param f: integer, kernel size of the middle conv layer
    :param filters: list of three integers, number of filters in each conv layer
    :param stage: integer, stage index, used to name the layers
    :param block: string, block label within the stage, used to name the layers

    returns:
    :param x: output tensor (n, H, W, C)
    '''
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    F1, F2, F3 = filters

    # Shapes match, so the shortcut is the unmodified input.
    # F3 must equal the number of input channels for the final Add to work.
    x_shortcut = x

    # conv1: 1x1 convolution
    x = keras.layers.Conv2D(filters=F1, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2a')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2a')(x)
    x = keras.layers.Activation('relu')(x)

    # conv2: f x f convolution, 'same' padding keeps the spatial size
    x = keras.layers.Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2b')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2b')(x)
    x = keras.layers.Activation('relu')(x)

    # conv3: 1x1 convolution, no activation before the addition
    x = keras.layers.Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2c')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2c')(x)

    # add the shortcut to the main path, then apply ReLU
    x = keras.layers.Add()([x, x_shortcut])
    x = keras.layers.Activation('relu')(x)

    return x

Using TensorFlow backend.


  
  The structure of the Convolutional Block is shown below:  
  
  ![fig5](fig5.png)
  
  It can be implemented as follows:

In [21]:
def convolutional_block(x, f, filters, stage, block, stride=2):
    '''
    Convolutional block: used when the input and output shapes differ.
    :param x: input tensor (n, H, W, C)
    :param f: integer, kernel size of the middle conv layer
    :param filters: list of three integers, number of filters in each conv layer
    :param stage: integer, stage index, used to name the layers
    :param block: string, block label within the stage, used to name the layers
    :param stride: integer, stride of the first conv layer and of the shortcut conv

    returns:
    :param x: output tensor (n, H', W', F3)
    '''
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    F1, F2, F3 = filters

    # Shapes differ, so the shortcut is projected below before the addition.
    x_shortcut = x

    # conv1: strided 1x1 convolution
    x = keras.layers.Conv2D(filters=F1, kernel_size=(1, 1), strides=(stride, stride), padding='valid',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2a')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2a')(x)
    x = keras.layers.Activation('relu')(x)

    # conv2: f x f convolution, 'same' padding keeps the spatial size
    x = keras.layers.Conv2D(filters=F2, kernel_size=(f, f), strides=(1, 1), padding='same',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2b')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2b')(x)
    x = keras.layers.Activation('relu')(x)

    # conv3: 1x1 convolution, no activation before the addition
    x = keras.layers.Conv2D(filters=F3, kernel_size=(1, 1), strides=(1, 1), padding='valid',
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                            name=conv_name_base + '2c')(x)
    x = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '2c')(x)

    # shortcut path: strided 1x1 convolution so its shape matches the main path
    x_shortcut = keras.layers.Conv2D(filters=F3, kernel_size=(1, 1), strides=(stride, stride), padding='valid',
                                     kernel_initializer=keras.initializers.glorot_uniform(seed=0),
                                     name=conv_name_base + '1')(x_shortcut)
    x_shortcut = keras.layers.BatchNormalization(axis=3, name=bn_name_base + '1')(x_shortcut)

    # add the projected shortcut to the main path, then apply ReLU
    x = keras.layers.Add()([x, x_shortcut])
    x = keras.layers.Activation('relu')(x)
    return x

  With these residual blocks in place, we can build a ResNet50 residual network. Its basic structure is as follows:
  
  ![fig6](fig6.png)
  
  The code is as follows:

In [1]:
def ResNet50(input_shape=(224, 224, 3), classes=1000, use_dropout=True, dropout_rate=0.2):
    '''
    ResNet50
    :param input_shape: tuple, input tensor shape
    :param classes: integer, number of classes in your dataset
    :param use_dropout: bool, whether to apply dropout before the final dense layer
    :param dropout_rate: float, only used if use_dropout is True

    returns:

    keras model

    '''

    x_input = keras.layers.Input(input_shape)
    x = keras.layers.ZeroPadding2D(padding=(3, 3))(x_input)

    # stage1
    x = keras.layers.Conv2D(filters=64, kernel_size=(7, 7), strides=(2, 2),
                            kernel_initializer=keras.initializers.glorot_uniform(seed=0), name='conv1')(x)
    x = keras.layers.BatchNormalization(axis=3, name='bn_conv1')(x)
    x = keras.layers.Activation('relu')(x)
    x = keras.layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2))(x)

    # stage2
    x = convolutional_block(x=x, f=3, filters=[64, 64, 256], stage=2, block='a', stride=1)
    x = identity_block(x=x, f=3, filters=[64, 64, 256], stage=2, block='b')
    x = identity_block(x=x, f=3, filters=[64, 64, 256], stage=2, block='c')

    # stage3
    x = convolutional_block(x=x, f=3, filters=[128, 128, 512], stage=3, block='a', stride=2)
    x = identity_block(x=x, f=3, filters=[128, 128, 512], stage=3, block='b')
    x = identity_block(x=x, f=3, filters=[128, 128, 512], stage=3, block='c')
    x = identity_block(x=x, f=3, filters=[128, 128, 512], stage=3, block='d')

    # stage4
    x = convolutional_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='a', stride=2)
    x = identity_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='b')
    x = identity_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='c')
    x = identity_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='d')
    x = identity_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='e')
    x = identity_block(x=x, f=3, filters=[256, 256, 1024], stage=4, block='f')

    # stage5
    x = convolutional_block(x=x, f=3, filters=[512, 512, 2048], stage=5, block='a', stride=2)
    x = identity_block(x=x, f=3, filters=[512, 512, 2048], stage=5, block='b')
    x = identity_block(x=x, f=3, filters=[512, 512, 2048], stage=5, block='c')

    # average pooling (2x2 window, not global pooling)
    x = keras.layers.AveragePooling2D(pool_size=(2, 2), strides=(2, 2))(x)
    # flatten
    x = keras.layers.Flatten(name='flatten')(x)

    # dropout
    if use_dropout:
        x = keras.layers.Dropout(dropout_rate)(x)

    # fully connected classifier
    x = keras.layers.Dense(units=classes, activation='softmax', kernel_initializer='glorot_uniform',
                           name='fc' + str(classes))(x)
    # create model
    model = keras.models.Model(inputs=x_input, outputs=x, name='ResNet50')

    return model

In [24]:
model = ResNet50()
model.summary()  # summary() prints the table itself and returns None

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_9 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
zero_padding2d_9 (ZeroPadding2D (None, 230, 230, 3)  0           input_9[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 64) 9472        zero_padding2d_9[0][0]           
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 112, 112, 64) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation
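
The shapes and parameter counts in the summary can be checked by hand. The sketch below is plain-Python arithmetic, not part of the original notebook. It assumes the layer settings of the `ResNet50` function defined above (unpadded 7x7 stride-2 `conv1` after 3 pixels of zero padding, unpadded 3x3 stride-2 max pooling, strided 1x1 convolutions at the start of stages 3 to 5, and a final 2x2 average pooling). `out_size` and `conv_params` are illustrative helper names.

```python
def out_size(size, kernel, stride, pad=0):
    """Output length along one spatial axis for an unpadded ('valid') conv or
    pooling layer, after `pad` zeros have been added on each side."""
    return (size + 2 * pad - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out

sizes = {"input": 224}
sizes["conv1"] = out_size(sizes["input"], kernel=7, stride=2, pad=3)   # 112
sizes["maxpool"] = out_size(sizes["conv1"], kernel=3, stride=2)        # 55
sizes["stage2"] = out_size(sizes["maxpool"], kernel=1, stride=1)       # 55
sizes["stage3"] = out_size(sizes["stage2"], kernel=1, stride=2)        # 28
sizes["stage4"] = out_size(sizes["stage3"], kernel=1, stride=2)        # 14
sizes["stage5"] = out_size(sizes["stage4"], kernel=1, stride=2)        # 7
sizes["avgpool"] = out_size(sizes["stage5"], kernel=2, stride=2)       # 3

flattened = sizes["avgpool"] ** 2 * 2048   # 18432 features enter the Dense layer
conv1_params = conv_params(7, 3, 64)       # 9472, matching the summary
bn_conv1_params = 4 * 64                   # gamma, beta, moving mean, moving variance: 256
print(sizes, flattened, conv1_params, bn_conv1_params)
```

Under these assumptions the final feature map is 3x3x2048, so the dense layer receives 18432 inputs. The original ResNet paper instead applies global average pooling before the fully connected layer, which would leave 2048 inputs.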

  
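  The code above implements ResNet50 from scratch in Keras. In fact, the keras.applications module already ships many deep neural network models with pretrained weights, ResNet50 among them. The following describes how to build ResNet50 with keras.applications: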
  
  The pretrained ResNet50 weights are ported from the weights released by Kaiming He under the [MIT License](https://github.com/KaimingHe/deep-residual-networks/blob/master/LICENSE).  
  Its interface is defined as follows:  
  ```
  keras.applications.resnet50.ResNet50(
      include_top=True,
      weights='imagenet',
      input_tensor=None,
      input_shape=None,
      pooling=None,
      classes=1000)
  ```
  
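  This call defines a 50-layer residual network whose weights were trained on ImageNet.  
  The model works with both the Theano and TensorFlow backends and accepts both the channels_first and channels_last dimension orderings.  
  The default input size of the model is 224x224.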
  
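  Parameters:  
  - include_top: whether to keep the fully connected layer at the top of the network
  - weights: None means random initialization, i.e. no pretrained weights are loaded; 'imagenet' loads the weights pretrained on ImageNet
  - input_tensor: optional Keras tensor to use as the image input of the model
  - input_shape: optional, only takes effect when include_top=False; a tuple of length 3 giving the shape of the input images, whose width and height should be no smaller than 197, e.g. (200,200,3)
  - pooling: when include_top=False, this parameter selects the pooling mode. None means no pooling, so the output of the last convolutional layer is a 4D tensor; 'avg' applies global average pooling; 'max' applies global max pooling.
  - classes: optional, the number of image classes; only used when include_top=True and no pretrained weights are loaded.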
  
  Returns:  
  A Keras model object  
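
What the `pooling` options do to the last convolutional output can be sketched in NumPy, without building the model. This is an illustration only: the random array stands in for a 4D feature map, and its 7x7 spatial size is an assumption rather than a value taken from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 4D output (batch, height, width, channels) of the last conv layer.
feature_map = rng.standard_normal((1, 7, 7, 2048))

# pooling='avg': average over the two spatial axes.
avg = feature_map.mean(axis=(1, 2))
# pooling='max': maximum over the two spatial axes.
mx = feature_map.max(axis=(1, 2))

print(avg.shape, mx.shape)  # (1, 2048) (1, 2048)
```

Either option turns the 4D tensor into one 2048-dimensional vector per image, which is convenient as input to a custom classifier.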
  
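  Usage examples for this interface:  
  - A pretrained network can be used directly to make predictions on your own data. Note that the data to predict must belong to the classes the network was trained on. Below, the keras.applications.resnet50 module is used to classify the following image:  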
  ![dog](dog.jpg)

In [2]:
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights='imagenet')

img_path = 'dog.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])

Predicted: [('n02093256', 'Staffordshire_bullterrier', 0.81218076), ('n02093428', 'American_Staffordshire_terrier', 0.18362524), ('n02087394', 'Rhodesian_ridgeback', 0.0022921043)]


- Extracting image features with the keras.applications.resnet50 module:

In [4]:
model = ResNet50(weights='imagenet', include_top=False)

img_path = 'dog.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

features = model.predict(x)
print(features.shape)

(1, 1, 1, 2048)
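
Both examples above rely on `preprocess_input`, and the first one on `decode_predictions`. The NumPy sketch below imitates the two steps to show the idea. It is an approximation, not the library code: it assumes the ResNet50 preprocessing converts RGB to BGR and subtracts the per-channel ImageNet means without scaling, and `caffe_style_preprocess` and `top_k` are hypothetical helper names.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match the Keras ResNet50 preprocessing).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(batch_rgb):
    """RGB -> BGR, then subtract the per-channel mean; no rescaling."""
    batch_bgr = batch_rgb[..., ::-1].astype("float64")
    return batch_bgr - IMAGENET_BGR_MEAN

def top_k(probs, k=3):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

# A batch with one 2x2 pure-red image.
red = np.zeros((1, 2, 2, 3))
red[..., 0] = 255.0
print(caffe_style_preprocess(red)[0, 0, 0])

# Toy probability vector over four classes.
print(top_k(np.array([0.1, 0.6, 0.05, 0.25]), k=2))
```

`decode_predictions` additionally maps the class indices to ImageNet identifiers and labels, which this sketch leaves out.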


## References
(1) He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.  
(2) https://www.deeplearning.ai/  
(3) https://keras-cn.readthedocs.io/en/latest/other/application/