## Transfer Learning (TensorFlow)

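- Where do we transfer from?
    - Source network: AlexNet pretrained on the ImageNet dataset
    - [Reference link](http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/) for the source network used in this experiment
    - The weight file is bvlc_alexnet.npy (it can be downloaded from the [link](http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/) above)
- Where do we transfer to?
    - Target dataset: CIFAR-10
- How do we transfer?
    - Remove the last layer of AlexNet (the original 1000-class output layer)
    - Add an output layer with 10 neurons (CIFAR-10 has 10 classes)
    - Load the weights trained on the source network for all preceding layers
    - Randomly initialize the weights of the last layer
    - Train the last layer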
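
The steps above can be sketched with plain NumPy arrays. This is only a minimal illustration, not the notebook's implementation: the `pretrained` dictionary, its made-up layer sizes, and the `build_target_weights` helper are hypothetical stand-ins for the real AlexNet weights.

```python
import numpy as np

def build_target_weights(pretrained, last_layer, num_classes, seed=0):
    """Copy every pretrained layer except `last_layer`, which is re-created
    with `num_classes` outputs and randomly initialized."""
    rng = np.random.default_rng(seed)
    target = {}
    for name, (w, b) in pretrained.items():
        if name == last_layer:
            # New output layer: same input width, num_classes outputs
            new_w = rng.normal(0.0, 0.01, size=(w.shape[0], num_classes))
            new_b = np.zeros(num_classes)
            target[name] = (new_w, new_b)
        else:
            # Transferred layer: reuse the source weights unchanged
            target[name] = (w.copy(), b.copy())
    return target

# Toy "source network" with a 1000-class output layer (sizes are made up)
pretrained = {
    "fc7": (np.ones((8, 8)), np.zeros(8)),
    "fc8": (np.ones((8, 1000)), np.zeros(1000)),
}
target = build_target_weights(pretrained, last_layer="fc8", num_classes=10)
trainable = ["fc8"]  # only the new layer is updated during training
print({name: w.shape for name, (w, _) in target.items()})
```

In the notebook itself the same roles are played by `skip_layer` (layers that are not loaded from the weight file) and `train_layers` (layers whose variables are passed to the optimizer).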
 
Part of the network-building code is adapted from [this repository](https://github.com/kratzert/finetune_alexnet_with_tensorflow/tree/5d751d62eb4d7149f4e3fd465febf8f07d4cea9d), with some adjustments.

In [1]:
import tensorflow as tf
import numpy as np
import keras.datasets.cifar10 as cifar10
import cv2
from tqdm import tqdm
import sys
import os
from datetime import datetime
#os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"] = "0"


Using TensorFlow backend.


### Helper functions to improve code reuse



In [2]:
# Convolutional layer
def conv(x, filter_height, filter_width, num_filters, stride_y, stride_x, name, padding='SAME', groups=1):

    # Get number of input channels
    input_channels = int(x.get_shape()[-1])

    # Create lambda function for the convolution
    convolve = lambda i, k: tf.nn.conv2d(i, k,
                                   strides = [1, stride_y, stride_x, 1],
                                   padding = padding)

    with tf.variable_scope(name) as scope:
        # Create tf variables for the weights and biases of the conv layer.
        # Integer division keeps the shape an int under Python 3.
        weights = tf.get_variable('weights',
                                  shape = [filter_height, filter_width,
                                  input_channels // groups, num_filters])
        biases = tf.get_variable('biases', shape = [num_filters])


        if groups == 1:
            conv = convolve(x, weights)

        # In the case of multiple groups, split inputs and weights per group
        else:
            # Split input and weights and convolve them separately
            input_groups = tf.split(axis = 3, num_or_size_splits=groups, value=x)
            weight_groups = tf.split(axis = 3, num_or_size_splits=groups, value=weights)
            output_groups = [convolve(i, k) for i,k in zip(input_groups, weight_groups)]

            # Concat the convolved output together again
            conv = tf.concat(axis = 3, values = output_groups)

        # Add biases
        bias = tf.nn.bias_add(conv, biases)

        # Apply relu function
        relu = tf.nn.relu(bias, name = scope.name)

        return relu

# Fully connected layer
def fc(x, num_in, num_out, name, relu = True):
    with tf.variable_scope(name) as scope:
        weights = tf.get_variable('weights', shape=[num_in, num_out], trainable=True)
        biases = tf.get_variable('biases', [num_out], trainable=True)
        # Matrix multiply weights and inputs and add bias
        act = tf.nn.xw_plus_b(x, weights, biases, name=scope.name)

        if relu:
            # Apply ReLU non-linearity
            return tf.nn.relu(act)
        else:
            return act
    
# Max-pooling layer
def max_pool(x, filter_height, filter_width, stride_y, stride_x, name, padding='SAME'):
    return tf.nn.max_pool(x, ksize=[1, filter_height, filter_width, 1],
                        strides = [1, stride_y, stride_x, 1],
                        padding = padding, name = name)

def lrn(x, radius, alpha, beta, name, bias=1.0):
    return tf.nn.local_response_normalization(x, depth_radius = radius, alpha = alpha,
                                            beta = beta, bias = bias, name = name)
  
def dropout(x, keep_prob):
    return tf.nn.dropout(x, keep_prob)

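### Building the network
Note: the part of the network that is transferred from someone else's model must have the same structure as the network used for the original training, so that the pretrained weights can be loaded easily. If the author provides the network definition code and a weight file, we build an identical network and import the weights (the case in this example). If the author instead provides a `.meta` graph file and `.ckpt` weight files, we need to create the network by importing it from the meta file. In short, the structure and the weights must correspond one to one.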
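
One consequence of matching the original structure is the input size: the class that follows flattens `pool5` to `6*6*256` values before `fc6`, which assumes 227x227 inputs. The arithmetic can be checked with a small sketch. `out_size` is a hypothetical helper that applies the usual TensorFlow output-size rules for `'VALID'` and `'SAME'` padding; the layer list is transcribed by hand from `create()`.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pool layer."""
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    return math.ceil(size / stride)  # 'SAME'

# (name, kernel, stride, padding) for the conv and pool layers in create()
layers = [
    ("conv1", 11, 4, "VALID"),
    ("pool1", 3, 2, "VALID"),
    ("conv2", 5, 1, "SAME"),
    ("pool2", 3, 2, "VALID"),
    ("conv3", 3, 1, "SAME"),
    ("conv4", 3, 1, "SAME"),
    ("conv5", 3, 1, "SAME"),
    ("pool5", 3, 2, "VALID"),
]

size = 227
sizes = {}
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size

print(sizes)
print(size * size * 256)  # number of inputs to fc6
```

The local response normalization layers are left out because they do not change the spatial size.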

In [3]:
class AlexNet(object):
    
    def __init__(self, x, keep_prob, num_classes, skip_layer, weights_path = 'DEFAULT'):
    
        # Store the configuration
        self.X = x
        self.NUM_CLASSES = num_classes
        self.KEEP_PROB = keep_prob
        self.SKIP_LAYER = skip_layer # layers that do not load pretrained weights

        if weights_path == 'DEFAULT':
            self.WEIGHTS_PATH = 'bvlc_alexnet.npy' # default weight file path
        else:
            self.WEIGHTS_PATH = weights_path # custom weight file path

        # Build the network
        self.create()

    def create(self):

        # 1st Layer: Conv (w ReLu) -> Lrn -> Pool
        conv1 = conv(self.X, 11, 11, 96, 4, 4, padding = 'VALID', name = 'conv1')
        norm1 = lrn(conv1, 2, 1e-05, 0.75, name = 'norm1')
        pool1 = max_pool(norm1, 3, 3, 2, 2, padding = 'VALID', name = 'pool1')

        # 2nd Layer: Conv (w ReLu) -> Lrn -> Pool, with 2 groups
        conv2 = conv(pool1, 5, 5, 256, 1, 1, groups = 2, name = 'conv2')
        norm2 = lrn(conv2, 2, 1e-05, 0.75, name = 'norm2')
        pool2 = max_pool(norm2, 3, 3, 2, 2, padding = 'VALID', name ='pool2')

        # 3rd Layer: Conv (w ReLu)
        conv3 = conv(pool2, 3, 3, 384, 1, 1, name = 'conv3')

        # 4th Layer: Conv (w ReLu) split into two groups
        conv4 = conv(conv3, 3, 3, 384, 1, 1, groups = 2, name = 'conv4')

        # 5th Layer: Conv (w ReLu) -> Pool, split into two groups
        conv5 = conv(conv4, 3, 3, 256, 1, 1, groups = 2, name = 'conv5')
        pool5 = max_pool(conv5, 3, 3, 2, 2, padding = 'VALID', name = 'pool5')

        # 6th Layer: Flatten -> FC (w ReLu) -> Dropout
        flattened = tf.reshape(pool5, [-1, 6*6*256])
        fc6 = fc(flattened, 6*6*256, 4096, name='fc6')
        dropout6 = dropout(fc6, self.KEEP_PROB)

        # 7th Layer: FC (w ReLu) -> Dropout
        fc7 = fc(dropout6, 4096, 4096, name = 'fc7')
        dropout7 = dropout(fc7, self.KEEP_PROB)

        # 8th Layer: FC and return unscaled activations
        # (for tf.nn.softmax_cross_entropy_with_logits)
        # This layer differs from the original network: it outputs
        # self.NUM_CLASSES (10) units instead of the original 1000
        self.fc8 = fc(dropout7, 4096, self.NUM_CLASSES, relu = False, name='fc8')

    
    def load_initial_weights(self, session):
        """
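        Load the pretrained weights from bvlc_alexnet.npy into every layer
        that should use them.
        """

        # Load the weight file (allow_pickle=True is needed on newer NumPy
        # versions because the file stores a pickled dict)
        weights_dict = np.load(self.WEIGHTS_PATH, encoding = 'bytes', allow_pickle = True).item()
        
        # Loop over all layers in the weight file
        for op_name in weights_dict:
            # Load pretrained weights only for layers that are not in SKIP_LAYER
            if op_name not in self.SKIP_LAYER:
                with tf.variable_scope(op_name, reuse = True):
                    # Assign the stored parameters to the matching variables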
                    for data in weights_dict[op_name]:
                        # Biases
                        if len(data.shape) == 1:
                            var = tf.get_variable('biases', trainable = False)
                            session.run(var.assign(data))
                        # Weights
                        else:
                            var = tf.get_variable('weights', trainable = False)
                            session.run(var.assign(data))
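
Assigning the stored arrays only works if each variable created by the `conv` helper has the same shape as the array stored for it. A sketch of the kernel shapes that `conv` produces, following its `[filter_height, filter_width, input_channels // groups, num_filters]` rule; `conv_kernel_shape` is a hypothetical helper written for this illustration, and the specs are transcribed by hand from `create()`.

```python
def conv_kernel_shape(filter_height, filter_width, input_channels, num_filters, groups=1):
    """Shape of the 'weights' variable that conv() creates for these arguments."""
    if input_channels % groups != 0:
        raise ValueError("input_channels must be divisible by groups")
    return (filter_height, filter_width, input_channels // groups, num_filters)

# (name, filter size, input channels, filters, groups) as used in AlexNet.create()
specs = [
    ("conv1", 11, 3, 96, 1),
    ("conv2", 5, 96, 256, 2),
    ("conv3", 3, 256, 384, 1),
    ("conv4", 3, 384, 384, 2),
    ("conv5", 3, 384, 256, 2),
]
shapes = {name: conv_kernel_shape(k, k, c_in, c_out, g)
          for name, k, c_in, c_out, g in specs}
for name, shape in shapes.items():
    print(name, shape)
```

With two groups, each kernel only sees half of the input channels, which is why the third dimension is halved for `conv2`, `conv4`, and `conv5`.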
            
     

### Loading the CIFAR-10 dataset
The dataset is downloaded automatically on the first load; later loads read it from the local copy.

In [4]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()


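### Channel conversion: RGB -> BGR
The dataset returned by cifar10.load_data() uses RGB channel order, but the source network was trained on images in BGR order, so the first preprocessing step is to convert the channel order.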
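
For a three-channel image, swapping channels 0 and 2 is the same as reversing the channel axis. A minimal sketch on a made-up array (the values are illustrative, not CIFAR-10 data):

```python
import numpy as np

# Tiny stand-in batch: 2 images of 1x1 pixels, channels last (R, G, B)
rgb = np.array([[[[10, 20, 30]]], [[[40, 50, 60]]]], dtype=np.uint8)

# Reverse the last axis to go from RGB to BGR; copy() makes the result contiguous
bgr = rgb[..., ::-1].copy()

print(rgb[0, 0, 0])
print(bgr[0, 0, 0])
```

Applying the same reversal a second time restores the original RGB order.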

In [5]:
x_train[:, :, :, 0], x_train[:, :, :, 2] = x_train[:, :, :, 2], x_train[:, :, :, 0].copy()
x_test[:, :, :, 0], x_test[:, :, :, 2] = x_test[:, :, :, 2], x_test[:, :, :, 0].copy()

- The training set contains 50000 images
- The test set contains 10000 images
- Each image is 32 * 32 * 3

In [6]:
print(x_train.shape)
print(x_test.shape)

(50000, 32, 32, 3)
(10000, 32, 32, 3)


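### Image generator
Applies a series of preprocessing steps: random horizontal flipping, resizing, mean subtraction, and one-hot encoding of the labels.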
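
The per-image steps can be sketched with NumPy alone. This is an illustration on a made-up 2x2 image, not the generator itself: the resize step is left out (the generator uses `cv2.resize`), and the flip is written as a slice, which gives the same result as `cv2.flip(img, 1)`.

```python
import numpy as np

mean = np.array([104., 117., 124.])   # per-channel BGR mean used by the generator
n_classes = 10

# Made-up 2x2 BGR image and label, standing in for one CIFAR-10 sample
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
label = 3

flipped = img[:, ::-1, :]                      # horizontal flip: reverse the width axis
centered = flipped.astype(np.float32) - mean   # mean subtraction, broadcast over pixels

one_hot = np.zeros(n_classes)
one_hot[label] = 1                             # one-hot encoding of the label

print(centered[0, 0])
print(one_hot)
```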

In [7]:
class ImageDataGenerator:
    def __init__(self, x, y, horizontal_flip=False, shuffle=False, 
                 mean = np.array([104., 117., 124.]), scale_size=(227, 227),
                 nb_classes = 10):
                
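        # Store the configuration
        self.horizontal_flip = horizontal_flip # whether to flip images horizontally
        self.n_classes = nb_classes # number of output classes
        self.shuffle = shuffle # whether to shuffle the data
        self.mean = mean # per-channel image mean, used for mean subtraction
        self.scale_size = scale_size # image size after resizing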
        self.pointer = 0
        
        self.images = x 
        self.labels = y
        self.data_size = len(self.labels)
        
        if self.shuffle:
            self.shuffle_data()

        
    def shuffle_data(self):
        """
        Shuffle the data
        """
        images = self.images.copy()
        labels = self.labels.copy()
        self.images = []
        self.labels = []
        
        idx = np.random.permutation(len(labels))
        for i in idx:
            self.images.append(images[i])
            self.labels.append(labels[i])
                
    def reset_pointer(self):
        """
        Reset the pointer to the start of the data
        """
        self.pointer = 0
        
        if self.shuffle:
            self.shuffle_data()
        
    
    def next_batch(self, batch_size):
        """
        Generate one batch of data
        """
        # Take one batch of images and labels
        batch_images = self.images[self.pointer:self.pointer + batch_size]
        batch_labels = self.labels[self.pointer:self.pointer + batch_size]
        
        # Advance the pointer
        self.pointer += batch_size
        
        images = np.ndarray([batch_size, self.scale_size[0], self.scale_size[1], 3])
        
        one_hot_labels = np.zeros((batch_size, self.n_classes))
        for i in range(len(batch_images)):
            img = batch_images[i]
            
            # Randomly flip the image horizontally
            if self.horizontal_flip and np.random.random() < 0.5:
                img = cv2.flip(img, 1)
            
            # Resize the image
            img = cv2.resize(img, (self.scale_size[0], self.scale_size[1]))
            img = img.astype(np.float32)
            
            # Subtract the per-channel mean
            img -= self.mean
                                                                 
            images[i] = img
            
            # One-hot encode the label
            one_hot_labels[i][batch_labels[i]] = 1


        # Return a batch of preprocessed images and their labels
        return images, one_hot_labels



### Training the network
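
The training cell picks the variables to update by comparing the first component of each variable name with `train_layers`. A sketch of that filter on plain strings; the names are written out by hand to mimic TensorFlow's `scope/variable:0` naming rather than read from a real graph.

```python
train_layers = ['fc8']

# Hand-written variable names in TensorFlow's "scope/name:index" form
variable_names = [
    'conv1/weights:0', 'conv1/biases:0',
    'fc7/weights:0', 'fc7/biases:0',
    'fc8/weights:0', 'fc8/biases:0',
]

# Keep a variable only if its top-level scope is one of the layers to train
selected = [name for name in variable_names
            if name.split('/')[0] in train_layers]
print(selected)
```

Because gradients are computed only for the selected variables, every other layer keeps the weights it was given at load time.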

In [8]:
learning_rate = 0.01
num_epochs = 10
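batch_size = 256 # reduce batch_size if you run out of GPU or host memory

dropout_rate = 0.5 # fed to keep_prob, i.e. the probability of keeping a unit
num_classes = 10 # number of output classes of the last layer

# train_layers lists the layers that should be trained
train_layers = ['fc8'] # train only the newly added last layer
#train_layers = ['fc8', 'fc7', 'fc6', 'conv5', 'conv4', 'conv3', 'conv2', 'conv1'] # commented-out alternative kept for a comparison experiment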


# Network input and output
x = tf.placeholder(tf.float32, [batch_size, 227, 227, 3])
y = tf.placeholder(tf.float32, [None, num_classes])
keep_prob = tf.placeholder(tf.float32)

# Initialize the model
model = AlexNet(x, keep_prob, num_classes, train_layers)

# score points to the model output
score = model.fc8

# Collect all variables to be trained
var_list = [v for v in tf.trainable_variables() if v.name.split('/')[0] in train_layers]

# Compute the loss
with tf.name_scope("cross_entropy"):
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = score, labels = y))  

# Train op
with tf.name_scope("train"):
    # Compute the gradients
    gradients = tf.gradients(loss, var_list)
    gradients = list(zip(gradients, var_list))

    # Apply GradientDescentOptimizer to the trainable layers
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.apply_gradients(grads_and_vars=gradients)


# Compute accuracy
with tf.name_scope("accuracy"):
    correct_pred = tf.equal(tf.argmax(score, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


# Initialize the data generators
train_generator = ImageDataGenerator(x_train,y_train, 
                                     horizontal_flip = True, shuffle = True)
val_generator = ImageDataGenerator(x_test,y_test, shuffle = False) 

# Compute the number of steps per epoch
train_batches_per_epoch = np.floor(train_generator.data_size / batch_size).astype(np.int16)
val_batches_per_epoch = np.floor(val_generator.data_size / batch_size).astype(np.int16)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Load the pretrained weights
    model.load_initial_weights(sess)

    print("{} Start training...".format(datetime.now()))

    # Loop over number of epochs
    for epoch in range(num_epochs):

        print("{} Epoch number: {}".format(datetime.now(), epoch+1))
        sys.stdout.flush()
        step = 1
        pbar = tqdm(total=train_batches_per_epoch-1)
    
        while step < train_batches_per_epoch:
            pbar.update(1)
            # Get a batch of images and labels
            batch_xs, batch_ys = train_generator.next_batch(batch_size)

            # And run the training op
            sess.run(train_op, feed_dict={x: batch_xs, 
                                          y: batch_ys, 
                                          keep_prob: dropout_rate})
            step += 1
        pbar.close()
        # Validate the model on the entire validation set
        print("{} Start validation".format(datetime.now()))
        sys.stdout.flush()
        test_acc = 0.
        test_count = 0
        for _ in tqdm(range(val_batches_per_epoch)):
            batch_tx, batch_ty = val_generator.next_batch(batch_size)
            acc = sess.run(accuracy, feed_dict={x: batch_tx, 
                                                y: batch_ty, 
                                                keep_prob: 1.})
            test_acc += acc
            test_count += 1
        test_acc /= test_count
        print("{} Validation Accuracy = {:.4f}".format(datetime.now(), test_acc))

        # Reset the file pointer of the image data generator
        val_generator.reset_pointer()
        train_generator.reset_pointer()



2018-12-03 14:46:38.041832 Start training...
2018-12-03 14:46:38.041919 Epoch number: 1


100%|██████████| 194/194 [01:50<00:00,  1.75it/s]

2018-12-03 14:48:29.331706 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.77it/s]

2018-12-03 14:48:51.337752 Validation Accuracy = 0.6944
2018-12-03 14:48:51.356271 Epoch number: 2



100%|██████████| 194/194 [01:54<00:00,  1.65it/s]

2018-12-03 14:50:46.444825 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.71it/s]

2018-12-03 14:51:09.280474 Validation Accuracy = 0.7313
2018-12-03 14:51:09.301053 Epoch number: 3



100%|██████████| 194/194 [01:54<00:00,  1.70it/s]

2018-12-03 14:53:04.115745 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 14:53:26.224274 Validation Accuracy = 0.6962
2018-12-03 14:53:26.244045 Epoch number: 4



100%|██████████| 194/194 [01:50<00:00,  1.76it/s]

2018-12-03 14:55:17.016173 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.77it/s]

2018-12-03 14:55:39.044961 Validation Accuracy = 0.7341
2018-12-03 14:55:39.064166 Epoch number: 5



100%|██████████| 194/194 [01:49<00:00,  1.78it/s]

2018-12-03 14:57:29.594900 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.77it/s]

2018-12-03 14:57:51.650191 Validation Accuracy = 0.7278
2018-12-03 14:57:51.670206 Epoch number: 6



100%|██████████| 194/194 [01:51<00:00,  1.77it/s]

2018-12-03 14:59:43.300880 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 15:00:05.448166 Validation Accuracy = 0.7042
2018-12-03 15:00:05.467255 Epoch number: 7



100%|██████████| 194/194 [01:50<00:00,  1.71it/s]

2018-12-03 15:01:56.377345 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 15:02:18.498373 Validation Accuracy = 0.7549
2018-12-03 15:02:18.518558 Epoch number: 8



100%|██████████| 194/194 [01:50<00:00,  1.76it/s]

2018-12-03 15:04:09.585243 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 15:04:31.692118 Validation Accuracy = 0.7241
2018-12-03 15:04:31.711660 Epoch number: 9



100%|██████████| 194/194 [01:51<00:00,  1.71it/s]

2018-12-03 15:06:23.521184 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 15:06:45.675578 Validation Accuracy = 0.7322
2018-12-03 15:06:45.694950 Epoch number: 10



100%|██████████| 194/194 [01:51<00:00,  1.76it/s]

2018-12-03 15:08:37.604543 Start validation



100%|██████████| 39/39 [00:22<00:00,  1.76it/s]

2018-12-03 15:08:59.767560 Validation Accuracy = 0.7238
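
The progress-bar totals in the log (194 training steps and 39 validation steps per epoch) follow from the dataset sizes and `batch_size = 256`. A short sketch of the arithmetic; it mirrors the loop bounds in the training cell, where `step` starts at 1 and the loop runs while `step < train_batches_per_epoch`.

```python
train_size, val_size, batch_size = 50000, 10000, 256

# Floor division, matching np.floor(data_size / batch_size) in the training cell
train_batches_per_epoch = train_size // batch_size
val_batches_per_epoch = val_size // batch_size

# The training loop starts at step = 1 and stops before train_batches_per_epoch
train_steps = len(range(1, train_batches_per_epoch))
val_steps = len(range(val_batches_per_epoch))

# Images that are not visited in one pass
train_unused = train_size - train_steps * batch_size
val_unused = val_size - val_steps * batch_size

print(train_batches_per_epoch, train_steps, train_unused)
print(val_batches_per_epoch, val_steps, val_unused)
```

Under these bounds each reported validation accuracy is an average over 39 full batches, that is 9984 of the 10000 test images.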



