Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [29]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [43]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.
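
As a small illustration of the 1-hot encoding described above, the sketch below uses plain Python lists rather than NumPy. The helper name `one_hot` is hypothetical and is not part of the notebook; it only shows what each encoded row looks like.

```python
def one_hot(labels, num_labels):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    return [[1.0 if k == label else 0.0 for k in range(num_labels)]
            for label in labels]


# Three example labels drawn from three classes.
encoded = one_hot([0, 2, 1], num_labels=3)
for row in encoded:
    print(row)
```

The notebook's `reformat` function produces the same kind of rows for all ten classes by comparing `np.arange(num_labels)` against each label.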

In [44]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

def reformat(dataset, labels):
    # Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1).
    dataset = dataset.reshape(
        (-1, image_size, image_size, num_channels)).astype(np.float32)
    # One-hot encode: compare each label against 0..num_labels-1.
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)
Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [32]:
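# Compute accuracy as a percentage. np.argmax(x, 1) returns, for each row, the index of the
# largest value, so accuracy = 100 * (rows where predicted class == true class) / (number of rows).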
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
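
To make the accuracy computation concrete, here is a minimal sketch in plain Python on four made-up predictions. The names `argmax` and `accuracy_percent` and the sample values are assumptions for illustration only; the notebook itself uses the NumPy version above.

```python
def argmax(row):
    """Index of the largest value in a row."""
    return max(range(len(row)), key=lambda k: row[k])


def accuracy_percent(predictions, labels):
    """Percentage of rows whose predicted class matches the 1-hot label."""
    hits = sum(argmax(p) == argmax(t) for p, t in zip(predictions, labels))
    return 100.0 * hits / len(predictions)


# Made-up softmax outputs and 1-hot labels for three classes.
sample_predictions = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]]
sample_labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
print(accuracy_percent(sample_predictions, sample_labels))  # 3 of 4 rows match
```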

In [33]:
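# Convolutional model.
def model_conv(data):
    # tf.nn.conv2d is TensorFlow's 2-D convolution op:
    # tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
    # input: the images to convolve, a 4-D Tensor of shape [batch, in_height, in_width, in_channels],
    #   i.e. [images per batch, image height, image width, image channels].
    #   It must have a floating-point dtype such as float32 or float64.
    # filter: the convolution kernels (the weights), a 4-D Tensor of shape
    #   [filter_height, filter_width, in_channels, out_channels],
    #   i.e. [kernel height, kernel width, input channels, number of kernels], with the same dtype as input.
    #   Its third dimension, in_channels, must equal the fourth dimension of input.
    #   out_channels is the number of feature maps produced (the output depth).
    # strides: a 1-D list of 4 ints, the step of the sliding window along each input dimension.
    #   strides[0] (batch) and strides[3] (depth) are normally 1; strides[1] and strides[2]
    #   are the steps along height and width (2 each in this model).
    # padding: a string, either "SAME" or "VALID", which selects the padding scheme.
    #   VALID never reaches past the image border: output size = ceil((image size - kernel size + 1) / stride).
    #   SAME zero-pads so the kernel can be centred on border pixels: output size = ceil(image size / stride).
    # use_cudnn_on_gpu: bool, whether to use cuDNN acceleration on GPU; defaults to True.
    # The result has shape [batch, out_height, out_width, out_channels].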
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    shape = hidden.get_shape().as_list()
    print(shape)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    print(shape)
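    # Flatten the 4-D activations to 2-D, [batch, height * width * depth], so they can feed
    # the fully connected layers as in the earlier assignments.
    # For reference, conv2d itself works in a similar way: it flattens the filter into a 2-D
    # matrix A of shape [filter_height * filter_width * in_channels, out_channels], extracts
    # image patches into a 4-D tensor B of shape
    # [batch, out_height, out_width, filter_height * filter_width * in_channels], and computes B * A.
    # The filter and input dimensions must therefore agree, or conv2d cannot run.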
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    print(reshape)# 16, 784
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
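
The output sizes quoted in the comments of `model_conv` can be checked with a small helper. This is a sketch of the SAME and VALID size rules only, not of the convolution itself; the function name `conv_output_size` is hypothetical, while the kernel size of 5 and depth of 16 are the `patch_size` and `depth` values used later in the notebook.

```python
import math


def conv_output_size(in_size, kernel_size, stride, padding):
    """Spatial output size of a convolution or pooling window along one dimension."""
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# Two stride-2 SAME convolutions, as in model_conv: 28 -> 14 -> 7.
size = 28
for layer in (1, 2):
    size = conv_output_size(size, kernel_size=5, stride=2, padding='SAME')
    print('after conv', layer, ':', size)

# Flattened width that the first fully connected layer of model_conv receives.
print('flattened:', size * size * 16)
```

With VALID padding the same 5x5 kernel and stride 2 would give 12 instead of 14 after the first layer, because the kernel is not allowed to overlap the border.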

---
Problem 1
---------
The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

In [40]:
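# Max-pooling model.
# Note: this version keeps the stride-2 convolutions and adds one 2x2 max pool after the
# second convolution, so the spatial size goes 28 -> 14 -> 7 -> 4.
def model_maxpool(data):
    # tf.nn.conv2d is TensorFlow's 2-D convolution op:
    # tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
    # input:   the images to convolve, a 4-D Tensor of shape [batch, in_height, in_width, in_channels],
    #          i.e. [images per batch, image height, image width, image channels].
    #          It must have a floating-point dtype such as float32 or float64.
    # filter:  the convolution kernels (the weights), a 4-D Tensor of shape
    #          [filter_height, filter_width, in_channels, out_channels],
    #          i.e. [kernel height, kernel width, input channels, number of kernels], same dtype as input.
    #          Its third dimension, in_channels, must equal the fourth dimension of input.
    #          out_channels is the number of feature maps produced (the output depth).
    # strides: a 1-D list of 4 ints, the step of the sliding window along each input dimension.
    #          strides[0] (batch) and strides[3] (depth) are normally 1; strides[1] and strides[2]
    #          are the steps along height and width (2 each in this model).
    # padding: a string, either "SAME" or "VALID", which selects the padding scheme.
    #          VALID never reaches past the image border: output size = ceil((image size - kernel size + 1) / stride).
    #          SAME zero-pads so the kernel can be centred on border pixels: output size = ceil(image size / stride).
    # use_cudnn_on_gpu: bool, whether to use cuDNN acceleration on GPU; defaults to True.
    # The result has shape [batch, out_height, out_width, out_channels].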
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    shape = hidden.get_shape().as_list()
#     print(shape)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
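    # Max pooling arguments:
    # value:   the input to pool. Pooling usually follows a convolution, so this is normally a
    #          feature map of shape [batch, height, width, channels].
    # ksize:   the pooling window size, a list of 4 ints, usually [1, height, width, 1] because
    #          we do not want to pool across batch or channels.
    # strides: as for convolution, the step of the window along each dimension, usually [1, stride, stride, 1].
    # padding: as for convolution, either 'VALID' or 'SAME'.
    # data_format: a string; 'NHWC' and 'NCHW' are supported.
    conv = tf.nn.max_pool(conv, [1,2,2,1], [1,2,2,1], padding='SAME') # shrinks height and width again (7 -> 4)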
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
#     print(shape)
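    # Flatten the 4-D activations to 2-D, [batch, height * width * depth], so they can feed
    # the fully connected layers as in the earlier assignments.
    # For reference, conv2d itself works in a similar way: it flattens the filter into a 2-D
    # matrix A of shape [filter_height * filter_width * in_channels, out_channels], extracts
    # image patches into a 4-D tensor B of shape
    # [batch, out_height, out_width, filter_height * filter_width * in_channels], and computes B * A.
    # The filter and input dimensions must therefore agree, or conv2d cannot run.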
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    print("hidden.shape", shape[0] , shape[1] , shape[2] , shape[3])
    print("reshape.shape", reshape.shape)
    print("layer3_weights.shape", layer3_weights.shape)
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
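
To show what the 2x2, stride-2 max pool in `model_maxpool` does to a single feature map, here is a minimal sketch in plain Python. The function name `max_pool_2x2_same` is hypothetical. It assumes the behaviour of SAME padding for this window and stride, where an odd-sized map gets one extra row and column at the bottom and right, and padded cells never win the max, so edge windows are simply clipped.

```python
import math


def max_pool_2x2_same(grid):
    """2x2 max pooling with stride 2 and SAME-style clipping at the bottom/right edge."""
    rows, cols = len(grid), len(grid[0])
    out_rows, out_cols = math.ceil(rows / 2), math.ceil(cols / 2)
    pooled = []
    for i in range(out_rows):
        out_row = []
        for j in range(out_cols):
            # Collect the values under the window, dropping cells past the border.
            window = [grid[r][c]
                      for r in range(2 * i, min(2 * i + 2, rows))
                      for c in range(2 * j, min(2 * j + 2, cols))]
            out_row.append(max(window))
        pooled.append(out_row)
    return pooled


small = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(max_pool_2x2_same(small))

# A 7x7 map, the size reached after two stride-2 convolutions, pools down to 4x4.
seven = [[r * 7 + c for c in range(7)] for r in range(7)]
print(len(max_pool_2x2_same(seven)), len(max_pool_2x2_same(seven)[0]))
```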

In [7]:
def model(data):  
        # data (batch, 28, 28, 1)          
        # weights reshaped to (patch_size*patch_size*num_channels, depth)  
        # data reshaped to (batch, 14, 14,  patch_size*patch_size*num_channels)  
        # conv shape (batch, 14, 14, depth)  
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME') # convolution  
        hidden = tf.nn.relu(conv + layer1_biases)  
        # weights shape (patch_size, patch_size, depth, depth)  
        # weights reshaped into (patch_size*patch_size* depth, depth)  
        # hidden reshaped into (batch, 7, 7, patch_size*patch_size* depth)  
        # conv shape (batch, 7, 7, depth)  
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME') # convolution  
        # conv shape (batch, 7, 7, depth)  
        #print('conv1 shape', conv.get_shape().as_list())  
        conv = tf.nn.max_pool(conv, [1,2,2,1], [1,2,2,1], padding='SAME') # strides change dimensions  
        #print('conv2 shape', conv.get_shape().as_list())  
        hidden = tf.nn.relu(conv + layer2_biases)  
        #  hidden shape (batch, 4, 4, depth)  
         
        shape = hidden.get_shape().as_list()  
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])   
        # reshape (batch,4*4*depth)  
        # weights shape( 4 * 4*depth, num_hidden)  
        # hidden shape(batch, num_hidden)  
        #print('reshape shape', reshape.get_shape().as_list())  
        #print('layer3_weights', layer3_weights.get_shape().as_list())  
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)   
        #  return tensor  (batch, num_labels)  
        return tf.matmul(hidden, layer4_weights) + layer4_biases  

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and its number of fully connected nodes.

In [45]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    print("tf_train_dataset.shape=",tf_train_dataset.shape)
    print("tf_valid_dataset.shape=",tf_valid_dataset.shape)
    

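    # Variables: the convolution filters and the fully connected weights.
    # tf.truncated_normal initialises weights from a normal distribution, re-drawing any value
    # that falls more than two standard deviations from the mean.
    # mean: the mean of the normal distribution.
    # stddev: the standard deviation of the normal distribution.
    # seed: the random seed for the pseudo-random number generator that draws the samples.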
    layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # The first dimension must match the flattened 4 x 4 x depth output of model_maxpool (256).
    layer3_weights = tf.Variable(tf.truncated_normal([(image_size // 7) * (image_size // 7) * depth, num_hidden], stddev=0.1))

#     print(layer3_weights.shape)
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))


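    # Training computation.
    # The loss function measures how far the model's prediction f(x) is from the true value Y.
    # It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss,
    # the better the model fits.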
    logits = model_maxpool(tf_train_dataset)
    
#     print(logits.get_shape())# (16, 10)
#     print(tf_train_labels.get_shape()) # (16, 10)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    # Predictions for the training, validation, and test data.
    
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model_maxpool(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model_maxpool(tf_test_dataset))

tf_train_dataset.shape= (16, 28, 28, 1)
tf_valid_dataset.shape= (10000, 28, 28, 1)
hidden.shape 16 4 4 16
reshape.shape (16, 256)
layer3_weights.shape (256, 64)
hidden.shape 10000 4 4 16
reshape.shape (10000, 256)
layer3_weights.shape (256, 64)
hidden.shape 10000 4 4 16
reshape.shape (10000, 256)
layer3_weights.shape (256, 64)
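
The first dimension of `layer3_weights` has to equal the flattened size printed above (256). The sketch below derives that number from the layer sequence instead of hard-coding a divisor; the helper `flattened_size` is hypothetical and assumes three halvings with SAME padding (two stride-2 convolutions and one stride-2 pool), as in `model_maxpool`. It also shows why an unparenthesized expression such as `image_size // 7 * image_size // 7 * depth` is fragile: `//` and `*` are evaluated left to right, so it only matches for some image sizes.

```python
import math


def flattened_size(image_size, depth, num_halvings=3):
    """Width of the flattened feature map after repeated stride-2 SAME layers."""
    size = image_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size * size * depth


depth = 16

# For 28x28 images both forms happen to agree.
image_size = 28
print(flattened_size(image_size, depth), image_size // 7 * image_size // 7 * depth)

# For a hypothetical 30x30 input they differ.
image_size = 30
print(flattened_size(image_size, depth), image_size // 7 * image_size // 7 * depth)
```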


---
Problem 2
---------
Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

In [23]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

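    # Variables.

    # Initial learning rate.
    starter_learning_rate = 0.1
    # Global step counter; together with decay_steps and decay_rate it determines the learning rate.
    global_step = tf.Variable(0, trainable=False)
    # Number of steps per decay period.
    decay_steps = 100
    # Decay factor applied per period.
    decay_rate = 0.5
    # If staircase=True, the learning rate changes only once every decay_steps steps.
    # If False, it is updated at every step.
    staircase = True
    # Exponential decay: start with a relatively large learning rate to reach a reasonably good
    # solution quickly, then shrink it as training continues so the model is more stable late in training.
    # 87.7% with exponential decay alone.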
    learning_rate = tf.train.exponential_decay(starter_learning_rate,global_step,decay_steps,decay_rate,staircase)
    
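    # tf.truncated_normal initialises weights from a normal distribution, re-drawing any value
    # that falls more than two standard deviations from the mean.
    # mean: the mean of the normal distribution.
    # stddev: the standard deviation of the normal distribution.
    # seed: the random seed for the pseudo-random number generator that draws the samples.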
    layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # The first dimension must match the flattened 4 x 4 x depth output of model_maxpool (256).
    layer3_weights = tf.Variable(tf.truncated_normal([(image_size // 7) * (image_size // 7) * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))


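    # Training computation.
    # The loss function measures how far the model's prediction f(x) is from the true value Y.
    # It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss,
    # the better the model fits.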
    logits = model_maxpool(tf_train_dataset)
    
    print(logits.get_shape())# (16, 10)
    print(tf_train_labels.get_shape()) # (16, 10)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

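    # Predictions for the training, validation, and test data.
    # model_maxpool is used here as well: model_maxpool1 is not defined in this notebook.

    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model_maxpool(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model_maxpool(tf_test_dataset))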

[16, 14, 14, 16]
[16, 4, 4, 16]
Tensor("Reshape:0", shape=(16, 256), dtype=float32)
(16, 10)
(16, 10)
[10000, 14, 14, 16]
[10000, 4, 4, 16]
Tensor("Reshape_4:0", shape=(10000, 256), dtype=float32)
[10000, 14, 14, 16]
[10000, 4, 4, 16]
Tensor("Reshape_5:0", shape=(10000, 256), dtype=float32)
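
The learning-rate schedule set up in the cell above follows the formula `starter_learning_rate * decay_rate ^ (global_step / decay_steps)`, with the exponent rounded down to an integer when `staircase` is true. The sketch below reproduces that formula in plain Python so the schedule can be inspected without building a graph; the function is an illustration, not TensorFlow's implementation.

```python
def exponential_decay(starter_learning_rate, global_step, decay_steps,
                      decay_rate, staircase=False):
    """Learning rate after global_step steps of exponential decay."""
    if staircase:
        exponent = global_step // decay_steps  # changes only once per period
    else:
        exponent = global_step / decay_steps   # changes at every step
    return starter_learning_rate * decay_rate ** exponent


# The notebook's settings: 0.1 halved every 100 steps.
for step in (0, 99, 100, 250, 1000):
    print(step, exponential_decay(0.1, step, 100, 0.5, staircase=True))
```

With these settings the rate falls below 0.0001 by step 1000, so a larger `decay_steps` or a `decay_rate` closer to 1 may be worth trying for longer runs.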


In [49]:
num_steps = 1

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        
        print("tf_train_dataset",tf_train_dataset)
        print("tf_train_labels",tf_train_labels)
        
        print("batch_labels.shape",batch_labels.shape)
        
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
tf_train_dataset Tensor("Placeholder:0", shape=(16, 28, 28, 1), dtype=float32)
tf_train_labels Tensor("Placeholder_1:0", shape=(16, 10), dtype=float32)
batch_labels.shape (16, 10)
Minibatch loss at step 0: 3.771455
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Test accuracy: 10.0%
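
The run above takes a single step (`num_steps = 1`), so the 10.0% validation and test accuracy is about what guessing among ten classes would give. For longer runs, the way the training loop picks each minibatch can be sketched as follows; the helper name `minibatch_offset` is hypothetical, and the numbers are the notebook's batch size and training-set size.

```python
def minibatch_offset(step, batch_size, num_examples):
    """Start index of the minibatch used at a given step, wrapping around the data."""
    return (step * batch_size) % (num_examples - batch_size)


batch_size = 16
num_examples = 200000

# Consecutive steps walk through the data in batch-sized strides.
for step in (0, 1, 2, 12498, 12499):
    start = minibatch_offset(step, batch_size, num_examples)
    print('step', step, '-> rows', start, 'to', start + batch_size - 1)
```

Because the modulus is `num_examples - batch_size`, every slice is a full batch, which matters here since the placeholders have a fixed batch dimension of 16.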
