Deep Learning
=============

Assignment 2
------------

In the previous assignment we built a preprocessed dataset. In this assignment we use TensorFlow to train progressively deeper and more accurate models on that dataset.

In [14]:
# Import dependencies
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

First, load the dataset we preprocessed earlier.

In [6]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Next, reformat the dataset so that the array shapes match the input of the model we are about to train:
- data: (N, 28, 28) => (N, 784)
- labels: (N,) => (N, 10), one-hot encoded, e.g. 1 -> [0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]

In [7]:
image_size = 28
num_labels = 10

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
    # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)
print(train_labels[0])

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)
[ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
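
The one-hot conversion in `reformat` relies on NumPy broadcasting. The minimal sketch below isolates that single expression on a toy input; the three-class setup and the sample labels are assumptions chosen only to keep the result small.

```python
import numpy as np

num_labels = 3                  # toy value for illustration; the notebook uses 10
labels = np.array([0, 2, 1])    # hypothetical integer class labels

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (3,).
# Broadcasting compares every label with every class index, which yields a
# (3, 3) boolean matrix with exactly one True per row.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

# Each row is the one-hot vector for the labels 0, 2, 1 in turn.
print(one_hot)
```

Row `i` has its single 1.0 in column `labels[i]`, which is the same layout as the `train_labels[0]` row printed above.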


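We first build a single-layer neural network model trained with plain gradient descent.

The TensorFlow workflow is roughly as follows:
* First, define the computation graph (the inputs, the weight parameters, and the operations on them) inside a block like this:

      with graph.as_default():
          ...

* Then, run the operations defined in the graph as many times as needed by calling `session.run()` inside a block like this:

      with tf.Session(graph=graph) as session:
          ...

Load the data into TensorFlow and build the computation graph for training: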

In [23]:
# Train the model with gradient descent
# To speed up computation, use only the first 10,000 training examples
train_subset = 10000

graph = tf.Graph()
with graph.as_default():

    # Define the inputs
    # Load the training, validation, and test sets
    # Tensorflow API: tf.constant()
    tf_train_dataset = tf.constant(train_dataset[:train_subset])
    tf_train_labels = tf.constant(train_labels[:train_subset])
    
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

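    # Define the weight matrix and the biases
    # These are the parameters we will train. The weight matrix (784x10) is initialized randomly
    # from a truncated normal (Gaussian) distribution, and the biases are initialized to zero.
    # X * W: (1x784) x (784x10) = 1x10
    # Tensorflow API: tf.truncated_normal(), tf.Variable(), tf.zeros()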
    
    weights = tf.Variable(tf.truncated_normal([784, 10]))
    biases = tf.Variable(tf.zeros([10]))

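    # Training computation
    # Multiply the training inputs by the weight matrix and add the biases (a linear operation),
    # then compare the result with the true labels using softmax cross-entropy.
    # The mean of the per-example cross-entropy is the loss.
    # Tensorflow API: tf.matmul(), tf.nn.softmax_cross_entropy_with_logits(), tf.reduce_mean()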
    
    logits = tf.matmul(tf_train_dataset, weights) + biases
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    # Minimize the loss with gradient descent
    # Tensorflow API: tf.train.GradientDescentOptimizer()
    
    optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # Predictions for the training, validation, and test sets
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(tf.matmul(tf_valid_dataset, weights) + biases)
    test_prediction = tf.nn.softmax(tf.matmul(tf_test_dataset, weights) + biases)
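
As a rough illustration of what the graph above computes, the following NumPy sketch performs one gradient-descent step on the same kind of model: a linear map, a softmax, and the mean cross-entropy loss. It is not TensorFlow's implementation. The helper names `softmax` and `loss_and_grads`, the toy shapes, and the random data are all assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def loss_and_grads(X, Y, W, b):
    # X: (N, D) inputs, Y: (N, K) one-hot labels, W: (D, K), b: (K,)
    probs = softmax(X @ W + b)
    # Mean cross-entropy between the one-hot labels and the predicted probabilities.
    loss = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    # Gradient of the mean loss with respect to the logits is (probs - Y) / N.
    dlogits = (probs - Y) / X.shape[0]
    return loss, X.T @ dlogits, dlogits.sum(axis=0)

# Hypothetical toy problem: 20 examples, 5 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)).astype(np.float32)
Y = np.eye(3, dtype=np.float32)[rng.integers(0, 3, size=20)]
W = rng.normal(size=(5, 3)).astype(np.float32)
b = np.zeros(3, dtype=np.float32)

learning_rate = 0.5  # same value as passed to GradientDescentOptimizer above
loss_before, dW, db = loss_and_grads(X, Y, W, b)
W -= learning_rate * dW
b -= learning_rate * db
loss_after, _, _ = loss_and_grads(X, Y, W, b)
print(loss_before, loss_after)
```

In the notebook, TensorFlow derives these gradients automatically from the graph; the sketch only spells out the arithmetic behind a single `optimizer` step.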

Run the computation graph:

In [24]:
num_steps = 801

def accuracy(predictions, labels):
    # numpy.argmax(a, axis=None, out=None): returns the indices of the maximum values along an axis.
    
    # array  = [[1,3,5,7,9],[10,8,6,4,2]]
    # labels = [[0,0,0,0,1],[1,0,0,0,0]]
    
    # np.argmax(array, 1) = [4, 0]
    # np.argmax(labels, 1) = [4, 0]
    # np.argmax(array, 1) == np.argmax(labels, 1): [True, True]
    # np.sum([True, True]) = 2; np.sum([True, False]) = 1
    
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

with tf.Session(graph=graph) as session:
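    # Actually initialize the parameters defined in the graph: the weight matrix and the biases.
    # tf.initialize_all_variables() is deprecated; tf.global_variables_initializer() is its replacement.
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        # Call .run() on the operations we want to evaluate (the optimizer step, the loss, and the
        # training predictions) and get back the loss value and the predictions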
        _, l, predictions = session.run([optimizer, loss, train_prediction])
        if (step % 100 == 0):
            print('Loss at step %d: %f' % (step, l))
            print('Training accuracy: %.1f%%' % accuracy(predictions, train_labels[:train_subset, :]))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized
Loss at step 0: 19.537174
Training accuracy: 8.7%
Validation accuracy: 10.4%
Loss at step 100: 2.363143
Training accuracy: 71.4%
Validation accuracy: 70.4%
Loss at step 200: 1.899808
Training accuracy: 74.3%
Validation accuracy: 72.8%
Loss at step 300: 1.642689
Training accuracy: 75.7%
Validation accuracy: 73.8%
Loss at step 400: 1.468133
Training accuracy: 76.7%
Validation accuracy: 74.2%
Loss at step 500: 1.339121
Training accuracy: 77.2%
Validation accuracy: 74.4%
Loss at step 600: 1.238581
Training accuracy: 78.0%
Validation accuracy: 74.8%
Loss at step 700: 1.157117
Training accuracy: 78.6%
Validation accuracy: 74.9%
Loss at step 800: 1.089345
Training accuracy: 79.1%
Validation accuracy: 74.9%
Test accuracy: 0.0%
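
The `accuracy` helper above compares the arg-max of each prediction row with the arg-max of the matching one-hot label row. The sketch below runs the same function on the worked example from its comments, extended with one deliberately wrong prediction; that third row is an assumption added for illustration.

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows whose predicted class equals the labelled class.
    return 100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]

predictions = np.array([[1, 3, 5, 7, 9],
                        [10, 8, 6, 4, 2],
                        [0.1, 0.7, 0.1, 0.05, 0.05]])  # hypothetical wrong prediction
labels = np.array([[0, 0, 0, 0, 1],
                   [1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0]])

predicted_classes = np.argmax(predictions, 1)  # classes 4, 0, 1
true_classes = np.argmax(labels, 1)            # classes 4, 0, 2
score = accuracy(predictions, labels)          # two of three rows match
print(predicted_classes, true_classes, score)
```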




In [4]:
def weight_variable(shape):
    '''
    Weight matrix.
    @shape: matrix shape, e.g. [10, 5] defines a 10x5 matrix
    '''
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    '''
    Bias vector.
    @shape: vector shape, e.g. [10] defines a vector of length 10
    '''
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def variable_summaries(var):
    '''
    Record summary statistics of a tensor during training so that their evolution can be plotted (e.g. in TensorBoard).
    '''
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(var))
        tf.summary.scalar('min', tf.reduce_min(var))
        tf.summary.histogram('histogram', var)

def define_input(image_size=28, number_labels=10):
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, image_size * image_size], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, number_labels], name='y-input')

    with tf.name_scope('input_reshape'):
        image_shaped_input = tf.reshape(x, [-1, image_size, image_size, 1])
        tf.summary.image('input', image_shaped_input, number_labels)
    return x, y_

def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    with tf.name_scope(layer_name):
        with tf.name_scope('weights'):
            #--------------------------------------------------------------------
            # Create a weight matrix of shape input_dim x output_dim
            
            weights = weight_variable([input_dim, output_dim])
            
            #--------------------------------------------------------------------
            variable_summaries(weights)

        with tf.name_scope('biases'):
            #--------------------------------------------------------------------
            # Create a bias vector of length output_dim
            
            biases = bias_variable([output_dim])
            
            #--------------------------------------------------------------------
            variable_summaries(biases)

        with tf.name_scope('Wx_plus_b'):
            #--------------------------------------------------------------------
            # Linear step: X * W + b (input_tensor times weights, plus biases)
            
            preactivate = tf.matmul(input_tensor, weights) + biases
            
            #--------------------------------------------------------------------
            tf.summary.histogram('pre_activations', preactivate)

        #--------------------------------------------------------------------
        # Apply the activation function act to the linear result

        activations = act(preactivate)

        #--------------------------------------------------------------------
        
        tf.summary.histogram('activations', activations)
    return activations



# Input: N x 784
# Hidden_nodes = 1024(N x 1024) => hidden_weight = 784 x 1024
# output: N x 10 => output_weight = 1024 x 10
def main(learning_rate=0.05, max_steps=3001, batch_size=128):
    sess = tf.InteractiveSession()

    
    #---------------------------------------------------------------------------------
    # Define the training inputs x and the true labels y_ (the image size is 28 and there are 10 labels)
    #

    x, y_ = define_input(28, 10)

    #---------------------------------------------------------------------------------
    
    
    #--------------------------------------------------------------------------------
    # Define the first layer, 'layer1', with 1024 neurons

    hidden1 = nn_layer(x, 784, 1024, 'layer1')

    #--------------------------------------------------------------------------------
    

    #################################################################
    # with tf.name_scope('dropout'):
    #     keep_prob = tf.placeholder(tf.float32)
    #     tf.summary.scalar('dropout_keep_probability', keep_prob)
    #     droped = tf.nn.dropout(hidden1, keep_prob)
    #################################################################

    
    #--------------------------------------------------------------------------------
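    # Define the second layer, 'layer2', with 10 output units and tf.identity as the activation.
    # Its input is the output of layer1 (hidden1, N x 1024), not the raw input x (N x 784).

    y = nn_layer(hidden1, 1024, 10, 'layer2', tf.identity)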

    #--------------------------------------------------------------------------------
    

    with tf.name_scope('cross_entropy'):
        #--------------------------------------------------------------------------------
        # Define the loss: tf.nn.softmax_cross_entropy_with_logits(labels=?, logits=?)
        # Average the per-example cross-entropy: tf.reduce_mean(per_loss)

        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

        #--------------------------------------------------------------------------------
    tf.summary.scalar('cross_entropy', cross_entropy)

    with tf.name_scope('train'):
        #--------------------------------------------------------------------------------
        # Minimize the loss: tf.train.AdamOptimizer(learning_rate).minimize(loss)

        train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

        #--------------------------------------------------------------------------------
        

    with tf.name_scope('accuracy'):
        with tf.name_scope('correct_prediction'):
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        with tf.name_scope('accuracy'):
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter('./summary/train', sess.graph)

    tf.global_variables_initializer().run()

    for step in range(max_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size)]
        batch_labels = train_labels[offset:(offset + batch_size)]
        feed_dict = {x: batch_data, y_: batch_labels}

        if step % 500 == 99:
            run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
            run_metadata = tf.RunMetadata()
            summary, _, acc = sess.run([merged, train_step, accuracy],
                                       feed_dict=feed_dict,
                                       options=run_options,
                                       run_metadata=run_metadata)
            train_writer.add_run_metadata(run_metadata, 'step%03d' % step)
            train_writer.add_summary(summary, step)
            print('Adding run metadata for %s and the accuracy is %s' % (step, acc))
        else:
            #--------------------------------------------------------------------------------
            # Run merged, train_step, and accuracy on the current mini-batch

            summary, _, acc = sess.run([merged, train_step, accuracy], feed_dict=feed_dict)

            #--------------------------------------------------------------------------------

            train_writer.add_summary(summary, step)

            
        if (step % 500 == 0):
            summary, acc = sess.run([merged, accuracy], feed_dict={x: valid_dataset, y_: valid_labels})

            train_writer.add_summary(summary, step)
            print('Accuracy at step %s: %s' % (step, acc))

    summary, acc = sess.run([merged, accuracy], feed_dict={x: test_dataset, y_: test_labels})

    train_writer.add_summary(summary, step + 1)
    print('Total Test Accuracy at step %s: %s' % (step + 1, acc))

    train_writer.close()

main(0.02, 3001, 128)


Accuracy at step 0: 0.3577
Adding run metadata for 99 and the accuracy is 0.773438
Accuracy at step 500: 0.7683
Adding run metadata for 599 and the accuracy is 0.75
Accuracy at step 1000: 0.7868
Adding run metadata for 1099 and the accuracy is 0.734375
Accuracy at step 1500: 0.7895
Adding run metadata for 1599 and the accuracy is 0.820312
Accuracy at step 2000: 0.8029
Adding run metadata for 2099 and the accuracy is 0.820312
Accuracy at step 2500: 0.8036
Adding run metadata for 2599 and the accuracy is 0.867188
Accuracy at step 3000: 0.8143
Total Test Accuracy at step 3001: 0.8859
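
The training loop in `main` picks each mini-batch with `offset = (step * batch_size) % (train_labels.shape[0] - batch_size)`. The short sketch below evaluates that expression on a toy dataset to show how the offset wraps around so that every slice stays inside the data. The helper name `batch_offsets` and the small sizes are assumptions made for the example.

```python
def batch_offsets(num_examples, batch_size, num_steps):
    # Same arithmetic as in the training loop: the modulo keeps
    # offset + batch_size within the bounds of the dataset.
    return [(step * batch_size) % (num_examples - batch_size)
            for step in range(num_steps)]

# Hypothetical toy sizes: 10 examples, batches of 4, 5 training steps.
offsets = batch_offsets(num_examples=10, batch_size=4, num_steps=5)
print(offsets)  # [0, 4, 2, 0, 4]
```

Because the loop never reshuffles the data, the same slices recur once the offset wraps around.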
