![](https://s3.cn-north-1.amazonaws.com.cn/u-img/012a864a-3376-48c5-88be-8ca9fe31b1be)

# 1. Multilayer Neural Networks

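In this lesson you'll learn how to build multilayer neural networks with TensorFlow. As you saw earlier, adding a hidden layer to a network lets it build more complex models, and using a nonlinear activation function in the hidden layer lets it model nonlinear functions.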

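A commonly used nonlinear function is the [ReLU (rectified linear unit)](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)). ReLU returns 0 for every negative input and returns $x$ for every input $x > 0$; in other words, $f(x) = \max(0, x)$.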
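
As a minimal sketch in plain Python (no TensorFlow needed; the helper names `relu` and `relu_vector` are illustrative), ReLU is simply $\max(0, x)$ applied element by element:

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise, i.e. max(0, x)."""
    return max(0.0, x)


def relu_vector(values):
    """Apply ReLU element-wise to a list of numbers."""
    return [relu(v) for v in values]


print(relu_vector([-2.0, -0.5, 0.0, 1.5, 3.0]))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```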

Next, you'll see how to implement a ReLU hidden layer in TensorFlow.

## 1.1 TensorFlow ReLUs

TensorFlow provides the ReLU function as `tf.nn.relu()`, as shown below:

```python
# Hidden layer with ReLU activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer (linear)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)
```

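The code above applies `tf.nn.relu()` to `hidden_layer`, which acts like a switch that turns off negative activations (it sets them to 0). Adding further layers after the activation function, such as the `output` layer, turns the model into a nonlinear function. This nonlinearity is what lets the network solve more complex problems.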
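
Why does the activation matter? Without it, two stacked linear layers collapse into a single linear layer, because $(xW_1)W_2 = x(W_1W_2)$. The plain-Python sketch below (the helpers `matmul` and `relu_matrix` and the small matrices are made up for illustration; biases are omitted for brevity) checks this numerically and shows that inserting ReLU between the layers breaks the equivalence:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


def relu_matrix(m):
    """Apply ReLU element-wise to a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


x = [[1.0, -2.0]]                # one input row with 2 features
w1 = [[1.0, -1.0], [1.0, 1.0]]   # first layer: 2 -> 2
w2 = [[1.0], [1.0]]              # second layer: 2 -> 1

two_linear = matmul(matmul(x, w1), w2)             # (x W1) W2
one_linear = matmul(x, matmul(w1, w2))             # x (W1 W2): one linear layer
with_relu = matmul(relu_matrix(matmul(x, w1)), w2)  # ReLU between the layers

print(two_linear, one_linear, with_relu)  # [[-4.0]] [[-4.0]] [[0.0]]
```

Both purely linear versions give the same result, so the second layer added no expressive power; only the ReLU version behaves differently.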

**Exercise:**

Below, you'll use the ReLU function to turn a linear single-layer network into a nonlinear multilayer network.

![](https://s3.cn-north-1.amazonaws.com.cn/u-img/06fd3fd3-9de8-4dfa-88fb-b806b0810065)

```python
import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print session results
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits))
```

```
[[ 5.11      8.440001]
 [ 0.        0.      ]
 [24.010002 38.239998]]
```
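
These logits can be checked by hand. The plain-Python sketch below (no TensorFlow; the `matmul` helper is illustrative) recomputes the exercise's forward pass. The second input row is entirely negative, so every hidden pre-activation is negative, ReLU zeroes the whole hidden layer, and the logits reduce to the output biases, which are zero here:

```python
hidden_layer_weights = [[0.1, 0.2, 0.4], [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1], [0.8, 0.2, 0.8]]
out_weights = [[0.1, 0.6], [0.2, 0.1], [0.7, 0.9]]
features = [[1.0, 2.0, 3.0, 4.0],
            [-1.0, -2.0, -3.0, -4.0],
            [11.0, 12.0, 13.0, 14.0]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


# The biases in the exercise are all zero, so they are left out here.
hidden = [[max(0.0, v) for v in row]
          for row in matmul(features, hidden_layer_weights)]
logits = matmul(hidden, out_weights)
for row in logits:
    print([round(v, 4) for v in row])
# [5.11, 8.44]
# [0.0, 0.0]
# [24.01, 38.24]
```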


## 1.2 TensorFlow 中的深度神经网络

You've already learned how to build a logistic classifier with TensorFlow. Now you'll learn how to use logistic classifiers to build a deep neural network.

### 1.2.1 Step-by-Step Walkthrough

Next, let's look at how to use TensorFlow to build a classifier for MNIST digits. If you want to run this code on your own computer, the files are [here](https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a61a3a_multilayer-perceptron/multilayer-perceptron.zip). You can find more TensorFlow examples in [Aymeric Damien's GitHub repository](https://github.com/aymericdamien/TensorFlow-Examples).

### 1.2.2 Code

**TensorFlow MNIST**

```python
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)
```

```
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting ./train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting ./train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./t10k-labels-idx1-ubyte.gz
```


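You can use the MNIST dataset that TensorFlow provides; it handles batching and one-hot encoding for you. Note that the `tensorflow.examples.tutorials.mnist` helper is deprecated in TensorFlow 1.x and is not shipped with TensorFlow 2.x.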
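
With `one_hot=True`, each label becomes a length-10 vector with a single 1 at the digit's index. A plain-Python sketch of that encoding (the helper `one_hot` is hypothetical and not part of the TensorFlow API):

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list of floats."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```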

### 1.2.3 Learning Parameters

```python
import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch_size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
```

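The focus here is the architecture of multilayer neural networks, not hyperparameter tuning, so the learning parameters are simply given to you.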
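
These values determine how many weight updates training performs. Assuming the 55,000-image training split that `input_data.read_data_sets` creates by default (it holds out 5,000 of the 60,000 training images for validation), the arithmetic looks like this in plain Python:

```python
import math

num_train_examples = 55000  # assumed size of mnist.train with the default split
batch_size = 128
training_epochs = 20

full_batches = int(num_train_examples / batch_size)       # rounds down (loop in 1.2.9)
all_batches = math.ceil(num_train_examples / batch_size)  # rounds up (loop in 1.3.3)
total_updates = training_epochs * full_batches

print(full_batches, all_batches, total_updates)  # 429 430 8580
```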

### 1.2.4 Hidden Layer Parameters

```python
n_hidden_layer = 256  # number of units (features) in the hidden layer
```

`n_hidden_layer` determines the size of the network's hidden layer, also known as the layer's width.
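
The width directly sets the number of trainable parameters. For the 784-256-10 network built in this section, where each layer has a weight matrix of shape `[inputs, outputs]` plus one bias per output, a quick count in plain Python (the helper `dense_params` is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


n_input, n_hidden_layer, n_classes = 784, 256, 10
hidden_params = dense_params(n_input, n_hidden_layer)    # 200,960
output_params = dense_params(n_hidden_layer, n_classes)  # 2,570
print(hidden_params + output_params)  # 203530
```

Doubling `n_hidden_layer` roughly doubles the parameter count, since almost all of it sits in the first weight matrix.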

### 1.2.5 Weights and Biases

```python
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
```

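A deep neural network has multiple layers, and each layer has its own weights and biases. The `'hidden_layer'` weights and biases belong only to the hidden layer, and the `'out'` weights and biases belong only to the output layer. If the network were deeper, every additional layer would also have its own weights and biases.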
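
For a deeper network the same pattern repeats, one weight/bias pair per layer. A plain-Python sketch of a hypothetical helper that lists the shapes for arbitrary layer sizes (the names `hidden_layer_1`, `hidden_layer_2`, ... are illustrative, not something TensorFlow requires); each shape could then be passed to `tf.random_normal()` as in the cell above:

```python
def layer_shapes(sizes):
    """Map each layer name to (weight_shape, bias_shape) for consecutive sizes."""
    shapes = {}
    n_layers = len(sizes) - 1
    for i in range(n_layers):
        name = 'out' if i == n_layers - 1 else 'hidden_layer_{}'.format(i + 1)
        shapes[name] = ([sizes[i], sizes[i + 1]], [sizes[i + 1]])
    return shapes


for name, (w_shape, b_shape) in layer_shapes([784, 256, 128, 10]).items():
    print(name, w_shape, b_shape)
# hidden_layer_1 [784, 256] [256]
# hidden_layer_2 [256, 128] [128]
# out [128, 10] [10]
```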

### 1.2.6 Input

```python
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])
```

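The MNIST dataset consists of 28px * 28px single-[channel](https://en.wikipedia.org/wiki/Channel_(digital_image%29) images. The `tf.reshape()` function flattens each 28px * 28px matrix into a single row vector of 784 values, turning the input batch `x` of shape `[None, 28, 28, 1]` into `x_flat` of shape `[None, 784]`.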
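
`tf.reshape(x, [-1, n_input])` keeps the batch dimension (`-1` lets TensorFlow infer it) and lays each image's pixels out row by row, so pixel `(row, col)` lands at index `row * 28 + col`. A plain-Python sketch of the same row-major flattening for one image (the helper `flatten` and the test image are made up for illustration):

```python
def flatten(image):
    """Flatten a 28x28x1 image (nested lists) into a 784-element row, row-major."""
    return [pixel[0] for row in image for pixel in row]


# Hypothetical image whose single channel stores its own position as row*100 + col.
image = [[[r * 100 + c] for c in range(28)] for r in range(28)]
flat = flatten(image)

print(len(flat))         # 784
print(flat[2 * 28 + 5])  # 205, i.e. the pixel at row 2, col 5
```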

### 1.2.7 Multilayer Perceptron

![](https://s3.cn-north-1.amazonaws.com.cn/u-img/3dc523b3-ebb6-455d-a496-f2882c66ebe5)

```python
# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),
                 biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
```

You've seen `tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])` before: it is `xW + b`. Combining this linear function with ReLU, followed by the linear output layer, gives a two-layer network.

### 1.2.8 Optimizer

```python
# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)
```

```
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
```

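This is the same optimization technique used in the Intro to TensorFlow lab. As the deprecation warning suggests, later TensorFlow 1.x releases offer `tf.nn.softmax_cross_entropy_with_logits_v2` as the replacement loss function.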
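
`tf.nn.softmax_cross_entropy_with_logits` converts the logits to probabilities with softmax and then takes the cross entropy against the one-hot labels; `tf.reduce_mean` averages the result over the batch. A plain-Python sketch of the per-example computation (illustrative helpers, not TensorFlow's implementation; subtracting the maximum logit is a standard trick to avoid overflow in `exp`):

```python
import math


def softmax(logits):
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy -sum(y * log(p)) between a one-hot label and softmax(logits)."""
    probs = softmax(logits)
    # Terms with y == 0 contribute nothing, so they are skipped.
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


print(softmax_cross_entropy([0.0, 0.0], [1.0, 0.0]))  # ln 2, about 0.6931
print(softmax_cross_entropy([5.0, 0.0], [1.0, 0.0]))  # small: confident and correct
```

The loss is large when the probability assigned to the correct class is small, which is what gradient descent then pushes down.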

### 1.2.9 Session

```python
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
        # Display the loss of the last batch every display_step epochs
        if epoch % display_step == 0:
            print('Epoch {:>2}: cost = {:.4f}'.format(epoch + 1, c))
```

TensorFlow's MNIST library can serve the data in batches. Calling `mnist.train.next_batch()` returns a subset of the training data.
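
`next_batch` hides the bookkeeping of walking through the training set. A simplified plain-Python sketch of the idea (the `MiniBatcher` class is hypothetical; TensorFlow's helper also shuffles between epochs by default, but handles the end of an epoch differently than this sketch does):

```python
import random


class MiniBatcher:
    """Hand out consecutive mini-batches, reshuffling when the data runs out."""

    def __init__(self, features, labels, seed=0):
        self.features = features
        self.labels = labels
        self.order = list(range(len(features)))
        self.rng = random.Random(seed)
        self.position = 0

    def next_batch(self, batch_size):
        if self.position + batch_size > len(self.order):
            # Start a new epoch: reshuffle and restart from the beginning
            # (leftover examples are skipped for simplicity).
            self.rng.shuffle(self.order)
            self.position = 0
        idx = self.order[self.position:self.position + batch_size]
        self.position += batch_size
        return [self.features[i] for i in idx], [self.labels[i] for i in idx]


batcher = MiniBatcher(features=[[float(i)] for i in range(10)], labels=list(range(10)))
for _ in range(3):
    batch_x, batch_y = batcher.next_batch(4)
    print(batch_y)  # [0, 1, 2, 3], then [4, 5, 6, 7], then a shuffled batch
```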

### 1.2.10 Deep Neural Network

![](https://s3.cn-north-1.amazonaws.com.cn/u-img/245d51c5-0167-4f6d-b80e-e71e126ebaca)

That's it! Going from one layer to two is easy. Adding more layers to the network lets you solve more complex problems.

## 1.3 Saving and Restoring TensorFlow Models

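Training a model can take a long time. But once you close your TensorFlow session, you lose all of the trained weights and biases. If you want to reuse the model later, you would have to train it all over again!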

Fortunately, TensorFlow lets you save your progress with a class called `tf.train.Saver`. This class can save any `tf.Variable` to your file system.

### 1.3.1 Saving Variables
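Let's start with a simple example that saves the `weights` and `bias` tensors. In this first example you'll save just two variables; later you'll see how to save all the weights of a real model.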

```python
import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)
```

```
Weights:
[[ 0.9315782   0.28426078 -0.17783403]
 [ 1.10509     0.45378187 -0.22788237]]
Bias:
[-1.1493552   0.08015648 -0.96579224]
```


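The `weights` and `bias` tensors are initialized with random values from `tf.truncated_normal()`. The `tf.train.Saver.save()` function then saves these values to the `save_file` location, "model.ckpt" (the ".ckpt" extension stands for "checkpoint"). In newer TensorFlow versions, "model.ckpt" acts as a filename prefix: the values are written to files such as "model.ckpt.index" and "model.ckpt.data-00000-of-00001".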

If you're using TensorFlow 0.11.0RC1 or newer, a file named "model.ckpt.meta" that contains the TensorFlow graph is also created.

### 1.3.2 Loading Variables

Now that the variables are saved, let's load them into a new model.

```python
# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))
```

Note that you still need to create the `weights` and `bias` tensors in Python. The `tf.train.Saver.restore()` function loads the saved data into `weights` and `bias`.

Because `tf.train.Saver.restore()` sets the values of the TensorFlow variables, you don't need to call `tf.global_variables_initializer()` here.

### 1.3.3 Saving a Trained Model

Let's see how to train a model and save its weights.

Start with a model:

```python
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```

```
Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
```
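
The `correct_prediction`/`accuracy` pair compares the index of the largest logit with the index of the 1 in each one-hot label, then averages the matches. The same computation in plain Python (the helpers `argmax` and `accuracy` and the sample data are illustrative):

```python
def argmax(values):
    """Index of the largest value (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits, one_hot_labels):
    """Fraction of rows whose predicted class matches the labeled class."""
    correct = [argmax(row) == argmax(y) for row, y in zip(logits, one_hot_labels)]
    return sum(correct) / len(correct)


logits = [[2.0, 0.1, -1.0], [0.3, 0.2, 0.9], [0.0, 1.5, 0.2], [1.0, 0.0, 0.0]]
labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(logits, labels))  # 0.75 (the second row is misclassified)
```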


Let's train the model and save the weights:

```python
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')
```

```
Epoch 0   - Validation Accuracy: 0.12160000205039978
Epoch 10  - Validation Accuracy: 0.3003999888896942
Epoch 20  - Validation Accuracy: 0.42719998955726624
Epoch 30  - Validation Accuracy: 0.519599974155426
Epoch 40  - Validation Accuracy: 0.5774000287055969
Epoch 50  - Validation Accuracy: 0.6204000115394592
Epoch 60  - Validation Accuracy: 0.6552000045776367
Epoch 70  - Validation Accuracy: 0.6740000247955322
Epoch 80  - Validation Accuracy: 0.6926000118255615
Epoch 90  - Validation Accuracy: 0.7085999846458435
Trained Model Saved.
```


### 1.3.4 Loading a Trained Model

Let's load the weights and biases from disk and check the accuracy on the test set.

```python
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))
```

```
INFO:tensorflow:Restoring parameters from ./train_model.ckpt
Test Accuracy: 0.7294999957084656
```


That's it! You now know how to save and reload a trained TensorFlow model. In the next section, let's look at how to load weights and biases into a modified model.

## 1.4 Loading Weights and Biases into a New Model

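Often you'll want to adjust, or "fine-tune", a model that you have already trained and saved. However, loading saved variables directly into a modified model can cause errors. Let's look at how to solve this problem.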

### 1.4.1 Naming Error

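TensorFlow uses a string identifier called `name` for tensors and operations. If you don't define a `name`, TensorFlow creates one automatically: it names the first node `<Type>` and subsequent nodes `<Type>_<number>`. Let's see how this affects loading a model whose weights and biases are created in a different order: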
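
The naming rule can be mimicked in a few lines of plain Python. The `NameScope` class below is hypothetical and only illustrates the `<Type>`, `<Type>_1`, `<Type>_2`, ... pattern; it is not TensorFlow's code. TensorFlow additionally appends an output index such as `:0` to form the full tensor name.

```python
class NameScope:
    """Hand out unique names: the first request gets '<Type>', later ones '<Type>_<n>'."""

    def __init__(self):
        self.counts = {}

    def unique_name(self, op_type):
        n = self.counts.get(op_type, 0)
        self.counts[op_type] = n + 1
        return op_type if n == 0 else '{}_{}'.format(op_type, n)


# Creation order decides the names, not the Python variable names.
graph = NameScope()
weights_name = graph.unique_name('Variable')  # created first
bias_name = graph.unique_name('Variable')     # created second
print(weights_name, bias_name)  # Variable Variable_1
```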

```python
import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias (created in the opposite order)
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)
```

The code above produces the following output:

> Save Weights: Variable:0

> Save Bias: Variable_1:0

> Load Weights: Variable_1:0

> Load Bias: Variable:0

> ...

> InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match.

> ...

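Notice that the `name` properties of `weights` and `bias` differ from those in the saved model. This is why the code raises the "Assign requires shapes of both tensors to match" error: `saver.restore(sess, save_file)` tries to load the weight data into `bias` and the bias data into `weights`.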
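
A checkpoint behaves, in effect, like a mapping from variable names to values. The plain-Python analogy below makes that concrete: a dict stands in for the checkpoint file, and `restore` is a hypothetical helper (not the `tf.train.Saver` API) that looks values up by name and checks shapes, so swapping the creation order sends the `[2, 3]` weight values to the `[3]` bias and fails:

```python
def shape(value):
    """Shape of a nested list, e.g. [2, 3] for a 2x3 matrix."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0] if value else None
    return dims


def restore(checkpoint, variables):
    """Return new values for `variables`, looked up by name in `checkpoint`."""
    restored = {}
    for name, current in variables.items():
        saved = checkpoint[name]
        if shape(saved) != shape(current):
            raise ValueError('shape mismatch for {!r}: variable {} vs checkpoint {}'.format(
                name, shape(current), shape(saved)))
        restored[name] = saved
    return restored


# Saved graph: weights were created first, so they got the auto name 'Variable'.
checkpoint = {'Variable': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # weights, shape [2, 3]
              'Variable_1': [0.7, 0.8, 0.9]}                    # bias, shape [3]

# New graph: bias is created first, so the auto-generated names are swapped.
swapped = {'Variable': [0.0, 0.0, 0.0],                         # bias, shape [3]
           'Variable_1': [[0.0] * 3, [0.0] * 3]}                # weights, shape [2, 3]
try:
    restore(checkpoint, swapped)
except ValueError as err:
    print(err)  # shape mismatch for 'Variable': variable [3] vs checkpoint [2, 3]

# Explicit names make the lookup independent of creation order.
named_checkpoint = {'weights_0': checkpoint['Variable'], 'bias_0': checkpoint['Variable_1']}
named = {'bias_0': [0.0] * 3, 'weights_0': [[0.0] * 3, [0.0] * 3]}
print(restore(named_checkpoint, named)['bias_0'])  # [0.7, 0.8, 0.9]
```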

Instead of letting TensorFlow set the `name` property, let's set it manually:

```python
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias (created in the opposite order)
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')
```

> Save Weights: weights_0:0

> Save Bias: bias_0:0

> Load Weights: weights_0:0

> Load Bias: bias_0:0

> Loaded Weights and Bias successfully.

This time it works! The tensor names match, and the data is loaded correctly.

## 1.5 TensorFlow Dropout

![](https://s3.cn-north-1.amazonaws.com.cn/u-img/47666c10-996f-4740-9674-ce54c6f768ae)

Figure 1: From the paper [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)

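Dropout is a regularization technique for reducing overfitting. It temporarily drops units ([neurons](https://en.wikipedia.org/wiki/Artificial_neuron)) from the network, along with all of their incoming and outgoing connections. Figure 1 illustrates how dropout works.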

TensorFlow provides the `tf.nn.dropout()` function, which you can use to implement dropout.

Let's look at an example of using `tf.nn.dropout()`.

```python
keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])
```

The code above shows how to apply dropout in a neural network.

The `tf.nn.dropout()` function takes two parameters:

1. `hidden_layer`: the tensor to which dropout is applied
2. `keep_prob`: the probability that any given unit is kept (i.e., **not** dropped)

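`keep_prob` lets you adjust how many units are dropped. To compensate for the dropped units, `tf.nn.dropout()` multiplies every kept unit (those **not** dropped) by `1/keep_prob`. Later TensorFlow releases deprecate `keep_prob` in favor of a `rate` argument (`rate = 1 - keep_prob`); in TensorFlow 2.x, `rate` is the second positional argument, so the call above would become `tf.nn.dropout(hidden_layer, rate=1 - keep_prob)`.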

During training, a good starting value for `keep_prob` is `0.5`.

During testing, set `keep_prob` to `1.0` to keep all units and get the full capacity of the model.
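
A plain-Python sketch of the behavior described above, assuming each unit is masked independently (the helper `dropout` and its `rng` argument are hypothetical, not TensorFlow's implementation): each unit is kept with probability `keep_prob`, and kept units are scaled by `1 / keep_prob`, so every unit's expected value stays the same.

```python
import random


def dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob; scale survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=rng))  # survivors doubled, others 0.0

# keep_prob=1.0 keeps every unit unchanged, as you want at test time.
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=1.0, rng=rng))  # [1.0, 2.0, 3.0, 4.0]

# Averaged over many draws, each unit's expected value is preserved.
samples = [dropout([2.0], keep_prob=0.5, rng=rng)[0] for _ in range(100000)]
print(sum(samples) / len(samples))  # close to 2.0
```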

**Exercise**

This exercise starts from the ReLU exercise code and adds a dropout layer. Build a model with a ReLU layer and a dropout layer, using a `keep_prob` value of `0.5`, and print the model's logits.

Note: because dropout randomly drops units, the output will differ each time you run the code.

```python
import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))
```

```
[[ 7.6799994  15.059999  ]
 [ 0.71400005  0.91800004]
 [ 9.56        4.78      ]]
```
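
The printed logits can be traced back to particular dropout masks. Before dropout, the three rows of hidden activations are `[5.5, 4.7, 4.7]`, `[0.56, 0.49, 0.51]`, and `[23.6, 23.9, 24.1]`. The output above is consistent with keeping units {0, 2}, {2}, and {1} of those rows respectively, each survivor doubled by `1 / keep_prob = 2`. A plain-Python check of that reading (the masks are inferred from this one run; your own run will print different numbers):

```python
hidden_layer_weights = [[0.1, 0.2, 0.4], [0.4, 0.6, 0.6],
                        [0.5, 0.9, 0.1], [0.8, 0.2, 0.8]]
out_weights = [[0.1, 0.6], [0.2, 0.1], [0.7, 0.9]]
features = [[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]]
keep_prob = 0.5
# Masks inferred from the printed output above (1 = unit kept, 0 = dropped).
masks = [[1, 0, 1], [0, 0, 1], [0, 1, 0]]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row))
             for j in range(len(b[0]))] for row in a]


hidden = [[max(0.0, v) for v in row]
          for row in matmul(features, hidden_layer_weights)]
dropped = [[v * m / keep_prob for v, m in zip(row, mask)]
           for row, mask in zip(hidden, masks)]
logits = matmul(dropped, out_weights)
for row in logits:
    print([round(v, 3) for v in row])
# [7.68, 15.06]
# [0.714, 0.918]
# [9.56, 4.78]
```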
