# Deep Neural Networks


## 1. TensorFlow ReLUs

TensorFlow provides the ReLU function [`tf.nn.relu()`](https://www.tensorflow.org/api_docs/python/tf/nn/relu), used as shown below:

In [None]:
# Hidden layer with ReLU as the activation function
hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)

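The code above applies [`tf.nn.relu()`](https://www.tensorflow.org/api_docs/python/tf/nn/relu) to the hidden layer, where it acts like a switch that turns off negative values (any negative pre-activation becomes 0). Adding further layers after the activation function, such as the output layer, turns the model into a nonlinear function. This nonlinearity lets the network solve more complex problems.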
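
To make the "switch" behavior concrete, here is a minimal pure-Python sketch (no TensorFlow). The helper names and the scalar values `w1`, `b1`, `w2`, `b2` are made up for illustration. It shows that two stacked linear layers are still one straight line, while inserting a ReLU between them bends the function.

```python
def relu(values):
    """Element-wise ReLU: negative entries become 0, the rest pass through."""
    return [max(0.0, v) for v in values]


def linear(x, w, b):
    """Scalar linear layer: w * x + b."""
    return w * x + b


# Hypothetical scalar parameters, chosen only for this illustration.
w1, b1, w2, b2 = 2.0, -1.0, 3.0, 0.5


def stacked_linear(x):
    # w2 * (w1 * x + b1) + b2 collapses to (w2 * w1) * x + (w2 * b1 + b2).
    return linear(linear(x, w1, b1), w2, b2)


def stacked_with_relu(x):
    # The ReLU zeroes the hidden value whenever w1 * x + b1 is negative.
    hidden = relu([linear(x, w1, b1)])[0]
    return linear(hidden, w2, b2)


print(relu([-2.0, -0.5, 0.0, 1.5]))  # [0.0, 0.0, 0.0, 1.5]
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, stacked_linear(x), stacked_with_relu(x))
```

For inputs where the hidden value is positive the two functions agree; for the others the ReLU version stays flat at `b2`, which a single linear layer cannot do.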

### Exercise

Below, you will use the ReLU function to turn a linear single-layer network into a nonlinear multilayer network.

![](http://upload-images.jianshu.io/upload_images/1791718-6529fcaaa6eab653.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


**`quiz.py`**

In [1]:
import tensorflow as tf

output = None
hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model
hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
output = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print session results
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output))

[[  5.11000013   8.44000053]
 [  0.           0.        ]
 [ 24.01000214  38.23999786]]
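
The printed result can be checked by hand. The sketch below repeats the same forward pass with plain Python lists instead of TensorFlow; `matmul` and `relu` are small helpers written for this check, not TensorFlow functions. The biases are all zero, so they are omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(matrix):
    """Element-wise ReLU on a list-of-rows matrix."""
    return [[max(0.0, v) for v in row] for row in matrix]


hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
features = [
    [1.0, 2.0, 3.0, 4.0],
    [-1.0, -2.0, -3.0, -4.0],
    [11.0, 12.0, 13.0, 14.0]]

hidden = relu(matmul(features, hidden_layer_weights))
output = matmul(hidden, out_weights)
for row in output:
    print([round(v, 2) for v in row])
```

The second row is all zeros because every hidden value for the all-negative input is negative, so the ReLU switches the whole row off before the output layer.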


## 2. Deep Neural Networks in TensorFlow
![](http://upload-images.jianshu.io/upload_images/1791718-8616451a588e2b0b.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
### TensorFlow MNIST

```python
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

```

You can use the MNIST dataset that ships with TensorFlow; it handles batching and one-hot encoding for you.

### Learning Parameters

```python
import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
```

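The focus here is the architecture of a multilayer neural network, not parameter tuning, so the learning parameters are simply given to you.

### Hidden Layer Parameters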

```python
n_hidden_layer = 256 # layer number of features

```

`n_hidden_layer` determines the size of the hidden layer in the neural network. This is also known as the width of the layer.

### Weights and Biases

```python
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

```

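A deep neural network has multiple layers, each with its own weights and biases. The `'hidden_layer'` weights and biases belong only to the hidden layer, and the `'out'` weights and biases belong only to the output layer. If the network were deeper, every additional layer would have its own weights and biases as well.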
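
As a rough illustration of what "each layer has its own weights and biases" means in terms of size, the sketch below counts the trainable values implied by the shapes above. `dense_param_count` is a helper invented for this example.

```python
n_input, n_hidden_layer, n_classes = 784, 256, 10


def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


hidden_params = dense_param_count(n_input, n_hidden_layer)
out_params = dense_param_count(n_hidden_layer, n_classes)
print(hidden_params, out_params, hidden_params + out_params)
# 200960 2570 203530
```

Almost all of the parameters sit in the hidden layer, because its weight matrix has one row per input pixel.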

### Input

```python
# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

```

The MNIST dataset consists of 28px by 28px single-[channel](https://en.wikipedia.org/wiki/Channel_(digital_image%29) images. The `tf.reshape()` function reshapes each 28px by 28px matrix in `x` into a vector of 784 values, `x_flat`.
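
The reshape is a row-major flattening. The following pure-Python sketch mimics it for a single image; `flatten_image` and the synthetic test image are illustrative stand-ins, not part of TensorFlow or MNIST.

```python
def flatten_image(image):
    """Flatten a height x width x channels nested list in row-major order."""
    return [value for row in image for pixel in row for value in pixel]


height, width, channels = 28, 28, 1
# Hypothetical test image: each pixel stores its own row-major index.
image = [[[float(r * width + c)] for c in range(width)] for r in range(height)]

flat = flatten_image(image)
print(len(flat))            # 784
print(flat[3 * width + 5])  # 89.0, the pixel at row 3, column 5
```

The pixel at row `r`, column `c` ends up at index `r * 28 + c` of the flattened vector.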

### Multilayer Perceptron

![](http://upload-images.jianshu.io/upload_images/1791718-594ec128e107a119.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

```python
# Hidden layer with ReLU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']),\
    biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.add(tf.matmul(layer_1, weights['out']), biases['out'])
```

You have seen `tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])` before; it is `xw + b`. Combining linear functions with a ReLU gives you a two-layer network.

### Optimizer

```python
# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)
```

This is the same optimization technique used in the Intro to TensorFlow lab.
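
For intuition about what the cost measures, here is a pure-Python sketch of softmax cross-entropy averaged over a batch. It is a simplified re-implementation for illustration, not TensorFlow's code, and the logits and labels are made-up values.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot_label):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


def mean_cost(batch_logits, batch_labels):
    """Average the per-example losses, mirroring the role of tf.reduce_mean."""
    losses = [cross_entropy(softmax(logits), label)
              for logits, label in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


# Hypothetical logits for two examples with three classes.
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [[1, 0, 0], [0, 1, 0]]
print(softmax(batch_logits[0]))
print(mean_cost(batch_logits, batch_labels))
```

The cost shrinks toward 0 as the probability assigned to the correct class approaches 1, which is what gradient descent pushes the weights toward.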

### Session

```python
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
```

The MNIST library in TensorFlow can deliver the data in batches. Calling `mnist.train.next_batch()` returns a subset of the training data.
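
The sketch below illustrates the batch arithmetic with a plain sequential batcher. It is not how `next_batch()` is implemented, and the training-set size of 55000 is an assumption about the default MNIST split.

```python
import math


def iterate_batches(examples, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


num_examples = 55000  # assumed size of the MNIST training split
batch_size = 128
examples = list(range(num_examples))  # stand-in for real images

sizes = [len(batch) for batch in iterate_batches(examples, batch_size)]
print(int(num_examples / batch_size))        # 429 full batches
print(math.ceil(num_examples / batch_size))  # 430 batches including the partial one
print(sizes[-1])                             # 88 examples in the last batch
```

Truncating with `int(...)` runs only the full batches per epoch, whereas rounding up with `math.ceil(...)` also covers the final partial batch.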

**The complete code is as follows**

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

n_hidden_layer = 256 # layer number of features

# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# tf Graph input
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])

# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Display logs per epoch step
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(c))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size], y: mnist.test.labels[:test_size]}))


Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 56.936542511
Epoch: 0002 cost= 29.456695557
Epoch: 0003 cost= 20.940849304
Epoch: 0004 cost= 20.819250107
Epoch: 0005 cost= 13.877964973
Epoch: 0006 cost= 14.417156219
Epoch: 0007 cost= 13.922370911
Epoch: 0008 cost= 12.143887520
Epoch: 0009 cost= 9.431661606
Epoch: 0010 cost= 8.265071869
Epoch: 0011 cost= 8.060132027
Epoch: 0012 cost= 9.156995773
Epoch: 0013 cost= 5.838820457
Epoch: 0014 cost= 9.550095558
Epoch: 0015 cost= 7.832756042
Epoch: 0016 cost= 4.661233425
Epoch: 0017 cost= 6.167332649
Epoch: 0018 cost= 7.459166050
Epoch: 0019 cost= 4.983653069
Epoch: 0020 cost= 4.577014446
Optimization Finished!
Accuracy: 0.824219


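## 3. Saving and Restoring TensorFlow Models

Training a model can take a long time. But once you close your TensorFlow session, all the trained weights and biases are lost. If you want to reuse the model later, you have to train it all over again!

Fortunately, TensorFlow lets you save your progress using a class called `tf.train.Saver`. This class can write any `tf.Variable` to your file system.

### Saving Variables

Let's start with a simple example that saves the `weights` and `bias` Tensors. This first example saves just two variables; later you will see how to save all the weights of a practical model.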


In [8]:
import tensorflow as tf

# The file path to save the data
save_file = './model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Initialize all the Variables
    sess.run(tf.global_variables_initializer())

    # Show the values of weights and bias
    print('Weights:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

    # Save the model
    saver.save(sess, save_file)

Weights:
[[-0.28352487  0.47375432 -0.61708784]
 [-0.53866094 -0.85328549 -0.30711174]]
Bias:
[ 1.33706081 -0.25312975 -0.38354632]


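The `weights` and `bias` Tensors are set to random values using the [tf.truncated_normal()](https://www.tensorflow.org/api_docs/python/tf/truncated_normal) function. The [tf.train.Saver.save()](https://www.tensorflow.org/api_docs/python/tf/train/Saver#save) function then saves these values to the `save_file` location, "model.ckpt" (the ".ckpt" extension stands for "checkpoint").

If you are using TensorFlow 0.11.0RC1 or newer, a file called "model.ckpt.meta" is also created. This file contains the TensorFlow graph.

### Loading Variables

Now that the variables are saved, let's load them back into a new model.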


In [9]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

# Class used to save and/or restore Tensor Variables
saver = tf.train.Saver()

with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)

    # Show the values of weights and bias
    print('Weight:')
    print(sess.run(weights))
    print('Bias:')
    print(sess.run(bias))

INFO:tensorflow:Restoring parameters from ./model.ckpt
Weight:
[[-1.18505168  0.66321963  0.82311398]
 [ 0.83256513 -1.52581465  0.88546741]]
Bias:
[-1.37767291 -0.73833025  0.514498  ]


Note that you still need to create the `weights` and `bias` Tensors in Python.

The [tf.train.Saver.restore()](https://www.tensorflow.org/api_docs/python/tf/train/Saver#restore) function loads the saved data into `weights` and `bias`.

Because [tf.train.Saver.restore()](https://www.tensorflow.org/api_docs/python/tf/train/Saver#restore) sets all the TensorFlow Variables, you don't need to call [tf.global_variables_initializer()](https://www.tensorflow.org/api_docs/python/tf/global_variables_initializer).

### Saving a Trained Model
Let's see how to train a model and save its weights.

First, start with a model:


In [10]:
# Remove previous Tensors and Operations
tf.reset_default_graph()

from tensorflow.examples.tutorials.mnist import input_data
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('.', one_hot=True)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz
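
The accuracy op defined above compares the index of the largest logit with the index of the 1 in the one-hot label, then averages the matches. A pure-Python sketch of the same idea, with made-up logits and labels, looks like this:

```python
def argmax(values):
    """Index of the largest entry (the first one if there are ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(batch_logits, batch_one_hot_labels):
    """Fraction of examples whose predicted class matches the labeled class."""
    matches = [argmax(logits) == argmax(label)
               for logits, label in zip(batch_logits, batch_one_hot_labels)]
    return sum(matches) / len(matches)


# Hypothetical values: four examples, three classes.
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [2.0, 3.0, 1.0]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
print(accuracy(logits, labels))  # 0.75
```

Only the third example is misclassified (predicted class 2, labeled class 1), so three out of four predictions are correct.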


Let's train the model and save its weights:


In [11]:
import math

save_file = './train_model.ckpt'
batch_size = 128
n_epochs = 100

saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(n_epochs):
        total_batch = math.ceil(mnist.train.num_examples / batch_size)

        # Loop over all batches
        for i in range(total_batch):
            batch_features, batch_labels = mnist.train.next_batch(batch_size)
            sess.run(
                optimizer,
                feed_dict={features: batch_features, labels: batch_labels})

        # Print status for every 10 epochs
        if epoch % 10 == 0:
            valid_accuracy = sess.run(
                accuracy,
                feed_dict={
                    features: mnist.validation.images,
                    labels: mnist.validation.labels})
            print('Epoch {:<3} - Validation Accuracy: {}'.format(
                epoch,
                valid_accuracy))

    # Save the model
    saver.save(sess, save_file)
    print('Trained Model Saved.')

Epoch 0   - Validation Accuracy: 0.12680000066757202
Epoch 10  - Validation Accuracy: 0.24699999392032623
Epoch 20  - Validation Accuracy: 0.40880000591278076
Epoch 30  - Validation Accuracy: 0.5098000168800354
Epoch 40  - Validation Accuracy: 0.5694000124931335
Epoch 50  - Validation Accuracy: 0.6110000014305115
Epoch 60  - Validation Accuracy: 0.6430000066757202
Epoch 70  - Validation Accuracy: 0.6686000227928162
Epoch 80  - Validation Accuracy: 0.6877999901771545
Epoch 90  - Validation Accuracy: 0.7039999961853027
Trained Model Saved.


### Loading a Trained Model
Let's load the weights and bias from disk and verify the test accuracy.


In [12]:
saver = tf.train.Saver()

# Launch the graph
with tf.Session() as sess:
    saver.restore(sess, save_file)

    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: mnist.test.images, labels: mnist.test.labels})

print('Test Accuracy: {}'.format(test_accuracy))

INFO:tensorflow:Restoring parameters from ./train_model.ckpt
Test Accuracy: 0.7186999917030334


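## 4. Fine-tuning

### Loading Weights and Biases into a New Model
Often you will want to adjust, or "fine-tune", a model that you have already trained and saved. However, loading saved Variables directly into a modified model can produce errors. Let's see how to fix this problem.

### Naming Error
TensorFlow uses a string identifier called `name` for Tensors and Operations. If a name is not given, TensorFlow creates one automatically. It names the first node `<Type>` and each subsequent one `<Type>_<number>`. Let's see how this affects loading a model whose weights and bias are created in a different order: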

In [13]:
import tensorflow as tf

# Remove the previous weights and bias
tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]))
bias = tf.Variable(tf.truncated_normal([3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]))
weights = tf.Variable(tf.truncated_normal([2, 3]))

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - ERROR
    saver.restore(sess, save_file)

Save Weights: Variable:0
Save Bias: Variable_1:0
Load Weights: Variable_1:0
Load Bias: Variable:0
INFO:tensorflow:Restoring parameters from model.ckpt


InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2,3] rhs shape= [3]
	 [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_1, save/RestoreV2_1)]]

Caused by op 'save/Assign_1', defined at:
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
    ioloop.IOLoop.instance().start()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
    handler_func(fd_obj, events)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
    user_expressions, allow_stdin)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-13b2e58d5f90>", line 34, in <module>
    saver = tf.train.Saver()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 419, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 155, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 274, in assign
    validate_shape=validate_shape)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 43, in assign
    use_locking=use_locking, name=name)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/nfdw/anaconda2/envs/dlnd-tf-lab/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2,3] rhs shape= [3]
	 [[Node: save/Assign_1 = Assign[T=DT_FLOAT, _class=["loc:@Variable_1"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Variable_1, save/RestoreV2_1)]]


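Notice that the `name` properties of `weights` and `bias` differ from when you saved the model. This is why the code raises the "Assign requires shapes of both tensors to match" error. The call `saver.restore(sess, save_file)` tries to load the weight data into `bias` and the bias data into `weights`.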
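
The mechanism can be imitated without TensorFlow. The toy model below is only an analogy: `ToyGraph` and `restore` are invented for this sketch and do not reflect the real `tf.train.Saver` internals. Variables are auto-named in creation order, and restoring matches checkpoint entries by name.

```python
class ToyGraph:
    """Toy stand-in for a graph that auto-names unnamed variables."""

    def __init__(self):
        self.shapes = {}  # unique name -> shape

    def variable(self, shape, name=None):
        base = name or "Variable"
        unique, suffix = base, 0
        while unique in self.shapes:
            suffix += 1
            unique = "{}_{}".format(base, suffix)
        self.shapes[unique] = tuple(shape)
        return unique


def restore(graph, checkpoint):
    """Match each variable to a checkpoint entry by name, then check shapes."""
    for name, shape in graph.shapes.items():
        if checkpoint[name] != shape:
            raise ValueError("shape mismatch for {}: graph {} vs checkpoint {}".format(
                name, shape, checkpoint[name]))


# Save: weights first, bias second.
saved = ToyGraph()
print(saved.variable([2, 3]))  # Variable
print(saved.variable([3]))     # Variable_1
checkpoint = dict(saved.shapes)

# Load: bias first, so the automatic names are swapped.
swapped = ToyGraph()
swapped.variable([3])
swapped.variable([2, 3])
try:
    restore(swapped, checkpoint)
except ValueError as err:
    print("restore failed:", err)

# Explicit names make the creation order irrelevant.
named_saved = ToyGraph()
named_saved.variable([2, 3], name="weights_0")
named_saved.variable([3], name="bias_0")
named_loaded = ToyGraph()
named_loaded.variable([3], name="bias_0")
named_loaded.variable([2, 3], name="weights_0")
restore(named_loaded, dict(named_saved.shapes))
print("restore succeeded")
```

In the swapped case the `[3]`-shaped variable is looked up under a name that holds a `[2, 3]` entry in the checkpoint, which is the same kind of mismatch the error message above reports.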

Instead of letting TensorFlow set the `name` property, let's set it manually:

In [14]:
import tensorflow as tf

tf.reset_default_graph()

save_file = 'model.ckpt'

# Two Tensor Variables: weights and bias
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Save Weights: {}'.format(weights.name))
print('Save Bias: {}'.format(bias.name))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)

# Remove the previous weights and bias
tf.reset_default_graph()

# Two Variables: weights and bias
bias = tf.Variable(tf.truncated_normal([3]), name='bias_0')
weights = tf.Variable(tf.truncated_normal([2, 3]), name='weights_0')

saver = tf.train.Saver()

# Print the name of Weights and Bias
print('Load Weights: {}'.format(weights.name))
print('Load Bias: {}'.format(bias.name))

with tf.Session() as sess:
    # Load the weights and bias - No Error
    saver.restore(sess, save_file)

print('Loaded Weights and Bias successfully.')

Save Weights: weights_0:0
Save Bias: bias_0:0
Load Weights: weights_0:0
Load Bias: bias_0:0
INFO:tensorflow:Restoring parameters from model.ckpt
Loaded Weights and Bias successfully.


## 5. Regularization
![](http://upload-images.jianshu.io/upload_images/1791718-6a4839a43624f6f0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

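The network cannot rely on any single neuron, because any of them may be randomly dropped out.

Dropout therefore forces the network to learn redundant representations, so that all the information can still be expressed correctly when some neurons are deactivated.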

![](http://upload-images.jianshu.io/upload_images/1791718-b1432141da54bbc8.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

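It is like a pair of skinny jeans: a perfect fit, but hard to get into. That is why people tend to prefer slightly looser jeans.

Neural networks are the same: learning more redundant representations does not fit the current data quite as perfectly, but it makes the network more robust and prevents overfitting.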

![](http://upload-images.jianshu.io/upload_images/1791718-c8576c45ee1e623d.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)


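We train the network with Dropout, but at validation time we clearly no longer want this randomness.

Instead, we take the average of the activations to get a consensus estimate.

As shown in the figure, $Y_e$ is the average of all the $Y_t$ obtained during training.

There is a small trick during training:
replace the dropped activations with 0 and scale the remaining activations by a factor of 2 (the factor that matches dropping half of the units).
This way the mean does not change.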
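
One way to see why the mean is preserved is to write out the expectation for a single activation. The symbols here are notation introduced for this sketch, not taken from the figure: $a$ is the original activation, $p$ is the probability of keeping it, and $\tilde{a}$ is the activation after dropout and rescaling by $1/p$.

$$
\mathbb{E}[\tilde{a}] = p \cdot \frac{a}{p} + (1 - p) \cdot 0 = a
$$

With $p = 0.5$ the rescaling factor $1/p$ is exactly 2, which is the doubling described above.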





## 6. TensorFlow Dropout

![](http://upload-images.jianshu.io/upload_images/1791718-4a1a2d0cde6c7a4e.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

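Dropout is a regularization technique for reducing overfitting. It temporarily drops units ([artificial neurons](https://en.wikipedia.org/wiki/Artificial_neuron)) from the network, along with all of their incoming and outgoing connections. The figure above illustrates how dropout works.

TensorFlow provides the [tf.nn.dropout()](https://www.tensorflow.org/api_docs/python/tf/nn/dropout) function, which you can use to implement dropout.

Let's look at an example of how to use [tf.nn.dropout()](https://www.tensorflow.org/api_docs/python/tf/nn/dropout).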

In [None]:
keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])


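The code above shows how to apply dropout in a neural network.

The [tf.nn.dropout()](https://www.tensorflow.org/api_docs/python/tf/nn/dropout) function takes two parameters:
1. `hidden_layer`: the tensor to which you want to apply dropout
2. `keep_prob`: the probability of keeping (i.e. **not** dropping) any given unit

`keep_prob` lets you adjust the number of units to drop. To compensate for the dropped units, [tf.nn.dropout()](https://www.tensorflow.org/api_docs/python/tf/nn/dropout) multiplies all the units that are kept (i.e. **not** dropped) by `1/keep_prob`.

During training, a good starting value for `keep_prob` is 0.5.
During testing, use a `keep_prob` value of 1.0 to keep all units and maximize the power of the model.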
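
The effect of the two `keep_prob` settings can be sketched in pure Python. The `dropout` function below is a simplified stand-in written for this illustration, not the TensorFlow implementation, and the seed and the all-ones activations are arbitrary choices.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale it by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 100000

train_out = dropout(activations, 0.5, rng)  # training-style setting
test_out = dropout(activations, 1.0, rng)   # testing-style setting

print(sorted(set(train_out)))           # [0.0, 2.0]
print(sum(train_out) / len(train_out))  # close to 1.0
print(test_out == activations)          # True
```

With `keep_prob` at 0.5 every surviving unit is doubled, so the average stays near the original value; with `keep_prob` at 1.0 nothing is dropped or rescaled, so the layer passes its input through unchanged.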

### Exercise 1

Take a look at the code below. What's wrong with it?

The syntax is correct, but the test accuracy is very low.

In [None]:
...

keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

...

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i in range(batches):
            ....

            sess.run(optimizer, feed_dict={
                features: batch_features,
                labels: batch_labels,
                keep_prob: 0.5})

    validation_accuracy = sess.run(accuracy, feed_dict={
        features: test_features,
        labels: test_labels,
        keep_prob: 0.5})


The problem is that `keep_prob` should not be fed 0.5 when evaluating validation accuracy; it should be set to 1.0 so that no units are dropped.

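### Exercise 2

This exercise builds on the code from the ReLU exercise and adds a dropout layer. Build a model with a ReLU layer and a dropout layer, using a `keep_prob` value of 0.5. Print the logits from the model.

Note: because dropout is random, the output will differ each time the code is run.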

#### `quiz.py`

In [18]:

import tensorflow as tf

hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]

# Weights and biases
weights = [
    tf.Variable(hidden_layer_weights),
    tf.Variable(out_weights)]
biases = [
    tf.Variable(tf.zeros(3)),
    tf.Variable(tf.zeros(2))]

# Input
features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

# TODO: Create Model with Dropout
keep_prob = tf.placeholder(tf.float32)  # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

# TODO: Print logits from a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits, feed_dict={keep_prob: 0.5}))


[[  2.98000002   7.54000044]
 [  0.82600003   1.59000015]
 [  4.71999979  28.31999969]]
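
The printed logits above can be reproduced by hand for one particular dropout mask. In the sketch below the mask is inferred from the printed values (it is an assumption, not something the run logged), and `matmul` is a small helper written for this check.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


hidden_layer_weights = [
    [0.1, 0.2, 0.4],
    [0.4, 0.6, 0.6],
    [0.5, 0.9, 0.1],
    [0.8, 0.2, 0.8]]
out_weights = [
    [0.1, 0.6],
    [0.2, 0.1],
    [0.7, 0.9]]
features = [
    [0.0, 2.0, 3.0, 4.0],
    [0.1, 0.2, 0.3, 0.4],
    [11.0, 12.0, 13.0, 14.0]]

keep_prob = 0.5
# Assumed mask (1 = kept, 0 = dropped), inferred from the printed logits.
mask = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]

hidden = [[max(0.0, v) for v in row] for row in matmul(features, hidden_layer_weights)]
dropped = [[h * m / keep_prob for h, m in zip(h_row, m_row)]
           for h_row, m_row in zip(hidden, mask)]
logits = matmul(dropped, out_weights)
for row in logits:
    print([round(v, 3) for v in row])
```

Each kept hidden unit is multiplied by `1 / keep_prob`, that is by 2, before the output layer. A different random mask would give different logits, which is why repeated runs of the exercise disagree.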
