In [1]:
import tensorflow as tf

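## About Optimizers
TensorFlow computes gradients automatically and then updates the parameters, all in a single line of code: `tf.train.GradientDescentOptimizer(learning_rate=1e-3).minimize(loss)`. Below we break that line apart and walk through each step.
### Automatic Gradients
The optimizer itself is defined by the first half of that line, `opt = tf.train.GradientDescentOptimizer(learning_rate)`. Once it is defined, `grads_and_vars = opt.compute_gradients(loss, <list of variables>)` computes the gradient of the loss with respect to every variable in the list. The result, `grads_and_vars`, is a list of tuples, each of the form `(gradient, variable)`. We can unpack the pairs with `get_grads_and_vars = [(gv[0], gv[1]) for gv in grads_and_vars]` and then call `opt.apply_gradients(get_grads_and_vars)` to update the variables. A small example follows.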

In [2]:

x = tf.Variable(5, dtype=tf.float32)
y = tf.Variable(3, dtype=tf.float32)

z = x**2 + x * y + 3

sess = tf.Session()
# initialize variable
sess.run(tf.global_variables_initializer())

# define optimizer
optimizer = tf.train.GradientDescentOptimizer(0.1)

# compute the gradients of z w.r.t. x and y
grads_and_vars = optimizer.compute_gradients(z, [x, y])

# unpack the (gradient, variable) pairs
get_grads_and_vars = [(gv[0], gv[1]) for gv in grads_and_vars]

# dz/dx = 2*x + y= 13
# dz/dy = x = 5
print('grads and variables')
print('x: grad {}, value {}'.format(
    sess.run(get_grads_and_vars[0][0]), sess.run(get_grads_and_vars[0][1])))

print('y: grad {}, value {}'.format(
    sess.run(get_grads_and_vars[1][0]), sess.run(get_grads_and_vars[1][1])))

print('Before optimization')
print('x: {}, y: {}'.format(sess.run(x), sess.run(y)))

# optimize parameters
opt = optimizer.apply_gradients(get_grads_and_vars)
# x = x - 0.1 * dz/dx = 5 - 0.1 * 13 = 3.7
# y = y - 0.1 * dz/dy = 3 - 0.1 * 5 = 2.5
print('After optimization using learning rate 0.1')
sess.run(opt)
print('x: {:.3f}, y: {:.3f}'.format(sess.run(x), sess.run(y)))
sess.close()

grads and variables
x: grad 13.0, value 5.0
y: grad 5.0, value 3.0
Before optimization
x: 5.0, y: 3.0
After optimization using learning rate 0.1
x: 3.700, y: 2.500
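
The arithmetic in the comments of the cell above can be checked without TensorFlow. The following is a minimal plain-Python sketch, assuming the gradients are written out by hand (TensorFlow derives them automatically); the helper names `z_gradients` and `gradient_descent_step` are made up for this illustration.

```python
# Plain-Python check of the update that apply_gradients performs above.
# The gradient formulas are hand-written here; TensorFlow derives them itself.

def z_gradients(x, y):
    """Return (dz/dx, dz/dy) for z = x**2 + x*y + 3."""
    return 2 * x + y, x


def gradient_descent_step(x, y, learning_rate):
    """Apply one update: variable <- variable - learning_rate * gradient."""
    dz_dx, dz_dy = z_gradients(x, y)
    return x - learning_rate * dz_dx, y - learning_rate * dz_dy


x0, y0 = 5.0, 3.0
grad_x, grad_y = z_gradients(x0, y0)
x1, y1 = gradient_descent_step(x0, y0, learning_rate=0.1)

print('grads: {}, {}'.format(grad_x, grad_y))   # grads: 13.0, 5.0
print('x: {:.3f}, y: {:.3f}'.format(x1, y1))    # x: 3.700, y: 2.500
```

Both gradients are evaluated at the old values of x and y before either variable changes, which is why y moves by 0.1 * 5 rather than by 0.1 * 3.7.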


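In practice we of course do not update the parameters by hand; the optimizer class does it for us. There is also another function that can compute gradients: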
```
tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)
```
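This function returns a list whose length equals the length of `xs`; each element is ![formula](https://www.zhihu.com/equation?tex=sum_%7Bys%7D%28dys%2Fdx%29), that is, the sum of dy/dx over all y in `ys` for the corresponding x in `xs`.
- Practical use: this is very useful when training only part of a network. We can use the function above to compute gradients for just a subset of the parameters and then apply the update to those parameters only.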
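
To make the "sum over `ys`" behaviour concrete, here is a small numerical sketch in plain Python. It approximates the derivatives with central differences, which is not how TensorFlow computes them (TensorFlow differentiates the graph symbolically); the function `summed_gradients` and the two toy outputs are hypothetical and exist only for this illustration. The last few lines mimic partial training by updating one parameter and leaving the other frozen.

```python
def summed_gradients(ys, xs, eps=1e-6):
    """Approximate, for each x, the sum over ys of dy/dx (central differences).

    ys: list of functions, each mapping the full list of x values to a float.
    xs: list of floats. Returns a list with len(xs) entries.
    """
    grads = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        grads.append(sum((y(up) - y(down)) / (2 * eps) for y in ys))
    return grads


# Two toy outputs that both depend on a (hypothetical example functions).
ys = [lambda v: v[0] ** 2 + v[1],   # y1 = a**2 + b
      lambda v: 3 * v[0]]           # y2 = 3 * a
xs = [2.0, 5.0]                     # a = 2, b = 5

grads = summed_gradients(ys, xs)
# Analytically: d(y1 + y2)/da = 2a + 3 = 7 and d(y1 + y2)/db = 1.

# Train only part of the parameters: update a, keep b frozen.
trainable = [0]
learning_rate = 0.1
updated = list(xs)
for i in trainable:
    updated[i] -= learning_rate * grads[i]

print(grads, updated)
```

The returned list has one entry per element of `xs`, regardless of how many outputs are in `ys`.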

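### Types of Optimizers
Gradient descent (`GradientDescentOptimizer`) is only one of the update rules available in TensorFlow; it acts as stochastic gradient descent when each step is fed a mini-batch. The update methods TensorFlow currently supports are summarized below: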
```
tf.train.GradientDescentOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.AdagradDAOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizer
tf.train.RMSPropOptimizer
```
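This [blog post](https://link.zhihu.com/?target=http%3A//sebastianruder.com/optimizing-gradient-descent/) introduces most of the methods above, and interested readers can take a look. The cs231n and Coursera neural network courses also cover the various optimization algorithms.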
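
As a rough illustration of how two of these update rules differ, the sketch below runs plain gradient descent and a momentum update on the toy objective f(x) = x**2. The momentum rule used here (accumulation = momentum * accumulation + gradient, then variable -= learning_rate * accumulation) follows our reading of the `MomentumOptimizer` documentation without Nesterov momentum; treat it as an assumption, not as the library's source code. The objective, the step count and the hyperparameters are arbitrary choices.

```python
def gradient(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


def run_gradient_descent(x, learning_rate, steps):
    """Plain gradient descent: x <- x - learning_rate * gradient."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


def run_momentum(x, learning_rate, momentum, steps):
    """Momentum update: accumulate past gradients, then step along the sum."""
    accumulation = 0.0
    for _ in range(steps):
        accumulation = momentum * accumulation + gradient(x)
        x -= learning_rate * accumulation
    return x


x_gd = run_gradient_descent(5.0, learning_rate=0.1, steps=50)
x_mom = run_momentum(5.0, learning_rate=0.1, momentum=0.9, steps=50)
print(x_gd, x_mom)
```

Both runs move toward the minimum at 0. The only difference is that the momentum version steps along a running sum of past gradients instead of the current gradient alone.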

## Logistic Regression in TensorFlow
We use a simple logistic regression to solve a classification problem on the MNIST handwritten digits dataset. The model is defined as follows:
![](https://www.zhihu.com/equation?tex=logits+%3D+X+%2A+w+%2B+b+%5C%5C+Y_%7Bpredicted%7D+%3D+softmax%28logits%29%5C%5C+loss+%3D+CrossEntropy%28Y%2C+Y_%7Bpredicted%7D%29)
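
The last two lines of the model, softmax and cross entropy, can be worked through by hand for a single example. This is a minimal plain-Python sketch; the logits and the label are made-up numbers, and the helper names are our own.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, probs):
    """Cross entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


logits = [2.0, 1.0, 0.1]   # hypothetical scores for 3 classes
label = [1.0, 0.0, 0.0]    # the true class is class 0
probs = softmax(logits)
loss = cross_entropy(label, probs)
print(probs, loss)
```

With a one-hot label the loss reduces to the negative log of the probability assigned to the true class, so a confident correct prediction gives a loss close to 0.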

### TensorFlow Implementation
TF Learn has a built-in script that can load the MNIST dataset:

In [3]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./data/mnist', one_hot=True)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ./data/mnist\train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ./data/mnist\train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting ./data/mnist\t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting ./data/mnist\t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/datas

Next, define the placeholders and the weight parameters:

In [4]:
x = tf.placeholder(tf.float32, shape=[None, 784], name='image')
y = tf.placeholder(tf.int32, shape=[None, 10], name='label')

w = tf.get_variable('weight', shape=[784, 10], initializer=tf.truncated_normal_initializer())
b = tf.get_variable('bias', shape=[10], initializer=tf.zeros_initializer())

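The input shape `[None, 784]` means the first dimension accepts any number of examples, and the second dimension is 784 because 28x28 = 784. The weight `w` is drawn from a truncated normal distribution with mean 0 and standard deviation 1 (the defaults of `tf.truncated_normal_initializer`), and the bias `b` is initialized to 0.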
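
The shapes and the initialization can be mimicked in plain Python. This is a sketch under two assumptions: that a truncated normal is obtained by re-drawing any sample more than two standard deviations from the mean (which is how the TensorFlow documentation describes it), and that an all-zero image is an acceptable stand-in for a real digit. The function `truncated_normal` below is our own helper, not a TensorFlow call.

```python
import random


def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw from a normal distribution, re-drawing values beyond 2 stddev."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            return value


random.seed(0)  # arbitrary seed, used only to make the run repeatable

# A 28x28 image (all zeros, as a stand-in) flattened into 784 values.
image = [[0.0] * 28 for _ in range(28)]
flat = [pixel for row in image for pixel in row]

# Weight matrix of shape [784, 10] and bias vector of shape [10].
w = [[truncated_normal() for _ in range(10)] for _ in range(784)]
b = [0.0] * 10

print(len(flat), len(w), len(w[0]), len(b))  # 784 784 10 10
```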

Then define the predictions, the loss, and the optimizer:

In [5]:
learning_rate = 0.1
logits = tf.matmul(x, w) + b
entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(entropy, axis=0)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.



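We use `tf.matmul` for the matrix multiplication, then cross entropy, the usual loss function for classification, and finally average the loss over a batch and minimize it with stochastic gradient descent.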
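
To see what one such training step does numerically, here is a tiny plain-Python sketch of softmax regression on a made-up batch of 2 examples, 2 features and 3 classes. It relies on the standard result that the gradient of the batch-mean cross entropy with respect to the logits is (softmax(logits) - y) / batch_size. All names and data are hypothetical, and TensorFlow derives this gradient automatically.

```python
import math


def softmax(logits):
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(X, w, b):
    """logits = X * w + b for a batch X given as a list of rows."""
    return [[sum(x_i * w[i][k] for i, x_i in enumerate(row)) + b[k]
             for k in range(len(b))] for row in X]


def mean_loss(X, Y, w, b):
    """Cross entropy per example, averaged over the batch."""
    losses = []
    for logits, y in zip(forward(X, w, b), Y):
        probs = softmax(logits)
        losses.append(-sum(t * math.log(p) for t, p in zip(y, probs) if t > 0))
    return sum(losses) / len(losses)


def sgd_step(X, Y, w, b, learning_rate):
    """One gradient descent step on the batch-mean loss."""
    n = len(X)
    grad_w = [[0.0] * len(b) for _ in w]
    grad_b = [0.0] * len(b)
    for row, logits, y in zip(X, forward(X, w, b), Y):
        probs = softmax(logits)
        for k in range(len(b)):
            delta = (probs[k] - y[k]) / n   # d(mean loss) / d(logit k)
            grad_b[k] += delta
            for i, x_i in enumerate(row):
                grad_w[i][k] += x_i * delta
    new_w = [[w[i][k] - learning_rate * grad_w[i][k] for k in range(len(b))]
             for i in range(len(w))]
    new_b = [b[k] - learning_rate * grad_b[k] for k in range(len(b))]
    return new_w, new_b


# Hypothetical batch: 2 examples, 2 features, 3 classes.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
w = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3

before = mean_loss(X, Y, w, b)   # equals log(3): all classes equally likely
w, b = sgd_step(X, Y, w, b, learning_rate=0.1)
after = mean_loss(X, Y, w, b)
print(before, after)
```

With all-zero parameters every class gets probability 1/3, so the initial loss is log(3); a single step with a small learning rate lowers it.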

Because the dataset includes a test set, we can measure the model's accuracy on it:

In [6]:
preds = tf.nn.softmax(logits)
correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(y, 1))
accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32), axis=0)

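First we apply softmax to the outputs to obtain a probability distribution, then use `tf.argmax` to get the predicted label. `tf.equal` compares the predicted labels with the true labels and yields a boolean vector whose length is the batch size; after casting it to a 0-1 float vector, `tf.reduce_sum` gives the number of correct predictions.

Finally we run everything in a session; that process is not described again here.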
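
The counting logic can be traced on a tiny made-up batch. The plain-Python sketch below mirrors the argmax, equality, cast and sum steps; the predictions and labels are hypothetical, and the helper names are our own.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def count_correct(pred_probs, one_hot_labels):
    """Number of rows whose predicted class matches the labelled class."""
    predicted = [argmax(p) for p in pred_probs]
    actual = [argmax(y) for y in one_hot_labels]
    correct = [float(p == a) for p, a in zip(predicted, actual)]  # 0-1 vector
    return sum(correct)


# Hypothetical batch of 3 predictions over 3 classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(count_correct(preds, labels))  # 2.0
```

Softmax does not change which entry is largest, so taking the argmax of the raw logits would select the same labels.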

### Results and Visualization
The training loss and the test-set accuracy are shown below:

In [7]:
import time
batch_size = 128
n_epochs = 10

with tf.Session() as sess:
    writer = tf.summary.FileWriter('./logistic_log', sess.graph)
    start_time = time.time()
    sess.run(tf.global_variables_initializer())
    n_batches = int(mnist.train.num_examples / batch_size)
    for i in range(n_epochs):  # train the model n_epochs times
        total_loss = 0
        for _ in range(n_batches):
            X_batch, Y_batch = mnist.train.next_batch(batch_size)
            _, loss_batch = sess.run(
                [optimizer, loss], feed_dict={x: X_batch,
                                              y: Y_batch})
            total_loss += loss_batch
        print('Average loss epoch {0}: {1}'.format(i, total_loss / n_batches))

    print('Total time: {0} seconds'.format(time.time() - start_time))

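    # the loss should reach about 0.35 after 25 epochs; this run uses n_epochs = 10
    print('Optimization Finished!')
    writer.close()  # flush the graph written for TensorBoard to ./logistic_log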

    # test the model
    n_batches = int(mnist.test.num_examples / batch_size)
    total_correct_preds = 0

    for i in range(n_batches):
        X_batch, Y_batch = mnist.test.next_batch(batch_size)
        accuracy_batch = sess.run(accuracy, feed_dict={x: X_batch, y: Y_batch})
        total_correct_preds += accuracy_batch

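    # the loop above runs only full batches (n_batches * batch_size examples), so
    # dividing by num_examples slightly underestimates accuracy when batch_size
    # does not divide the test set size evenly
    print('Accuracy {0}'.format(total_correct_preds / mnist.test.num_examples))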


Average loss epoch 0: 2.6075366517324827
Average loss epoch 1: 1.038326608273255
Average loss epoch 2: 0.8234267927410998
Average loss epoch 3: 0.7197303167589895
Average loss epoch 4: 0.659674979277424
Average loss epoch 5: 0.6105360811129039
Average loss epoch 6: 0.5785137918584552
Average loss epoch 7: 0.5510510732799699
Average loss epoch 8: 0.5296188234101921
Average loss epoch 9: 0.5095977137138793
Total time: 6.411830186843872 seconds
Optimization Finished!
Accuracy 0.8773
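
One detail of the evaluation loop is worth spelling out: `int(num_examples / batch_size)` counts only full batches. The sketch below does the arithmetic for the 10,000-image MNIST test set with `batch_size = 128`, and shows one possible way to cover every example by letting the last batch be smaller. The helper `batch_ranges` is hypothetical and is not part of the notebook's code.

```python
def batch_ranges(num_examples, batch_size):
    """Yield (start, end) index pairs covering every example exactly once."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


num_test = 10000     # size of the MNIST test set
batch_size = 128

full_batches = int(num_test / batch_size)      # what the loop above uses
covered_by_loop = full_batches * batch_size
ranges = list(batch_ranges(num_test, batch_size))

print(full_batches, covered_by_loop, len(ranges), ranges[-1])
# 78 9984 79 (9984, 10000)
```

The full-batch loop evaluates 9,984 examples and leaves 16 out, while the count of correct predictions is still divided by 10,000, so the reported accuracy is a slight underestimate.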
