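# Image Classification with an RNN
Earlier we saw that RNNs are well suited to sequential data. Can an RNN also be used for image classification, the way a CNN is? Below we use the mnist handwritten digit dataset to show how. This is not a mainstream approach; we present it only as an illustration.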

Each handwritten digit image is 28 * 28 pixels. We can treat it as a sequence of length 28 in which each element has 28 features, that is:

![](https://ws4.sinaimg.cn/large/006tKfTcly1fmu7d0byfkj30n60djdg5.jpg)
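
To make the "image as a sequence" idea concrete, here is a minimal NumPy sketch. The array `image` is a stand-in filled with arbitrary values, not real MNIST data, and the variable names are chosen only for illustration.

```python
import numpy as np

# A stand-in for one MNIST image: 28 rows by 28 columns of pixel values.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Read the image top to bottom: each row is one time step,
# and the 28 pixels in that row are the features of that step.
sequence = [image[t] for t in range(image.shape[0])]

print(len(sequence))      # 28 time steps
print(sequence[0].shape)  # (28,) features per time step
```

Reading row by row is one possible convention; reading column by column would give a sequence of the same length.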

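This settles the input sequence. What about the output? It is simple: although the RNN produces a sequence of outputs, we only need to keep one of them as the result. The last output is the natural choice because it carries information from all earlier steps of the sequence, as shown below: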

![](https://ws3.sinaimg.cn/large/006tKfTcly1fmu7fpqri0j30c407yjr8.jpg)
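
The following sketch illustrates "keep only the last output" with a plain tanh RNN written in NumPy. It is a simplified stand-in, not the LSTM used later in this notebook; the function name `rnn_last_output`, the weight names, and the sizes are assumptions made for the example.

```python
import numpy as np

def rnn_last_output(inputs, w_xh, w_hh, b_h):
    """Run a plain tanh RNN and return all hidden states plus the last one.

    inputs: array of shape (time_steps, batch, features).
    """
    batch = inputs.shape[1]
    hidden = np.zeros((batch, w_hh.shape[0]), dtype=inputs.dtype)
    outputs = []
    for x_t in inputs:  # iterate over time steps
        hidden = np.tanh(x_t @ w_xh + hidden @ w_hh + b_h)
        outputs.append(hidden)
    outputs = np.stack(outputs)
    # The last hidden state was computed from every earlier step.
    return outputs, outputs[-1]

rng = np.random.default_rng(0)
time_steps, batch, features, units = 28, 4, 28, 16
inputs = rng.standard_normal((time_steps, batch, features)).astype(np.float32)
w_xh = (rng.standard_normal((features, units)) * 0.1).astype(np.float32)
w_hh = (rng.standard_normal((units, units)) * 0.1).astype(np.float32)
b_h = np.zeros(units, dtype=np.float32)

outputs, last = rnn_last_output(inputs, w_xh, w_hh, b_h)
print(outputs.shape)  # (28, 4, 16)
print(last.shape)     # (4, 16)
```

Only `last` would be passed on to a classification layer; the other 27 outputs are discarded.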

Let's walk through it with a concrete example.

In [1]:
from __future__ import division
from __future__ import absolute_import
from __future__ import print_function

import numpy as np

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.examples.tutorials.mnist.input_data as input_data

from utils.layers import lstm

tf.set_random_seed(2017)


Load the `mnist` dataset

In [3]:
mnist = input_data.read_data_sets('MNIST_data', one_hot=True, reshape=False)
train_set = mnist.train
test_set = mnist.test

Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


Inspect one batch of data

In [4]:
train_imgs, train_labels = train_set.next_batch(64)

In [5]:
print(train_imgs.shape)
print(train_labels.shape)

(64, 28, 28, 1)
(64, 10)


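Recall that when we built the initial state of the `rnn` we had to specify `batch_size`, and when we added `dropout` to the `RNNCell` we had to specify `keep_prob`. These clearly need different values during training and testing, and `placeholder`s give us a convenient way to supply them.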

In [6]:
input_ph = tf.placeholder(shape=(None, 28, 28, 1), dtype=tf.float32)
label_ph = tf.placeholder(shape=(None, 10), dtype=tf.int64)
batch_size_ph = tf.placeholder(tf.int32, [])
keep_prob_ph = tf.placeholder(tf.float32, [])
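
To see why `keep_prob` has to change between training and testing, here is a small NumPy sketch of inverted dropout. The `dropout` function below is a hypothetical helper written for illustration, not the TensorFlow implementation.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each entry with probability keep_prob and rescale."""
    if keep_prob >= 1.0:
        return x  # nothing is dropped, as at test time
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(2017)
x = np.ones((4, 5), dtype=np.float32)

train_out = dropout(x, keep_prob=0.5, rng=rng)  # every entry is either 0 or 2
test_out = dropout(x, keep_prob=1.0, rng=rng)   # identical to x
```

Feeding `keep_prob` through a placeholder lets the same graph use 0.5 during training and 1.0 during evaluation.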

Convert the data into the form the `RNN` expects

In [7]:
inputs = tf.transpose(tf.squeeze(input_ph, axis=[-1]), (1, 0, 2))

In [8]:
print(inputs.shape)

(28, ?, 28)


Now the first dimension is the time step, the second is `batch_size`, and the third is the number of input features.
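
The same squeeze-and-transpose can be checked in NumPy. This sketch uses an all-zero batch with a single marked pixel (made-up data) to show where a given pixel ends up after the conversion.

```python
import numpy as np

batch = np.zeros((64, 28, 28, 1), dtype=np.float32)
batch[5, 3, 7, 0] = 1.0  # mark the pixel at row 3, column 7 of image 5

squeezed = np.squeeze(batch, axis=-1)           # drop the channel axis: (64, 28, 28)
time_major = np.transpose(squeezed, (1, 0, 2))  # swap batch and row axes: (28, 64, 28)

print(time_major.shape)     # (28, 64, 28)
print(time_major[3, 5, 7])  # 1.0, found at time step 3, image 5, feature 7
```

Each image row becomes one time step, so row 3 of every image in the batch is processed together at step 3.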

### Define the `rnn` classification model

In [9]:
def rnn_classify(inputs, rnn_units=100, rnn_layers=2, batch_size=64, keep_prob=1, num_classes=10):
    # Build a multi-layer `rnn` model
    rnn_out, rnn_state = lstm(inputs, rnn_units, rnn_layers, batch_size, keep_prob=keep_prob)
    
    # Take the last output as the feature vector for the classification layer
    net = rnn_out[-1]
    
    # Finally, attach a classification layer
    net = slim.flatten(net)
    net = slim.fully_connected(net, num_classes, activation_fn=None, scope='classification')
    
    return net

out = rnn_classify(inputs, batch_size=batch_size_ph, keep_prob=keep_prob_ph)

Define the `loss` and `train_op`

In [10]:
loss = tf.losses.softmax_cross_entropy(logits=out, onehot_labels=label_ph)

acc = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(out, axis=-1), tf.argmax(label_ph, axis=-1)), dtype=tf.float32))

lr = 0.01
optimizer = tf.train.MomentumOptimizer(lr, 0.9)
train_op = optimizer.minimize(loss)
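
For readers who want to see what the loss and accuracy compute, here is a NumPy sketch of softmax cross-entropy averaged over the batch, together with argmax accuracy. It assumes unweighted examples and mean reduction; the function names and the toy logits are made up for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Mean softmax cross-entropy over a batch, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=-1).mean()

def accuracy(logits, onehot_labels):
    """Fraction of examples whose highest logit matches the labelled class."""
    return (logits.argmax(axis=-1) == onehot_labels.argmax(axis=-1)).mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.5, 1.0, 0.0]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=np.float64)

print(softmax_cross_entropy(logits, labels))
print(accuracy(logits, labels))  # two of the three predictions are correct
```

The third example is misclassified (the largest logit is at index 0 but the label is class 1), which is why the accuracy is 2/3.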

Start training

In [11]:
sess = tf.InteractiveSession()

In [12]:
sess.run(tf.global_variables_initializer())

for e in range(10000):
    images, labels = train_set.next_batch(64)
    sess.run(train_op, feed_dict={input_ph: images, label_ph: labels, batch_size_ph: 64, keep_prob_ph: 0.5})
    if e % 1000 == 999:
        test_imgs, test_labels = test_set.next_batch(128)
        loss_train, acc_train = sess.run([loss, acc], feed_dict={input_ph: images, label_ph: labels, batch_size_ph: 64, keep_prob_ph: 1.0})
        loss_test, acc_test = sess.run([loss, acc], feed_dict={input_ph: test_imgs, label_ph: test_labels, batch_size_ph: 128, keep_prob_ph: 1.0})
        print('STEP {}: train_loss: {:.6f} train_acc: {:.6f} test_loss: {:.6f} test_acc: {:.6f}'.format(e + 1, loss_train, acc_train, loss_test, acc_test))

print('Train Done!')
print('-'*30)

train_loss = []
train_acc = []
for _ in range(train_set.num_examples // 100):
    image, label = train_set.next_batch(100)
    loss_train, acc_train = sess.run([loss, acc], feed_dict={input_ph: image, label_ph: label, batch_size_ph: 100, keep_prob_ph: 1.0})
    train_loss.append(loss_train)
    train_acc.append(acc_train)

print('Train loss: {:.6f}'.format(np.array(train_loss).mean()))
print('Train accuracy: {:.6f}'.format(np.array(train_acc).mean()))

test_loss = []
test_acc = []
for _ in range(test_set.num_examples // 100):
    image, label = test_set.next_batch(100)
    loss_test, acc_test = sess.run([loss, acc], feed_dict={input_ph: image, label_ph: label, batch_size_ph: 100, keep_prob_ph: 1.0})
    test_loss.append(loss_test)
    test_acc.append(acc_test)

print('Test loss: {:.6f}'.format(np.array(test_loss).mean()))
print('Test accuracy: {:.6f}'.format(np.array(test_acc).mean()))

STEP 1000: train_loss: 0.187749 train_acc: 0.953125 test_loss: 0.172851 test_acc: 0.953125
STEP 2000: train_loss: 0.126416 train_acc: 0.968750 test_loss: 0.145504 test_acc: 0.953125
STEP 3000: train_loss: 0.008995 train_acc: 1.000000 test_loss: 0.072065 test_acc: 0.968750
STEP 4000: train_loss: 0.011813 train_acc: 1.000000 test_loss: 0.039175 test_acc: 0.992188
STEP 5000: train_loss: 0.003073 train_acc: 1.000000 test_loss: 0.043307 test_acc: 0.984375
STEP 6000: train_loss: 0.037937 train_acc: 0.984375 test_loss: 0.057352 test_acc: 0.976562
STEP 7000: train_loss: 0.022462 train_acc: 0.984375 test_loss: 0.019089 test_acc: 0.992188
STEP 8000: train_loss: 0.001094 train_acc: 1.000000 test_loss: 0.092557 test_acc: 0.968750
STEP 9000: train_loss: 0.008558 train_acc: 1.000000 test_loss: 0.046139 test_acc: 0.976562
STEP 10000: train_loss: 0.027353 train_acc: 1.000000 test_loss: 0.107489 test_acc: 0.976562
Train Done!
------------------------------
Train loss: 0.033395
Train accuracy: 0.989364
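
The evaluation loops average per-batch metrics. A short NumPy check, using made-up per-example results, suggests why this matches the whole-set metric as long as every batch has the same size:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-example outcomes: True means the prediction was correct.
correct = rng.random(1000) < 0.9

# Accuracy of each batch of 100 examples, then the mean of those batch accuracies.
batch_means = [correct[i:i + 100].mean() for i in range(0, 1000, 100)]
mean_of_batches = np.mean(batch_means)

# With equal-sized batches, this equals the accuracy over all examples.
overall = correct.mean()
print(np.isclose(mean_of_batches, overall))  # True
```

If the last batch were smaller than the others, a weighted average would be needed instead.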


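As we can see, after 10000 training steps the model reaches about 98% accuracy on the simple mnist dataset. So an RNN can handle simple image classification, although that is not where it is mainly used. In the next lesson we will look at a typical use case for RNNs: time series prediction.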