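## Implementing Recurrent Neural Networks in Keras

This notebook uses the MNIST dataset and builds the model mainly with SimpleRNN. (LSTM and GRU are related layers; GRU is a simplified version of LSTM with only 2 gates.)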

In [41]:
import numpy as np
np.random.seed(100)  # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.utils import np_utils
from keras.layers import Dense, Activation, SimpleRNN
from keras.optimizers import Adam

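### Data processing
* MNIST images are 28×28 pixels. To use an RNN, we treat each image as sequential data. Each row is one input unit, so the input size is INPUT_SIZE = 28. Row 1 is fed in first, then row 2, row 3, row 4, …, up to row 28. One image is therefore one sequence, so the number of time steps is TIME_STEPS = 28.
* The data must be normalized: the raw images are 8-bit grayscale, so the pixel values are divided by 255.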
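The two points above can be sketched with NumPy alone. This is a minimal illustration, assuming a randomly generated array as a stand-in for one MNIST image; the names `image`, `image_norm`, and `sequence` are hypothetical and do not appear in the notebook.

```python
import numpy as np

# Hypothetical stand-in for one MNIST image: a 28x28 array of 8-bit gray values.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale the 8-bit values from [0, 255] into [0, 1].
image_norm = image / 255.0

# Read the image as a sequence: one row per time step.
TIME_STEPS, INPUT_SIZE = image_norm.shape
sequence = [image_norm[t] for t in range(TIME_STEPS)]

print(TIME_STEPS, INPUT_SIZE)  # 28 28
print(sequence[0].shape)       # (28,)
```

Each element of `sequence` is the 28-pixel vector the RNN would see at one time step.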

In [42]:
# Downloads MNIST to '~/.keras/datasets/' the first time it is called
# X shape (60000, 28, 28), y shape (60000,)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# data pre-processing
X_train = X_train.reshape(-1, 28, 28) / 255.   # normalize
X_test = X_test.reshape(-1, 28, 28) / 255.      # normalize
# convert the integer labels to one-hot vectors
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
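To make the label conversion concrete, here is a rough NumPy sketch of one-hot encoding. `one_hot` is a hypothetical helper written for illustration; it is not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Set the column given by each label to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

y = one_hot([5, 0, 9], num_classes=10)
print(y.shape)  # (3, 10)
print(y[0])     # 1 in position 5, 0 elsewhere
```

With this encoding, each label row matches the 10-way softmax output that `categorical_crossentropy` expects.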

### Define variables

In [43]:
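TIME_STEPS = 28  # image height: number of time steps (28 rows)
INPUT_SIZE = 28  # image width: number of pixels read per row
BATCH_SIZE = 100  # number of images per batch
BATCH_INDEX = 0   # start index of the current batch
OUTPUT_SIZE = 10  # number of classes
CELL_SIZE = 100   # hidden layer size
LR = 0.001        # learning rate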

### Define and compile the model

In [44]:
model = Sequential()
model.add(SimpleRNN(
    # With the TensorFlow backend, the batch dimension of batch_input_shape must be None;
    # otherwise model.evaluate() raises an error.
    batch_input_shape=(None, TIME_STEPS, INPUT_SIZE),
    units=CELL_SIZE,  # Keras 2 name for the deprecated output_dim argument
    return_sequences=False,  # output only at the last time step
    unroll=True,
))

model.add(Dense(OUTPUT_SIZE))
model.add(Activation('softmax'))

# Define the optimizer
adam = Adam(lr=LR)
# Compile the model
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
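To show what this stack computes, the following is a NumPy sketch of the forward pass. It assumes the default `tanh` activation of SimpleRNN and a zero initial state, and it uses random, untrained weights. The names `W`, `U`, `b`, `V`, `c`, and `forward` are illustrative and are not Keras internals.

```python
import numpy as np

TIME_STEPS, INPUT_SIZE, CELL_SIZE, OUTPUT_SIZE = 28, 28, 100, 10
rng = np.random.default_rng(0)

# Randomly initialized weights (illustrative only, not trained).
W = rng.normal(scale=0.1, size=(INPUT_SIZE, CELL_SIZE))   # input -> hidden
U = rng.normal(scale=0.1, size=(CELL_SIZE, CELL_SIZE))    # hidden -> hidden
b = np.zeros(CELL_SIZE)
V = rng.normal(scale=0.1, size=(CELL_SIZE, OUTPUT_SIZE))  # hidden -> classes
c = np.zeros(OUTPUT_SIZE)

def forward(x):
    """x: (batch, TIME_STEPS, INPUT_SIZE) -> probabilities (batch, OUTPUT_SIZE)."""
    h = np.zeros((x.shape[0], CELL_SIZE))
    for t in range(x.shape[1]):
        # Simple recurrence: mix the current row with the previous hidden state.
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
    # Only the last hidden state is used, as with return_sequences=False.
    logits = h @ V + c
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

probs = forward(rng.random((4, TIME_STEPS, INPUT_SIZE)))
print(probs.shape)  # (4, 10)

rnn_params = W.size + U.size + b.size
dense_params = V.size + c.size
print(rnn_params, dense_params)  # 12900 1010
```

The parameter counts follow directly from the shapes: 28×100 + 100×100 + 100 for the recurrent layer, and 100×10 + 10 for the dense layer.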





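### Train the model
Each training step does not use all of the data. It takes only BATCH_SIZE sequences (that is, BATCH_SIZE images), which greatly reduces the computation per update and makes training more efficient.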

In [45]:

for step in range(601*10):  # about 10 epochs; one epoch = 600 batches * 100 images = 60000 images
    # data shape = (batch_num, steps, inputs/outputs)
    X_batch = X_train[BATCH_INDEX:BATCH_INDEX+BATCH_SIZE, :, :]
    Y_batch = y_train[BATCH_INDEX:BATCH_INDEX+BATCH_SIZE, :]
    loss = model.train_on_batch(X_batch, Y_batch)
    BATCH_INDEX += BATCH_SIZE
    BATCH_INDEX = 0 if BATCH_INDEX >= X_train.shape[0] else BATCH_INDEX
    
    if step % 1000 == 0:
        loss, accuracy = model.evaluate(X_test, y_test, batch_size=y_test.shape[0], verbose=False)
        print(f"Step: {step}\tTest Loss: {loss}, \tTest accuracy: {accuracy}")

Step: 0	Test Loss: 2.2865800857543945, 	Test accuracy: 0.13439999520778656
Step: 1000	Test Loss: 0.22341260313987732, 	Test accuracy: 0.933899998664856
Step: 2000	Test Loss: 0.17896071076393127, 	Test accuracy: 0.9490000009536743
Step: 3000	Test Loss: 0.16045717895030975, 	Test accuracy: 0.9528999924659729
Step: 4000	Test Loss: 0.15019135177135468, 	Test accuracy: 0.9578999876976013
Step: 5000	Test Loss: 0.11457083374261856, 	Test accuracy: 0.9667999744415283
Step: 6000	Test Loss: 0.13894927501678467, 	Test accuracy: 0.9609000086784363
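The way the loop cycles through the training set can be sketched in isolation. `batch_starts` is a hypothetical helper that reproduces only the index logic of the loop, using small numbers so the wrap-around is visible.

```python
def batch_starts(num_samples, batch_size, num_steps):
    """Yield the start index used at each step, wrapping to 0 after a full pass."""
    index = 0
    for _ in range(num_steps):
        yield index
        index += batch_size
        if index >= num_samples:
            index = 0

starts = list(batch_starts(num_samples=10, batch_size=4, num_steps=5))
print(starts)  # [0, 4, 8, 0, 4]

# With the notebook's sizes, one pass over the data takes 600 steps.
steps_per_epoch = 60000 // 100
print(steps_per_epoch)  # 600
```

In the small example the third batch is shorter than the others because 10 is not a multiple of 4. With 60000 images and a batch size of 100, every batch is full.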


In [35]:
X_train.shape

(60000, 28, 28)