## Single-layer RNN
<img src="image/SimpleRNN.jpg"  width="500" >

## The forward-propagation computation in detail:
<img src="image/forward.png"  width="800" >
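
The figure is not reproduced in text, so the following is only a minimal sketch of one common formulation of the forward pass, h_t = tanh(W_x x_t + W_h h_{t-1} + b). That formula is assumed here rather than read off the image, and the names `rnn_forward`, `W_x`, `W_h`, and `b` are illustrative.

```python
import math

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple (Elman-style) RNN over a sequence.

    inputs: list of T input vectors, each of length input_dim
    W_x:    units x input_dim matrix (input-to-hidden weights)
    W_h:    units x units matrix (hidden-to-hidden weights)
    b:      bias vector of length units
    Returns the list of T hidden states.
    """
    units = len(b)
    h = [0.0] * units          # initial hidden state h_0
    states = []
    for x in inputs:
        new_h = []
        for i in range(units):
            total = b[i]
            total += sum(W_x[i][j] * x[j] for j in range(len(x)))
            total += sum(W_h[i][k] * h[k] for k in range(units))
            new_h.append(math.tanh(total))
        h = new_h              # the same weights are reused at every step
        states.append(h)
    return states

# Toy example: 3 time steps, input_dim = 2, units = 2
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_x = [[0.5, -0.5], [0.1, 0.2]]
W_h = [[0.0, 0.3], [0.3, 0.0]]
b = [0.0, 0.0]
hidden_states = rnn_forward(xs, W_x, W_h, b)
print(len(hidden_states), len(hidden_states[0]))
```

Each hidden state depends on the current input and on the previous hidden state, which is how information from earlier steps reaches later ones.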

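## Bidirectional RNN (BRNN)
A bidirectional RNN keeps a separate set of weight matrices for each direction, so it needs roughly twice the memory of a unidirectional RNN.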
<img src="image/BiRNN.jpg"  width="300" >
<img src="image/formula.png"  width="500" >

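## Deep bidirectional RNN (DBRNN)  
When there is too much information, a single hidden layer cannot retain all of the important parts. Stacking several hidden layers lets the model keep more of them, much as rewatching an episode of a TV series helps us remember more of the key plot points.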
<img src="image/DBRNN.png"  width="300" >
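Note:  
Within a layer, the recurrent cell's parameters are shared across all time steps, but different layers have different weight matrices.  
Dropout is applied vertically (between layers), not horizontally (between time steps).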

Keras implements the RNN layers in the recurrent module of the layers package, and provides the bidirectional RNN wrapper in the wrappers module.  
The recurrent module includes SimpleRNN, LSTM, GRU, and other layers:

1. SimpleRNN: a fully connected RNN

**SimpleRNN(units, activation='tanh', dropout=0.0, recurrent_dropout=0.0, return_sequences=False)**

2. LSTM: long short-term memory

**LSTM(units, activation='tanh', dropout=0.0, recurrent_dropout=0.0, return_sequences=False)**

3. GRU: gated recurrent unit

**GRU(units, activation='tanh', dropout=0.0, recurrent_dropout=0.0, return_sequences=False)**

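4. Parameters:

- units: dimensionality of the RNN output

- activation: activation function, tanh by default

- dropout: a float between 0 and 1; the fraction of units dropped for the linear transformation of the inputs

- recurrent_dropout: a float between 0 and 1; the fraction of units dropped for the linear transformation of the recurrent state

- return_sequences: True returns the full output sequence, which is needed when stacking recurrent layers; False returns only the last output of the sequence. In a deep model, set it to True on every recurrent layer that feeds another recurrent layer

- input_dim: the input dimensionality; specify it when this layer is the first layer of the model

- input_length: the length of the input sequences when that length is fixed. It must be specified if this layer is followed by a Flatten layer and then a Dense layer  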
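
To make the effect of `return_sequences` concrete, here is a toy scalar recurrence written in plain Python. It is not Keras code: `toy_rnn`, `w_x`, and `w_h` are hypothetical names, and the model is reduced to one unit so the shapes are easy to see.

```python
import math

def toy_rnn(sequence, w_x, w_h, return_sequences=False):
    """Scalar toy RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    h = 0.0
    outputs = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    # True: one output per time step; False: only the last output
    return outputs if return_sequences else outputs[-1]

sequence = [0.5, -1.0, 2.0, 0.1]

# The lower layer must return the full sequence so that the upper layer
# receives one input per time step.
layer1_out = toy_rnn(sequence, w_x=1.0, w_h=0.5, return_sequences=True)
# The upper (last) layer returns only its final output.
layer2_out = toy_rnn(layer1_out, w_x=0.8, w_h=0.2, return_sequences=False)

print(len(layer1_out))                 # one value per time step
print(isinstance(layer2_out, float))   # a single value for the whole sequence
```

If the lower layer returned only its last output, the upper layer would have no sequence left to iterate over, which is why stacked recurrent layers need `return_sequences=True` on all but the last one.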

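The wrappers module implements the bidirectional RNN:

Bidirectional RNN wrapper  
**Bidirectional(layer, merge_mode='concat', weights=None)**

Parameters:

- layer: a recurrent layer such as SimpleRNN, LSTM, or GRU; it determines which kind of RNN is made bidirectional  

- merge_mode: how the forward and backward RNN outputs are combined; one of 'sum', 'mul', 'concat', 'ave', or None. With None the outputs are not combined and are returned as a list. The concatenation described above corresponds to 'concat'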
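
The merge modes listed above can be illustrated on the outputs of a single time step. This is a plain-Python sketch of the arithmetic only; the function name `merge` and the sample vectors are made up for the example.

```python
def merge(forward, backward, merge_mode="concat"):
    """Combine the forward and backward outputs of one time step."""
    if merge_mode is None:
        return [forward, backward]                  # keep both, uncombined
    if merge_mode == "concat":
        return forward + backward                   # feature size doubles
    if merge_mode == "sum":
        return [f + b for f, b in zip(forward, backward)]
    if merge_mode == "mul":
        return [f * b for f, b in zip(forward, backward)]
    if merge_mode == "ave":
        return [(f + b) / 2 for f, b in zip(forward, backward)]
    raise ValueError("unknown merge_mode: %r" % merge_mode)

fwd = [1.0, 2.0]
bwd = [3.0, 4.0]
for mode in ("concat", "sum", "mul", "ave", None):
    print(mode, merge(fwd, bwd, mode))
```

Only 'concat' changes the feature dimension, which is why a bidirectional layer built from a 16-unit RNN produces 32 features per time step.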

### Sentiment analysis on the IMDB dataset (binary classification)

In [1]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout, Bidirectional, TimeDistributed
from keras.layers.recurrent import SimpleRNN
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.models import Model
from keras.callbacks import EarlyStopping
import os
import tarfile
import numpy as np
import re

Using TensorFlow backend.


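### Read the data
Data cleaning: strip HTML tags from the text  
Tokenization: not needed here because the text is English  
Stop-word removal: words such as "the" and "a" could be removed; this is not done here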

In [2]:
import re
def rm_tags(text):
    re_tag = re.compile(r'<[^>]+>')
    return re_tag.sub('', text)

def read_files(filetype):
    """
    filetype: 'train' or 'test'
    return:
    all_texts: texts of the `filetype` split
    all_labels: labels of the `filetype` split
    """
    # Label 1 means positive, 0 means negative
    all_labels = [1]*12500 + [0]*12500
    all_texts = []
    file_list = []
    path = r'./data/aclImdb/'
    # Collect the paths of the positive reviews
    pos_path = path + filetype + '/pos/'
    for file in os.listdir(pos_path):
        file_list.append(pos_path+file)
    # Collect the paths of the negative reviews
    neg_path = path + filetype + '/neg/'
    for file in os.listdir(neg_path):
        file_list.append(neg_path+file)
    # Read every file and append its cleaned text to all_texts
    for file_name in file_list:
        with open(file_name, encoding='utf-8') as f:
            all_texts.append(rm_tags(" ".join(f.readlines())))
    return all_texts, all_labels
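
The stop-word step that this notebook skips could look roughly like the sketch below. The stop-word list is a deliberately tiny, hypothetical one, and `rm_stop_words` is an illustrative helper that is not part of the original code.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only
STOP_WORDS = {"the", "a", "an", "of", "and", "is"}

def rm_tags(text):
    """Strip HTML tags such as <br />."""
    return re.sub(r'<[^>]+>', '', text)

def rm_stop_words(text, stop_words=STOP_WORDS):
    """Drop stop words, comparing case-insensitively."""
    kept = [w for w in text.split() if w.lower() not in stop_words]
    return " ".join(kept)

review = "The plot is thin. <br /><br /> And the acting is wooden"
cleaned = rm_stop_words(rm_tags(review))
print(cleaned)
```

Whether removing stop words helps a sentiment model is an empirical question, so it is worth comparing validation accuracy with and without this step.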

In [None]:
# Already extracted
# tfile = tarfile.open(r'./data/aclImdb_v1.tar.gz', 'r:gz')  # 'r:gz' opens a gzip-compressed archive for reading
# result = tfile.extractall('./data/')  # extract the archive into the data folder

In [3]:
train_texts, train_labels = read_files('train')
test_texts, test_labels = read_files('test')

### Convert the data into the format the model expects

In [4]:
def preprocessing(train_texts, train_labels, test_texts, test_labels):
    tokenizer = Tokenizer(num_words=3800)  
    tokenizer.fit_on_texts(train_texts)
    # Convert each review into a list of integers, replacing every word with its index
    x_train_seq = tokenizer.texts_to_sequences(train_texts)
    x_test_seq = tokenizer.texts_to_sequences(test_texts)
    x_train = sequence.pad_sequences(x_train_seq, maxlen=380)
    x_test = sequence.pad_sequences(x_test_seq, maxlen=380)
    y_train = np.array(train_labels)
    y_test = np.array(test_labels)
    return x_train, y_train, x_test, y_test
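
To show what the two preprocessing steps do to the data, here is a simplified plain-Python imitation. It assumes the documented default behaviour of `Tokenizer` (frequent words get small indices, only words with an index below `num_words` are kept) and of `pad_sequences` (padding and truncation at the front). The helper names are hypothetical and this is not the Keras implementation.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer; more frequent words get smaller indices.
    Index 0 is left unused so it can serve as the padding value."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Keep only words whose index is below num_words; drop the rest."""
    return [[word_index[w] for w in t.lower().split()
             if w in word_index and word_index[w] < num_words]
            for t in texts]

def pad_pre(seq, maxlen, value=0):
    """Left-pad short sequences and keep the last maxlen items of long ones."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

texts = ["good movie movie", "bad movie good"]
word_index = build_word_index(texts)
seqs = texts_to_sequences(texts, word_index, num_words=3)
padded = [pad_pre(s, maxlen=4) for s in seqs]
print(word_index)
print(padded)
```

In the notebook the same idea is applied with `num_words=3800` and `maxlen=380`, so every review becomes a vector of exactly 380 integers.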

In [5]:
x_train, y_train, x_test, y_test = preprocessing(train_texts, train_labels, test_texts, test_labels)

### RNN model
Embedding + RNN + FC1 + sigmoid

In [6]:
def RNN(maxlen = 380, max_features = 3800, embed_size = 32):
    model = Sequential()
    model.add(Embedding(max_features, embed_size, input_length=maxlen))
    model.add(Dropout(0.5))
    model.add(SimpleRNN(16))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    return model

model.summary() prints the model's structure, parameter counts, and other details, which makes the model easier to understand.

In [7]:
model = RNN()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 380, 32)           121600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 380, 32)           0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 16)                784       
_________________________________________________________________
dense_1 (Dense)              (None, 256)               4352      
_________________________________________________________________
dropout_2 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 257       
Total params: 126,993
Trainable params: 126,993
Non-trainable params: 0
_________________________________________________________________
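
The parameter counts in the summary above can be checked by hand. The sketch below uses the usual formulas for these layer types (weights plus one bias per unit); the function names are illustrative.

```python
def embedding_params(vocab_size, embed_size):
    # one embed_size-dimensional vector per vocabulary entry
    return vocab_size * embed_size

def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + bias
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # weights + bias
    return input_dim * units + units

counts = [
    embedding_params(3800, 32),   # embedding_1
    simple_rnn_params(32, 16),    # simple_rnn_1
    dense_params(16, 256),        # dense_1
    dense_params(256, 1),         # dense_2
]
print(counts, sum(counts))
```

The embedding layer accounts for almost all of the parameters. The recurrent layer itself is small because its weights are reused at every time step.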


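### BiRNN model
Embedding + BiRNN + Flatten + sigmoid  
The bidirectional RNN is built with the Bidirectional wrapper from the wrappers module, with return_sequences set to True. As described above, the important information from the forward and backward passes has to be concatenated, so the whole sequence is returned rather than only the output at the last time step.

Because the forward and backward outputs are concatenated as described above, merge_mode is 'concat'. 'sum' (an element-wise sum of the two directions) or one of the other modes could be used instead.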

In [6]:
def BRNN(maxlen = 380, max_features = 3800, embed_size = 32):
    model = Sequential()
    model.add(Embedding(max_features, embed_size, input_length=maxlen))
    model.add(Dropout(0.5))
    model.add(Bidirectional(SimpleRNN(16, return_sequences=True), merge_mode='concat'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    return model

In [7]:
model = BRNN()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 380, 32)           121600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 380, 32)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 380, 32)           1568      
_________________________________________________________________
dropout_2 (Dropout)          (None, 380, 32)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 12160)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 12161     
Total params: 135,329
Trainable params: 135,329
Non-trainable params: 0
_________________________________________________________________
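
As a rough check of the shapes and counts in this summary, the arithmetic below follows the data through the bidirectional layer, Flatten, and the final Dense layer. It assumes the standard SimpleRNN parameter formula and is only a worked calculation, not Keras code.

```python
maxlen, embed_size, units = 380, 32, 16

# One direction of SimpleRNN: input weights + recurrent weights + bias
one_direction = embed_size * units + units * units + units
bidirectional_params = 2 * one_direction          # forward + backward

# merge_mode='concat' joins both directions on the feature axis
features_per_step = 2 * units
# return_sequences=True keeps all time steps, so Flatten sees maxlen of them
flatten_size = maxlen * features_per_step
dense_params = flatten_size * 1 + 1               # weights + bias

print(bidirectional_params, features_per_step, flatten_size, dense_params)
```

Flattening every time step makes the final Dense layer far larger than the recurrent layer that feeds it.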


### DBRNN
Embedding + BiRNN + SimpleRNN + sigmoid  
To build a two-layer DBRNN, just add one more SimpleRNN layer.  
Note that when stacking more layers, every SimpleRNN layer except the last one must set return_sequences to True.

In [6]:
def DBRNN(maxlen = 380, max_features = 3800, embed_size = 32):
    model = Sequential()
    model.add(Embedding(max_features, embed_size, input_length=maxlen))
    model.add(Dropout(0.5))
    model.add(Bidirectional(SimpleRNN(16, return_sequences=True), merge_mode='concat'))
    model.add(SimpleRNN(8))  # return_sequences defaults to False
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    return model

In [7]:
model = DBRNN()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 380, 32)           121600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 380, 32)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 380, 32)           1568      
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 8)                 328       
_________________________________________________________________
dropout_2 (Dropout)          (None, 8)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 9         
Total params: 123,505
Trainable params: 123,505
Non-trainable params: 0
_________________________________________________________________


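### Add EarlyStopping to stop training when validation accuracy stops improving
Training is stopped early because continuing to train can lower accuracy on the test set. Possible reasons for that drop include:  
1. Overfitting  
2. A learning rate that is too large, so training does not converge   
3. With regularization, the loss may fall because the weights shrink rather than because accuracy improves

Callbacks are passed to model.fit through its callbacks argument. The argument must be a list, so even a single EarlyStopping has to be written as [EarlyStopping()].

keras.callbacks.EarlyStopping(monitor='val_loss', patience=0, verbose=0, mode='auto')

Parameters:

monitor: the quantity to monitor, such as 'val_loss', 'val_acc', 'acc', or 'loss'. Newer Keras versions report accuracy under the names 'val_accuracy' and 'accuracy'.

patience: the number of epochs without improvement that is tolerated before training stops.

verbose: verbosity mode.

mode: one of 'auto', 'min', or 'max'. In min mode, training stops when the monitored value stops decreasing. In max mode, training stops when the monitored value stops increasing. For example, use max when monitoring val_acc and min when monitoring val_loss. In auto mode, the direction is inferred from the name of the monitored quantity.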
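
The interaction between patience and mode can be sketched with a small simulation. This is a simplified model of the stopping rule written for illustration: it ignores options such as a minimum improvement threshold, and `epochs_until_stop` is a hypothetical helper, not part of Keras.

```python
def epochs_until_stop(history, patience=0, mode="max"):
    """Return how many epochs run before early stopping triggers.

    history: the monitored value after each epoch (e.g. val_acc).
    """
    if mode == "max":
        improved = lambda cur, best: cur > best
    else:
        improved = lambda cur, best: cur < best
    best = None
    wait = 0
    for epoch, current in enumerate(history, start=1):
        if best is None or improved(current, best):
            best = current
            wait = 0          # reset the counter on improvement
        else:
            wait += 1
            if wait >= patience:
                return epoch  # stop after this epoch
    return len(history)       # early stopping never triggered

val_acc = [0.70, 0.80, 0.79, 0.78, 0.85, 0.84]
print(epochs_until_stop(val_acc, patience=2, mode="max"))
```

With patience=2 the run stops after epoch 4 and never reaches the better value at epoch 5, which illustrates why a small patience can end training too soon on a noisy metric.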

In [8]:
es = EarlyStopping(monitor='val_acc', patience=6)

### Compile the model

In [9]:
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])

### Train the RNN model

In [10]:
batch_size = 128
epochs = 20
model.fit(x_train, y_train,
          validation_split=0.1,
          batch_size=batch_size,
          epochs=epochs,
          callbacks=[es],
          shuffle=True)

Train on 22500 samples, validate on 2500 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20


<keras.callbacks.History at 0x1b9c41a39b0>
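
One detail of the call above is worth checking. According to the Keras documentation, validation_split takes the last fraction of the samples before any shuffling, and read_files returns all positive reviews first, then all negative ones. Under that assumption the validation set would contain only negative reviews. The sketch below reproduces the label layout in plain Python and shows one possible remedy, a single shuffle with a fixed seed; the seed value is arbitrary.

```python
import random

# Same label layout as read_files: all positives first, then all negatives
labels = [1] * 12500 + [0] * 12500
split_at = len(labels) - len(labels) // 10    # validation_split=0.1

# Taking the tail of the unshuffled data yields a single-class validation set
val_unshuffled = labels[split_at:]
print(len(val_unshuffled), set(val_unshuffled))

# Shuffling the indices once mixes the classes; the same permutation would
# have to be applied to both x_train and y_train before calling fit().
indices = list(range(len(labels)))
random.Random(42).shuffle(indices)
shuffled = [labels[i] for i in indices]
val_shuffled = shuffled[split_at:]
print(len(val_shuffled), set(val_shuffled))
```

With NumPy arrays the equivalent would be to draw one permutation with `np.random.permutation(len(x_train))` and index both `x_train` and `y_train` with it before training.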

### Train the BiRNN model
The kernel must be restarted before this run.

In [10]:
batch_size = 128
epochs = 20
model.fit(x_train, y_train,
          validation_split=0.1,
          batch_size=batch_size,
          epochs=epochs,
          callbacks=[es],
          shuffle=True)

Train on 22500 samples, validate on 2500 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20


<keras.callbacks.History at 0x29382254748>

### Train the DBRNN model
The kernel must be restarted before this run.

In [10]:
batch_size = 128
epochs = 20
model.fit(x_train, y_train,
          validation_split=0.1,
          batch_size=batch_size,
          epochs=epochs,
          callbacks=[es],
          shuffle=True)

Train on 22500 samples, validate on 2500 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20


<keras.callbacks.History at 0x21d4472be48>

### Evaluate the model

In [13]:
scores = model.evaluate(x_test, y_test)



In [12]:
print('RNN:test_loss: %f, accuracy: %f' % (scores[0], scores[1]))

RNN:test_loss: 0.465460, accuracy: 0.859240


In [15]:
print('BRNN:test_loss: %f, accuracy: %f' % (scores[0], scores[1]))

BRNN:test_loss: 0.391547, accuracy: 0.866680


In [15]:
print('DBRNN:test_loss: %f, accuracy: %f' % (scores[0], scores[1]))

DBRNN:test_loss: 0.385426, accuracy: 0.857360
