# Classifying IMDB Movie Reviews with a Bidirectional LSTM

## Source code reference:
- https://keras.io/examples/nlp/bidirectional_lstm_imdb/

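The IMDB dataset built into Keras consists of integer sequences: the text has already been converted to numbers. Raw data for NLP tasks would normally be text.

- Text classification: sentiment analysis of movie reviews.
- Text translation: English to Chinese.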
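
As a rough illustration of what "already converted to numbers" means, the sketch below encodes a raw sentence with a made-up vocabulary. The `toy_vocab` ranks and the `encode` helper are hypothetical and exist only for this example. The reserved indices (0 for padding, 1 for start of sequence, 2 for out-of-vocabulary words, and an offset of 3 for real words) follow the default arguments of `keras.datasets.imdb.load_data`.

```python
# Hypothetical frequency ranks (1 = most frequent); not the real IMDB vocabulary.
toy_vocab = {"the": 1, "movie": 2, "was": 3, "great": 4, "boring": 5}

PAD, START, OOV = 0, 1, 2   # reserved indices (load_data defaults)
INDEX_FROM = 3              # real words are shifted by this offset


def encode(text, vocab, num_words=None):
    """Turn a raw sentence into a list of integer token ids."""
    ids = [START]
    for word in text.lower().split():
        rank = vocab.get(word)
        idx = OOV if rank is None else rank + INDEX_FROM
        # Ids at or beyond the vocabulary cap are treated as out-of-vocabulary.
        if num_words is not None and idx >= num_words:
            idx = OOV
        ids.append(idx)
    return ids


encoded = encode("The movie was great", toy_vocab)
capped = encode("The movie was great", toy_vocab, num_words=6)
print(encoded)
print(capped)
```

With `num_words` set, rarer words collapse into the out-of-vocabulary id, which is the effect a vocabulary cap has on the real dataset.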

In [1]:
# Load packages and set parameters.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

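# Only consider the 20,000 most frequent words
max_features = 20000

# Limit each review to 200 tokens
# (pad_sequences keeps the last 200 tokens of longer reviews by default)
maxlen = 200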


## Build the model

In [2]:
# Accept variable-length sequences of integers
inputs = keras.Input(shape=(None,), dtype="int32")

x = layers.Embedding(max_features, 128)(inputs)
# Stack 2 bidirectional LSTMs
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
# Classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()


Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, None)]            0         
_________________________________________________________________
embedding (Embedding)        (None, None, 128)         2560000   
_________________________________________________________________
bidirectional (Bidirectional (None, None, 128)         98816     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128)               98816     
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
Total params: 2,757,761
Trainable params: 2,757,761
Non-trainable params: 0
_________________________________________________________________
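
The parameter counts in the summary can be checked by hand. The following is a minimal sketch using the standard formulas for embedding, LSTM, and dense layers; the helper function names are made up for this example and are not part of Keras.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four weight sets (input, forget, and output gates plus the cell candidate),
    # each with input weights, recurrent weights, and a bias.
    return 4 * ((input_dim + units) * units + units)


def bidirectional_lstm_params(input_dim, units):
    # A forward and a backward LSTM, each with its own weights.
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


emb = embedding_params(20000, 128)
bi1 = bidirectional_lstm_params(128, 64)
# The second layer receives the concatenated forward/backward outputs (2 * 64).
bi2 = bidirectional_lstm_params(2 * 64, 64)
out = dense_params(2 * 64, 1)
total = emb + bi1 + bi2 + out
print(emb, bi1, bi2, out, total)
```

Both bidirectional layers end up with the same count because the first layer's input (the 128-dimensional embedding) happens to match the second layer's input (two concatenated 64-unit outputs).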


## Load the IMDB dataset

In [3]:
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
    num_words=max_features
)
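print(f'Number of training samples: {len(x_train)}')
print(f'Number of test samples: {len(x_test)}')

# Pad shorter reviews with 0 and truncate longer ones to maxlen
# (pad_sequences pads and truncates at the front by default)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)


Number of training samples: 25000
Number of test samples: 25000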
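
To make the padding step concrete, here is a small pure-Python sketch of what `pad_sequences` does to a single sequence under its default settings (`padding="pre"`, `truncating="pre"`). The `pad_pre` helper is hypothetical and only mimics that default behaviour; it is not the Keras implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the default pad_sequences behaviour for one sequence (maxlen > 0)."""
    # truncating="pre": drop tokens from the front, keep the last maxlen tokens.
    seq = list(seq)[-maxlen:]
    # padding="pre": fill with `value` at the front until the length is maxlen.
    return [value] * (maxlen - len(seq)) + seq


short_seq = pad_pre([11, 12, 13], maxlen=5)
long_seq = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short_seq)
print(long_seq)
```

If zeros at the end or the first tokens of each review are preferred, `pad_sequences` accepts `padding="post"` and `truncating="post"`.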


## Train the model

In [4]:
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=2, validation_split=0.2)


Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x2a1a18724f0>
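
The arithmetic behind `validation_split=0.2` and `batch_size=32` can be sketched as follows. Keras takes the validation samples from the end of the training arrays before shuffling; the variable names below are chosen for this example only.

```python
import math

n_samples = 25000        # size of x_train
validation_split = 0.2
batch_size = 32

# The last 20% of the samples are held out for validation.
n_val = round(n_samples * validation_split)
n_train = n_samples - n_val

# Number of gradient updates (batches) per epoch on the remaining samples.
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)
```

Because the split is taken before shuffling, the held-out portion is only representative if the dataset order is not sorted by label.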

## Evaluate the model

In [5]:
model.evaluate(x_test, y_test)



[0.3551137447357178, 0.8645600080490112]
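
The two numbers returned by `model.evaluate` are the loss (binary cross-entropy) and the accuracy, in the order given to `model.compile`. The sketch below shows how both could be computed from sigmoid outputs; the labels and probabilities are made-up values, not predictions from the model above.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels (1 = positive review) and sigmoid outputs.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.1]

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(round(loss, 4), acc)
```

Read this way, the result above corresponds to a test loss of about 0.355 and a test accuracy of about 86.5%.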