<a href="https://colab.research.google.com/github/iamjudy/deep-learning-colab/blob/main/RNN_%E6%83%85%E6%84%8F%E5%88%86%E6%9E%90.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### 1. Import the deep learning packages

In [None]:
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM
from tensorflow.keras.datasets import imdb

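### 2. Load the data

In natural language processing we usually cap the vocabulary size: only the most frequent words keep their own index, and rarer words are replaced by an out-of-vocabulary token.

In [None]:
# Keep the 10,000 most frequent words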
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)
print(len(x_train))
print(len(x_test))

25000
25000
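
As a rough illustration of what capping the vocabulary means, the sketch below ranks the words of a toy corpus by frequency and replaces everything outside the top `num_words` ranks with an out-of-vocabulary index. The helper name `cap_vocabulary`, the toy corpus, and the use of `0` as the out-of-vocabulary index are assumptions made for illustration; `imdb.load_data` also reserves a few low indices for special tokens, which this sketch ignores.

```python
from collections import Counter

def cap_vocabulary(docs, num_words, oov_index=0):
    """Encode tokenized docs as frequency ranks, keeping only the top `num_words` words."""
    counts = Counter(word for doc in docs for word in doc)
    # Rank 1 is the most frequent word; ties keep first-seen order.
    ranked = [word for word, _ in counts.most_common()]
    rank = {word: i + 1 for i, word in enumerate(ranked)}
    return [[rank[w] if rank[w] <= num_words else oov_index for w in doc]
            for doc in docs]

# Hypothetical toy corpus, not taken from the IMDB data.
docs = [["the", "movie", "was", "great"],
        ["the", "movie", "was", "boring"],
        ["the", "plot", "was", "thin"]]
encoded = cap_vocabulary(docs, num_words=3)
print(encoded)  # [[1, 3, 2, 0], [1, 3, 2, 0], [1, 0, 2, 0]]
```

Only "the", "was" and "movie" survive the cap, so the words that actually carry the sentiment here ("great", "boring") collapse into the same index. This is the trade-off behind choosing `num_words`.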


In [None]:
# Note that the reviews naturally differ in length.
print(len(x_train[0]))
print(len(x_train[1]))

218
189


In [None]:
# Label: 1 = positive review, 0 = negative review
print(y_train[0])
print(y_train[1])

1
0


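### 3. Preprocess the data

Although an RNN can in principle take sequences of any length, reviews of different lengths are awkward to batch, so in practice we fix the length: each review is cut or padded with 0 to exactly 100 tokens.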

In [None]:
x_train = sequence.pad_sequences(x_train, maxlen=100)
x_test = sequence.pad_sequences(x_test, maxlen=100)
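
To make the effect of padding concrete, here is a minimal NumPy sketch of what `pad_sequences` does with its default arguments (`padding='pre'`, `truncating='pre'`): short sequences get zeros in front, and long sequences keep only their last `maxlen` items. The helper name `pad_pre` is hypothetical and is not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen` items of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype=np.int32)
    for row, seq in enumerate(seqs):
        tail = list(seq)[-maxlen:]          # truncate from the front
        if tail:
            out[row, -len(tail):] = tail    # right-align, so padding sits in front
    return out

padded = pad_pre([[1, 2, 3], [1, 2, 3, 4, 5, 6]], maxlen=4)
print(padded)
# [[0 1 2 3]
#  [3 4 5 6]]
```

With `maxlen=100`, a 218-token review such as `x_train[0]` therefore keeps only its last 100 tokens.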

### 4. Step 01: Build a function-learning machine

In [None]:
model = Sequential()
model.add(Embedding(10000, 256))  # each word index maps to a 256-dimensional vector

# Use fewer LSTM units: 128 --> 64
model.add(LSTM(64))

model.add(Dense(1, activation='sigmoid'))

In [None]:
model.compile(loss='binary_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])
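
For readers who want to see what the loss and metric measure, the following is a small NumPy sketch of binary cross-entropy and thresholded accuracy. The function names, the clipping constant `eps`, and the example probabilities are assumptions for illustration; Keras's internal implementation may differ in detail.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that fall on the correct side of `threshold`."""
    predicted_positive = np.asarray(y_prob) > threshold
    return float(np.mean(predicted_positive == (np.asarray(y_true) == 1)))

labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.4, 0.2]   # hypothetical sigmoid outputs, not from the notebook
loss = binary_crossentropy(labels, probs)
acc = binary_accuracy(labels, probs)
print(loss, acc)
```

The third example (label 1, probability 0.4) is the only one on the wrong side of 0.5, so the accuracy is 0.75, and it also contributes the largest share of the loss.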

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 256)         2560000   
                                                                 
 lstm_1 (LSTM)               (None, 64)                82176     
                                                                 
 dense_1 (Dense)             (None, 1)                 65        
                                                                 
Total params: 2,642,241
Trainable params: 2,642,241
Non-trainable params: 0
_________________________________________________________________
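
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights and a bias; the variable names are chosen here for illustration.

```python
vocab_size, embed_dim, lstm_units = 10000, 256, 64

# One 256-dimensional vector per word index
embedding_params = vocab_size * embed_dim

# 4 gates, each with input weights, recurrent weights, and a bias
lstm_params = 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)

# One output unit: 64 weights plus 1 bias
dense_params = lstm_units * 1 + 1

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
# 2560000 82176 65 2642241
```

Almost all of the parameters sit in the embedding table, which is why halving the LSTM units changes the total only slightly.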


### 5. Step 02: Train

In [None]:
# Reduce to 5 epochs
model.fit(x_train, y_train, batch_size=32, epochs=5,
         validation_data=(x_test, y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f6e27aee310>
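
The `History` object returned by `fit` has a `history` attribute, a dictionary that maps each metric name to a list with one value per epoch. With this notebook's settings the keys are typically `loss`, `accuracy`, `val_loss` and `val_accuracy`. A small helper like the one below can pick out the best epoch. The name `best_epoch` is hypothetical, and because the notebook passes the test set as `validation_data`, the `val_*` values here are really test metrics.

```python
def best_epoch(history, key="val_loss"):
    """Return the 1-based epoch with the lowest value of `key`, and that value."""
    values = history[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Intended use (assumes `model` and the data from the cells above):
# history = model.fit(x_train, y_train, batch_size=32, epochs=5,
#                     validation_data=(x_test, y_test)).history
# print(best_epoch(history))
```

If the best epoch is well before the last one, the model has probably started to overfit and fewer epochs would do.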

### 6. Step 03: Test

In [None]:
from tensorflow.keras.datasets.imdb import get_word_index

In [None]:
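word_index = get_word_index()
text = "this movie is worth seeing"
# Raw lookup of each word's frequency rank. Caveat: imdb.load_data() shifts
# indices by 3 (index_from=3) and prepends a start token, which is not done here.
seq = [word_index[w] for w in text.split()]
print(model.predict(np.array([seq])))  # close to 1 = positive, close to 0 = negative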

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
[[0.96698916]]


In [None]:
text = "this genre of movie is boring"  # the genre is boring -> negative review!
seq = [word_index[w] for w in text.split()]
print(model.predict(np.array([seq])))

[[0.23855028]]
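
One caveat about the cells above: with its default arguments, `imdb.load_data` prepends a start token (`start_char=1`), shifts every word index by 3 (`index_from=3`), and replaces indices at or above `num_words` with `oov_char=2`. A raw lookup in `get_word_index()` skips all of that, so the model sees different indices than it was trained on. The sketch below shows one way to encode new text consistently. The helper name `encode_review` and the toy dictionary are assumptions for illustration.

```python
def encode_review(text, word_index, num_words=10000,
                  start_char=1, oov_char=2, index_from=3):
    """Encode `text` with the same offsets that imdb.load_data applies by default."""
    seq = [start_char]
    for word in text.lower().split():
        rank = word_index.get(word)              # None if the word is unknown
        idx = rank + index_from if rank is not None else oov_char
        seq.append(idx if idx < num_words else oov_char)
    return seq

# Toy stand-in for get_word_index(), with made-up ranks for illustration.
toy_index = {"this": 11, "movie": 17, "is": 6, "rareword": 99999}
print(encode_review("This movie is rareword unseen", toy_index))
# [1, 14, 20, 9, 2, 2]
```

With the real dictionary, the encoded review can then be padded like the training data before prediction, for example `model.predict(sequence.pad_sequences([encode_review(text, word_index)], maxlen=100))`. Unknown or rare words map to `2` instead of raising a `KeyError`.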
