<a href="https://colab.research.google.com/github/raamiiChu/111-1_NCCU_DCT_3D_Game_Programming_G2/blob/main/0502_%E4%BD%9C%E6%A5%AD/%E6%A8%A1%E5%9E%8B%E8%A8%93%E7%B7%B4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font color=#FF3030>All changes are marked in red above the corresponding code</font>

### Summary of changes

1. Import interact_manual from ipywidgets
1. num_words changed to 30000
1. View the original text of a review
1. maxlen changed to 250
1. Embedding changed to (30000, 256)
1. Added an LSTM layer with 256 units and return_sequences = True
1. The original LSTM layer changed to 256 units with return_sequences = False
1. loss function changed to "mse"
1. epochs changed to 3
1. Test with interact_manual

### 1. Import modules

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding
from tensorflow.keras.layers import LSTM

<font color=#FF3030>Import interact_manual</font>

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.datasets.imdb import get_word_index

from ipywidgets import interact_manual

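### 2. Load the data

In natural language processing we usually cap the number of distinct words (the vocabulary size) that the model uses.

<font color=#FF3030>num_words changed to 30000</font>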

In [None]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=30000)
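
As a rough illustration of what the vocabulary cap does, the sketch below mimics it in plain Python. It assumes the Keras default `oov_char=2` for `imdb.load_data`; `cap_vocabulary` and the sample sequence are hypothetical and exist only for this example.

```python
def cap_vocabulary(seq, num_words, oov_char=2):
    """Replace every index >= num_words with the out-of-vocabulary index."""
    return [w if w < num_words else oov_char for w in seq]

# Made-up encoded review; the last two ids fall outside a 30000-word vocabulary
sample = [1, 14, 29999, 30000, 88584]
capped = cap_vocabulary(sample, num_words=30000)
print(capped)
```

Rare words are therefore not dropped; they all collapse onto one "unknown" index, which keeps the Embedding table at a fixed size.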

In [None]:
print(len(x_train), len(x_test), sep=" | ")

25000 | 25000


Note that the reviews naturally differ in length.

In [None]:
print(len(x_train[0]), len(x_train[1]), sep=" | ")

218 | 189


In [None]:
print(y_train[0], y_train[1], sep=" | ")

1 | 0


<font color=#FF3030>View the original text of a review</font>

[Code reference](https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset)

In [None]:
# word_index: {word: index}
word_index = get_word_index()

In [None]:
# reverse_word_index: {index: word}
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decoding(num:int) -> str:
    # Look up each index (shifted by 3, because ids 0-2 are reserved); use "?" when no word is found
    return " ".join([reverse_word_index.get(i - 3, '?') for i in x_train[num]])
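
The offset of 3 is easier to see on a toy vocabulary. The sketch below assumes the Keras defaults for `imdb.load_data` (`start_char=1`, `oov_char=2`, `index_from=3`); `toy_word_index`, `encode` and `decode` are hypothetical names used only for illustration.

```python
INDEX_FROM = 3      # offset that load_data adds to every word index
START, OOV = 1, 2   # reserved ids (0 is used for padding)

toy_word_index = {"the": 1, "movie": 2, "great": 3}  # hypothetical vocabulary
toy_reverse = {idx: word for word, idx in toy_word_index.items()}

def encode(text):
    """Turn text into ids the same way the dataset is encoded."""
    ids = [toy_word_index.get(w) for w in text.lower().split()]
    return [START] + [i + INDEX_FROM if i is not None else OOV for i in ids]

def decode(ids):
    """Undo the offset; reserved ids have no word and become '?'."""
    return " ".join(toy_reverse.get(i - INDEX_FROM, "?") for i in ids)

encoded = encode("the movie was great")
decoded = decode(encoded)
print(encoded)
print(decoded)
```

The start marker and the unknown word "was" both decode to "?", which is why decoded reviews begin with a question mark.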

In [None]:
interact_manual(decoding, num=(0,24999));

interactive(children=(IntSlider(value=12499, description='num', max=24999), Button(description='Run Interact',…

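### 3. Data processing

Although we could work with true variable-length sequences (seq2seq), unequal lengths complicate the computation, so in practice we fix a length: shorter reviews are padded with 0 and longer ones are truncated.

<font color=#FF3030>maxlen changed to 250</font>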

In [None]:
x_train = sequence.pad_sequences(x_train, maxlen=250)
x_test = sequence.pad_sequences(x_test, maxlen=250)
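
To make the padding behavior concrete, here is a plain-Python sketch that should match the `pad_sequences` defaults (`padding="pre"`, `truncating="pre"`): zeros are added at the front, and overly long sequences keep their last `maxlen` items. `pad_pre` is a hypothetical helper and assumes `maxlen >= 1`.

```python
def pad_pre(seqs, maxlen):
    """Left-pad with 0 and keep only the last `maxlen` items of each sequence."""
    padded = []
    for s in seqs:
        kept = s[-maxlen:]                               # truncate from the front
        padded.append([0] * (maxlen - len(kept)) + kept)  # pad at the front
    return padded

result = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=5)
print(result)
```

With `maxlen=250`, a 218-word review gets 32 leading zeros, while a 300-word review loses its first 50 words.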

### 4. Step 01: Build a function-learning machine

- <font color=#FF3030>Embedding changed to (30000, 256)</font>
- <font color=#FF3030>Added an LSTM layer with 256 units and return_sequences = True</font>
- <font color=#FF3030>The original LSTM layer changed to 256 units with return_sequences = False</font>

In [None]:
model = Sequential()

In [None]:
model.add(Embedding(30000, 256))

return_sequences: return only the last hidden state (False, the default) or the hidden state at every time step (True)

In [None]:
model.add(LSTM(256, return_sequences = True))

The LSTM connected to the output layer must have return_sequences set to False
(writing return_sequences = False is optional, since False is the default).

In [None]:
model.add(LSTM(256, return_sequences = False))  
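
The difference between the two settings can be sketched with a toy recurrence. This is not an LSTM: `toy_rnn` is a hypothetical one-unit recurrent loop with made-up weights of 0.5, meant only to show what is returned in each mode.

```python
import math

def toy_rnn(inputs, return_sequences=False):
    """Run a one-unit recurrence and return all states or only the last one."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(0.5 * x + 0.5 * h)  # new state depends on input and previous state
        states.append(h)
    return states if return_sequences else states[-1]

all_states = toy_rnn([1.0, 2.0, 3.0], return_sequences=True)
last_state = toy_rnn([1.0, 2.0, 3.0])
print(len(all_states), last_state == all_states[-1])
```

A stacked recurrent layer needs one input per time step, so every recurrent layer except the last has to return the full sequence.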

In [None]:
model.add(Dense(1, activation='sigmoid'))

#### Compile

loss function changed to "mse"

In [None]:
model.compile(loss='mse',
             optimizer='adam',
             metrics=['accuracy'])
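
For intuition about the chosen loss, the sketch below computes mean squared error by hand and, purely for comparison, binary cross-entropy, a common alternative for a sigmoid output. The labels and predictions are made-up values, and both functions are simplified stand-ins rather than the Keras implementations.

```python
import math

def mse(y_true, y_pred):
    """Mean of squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-likelihood; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0]      # one positive and one negative review
y_pred = [0.9, 0.8]  # the second prediction is badly wrong
mse_value = mse(y_true, y_pred)
bce_value = binary_crossentropy(y_true, y_pred)
print(mse_value, bce_value)
```

MSE is bounded by 1 per sample here, whereas cross-entropy grows without bound as a confident prediction becomes more wrong.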

#### Inspect our model

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 256)         7680000   
                                                                 
 lstm (LSTM)                 (None, None, 256)         525312    
                                                                 
 lstm_1 (LSTM)               (None, 256)               525312    
                                                                 
 dense (Dense)               (None, 1)                 257       
                                                                 
Total params: 8,730,881
Trainable params: 8,730,881
Non-trainable params: 0
_________________________________________________________________
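
The parameter counts in the summary can be checked by hand. The sketch assumes the standard layout of a Keras LSTM (four gates, each with input weights, recurrent weights and a bias); the helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim                       # one vector per word

def lstm_params(input_dim, units):
    return 4 * (units * (input_dim + units) + units)  # 4 gates: W, U and bias

def dense_params(input_dim, units):
    return input_dim * units + units              # weights plus bias

counts = [
    embedding_params(30000, 256),
    lstm_params(256, 256),
    lstm_params(256, 256),
    dense_params(256, 1),
]
total = sum(counts)
print(counts, total)
```

Both LSTM layers have the same count because each receives 256-dimensional input and has 256 units.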


### 5. Step 02: Training

<font color=#FF3030>epochs changed to 3</font>

In [None]:
model.fit(x_train, y_train, batch_size=32, epochs=3,
         validation_data=(x_test, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3
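
As a quick sanity check on the training settings, the number of weight updates follows directly from the dataset size and the batch size. `steps_per_epoch` is a hypothetical helper; the figures come from the cells above.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches per epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

steps = steps_per_epoch(25000, 32)
total_updates = steps * 3  # epochs=3
print(steps, total_updates)
```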


<keras.callbacks.History at 0x7f54e383dd90>

### 6. Step 03: Testing

<font color=#FF3030>Test with interact_manual</font>

##### Enter your own review

In [None]:
#text = "this movie is worth seeing"
#text = "could of been so much better if properly cast directed and a better script"
#text = "do not watch this movie"

In [None]:
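def positive_or_negative(text:str) -> str:
    # load_data() shifts every index by 3; ids outside the 30000-word vocabulary become 2 (unknown)
    ids = [word_index[w] + 3 for w in text.lower().split() if w in word_index]
    seq = sequence.pad_sequences([[1] + [i if i < 30000 else 2 for i in ids]], maxlen=250)
    return f"score: {model.predict(seq)[0][0]:.3f}"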

In [None]:
interact_manual(positive_or_negative, text="Please type your comment here");

interactive(children=(Text(value='Please type your comment here', description='text'), Button(description='Run…
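
The score is a sigmoid output between 0 and 1, and in the IMDB labels 1 means positive and 0 means negative. One simple way to turn the score into a label is a cutoff; the 0.5 threshold and the `label_from_score` helper below are assumptions for illustration, not part of the original notebook.

```python
def label_from_score(score, threshold=0.5):
    """Map a sigmoid score to a sentiment label using a fixed cutoff."""
    return "positive" if score >= threshold else "negative"

print(label_from_score(0.913))
print(label_from_score(0.080))
```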

##### Use a review from the test data

In [None]:
def test_x_text(num:int) -> None:
    # Decode for display only, skipping the padding zeros
    text = " ".join([reverse_word_index.get(i - 3, '?') for i in x_test[num] if i != 0])
    # x_test[num] is already encoded and padded, so feed it to the model directly
    score = model.predict(x_test[num:num + 1])[0][0]
    print(f"predict score: {score:.3f}", 
          f"real score: {y_test[num]}", 
          f"details:{text}", sep="\n")

In [None]:
interact_manual(test_x_text, num=(0,24999));

interactive(children=(IntSlider(value=12499, description='num', max=24999), Button(description='Run Interact',…

### 7. A different way to save

This time the model architecture and the trained weights are saved separately, which is more flexible to use.

In [None]:
from google.colab import drive

drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%cd '/content/drive/My Drive/Colab Notebooks/數學軟體應用/作業/0502 作業/'

/content/drive/My Drive/Colab Notebooks/數學軟體應用/作業/0502 作業


In [None]:
model.save("imdb_model")



INFO:tensorflow:Assets written to: imdb_model/assets




In [None]:
model_json = model.to_json()
# Write the architecture with a context manager so the file is closed properly
with open('imdb_model_architecture.json', 'w') as f:
    f.write(model_json)
model.save_weights('imdb_model_weights.h5')
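
Loading the two files back could look roughly like the sketch below. It relies on `model_from_json` and `load_weights` from Keras; `load_model_from_parts` is a hypothetical helper, and the compile settings simply repeat the ones used earlier in this notebook.

```python
def load_model_from_parts(architecture_path, weights_path):
    """Rebuild a Keras model from a JSON architecture file plus a weights file."""
    # Imported lazily so that defining this helper does not require TensorFlow
    from tensorflow.keras.models import model_from_json

    with open(architecture_path) as f:
        model = model_from_json(f.read())
    model.load_weights(weights_path)
    # The JSON holds only the architecture, so compile again before evaluating or training
    model.compile(loss="mse", optimizer="adam", metrics=["accuracy"])
    return model

# Example call, using the two paths that were written when saving:
# restored = load_model_from_parts(architecture_path, weights_path)
```

Keeping the two parts separate means the same architecture file can be paired with different weight files, for example from different training runs.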