# RNN Sentiment Analysis in Practice
Dataset: the IMDB movie review dataset

### 1. Import the deep learning packages

In [None]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
from tensorflow.keras.preprocessing import sequence  # sequence utilities such as padding
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding  # Embedding maps each of the 10000 word indices to a smaller dense vector
from tensorflow.keras.layers import GRU
from tensorflow.keras.datasets import imdb

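### 2. Load the data

In natural language processing we usually cap the vocabulary size. Here `num_words=10000` restricts the data to the 10,000 most frequent words.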

In [None]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
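
To illustrate what the vocabulary cap does: with `num_words=10000`, `imdb.load_data` keeps word indices below 10000 and replaces the rest with the out-of-vocabulary index (`oov_char`, 2 by default). The following is a minimal pure-Python sketch of that replacement; `cap_vocabulary` is a hypothetical helper, not part of Keras, and the sample indices are made up.

```python
def cap_vocabulary(seq, num_words=10000, oov_char=2):
    """Replace every index >= num_words with the out-of-vocabulary index."""
    return [w if w < num_words else oov_char for w in seq]

review = [1, 14, 22, 12045, 9999, 30000]  # made-up indices for illustration
capped = cap_vocabulary(review)
print(capped)  # [1, 14, 22, 2, 9999, 2]
```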


In [None]:
len(x_train)

25000

In [None]:
len(x_test)

25000

In [None]:
len(x_train[0])

218

In [None]:
len(x_train[1])

189

In [None]:
y_train[0]

1

In [None]:
y_train[1]

0

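### 3. Data preprocessing

Pad or truncate every review to exactly **120** tokens so that all inputs have the same length. (`maxlen` sets the sequence length; the vocabulary size was already fixed by `num_words`.)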

In [None]:
x_train = sequence.pad_sequences(x_train, maxlen=120)
x_test = sequence.pad_sequences(x_test, maxlen=120)
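
With its default arguments, `pad_sequences` pads short sequences with zeros on the left and drops tokens from the front of long ones, so the end of each review is what survives. The sketch below mimics that default behaviour for a single sequence in plain Python; `pad_or_truncate` is a hypothetical stand-in, not the Keras implementation.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: pad on the left, truncate from the left."""
    if len(seq) >= maxlen:
        return seq[-maxlen:]                    # keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq  # left-pad with zeros

shorter = pad_or_truncate([5, 6, 7], maxlen=5)
longer = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(shorter)  # [0, 0, 5, 6, 7]
print(longer)   # [3, 4, 5, 6, 7]
```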

### 4. Step 01: Build a function learning machine
Switch to a **GRU** as the hidden layer and reduce the number of units to **100**.

In [None]:
model = Sequential()

In [None]:
model.add(Embedding(10000, 100))
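
Conceptually, `Embedding(10000, 100)` is a lookup table with one trainable 100-dimensional row per word index. A rough pure-Python sketch with tiny stand-in sizes follows; the names `table` and `embed` are hypothetical, and the random initial values are only illustrative.

```python
import random

vocab_size, embed_dim = 10, 4   # tiny stand-ins for 10000 and 100
random.seed(0)
# One row of embed_dim numbers per word index (trainable weights in a real model).
table = [[random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
         for _ in range(vocab_size)]

def embed(seq):
    """Look up one dense vector per word index."""
    return [table[i] for i in seq]

vectors = embed([3, 7, 3])
print(len(vectors), len(vectors[0]))  # 3 4
```

The same index always maps to the same row, which is why the first and third vectors here are identical.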

In [None]:
model.add(GRU(100))

In [None]:
model.add(Dense(1, activation='sigmoid'))

#### Compile

In [None]:
model.compile(loss='binary_crossentropy',
             optimizer='adam',
             metrics=['accuracy'])
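
For reference, the `binary_crossentropy` loss averages `-[y*log(p) + (1-y)*log(1-p)]` over the batch, where `p` is the sigmoid output. Below is a small pure-Python sketch of that formula; the clipping constant `eps` is an assumption made here for numerical safety, not a value taken from this notebook.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

good = binary_crossentropy([1, 0], [0.9, 0.1])  # confident and correct
bad = binary_crossentropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good < bad)  # True
```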

#### Inspect our model

In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 100)         1000000   
                                                                 
 gru_1 (GRU)                 (None, 100)               60600     
                                                                 
 dense_1 (Dense)             (None, 1)                 101       
                                                                 
Total params: 1,060,701
Trainable params: 1,060,701
Non-trainable params: 0
_________________________________________________________________
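
The parameter counts in the summary can be reproduced by hand. The arithmetic below assumes the TensorFlow 2 default GRU variant (`reset_after=True`), which carries two bias vectors per gate.

```python
vocab_size, embed_dim, units = 10000, 100, 100

embedding_params = vocab_size * embed_dim
# GRU: 3 gates, each with input weights, recurrent weights and two bias
# vectors (assuming the TensorFlow 2 default reset_after=True).
gru_params = 3 * (embed_dim * units + units * units + 2 * units)
dense_params = units * 1 + 1   # weights plus one bias

total = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total)
# 1000000 60600 101 1060701
```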


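### 5. Step 02: Training

This time we use `batch_size=50`.

The model reaches 96% accuracy on the training set and nearly 85% on the validation set (here, the test set passed as `validation_data`).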
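
As a quick check on what `batch_size=50` implies for the 25,000 training reviews, the number of weight updates works out as follows.

```python
import math

n_train, batch_size, epochs = 25000, 50, 5

steps_per_epoch = math.ceil(n_train / batch_size)  # batches per pass over the data
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 500 2500
```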



In [None]:
model.fit(x_train, y_train, batch_size=50, epochs=5,
         validation_data=(x_test, y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fafc55fc490>

### 6. Step 03: Testing

In [None]:
from tensorflow.keras.datasets.imdb import get_word_index

In [None]:
word_index = get_word_index()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json


In [None]:
word_index['this']

11

In [None]:
text = "this movie is worth seeing"

In [None]:
seq = [word_index[x] for x in text.split()]  # raw indices; note that imdb.load_data shifts indices by 3 by default (index_from=3)

In [None]:
model.predict([seq])

array([[0.9938752]], dtype=float32)

In [None]:
text = "could of been so much better if properly cast directed and a better script"

In [None]:
seq = [word_index[x] for x in text.split()]

In [None]:
model.predict([seq])

array([[0.74226993]], dtype=float32)
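
One caveat on the encoding used above: by default `imdb.load_data` shifts every word index by 3 (`index_from=3`) and reserves 0 for padding, 1 for the start of a sequence and 2 for out-of-vocabulary words, whereas `get_word_index()` returns the unshifted indices. The sketch below shows an encoder that follows the training data's convention. `encode_review` and `toy_word_index` are hypothetical names, and the predictions printed above were produced without this shift.

```python
def encode_review(text, word_index, num_words=10000, index_from=3,
                  start_char=1, oov_char=2):
    """Encode text the way imdb.load_data encodes reviews (default settings)."""
    seq = [start_char]
    for word in text.lower().split():
        raw = word_index.get(word)          # None if the word is unknown
        idx = raw + index_from if raw is not None else oov_char
        seq.append(idx if idx < num_words else oov_char)
    return seq

# Toy vocabulary for illustration only; in the notebook use get_word_index().
toy_word_index = {"this": 11, "movie": 17, "is": 6, "rareword": 20000}
encoded = encode_review("this movie is rareword unseen", toy_word_index)
print(encoded)  # [1, 14, 20, 9, 2, 2]
```

In the notebook, the result could presumably be passed to the model as `model.predict(np.array([encode_review(text, word_index)]))`. Mapping unknown and rare words to index 2 also avoids a `KeyError` and keeps every index inside the embedding table.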

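### 7. A different way to save the model

This time we save the model architecture and the trained weights separately, which is more flexible to work with.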

In [None]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd '/content/drive/My Drive/Colab Notebooks'

/content/drive/My Drive/Colab Notebooks


In [None]:
# Save the architecture as JSON and the trained weights in a separate file
model_json = model.to_json()
with open('imdb_model_architecture.json', 'w') as f:
    f.write(model_json)
model.save_weights('imdb_model_weights.h5')
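
The JSON file holds only the layer configuration, not the learned weights. That is why reloading typically takes two steps: rebuild the model with `tensorflow.keras.models.model_from_json`, then call `load_weights` on it. As a rough illustration of what the architecture file contains, the sketch below parses a hand-written stand-in for the `to_json()` output; the real file has many more fields, and `layer_class_names` is a hypothetical helper.

```python
import json

# Hand-written stand-in for the text returned by model.to_json();
# the real file contains many more configuration fields.
architecture_json = json.dumps({
    "class_name": "Sequential",
    "config": {"layers": [
        {"class_name": "Embedding", "config": {"input_dim": 10000, "output_dim": 100}},
        {"class_name": "GRU", "config": {"units": 100}},
        {"class_name": "Dense", "config": {"units": 1, "activation": "sigmoid"}},
    ]},
})

def layer_class_names(text):
    """List the layer types recorded in an architecture JSON string."""
    spec = json.loads(text)
    return [layer["class_name"] for layer in spec["config"]["layers"]]

names = layer_class_names(architecture_json)
print(names)  # ['Embedding', 'GRU', 'Dense']
```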