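## CNNs can do movie review analysis too!
Using Convolution1D.
We want to feed a movie review to a neural network and let the mighty neural network tell us whether the review is positive or negative (1 for positive, 0 for negative).    
The dataset is the IMDb dataset that ships with Keras.    
Each review is encoded by ranking words by how often they occur in the dataset and replacing every word with an integer based on that rank. So each element of x_train is a list of integers (x_train itself has shape (25000,), one entry per review), and each label is 0 or 1, marking the review as negative or positive.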
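To make the encoding concrete, here is a small pure-Python sketch on a made-up three-review corpus. It assumes the conventions that, as far as I know, `keras.datasets.imdb.load_data` uses by default: index 0 is reserved for padding, 1 marks the start of a review, 2 stands for out-of-vocabulary words, and word ranks are shifted up by `index_from=3`; any word whose shifted index reaches `num_words` becomes 2. The helpers `build_word_index` and `encode` are hypothetical and exist only for illustration.

```python
from collections import Counter

START, OOV = 1, 2   # reserved indices (0 is padding), mirroring the assumed imdb defaults
INDEX_FROM = 3      # real word ranks are shifted past the reserved indices

def build_word_index(texts):
    """Rank words by frequency: the most frequent word gets rank 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # descending count; ties broken alphabetically (an arbitrary choice for this sketch)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def encode(text, word_index, num_words):
    """Encode one review as [START, ids...]; indices >= num_words become OOV."""
    ids = [START]
    for word in text.lower().split():
        idx = word_index[word] + INDEX_FROM
        ids.append(idx if idx < num_words else OOV)
    return ids

reviews = ["great movie great cast", "boring movie", "great fun"]
word_index = build_word_index(reviews)
encoded = [encode(r, word_index, num_words=7) for r in reviews]
print(word_index)
print(encoded)  # [[1, 4, 5, 4, 2], [1, 6, 5], [1, 4, 2]]
```

With `num_words=5000` the same rule applies at scale, which is why so many 2s show up in the encoded `x_test[1]` printed at the end of this notebook.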

```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
```
## Import packages and prepare the data    
Here we use Conv1D and GlobalMaxPooling1D layers, and train with the adam optimizer.

```python
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalMaxPooling1D
```

```
Using TensorFlow backend.
```
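```python
# keep only word indices below 5000 (the most frequent words);
# rarer words are replaced by the out-of-vocabulary index 2
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000)
```

```
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
```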
```python
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
```

```
25000 train sequences
25000 test sequences
```
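## 0. Reshape the data
The inputs need to be reformatted; the labels are already in the form we want.
Reviews vary in length, but we want all 25,000 samples to have the same length, so I keep only 400 words per review (maxlen): longer reviews are truncated and shorter ones are padded with 0. By default `pad_sequences` does both at the start of the sequence (`padding='pre'`, `truncating='pre'`), so a long review keeps its last 400 words.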
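The default behaviour of `pad_sequences` can be mimicked in a few lines of NumPy. `pad_pre` below is a hypothetical helper written for illustration, not the Keras implementation; it assumes the defaults `padding='pre'`, `truncating='pre'` and a pad value of 0.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Pad and truncate every sequence at the front, like pad_sequences' defaults."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]       # truncating='pre': keep the last maxlen items
        if kept:                         # skip empty sequences (out[i, -0:] would select the whole row)
            out[i, -len(kept):] = kept   # padding='pre': zeros end up in front
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4)
print(padded)
# [[0 1 2 3]
#  [5 6 7 8]]
```

The short review gains a leading 0, while the long one loses its first item, matching the `x_train shape: (25000, 400)` result after padding.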

```python
x_train.shape
```

```
(25000,)
```
```python
x_train = sequence.pad_sequences(x_train, maxlen=400)
x_test = sequence.pad_sequences(x_test, maxlen=400)
```

```python
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
```

```
x_train shape: (25000, 400)
x_test shape: (25000, 400)
```
## 1. Build the network

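```python
# set parameters:
max_features = 5000   # vocabulary size, matching num_words=5000 above
maxlen = 400          # review length after padding/truncation
batch_size = 32
embedding_dims = 50   # length of each word vector
filters = 250         # number of convolution filters
kernel_size = 3       # each filter looks at 3 consecutive words
hidden_dims = 250     # units in the hidden Dense layer
epochs = 2
```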

```python
model = Sequential()
```

```python
# we start off with an efficient embedding layer which maps our vocab indices
# into embedding_dims dimensions
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
model.add(Dropout(0.2))
```
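Conceptually, the `Embedding` layer is a trainable lookup table with `max_features` rows and `embedding_dims` columns: each integer in a padded review selects one row. The NumPy sketch below shows the shapes involved, using random numbers as a stand-in for the learned weights (an assumption for illustration only).

```python
import numpy as np

max_features, embedding_dims, maxlen = 5000, 50, 400

rng = np.random.default_rng(0)
# stand-in for the learned embedding matrix: one 50-dimensional vector per word index
embedding_matrix = rng.normal(size=(max_features, embedding_dims)).astype("float32")

# a fake batch of 2 padded reviews, each 400 word indices long
batch = rng.integers(0, max_features, size=(2, maxlen))

# the lookup is plain integer indexing: (2, 400) -> (2, 400, 50)
vectors = embedding_matrix[batch]
print(vectors.shape)  # (2, 400, 50)
```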

```python
# we add a Convolution1D, which will learn `filters` word-group filters of size kernel_size:
model.add(Conv1D(filters, kernel_size, padding='valid', activation='relu', strides=1))
```

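hint: `padding='valid'`    
`'valid'` computes the convolution only where the kernel fits entirely inside the input, so no padding is added at the borders and the output is shorter: with `kernel_size=3`, 400 steps become 398.    
`'same'` pads the borders with zeros so that, with `strides=1`, the output has the same length as the input.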
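A naive NumPy version of a 1D convolution (strictly a cross-correlation, as in Keras, since the kernel is not flipped) makes the difference between the two padding modes visible in the output length. `conv1d` is a hypothetical helper for illustration, not the Keras code path; it assumes `strides=1`, and for the odd `kernel_size` used here `'same'` adds `(kernel_size - 1) // 2` zeros on each side.

```python
import numpy as np

def conv1d(x, kernel, bias, padding="valid"):
    """x: (steps, in_channels); kernel: (kernel_size, in_channels, filters); stride 1."""
    k = kernel.shape[0]
    if padding == "same":
        pad = (k - 1) // 2                        # odd k: equal zeros on both sides
        x = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    steps = x.shape[0] - k + 1                    # only windows that fit entirely inside x
    out = np.empty((steps, kernel.shape[2]))
    for t in range(steps):
        window = x[t:t + k]                       # (k, in_channels)
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)                   # activation='relu'

rng = np.random.default_rng(0)
x = rng.normal(size=(400, 50))                    # one review after the Embedding layer
kernel = rng.normal(size=(3, 50, 250))            # kernel_size=3, filters=250
bias = np.zeros(250)

print(conv1d(x, kernel, bias, "valid").shape)     # (398, 250)
print(conv1d(x, kernel, bias, "same").shape)      # (400, 250)
```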

```python
# we use max pooling:
model.add(GlobalMaxPooling1D())
```
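`GlobalMaxPooling1D` keeps, for each filter, only its largest activation over all positions in the review, turning a `(steps, filters)` feature map into a single vector of length `filters`. In NumPy terms this is a max over the time axis; the random feature map below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the Conv1D output of a batch of 2 reviews: (batch, steps, filters)
feature_maps = rng.random(size=(2, 398, 250))

# global max pooling over the time axis: (2, 398, 250) -> (2, 250)
pooled = feature_maps.max(axis=1)
print(pooled.shape)  # (2, 250)

# each pooled value is the strongest response of that filter anywhere in the review
assert pooled[0, 7] == feature_maps[0, :, 7].max()
```

Because the maximum is taken over all positions, the result does not depend on where in the review a useful phrase appears, and the following Dense layer always receives a vector of length `filters`.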

```python
model.add(Dense(hidden_dims))
model.add(Dropout(0.2))
model.add(Activation('relu'))
```

```python
# We project onto a single unit output layer, and squash it with a sigmoid:
model.add(Dense(1))
model.add(Activation('sigmoid'))
```

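hint: The output should be either 0 or 1, so sigmoid is the more suitable activation function for the output layer: it squashes any value into the interval (0, 1), which can be read as the probability that the review is positive, and it pairs naturally with the `binary_crossentropy` loss used when compiling the model.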
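A quick NumPy illustration of what the final `Dense(1)` plus sigmoid produces and how a 0/1 label is read off it. The 0.5 cut-off is the usual convention for binary classification, and the logits are made-up numbers.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-3.0, 0.0, 2.5])      # made-up outputs of the last Dense(1) layer
probs = sigmoid(logits)                  # read as P(review is positive)
labels = (probs > 0.5).astype("int32")   # 1 = positive, 0 = negative

print(probs)
print(labels)  # [0 0 1]
```

Note that a logit of exactly 0 gives a probability of exactly 0.5, which the strict `>` maps to 0 (negative).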

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
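As a sanity check on the architecture built above, the number of trainable weights in each layer can be worked out by hand from the hyperparameters; `model.summary()` should report the same per-layer counts. This is plain arithmetic, not a Keras API.

```python
max_features, embedding_dims = 5000, 50
filters, kernel_size, hidden_dims = 250, 3, 250

params = {
    # one embedding_dims-long vector per word index
    "embedding": max_features * embedding_dims,                   # 250,000
    # each filter has kernel_size * embedding_dims weights plus one bias
    "conv1d": kernel_size * embedding_dims * filters + filters,   # 37,750
    # GlobalMaxPooling1D, Dropout and Activation have no weights;
    # the pooled vector has length `filters`
    "dense_hidden": filters * hidden_dims + hidden_dims,          # 62,750
    "dense_output": hidden_dims * 1 + 1,                          # 251
}
print(sum(params.values()))  # 350751
```

Most of the weights live in the embedding table, so the vocabulary size `max_features` is the main lever on model size.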

## 2. Start training!
To save time, train for just 2-5 epochs; once you get home, train for as many epochs as you like :)

```python
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_test, y_test))
```

```
Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2
<keras.callbacks.History at 0x7f89ed2d2f98>
```

## 3. Use the mighty neural network
* Show the results
* `model.evaluate`

```python
# evaluate returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
score = model.evaluate(x_test, y_test)
loss, acc = score
loss, acc
```

```
(0.27500683599472048, 0.88336000000000003)
```
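`model.evaluate` reports the mean binary cross-entropy and the fraction of correct 0/1 predictions. The sketch below recomputes both from a handful of made-up probabilities to show what the two numbers mean. It is an illustration, not the Keras internals; the `eps` clip that avoids `log(0)` is an assumed value.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; p is clipped to avoid log(0)."""
    p = np.clip(y_prob, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    return float(np.mean((y_prob > threshold).astype("int32") == y_true))

y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.1])   # made-up sigmoid outputs

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(round(loss, 4), acc)  # 0.3375 0.75
```

Read the same way, the reported `(0.275, 0.883)` means an average cross-entropy of about 0.275 and roughly 88.3% of the 25,000 test reviews classified correctly.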

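```python
# predict_classes was deprecated and later removed from Keras; for a single
# sigmoid output it simply thresholds the predicted probability at 0.5
result = (model.predict(x_test) > 0.5).astype("int32")
```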
```python
x_test[1], result[1]
```

```
(array([ 229,   34, 1561,    2,    9,   87,  253,   55,  702,  728,  545,
         441, 2072,  958,    7,   85,  189,   22,   19,   52,    2,   39,
           4,  636,  720,  121,   75,   67, 1655,    2,    2, 2377,   39,
           4, 2553,    4, 4971,  108, 2281,    2,    2, 4626,    2,   39,
           4,    6, 1726,   23, 4903,  890,  201,  488, 4664, 2377,   39,
           4, 2195, 3135,    8,    4, 2974,  343,   39, 3452,    7,    2,
           2,   54,   12, 2360,    2,    4,  172,  136, 3452,    7,    2,
         115,  304,  410,  615,   63,    9,   43,   17,   73,   50,   26,
         775,    7,   31, 2433,  532,    2, 1994,   15, 2039, 4142,   93,
           2,    6,  171,  153,  908,   12,  152,  306, 1595,    8,    2,
         253,   33,  410,    4,  189,  512,   11,  831,   13,  119,    4,
         136,   54, 3509,    2,   26,  260,    6, 2711,    2,  731, 2599,
          15,    2,    2,   29,  166,  163,    2,  795,    2,  469,  198,
          24,    8,  135,   15,   50, 
```
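To read a review such as `x_test[1]` as text, the integer encoding has to be inverted. Keras exposes the word-to-rank mapping through `imdb.get_word_index()`, which downloads a JSON file; the sketch below uses a tiny hand-made dictionary instead so it runs offline. It assumes the default offsets (1 = start, 2 = out-of-vocabulary, ranks shifted by `index_from=3`), and `decode` is a hypothetical helper.

```python
def decode(ids, word_index, index_from=3):
    """Turn encoded word indices back into text, skipping padding (0)."""
    reserved = {1: "<START>", 2: "<OOV>"}   # start marker and out-of-vocabulary words
    # invert word -> rank into shifted index -> word
    index_word = {rank + index_from: word for word, rank in word_index.items()}
    words = []
    for i in ids:
        if i == 0:                          # padding added by pad_sequences
            continue
        words.append(reserved.get(i) or index_word.get(i, "<?>"))
    return " ".join(words)

# hypothetical stand-in for imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
print(decode([0, 0, 1, 4, 5, 6, 2, 7], toy_word_index))
# <START> the movie was <OOV> great
```

With the real data, `decode(x_test[1], imdb.get_word_index())` would print the review text, with every word outside the top 5000 shown as `<OOV>`.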

[Reference](https://github.com/fchollet/keras/blob/master/examples/imdb_cnn.py)