**Analyzing IMDB data in Keras**



In [0]:
import numpy as np
import keras
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.preprocessing.text import Tokenizer
%matplotlib inline

np.random.seed(5)

Using TensorFlow backend.


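**Loading data from IMDB** <BR> The data is already preprocessed: each review is stored as a list of integer word indices, and `num_words=2000` keeps only the most frequent words (indices below 2000).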

In [0]:
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=2000)

print(x_train.shape)
print(y_train.shape)

Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
(25000,)
(25000,)


**Text data is already encoded as integer sequences** <BR>
In y_train, 1 stands for a positive review and 0 for a negative one.

In [0]:
print(x_train[1])
print(y_train[1])


[1, 194, 1153, 194, 2, 78, 228, 5, 6, 1463, 2, 2, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 2, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2, 1523, 5, 647, 4, 116, 9, 35, 2, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 2, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 2, 5, 163, 11, 2, 2, 4, 1153, 9, 194, 775, 7, 2, 2, 349, 2, 148, 605, 2, 2, 15, 123, 125, 68, 2, 2, 15, 349, 165, 2, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 2, 228, 2, 5, 2, 656, 245, 2, 5, 4, 2, 131, 152, 491, 18, 2, 32, 2, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]
0
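
The many `2`s in the sequence above are out-of-vocabulary markers. With its default arguments, `imdb.load_data` reserves index 1 for the start of a review and index 2 for words outside the `num_words` limit, and shifts real word ranks up by 3. The sketch below shows how such a sequence could be mapped back to tokens. The function name `decode_review` and the toy dictionary are hypothetical; in the notebook the real mapping would come from `imdb.get_word_index()`.

```python
# Reserved indices used by imdb.load_data with its default arguments.
PAD, START, OOV = 0, 1, 2
INDEX_FROM = 3  # real word ranks are shifted up by this offset


def decode_review(sequence, word_index):
    """Map a list of integer ids back to a space-separated string of tokens.

    word_index maps word -> frequency rank (1 = most frequent), in the
    style of imdb.get_word_index().
    """
    id_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    id_to_word.update({PAD: "<pad>", START: "<start>", OOV: "<unk>"})
    # Any id missing from the mapping is treated as unknown.
    return " ".join(id_to_word.get(i, "<unk>") for i in sequence)


# Hypothetical toy vocabulary, for illustration only.
toy_word_index = {"the": 1, "and": 2, "a": 3, "movie": 4}
decoded = decode_review([1, 4, 7, 2, 5], toy_word_index)
print(decoded)
```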


In [0]:
# Multi-hot (binary bag-of-words) encoding of the input data:
# each review becomes a 2000-dimensional 0/1 vector
tokenizer = Tokenizer(num_words=2000)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')

In [0]:
print(x_train[1])

[0. 1. 1. ... 0. 0. 0.]
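
To make the encoding step concrete, here is a minimal pure-Python sketch of what `sequences_to_matrix(..., mode='binary')` computes: position `i` of a row is 1 if word index `i` occurs anywhere in the review, and 0 otherwise. Word order and repeat counts are discarded. The helper name `sequences_to_binary_matrix` is hypothetical and is not the Keras implementation.

```python
def sequences_to_binary_matrix(sequences, num_words):
    """Return one 0/1 row per sequence, marking which word indices occur."""
    matrix = []
    for seq in sequences:
        row = [0.0] * num_words
        for idx in seq:
            # Indices outside the vocabulary are skipped.
            if 0 <= idx < num_words:
                row[idx] = 1.0
        matrix.append(row)
    return matrix


# Two short toy sequences with a 6-word vocabulary.
rows = sequences_to_binary_matrix([[1, 2, 2, 5], [1, 3]], num_words=6)
for row in rows:
    print(row)
```

This also explains the output above: positions 1 and 2 are set because the review contains the start marker (1) and at least one out-of-vocabulary marker (2).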


In [0]:
# One-hot encoding the output labels
num_classes = 2
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(y_train.shape)
print(y_test.shape)
print(y_train[1])

(25000, 2)
(25000, 2)
[1. 0.]
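
As a small illustration of the label encoding, the sketch below reproduces the idea behind `to_categorical` in plain Python: class `k` becomes a vector with a 1 at position `k`. The helper name `one_hot` is hypothetical.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return [
        [1.0 if c == label else 0.0 for c in range(num_classes)]
        for label in labels
    ]


encoded = one_hot([0, 1, 1], num_classes=2)
print(encoded)
```

A negative review (label 0) therefore becomes `[1. 0.]`, which matches the output for `y_train[1]` above.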


**Building model**

In [0]:
# Building the model architecture
model = Sequential()
model.add(Dense(512, activation='relu', input_dim=2000))
model.add(Dropout(0.8))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

# Compiling the model using categorical_crossentropy loss and the rmsprop optimizer
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_55 (Dense)             (None, 512)               1024512   
_________________________________________________________________
dropout_30 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_56 (Dense)             (None, 2)                 1026      
Total params: 1,025,538
Trainable params: 1,025,538
Non-trainable params: 0
_________________________________________________________________
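
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout adds no parameters. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(2000, 512)  # first Dense layer
output = dense_params(512, 2)     # softmax output layer
total = hidden + output
print(hidden, output, total)
```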


**Training Model**


In [0]:
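# Note: the test set doubles as validation data here, so the validation
# metrics below are not an independent check of the final test accuracy.
trained = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test), verbose=2)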

Train on 25000 samples, validate on 25000 samples
Epoch 1/10
 - 8s - loss: 0.4188 - acc: 0.8192 - val_loss: 0.3133 - val_acc: 0.8763
Epoch 2/10
 - 6s - loss: 0.3492 - acc: 0.8707 - val_loss: 0.3503 - val_acc: 0.8673
Epoch 3/10
 - 6s - loss: 0.3383 - acc: 0.8786 - val_loss: 0.3255 - val_acc: 0.8793
Epoch 4/10
 - 6s - loss: 0.3290 - acc: 0.8865 - val_loss: 0.3317 - val_acc: 0.8794
Epoch 5/10
 - 6s - loss: 0.3261 - acc: 0.8876 - val_loss: 0.3517 - val_acc: 0.8693
Epoch 6/10
 - 6s - loss: 0.3237 - acc: 0.8883 - val_loss: 0.3310 - val_acc: 0.8792
Epoch 7/10
 - 6s - loss: 0.3186 - acc: 0.8906 - val_loss: 0.3337 - val_acc: 0.8778
Epoch 8/10
 - 6s - loss: 0.3207 - acc: 0.8911 - val_loss: 0.3416 - val_acc: 0.8792
Epoch 9/10
 - 6s - loss: 0.3214 - acc: 0.8897 - val_loss: 0.3370 - val_acc: 0.8787
Epoch 10/10
 - 6s - loss: 0.3247 - acc: 0.8908 - val_loss: 0.3459 - val_acc: 0.8761
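
The log above shows validation metrics plateauing early while training accuracy keeps creeping up. A quick way to see where validation performance peaked is to scan the per-epoch values. The sketch below uses the numbers copied from the log; in the notebook, similar lists should be available from `trained.history` (the key names, for example `val_acc`, depend on the Keras version, so treat that as an assumption).

```python
# Per-epoch validation metrics copied from the training log above.
val_loss = [0.3133, 0.3503, 0.3255, 0.3317, 0.3517,
            0.3310, 0.3337, 0.3416, 0.3370, 0.3459]
val_acc = [0.8763, 0.8673, 0.8793, 0.8794, 0.8693,
           0.8792, 0.8778, 0.8792, 0.8787, 0.8761]

# Epochs are 1-based in the log, so add 1 to the list index.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print("lowest val_loss at epoch", best_loss_epoch)
print("highest val_acc at epoch", best_acc_epoch)
```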


In [0]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: ", score[1])

Accuracy:  0.87612
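
For reference, the accuracy metric on a softmax output with one-hot labels boils down to comparing the most probable class with the labelled class. The following is a minimal sketch with hypothetical probabilities, not the Keras implementation.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(probabilities, one_hot_labels):
    """Fraction of rows whose most probable class matches the label."""
    correct = sum(
        argmax(p) == argmax(y)
        for p, y in zip(probabilities, one_hot_labels)
    )
    return correct / len(one_hot_labels)


# Hypothetical softmax outputs and labels for four reviews.
probs = [[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
labels = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
acc = accuracy(probs, labels)
print(acc)
```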
