# IMDB Movie Review Sentiment Analysis

## Problem Statement

The data consists of highly polar movie reviews from IMDB and was collected by Stanford researchers for a 2011 paper *(Maas et al.)*. It contains 25,000 labeled movie reviews (positive or negative) for training and the same number again for testing, 50,000 in total.

The problem is to determine whether a given movie review has a positive or negative sentiment.

Data Source: http://ai.stanford.edu/~amaas/data/sentiment/ *(Maas et al.)*

In [1]:
# import libraries
import numpy as np
from keras.datasets import imdb
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Embedding
from keras.preprocessing import sequence

Using TensorFlow backend.


In [2]:
# fix random seed for reproducibility
seed = 13
np.random.seed(seed)
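
As a brief aside on what this seed controls: `np.random.seed` fixes NumPy's global generator only. The sketch below uses NumPy alone to show that reseeding reproduces the same draws. It is an illustration, not part of the original notebook, and the helper name `draw` is hypothetical. The TensorFlow backend keeps its own random state, so training results may still vary between runs unless that is seeded too.

```python
import numpy as np

def draw(seed, n=5):
    """Return n uniform samples after reseeding NumPy's global generator."""
    np.random.seed(seed)
    return np.random.rand(n)

first = draw(13)
second = draw(13)   # same seed, so the same sequence of draws
other = draw(14)    # different seed, so a different sequence

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```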

In [3]:
# load data (keep only the top n words; rarer words are replaced by the out-of-vocabulary index)
top_words = 5000
(xTrain, yTrain), (xTest, yTest) = imdb.load_data(num_words = top_words)
x = np.concatenate((xTrain, xTest), axis = 0)
y = np.concatenate((yTrain, yTest), axis = 0)

In [4]:
# summarize
print(x.shape, y.shape)
print(np.unique(y))
print(f'Number of words: {len(np.unique(np.hstack(x)))}')

(50000,) (50000,)
[0 1]
Number of words: 4998
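
The count of 4998 rather than 5000 can be explained with a short sketch. It assumes the default settings of `imdb.load_data`: index 1 marks the start of a review, index 2 stands in for any out-of-vocabulary word, index 3 is unused, and real words start at index 4. The helper `cap_vocabulary` is a hypothetical pure-Python stand-in for the filtering step, not the library's own code.

```python
def cap_vocabulary(sequence, num_words, oov_char=2):
    """Replace every index >= num_words with the out-of-vocabulary marker."""
    return [w if w < num_words else oov_char for w in sequence]

top_words = 5000

# Assumed defaults: 1 = start marker, 2 = out-of-vocabulary, 3 = unused,
# so the indices that can appear are 1, 2 and 4 .. top_words - 1.
expected_ids = {1, 2} | set(range(4, top_words))
print(len(expected_ids))  # 4998

# Indices 5000 and 7201 fall outside the kept vocabulary and become 2.
capped = cap_vocabulary([1, 14, 5000, 4999, 7201], top_words)
print(capped)  # [1, 14, 2, 4999, 2]
```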


In [5]:
# truncate and pad input sequences to make them all the same length for modeling
max_review_length = 500
xTrain = sequence.pad_sequences(xTrain, maxlen = max_review_length)
xTest = sequence.pad_sequences(xTest, maxlen = max_review_length)
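
To make the padding step concrete, here is a minimal NumPy sketch of what `pad_sequences` does with its default settings, as far as I understand them: short reviews are padded with zeros at the front, and long reviews keep only their last `maxlen` tokens. The function `pad_pre` is a hypothetical illustration, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last maxlen items of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # an empty sequence stays all padding
        trunc = seq[-maxlen:]          # truncate from the front
        out[i, -len(trunc):] = trunc   # right-align, padding sits on the left
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=4)
print(padded)
```

With `maxlen=4`, the first sequence becomes `[0, 1, 2, 3]` and the second becomes `[6, 7, 8, 9]`. In the notebook the same idea is applied with `max_review_length = 500`.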

In [6]:
# create and fit LSTM network
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length = max_review_length))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dropout(0.2))
model.add(Dense(1, activation = 'sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.summary()
model.fit(xTrain, yTrain, epochs = 6, batch_size = 64)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
dropout_1 (Dropout)          (None, 500, 32)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dropout_2 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
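
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard layer formulas: an embedding stores one vector per vocabulary entry, an LSTM has four gates each with an input kernel, a recurrent kernel and a bias, and a dense layer has a weight matrix plus a bias. The helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

counts = [
    embedding_params(5000, 32),  # top_words x embedding_vector_length
    lstm_params(32, 100),        # LSTM(100) fed with 32-dimensional embeddings
    dense_params(100, 1),        # single sigmoid output
]
print(counts, sum(counts))  # [160000, 53200, 101] 213301
```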


Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


<keras.callbacks.History at 0x1a772c03d68>
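
For readers unfamiliar with the `binary_crossentropy` loss used in `model.compile`, the following NumPy sketch shows the quantity being minimized during training: the mean negative log-likelihood of the true labels under the predicted probabilities. The function and the `eps` clipping value are illustrative assumptions rather than the Keras internals.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip to avoid log(0) for fully confident predictions.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

confident = binary_crossentropy([1, 0], [0.9, 0.1])  # correct and confident
unsure = binary_crossentropy([1, 0], [0.5, 0.5])     # no better than a coin flip

print(confident, unsure)
```

The confident, correct predictions give a loss of about 0.105, while the coin-flip predictions give about 0.693 (the natural log of 2), which is the level an untrained binary classifier typically starts near.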

In [7]:
# evaluate the model
scores = model.evaluate(xTest, yTest, verbose = 0)
print(f'{model.metrics_names[1]}: {scores[1]*100}')

acc: 84.108
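
The accuracy reported above is the fraction of test reviews whose predicted label matches the true label. A minimal sketch of that calculation follows, assuming the sigmoid output is thresholded at 0.5; the function name and the toy inputs are hypothetical and are not results from the model.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Toy example: the first two predictions are right, the last two are wrong.
acc = binary_accuracy([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.6])
print(f"acc: {acc * 100}")  # acc: 50.0
```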


## References 

Brownlee, J., 2018. Deep Learning with Python, v1.14.

Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C., 2011. Learning Word Vectors for Sentiment Analysis. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp. 142–150.