# Sentiment Analysis of IMDB Reviews
###### By: MBA(BA) - Group 5
- The IMDB dataset consists of 50,000 movie reviews and their sentiment labels (positive or negative). We build a model that predicts whether a review is positive or negative.

## Importing Libraries and the Dataset
- We import the required libraries and the IMDB dataset, which ships with the Keras package. The dataset contains both the reviews and their corresponding labels.

In [1]:
import numpy
from tensorflow import keras
from keras.datasets import imdb
from keras.models import Sequential
# Only the layers used in this notebook, imported from the public keras.layers namespace
from keras.layers import (Conv1D, Dense, Dropout, Embedding, Flatten,
                          GRU, LSTM, MaxPooling1D)
from keras.preprocessing import sequence

- We now load the dataset into train and test sets

In [2]:
(X_train, y_train), (X_test, y_test) = imdb.load_data()

- We check for the shape of the train and test data for our verification.

In [3]:
print("train data")
print(X_train.shape)
print(y_train.shape)
print("\ntest data")
print(X_test.shape)
print(y_test.shape)

train data
(25000,)
(25000,)

test data
(25000,)
(25000,)


- We now check for the classes present in both our test and train datasets.

In [4]:
print("Classes in Train Set ")
print(numpy.unique(y_train))
print("Classes in Test Set ")
print(numpy.unique(y_test))

Classes in Train Set 
[0 1]
Classes in Test Set 
[0 1]


- Let us now look at a sample from X_train and the corresponding label in y_train.

In [5]:
print(X_train[0],"\n")
print(y_train[0])

[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 22665, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 21631, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 31050, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32] 

1


##### Note that the reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers). Words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data.
- The sample shows the encoded review and the corresponding y_train value, which is 1. At this point we do not yet know whether 1 stands for a positive or a negative review.

- To address this issue, we now proceed to try and retrieve the word index file mapping the words to the indices using get_word_index()

In [6]:
word_index = keras.datasets.imdb.get_word_index()

- We then reverse the word index to obtain a dictionary mapping the indices to words. This makes it easy to decode the sequences. We display only 20 items from the dictionary for tidiness.

In [7]:
inverted_word_index = dict((i, word) for (word, i) in word_index.items())
list(inverted_word_index.items())[:20]

[(34701, 'fawn'),
 (52006, 'tsukino'),
 (52007, 'nunnery'),
 (16816, 'sonja'),
 (63951, 'vani'),
 (1408, 'woods'),
 (16115, 'spiders'),
 (2345, 'hanging'),
 (2289, 'woody'),
 (52008, 'trawling'),
 (52009, "hold's"),
 (11307, 'comically'),
 (40830, 'localized'),
 (30568, 'disobeying'),
 (52010, "'royale"),
 (40831, "harpo's"),
 (52011, 'canet'),
 (19313, 'aileen'),
 (52012, 'acurately'),
 (52013, "diplomat's")]

- We now have a dictionary mapping indices to words. Each key is the frequency rank of the word stored as its value, so key 1 belongs to the most frequent word.
- We now sort the key-value pairs by key (that is, by frequency rank) for readability. Only the first 200 pairs are displayed for tidiness.

In [8]:
for i in sorted (inverted_word_index) :
    if i > 200:
        break
    print ((i, inverted_word_index[i]), end =" ")

(1, 'the') (2, 'and') (3, 'a') (4, 'of') (5, 'to') (6, 'is') (7, 'br') (8, 'in') (9, 'it') (10, 'i') (11, 'this') (12, 'that') (13, 'was') (14, 'as') (15, 'for') (16, 'with') (17, 'movie') (18, 'but') (19, 'film') (20, 'on') (21, 'not') (22, 'you') (23, 'are') (24, 'his') (25, 'have') (26, 'he') (27, 'be') (28, 'one') (29, 'all') (30, 'at') (31, 'by') (32, 'an') (33, 'they') (34, 'who') (35, 'so') (36, 'from') (37, 'like') (38, 'her') (39, 'or') (40, 'just') (41, 'about') (42, "it's") (43, 'out') (44, 'has') (45, 'if') (46, 'some') (47, 'there') (48, 'what') (49, 'good') (50, 'more') (51, 'when') (52, 'very') (53, 'up') (54, 'no') (55, 'time') (56, 'she') (57, 'even') (58, 'my') (59, 'would') (60, 'which') (61, 'only') (62, 'story') (63, 'really') (64, 'see') (65, 'their') (66, 'had') (67, 'can') (68, 'were') (69, 'me') (70, 'well') (71, 'than') (72, 'we') (73, 'much') (74, 'been') (75, 'bad') (76, 'get') (77, 'will') (78, 'do') (79, 'also') (80, 'into') (81, 'people') (82, 'other') (8

- The most frequent words in the reviews now appear first and the least frequent ones last. According to its index, "the" is the most frequent word in the reviews, followed by "and", "a", etc.

##### Let us now proceed to decode the sample that we had earlier selected: X_train[0]

- Keep in mind that the encoded indices are offset by 3, because 0, 1 and 2 are reserved for padding, start of sequence and unknown words. We therefore subtract 3 from each index before looking it up.
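
As a minimal sketch of this offset, the toy vocabulary below is encoded and decoded in the same way. The vocabulary and the names `toy_word_index`, `encode` and `decode` are hypothetical and only serve as an illustration; the reserved ids follow the convention described above.

```python
# Toy vocabulary (hypothetical, not the real IMDB word index): word -> frequency rank
toy_word_index = {"the": 1, "film": 2, "great": 3}

INDEX_FROM = 3  # ids 0, 1 and 2 are reserved
START_ID = 1    # start of sequence
UNKNOWN_ID = 2  # word not in the vocabulary


def encode(words, word_index):
    """Encode words as ids shifted by INDEX_FROM, with a leading start marker."""
    ids = [START_ID]
    for word in words:
        ids.append(word_index[word] + INDEX_FROM if word in word_index else UNKNOWN_ID)
    return ids


def decode(ids, word_index):
    """Undo the shift; reserved ids have no word and are shown as '?'."""
    inverted = {rank: word for word, rank in word_index.items()}
    return " ".join(inverted.get(i - INDEX_FROM, "?") for i in ids)


encoded = encode(["the", "film", "was", "great"], toy_word_index)
decoded = decode(encoded, toy_word_index)
print(encoded)  # [1, 4, 5, 2, 6]
print(decoded)  # ? the film ? great
```

The first "?" comes from the start marker and the second from the unknown word "was".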

In [9]:
decoded_sequence = " ".join(inverted_word_index.get(i-3, '?') for i in X_train[0])
decoded_sequence

"? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the part's of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should b

- After reading the review, we can see that it is a positive one and hence 1 indicates a positive review. We can confirm the same with one of the negative reviews.

In [10]:
print(X_train[500],"\n")
print(y_train[500])

[1, 5, 198, 138, 254, 8, 967, 10, 10, 39, 4, 1158, 213, 7, 650, 7660, 1475, 213, 7, 650, 13, 215, 135, 13, 1583, 754, 2359, 133, 252, 50, 9, 49, 1104, 136, 32, 4, 1109, 304, 133, 1812, 21, 15, 191, 607, 4, 910, 552, 7, 229, 5, 226, 20, 198, 138, 10, 10, 241, 46, 7, 158] 

0


In [11]:
decoded_sequence = " ".join(inverted_word_index.get(i-3, '?') for i in X_train[500])
decoded_sequence

"? and that's why hard to rate br br from the adult point of view hmm student point of view i must say i fell nearly asleep here sure there is some laughing scene all the credit takes here eddie but that can't save the disney type of script and whole movie that's why br br 2 out of 10"

- We can clearly see that this is a negative review and hence the corresponding value of y_train is 0.

##### Therefore 0 indicates a negative review and 1 indicates a positive review.

- We now check for the total number of unique words

In [12]:
print("Total number of words: ")
print(len(numpy.unique(numpy.hstack(X_train))))

Total number of words: 
88585


- There are almost 89,000 unique words in total. We now check the average review length and its standard deviation.

In [13]:
review_length=[len(rev) for rev in X_train]
print("Mean Length of Reviews:")
print(numpy.mean(review_length))
print("Standard Deviation of Reviews:")
print(numpy.std(review_length))

Mean Length of Reviews:
238.71364
Standard Deviation of Reviews:
176.49367364852034


In [14]:
print(numpy.mean(review_length) + numpy.std(review_length))

415.2073136485203


- The mean plus one standard deviation is about 415 words, so most reviews are shorter than this. To avoid losing much information, we use 450 as the maximum review length. Reviews shorter than 450 words are padded with 0 and longer ones are truncated, so that every input has the same length.
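
The reasoning behind such a cut-off can be sketched on a handful of made-up review lengths. The list `toy_lengths` is hypothetical and not taken from the dataset; `statistics.pstdev` is the population standard deviation, which is also what `numpy.std` computes by default.

```python
import statistics

# Hypothetical review lengths in words (not from the IMDB data)
toy_lengths = [120, 95, 240, 310, 180, 700, 150, 420, 60, 220]

mean_len = statistics.mean(toy_lengths)
std_len = statistics.pstdev(toy_lengths)  # population standard deviation
threshold = mean_len + std_len

# Share of reviews that fit within the threshold without truncation
coverage = sum(length <= threshold for length in toy_lengths) / len(toy_lengths)

print(round(threshold, 1))
print(coverage)  # 0.9
```

In this toy sample only the 700-word review exceeds the threshold and would be truncated.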

## Developing a Baseline Model - (Single 1D - CNN)
Since the data has already been prepared and pre-processed we proceed to develop a baseline model.<br>
- We also reload the IMDB dataset with a vocabulary limit of 10,000 words, i.e. keeping only the 10,000 most frequent words in the reviews. This drops rare words that are unlikely to change the classification and reduces the computational cost.

In [15]:
most_common = 10000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=most_common)
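
The effect of the vocabulary limit can be sketched as follows, assuming that every index at or above the limit is replaced by the "unknown" id 2 (our understanding of the default behaviour). The helper `cap_vocabulary` is hypothetical and not part of Keras; the sample ids are taken from the first training review shown earlier.

```python
UNKNOWN_ID = 2  # reserved id for words outside the vocabulary (assumed default)


def cap_vocabulary(sequence_ids, num_words, unknown_id=UNKNOWN_ID):
    """Replace every id at or above num_words with the unknown id."""
    return [i if i < num_words else unknown_id for i in sequence_ids]


# A few ids taken from X_train[0] as printed earlier
sample = [1, 14, 22, 4468, 22665, 9, 31050]
capped = cap_vocabulary(sample, num_words=10000)
print(capped)  # [1, 14, 22, 4468, 2, 9, 2]
```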

- As mentioned earlier, we now limit the length of the reviews to 450 words.

In [16]:
# pad dataset to a maximum review length in words
pad = 450
X_train = sequence.pad_sequences(X_train, maxlen=pad)
X_test = sequence.pad_sequences(X_test, maxlen=pad)
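
With its default arguments, `pad_sequences` pads and truncates at the front of each sequence. The pure-Python helper below, `pad_pre`, is a hypothetical stand-in that mimics this behaviour on plain lists; it is a sketch, not the Keras implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep the last maxlen items and left-pad with value up to maxlen."""
    kept = list(seq)[-maxlen:]
    return [value] * (maxlen - len(kept)) + kept


short_review = pad_pre([1, 2, 3], maxlen=5)
long_review = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short_review)  # [0, 0, 1, 2, 3]
print(long_review)   # [3, 4, 5, 6, 7]
```

This is why the decoded padded reviews later in the notebook start with a long run of "?" characters.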

The baseline model consists of:
- An embedding layer that converts the integer representation of the words into word embeddings. We specify the vocabulary size (10,000 in our case) and the input length (450 in our case).
- One 1D convolutional layer with 32 filters and a kernel size of 3. We use "same" padding so that the input and the output have the same length.
- A max-pooling layer.
- A flatten layer that turns the output of the pooling layer into one long vector.
- A fully connected dense layer with 128 neurons.
- A final sigmoid layer that gives an output between 0 and 1.

The convolutional layer and the hidden dense layer use the ReLU activation function.
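
To illustrate how "same" padding keeps the sequence length and how max pooling halves it, here is a single-channel sketch in pure Python. The functions `conv1d_same`, `relu` and `max_pool1d` are hypothetical simplifications of what the Keras layers do across many channels and filters.

```python
def conv1d_same(signal, kernel):
    """Slide the kernel over a zero-padded signal so the output keeps the input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]


def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size=2):
    """Take the maximum over non-overlapping windows of pool_size."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]


signal = [1, 2, 3, 4, 5, 6]
conv_out = relu(conv1d_same(signal, kernel=[1, 0, -1]))
pooled = max_pool1d(conv_out, pool_size=2)
print(len(signal), len(conv_out), len(pooled))  # 6 6 3
```

The same logic explains the shapes in the model: 450 steps stay 450 after the convolution and become 225 after pooling.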

In [17]:
# model
model = Sequential()

# Embedding
model.add(Embedding(most_common, 32, input_length=pad))

# First Convolution1D Layer
model.add(Conv1D(32, kernel_size=3,
                 padding='same', activation='relu'))

model.add(MaxPooling1D(pool_size=2))

# flatten and put a fully connected layer
model.add(Flatten())
model.add(Dense(128, activation='relu'))

# sigmoid
model.add(Dense(1, activation='sigmoid'))

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 450, 32)           320000    
                                                                 
 conv1d (Conv1D)             (None, 450, 32)           3104      
                                                                 
 max_pooling1d (MaxPooling1D  (None, 225, 32)          0         
 )                                                               
                                                                 
 flatten (Flatten)           (None, 7200)              0         
                                                                 
 dense (Dense)               (None, 128)               921728    
                                                                 
 dense_1 (Dense)             (None, 1)                 129       
                                                        

- The model summary above lists the number of trainable parameters and the output dimensions of each layer.
- The `None` in the output shape stands for the batch size, which is not fixed in advance.
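
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for embedding, 1D convolution and dense layers; the variable names are our own.

```python
vocab_size, embed_dim, seq_len = 10000, 32, 450
filters, kernel_size, pool_size, dense_units = 32, 3, 2, 128

# Embedding: one vector of size embed_dim per vocabulary entry
embedding_params = vocab_size * embed_dim

# Conv1D: kernel_size * input_channels weights per filter, plus one bias per filter
conv_params = kernel_size * embed_dim * filters + filters

# "same" padding keeps seq_len; pooling divides it by pool_size
pooled_len = seq_len // pool_size
flat_size = pooled_len * filters

# Dense layers: inputs * units weights, plus one bias per unit
dense_params = flat_size * dense_units + dense_units
output_params = dense_units * 1 + 1

print(embedding_params, conv_params, flat_size, dense_params, output_params)
# 320000 3104 7200 921728 129
```

Most of the parameters sit in the first dense layer, because the flattened vector has 7,200 entries.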

### Fitting and evaluating the baseline model
- We now compile the model. The loss function to be minimised is binary_crossentropy.
- The optimizer is Adam.
- The evaluation metric is accuracy.
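
For intuition, the loss and the metric can be computed by hand on a few made-up predictions. This is a pure-Python sketch of the textbook definitions; the labels and probabilities are hypothetical, and the clipping constant `eps` is an assumption to keep the logarithm finite.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability equals the label."""
    hits = sum((p > threshold) == bool(t) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical labels and predicted probabilities
y_true_toy = [1, 0, 1, 0]
y_prob_toy = [0.9, 0.2, 0.4, 0.6]

loss_toy = binary_crossentropy(y_true_toy, y_prob_toy)
acc_toy = accuracy(y_true_toy, y_prob_toy)
print(round(loss_toy, 4), acc_toy)  # 0.5403 0.5
```

The loss penalises confident wrong predictions much more strongly than accuracy does, which is why the two numbers can move in different directions.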

In [20]:
model.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])

- We first specify a few variables such as the batch size and the number of epochs.

In [18]:
# batch size, number of classes, epochs
batch_size = 128
epochs = 12

- We fit the model on X_train and y_train.

In [21]:
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(X_test, y_test))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x24829e48888>

- The baseline model reaches good training and validation accuracy. Note that the test set is used as the validation data here.

###### We now evaluate the baseline model on test data

In [22]:
model.evaluate(X_test, y_test)



[0.9021984934806824, 0.8704800009727478]

- The accuracy is approximately 87%. 

## Developing an Improved Model - (Double 1D - CNN)
We now go ahead and check if it is possible to build an improved model that predicts the sentiment of the reviews with even higher accuracy.
<br><br>
We introduce another 1D Convolution Layer having 64 filters in this model.
<br><br>
The convolutional layers and the hidden dense layer use the ReLU activation function.

In [23]:
# model
model1 = Sequential()

# Embedding
model1.add(Embedding(most_common, 32, input_length=pad))

# First Convolution1D Layer
model1.add(Conv1D(32, kernel_size=3, padding='same', activation='relu'))

# Second Convolution1D Layer
model1.add(Conv1D(64, kernel_size=3, padding='same', activation='relu'))

model1.add(MaxPooling1D(pool_size=2))

# flatten and put a fully connected layer
model1.add(Flatten())
model1.add(Dense(128, activation='relu'))

#sigmoid layer
model1.add(Dense(1, activation='sigmoid'))

model1.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 450, 32)           320000    
                                                                 
 conv1d_1 (Conv1D)           (None, 450, 32)           3104      
                                                                 
 conv1d_2 (Conv1D)           (None, 450, 64)           6208      
                                                                 
 max_pooling1d_1 (MaxPooling  (None, 225, 64)          0         
 1D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 14400)             0         
                                                                 
 dense_2 (Dense)             (None, 128)               1843328   
                                                      

- The model summary above lists the number of trainable parameters and the output dimensions of each layer.
- The `None` in the output shape stands for the batch size, which is not fixed in advance.

### Fitting and evaluating the first improved model
- We now compile the model. The loss function to be minimised is binary_crossentropy.
- The optimizer is Adam.
- The evaluation metric is accuracy.

In [24]:
model1.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])

- We fit the first improved model on X_train and y_train.

In [25]:
model1.fit(X_train, y_train,
           batch_size=batch_size,
           epochs=12,
           verbose=1,
           validation_data=(X_test, y_test))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x248301ee1c8>

- The improved model has training and validation accuracy similar to that of our previous model.

###### We now evaluate the first improved model on test data

In [26]:
model1.evaluate(X_test, y_test)



[0.9787335395812988, 0.8580399751663208]

- The accuracy drops slightly (about 85.8% versus 87.0%) but remains similar to that of our previous model.

## Developing a Second Improved Model - (LSTM 1D - CNN)
We now go ahead and check if it is possible to build another improved model that predicts the sentiment of the review with even higher accuracy.
<br><br>
We remove the second 1D convolution layer and add an LSTM layer with 128 units and a dropout of 0.2, followed by a separate dropout layer with a rate of 0.25.
<br><br>
The convolutional layer and the hidden dense layer use the ReLU activation function; the LSTM layer keeps its default activations.

In [30]:
# model
model2 = Sequential()

# Embedding
model2.add(Embedding(most_common, 32, input_length=pad))

# First Convolution1D Layer
model2.add(Conv1D(32, kernel_size=3, padding='same', activation='relu'))

model2.add(MaxPooling1D(pool_size=2))

# LSTM Layer with Dropout
model2.add(LSTM(128,dropout=0.2))
model2.add(Dropout(0.25))

# flatten and put a fully connected layer
model2.add(Flatten())
model2.add(Dense(128, activation='relu'))

#sigmoid layer
model2.add(Dense(1, activation='sigmoid'))

model2.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 450, 32)           320000    
                                                                 
 conv1d_5 (Conv1D)           (None, 450, 32)           3104      
                                                                 
 max_pooling1d_3 (MaxPooling  (None, 225, 32)          0         
 1D)                                                             
                                                                 
 lstm_1 (LSTM)               (None, 128)               82432     
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 flatten_3 (Flatten)         (None, 128)               0         
                                                      

- The model summary above lists the number of trainable parameters and the output dimensions of each layer.
- The `None` in the output shape stands for the batch size, which is not fixed in advance.
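
The LSTM parameter count in the summary follows from the standard LSTM formulation with four gates. The arithmetic below is a sketch with our own variable names.

```python
input_dim = 32  # channels coming out of the pooling layer
units = 128     # LSTM units
gates = 4       # input, forget, cell candidate and output gates

# Per gate: input weights + recurrent weights + one bias per unit
params_per_gate = units * input_dim + units * units + units
lstm_params = gates * params_per_gate
print(lstm_params)  # 82432
```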

### Fitting and evaluating the second improved model
- We now compile the model. The loss function to be minimised is binary_crossentropy.
- The optimizer is Adam.
- The evaluation metric is accuracy.

In [31]:
model2.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])

- We fit the second improved model on X_train and y_train.

In [32]:
model2.fit(X_train, y_train,
           batch_size=batch_size,
           epochs=12,
           verbose=1,
           validation_data=(X_test, y_test))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x248301f1108>

- This model also has training and validation accuracy similar to that of our previous models.

###### We now evaluate the second improved model on test data

In [33]:
model2.evaluate(X_test, y_test)



[0.7024185061454773, 0.8621199727058411]

- The accuracy is similar to that of our previous models.

## Developing a Third Improved Model - (GRU 1D - CNN)
We now go ahead and check if it is possible to build another improved model that predicts the sentiment of the review with even higher accuracy.
<br><br>
We replace the LSTM layer with a GRU layer with 128 units and a dropout of 0.2 in this model.
<br><br>
The convolutional layer and the hidden dense layer use the ReLU activation function; the GRU layer keeps its default activations.

In [34]:
# model
model3 = Sequential()

# Embedding
model3.add(Embedding(most_common, 32, input_length=pad))

# First Convolution1D Layer
model3.add(Conv1D(32, kernel_size=3, padding='same', activation='relu'))

model3.add(MaxPooling1D(pool_size=2))

# GRU Layer with Dropout
model3.add(GRU(128,dropout=0.2))
model3.add(Dropout(0.25))

# flatten and put a fully connected layer
model3.add(Flatten())
model3.add(Dense(128, activation='relu'))

#sigmoid layer
model3.add(Dense(1, activation='sigmoid'))

model3.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, 450, 32)           320000    
                                                                 
 conv1d_6 (Conv1D)           (None, 450, 32)           3104      
                                                                 
 max_pooling1d_4 (MaxPooling  (None, 225, 32)          0         
 1D)                                                             
                                                                 
 gru (GRU)                   (None, 128)               61824     
                                                                 
 dropout_2 (Dropout)         (None, 128)               0         
                                                                 
 flatten_4 (Flatten)         (None, 128)               0         
                                                      

- The model summary above lists the number of trainable parameters and the output dimensions of each layer.
- The `None` in the output shape stands for the batch size, which is not fixed in advance.
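
The GRU figure in the summary is consistent with the classic three-gate formulation that uses a single bias vector per gate. This is an inference from the arithmetic, not something the summary states; a variant with a second bias vector per gate would have 384 more parameters.

```python
input_dim = 32  # channels coming out of the pooling layer
units = 128     # GRU units
gates = 3       # update gate, reset gate and candidate state

# Per gate: input weights + recurrent weights + one bias per unit (assumed single bias)
params_per_gate = units * input_dim + units * units + units
gru_params = gates * params_per_gate
print(gru_params)  # 61824

# For comparison: the same layer with two bias vectors per gate
gru_params_two_biases = gru_params + gates * units
print(gru_params_two_biases)  # 62208
```

With three gates instead of four, the GRU layer has three quarters of the parameters of the LSTM layer used in the previous model.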

### Fitting and evaluating the third improved model
- We now compile the model. The loss function to be minimised is binary_crossentropy.
- The optimizer is Adam.
- The evaluation metric is accuracy.

In [35]:
model3.compile(loss='binary_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])

- We fit the third improved model on X_train and y_train.

In [36]:
model3.fit(X_train, y_train,
           batch_size=batch_size,
           epochs=12,
           verbose=1,
           validation_data=(X_test, y_test))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x2483a5289c8>

- This model also has training and validation accuracy similar to that of our previous models.

###### We now evaluate the third improved model on test data

In [37]:
model3.evaluate(X_test, y_test)



[0.798789918422699, 0.8645200133323669]

- The accuracy is similar to that of our previous models.

##### All the models perform similarly in terms of accuracy (between about 85.8% and 87.0% on the test set). The simple 1D CNN model, however, was the fastest to train, and its accuracy was marginally higher than that of the other models. We therefore select it as the final model.
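
The comparison can be made explicit by collecting the `evaluate` outputs reported above into one structure. The dictionary `results` and its labels are our own; the numbers are copied from the four evaluation cells.

```python
# (test loss, test accuracy) as returned by evaluate() for each model above
results = {
    "Single 1D CNN": (0.9021984934806824, 0.8704800009727478),
    "Double 1D CNN": (0.9787335395812988, 0.8580399751663208),
    "LSTM 1D CNN": (0.7024185061454773, 0.8621199727058411),
    "GRU 1D CNN": (0.798789918422699, 0.8645200133323669),
}

best_by_accuracy = max(results, key=lambda name: results[name][1])
best_by_loss = min(results, key=lambda name: results[name][0])
accuracies = [acc for _, acc in results.values()]
spread = max(accuracies) - min(accuracies)

for name, (loss, acc) in results.items():
    print(f"{name:15s} loss={loss:.3f} accuracy={acc:.3f}")
print(best_by_accuracy, best_by_loss, round(spread, 4))
```

The accuracies differ by little more than one percentage point, while the LSTM model has the lowest test loss.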

## Making Predictions using the finalized model
- We now evaluate the model on the complete IMDB review data (train and test sets combined) to get an overall idea of how the model performs. Since half of this data was used for training, the resulting accuracy is expected to be higher than the test accuracy.
- load_data returns the train and test sets separately, so we concatenate them. We also set the vocabulary limit and the maximum review length again.

In [43]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=10000)
# pad dataset to a maximum review length in words
pad = 450
X_train = sequence.pad_sequences(X_train, maxlen=pad)
X_test = sequence.pad_sequences(X_test, maxlen=pad)
X = numpy.concatenate((X_train, X_test), axis=0)
y = numpy.concatenate((y_train, y_test), axis=0)

- We check the shape of X and y for verification.

In [44]:
print("\n complete data")
print(X.shape)
print(y.shape)


 complete data
(50000, 450)
(50000,)


###### We now evaluate the finalised 1D CNN model on the complete dataset

In [45]:
model.evaluate(X, y)



[0.4511488676071167, 0.9352399706840515]

- The accuracy on the complete dataset is about 93.5%.
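
Because the complete dataset is made of 25,000 training and 25,000 test reviews, this figure is the average of the training and test accuracies. The sketch below recovers the implied training accuracy, assuming the model weights did not change between the two `evaluate` calls.

```python
n_train, n_test = 25000, 25000

acc_all = 0.9352399706840515   # accuracy on the complete dataset
acc_test = 0.8704800009727478  # accuracy on the test set alone

# acc_all * (n_train + n_test) = acc_train * n_train + acc_test * n_test
acc_train = (acc_all * (n_train + n_test) - acc_test * n_test) / n_train
print(round(acc_train, 4))  # 1.0
```

Under that assumption the baseline model classifies nearly all training reviews correctly, so the test accuracy of about 87% remains the more informative figure for unseen reviews.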

### Making Predictions
- We predict the sentiment for a few reviews manually using the model.
- We now load a decoded review.

In [62]:
decoded_sequence = " ".join(inverted_word_index.get(i-3, '?') for i in X[1234])
decoded_sequence

"? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? this is exactly the reason why many people remain homeless because stupid producers pay their money to make awful films like this instead of ? if they can bother br br this film is even worse than white chicks little man has a lame excuse for posing a character midget as a baby story is awful considering it was written by six people the idea still wouldn't be too bad though if it was original and not a rip off of a cartoon episode it has funny moments but some of them are way over done and some are just stupid the a

- The leading question marks are the padding added to bring the review up to the maximum length of 450 words. A question mark inside the text stands for a word outside the 10,000-word vocabulary.
- On reading, we can tell that the above is a negative review. Let us now check what the model predicts for the same sample. We first predict for the entire X.

In [66]:
y_pred = model.predict(X)

- The predicted label of the sample review:

In [67]:
y_pred[1234]

array([1.2858462e-19], dtype=float32)

- We can see that the predicted value is ~0 which tells us that the review is negative.
- We now confirm it with the actual label.

In [63]:
print(y[1234])

0


- We now check for the predictions of another sample

In [85]:
decoded_sequence = " ".join(inverted_word_index.get(i-3, '?') for i in X[12345])
decoded_sequence

"? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? in the sea of crap that hollywood and others continue to put out this is one of those ? in the rough a small simple movie that is very entertaining and leaves you with the feeling that you didn't just waste an hour and a half of your life br br ashley judd is really quite amazing in this movie i had never really been a fan or had noticed her before but going back and seeing this early performance of hers convinced me she's extrem

- On reading, we can tell that the above is a positive review. Let us now check what the model predicts for the same sample.

In [87]:
y_pred[12345]

array([0.9989635], dtype=float32)

- We can see that the predicted value is ~1 which tells us that the review is positive.
- We now confirm it with the actual label.

In [86]:
print(y[12345])

1
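
The sigmoid outputs can be turned into class labels with a threshold. The helper `to_label` and the 0.5 cut-off are assumptions for illustration; the two probabilities are the predictions shown above.

```python
def to_label(probability, threshold=0.5):
    """Map a sigmoid output to 1 (positive) or 0 (negative)."""
    return 1 if probability > threshold else 0


# Predicted probabilities for samples 1234 and 12345, as printed above
predicted = [1.2858462e-19, 0.9989635]
actual = [0, 1]

labels = [to_label(p) for p in predicted]
print(labels)            # [0, 1]
print(labels == actual)  # True
```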


###### The end