<a href="https://colab.research.google.com/github/kumar4372/sentiment_analysis_hands_on/blob/master/(trainer)_Using_RNN_for_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sentiment Analysis Using Recurrent Neural Network**

---



## Set the hardware accelerator to "GPU" for faster computation: go to "Runtime" -> "Change runtime type".


In this tutorial, we will use an RNN (specifically an LSTM) for sentiment analysis on the IMDB movie review dataset.

**What is sentiment analysis?**

Sentiment analysis is the task of determining the sentiment expressed in a piece of text. Here, that means classifying each movie review as positive or negative.

**Example Code to refer**: https://keras.io/examples/nlp/bidirectional_lstm_imdb/

**Notes**
- RNNs are tricky: the choice of batch size is important, the choice of loss and optimizer is critical, and some configurations won't converge.
- LSTM loss can decrease during training in patterns quite different from what you see with CNNs, MLPs, etc.

**Importing Libraries**

We start by importing the required dependencies to preprocess our data and build our model.

In [1]:
# Import the dependencies
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN,LSTM, GRU
from tensorflow.keras.preprocessing import sequence
print("Imported dependencies.")

Imported dependencies.


**Loading Data**

We will use the IMDB sentiment classification dataset, which consists of 50,000 movie reviews from IMDB users, each labeled as either positive (1) or negative (0).

Next, we download the IMDB dataset, which is conveniently built into Keras.

In [2]:
vocab_size = 10000

# Define the training and test datasets.
# num_words=vocab_size keeps only the vocab_size most frequent words; rarer words are replaced by the out-of-vocabulary index (2).
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

print("Created test and training data.")
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

Created test and training data.
25000 train sequences
25000 test sequences
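To make the effect of num_words concrete, here is a minimal pure-Python sketch of the filtering that load_data applies to each review, assuming the defaults skip_top=0 and oov_char=2. It explains why the index 2 shows up so often in the reviews printed below. The function name restrict_vocab is hypothetical and not part of Keras.

```python
def restrict_vocab(seq, num_words, oov_char=2):
    """Sketch of imdb.load_data(num_words=...): any word index >= num_words
    (i.e. outside the num_words most frequent words) becomes oov_char."""
    return [w if w < num_words else oov_char for w in seq]


# A hypothetical review encoded with the full, unrestricted vocabulary
review = [1, 14, 22, 12345, 9999, 10000]
print(restrict_vocab(review, num_words=10000))  # [1, 14, 22, 2, 9999, 2]
```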


**Exploring the data**

As the output below shows, each review is labeled with one of two categories, 0 or 1, representing its sentiment. Because we kept only the 10,000 most frequent words, the whole dataset contains 9,998 unique word indices (index 0, used for padding, and index 3, which is unused, never appear in the raw sequences). The average review is about 235 words long, with a standard deviation of 173 words.

In [3]:
import numpy as np

#concatenate whole data
data = np.concatenate((x_train, x_test), axis=0)
targets = np.concatenate((y_train, y_test), axis=0)

print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))
len_sequence_list = [len(i) for i in data]
print("Average Review length:", np.mean(len_sequence_list))
print("Standard Deviation:", round(np.std(len_sequence_list)))

Categories: [0 1]
Number of unique words: 9998
Average Review length: 234.75892
Standard Deviation: 173


We should always check how balanced the training and test data are; this helps in choosing evaluation metrics and in interpreting training progress. As the output below shows, both the training and test data are perfectly balanced.

In [4]:
# since labels are only 0 and 1, np.sum will give you total number of examples with label 1.
print("percentage of test sequences with label 1 is", (np.sum(y_test)/len(y_test)*100))
print("percentage of train sequences with label 1 is", (np.sum(y_train)/len(y_train)*100))

percentage of test sequences with label 1 is 50.0
percentage of train sequences with label 1 is 50.0


Here is the first review in the training set, stored as a list of word indices, along with its label, positive (1).

In [5]:
print('---review---')
print(x_train[0])
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
---label---
1


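Now we map word indices back to words so that we can read the reviews. imdb.get_word_index() returns a dictionary from each word to its frequency rank (the most frequent word, "the", has index 1), and we invert it to go from index to word. The sequences returned by load_data shift every word's index by 3, because indices 0, 1 and 2 are reserved for padding, the start-of-sequence marker and out-of-vocabulary words, so we subtract 3 before looking a word up. Any index without a matching word is shown as "#", which is why every decoded review starts with "#" (the start marker).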

In [6]:
index = imdb.get_word_index() # from word to index mapping
reverse_index = dict([(value, key) for (key, value) in index.items()]) # from index to word mapping

In [7]:
print(index['there']) # we get 47
print(reverse_index[47])

47
there


In [10]:
train_text = []
test_text = []
# Data indices are offset by 3 (load_data's default index_from=3); unmatched indices become "#"
for i in range(0,len(x_train)):
  train_text.append(" ".join( [reverse_index.get(j - 3, "#") for j in x_train[i]] ))
for i in range(0,len(x_test)):
  test_text.append(" ".join( [reverse_index.get(j - 3, "#") for j in x_test[i]] ))
print(len(train_text),len(test_text))

25000 25000


In [11]:
print(train_text[0])

# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you thi
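The decoding above only goes one way. To feed your own review text to the model, you would apply the same scheme in reverse: start marker 1, then each word's get_word_index() rank plus 3, with out-of-vocabulary words mapped to 2. Below is a minimal sketch of both directions, assuming the load_data defaults (start_char=1, oov_char=2, index_from=3) and naive whitespace tokenization. The real dataset was tokenized differently (for example, punctuation was removed), so treat encode as an approximation. The toy_index dictionary is hypothetical; in the notebook you would pass index instead.

```python
def encode(text, word_index, num_words=10000, index_from=3, start_char=1, oov_char=2):
    """Turn raw text into the index format used by imdb.load_data (approximate)."""
    ids = [start_char]
    for word in text.lower().split():
        rank = word_index.get(word)
        if rank is None or rank + index_from >= num_words:
            ids.append(oov_char)  # unknown word, or too rare for the vocabulary
        else:
            ids.append(rank + index_from)
    return ids


def decode(ids, word_index, index_from=3, unknown="#"):
    """Inverse mapping; reserved indices (0, 1, 2) and unknown words become `unknown`."""
    reverse = {rank: word for word, rank in word_index.items()}
    return " ".join(reverse.get(i - index_from, unknown) for i in ids)


toy_index = {"the": 1, "film": 2, "was": 3, "great": 4}  # hypothetical word -> rank mapping
ids = encode("The film was GREAT fun", toy_index)
print(ids)                      # [1, 4, 5, 6, 7, 2]
print(decode(ids, toy_index))   # # the film was great #
```

An encoded review would still need to go through sequence.pad_sequences with maxlen=max_review_length before being passed to model.predict.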

**Data Preparation**

Now it's time to prepare our data. 

Reviews contain different numbers of words; a review could even be a single word, e.g. "nice".

Since the model expects inputs of a fixed length, we need to choose a maximum input sequence length.

In [12]:
from collections import Counter
count_length = Counter(len_sequence_list)
count_length[200]  # number of reviews that are exactly 200 words long

142

Check how many reviews have fewer than x words, for any integer x:

In [14]:
sum([count_length[i] for i in range(300)])  # reviews with fewer than 300 words (x = 300 here)

38501

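So 38,501 of the 50,000 reviews (about 77%) have fewer than 300 words. Here we set the maximum input sequence length to 200: longer reviews are truncated and shorter ones are padded with zeros.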

**Please feel free to choose your own maximum length**
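Rather than probing individual lengths, one way to pick the maximum length is to check what fraction of reviews would fit without truncation, or to read it off a percentile of the length distribution. A small sketch, where the helper coverage is hypothetical and toy_lengths stands in for the len_sequence_list computed earlier. Note that it counts lengths up to and including maxlen, whereas the cell above counts lengths strictly below x.

```python
import numpy as np


def coverage(lengths, maxlen):
    """Fraction of sequences that fit in maxlen tokens without being truncated."""
    lengths = np.asarray(lengths)
    return float(np.mean(lengths <= maxlen))


toy_lengths = [10, 50, 200, 201, 500]      # stand-in for len_sequence_list
print(coverage(toy_lengths, 200))          # 0.6
print(np.percentile(toy_lengths, 50))      # 200.0, the median length
```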

In [15]:
max_review_length = 200
x_train_padded = sequence.pad_sequences(x_train, maxlen=max_review_length)
x_test_padded = sequence.pad_sequences(x_test, maxlen=max_review_length)

To visualize how padding works, let's print a few sequences before and after padding.


In [16]:
# for sequence length > maximum sequence length, we remove entries from the beginning. 
# If you want to remove entries from the end, you should use truncating='post' in pad_sequences in the above cell
i = 0
print(len(x_train[i]))
print(x_train[i])
print(len(x_train_padded[i]))
print(x_train_padded[i])

218
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
200
[   5   25  100   43  838  112   50  670    2    9  

In [17]:
# for sequence length < maximum sequence length, we put zeros in the beginning. 
# If you want to put zeros in the end, you should use padding='post' in pad_sequences in the above cell
i = 1
print(len(x_train[i]))
print(x_train[i])
print(len(x_train_padded[i]))
print(x_train_padded[i])

189
[1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 8255, 2, 349, 2637, 148, 605, 2, 8003, 15, 123, 125, 68, 2, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 2, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 2, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]
200
[   0    0    0    0    0    0    0    0    0    0    0    1  194 1153
  194 8255   78  228    5    6 1463 4369 5012  134   26    4  715    8
  118 1634   14  3
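The padding and truncation rules described in the code comments above fit in a few lines. This is a pure-Python sketch for a single sequence, assuming pad_sequences' defaults (padding='pre', truncating='pre', value=0); the function pad is hypothetical and only meant to mirror that behavior.

```python
def pad(seq, maxlen, padding="pre", truncating="pre", value=0):
    """Return a list of exactly maxlen items (maxlen >= 1), mimicking pad_sequences for one sequence."""
    seq = list(seq)
    if len(seq) > maxlen:
        # 'pre' drops items from the start, 'post' drops them from the end
        seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    # 'pre' puts the padding value before the sequence, 'post' after it
    return fill + seq if padding == "pre" else seq + fill


print(pad([1, 2, 3, 4, 5], 3))                     # [3, 4, 5]
print(pad([1, 2, 3, 4, 5], 3, truncating="post"))  # [1, 2, 3]
print(pad([1, 2], 4))                              # [0, 0, 1, 2]
print(pad([1, 2], 4, padding="post"))              # [1, 2, 0, 0]
```

This is why the first review (218 indices) keeps only its last 200 indices, so its padded version starts at 5, 25, 100, while the second review (189 indices) gets 11 leading zeros.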

**BUILDING AND TRAINING THE MODEL**

Now our data is ready for some modelling!

A deep learning model is a stack of layers.

The first layer takes in the data we've just prepared, the intermediate layers transform it, and the final layer produces the output we use for prediction.

Our model has three layers:

1. Embedding layer
2. LSTM layer
3. Dense layer

Our model begins with the line model = Sequential(). Think of this as stating "our model is a linear stack of layers, where the output of each layer is the input to the next".

**Embedding layer**

The Embedding layer maps each word to a dense vector of numbers, learned so that it captures useful relationships between words.

model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length)) says: add an Embedding layer to our model that turns each word index (from a vocabulary of vocab_size words) in a sequence of max_review_length indices into a vector of dimension embedding_vector_length. Words that play similar roles for the task tend to end up with similar vectors.

So each of our words becomes a vector of dimension embedding_vector_length (128 here).

For example, the vector for "the" might look like [0.556433, 0.223122, 0.789654, ...].

Don't worry for now about how these values are computed: they start out random and are learned during training along with the rest of the model, and Keras takes care of this for us.
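Under the hood, an Embedding layer is a trainable lookup table with one row per word index. Here is a NumPy sketch with tiny hypothetical sizes; the uniform initialization in [-0.05, 0.05] is meant to resemble Keras's default, but treat it as an assumption.

```python
import numpy as np

vocab_size, embedding_dim = 10, 4    # hypothetical; the notebook uses 10000 and 128
rng = np.random.default_rng(0)
# One row per word index, randomly initialized; training would adjust these values
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embedding_dim))

# A batch of 2 padded reviews, 5 word indices each (0 = padding)
batch = np.array([[0, 0, 1, 4, 7],
                  [1, 9, 3, 3, 2]])

embedded = embedding_matrix[batch]   # the lookup: index i selects row i
print(embedded.shape)                # (2, 5, 4) = (batch, sequence length, embedding dim)
print(embedding_matrix.size)         # 40 parameters = vocab_size * embedding_dim
```

With the notebook's settings, the same lookup turns padded reviews of shape (batch, 200) into (batch, 200, 128), which matches the embedding layer's output shape in the model summary.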

**LSTM layer**

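model.add(LSTM(128)) says: add an LSTM layer with 128 units after the embedding layer. The LSTM reads the sequence of word vectors one time step at a time, updating an internal state, and by default outputs only its final hidden state: a vector of 128 values summarizing the review.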
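To see what "128 units" means, here is a NumPy sketch of the computation an LSTM layer performs, with tiny hypothetical sizes and random rather than trained weights. It assumes the standard LSTM equations with the gate order input, forget, cell candidate, output, which we believe matches how Keras lays out its weights; it is an illustration, not Keras's implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (input_dim, 4*units) input weights, U: (units, 4*units) recurrent weights,
    b: (4*units,) biases, split into input, forget, candidate and output parts."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # gates in (0, 1)
    g = np.tanh(g)                                 # candidate cell values in (-1, 1)
    c_t = f * c_prev + i * g                       # forget part of the old state, add new information
    h_t = o * np.tanh(c_t)                         # hidden state passed to the next step/layer
    return h_t, c_t


def lstm_last_state(xs, W, U, b):
    """Run over a (timesteps, input_dim) sequence and return only the final hidden state,
    like LSTM(units) with the default return_sequences=False."""
    units = U.shape[0]
    h, c = np.zeros(units), np.zeros(units)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h


units, input_dim, timesteps = 3, 5, 7   # hypothetical; the notebook uses 128 units and 128-dim inputs
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
xs = rng.normal(size=(timesteps, input_dim))   # e.g. 7 embedded words

h = lstm_last_state(xs, W, U, b)
print(h.shape)                    # (3,)
print(W.size + U.size + b.size)   # 108 = 4 * [3 * (3 + 5) + 3]
```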

**Dense layer**

model.add(Dense(1, activation='sigmoid')) says: add a Dense layer with a single unit to the end of our model and pass its output through a sigmoid function, which squashes it into a value between 0 and 1 that we can read as the probability that the review is positive.

A dense layer is also known as a fully connected layer. This layer connects the 128 LSTM outputs from the previous layer to 1 unit, which computes a weighted sum of them plus a bias and then applies the sigmoid.
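The final layer's computation is small enough to write out directly. A NumPy sketch with random, hypothetical weights standing in for the trained ones:

```python
import numpy as np


def dense_sigmoid(h, w, b):
    """Dense(1, activation='sigmoid'): weighted sum of the inputs plus a bias, squashed into (0, 1)."""
    logits = h @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


units = 128
rng = np.random.default_rng(0)
h = rng.normal(size=(2, units))         # hypothetical LSTM outputs for a batch of 2 reviews
w = rng.normal(scale=0.1, size=units)   # 128 weights
b = 0.0                                 # 1 bias, so 129 parameters in total
probs = dense_sigmoid(h, w, b)
print(probs.shape)                            # (2,)
print(dense_sigmoid(np.zeros(units), w, b))   # 0.5: a logit of 0 means "undecided"
```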

**Please feel free to change the model architecture.**


In [18]:
# Define how long the embedding vector will be
embedding_vector_length = 128

# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))

print("Model created.")

Model created.


**Compiling the model**

Now we compile our model, which configures it for training. We use the "adam" optimizer, the algorithm that updates the weights and biases during training. We choose binary cross-entropy as the loss (because this is a binary classification problem) and accuracy as our evaluation metric.
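For intuition about what the loss and the metric measure, here is a NumPy sketch of binary cross-entropy and accuracy for labels in {0, 1} and predicted probabilities. The clipping constant eps=1e-7 is an assumption chosen to resemble Keras's default epsilon; Keras's own implementation differs in details.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all examples.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


y_true = [1, 0, 1]
y_prob = [0.9, 0.2, 0.4]   # hypothetical predictions; the last one is wrong
print(binary_crossentropy(y_true, y_prob))
print(binary_accuracy(y_true, y_prob))   # 2 of 3 correct
```

Confident wrong predictions are penalized heavily by the loss: the 0.4 for a positive review contributes -log(0.4), about 0.92, while the correct 0.9 contributes only about 0.11.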

In [19]:
model.compile(loss = 'binary_crossentropy', optimizer='adam',metrics = ['accuracy'])
print("Model compiled, ready to be fit to the training data.")

Model compiled, ready to be fit to the training data.


**Summarize the model**

Making a summary of the model will give us an idea of what's happening at each layer.

In the embedding layer, each of our words is turned into a vector of dimension 128. Because the vocabulary has 10,000 words (vocab_size), there are 1,280,000 parameters (10,000 × 128).

Parameters are the trainable weights of the model: the values the optimizer adjusts during training. A layer's parameter count depends on its input and output sizes, not on the length of the review.

The LSTM layer has 131,584 parameters. Each of its four gates has a weight matrix over the previous hidden state (128) and the current input (128), plus a bias vector (128): 4 × [128(128 + 128) + 128] = 131,584.

The final dense layer connects each of the 128 LSTM outputs to one unit, giving 128 weights plus 1 bias = 129 parameters.
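The arithmetic above generalizes to a few one-line formulas. This sketch reproduces the summary's counts and also covers the GRU variant tried later in this notebook; for GRU it assumes Keras's default reset_after=True (the TensorFlow 2 default), under which each gate carries two bias vectors. The helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim                     # one dim-sized vector per word


def lstm_params(units, input_dim):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (units + input_dim) + units)


def gru_params(units, input_dim, reset_after=True):
    # 3 gates; with reset_after=True each gate has separate input and recurrent biases
    biases = 2 * units if reset_after else units
    return 3 * (units * (units + input_dim) + biases)


def dense_params(units, input_dim):
    return units * input_dim + units            # weights plus one bias per unit


lstm_model = [embedding_params(10000, 128), lstm_params(128, 128), dense_params(1, 128)]
print(lstm_model, sum(lstm_model))   # [1280000, 131584, 129] 1411713

gru_model = [embedding_params(10000, 128), gru_params(32, 128), dense_params(1, 32)]
print(gru_model, sum(gru_model))     # [1280000, 15552, 33] 1295585
```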

In [20]:
# Summarize the different layers in the model
print(model.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 200, 128)          1280000   
_________________________________________________________________
lstm (LSTM)                  (None, 128)               131584    
_________________________________________________________________
dense (Dense)                (None, 1)                 129       
Total params: 1,411,713
Trainable params: 1,411,713
Non-trainable params: 0
_________________________________________________________________
None


**Fitting the model to the training data**

Now that our model is compiled, it's ready to be trained on our training data.

We'll be training for 3 epochs with a batch_size of 64.

As the optimizer minimizes the loss, training accuracy should generally improve from epoch to epoch, although validation accuracy may level off or drop if the model starts to overfit.

model.fit(x_train_padded, y_train, epochs=3, batch_size=64) says: fit the model we've built to the training data for 3 epochs (full passes over the data), updating the weights after every batch of 64 reviews.
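How many weight updates does that amount to? Each batch triggers one update, and the last batch of an epoch may be smaller than batch_size. A quick calculation, assuming no batches are dropped:

```python
import math

n_train = 25000    # training reviews
batch_size = 64
epochs = 3

steps_per_epoch = math.ceil(n_train / batch_size)   # 390 full batches + 1 partial batch of 40
print(steps_per_epoch)            # 391
print(steps_per_epoch * epochs)   # 1173 weight updates in total
```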

Here we use the test data as validation data. If you would rather hold out part of the training data for validation, use the validation_split parameter of model.fit.
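According to the Keras documentation, validation_split takes the validation samples from the end of the provided data, before any shuffling. The NumPy sketch below mimics that split so you can see which samples end up where; the helper split_tail is hypothetical, and its rounding (floor of the training fraction) is an assumption about Keras's exact behavior. In practice you would simply call model.fit(x_train_padded, y_train, epochs=3, batch_size=64, validation_split=0.2).

```python
import numpy as np


def split_tail(x, y, validation_split):
    """Hold out the last `validation_split` fraction of samples as validation data."""
    split_at = int(np.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


x = np.arange(10).reshape(10, 1)   # 10 toy samples
y = np.array([0, 1] * 5)
(x_tr, y_tr), (x_val, y_val) = split_tail(x, y, 0.2)
print(len(x_tr), len(x_val))   # 8 2
print(x_val.ravel())           # [8 9]
```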

**Please feel free to change the number of epochs or batch_size**


In [21]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64,validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


It is time to evaluate our model:

In [22]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.37549883127212524
Test accuracy: 0.8643199801445007


Let's analyze the results by looking at some examples.

In [23]:
y_prob = model.predict(x_test_padded)

In [24]:
y_prob # probability scores

array([[0.03657438],
       [0.9958324 ],
       [0.3912676 ],
       ...,
       [0.01443178],
       [0.40329775],
       [0.98985064]], dtype=float32)

In [25]:
y_pred = np.round(y_prob)

In [27]:
y_pred.reshape([len(y_pred)]) # predicted labels

array([0., 1., 0., ..., 0., 0., 1.], dtype=float32)

In [28]:
y_pred = y_pred.reshape([len(y_pred)])
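np.round works here because, for probabilities between 0 and 1, rounding is the same as thresholding at 0.5 (NumPy rounds halves to the nearest even number, so exactly 0.5 becomes 0). Writing the threshold explicitly makes the rule visible, lets you move it (for example to trade false positives for false negatives), and yields integer labels. A small sketch with hypothetical probabilities:

```python
import numpy as np

y_prob = np.array([[0.03], [0.5], [0.51], [0.99]], dtype=np.float32)  # shaped like model.predict output

y_round = np.round(y_prob).ravel()              # the notebook's approach, as floats
y_thresh = (y_prob > 0.5).astype(int).ravel()   # explicit threshold, integer labels
y_strict = (y_prob > 0.9).astype(int).ravel()   # a stricter, hypothetical threshold

print(y_round)    # [0. 0. 1. 1.]
print(y_thresh)   # [0 0 1 1]
print(y_strict)   # [0 0 0 1]
```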

In [29]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # for better printing

In [30]:
results = pd.DataFrame({"review":test_text, "ground_truth":y_test, "prediction":y_pred})

In [31]:
results.head(5)

Unnamed: 0,review,ground_truth,prediction
0,# please give this one a miss br br # # and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite # so all you madison fans give this a miss,0,0.0
1,# this film requires a lot of patience because it focuses on mood and character development the plot is very simple and many of the scenes take place on the same set in frances # the sandy dennis character apartment but the film builds to a disturbing climax br br the characters create an atmosphere # with sexual tension and psychological # it's very interesting that robert altman directed this considering the style and structure of his other films still the trademark altman audio style is evident here and there i think what really makes this film work is the brilliant performance by sandy dennis it's definitely one of her darker characters but she plays it so perfectly and convincingly that it's scary michael burns does a good job as the mute young man regular altman player michael murphy has a small part the # moody set fits the content of the story very well in short this movie is a powerful study of loneliness sexual # and desperation be patient # up the atmosphere and pay attention to the wonderfully written script br br i praise robert altman this is one of his many films that deals with unconventional fascinating subject matter this film is disturbing but it's sincere and it's sure to # a strong emotional response from the viewer if you want to see an unusual film some might even say bizarre this is worth the time br br unfortunately it's very difficult to find in video stores you may have to buy it off the internet,1,1.0
2,# many animation buffs consider # # the great forgotten genius of one special branch of the art puppet animation which he invented almost single # and as it happened almost accidentally as a young man # was more interested in # than the cinema but his # attempt to film two # # fighting led to an unexpected breakthrough in film making when he realized he could # movement by # beetle # and # them one frame at a time this discovery led to the production of amazingly elaborate classic short the # revenge which he made in russia in # at a time when motion picture animation of all sorts was in its # br br the political # of the russian revolution caused # to move to paris where one of his first productions # was a dark political satire # known as # or the # who wanted a king a strain of black comedy can be found in almost all of films but here it is very dark indeed aimed more at grown ups who can appreciate the satirical aspects than children who would most likely find the climax # i'm middle aged and found it pretty # myself and indeed # of the film intended for english speaking viewers of the 1920s were given title cards filled with # and # in order to help # the sharp # of the finale br br our tale is set in a swamp the # # where the citizens are unhappy with their government and have called a special session to see what they can do to improve matters they decide to # # for a king the crowds are # animated in this opening sequence it couldn't have been easy to make so many frog puppets look alive simultaneously while # for his part is depicted as a # white # guy in the clouds who looks like he'd rather be taking a # when # sends them a tree like god who regards them the # decide that this is no improvement and demand a different king irritated # sends them a # br br delighted with this # looking new king who towers above them the # welcome him with a # of # dressed # the mayor steps forward to hand him the key to the # as # cameras record the event to everyone's 
horror the # promptly eats the mayor and then goes on a merry rampage # citizens at random a title card # reads news of the king's # throughout the kingdom when the now terrified # once more # # for help he loses his temper and # their community with lightning # the moral of our story delivered by a hapless frog just before he is eaten is let well enough alone br br considering the time period when this startling little film was made and considering the fact that it was made by a russian # at the height of that # country's civil war it would be easy to see this as a # about those events # may or may not have had # turmoil in mind when he made # but whatever # his choice of material the film stands as a # tale of universal # # could be the soviet union italy germany or japan in the 1930s or any country of any era that lets its guard down and is overwhelmed by # it's a fascinating film even a charming one in its macabre way but its message is no joke,1,0.0
3,# i generally love this type of movie however this time i found myself wanting to kick the screen since i can't do that i will just complain about it this was absolutely idiotic the things that happen with the dead kids are very cool but the alive people are absolute idiots i am a grown man pretty big and i can defend myself well however i would not do half the stuff the little girl does in this movie also the mother in this movie is reckless with her children to the point of neglect i wish i wasn't so angry about her and her actions because i would have otherwise enjoyed the flick what a number she was take my advise and fast forward through everything you see her do until the end also is anyone else getting sick of watching movies that are filmed so dark anymore one can hardly see what is being filmed as an audience we are # involved with the actions on the screen so then why the hell can't we have night vision,0,0.0
4,# like some other people wrote i'm a die hard mario fan and i loved this game br br this game starts slightly boring but trust me it's worth it as soon as you start your hooked the levels are fun and # they will hook you # your mind turns to # i'm not kidding this game is also # and is beautifully done br br to keep this spoiler free i have to keep my mouth shut about details but please try this game it'll be worth it br br story 9 9 action 10 1 it's that good # 10 attention # 10 average 10,1,1.0


## Some samples of wrong predictions

In [32]:
results[results['ground_truth'] != results['prediction']].sample(5)

Unnamed: 0,review,ground_truth,prediction
11788,# once i heard that the greatest and oldest # # heroic poem was transformed into a film it almost became my obsession to see it the first # of its appearance i caught never disappointed me a futuristic interpretation with # our favourite # and tomb # to be in leading roles # appealing though some doubts came to life an important female character in beowulf two hours ago i saw the film after i had read the director's name my world fell apart as i said from that point on there was not many surprises first and foremost the film has nothing to do with the original beowulf if we disregard a couple of violently and # stolen names if they had not stolen the names and # it to be a new story it might have passed as an f class action stupidity with nice costumes and # this way it is simply a crime an attack on a legend and its ideology as well as on common sense ok let me be positive for a second apart from the general # # atmosphere which is nice it also has good music that was it for both the positive part and this comment,0,1.0
18008,# i saw this at the screening at # in # i had some time to kill and decided to check it out it played to about 1000 people in a packed standing room only ballroom br br wow what a ride the script was tight the action tense the pacing perfect the character exposition excellent one thing i really appreciated was that you knew going in that this wasn't a big budget film yet it soon became obvious that the creators pushed their sets and effects as far as they could despite their limitations and it was more than enough br br it's true that this film was targeted at a certain audience # # players the creators make no effort to hide that but other filmmakers could learn a lot from them for in going for the # in scene after scene and not worrying about if mom who happens to be watching will get it they got the biggest laughs time and time again but there's enough # there that mom will be laughing too even if she's not in on every joke i think too many times i see films that try so hard to lower the bar to the lowest common # so that they will appeal to the most people but the movie just ends up suffering for it br br but not this flick indeed this film was so solid that it had the audience wrapped around it's finger from the opening credits and while the viewers around me really wanted to like the film they weren't # # can be among the most critical # out there br br i'm so glad i got to see this in a big crowd at least 10 times the audience was having such a good time that they # into applause at a joke or scene during the film how often does that happen at # it should be no surprise that there was a huge standing # when the closing credits rolled br br for my own part i can't wait for this to be released after it ended one of the producers said they were shooting for a # tv dvd release that date cannot come soon enough,1,0.0
3101,# yeah i guess this movie is kinda dull compared to some of pam # other films the plot is overly familiar the dialog stilted and some of the acting isn't too good but it's worth seeing for the lengthy stretch near the end of the film where we see ms # in a sexy blue with the # half yeah it seems like a # point when discussing an actress of pam # talent but she also happens to be an extremely gorgeous woman and back in the day she had a body that wouldn't quit it's nice to see it being # in a tight rent the dvd and then tell me i'm wrong can't can you that's because you know i'm right and yes i really did give a 10 just for the scenes,1,0.0
22356,# this film was shot in randolph county in central north # in 1968 when a film crew in the state was a rare thing the locations were the of liberty and and the surrounding rural countryside it is not a particularly good movie it did have # # and it brought life to the # for a few minutes br br the plot is standard the cinematography is that fuzzy stuff that came out of the late sixties and early seventies the local folks were thrilled to be a part of the enterprise br br if viewers have difficulty finding a copy of this film a record copy is available in # br br actors not credited include ben jones # # tommy # bill #,0,1.0
19745,# this soap is worse than bad it's # of the many television shows that have had a # influence on british society over the past twenty years # is the prime example for two decades this show has celebrated the # the thug the wide boy the # the the violent the sexually the criminal the ignorant the br br how many times has someone or other # that # mirrors life life on which planet exactly br br it's written about working class characters as imagined by middle class people who have taken a course in creative writing eager to show to their middle class peers how familiar they are with the working class they dream up the # # that is the # of # br br this has a toxic effect on some minds less well # than others to handle fiction and so we find members of the real population assuming the attitudes and # of the inhabitants of br br thus it came to pass that # mirrors life but only after life had been # into # # br br other # have followed in footsteps filled to their # with ugly # faced # headed pot # characters # at each other and # # constantly this is the # as perceived by the writers who produce this trash the writers will grow rich on the proceeds of such # and will go on to enjoy the # things of life in their # meanwhile the # number of new tv induced # will proceed # toward cultural # br br and there you have the new priests and the new creatures of the early 21st century much of this is due to the # power of that # of dancing in the corner of your living room it's your fault gentle reader that's what you chose as the only window through which to look out from your prison,0,1.0
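A handful of samples is anecdotal. To check whether the model's mistakes lean toward false positives or false negatives, count them over the whole test set. A NumPy sketch (the helper confusion_counts is hypothetical; pd.crosstab(results['ground_truth'], results['prediction']) gives the same table directly):

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Counts of true/false positives and negatives for binary labels."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    return {
        "tp": int(np.sum((y_true == 1) & (y_pred == 1))),
        "tn": int(np.sum((y_true == 0) & (y_pred == 0))),
        "fp": int(np.sum((y_true == 0) & (y_pred == 1))),  # negative review predicted positive
        "fn": int(np.sum((y_true == 1) & (y_pred == 0))),  # positive review predicted negative
    }


# Toy labels; in the notebook use results['ground_truth'] and results['prediction']
print(confusion_counts([1, 0, 1, 0, 1], [1, 1, 0, 0, 1]))
# {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
```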


## Some samples of correct predictions

In [33]:
results[results['ground_truth'] == results['prediction']].sample(5)

Unnamed: 0,review,ground_truth,prediction
4006,# i have never read sarah # book although i have not read the book the 3 hour movie is very interesting it begins with an interesting storyline with a twisted ending i have to say these 2 actresses are amazing sally # is stunning successfully portrayed the character in love with her mistress and betrayed by her love their romance slowly # as they spend more and more time together the love making scene is very tender and emotional well acted the end is quite intriguing and these 2 ended up together after all they have been thru which is a bless overall it is a great movie to see a very interesting plot with excellent performances,1,1.0
6226,# crazy six is torture it must be albert worst film even blast and # are better i # believe how boring this film is how this even got # i saw this movie about 3 years ago and the only thing i remember is how bad it was this # good bad movie it is simply bad bad bad bad bad movie br br 1 out of 10 # out of,0,0.0
12831,# this film for what it was set out to be succeeded it's a short tragic film although my choice of film are ones that really develop characters and their relationships this film is meant to just give a taste leaving you with the what happens next factor after watching it i really was wanting more more of the characters back story what influences they had to make them into the people they were i think thats what the makers intended the viewing audience to think the acting is amazing there aren't many lines in the film so their body language facial expressions and overall presence needed to be powerful enough to # a scene both franco and # have that element and it shows for them especially franco to take the time to make this obviously says they believed in this film and wanted to be apart of it and for that i appreciated the film for what it was also i'm happy i own it so i can share it with other people that would've never known it existed,1,1.0
9483,# for real though this game is where it's at i'm 20 years old and that's basically where it started for me 4 bit graphics was fabulous i hope you all remember this game with as much # as i do that # is a real,1,1.0
21712,# a rich old lady calls on a # # to woo a # away from her silly soon to be married # br br let us be gay is an interesting little domestic comedy which features some # dialogue courtesy of celebrated screenwriter frances marion good performances while perhaps a bit # at times this can probably be blamed on the difficulties with early sound technology which tended to limit action movement br br norma # can be credited with appearing in this minor film rather than using her # # as irving # # to insist upon only a grade pictures she is especially effective in her first few scenes where # flat makeup makes her almost # her extreme from # to # could only happen in hollywood but it's # # to spend much time worrying about that br br rod # doesn't come off too well as # # husband quite popular during silent days the # were not especially kind to him and his career would suffer here his role is not in the least sympathetic and one has to wonder what # # moves women to desire the # so much br br magnificent marie # is on hand as an eccentric long island # as a great friend of frances marion one can easily imagine that the part was written # for her full of # she is very humorous however the tremendous warmth essential goodness which would very shortly make her hollywood's biggest star are largely missing br br among the supporting cast hopper scores as a # society # as does # playing a comic butler movie # will spot little # moore as # young son elderly mary gordon as her # both uncredited,1,1.0


## More sample architectures you can try

**Extensions**

Let us try LSTM variants. Here we check the accuracy after replacing the LSTM layer with a GRU layer.

In [56]:
# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
#model.add(LSTM(128))
model.add(GRU(32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

# GRU params = 3 × [h(h+i) + 2h] with h = 32 units, i = 128 embedding dims (reset_after=True, the TF2 default, adds a 2nd bias)
# = 3 × [32(32+128) + 2×32] = 15552

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                15552     
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 33        
Total params: 1,295,585
Trainable params: 1,295,585
Non-trainable params: 0
_________________________________________________________________
None
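The `Param #` column can be reproduced by hand. The sketch below assumes `vocab_size = 10000` and an embedding size of 128, both read off the Embedding row (10000 × 128 = 1,280,000); the helper functions are hypothetical, not part of Keras. A recurrent layer with `h` units and `i` input features has, per gate block, an input kernel (i × h), a recurrent kernel (h × h) and a bias. SimpleRNN has one such block, GRU three and LSTM four; Keras's GRU with `reset_after=True` keeps an extra recurrent bias per gate.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


def simple_rnn_params(input_dim, units):
    # Input kernel + recurrent kernel + one bias vector.
    return units * (input_dim + units) + units


def lstm_params(input_dim, units):
    # Four blocks (input, forget, cell candidate, output), each with kernels and a bias.
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units, reset_after=True):
    # Three blocks (update, reset, candidate); reset_after=True adds a recurrent bias.
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)


emb = embedding_params(10000, 128)
print(emb + gru_params(128, 32) + dense_params(32, 1))  # 1295585 (GRU model)
print(emb + lstm_params(128, 64) + lstm_params(64, 64)
      + lstm_params(64, 4) + dense_params(4, 1))  # 1363541 (stacked LSTM model)
print(emb + simple_rnn_params(128, 128) + dense_params(128, 1))  # 1313025 (SimpleRNN model)
```

Note that the Embedding layer accounts for almost all trainable parameters in every variant below; the recurrent layers themselves are comparatively small.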


In [57]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
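With a single sigmoid output unit, `loss='binary_crossentropy'` scores each review as `-[y log p + (1 - y) log(1 - p)]` and averages over the batch. A minimal pure-Python sketch of that formula follows. The clipping constant `eps=1e-7` is an assumption for illustration (it keeps `log()` away from zero), and this is not Keras's own implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and sigmoid outputs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip so log() never sees exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ≈ 0.693 (ln 2): an undecided model
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # ≈ 0.105: confident and correct
```

A test loss around 0.34, as reported for the GRU model, therefore sits well below the ln 2 ≈ 0.693 of a model that always outputs 0.5.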

In [58]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64, validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [59]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.34270304441452026
Test accuracy: 0.8547999858856201
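Because the model ends in a sigmoid and is compiled with `loss='binary_crossentropy'`, Keras interprets `metrics=['accuracy']` as binary accuracy: a prediction counts as positive when it is above 0.5. A test accuracy of about 0.855 therefore means roughly 85.5% of the test reviews land on the correct side of that threshold. The hypothetical helper below sketches the computation in plain Python.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    hits = sum(int(p > threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Two of these four predictions fall on the correct side of 0.5.
print(binary_accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.7]))  # 0.5
```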


**Using stacked LSTM layers**

In [46]:
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(units=64, return_sequences=True))
model.add(LSTM(units=64, return_sequences=True))
model.add(LSTM(units=4))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
lstm_5 (LSTM)                (None, 200, 64)           49408     
_________________________________________________________________
lstm_6 (LSTM)                (None, 200, 64)           33024     
_________________________________________________________________
lstm_7 (LSTM)                (None, 4)                 1104      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 5         
Total params: 1,363,541
Trainable params: 1,363,541
Non-trainable params: 0
_________________________________________________________________
None
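The `Output Shape` column shows why the first two LSTMs set `return_sequences=True`: each then emits one 64-dimensional state per time step, `(None, 200, 64)`, so the next LSTM still receives a sequence. The last LSTM returns only its final state, `(None, 4)`, which is what the Dense layer consumes. The hypothetical helper below mimics that shape bookkeeping in plain Python; it is not a Keras API.

```python
def recurrent_output_shape(input_shape, units, return_sequences=False):
    """Output shape of a recurrent layer fed a (batch, timesteps, features) input."""
    if len(input_shape) != 3:
        raise ValueError("recurrent layers expect 3-D input (batch, timesteps, features)")
    batch, timesteps, _features = input_shape
    if return_sequences:
        # One hidden state per time step: the next recurrent layer still sees a sequence.
        return (batch, timesteps, units)
    # Only the final hidden state is kept.
    return (batch, units)


shape = (None, 200, 128)  # Embedding output: (batch, max_review_length, embedding size)
shape = recurrent_output_shape(shape, 64, return_sequences=True)
print(shape)  # (None, 200, 64)
shape = recurrent_output_shape(shape, 64, return_sequences=True)
print(shape)  # (None, 200, 64)
shape = recurrent_output_shape(shape, 4)
print(shape)  # (None, 4)

# Without return_sequences=True on the middle LSTM, the last LSTM would get 2-D input.
try:
    recurrent_output_shape((None, 64), 4)
except ValueError as err:
    print("rejected:", err)
```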


In [47]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [49]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64, validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [51]:
loss, acc = model.evaluate(x_test_padded, y_test, batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.4700876474380493
Test accuracy: 0.7819600105285645


**Using Simple RNN**
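A SimpleRNN updates its state as `h_t = tanh(W x_t + U h_{t-1} + b)`, with no gates. One commonly cited weakness is that the gradient reaching early time steps is a product of one factor per step, so it tends to shrink over long inputs such as these 200-token reviews. The sketch below illustrates this with a hypothetical 1-unit RNN and hand-picked scalar weights `w`, `u`, `b`; it is an illustration of the effect, not an explanation of the exact accuracy gap measured in this notebook.

```python
import math


def simple_rnn(xs, w, u, b, h0=0.0):
    """Run a 1-unit RNN over a scalar sequence: h_t = tanh(w*x_t + u*h_{t-1} + b)."""
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w * x + u * h + b)
        states.append(h)
    return states


def grad_last_wrt_first_input(xs, w, u, b):
    """d h_T / d x_1 via the chain rule through every tanh step."""
    states = simple_rnn(xs, w, u, b)
    # Step 1 with respect to its input: (1 - h_1^2) * w.
    grad = (1.0 - states[0] ** 2) * w
    # Every later step multiplies by dh_t/dh_{t-1} = (1 - h_t^2) * u.
    for h in states[1:]:
        grad *= (1.0 - h ** 2) * u
    return grad


xs = [1.0] + [0.0] * 199  # a single signal at step 1, then 199 empty steps
for length in (5, 50, 200):
    print(length, grad_last_wrt_first_input(xs[:length], w=1.0, u=0.9, b=0.0))
```

With `|u| < 1` every factor has magnitude below 1, so the influence of the first token on the final state decays roughly geometrically with sequence length. The gates in LSTM and GRU cells are designed to let information persist across many steps.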

In [52]:
# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
#model.add(LSTM(128))
model.add(SimpleRNN(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 128)               32896     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 129       
Total params: 1,313,025
Trainable params: 1,313,025
Non-trainable params: 0
_________________________________________________________________
None


In [53]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [54]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64, validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [55]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.492692232131958
Test accuracy: 0.7704799771308899
