<a href="https://colab.research.google.com/github/kumar4372/sentiment_analysis_hands_on/blob/master/(trainer)_Using_RNN_for_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Sentiment Analysis Using Recurrent Neural Network**

---



## choose Hardware accelarator to "GPU" for faster computation. Go to "Runtime" -> "Change runtime type" to change it.


In this tutorial, we will use RNN for sentiment analysis task on movie review dataset.

**What is sentiment analysis?**

Sentiment Analysis is nothing but finding the sentiments of reviews whether it is positive or negative review.

**Example Code to refer**: https://keras.io/examples/nlp/bidirectional_lstm_imdb/

**Notes**
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.

**Importing Libraries**

We start by importing the required dependencies to preprocess our data and build our model.

In [1]:
# Import the dependencies
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, SimpleRNN,LSTM, GRU
from tensorflow.keras.preprocessing import sequence

# import numpy as np
# from tensorflow import keras
# from tensorflow.keras import layers


print("Imported dependencies.")


Imported dependencies.


**Loading Data**

We will use IMDB sentiment classification dataset which consists of 50,000 movie reviews from IMDB users that are labeled as either positive (1) or negative (0). 

Continue downloading the IMDB dataset, which is, fortunately, already built into Keras.

In [2]:
vocab_size = 10000

# Define the training and test dataset
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)  # vocab_size is no. of words to consider from the dataset, ordering based on frequency.

print("Created test and training data.")
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

Created test and training data.
25000 train sequences
25000 test sequences


**Exploring the data**

You can see in the output above that the dataset is labeled into two categories, — 0 or 1, which represents the sentiment of the review. The whole dataset contains 9,998 unique words and the average review length is 234 words, with a standard deviation of 173 words.

In [3]:
import numpy as np

#concatenate whole data
data = np.concatenate((x_train, x_test), axis=0)
targets = np.concatenate((y_train, y_test), axis=0)

print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))
len_sequence_list = [len(i) for i in data]
print("Average Review length:", np.mean(len_sequence_list))
print("Standard Deviation:", round(np.std(len_sequence_list)))

Categories: [0 1]
Number of unique words: 9998
Average Review length: 234.75892
Standard Deviation: 173


We should always check how balanced our training and test data is. This helps in deciding evaluation metrics and observing training progress as well. You will observe that both training and test data is perfectly balanced.

In [4]:
# labels are only 0 and 1, np.sum will give you total number of examples with label 1.
print("percentage of test sequences with label 1 is", (np.sum(y_test)/len(y_test)*100))
print("percentage of train sequences with label 1 is", (np.sum(y_train)/len(y_train)*100))

percentage of test sequences with label 1 is 50.0
percentage of train sequences with label 1 is 50.0


You can see the first review of the dataset, which is labeled as positive (1). 

In [5]:
print('---review---')
print(x_train[0])
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
---label---
1


Above you can see the first review of the dataset, which is labeled as positive (1). The code below retrieves the dictionary mapping word indices back into the original words so that we can read them. It replaces every unknown word with a “#”. It does this by using the get_word_index() function.

In [6]:
index = imdb.get_word_index()
train_text = []
test_text = []
reverse_index = dict([(value, key) for (key, value) in index.items()]) 
for i in range(0,len(x_train)):
  train_text.append(" ".join( [reverse_index.get(i - 3, "#") for i in x_train[i]] ))
for i in range(0,len(x_test)):
  test_text.append(" ".join( [reverse_index.get(i - 3, "#") for i in x_train[i]] ))
print(len(train_text),len(test_text))

25000 25000


In [7]:
print(train_text[0])

# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you thi

**Data Preparation**

Now it's time to prepare our data. 

As we know, each review consists of different number of words. Some reviews could even be one word long. e.g. "nice"

We need to fix maximum length of our input sequence. 

In [8]:
from collections import Counter
count_length = Counter(len_sequence_list)
count_length[200]

142

Check how many sentences are of length less than x.

In [9]:
sum([count_length[x] for x in range(300)]) # x is 300 here

38501

Here we consider maximum length of our input sequence to be 200. pad_sequences will add 0's to any reviews which don't have a length of 200.

For example, our one word review above would become: "index(nice) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... 199 times"

The same goes for any reviews longer than 200 tokens, they will be truncated to a maximum of 200. Feel free to choose your own maximum length

In [10]:
max_review_length = 200
x_train_padded = sequence.pad_sequences(x_train, maxlen=max_review_length)
x_test_padded = sequence.pad_sequences(x_test, maxlen=max_review_length)

To visualize how padding is happening, let's print a few examples of seqeunces for before and after padding


In [11]:
# for sequence length > maximum sequence length, we remove entries from the beginning. 
# If you want to remove entries from the end, you should use truncating='post' in pad_sequences in above cell
i = 0
print(len(x_train[i]))
print(x_train[i])
print(len(x_train[i]))
print(x_train[i])

218
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
218
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4

In [12]:
# for sequence length < maximum sequence length, we put zeros in the beginning. 
# If you want to put zeros in the end, you should use padding='post' in pad_sequences in above cell
i = 1
print(len(x_train[i]))
print(x_train[i])
print(len(x_train[i]))
print(x_train_padded[i])

189
[1, 194, 1153, 194, 8255, 78, 228, 5, 6, 1463, 4369, 5012, 134, 26, 4, 715, 8, 118, 1634, 14, 394, 20, 13, 119, 954, 189, 102, 5, 207, 110, 3103, 21, 14, 69, 188, 8, 30, 23, 7, 4, 249, 126, 93, 4, 114, 9, 2300, 1523, 5, 647, 4, 116, 9, 35, 8163, 4, 229, 9, 340, 1322, 4, 118, 9, 4, 130, 4901, 19, 4, 1002, 5, 89, 29, 952, 46, 37, 4, 455, 9, 45, 43, 38, 1543, 1905, 398, 4, 1649, 26, 6853, 5, 163, 11, 3215, 2, 4, 1153, 9, 194, 775, 7, 8255, 2, 349, 2637, 148, 605, 2, 8003, 15, 123, 125, 68, 2, 6853, 15, 349, 165, 4362, 98, 5, 4, 228, 9, 43, 2, 1157, 15, 299, 120, 5, 120, 174, 11, 220, 175, 136, 50, 9, 4373, 228, 8255, 5, 2, 656, 245, 2350, 5, 4, 9837, 131, 152, 491, 18, 2, 32, 7464, 1212, 14, 9, 6, 371, 78, 22, 625, 64, 1382, 9, 8, 168, 145, 23, 4, 1690, 15, 16, 4, 1355, 5, 28, 6, 52, 154, 462, 33, 89, 78, 285, 16, 145, 95]
189
[   0    0    0    0    0    0    0    0    0    0    0    1  194 1153
  194 8255   78  228    5    6 1463 4369 5012  134   26    4  715    8
  118 1634   14  3

**BUILDING AND TRAINING THE MODEL**

Now our data is ready for some modelling!

Deep learning models have layers.

The top layer takes in the data we've just prepared, the middle layers do some math on this data and the final layer produces an output we can hopefully make use of.

In our case, our model has three layers, 

1. Embedding layer
2. LSTM layer
3. Dense layer.

Please feel free to change the model architecture.

Our model begins with the line model = Sequential(). Think of this as simply stating "our model will flow from input to output layer in a sequential manner" or "our model goes one step at a time".

**Embedding layer**

The Embedding layer creates a database of the relationships between words.

model.add(Embedding(max_words, embedding_vector_length, input_length=max_review_length)) is saying: add an Embedding layer to our model and use it to turn each of our words into embedding_vector_length dimensional vector which have some mathematical relationship to each other.

So each of our words will become vectors of dimension embedding_vector_length.

For example, vector of "the" = [0.556433, 0.223122, 0.789654....].

Don't worry for now how this is computed, Keras does it for us.

**LSTM layer**

model.add(LSTM(128)) is saying: add a LSTM layer after our embedding layer in our model and give it 128 units.

**Dense layer**

model.add(Dense(1, activation='sigmoid')) is saying: add a Dense layer to the end of our model and use a sigmoid activation function to produce a meaningful output.

A dense layer is also known as a fully-connected layer. This layer connects the 128 LSTM units in the previous layer to 1 unit. This last unit them takes all this information and runs it through a sigmoid function.

In [27]:
# Define how long the embedding vector will be
embedding_vector_length = 128

# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(128))
model.add(Dense(1, activation='sigmoid'))

print("Model created.")

Model created.


**Compiling the model**

Now we compile our model, which is nothing but configuring the model for training. We use the “adam” optimizer, an algorithm that changes the weights and biases during training. We also choose binary-crossentropy as loss (because we deal with binary classification) and accuracy as our evaluation metric.

In [28]:
model.compile(loss = 'binary_crossentropy', optimizer='adam',metrics = ['accuracy'])
print("Model compiled, ready to be fit to the training data.")

Model compiled, ready to be fit to the training data.


**Summarize the model**

Making a summary of the model will give us an idea of what's happening at each layer.

In the embedding layer, each of our words is being turned into a vector of dimension 128. Because there are 10000 words (max_words), there are 1,280,000 parameters (128 x 10000).

Parameters are individual pieces of information. The goal of the model is to take a large number of parameters and reduce them down to something we can understand and make use of (less parameters).

The LSTM layer reduces the number of parameters to 82432 = 4 × [128(128+32) + 128].

The final dense layer connects each of the outputs of the LSTM units into one cell (128 + 1).

In [29]:
# Summarize the different layers in the model
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
Total params: 1,411,713
Trainable params: 1,411,713
Non-trainable params: 0
_________________________________________________________________
None


**Fitting the model to the training data**

Now our model is compiled, it's ready to be set loose on our training data.

We'll be training for 3 epochs with a batch_size of 64.

Because of our loss and optimzation functions, the model accuracy should improve after each cycle.

model.fit(X_train, y_train, epochs=3, batch_size=64) is saying: fit the model we've built on the training dataset for 3 cycles and go over 64 reviews at a time.

Feel free to change the number of epochs (more cycles) or batch_size (more or less information each step) to see how the accuracy changes.

This will take a few minutes.

In [30]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64,validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


It is time to evaluate our model:

In [32]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.3286963403224945
Test accuracy: 0.8628799915313721


Let's analyze the results now and look at some examples.

In [34]:
y_prob = model.predict(x_test_padded)

In [35]:
y_prob

array([[0.0681567 ],
       [0.987632  ],
       [0.6876638 ],
       ...,
       [0.02916446],
       [0.10800051],
       [0.9595628 ]], dtype=float32)

In [36]:
y_pred = np.round(y_prob)

In [37]:
y_pred.reshape([len(y_pred)])

array([0., 1., 1., ..., 0., 0., 1.], dtype=float32)

In [38]:
y_pred = y_pred.reshape([len(y_pred)])

In [39]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # for better printing

In [40]:
results = pd.DataFrame({"review":test_text, "ground_truth":y_test, "prediction":y_pred})

In [41]:
results.head(5)

Unnamed: 0,review,ground_truth,prediction
0,# this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert # is an amazing actor and now the same being director # father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for # and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also # to the two little boy's that played the # of norman and paul they were just brilliant children are often left out of the # list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all,0,0.0
1,# big hair big boobs bad music and a giant safety pin these are the words to best describe this terrible movie i love cheesy horror movies and i've seen hundreds but this had got to be on of the worst ever made the plot is paper thin and ridiculous the acting is an abomination the script is completely laughable the best is the end showdown with the cop and how he worked out who the killer is it's just so damn terribly written the clothes are sickening and funny in equal # the hair is big lots of boobs # men wear those cut # shirts that show off their # sickening that men actually wore them and the music is just # trash that plays over and over again in almost every scene there is trashy music boobs and # taking away bodies and the gym still doesn't close for # all joking aside this is a truly bad film whose only charm is to look back on the disaster that was the 80's and have a good old laugh at how bad everything was back then,1,1.0
2,# this has to be one of the worst films of the 1990s when my friends i were watching this film being the target audience it was aimed at we just sat watched the first half an hour with our jaws touching the floor at how bad it really was the rest of the time everyone else in the theatre just started talking to each other leaving or generally crying into their popcorn that they actually paid money they had # working to watch this feeble excuse for a film it must have looked like a great idea on paper but on film it looks like no one in the film has a clue what is going on crap acting crap costumes i can't get across how # this is to watch save yourself an hour a bit of your life,1,1.0
3,# the # # at storytelling the traditional sort many years after the event i can still see in my # eye an elderly lady my friend's mother retelling the battle of # she makes the characters come alive her passion is that of an eye witness one to the events on the # heath a mile or so from where she lives br br of course it happened many years before she was born but you wouldn't guess from the way she tells it the same story is told in bars the length and # of scotland as i discussed it with a friend one night in # a local cut in to give his version the discussion continued to closing time br br stories passed down like this become part of our being who doesn't remember the stories our parents told us when we were children they become our invisible world and as we grow older they maybe still serve as inspiration or as an emotional # fact and fiction blend with # role models warning stories # magic and mystery br br my name is # like my grandfather and his grandfather before him our protagonist introduces himself to us and also introduces the story that stretches back through generations it produces stories within stories stories that evoke the # wonder of scotland its rugged mountains # in # the stuff of legend yet # is # in reality this is what gives it its special charm it has a rough beauty and authenticity # with some of the finest # singing you will ever hear br br # # visits his grandfather in hospital shortly before his death he burns with frustration part of him # to be in the twenty first century to hang out in # but he is raised on the western # among a # speaking community br br yet there is a deeper conflict within him he # to know the truth the truth behind his # ancient stories where does fiction end and he wants to know the truth behind the death of his parents br br he is pulled to make a last # journey to the # of one of # most # mountains can the truth be told or is it all in stories br br in this story about stories we # bloody battles # lovers the # of old and the sometimes more # # of accepted truth in doing so we each connect with # as he lives the story of his own life br br # the # # is probably the most honest # and genuinely beautiful film of scotland ever made like # i got slightly annoyed with the # of hanging stories on more stories but also like # i # this once i saw the # picture ' forget the box office # of braveheart and its like you might even # the # famous # of the wicker man to see a film that is true to scotland this one is probably unique if you maybe # on it deeply enough you might even re # the power of storytelling and the age old question of whether there are some truths that cannot be told but only experienced,0,1.0
4,# worst mistake of my life br br i picked this movie up at target for 5 because i figured hey it's sandler i can get some cheap laughs i was wrong completely wrong mid way through the film all three of my friends were asleep and i was still suffering worst plot worst script worst movie i have ever seen i wanted to hit my head up against a wall for an hour then i'd stop and you know why because it felt damn good upon bashing my head in i stuck that damn movie in the # and watched it burn and that felt better than anything else i've ever done it took american psycho army of darkness and kill bill just to get over that crap i hate you sandler for actually going through with this and ruining a whole day of my life,1,1.0


## some samples of wrong predictions

In [42]:
results[results['ground_truth'] != results['prediction']].sample(5)

Unnamed: 0,review,ground_truth,prediction
8338,# the horror channel plays nothing but erotic soft porn gothic flicks each night from # till about 4 in the morning but their # factor is very limited if one exists at all in fact i am sure i will find a multi million pound # win more scary than anything this channel has to offer br br the # leads the dance deserves special mention because it is i feel the # low of a channel full of # i cannot even begin to tell you how bad this film is but for the purpose of # the minimum 10 lines # by this site i will at least give it a go br br firstly the title is misleading and bears no resemblance to the action on the screen in fact the film might as well have been called # or # for all it has to do with the plot at least they used # at least they had # br br there are no # for miles around and whats even worse there are no dances not one i'm sure they were making two different films by mistake here br br a more suitable title would have been # italian count leads five people to a scary castle and # us silly for ninety minutes ' yes that fits better br br the acting is terrible and and the dubbing appalling and that guy who plays seymour was almost as wooden in his walk as he was in his character abysmal br br the only saving # of this film are a small but slightly interesting lesbian sex scene two small and very interesting # sex scenes and the added attraction in that every single female character gets her # off bonus br br otherwise steer a wide birth away from this one no vampires no dancing no scenes of a brutal or gruesome nature and no way on gods earth i will ever ever ever watch this one again br br no word of a lie this film could put you off motion pictures for life,1,0.0
20295,# warning this review will reveal the ending of the movie scoop if you don't want to know how the movie ends don't read this review br br scoop is so bad you'll think annie hall was a # br br it gets one star because you get to see hugh # naked chest that's the only thing scoop has going for it br br woody allen's # and his # on women young enough at this point to be his # has crippled any ability to make movies he may have had at any point br br the plot seems promising a ghost ian # directs a # headed student scarlett johansson to investigate whether or not an english lord hugh jackman is the notorious # card killer of prostitutes a magician woody allen helps the girl br br promising plot notwithstanding the movie completely lacks charm or humor or atmosphere it's an amazingly # amateurish effort for someone who has made even one previous film never mind dozens perhaps allen has had a stroke that has gone # in the press br br much is made of the fact that unlike in his previous films woody allen now a # has finally allowed a younger male lead to get the girl br br not so in fact the plot is constructed in such a way so that the girl gets no one br br there is an early scene where johansson for no reason central to the movie allows herself to be gotten drunk and # by a powerful older director # is a # for what happens it's a slam # i've gotta go kind of moment it bears no relation to the plot whatsoever and it # johansson in the viewer's eye why did allen add that unnecessary scene to the movie because it shows a powerful director like allen having sex with the female lead allen gets to have his cake and eat it too br br johansson is not yet an actress she doesn't know how to command the screen except by wearing a tight low cut top she # allen in a couple of scenes and that just looks weird and sad br br it doesn't help that her character is scripted as a doll who can't function without a ghost or an elderly and less than awe inspiring magician telling her what to do at every turn br br she is approximately half # age and she comes across as a very vapid screen presence in their scenes together br br audience members not obsessed with breasts deserve better in their # and jackman deserves better too a script that gives the heroine some intelligence and agency and an actress who can convey those qualities br br hugh jackman is similarly cheated by the script allen apparently can't stand it that jackman is so stunningly good looking and young and so he gives jackman nothing to say or do like johansson he is used merely for his good looks this is a shame because as jackman has shown in any number of productions from # to x men he can act br br here's the big plot twist jackman suave charming english lord really is a killer so though the movie says it is all about letting someone else other than allen get the girl she doesn't get anyone jackman the man she's been making love to is a man who murdered a prostitute nice woody nice way to # your heroine for being beyond your grasp br br in a passive aggressive touch allen # his heroine of his own presence as well killing off his character the magician leaving scarlett johansson all alone at the end of the film br br a final note at my screening not a single audience member laughed at any point during the film always a bad sign when a film is advertised as a comedy,0,1.0
7691,# i saw this back in # when it was finally released apparently because # pictures was in # i think the movie had not been released a couple of years earlier br br i have problem remembering details partly because i haven't seen it in a long time but i do remember it as a very dull movie i kept # whether to walk out of it the store was not at all interesting or engaging was a 3rd rate america # imitation br br none of the performances make it worth watching either one of the biggest # since a local newspaper reviewer gave it a high rating,0,1.0
8816,# in this # acclaimed psychological thriller based on true events gabriel robin williams a celebrated writer and late night talk show host becomes captivated by the harrowing story of a young # and his # mother toni collette when # questions # about this boy's story however gabriel finds himself drawn into a # mystery that hides a deadly # according to film's official synopsis br br you really should stop reading these comments and watch the film now br br the how did he lose his leg ending with ms collette planning her new life should be chopped off and sent to deleted scenes land it's # the true nature of her physical and mental # should be obvious by the time mr williams returns to new york possibly her # could be in question but a revelation could have be made certain in either the highway or video tape scenes the film would benefit from a re editing how about a director's cut br br williams and bobby # as jess don't seem initially believable as a couple a scene or two establishing their relationship might have helped set the stage otherwise the cast is # williams offers an exceptionally strong characterization and not a gay # sandra oh as anna joe # as # and # culkin pete # are all perfect br br best of all # donna belongs in the creepy hall of fame ms oh is correct in saying collette might be you know like that guy from # there have been several years when # giving acting awards seemed to reach for women due to a # # of roles certainly they could have noticed collette with some award consideration she is that good and director patrick # definitely evokes hitchcock he even makes getting a # from a # machine suspenseful br br finally writers # # # and terry anderson deserve # from flight # everywhere br br the night # 1 21 # patrick # robin williams toni collette sandra oh # culkin,0,1.0
17953,# lee chang # exceptional secret sunshine is the single most emotionally # experience of the year it is an instantly # brutally honest character piece on the # of loss and a # # # that # with a striking # of thought yet remains as # as the emotions it # through its layered # and stunningly # view of small town dynamics lee # # the traditional korean melodrama by pulling apart the # of excess and ripping to # the # that shape its characters and grounds the proceedings into a # # of stoic realism br br secret sunshine remains an immensely compelling fluid work throughout its # minute runtime its # first hour is filled to the # with # # remarkable # and # # of tone brought about by # # adapted from a short story lee # the film with his sensitivity for the sublime # of life last seen in his # comic and # # understanding how personal # are # when views of our universe are changed lee not only sees the emotional # of a # sorrow through an # scope but also feels the # existential # that # the film when religion becomes a narrative # in # the # of the human experience br br do # # you are my sunshine best actress # at cannes in 2007 is well deserved her performance as the widow shin # remains an # # as a character pulled apart by forces beyond her control the sheer # of this performance is central to the film's # nature with # # one # # after another there's a # sense of collapse that the film to its credit never approaches instead it finds a delicate balance that # the charged # and subsequent # from ordinary # and its # she becomes the centre of the film's universe as well as # filmed in glorious hand held # the film # the # of frames and compositions by becoming visually # just as it is quietly harrowing when the camera never # its # from shin # through times of happiness guilt and # br br lee captures the details of life in the small suspicious town of #  the # of # situations its uncomfortable # and its # # out of personal dramas shin # interactions with the # rarely # # especially when they are merely done out of # to fit in for the sake of her son # # # # the one recurring # is # chan song # ho a bachelor mechanic of uncertain intentions who helps her en route to # in the film's enchanting open sequence set to a captivating stream of # song has # himself as a comedic anti hero in south # biggest films but his nuanced low key delivery here # the director's thought process of never having to reveal more than # necessary br br if pain is # then grief can never truly # and lee finds complexity in # when shin # attempts to head down the path of # only to be faced again with # # she # employs the # of # christianity as a foil to her sorrow but lee knows better than that when he understands that religion in the context of the human # of # and misery is never a simple solution but lee never # the essence of religion as he realises the value of salvation for some through a higher power even if it serves a form of denial in others the scenes in its latter half which deal with religion doesn't allow itself to become # # which is a feat in itself considering how many filmmakers let the momentum of the material take over from what they need to say to be true to its story and characters br br lee's first film since his call to office as his country's minister of culture and # is an # # on human suffering in a film so # and genuine it # reveals that there's nothing as simple as emotional # just the # and # of agony secret sunshine leaves us with tender # pulled out of # and points towards a profound understanding of despair and faith,0,1.0


## some samples of correct predictions

In [43]:
results[results['ground_truth'] == results['prediction']].sample(5)

Unnamed: 0,review,ground_truth,prediction
652,# this movie catches a lot of # but this is usually based on the horrible looking and covered # version of the film that played us television and has also been # to death on vhs and dvd buy companies like # # etc this movie never had a theatrical release in the states although it was picked up by # # in 1973 in spain at the time when there was nudity involved the filmmakers shot two versions one with clothes and one with out the fully uncut english dubbed # print was titled werewolf never sleeps and seems to have been released to home video only in sweden back in the 80's it can be found on ebay and the likes and comes highly recommended my guess is # cut the film down for a r rated release that never happened in 1974 it was released by # to television titled fury of the # and the # version was used for this tv print cut to 12 years later and fury of the # pops up on home video on the # label this version appears to be what # was going to release back in '73 it's the # version with some nudity that would never pass on tv or in a pg movie there are several scenes on the # tape that play out with nudity that are # in the tv print the source for all those dollar # and vhs # but a comparison to the fully uncut # never sleeps reveals that 2 scenes are cut on this version spoilers in next paragraph the scene where # has # # to the wall and # him after he transforms into the werewolf is # after # him into # she starts to remove her clothes and begins making love to the werewolf the werewolf responds positively to these sexual # too this scene certainly ranks as one of the most unusual in the history of horror films and is a delirious treat it's not graphic but the implied # was too much for us audiences or more likely the mpaa # is desperately in love with # and could not possess him hence her whole scheme to mind control # wife and involve her in an affair she wanted to wreck his marriage and she # this while # is in # unfortunately he returns a werewolf but this does not slow her down a bit if she can't physically have him as a man she loves him enough to have sex with him as a werewolf this also helps explain the later scene where the werewolf # down with a woman he spots getting naked before # while # through her window this scene is presented sans nudity in the covered version and really makes no sense in the uncut version it would seem # affections have made the werewolf horny and in need of release so he rapes the first woman he can after escaping the other cut is a complete scene of # in bed with karen and she is seen naked a very similar bedroom scene was cut out of the us version of werewolf shadow werewolf vs the vampire woman as well the film does have it's problems though for certain the director was drunk the bad stand in for the werewolf at points the atrocious english dubbing the inclusion of sequences from the first # film mark of the # aka # bloody terror and the grotesque # of that film's music score throughout etc but seen in it's original widescreen format and uncut ie werewolf never sleeps it is one of the # and most outrageous of the # werewolf series with a plot line # in it's everything but the kitchen sink approach the cut # pan and # full screen copies of this film do it no # and unfortunately that's the version almost everyone commenting on the film have seen the film carries a 1970 # and i'd bet the 1972 release date on the imdb is incorrect the film # werewolf shadow aka werewolf vs the vampire woman in the series and was certainly released before werewolf shadow the ending of werewolf never sleeps fury of the # # directly into the opening of werewolf shadow offering # evidence of this sadly a complete version of this may never get a decent release a perfect release would be the uncut english version but in spanish with english subtitles the english dubbing severely hurts the movie but any spanish language version would reflect the covered version as shown in spain during the franco era where nudity was #,0,0.0
5270,# i watched mask in the 80's and it's currently showing on fox kids in the uk very late at night i remember thinking that it was kinda cool back in the day and had a couple of the toys too but watching it now # me to tears i never realised before of how tedious and bland this cartoon show really was it's just plain awful it is no where near in the same league as the transformers he man or # and was very quickly forgot by nearly everyone once it stopped being made i only watch it on fox kids because # # comes on straight after it that's if mask doesn't put me to sleep first one of the lesser 80's cartoons that i hope to completely forget about again once it finishes airing on fox kids,1,1.0
10105,# this one acts as a satire during the women's rights movement era of course that doesn't mean coach the movie is a wonderful experience to behold it runs into the same vein as # which was better but still tame and is basically standard fare fluff what i mean for this movie being uninteresting is simple to recognize anybody who serves time away from a normal job by training a bunch of # # their way to sudden victory makes waste it's the same feeling you may get after watching this a nice attempt at casting the opposite sex for a man's duty but i expected better things,1,1.0
2319,# carla works for a property # where she excels in being unattractive # and desperate she is also deaf br br her boss offers to hire in somebody to # her heavy # so she uses the opportunity to secure herself some male company help arrives in the form of paul a # # fresh out of prison and clearly # to the mannered routine of an office environment br br an # sexual tension develops between the two of them and carla is determined to keep him on despite his # to embrace the working week when carla is edged out of an important contract she was # by a slimy colleague she exploits paul's # by having him steal the contract back the colleague quickly realises that she's behind the robbery but when he confronts her paul's # to punch people in the face comes in handy too but this # comes at a price br br paul is given a # # by some mob # as a reminder about an # debt he # a plan which # # unique lip reading abilities to rip off a gang of violent bank robbers it's now # turn to enter a frightening new world br br the fourth feature from director jacques # # my # begins as a thoroughly engaging romantic drama between two # losers only to shift # halfway through into an edgy thriller where their # shortcomings turn them into winners the leads are excellent effortlessly convincing us that this odd couple could really connect # first meeting with paul is an enjoyable farce in which she attempts to # his # # and # manners only to discover that he was until very recently a # # # who plays carla has that almost # ability to go from # to gorgeous and back again within a frame vincent # plays paul as a # dog who only really seems at home when he's receiving a beating or # the rip off that is likely to get him killed br br like many french films # my # appears at first to be about nothing in particular until you scratch beneath the surface and find that it's probably about everything the only bum note is a subplot concerning the missing wife of paul's # officer a device that seems contrived only to help steer the main thrust of the story into a neat little # # de # br br it was the french # # of the 60's that first introduced the concept of # to film making and i've always felt that any medium is somewhat # when you have to use a system of # to help define it so it's always a pleasure to discover a film that seems to # genre or better still defy it,0,0.0
19721,# this was the worst movie i've ever seen yet it was also the best movie sci fi original movie's are supposed to be bad that's what makes them fun the line i like my dinosaur meat well done is probably the best quote ever also the plot sounds like something out of a pot induced dream i can imagine it now the writers waking up after a long night of getting high and playing dance dance revolution then putting ideas together for this space marines got to alien planet which is # with dinosaurs and has medieval houses in it to protect a science team studying the planet best idea ever in fact in fits the complete sci fi original movie # guns dinosaurs medieval times space travel terrible acting br br so go watch this movie but don't buy it,1,1.0


## Below we have some more sample architectures you can try !!

**Extensions**

Let us use LSTM variants. We use check the accuracy by replacing LSTM cell with GRU cell.

In [56]:
# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
#model.add(LSTM(128))
model.add(GRU(32,dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

#num_params_layer 3 × [h(h+i) + h]  = 3 × [32(32+64) + 32] = 9312

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                15552     
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 33        
Total params: 1,295,585
Trainable params: 1,295,585
Non-trainable params: 0
_________________________________________________________________
None


In [57]:
model.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])

In [58]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64,validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [59]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.34270304441452026
Test accuracy: 0.8547999858856201


**Using LSTM stack layers**

In [46]:
model= Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(units=64, return_sequences=True))
model.add(LSTM(units=64, return_sequences=True))
model.add(LSTM(units=4))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
lstm_5 (LSTM)                (None, 200, 64)           49408     
_________________________________________________________________
lstm_6 (LSTM)                (None, 200, 64)           33024     
_________________________________________________________________
lstm_7 (LSTM)                (None, 4)                 1104      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 5         
Total params: 1,363,541
Trainable params: 1,363,541
Non-trainable params: 0
_________________________________________________________________
None


In [47]:
model.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])

In [49]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64,validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [51]:
loss, acc = model.evaluate(x_test_padded, y_test, batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.4700876474380493
Test accuracy: 0.7819600105285645


**Using Simple RNN**

In [52]:
# Define the layers in the model
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_review_length))
#model.add(LSTM(128))
model.add(SimpleRNN(128,dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 200, 128)          1280000   
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 128)               32896     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 129       
Total params: 1,313,025
Trainable params: 1,313,025
Non-trainable params: 0
_________________________________________________________________
None


In [53]:
model.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])

In [54]:
# Fit the model to the training data
results = model.fit(x_train_padded, y_train, epochs=3, batch_size=64, validation_data=(x_test_padded, y_test))

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [55]:
loss, acc = model.evaluate(x_test_padded, y_test,
                            batch_size=64)
print('Test loss:', loss)
print('Test accuracy:', acc)

Test loss: 0.492692232131958
Test accuracy: 0.7704799771308899
