
# **Sentiment Analysis Using Recurrent Neural Network**

---



In this tutorial, we will use a recurrent neural network (RNN) for sentiment analysis on a movie review dataset.

**What is sentiment analysis?**

Sentiment analysis is the task of determining the sentiment a text expresses; here, whether a movie review is positive or negative.

**Example code for reference**: https://slundberg.github.io/shap/notebooks/deep_explainer/Keras%20LSTM%20for%20IMDB%20Sentiment%20Classification.html

**Notes**
- RNNs are tricky. Choice of batch size is important,
choice of loss and optimizer is critical, etc.
Some configurations won't converge.
- LSTM loss decrease patterns during training can be quite different
from what you see with CNNs/MLPs/etc.

**Importing Libraries**

We start by importing the required dependencies to preprocess our data and build our model.

In [None]:
# Import the dependencies
from keras.datasets import imdb
from keras.models import Sequential
# Embedding is imported from keras.layers; the keras.layers.embeddings module is not available in recent Keras releases
from keras.layers import Dense, SimpleRNN, LSTM, GRU, Embedding
from keras.preprocessing import sequence

print("Imported dependencies.")


**Loading Data**

We will use the IMDB sentiment classification dataset, which consists of 50,000 movie reviews from IMDB users, each labeled as either positive (1) or negative (0).

Continue by downloading the IMDB dataset, which is conveniently built into Keras.

In [None]:
# enter your code

**Exploring the data**

You can see in the output above that each review is labeled with one of two categories, 0 or 1, which represent the sentiment of the review. The whole dataset contains 9,998 unique words, and the average review length is 234 words, with a standard deviation of 173 words.
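Statistics like these can be computed along the following lines. This is only a minimal sketch on a tiny made-up dataset: the `reviews` and `labels` lists are hypothetical stand-ins for the index-encoded IMDB reviews, and only the standard library is used.

```python
from statistics import mean, pstdev

# Hypothetical stand-in for the encoded reviews: each review is a list of word indices.
reviews = [[1, 14, 22, 16], [1, 194, 1153], [1, 14, 47, 8, 30, 31, 7]]
labels = [1, 0, 0]

lengths = [len(review) for review in reviews]
unique_words = {index for review in reviews for index in review}

print("Categories:", sorted(set(labels)))
print("Number of unique words:", len(unique_words))
print("Average review length:", mean(lengths))
print("Standard deviation:", round(pstdev(lengths), 2))
```

On the real data, the same idea applies to the arrays returned by the Keras loader.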

In [None]:
# enter your code

You can see the first review of the dataset, which is labeled as positive (1). The code below uses the get_word_index() function to retrieve the dictionary that maps words to indices, inverts it so that indices map back to the original words, and replaces every unknown word with a “#”. The offset of 3 accounts for the indices that Keras reserves for padding, start-of-sequence, and unknown words.

In [None]:
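# Word -> index dictionary shipped with the dataset
index = imdb.get_word_index()

# Invert it so that indices map back to words
reverse_index = {value: key for key, value in index.items()}

# Indices are offset by 3 (0, 1 and 2 are reserved); unknown indices become "#"
print(" ".join(reverse_index.get(i - 3, "#") for i in x_train[0]))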
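To see what the offset of 3 does, here is a small self-contained sketch of the same decoding step. The `toy_index` dictionary and the `encoded_review` list are made up for illustration; they only imitate the shape of the real word index.

```python
# Toy word index in the style of imdb.get_word_index(): word -> rank (starting at 1).
toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
reverse_toy_index = {value: key for key, value in toy_index.items()}

# Encoded reviews shift every rank by 3, because 0, 1 and 2 are reserved
# for padding, start-of-sequence and unknown words respectively.
encoded_review = [1, 4, 5, 6, 2, 7]

decoded = " ".join(reverse_toy_index.get(i - 3, "#") for i in encoded_review)
print(decoded)
```

The start marker (1) and the unknown-word marker (2) have no entry in the reversed dictionary, so both are printed as “#”.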


**Data Preparation**

Now it's time to prepare our data. 

As we know, reviews differ in length. Some reviews could even be one word long, e.g. "nice".

Deep learning models work best when all of the inputs have the same shape.

Here we set the maximum length of our input sequences to 100. pad_sequences will add 0s to any review that is shorter than 100 words.

For example, with padding='post', our one-word review above would become: "index(nice) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0... 99 times". By default, Keras adds the zeros at the front of the sequence instead.

Likewise, any review longer than 100 words will be truncated to 100 words.
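The padding and truncation logic can be sketched in plain Python. The helper `pad_or_truncate` below is a hypothetical illustration, not the Keras implementation; its defaults imitate the documented defaults of pad_sequences (padding='pre', truncating='pre'), and a maximum length of 5 is used instead of 100 to keep the output readable.

```python
def pad_or_truncate(review, max_length, padding="pre", truncating="pre"):
    """Return a copy of `review` with exactly `max_length` items, padded with 0s."""
    review = list(review)
    if len(review) > max_length:
        if truncating == "pre":
            # Drop words from the start, keep the last `max_length` words.
            review = review[len(review) - max_length:]
        else:
            # Drop words from the end, keep the first `max_length` words.
            review = review[:max_length]
    zeros = [0] * (max_length - len(review))
    return zeros + review if padding == "pre" else review + zeros

short_review = [42]               # e.g. the one-word review "nice"
long_review = list(range(1, 9))   # a review of 8 words

print(pad_or_truncate(short_review, 5))                  # zeros at the front
print(pad_or_truncate(short_review, 5, padding="post"))  # zeros at the end
print(pad_or_truncate(long_review, 5))                   # keeps the last 5 words
```

In the exercise itself, pad_sequences from Keras does this for the whole dataset in one call.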

In [None]:
# enter your code

**Building and training the model**

Now our data is ready for some modelling!

Deep learning models have layers.

The first layer takes in the data we've just prepared, the middle layers do some math on this data, and the final layer produces an output we can hopefully make use of.

In our case, our model has three layers:

1. Embedding layer
2. LSTM layer
3. Dense layer

Our model begins with the line model = Sequential(). Think of this as simply stating "our model will flow from input to output layer in a sequential manner" or "our model goes one step at a time".

**Embedding layer**

The Embedding layer learns a lookup table that maps each word index to a vector of numbers, so that words used in similar ways end up with similar vectors.

model.add(Embedding(max_words, embedding_vector_length, input_length=max_review_length)) is saying: add an Embedding layer to our model and use it to turn each of our words into a 32-dimensional vector; these vectors have some mathematical relationship to each other.

So each of our words will become a vector of dimension 32.

For example, the vector for "the" might be [0.556433, 0.223122, 0.789654, ...].

Don't worry for now about how this is computed; Keras does it for us.
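Conceptually, an embedding is a table lookup. The sketch below is a simplified illustration in plain Python, not what Keras does internally: the table is filled with random numbers, the sizes are shrunk to toy values, and no training takes place.

```python
import random

random.seed(0)  # fixed seed so the toy numbers are reproducible

max_words = 10                 # toy vocabulary size (the tutorial uses 10000)
embedding_vector_length = 4    # toy vector size (the tutorial uses 32)

# One row of random numbers per word index; training would adjust these values.
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_vector_length)]
    for _ in range(max_words)
]

review = [0, 0, 3, 7]  # a padded review of word indices
embedded_review = [embedding_table[index] for index in review]

print(len(embedded_review), "words x", len(embedded_review[0]), "numbers each")
```

Every occurrence of the same word index picks up the same row, which is why the two padding zeros receive identical vectors.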

**LSTM layer**

model.add(LSTM(128)) is saying: add an LSTM layer after our embedding layer in our model and give it 128 units.

LSTM = Long short-term memory. Think of an LSTM unit as a tap that decides which words flow through the model and which words don't. This layer uses 128 taps to decide which words matter the most in each review.
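The "tap" picture corresponds to the gates inside each LSTM unit, which take values between 0 (closed) and 1 (open). The following is a toy sketch of one time step of a single LSTM unit with a scalar input, using the standard LSTM update equations. The weights are hypothetical and chosen by hand; a real layer learns them and works with vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM with scalar input (toy version)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate memory
    c = f * c_prev + i * g       # keep part of the old memory, add part of the new
    h = o * math.tanh(c)         # expose part of the memory as output
    return h, c

# Hypothetical weights, chosen by hand purely for illustration.
weights = {"wf": 0.5, "uf": 0.1, "bf": 1.0, "wi": 0.8, "ui": 0.2, "bi": 0.0,
           "wo": 0.3, "uo": 0.4, "bo": 0.0, "wg": 1.0, "ug": 0.5, "bg": 0.0}

h, c = 0.0, 0.0
for x in [0.2, -0.7, 1.5]:      # a toy sequence of three inputs
    h, c = lstm_step(x, h, c, weights)
    print(f"x={x:+.1f}  h={h:+.4f}  c={c:+.4f}")
```

The value of `h` after the last step plays the role of the single number that one LSTM unit hands to the next layer.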

**Dense layer**

model.add(Dense(1, activation='sigmoid')) is saying: add a Dense layer to the end of our model and use a sigmoid activation function to produce a meaningful output.

A dense layer is also known as a fully-connected layer. This layer connects the 128 LSTM units in the previous layer to 1 unit. This last unit then takes all this information and runs it through a sigmoid function.

Essentially, the sigmoid function squashes the information into a value between 0 and 1: values close to 1 mean positive and values close to 0 mean negative. This value is determined by the information passed through by the LSTM layer.
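The sigmoid function maps any number to a value strictly between 0 and 1, which can be read as the probability that a review is positive (label 1). A minimal sketch, where the 0.5 threshold is a common convention rather than something fixed by the model:

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for raw_score in [-4.0, 0.0, 4.0]:
    probability = sigmoid(raw_score)
    label = 1 if probability >= 0.5 else 0   # 0.5 is a common, but arbitrary, threshold
    print(f"raw={raw_score:+.1f}  sigmoid={probability:.3f}  predicted label={label}")
```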


Lastly, we let Keras print a summary of the model we have just built.

In [None]:
# enter your code

**Compiling the model**

Now we compile our model, which simply means configuring it for training. We use the “adam” optimizer, the algorithm that updates the weights and biases during training. We also choose "binary_crossentropy" as the loss (because we are dealing with binary classification) and "accuracy" as our evaluation metric.
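To get a feel for what the binary cross-entropy loss measures, here is a small plain-Python sketch of the formula. It is an illustration only, not the Keras implementation; the clipping constant `epsilon` and the example predictions are assumptions made for this sketch.

```python
import math

def binary_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Average binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for label, probability in zip(y_true, y_pred):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(probability, epsilon), 1.0 - epsilon)
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]

print("good predictions:", round(binary_crossentropy(labels, confident_and_right), 4))
print("bad predictions: ", round(binary_crossentropy(labels, confident_and_wrong), 4))
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is the signal the optimizer uses to adjust the weights.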

In [None]:
# enter your code

**Summarize the model**

Making a summary of the model will give us an idea of what's happening at each layer.

In the embedding layer, each of our words is being turned into a vector of dimension 32. Because there are 10000 words (max_words), there are 320,000 parameters (32 x 10000).

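Parameters are the individual numbers (weights and biases) that the model learns during training. The goal of the model is to take a large input and reduce it to something we can understand and make use of: a single sentiment score.

The LSTM layer has 82,432 parameters = 4 × [128 × (128 + 32) + 128]: four sets of weights (three gates and the candidate cell state), each connecting the 32 inputs and the 128 recurrent outputs to 128 units, plus 128 biases.

The final dense layer connects each of the 128 LSTM outputs to one unit, giving 129 parameters (128 weights + 1 bias).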
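The parameter counts can be checked with a few lines of arithmetic. The variable names below mirror the ones used in this tutorial; the formulas are the usual ones for an embedding table, an LSTM layer with biases, and a dense layer with a bias.

```python
max_words = 10000             # vocabulary size
embedding_vector_length = 32  # size of each word vector
lstm_units = 128

embedding_params = max_words * embedding_vector_length
# 4 weight sets, each with input weights, recurrent weights and a bias per unit.
lstm_params = 4 * (lstm_units * (lstm_units + embedding_vector_length) + lstm_units)
dense_params = lstm_units * 1 + 1   # 128 weights + 1 bias

print("Embedding:", embedding_params)
print("LSTM:     ", lstm_params)
print("Dense:    ", dense_params)
print("Total:    ", embedding_params + lstm_params + dense_params)
```

These numbers should match the "Param #" column printed by the model summary, assuming the layer sizes described above.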

In [None]:
# enter your code

**Fitting the model to the training data**

Now that our model is compiled, it's ready to be set loose on our training data.

We'll be training for 3 epochs with a batch_size of 64.

Because of our loss and optimization functions, the model accuracy should generally improve after each cycle.

model.fit(x_train, y_train, epochs=3, batch_size=64) is saying: fit the model we've built on the training dataset for 3 cycles and go over 64 reviews at a time.

Feel free to change the number of epochs (more cycles) or batch_size (more or less information each step) to see how the accuracy changes.

This will take a few minutes.
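As a rough guide to how epochs and batch_size interact, the sketch below counts the weight updates the model performs. It assumes 25,000 training reviews, which is the default Keras train split of the IMDB data; adjust `num_training_reviews` if you split the data differently.

```python
import math

num_training_reviews = 25000   # assumption: the default Keras train split of the IMDB data
batch_size = 64
epochs = 3

steps_per_epoch = math.ceil(num_training_reviews / batch_size)
last_batch_size = num_training_reviews - (steps_per_epoch - 1) * batch_size

print("Weight updates per epoch:", steps_per_epoch)
print("Reviews in the last batch:", last_batch_size)
print("Weight updates in total:", steps_per_epoch * epochs)
```

A smaller batch_size means more, noisier updates per epoch; a larger one means fewer, smoother updates.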

In [None]:
# enter your code

It is time to evaluate our model:

In [None]:
# enter your code