# Sentiment Analysis using Bidirectional RNN
Sentiment analysis is the process of determining whether a piece of text is positive, negative, or neutral. It is widely used in social media monitoring, customer feedback and support, identification of derogatory tweets, product analysis, etc. Here we build a bidirectional RNN to classify a sentence as either positive or negative using the sentiment-140 dataset.

## Step 1 - Importing the Dataset
First, import the sentiment-140 dataset. Since sentiment-140 consists of about 1.6 million data samples, let’s only import a subset of it. The current dataset has half a million tweets.

In [None]:
#! pip3 install wget
import wget
wget.download("https://nyc3.digitaloceanspaces.com/ml-files-distro/v1/sentiment-analysis-is-bad/data/sentiment140-subset.csv.zip")

!unzip -n sentiment140-subset.csv.zip

## Step 2 - Loading the Dataset
Install the pandas library with pip if it is not already available. Then import it and read the first 50,000 rows of the CSV file.

In [None]:
import pandas as pd

data = pd.read_csv('sentiment140-subset.csv', nrows=50000)

## Step 3 - Reading the Dataset
Print the data columns.

In [None]:
# CODE HERE

‘Text’ indicates the sentence and ‘polarity’, the sentiment attached to a sentence. ‘Polarity’ is either 0 or 1. 0 indicates negativity and 1 indicates positivity.

In [None]:
# Find the total number of rows in the dataset and print the first 5 rows.
# CODE HERE
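
The inspection steps above can be sketched as follows. This is a minimal illustration on a tiny in-memory frame, not the real dataset: the column names `polarity` and `text` are assumed from the description above, and the three rows are made-up examples.

```python
import pandas as pd

# Hypothetical stand-in for the loaded CSV; the column names are assumptions
# taken from the description of the dataset, and the rows are invented.
sample = pd.DataFrame({
    "polarity": [0, 1, 0],
    "text": ["so tired today", "love this song", "missed my bus"],
})

print(sample.columns.tolist())  # column names
print(len(sample))              # total number of rows
print(sample.head())            # first rows (up to 5)
```

On the real data, the same three calls would be applied to `data` instead of `sample`.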

## Step 4 - Processing the Dataset
Since raw text is difficult to process by a neural network, we have to convert it into its corresponding numeric representation.

In [None]:
# To do so, initialize your tokenizer by setting the maximum number
# of words (features/tokens) that it should keep in its vocabulary
import re
import tensorflow as tf

max_features = 4000
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=max_features, split=' ')

In [None]:
# fit the tokenizer onto the text
# CODE HERE

In [None]:
# use the resultant tokenizer to tokenize the text
# CODE HERE

In [None]:
# and lastly, pad the tokenized sequences to maintain the same length across all the input sequences
# CODE HERE

In [None]:
# Finally, print the shape of the input vector.
# CODE HERE
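
The fit, tokenize, and pad steps above might look like the sketch below. It assumes a TensorFlow 2.x install where `tf.keras.preprocessing.text.Tokenizer` is available, and it uses a three-sentence toy corpus (made up for illustration) with an arbitrary vocabulary cap of 10 instead of the tweet column and `max_features`.

```python
import tensorflow as tf

# Toy corpus standing in for the tweet text column (hypothetical examples).
texts = ["good movie", "bad movie", "really good movie"]

# Keep only the most frequent words; 10 is an arbitrary cap for this toy corpus.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=10, split=" ")
tokenizer.fit_on_texts(texts)                    # build the word -> index map
sequences = tokenizer.texts_to_sequences(texts)  # sentences -> lists of indices

# Pad shorter sequences with leading zeros so every row has the same length.
X = tf.keras.preprocessing.sequence.pad_sequences(sequences)

print(tokenizer.word_index)
print(X)
print(X.shape)
```

Index 0 is reserved for padding, which is why the word indices start at 1.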

## Step 5 - Create a Model
Now, let’s create a Bidirectional RNN model. Use tf.keras.Sequential() to define the model, then add Embedding, SpatialDropout1D, Bidirectional, and Dense layers.

* The embedding layer is the input layer. It maps each word index (token) to a vector with embed_dim dimensions.
* The spatial dropout layer helps prevent overfitting by dropping entire embedding channels rather than individual values. 0.4 is the probability with which each channel is dropped during training.
* The bidirectional layer wraps an LSTM with lstm_out units and reads the sequence in both the forward and backward directions.
* The dense layer is the output layer, with 2 nodes (one per class, negative and positive) and a softmax activation function. Softmax turns the outputs into probabilities that the text is negative or positive.

In [None]:
embed_dim = 256
lstm_out = 196

# CODE HERE
# Finally, attach categorical cross entropy loss and Adam optimizer functions to the model.
# CODE HERE

In [None]:
# Print the model summary to understand its layer stack.
# CODE HERE
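
One way the layer stack described above could be assembled is sketched below. It assumes TensorFlow 2.x, takes "SpatialDropout" to mean `SpatialDropout1D`, and uses an assumed padded sequence length of 30 purely to build the model. The `metrics=["accuracy"]` argument is also an addition, included so accuracy is tracked during training.

```python
import tensorflow as tf

max_features = 4000  # vocabulary size, as set in the tokenizer step
embed_dim = 256      # size of each embedding vector
lstm_out = 196       # LSTM units per direction

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(max_features, embed_dim),
    tf.keras.layers.SpatialDropout1D(0.4),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_out)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Run one dummy batch through the model so its weights are created;
# the sequence length of 30 is an assumption for this sketch.
probs = model(tf.zeros((1, 30), dtype=tf.int32))
model.summary()
```

Because the forward and backward LSTM outputs are concatenated by default, the bidirectional layer outputs 2 * lstm_out values per sentence.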

## Step 6 - Initialize Train and Test Data
Import the required libraries, installing NumPy and scikit-learn with pip first if needed.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split

In [None]:
# Create a one-hot encoded representation of the output labels using the get_dummies() method.
# CODE HERE

In [None]:
# Map the resultant 0 and 1 values to ‘Negative’ and ‘Positive’ respectively.
# CODE HERE

In [None]:
Y = Y.values
# Split train and test data using the train_test_split() method.
# CODE HERE

In [None]:
# Print the shapes of train and test data.
# CODE HERE
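
The encoding and split steps above can be sketched on stand-in data as follows. The labels and the input matrix are made up for illustration, the label mapping follows the 0 = negative, 1 = positive convention stated earlier, and `test_size` and `random_state` are arbitrary choices that the text does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical labels and already-padded inputs standing in for the real data.
polarity = pd.Series([0, 1, 1, 0, 1, 0, 0, 1], name="polarity")
X = np.arange(8 * 5).reshape(8, 5)

# One column per class (0 and 1); float32 keeps the labels usable as Keras targets.
Y = pd.get_dummies(polarity, dtype="float32")

# Map the column values to readable labels, in column order.
label_names = {0: "Negative", 1: "Positive"}
y_arr = [label_names[c] for c in Y.columns]

Y = Y.values
# test_size and random_state are assumptions made for this sketch.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=42
)

print(y_arr)
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
```

Keeping `y_arr` in column order means that the index returned by `np.argmax` on a prediction can be used directly to look up the label.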

## Step 7 - Training the Model
Call the model’s fit() method to train the model on the training data for about 20 epochs with a batch size of 128.

In [None]:
# CODE HERE
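
As a rough illustration of the fit() call, the sketch below trains a deliberately tiny model on random stand-in data. Everything about the data and the model size is an assumption chosen so the example runs in seconds; only the batch size of 128 comes from the text, and the epoch count is cut to 2.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data: 64 padded sequences of length 10, two classes.
rng = np.random.default_rng(0)
X_train = rng.integers(1, 50, size=(64, 10))
Y_train = tf.keras.utils.to_categorical(rng.integers(0, 2, size=64), num_classes=2)

# Deliberately small model so the sketch runs quickly.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(50, 8),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(4)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# The text suggests about 20 epochs; 2 epochs keep this demo short.
history = model.fit(X_train, Y_train, epochs=2, batch_size=128, verbose=0)
print(history.history.keys())
```

The returned `history.history` dictionary holds one list per tracked quantity, with one entry per epoch.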

In [None]:
# Plot accuracy and loss graphs captured during the training process.
# CODE HERE
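
A possible way to draw the two curves with matplotlib is sketched below. The dictionary mimics the layout of the `history.history` attribute returned by fit() when accuracy is tracked as a metric. The numbers in it are placeholders for illustration, not results from this model.

```python
import matplotlib.pyplot as plt

# Placeholder values with the same layout as `history.history`; not real results.
history_dict = {"accuracy": [0.60, 0.70, 0.75], "loss": [0.68, 0.58, 0.52]}

# One panel per quantity, with the epoch index on the x-axis.
fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
ax_acc.plot(history_dict["accuracy"])
ax_acc.set_title("Training accuracy")
ax_acc.set_xlabel("epoch")
ax_loss.plot(history_dict["loss"])
ax_loss.set_title("Training loss")
ax_loss.set_xlabel("epoch")
fig.tight_layout()
```

In a notebook the figure is normally rendered at the end of the cell. In a plain script, a call to `plt.show()` would be needed.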

## Step 8 - Computing the Accuracy
Evaluate the model on the test data and print the resulting score and accuracy.

In [None]:
# Print the prediction score and accuracy on test data.
# CODE HERE
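
In Keras, `model.evaluate` is the usual call for this step. To show what the two reported numbers mean, the sketch below computes them by hand with NumPy, assuming the score refers to the categorical cross-entropy loss. The function name `score_and_accuracy` and the predictions are hypothetical.

```python
import numpy as np

def score_and_accuracy(y_true, y_prob, eps=1e-7):
    """Return (categorical cross-entropy, accuracy) for one-hot labels."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    loss = -np.mean(np.sum(y_true * np.log(y_prob), axis=1))
    acc = np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1))
    return float(loss), float(acc)

# Made-up one-hot labels and softmax outputs for four test sentences.
y_true = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])

score, acc = score_and_accuracy(y_true, y_prob)
print("score: %.2f" % score)
print("acc: %.2f" % acc)
```

The third sentence is misclassified (its highest probability is on the wrong class), so three of the four predictions are correct.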

## Step 9 - Perform Sentiment Analysis
Now it's time to predict the sentiment (positive or negative) of a user-given sentence. First, define the sentence, then tokenize and pad it the same way as the training data.

In [None]:
twt = ['I do not recommend this product']
# Tokenize it.
twt = tokenizer.texts_to_sequences(twt)
# Pad it.
twt = tf.keras.preprocessing.sequence.pad_sequences(twt, maxlen=X.shape[1], dtype='int32', value=0)

In [None]:
# Predict the sentiment by passing the sentence to the model we built.
sentiment = model.predict(twt, batch_size=1)[0]
print(sentiment)

# The index of the highest probability selects the matching label in y_arr.
print(y_arr[np.argmax(sentiment)])