# **What are Recurrent Neural Networks and Long Short-Term Memory?**

In a feed-forward network, each layer multiplies its inputs by a weight matrix, adds a bias, and applies an activation. The result is passed on to the next layer until the last layer produces the output. These networks keep no memory of earlier inputs, so they are a poor fit for sequential data. Their input and output sizes are also fixed. This makes them hard to apply to problems such as stock price prediction, where each value depends on the ones before it.
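
As a minimal sketch of the feed-forward computation described above: the layer sizes, the weight values, and the `dense` helper are illustrative assumptions, not part of the original notebook.

```python
import math

def dense(inputs, weights, biases, activation=math.tanh):
    """Apply one fully connected layer: activation(W @ x + b)."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z))
    return outputs

# Hypothetical network: 3 inputs -> 2 hidden units -> 1 output.
x = [0.5, -1.0, 2.0]
W1 = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]]
b1 = [0.0, 0.1]
W2 = [[0.7, -0.5]]
b2 = [0.2]

hidden = dense(x, W1, b1)
output = dense(hidden, W2, b2, activation=lambda z: z)  # linear output layer
print(output)
```

Calling the network again with the same `x` always gives the same `output`: nothing is carried over from one call to the next, which is the missing "memory" the text refers to.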

This is why Recurrent Neural Networks (RNNs) were introduced. An RNN is designed to capture the order in sequential or time-series data. At each time step, the current input is multiplied by an input weight matrix and the previous state is multiplied by a recurrent weight matrix. The sum is passed through the tanh function to give the new state. To get the output vector, we multiply the new state by an output weight matrix. Very deep networks are generally not preferred in RNNs.
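
The recurrent update can be sketched in a few lines of plain Python. The names `W_xh`, `W_hh`, `W_hy`, and `b_h`, as well as the toy sizes and values, are assumptions chosen for illustration.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h):
    """One recurrent step: return (new hidden state, output vector)."""
    pre = [a + b + c for a, b, c in
           zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]   # new state
    y_t = matvec(W_hy, h_t)             # output computed from the new state
    return h_t, y_t

# Toy sizes (assumed): input dim 1, hidden dim 2, output dim 1.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
W_hy = [[1.0, -1.0]]
b_h = [0.0, 0.0]

h = [0.0, 0.0]
outputs = []
for x in [1.0, 0.5, -1.0]:
    h, y = rnn_step([x], h, W_xh, W_hh, W_hy, b_h)
    outputs.append(y[0])
print(outputs)
```

Because `h` is fed back into the next step, the output for the input `0.5` depends on the `1.0` that came before it.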

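However, RNNs suffer from the vanishing gradient problem: as gradients are propagated back through many time steps they shrink, so the resulting weight updates become too small to help the model learn. Long Short-Term Memory (LSTM) networks were introduced to overcome this.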
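
To see the effect numerically, here is a small sketch for a scalar RNN `h_t = tanh(w * h_{t-1} + x)`. The function name, the weight `w=0.9`, and the constant input are assumptions made for the example.

```python
import math

def gradient_through_time(w, steps, x=0.5):
    """Return |d h_T / d h_0| for a scalar RNN h_t = tanh(w * h_{t-1} + x)."""
    h, grad = 0.0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule factor d h_t / d h_{t-1}
    return abs(grad)

for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.9, steps=steps))
```

Each step multiplies the gradient by a factor smaller than one, so the contribution of early time steps fades quickly as the sequence gets longer.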

## **Sentiment Analysis using LSTM**

Let us first import the required libraries and data. You can download the data directly from [Kaggle](https://www.kaggle.com/seunowo/sentiment-analysis-twitter-dataset). Many other public datasets for sentiment analysis of tweets and reviews are also available. We will use the Twitter sentiment data for this experiment. Use the code below to install the dependencies and load the data.

In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn keras tensorflow nltk gensim --user -q --no-warn-script-location

import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import re

df = pd.read_csv("Sentiment.csv")

We will now explore the data we just imported, starting with the columns it contains.

In [None]:
print(df.columns)

We will only use the tweets and their corresponding sentiments in this experiment, so we create a new data frame that holds just these two columns. We also check which sentiments are present. Use the code below to do this.

In [None]:
new_df = df[['text','sentiment']]
print(new_df.sentiment.unique())

### **Preprocessing Of Tweets**

We will now preprocess the tweets: drop the neutral ones, convert the text to lowercase, and remove every character that is not a letter, digit, or whitespace. Use the code below to perform this.

In [None]:
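# Keep only positive and negative tweets; copy to avoid chained-assignment warnings.
new_df = new_df[new_df.sentiment != "Neutral"].copy()
new_df['text'] = new_df['text'].str.lower()
# Remove everything except letters, digits, and whitespace (A-Z, not A-z).
new_df['text'] = new_df['text'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))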
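
To check what the cleaning step does to a single tweet, here is a small standalone sketch. The `clean_tweet` helper and the sample text are made up for illustration. It uses the range `A-Z`; a range written as `A-z` also matches the characters that sit between `Z` and `a` in ASCII, such as `[`, `]`, and `_`, so those would be left in the text.

```python
import re

def clean_tweet(text):
    """Lowercase a tweet and drop everything except letters, digits, and whitespace."""
    return re.sub(r'[^a-zA-Z0-9\s]', '', text.lower())

sample = "RT @user: Great_debate tonight!!! #GOPDebate [live]"
print(clean_tweet(sample))
# prints: rt user greatdebate tonight gopdebate live

# The same pattern with A-z keeps '_', '[' and ']'.
loose = re.sub(r'[^a-zA-z0-9\s]', '', sample.lower())
print(loose)
```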

After this, we define the vocabulary size and use the tokenizer to convert each tweet into a sequence of integer word indices, padded to a common length. The result is stored in the variable `X`. Use the code below to do so.

In [None]:
max_fatures = 2000
tokenizer = Tokenizer(num_words=max_fatures, split=' ')
tokenizer.fit_on_texts(new_df['text'].values)
X = tokenizer.texts_to_sequences(new_df['text'].values)
X = pad_sequences(X)
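
The following plain-Python sketch imitates, in simplified form, what the tokenizer and the padding step produce. The helper functions and sample texts are assumptions for illustration; this is not the Keras implementation, which also handles filtering, lowercasing, and other options.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 1."""
    counts = Counter(word for text in texts for word in text.split())
    # Index 0 is left free so it can be used for padding.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Replace each word by its index, keeping only indices below num_words."""
    return [[word_index[w] for w in text.split()
             if w in word_index and word_index[w] < num_words]
            for text in texts]

def pad_left(sequences):
    """Left-pad every sequence with zeros to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [[0] * (max_len - len(s)) + s for s in sequences]

texts = ["good debate", "bad debate tonight", "debate"]
word_index = fit_word_index(texts)
padded = pad_left(texts_to_sequences(texts, word_index, num_words=2000))
print(word_index)
print(padded)
```

Every row of `padded` has the same length, which is what lets the tweets be stacked into one matrix and passed to the embedding layer.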

We then define the LSTM model architecture. Use the code below to define it. The model is built layer by layer, much like a ConvNet. The only difference is that we define two hyperparameters: `embed_dim`, the size of the embedding vectors, and `lstm_out`, the number of LSTM units. We then compile the model using the Adam optimizer and categorical cross-entropy loss.

In [None]:
embed_dim = 128
lstm_out = 196

model = Sequential()
model.add(Embedding(max_fatures, embed_dim,input_length = X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2,activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
model.summary()
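
As a sanity check on the model summary, the trainable parameter counts can be worked out by hand. The helper functions below are illustrative; the formulas are the standard ones for an embedding layer, an LSTM layer with biases, and a dense layer.

```python
def embedding_params(vocab_size, embed_dim):
    """One embedding vector per vocabulary entry."""
    return vocab_size * embed_dim

def lstm_params(input_dim, units):
    """Four weight sets (input, forget, output gates and the cell candidate),
    each with input weights, recurrent weights, and a bias."""
    return 4 * ((input_dim + units + 1) * units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

# Values taken from the notebook: vocabulary 2000, embed_dim 128, lstm_out 196, 2 classes.
counts = {
    "embedding": embedding_params(2000, 128),
    "lstm": lstm_params(128, 196),
    "dense": dense_params(196, 2),
}
total = sum(counts.values())
print(counts, total)
```

The `SpatialDropout1D` layer has no trainable parameters, so it does not appear in the total.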

After this, we encode the sentiments as integers using `LabelEncoder`. Use the code below to do that. The tweets are already stored in `X`; the integer-encoded sentiments are stored in `y`.

In [None]:
from sklearn.preprocessing import LabelEncoder
Le = LabelEncoder()
y = Le.fit_transform(new_df['sentiment'])
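
Label encoding simply maps each distinct class to an integer, with the classes taken in sorted order. A minimal sketch of the idea, where `label_encode` is a hypothetical helper and not the scikit-learn implementation:

```python
def label_encode(labels):
    """Map each distinct label to an integer, using sorted class order."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[label] for label in labels]

classes, encoded = label_encode(["Positive", "Negative", "Negative", "Positive"])
print(classes)   # ['Negative', 'Positive']
print(encoded)   # [1, 0, 0, 1]
```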

Then we one-hot encode the sentiments into `Y` and divide the data set into training and testing sets. Use the code below to do so. After that, we fit the model on the training data.

In [None]:
# Request integer columns explicitly; newer pandas versions return booleans by default.
Y = pd.get_dummies(new_df['sentiment'], dtype=int).values
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
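
The two operations in this cell can be sketched in plain Python. Both helpers are illustrative assumptions: `one_hot` orders its columns by sorted class name, and `split_sizes` assumes the test set size is rounded up, which is how I understand `train_test_split` to treat a fractional `test_size`.

```python
import math

def one_hot(labels):
    """One column per class (sorted order), with a 1 in the label's column."""
    classes = sorted(set(labels))
    return [[1 if label == c else 0 for c in classes] for label in labels]

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

print(one_hot(["Positive", "Negative", "Positive"]))
print(split_sizes(10, 0.33))
```

With two classes, each row of `Y` has two entries, which is why the final `Dense` layer has two units with a softmax activation.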

In [None]:
batch_size = 32
model.fit(X_train, Y_train, epochs = 7, batch_size=batch_size, verbose = 2)

Now we will evaluate the model's performance on the test set. Use the code below to evaluate the model.

In [None]:
model.evaluate(X_test,Y_test)

On the test set, we got 83% accuracy and a loss of 0.43.
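
The two reported numbers are the mean categorical cross-entropy and the share of tweets whose predicted class matches the label. A minimal sketch of both metrics on made-up predictions (the helper functions and the sample values are assumptions, not output from the model above):

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(true * log(predicted)) over all samples."""
    losses = []
    for truth, probs in zip(y_true, y_pred):
        losses.append(-sum(t * math.log(max(p, eps)) for t, p in zip(truth, probs)))
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred):
    """Fraction of samples where the most probable class matches the label."""
    hits = sum(t.index(max(t)) == p.index(max(p)) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical one-hot labels and softmax outputs for three tweets.
y_true = [[1, 0], [0, 1], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

A confident wrong prediction, like the third sample here, raises the loss much more than a confident right one lowers it, so loss and accuracy do not always move together.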

# **Related Articles:**

> * [Sentiment Analysis using LSTM](https://analyticsindiamag.com/how-to-implement-lstm-rnn-network-for-sentiment-analysis/)

> * [VADER Sentiment Analysis](https://analyticsindiamag.com/sentiment-analysis-made-easy-using-vader/)

> * [Polyglot](https://analyticsindiamag.com/hands-on-tutorial-on-polyglot-python-toolkit-for-multilingual-nlp-applications/)

> * [Textblob](https://analyticsindiamag.com/lets-learn-textblob-quickstart-a-python-library-for-processing-textual-data/)

> * [TextHero Guide](https://analyticsindiamag.com/texthero-guide-a-python-toolkit-for-text-processing/)

> * [Guide to Pattern](https://analyticsindiamag.com/hands-on-guide-to-pattern-a-python-tool-for-effective-text-processing-and-data-mining/)

