## **In this practice session, we will learn how to implement a Recurrent Neural Network for sentiment analysis**
## **We will use the IMDB reviews dataset that ships with the Keras library**

### **Data processing**
  *   Import the required libraries from Keras
  *   Load the IMDB reviews dataset from Keras library
  *   Print one review and its sentiment label
  *   Pad the input sequences so that they all have the same length

### **Build an RNN model**
  *   Construct a simple LSTM model 
  *   Compile the model and fit it to the training data
  *   Evaluate the model on unseen test data
  *   Make model predictions on test data

## **Importing the libraries**

In [1]:
#import the required libraries for the implementation
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D
from keras.preprocessing import sequence


## **Load the data from Keras into train and test variables**

In [2]:
#load the imdb dataset into train and test set
from keras.datasets import imdb
vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)

## **Print one instance of the review and corresponding sentiment**

In [16]:
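#inspect what one sample of the dataset looks like
words = imdb.get_word_index()
#each review is stored as a list of integer word indices, so we invert the word index to map them back to words
#note: load_data shifts indices by 3 (index_from=3), so this direct lookup does not reproduce the original sentence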
vects = {i: word for word, i in words.items()}
print('review')
print([vects.get(i, ' ') for i in X_train[6]])
#the sentiment is 1 if the review is positive and 0 if the review is negative
print('sentiment')
print(y_train[6])

review
['the', 'and', 'full', 'involving', 'to', 'impressive', 'boring', 'this', 'as', 'and', 'and', 'br', 'villain', 'and', 'and', 'need', 'has', 'of', 'costumes', 'b', 'message', 'to', 'may', 'of', 'props', 'this', 'and', 'and', 'concept', 'issue', 'and', 'to', "god's", 'he', 'is', 'and', 'unfolds', 'movie', 'women', 'like', "isn't", 'surely', "i'm", 'and', 'to', 'toward', 'in', "here's", 'for', 'from', 'did', 'having', 'because', 'very', 'quality', 'it', 'is', 'and', 'and', 'really', 'book', 'is', 'both', 'too', 'worked', 'carl', 'of', 'and', 'br', 'of', 'reviewer', 'closer', 'figure', 'really', 'there', 'will', 'and', 'things', 'is', 'far', 'this', 'make', 'mistakes', 'and', 'was', "couldn't", 'of', 'few', 'br', 'of', 'you', 'to', "don't", 'female', 'than', 'place', 'she', 'to', 'was', 'between', 'that', 'nothing', 'and', 'movies', 'get', 'are', 'and', 'br', 'yes', 'female', 'just', 'its', 'because', 'many', 'br', 'of', 'overly', 'to', 'descent', 'people', 'time', 'very', 'bland']
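
The listing above reads like shuffled words because `imdb.load_data` shifts every word index by `index_from` (3 by default) and uses 0, 1 and 2 as padding, start-of-sequence and out-of-vocabulary markers, whereas `imdb.get_word_index()` returns the unshifted indices. A minimal sketch of a decoder that undoes the shift is given below. `decode_review`, the marker strings and `toy_index` are illustrative names, not part of Keras, and the toy index only mimics the shape of the real one.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map shifted integer ids back to words (assumes the load_data defaults)."""
    # word_index maps word -> rank; load_data stores rank + index_from
    reverse_index = {rank + index_from: word for word, rank in word_index.items()}
    # ids 0, 1, 2 are markers under the default settings (hypothetical labels)
    special = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    return [special[i] if i in special else reverse_index.get(i, "?") for i in encoded]

# toy stand-in with the same shape as imdb.get_word_index(): word -> rank
toy_index = {"the": 1, "movie": 2, "great": 3}
decoded = decode_review([1, 4, 5, 2, 6], toy_index)
print(decoded)
```

With the real data this would be called as `decode_review(X_train[6], imdb.get_word_index())` before the padding step.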


## **Pad the sequences so that all inputs have the same length**

In [17]:
#the model expects fixed-length inputs, so every review is padded or truncated to total_words tokens
total_words = 500
X_train = sequence.pad_sequences(X_train, maxlen=total_words)
X_test = sequence.pad_sequences(X_test, maxlen=total_words)
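
By default `pad_sequences` pads on the left with zeros and, for sequences longer than `maxlen`, drops items from the front (`padding='pre'`, `truncating='pre'`). The pure-Python sketch below mimics that default behaviour on toy lists so the effect is easy to see. `pad_pre` is a hypothetical helper written for illustration, not the Keras implementation, and it assumes `maxlen` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Keep the last maxlen items, then left-pad with value up to maxlen."""
    trimmed = seq[-maxlen:]  # truncating='pre': drop items from the front
    return [value] * (maxlen - len(trimmed)) + trimmed  # padding='pre'

short = pad_pre([7, 8, 9], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 7, 8, 9]
print(long)   # [3, 4, 5, 6, 7]
```

Left padding keeps the real tokens at the end of the sequence, closest to the final LSTM state that feeds the output layer.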

## **Build a basic LSTM model**

In [19]:
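#we will build a simple model with one embedding layer, one LSTM layer and one sigmoid output layer
embedding_size=32
model=Sequential()
#input_length must match the padded review length (total_words) defined in the padding step
model.add(Embedding(vocabulary_size, embedding_size, input_length=total_words))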
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
lstm_1 (LSTM)                (None, 100)               53200     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 101       
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None
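
The parameter counts in the summary can be checked by hand. The sketch below reproduces them from the layer sizes used above, assuming the standard LSTM layout of four gates, each with input weights, recurrent weights and a bias vector. The variable names are illustrative.

```python
vocabulary_size, embedding_size, lstm_units = 5000, 32, 100

# one embedding vector per vocabulary entry
embedding_params = vocabulary_size * embedding_size
# 4 gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * ((embedding_size + lstm_units) * lstm_units + lstm_units)
# one weight per LSTM unit plus a single bias
dense_params = lstm_units * 1 + 1
total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```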


## **Compile the model**

In [20]:
#compile the model by passing the optimizer and loss function and the evaluation metric
model.compile(loss='binary_crossentropy', 
             optimizer='adam', 
             metrics=['accuracy'])
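
To make the loss choice concrete, here is a small pure-Python sketch of binary cross-entropy, the quantity that `loss='binary_crossentropy'` asks the optimizer to minimise. The function and the `eps` clipping value are assumptions for illustration (clipping simply avoids `log(0)`), and the probabilities are made up.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# labels: first review positive, second negative
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))
```

Confident wrong predictions are penalised far more heavily than confident correct ones, which is what pushes the sigmoid output toward the true label.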

## **Pass required parameters and fit the model**

In [21]:
#fit the data to the model and begin training
batch_size = 128
num_epochs = 5
#hold out the first batch_size (128) reviews for validation and train on the remaining ones
x_val, y_val = X_train[:batch_size], y_train[:batch_size]
xtrain, ytrain = X_train[batch_size:], y_train[batch_size:]
model.fit(xtrain, ytrain, validation_data=(x_val, y_val), batch_size=batch_size, epochs=num_epochs)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7ff7211a8780>

## **Evaluate the model on test set**

In [22]:
#evaluate the model accuracy on unseen test data
scores = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', scores[1])

Test accuracy: 0.8600800037384033


## **Make model predictions**

In [23]:
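#make model predictions on test data
#predict_classes was removed in TensorFlow 2.6, so threshold the sigmoid output at 0.5 instead
print("Prediction: ", (model.predict(X_test[1:10]) > 0.5).astype("int32"))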

Prediction:  [[1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]
 [1]
 [1]]


In [24]:
#compare the model prediction with actual data
print("Actual: ",y_test[1:10])

Actual:  [1 1 0 1 1 1 0 0 1]
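
The two printouts above can be compared programmatically rather than by eye. The sketch below copies the nine predicted and actual labels from the outputs and counts the agreements. The variable names are illustrative.

```python
# labels copied from the two cell outputs above (test samples 1 to 9)
predicted = [1, 1, 1, 1, 1, 1, 0, 1, 1]
actual = [1, 1, 0, 1, 1, 1, 0, 0, 1]

matches = sum(p == a for p, a in zip(predicted, actual))
# offsets are relative to the slice, so offset 0 is test sample 1
mismatched = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
print(f"{matches}/{len(actual)} correct; mismatches at offsets {mismatched}")
```

Both mismatches are negative reviews predicted as positive. Nine samples are far too few to judge the model, so the test accuracy reported earlier remains the figure to rely on.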
