# **RNN**
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.
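
To make the idea of an "internal state" concrete, here is a minimal sketch of a scalar recurrent cell in plain Python. The weights `w_x`, `w_h`, `b` and the helper names are illustrative assumptions, not values from this notebook; a real layer such as `SimpleRNN` uses weight matrices learned during training.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar RNN: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell and return every hidden state."""
    h = 0.0  # initial state
    states = []
    for x_t in inputs:
        # the new state depends on the current input AND the previous state
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the network "remembers" the first input through its hidden state.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The feedback of `h` into the next step is the directed cycle mentioned above.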

IMDB sentiment classification task

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. Additional unlabeled data is available as well. Both raw text and an already processed bag-of-words format are provided.

You can download the dataset from http://ai.stanford.edu/~amaas/data/sentiment/ or import it directly with `from keras.datasets import imdb`.

A few points to note:
Modules such as SimpleRNN, LSTM, Activation, Dense, and Dropout layers can be used directly from Keras.
For preprocessing, you can use required 

In [11]:
#load the imdb dataset 
from keras.datasets import imdb

vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))

Loaded dataset with 25000 training samples, 25000 test samples


In [12]:
#the review is stored as a sequence of integers. 
# These are word IDs that have been pre-assigned to individual words, and the label is an integer

print('---review---')
print(X_train[0])
print('---label---')
print(y_train[0])

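# to get the actual review
# note: load_data() shifts every word ID by 3 by default (0 = padding, 1 = start, 2 = unknown),
# so this direct lookup is off by three; id2word.get(i - 3, '?') would recover the real words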
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print('---review with words---')
print([id2word.get(i, ' ') for i in X_train[0]])
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]
---label---
1
---review with words---
['the', 'as', 'you', 'wi
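
The decoded words above look scrambled because `imdb.load_data` offsets every word ID by `index_from` (3 by default) and reserves 0, 1 and 2 for padding, start-of-sequence and unknown words, while `imdb.get_word_index()` returns the unshifted indices. Below is a minimal sketch of a decoder that accounts for the shift. The helper name `decode_review` and the three-word `toy_index` are hypothetical stand-ins so the example runs without downloading the dataset.

```python
def decode_review(ids, word_index, index_from=3):
    """Map shifted word IDs back to words, assuming the Keras default offsets."""
    # shift the vocabulary so it lines up with the IDs returned by load_data()
    id2word = {i + index_from: word for word, i in word_index.items()}
    # reserved IDs under the default settings
    id2word.update({0: '<pad>', 1: '<start>', 2: '<unk>'})
    return ' '.join(id2word.get(i, '<unk>') for i in ids)

# toy vocabulary (made up for illustration; not the real IMDB index)
toy_index = {'the': 1, 'this': 11, 'film': 19}
print(decode_review([1, 14, 22, 4], toy_index))  # → <start> this film the
```

With the real data, the call would be `decode_review(X_train[0], imdb.get_word_index())`.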

In [13]:
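#pad sequences (write your code here)
from keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np

# X_train and X_test already hold integer word IDs, so the Tokenizer only re-indexes
# those IDs; any ID that occurs in X_test but never in X_train is dropped
tk = Tokenizer(lower=False)
tk.fit_on_texts(X_train)
X_train = tk.texts_to_sequences(X_train)
X_test = tk.texts_to_sequences(X_test)

# use the mean training review length (rounded up) as the fixed sequence length
tlens = [len(samp) for samp in X_train]
maxLength = int(np.ceil(np.mean(tlens)))

# padding='post' appends zeros to short reviews; truncating='post' cuts the end of long ones
X_train = sequence.pad_sequences(X_train, maxlen=maxLength, padding='post', truncating='post')
X_test = sequence.pad_sequences(X_test, maxlen=maxLength, padding='post', truncating='post')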

total_words = len(tk.word_index) + 1  
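
The effect of the `padding` and `truncating` arguments can be illustrated with a small pure-Python imitation of how one sequence is brought to a fixed length. The function `pad` is a hypothetical helper written for this illustration, not the Keras implementation. Its defaults mirror the `'pre'` defaults of `pad_sequences`.

```python
def pad(seq, maxlen, padding='pre', truncating='pre', value=0):
    """Bring one sequence to exactly maxlen items (illustrative sketch)."""
    if len(seq) > maxlen:
        # 'pre' drops items from the start, 'post' drops them from the end
        seq = seq[-maxlen:] if truncating == 'pre' else seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    # 'pre' puts the zeros before the data, 'post' puts them after
    return fill + list(seq) if padding == 'pre' else list(seq) + fill

print(pad([5, 6, 7], 5, padding='post', truncating='post'))  # → [5, 6, 7, 0, 0]
print(pad([5, 6, 7], 5))                                     # → [0, 0, 5, 6, 7]
print(pad([1, 2, 3, 4, 5, 6], 4, truncating='post'))         # → [1, 2, 3, 4]
```

With `padding='post'`, a recurrent layer reads the zeros last, after the actual words. Whether `'pre'` padding would change the results in this notebook has not been tested here.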


In [14]:
#design a RNN model (write your code)

from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, SimpleRNN

dim = 32
out = 64

model = Sequential()
model.add(Embedding(total_words, dim, input_length = maxLength))
model.add(SimpleRNN(out))
model.add(Dense(1, activation='sigmoid'))




Adam Optimizer:<br>
Adam combines two gradient descent methods, Momentum and RMSProp. Momentum speeds up gradient descent by using an exponentially weighted average of the gradients, while RMSProp scales each step by an exponentially weighted average of the squared gradients. Together, these averages help the algorithm converge towards the minimum faster.<br>
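
As a rough illustration of the averaging described above, here is a sketch of the Adam update rule for a single scalar parameter. The function name `adam_minimize`, the toy objective and the chosen step count are assumptions for illustration. The hyperparameters `beta1`, `beta2` and `eps` use commonly quoted default values, but the learning rate is set larger than usual so the toy problem converges quickly.

```python
import math

def adam_minimize(grad_fn, theta, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a one-dimensional function with Adam, given its gradient."""
    m = 0.0  # exponentially weighted average of gradients (momentum part)
    v = 0.0  # exponentially weighted average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # bias correction: both averages start at zero and are too small early on
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta = adam_minimize(lambda th: 2 * (th - 3.0), theta=0.0)
print(theta)  # should end up close to the minimum at 3
```
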
Binary Cross Entropy as Loss:<br>
The problem is binary classification, so I chose binary cross entropy. It compares each predicted probability from the dense layer with the actual label and penalizes the prediction according to how far it is from the expected value, so it is a suitable choice in this case.
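
A small worked example may help show how the penalty grows with the distance from the true label. This is a plain-Python sketch of the standard formula, averaged over samples. The function name and the clipping constant `eps` are illustrative choices, not the Keras internals.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# true label is 1 in both cases
confident_right = binary_cross_entropy([1], [0.9])  # about 0.105
confident_wrong = binary_cross_entropy([1], [0.1])  # about 2.303
print(confident_right, confident_wrong)
```

A prediction of 0.5 for every sample gives a loss of ln 2, about 0.693, which is a useful baseline when reading the loss values reported by `model.evaluate`.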



In [15]:
#train and evaluate your model
#choose your loss function and optimizer and mention the reason to choose that particular loss function and optimizer
# use accuracy as the evaluation metric

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

In [16]:
batch_size = 32
num_epochs = 10

model.fit(X_train, y_train, batch_size = batch_size, epochs = num_epochs)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f87e515ed90>

In [17]:
#evaluate the model using model.evaluate()
model.evaluate(X_test, y_test, verbose=0)

[0.8340334296226501, 0.5077999830245972]

# **LSTM**

Instead of using an RNN, now try using an LSTM model and compare the two. Which performed better, and why?
Ans: The LSTM is better, as it tackles the vanishing gradient problem and the short memory of the RNN.
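
The vanishing gradient problem can be seen in a toy setting. The sketch below assumes a scalar RNN `h_t = tanh(w * h_{t-1})` with a hypothetical weight `w = 0.9` and measures how strongly the final state depends on the initial state. It illustrates the effect only and says nothing about the trained models in this notebook.

```python
import math

def gradient_through_time(w, steps, h0=0.5):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

grad_short = gradient_through_time(w=0.9, steps=5)
grad_long = gradient_through_time(w=0.9, steps=100)
print(grad_short, grad_long)
```

Each step multiplies the gradient by a factor of at most 0.9, so after 100 steps almost nothing of the signal from the first step is left. The gates of an LSTM are designed to give the gradient a more direct path through the cell state.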

In [18]:
from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, SimpleRNN

dim = 32
out = 64

model = Sequential()
model.add(Embedding(total_words, dim, input_length = maxLength))
model.add(LSTM(out))

model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
batch_size = 32
num_epochs = 10
model.fit(X_train, y_train, batch_size = batch_size, epochs = num_epochs)
model.evaluate(X_test, y_test, verbose=0)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.5828858017921448, 0.7424799799919128]

Perform error analysis and explain using a few examples.

The LSTM performs better than the RNN (test accuracy of about 0.742 versus about 0.508), because:<br>
1) The texts are long, so the RNN forgets the earlier context and evaluates mostly on the end of the review, whereas the LSTM retains memory through its gates.<br>
2) Key words appear at the start of the samples, and the RNN no longer has information about them by the end of the sequence.<br>
For example:<br>
1)the as you with brilliant of guy it used victor worst that it keep in long put this of now young of tried to answer instead he entire been its how korean this you<br>
2)grade about hated it for br so ten remain by in of songs are of and and is morality it's her or know would care i i br screen that obvious plot actors new would with paris not have attempt lead or of too would local
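
To pick examples for error analysis more systematically, one option is to list the misclassified samples and sort them by how far the predicted probability is from the label. The helper `misclassified` below is a hypothetical sketch in plain Python. The probabilities are made-up numbers; in this notebook they would presumably come from something like `model.predict(X_test).ravel()`.

```python
def misclassified(y_true, y_prob, threshold=0.5):
    """Return (index, label, probability, error) for wrong predictions,
    most confident mistakes first."""
    errors = []
    for idx, (y, p) in enumerate(zip(y_true, y_prob)):
        pred = 1 if p >= threshold else 0
        if pred != y:
            errors.append((idx, y, p, abs(p - y)))
    return sorted(errors, key=lambda e: e[3], reverse=True)

# made-up labels and probabilities for illustration only
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.7, 0.05, 0.1]
worst = misclassified(y_true, y_prob)
print([e[0] for e in worst])  # → [2, 1]
```

Decoding the reviews at the returned indices and comparing the RNN's mistakes with the LSTM's would give concrete evidence for or against the two explanations above.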
