# **RNN**
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior.

IMDB sentiment classification task

This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Both raw text and an already processed bag-of-words format are provided.

You can download the dataset from http://ai.stanford.edu/~amaas/data/sentiment/ or import it directly with `from keras.datasets import imdb`.

A few points to note:
Layers such as SimpleRNN, LSTM, Activation, Dense and Dropout can be used directly from Keras.
For preprocessing, you can use required 

In [7]:
from keras.datasets import imdb

vocabulary_size = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = vocabulary_size)
print('Loaded dataset with {} training samples, {} test samples'.format(len(X_train), len(X_test)))

Loaded dataset with 25000 training samples, 25000 test samples


In [8]:
# Each review is stored as a sequence of integers.
# These are word IDs pre-assigned to individual words; the label is an integer (0 = negative, 1 = positive).

print('---review---')
print(X_train[0])
print('---label---')
print(y_train[0])

# Map the IDs back to words.
# Note: imdb.load_data() shifts word IDs by index_from=3 by default (0, 1 and 2 are reserved),
# so this direct lookup returns words that are three ranks off, not the original text.
word2id = imdb.get_word_index()
id2word = {i: word for word, i in word2id.items()}
print('---review with words---')
print([id2word.get(i, ' ') for i in X_train[0]])
print('---label---')
print(y_train[0])

---review---
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 2, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 2, 19, 178, 32]
---label---
1
---review with words---
['the', 'as', 'you', 'wi
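
The decoded words above do not read like a review because the integer IDs returned by `imdb.load_data()` are shifted by `index_from=3` relative to `imdb.get_word_index()`. The following is a minimal sketch of an offset-aware decoder. The function name `decode_review`, the placeholder tokens and the toy word index are assumptions made for illustration; only the default reserved IDs (0 for padding, 1 for start, 2 for out-of-vocabulary) come from Keras.

```python
def decode_review(ids, word_index, index_from=3):
    """Map Keras IMDB integer IDs back to words, undoing the index_from shift."""
    # IDs 0, 1 and 2 are reserved: padding, start-of-sequence, out-of-vocabulary.
    reserved = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    # word_index maps word -> rank (starting at 1); stored IDs are rank + index_from.
    id_to_word = {rank + index_from: word for word, rank in word_index.items()}
    return [reserved.get(i, id_to_word.get(i, "<oov>")) for i in ids]


# Toy word index standing in for imdb.get_word_index() (hypothetical data).
toy_index = {"the": 1, "film": 2, "brilliant": 3}
decoded = decode_review([1, 4, 5, 2, 6, 0], toy_index)
print(decoded)
```

With the real data, the same idea would be applied as `decode_review(X_train[0], imdb.get_word_index())`.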

In [9]:
# Pad sequences
from keras.preprocessing import sequence
# The RNN expects all inputs to have the same length, so pad (or truncate) every review to 500 words.
maxwords = 500
X_train = sequence.pad_sequences(X_train, maxlen=maxwords)
X_test = sequence.pad_sequences(X_test, maxlen=maxwords)
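
To make the effect of padding concrete, here is a small NumPy sketch that mimics what `pad_sequences` does with its default settings as I understand them (`padding='pre'`, `truncating='pre'`, `value=0`): short reviews get zeros on the left, long reviews keep only their last `maxlen` IDs. The helper name `pad_pre` is hypothetical and this is not the Keras implementation.

```python
import numpy as np


def pad_pre(seqs, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` IDs of long ones."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for row, seq in enumerate(seqs):
        kept = list(seq)[-maxlen:]        # drop IDs from the front if too long
        if kept:
            out[row, -len(kept):] = kept  # right-align, padding stays on the left
    return out


padded = pad_pre([[1, 2], [3, 4, 5, 6, 7]], maxlen=4)
print(padded)
```

One consequence of pre-truncation is that, for reviews longer than 500 words, the model only sees the end of the review.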


In [10]:
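# Design an RNN model: embedding layer, two stacked SimpleRNN layers, sigmoid output

from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, SimpleRNN

model = Sequential()
# Map each word ID to a 32-dimensional vector
model.add(Embedding(vocabulary_size, 32, input_length=maxwords))
# return_sequences=True passes the full sequence of hidden states to the next recurrent layer
model.add(SimpleRNN(100, return_sequences=True))
model.add(SimpleRNN(100))
# Single sigmoid unit: probability that the review is positive
model.add(Dense(1, activation='sigmoid'))
print(model.summary())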



Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 500, 32)           160000    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 500, 100)          13300     
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, 100)               20100     
                                                                 
 dense_1 (Dense)             (None, 1)                 101       
                                                                 
Total params: 193,501
Trainable params: 193,501
Non-trainable params: 0
_________________________________________________________________
None


In [11]:
# Train and evaluate the model.
# Loss: binary cross-entropy, because the task has two classes and the model outputs a single sigmoid probability.
# Optimizer: Adam, which adapts the step size per parameter and usually works well without much tuning.
# Metric: accuracy.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
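
As a rough illustration of why binary cross-entropy suits this task, the sketch below computes the loss by hand for a confident correct prediction and a confident wrong one. The function name `binary_crossentropy` and the clipping constant `eps` are choices made for this example, not the Keras internals.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss grows sharply as the predicted probability moves away from the true label, which penalises confident mistakes much more than hesitant ones.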

In [12]:
batch_size = 64
num_epochs = 3
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
model.fit(X_train2, y_train2, validation_data=(X_valid, y_valid), batch_size=batch_size, epochs=num_epochs)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7feb82363a90>

In [13]:
#evaluate the model using model.evaluate()
scores = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', scores[1])

Test accuracy: 0.5753999948501587


# **LSTM**

Instead of a simple RNN, now try an LSTM model and compare the two. Which one performed better, and why?


In [14]:
from keras import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout

modellstm=Sequential()
modellstm.add(Embedding(vocabulary_size, 32, input_length=maxwords))
modellstm.add(LSTM(100))
modellstm.add(Dense(1, activation='sigmoid'))
print(modellstm.summary())

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 500, 32)           160000    
                                                                 
 lstm (LSTM)                 (None, 100)               53200     
                                                                 
 dense_2 (Dense)             (None, 1)                 101       
                                                                 
Total params: 213,301
Trainable params: 213,301
Non-trainable params: 0
_________________________________________________________________
None
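
The parameter counts in both summaries can be checked by hand. The sketch below uses the standard formulas for each layer type; the helper names are hypothetical, while the sizes (vocabulary of 5000, embedding size 32, 100 recurrent units) are the ones used in this notebook.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per word ID.
    return vocab_size * embed_dim


def simple_rnn_params(input_dim, units):
    # Input weights + recurrent weights + bias, for each unit.
    return units * (input_dim + units + 1)


def lstm_params(input_dim, units):
    # Four sets of weights: input, forget and output gates plus the cell candidate.
    return 4 * units * (input_dim + units + 1)


def dense_params(input_dim, units):
    # Weights + bias, for each output unit.
    return units * (input_dim + 1)


rnn_total = (embedding_params(5000, 32) + simple_rnn_params(32, 100)
             + simple_rnn_params(100, 100) + dense_params(100, 1))
lstm_total = embedding_params(5000, 32) + lstm_params(32, 100) + dense_params(100, 1)
print(rnn_total, lstm_total)
```

A single LSTM layer has four times the parameters of a SimpleRNN layer with the same input size and number of units, which is why the LSTM model is larger even though it has one recurrent layer instead of two.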


In [15]:
modellstm.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

In [16]:
batch_size = 64
num_epochs = 3
X_valid, y_valid = X_train[:batch_size], y_train[:batch_size]
X_train2, y_train2 = X_train[batch_size:], y_train[batch_size:]
modellstm.fit(X_train2, y_train2, validation_data=(X_valid, y_valid), batch_size=batch_size, epochs=num_epochs)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7feb8aa15210>

In [18]:
scores = modellstm.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', scores[1])

Test accuracy: 0.8729599714279175


Perform error analysis and explain using a few examples.

The LSTM reached about 87% test accuracy and the SimpleRNN about 58%, so the LSTM performs better than the SimpleRNN. The explanation is given in the last section.

# Performing error analysis on LSTM and SimpleRNN

In [20]:
# Threshold the predicted probabilities at 0.5 to get hard 0/1 labels
yrnn_pred = (model.predict(X_test) >= 0.5).astype(int).ravel()
ylstm_pred = (modellstm.predict(X_test) >= 0.5).astype(int).ravel()

In [44]:
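def review(val):
  # Build the text of test review `val` by looking up each ID in id2word directly;
  # IDs without an entry (such as the padding value 0) are skipped.
  return ''.join(id2word[i] + ' ' for i in X_test[val] if i in id2word)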

In [45]:
k=6
for i in range(len(X_test)):
  if yrnn_pred[i]!=y_test[i] and ylstm_pred[i] == y_test[i] and k>0:
    k-=1
    print('---review---')
    print(review(i))
    print('---label---')
    print(y_test[i])
    print(f"lstm model got the label as {int(ylstm_pred[i])}")
    print(f"Simple RNN model got the label as {int(yrnn_pred[i])}")
    print("We can see that lstm predicted better than rnn")
    print()
    print("---------------------------------------------------")
    print()
  if k==0:
    break

---review---
the wonder own as by is sequence i i and and to of hollywood br of down and getting boring of ever it sadly sadly sadly i i was then does don't close and after one carry as by are be and all family turn in does as three part in another some to be probably with world and her an have and beginning own as is sequence 
---label---
0
lstm model got the label as 0
Simple RNN model got the label as 1
We can see that lstm predicted better than rnn

---------------------------------------------------

---review---
the of and animation and male it and in and explanation and male take no and and and risk this kill in exploitation is vhs fred in of and be male it mentally who and male watch is popular catch know and it and or kill is and and for and male isn't and male her for would well thousands about and heat as it and to of universe form this did her people and to and of hollywood br of you furthermore who film reading to they of here and male lines enemy not like it of help i i o
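
The loop above prints only the reviews where the LSTM is right and the SimpleRNN is wrong. For a fuller picture, a sketch like the following counts all four outcomes. The function name `compare_models` and the toy arrays are assumptions for illustration; with the notebook's variables the call would be `compare_models(y_test, yrnn_pred, ylstm_pred)`.

```python
import numpy as np


def compare_models(y_true, pred_a, pred_b):
    """Count how often two models agree or disagree with the true labels."""
    y_true, pred_a, pred_b = (
        np.asarray(v).ravel().astype(int) for v in (y_true, pred_a, pred_b)
    )
    a_ok = pred_a == y_true
    b_ok = pred_b == y_true
    return {
        "both_right": int(np.sum(a_ok & b_ok)),
        "only_a_right": int(np.sum(a_ok & ~b_ok)),
        "only_b_right": int(np.sum(~a_ok & b_ok)),
        "both_wrong": int(np.sum(~a_ok & ~b_ok)),
    }


# Toy labels and predictions (hypothetical data, not results from the models).
counts = compare_models([1, 0, 1, 0, 1], [1, 0, 0, 0, 0], [1, 1, 1, 0, 0])
print(counts)
```

Reviews in the `both_wrong` group are often the most informative ones to read, since neither architecture handles them.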

# Explanation of why LSTM is better than Simple RNN

- RNNs have feedback loops in the recurrent layer, which let them keep information in 'memory' over time. However, simple RNNs are difficult to train on problems that require learning long-term temporal dependencies, because the gradient of the loss function decays exponentially as it is propagated back through time. This is called the vanishing gradient problem.

- LSTM is a type of RNN that uses special units in addition to standard units. LSTM units include a 'memory cell' that can maintain information for long periods of time. A set of gates controls when information enters the memory, when it is output, and when it is forgotten.

- Because of this architecture, LSTMs can learn longer-term dependencies, which a Simple RNN struggles to learn.

- In the first example printed above, the review starts negative but ends positive. Since the Simple RNN does not learn longer-term dependencies, it labels the review as positive, while the LSTM, which does learn them, gives the correct label.
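
The vanishing gradient effect described in the first point can be seen in a small NumPy experiment. This is a sketch under stated assumptions, not the models trained above: a tanh RNN with 16 units, random inputs, and a recurrent matrix built as 0.9 times an orthogonal matrix so that its largest singular value is exactly 0.9. The function name and all parameter values are hypothetical.

```python
import numpy as np


def gradient_norms_through_time(steps=100, units=16, scale=0.9, seed=0):
    """Backpropagate a gradient through a tanh RNN and record its norm per step."""
    rng = np.random.default_rng(seed)
    # Recurrent weights: scaled orthogonal matrix, so its spectral norm equals `scale`.
    q, _ = np.linalg.qr(rng.normal(size=(units, units)))
    W = scale * q

    # Forward pass: h_t = tanh(W h_{t-1} + x_t), storing each Jacobian dh_t/dh_{t-1}.
    h = np.zeros(units)
    jacobians = []
    for _ in range(steps):
        x = rng.normal(size=units)
        h = np.tanh(W @ h + x)
        jacobians.append((1.0 - h ** 2)[:, None] * W)

    # Backward pass: start from a gradient of ones at the last step.
    grad = np.ones(units)
    norms = []
    for jac in reversed(jacobians):
        grad = jac.T @ grad
        norms.append(float(np.linalg.norm(grad)))
    return norms


norms = gradient_norms_through_time()
print(norms[0], norms[-1])
```

Each backward step multiplies the gradient by a Jacobian whose norm is at most 0.9, so the signal reaching the earliest time steps shrinks at least geometrically. The LSTM memory cell is designed to give the gradient a more direct path through time.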