# Recurrent Neural Networks and Long Short-Term Memory

## 1. What is meant by Recurrent Neural Networks?

- A recurrent neural network (RNN) is a type of artificial neural network (ANN) designed for sequential or time series data
- RNNs are commonly used for ordinal or temporal problems such as language translation, natural language processing (NLP), speech recognition, and image captioning
- They are used in popular applications such as Siri, voice search, and Google Translate
- Like other neural networks, RNNs learn from training data
- They use information from prior inputs to influence the current input and output
- While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of an RNN depends on the prior elements within the sequence
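
The dependence on prior elements can be illustrated with a minimal sketch of a vanilla RNN forward pass. The function name `rnn_forward`, the weight names, and the tanh update rule are assumptions chosen for illustration, not taken from a specific library.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, steps = 3, 4, 5
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

sequence = rng.normal(size=(steps, input_dim))
states = rnn_forward(sequence, W_xh, W_hh, b_h)

# Change only the first input and rerun: later states can differ too,
# because information is carried forward through the hidden state
altered = sequence.copy()
altered[0] += 1.0
altered_states = rnn_forward(altered, W_xh, W_hh, b_h)

print(states.shape)
print(np.abs(states[-1] - altered_states[-1]))
```

The same weights are reused at every time step; only the hidden state changes as the sequence is consumed.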

## 2. What is meant by vanishing and exploding gradients, and why are they a problem in RNNs?

- During backpropagation, we move backward through the network, calculating the derivative of the cost function J with respect to the weights in every layer

### Vanishing gradient
- The vanishing gradient problem describes a situation encountered in the training of neural networks where the gradients used to update the weights shrink exponentially as they are propagated backward
- As a result, the weights of the earlier layers (or earlier time steps) are barely updated, and learning effectively stops
- Hence the ANN is not trained well enough to predict the outputs accurately

### Exploding gradient
- The exploding gradient problem describes a situation in the training of neural networks where the gradients used to update the weights grow exponentially
- This prevents the backpropagation algorithm from making reasonable updates to the weights, and learning becomes unstable
- Hence the network cannot learn to predict the outputs accurately
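
Both effects can be seen in a toy scalar recurrence. The sketch below assumes a one-unit RNN with recurrent weight `w` and multiplies the local derivatives over the time steps, as the chain rule does during backpropagation through time. The function name `backprop_factor` and the chosen weights are hypothetical.

```python
import math

def backprop_factor(w, steps, use_tanh=False, h0=0.5):
    """Return |d h_T / d h_0| for h_t = act(w * h_{t-1}) over `steps` steps."""
    h, grad = h0, 1.0
    for _ in range(steps):
        if use_tanh:
            h = math.tanh(w * h)
            local = w * (1.0 - h * h)  # derivative of tanh(w * h_prev)
        else:
            h = w * h
            local = w                  # derivative of the linear recurrence
        grad *= local                  # chain rule across time steps
    return abs(grad)

vanishing = backprop_factor(0.5, steps=50)
exploding = backprop_factor(1.5, steps=50)
vanishing_tanh = backprop_factor(0.5, steps=50, use_tanh=True)

print(f"w=0.5, linear: {vanishing:.3e}")
print(f"w=1.5, linear: {exploding:.3e}")
print(f"w=0.5, tanh:   {vanishing_tanh:.3e}")
```

With a weight below 1 the factor collapses toward zero, and with a weight above 1 it grows without bound in the linear case. A saturating activation such as tanh can only shrink the factor further, since its derivative never exceeds 1.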

### Solutions
- Gradient clipping
- Careful weight initialization
- Using non-saturating activation functions such as ReLU
- Reducing the number of layers
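
As an illustration of the first item, here is a minimal sketch of gradient clipping by global norm, written in plain NumPy. The function name `clip_by_global_norm` and the example gradients are assumptions for illustration; this is one common variant of clipping, not the only one.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    if total_norm <= max_norm:
        # Small gradients pass through unchanged
        return [g.copy() for g in grads], total_norm
    scale = max_norm / total_norm
    # Every array is scaled by the same factor, so the direction is preserved
    return [g * scale for g in grads], total_norm

# Hypothetical gradients for two weight tensors
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = float(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))

print(norm_before, norm_after)
```

Clipping limits the size of an update when gradients explode, but it does not help with vanishing gradients. Keras optimizers accept `clipnorm` and `clipvalue` arguments for related built-in clipping.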

- An RNN unrolled over time behaves like a very deep network, and it typically uses saturating activation functions such as sigmoid; as in deep feedforward neural networks, backpropagation then suffers from "the unstable gradient problem"
- This makes training an RNN a very difficult task
- A plain RNN cannot process very long sequences when using tanh or ReLU as the activation function





## 3. What is meant by Long Short-Term Memory?

- Long Short-Term Memory (LSTM) networks are a type of recurrent neural network
- They are capable of learning order dependence in sequence prediction problems
- This is a behavior required in complex problem domains like machine translation, speech recognition, and more.
- LSTMs are a complex area of deep learning
- Requirements of a recurrent neural network that LSTMs are designed to meet:
    - That the system be able to store information for an arbitrary duration
    - That the system be resistant to noise
    - That the system parameters be trainable
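
The first requirement, storing information over many steps, is what the LSTM cell state and its gates provide. Below is a minimal NumPy sketch of a single LSTM step using the standard gate equations. The function name `lstm_step`, the stacking order `[input, forget, output, candidate]`, and the hand-set biases are assumptions for illustration and do not match any particular library's weight layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b stack the blocks [input, forget, output, candidate]."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])              # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])          # forget gate: how much old content to keep
    o = sigmoid(z[2 * n:3 * n])      # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:4 * n])      # candidate cell content
    c_t = f * c_prev + i * g         # updated cell state
    h_t = o * np.tanh(c_t)           # updated hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)
b[:hidden_dim] = -20.0                # hold the input gate closed
b[hidden_dim:2 * hidden_dim] = 20.0   # hold the forget gate open

c = np.array([1.0, -2.0, 0.5, 3.0])   # information to remember
c_start = c.copy()
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(100, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)

# Largest drift of the cell state after 100 noisy inputs
print(np.max(np.abs(c - c_start)))
```

In a trained network the gate values are learned rather than fixed by hand; the biases here are set to extremes only to show that an open forget gate and a closed input gate keep the cell state nearly unchanged despite noisy inputs.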

## 4. What is meant by Gated Recurrent Unit?

- The gated recurrent unit (GRU) is an advancement of the standard RNN
- GRUs are very similar to Long Short-Term Memory (LSTM) networks
- Just like an LSTM, a GRU uses gates to control the flow of information
- GRUs were introduced more recently than LSTMs
- They offer some improvements over LSTMs, mainly a simpler architecture
- Unlike an LSTM, a GRU does not have a separate cell state (Ct); it only has a hidden state (Ht). Because of this simpler architecture, GRUs are faster to train
- At each time step t, it takes an input Xt and the hidden state Ht-1 from the previous time step t-1, and outputs a new hidden state Ht, which is then passed on to the next time step
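
The per-step update described above can be sketched in NumPy as follows. The function name `gru_step`, the parameter dictionary, and the convention `h_t = (1 - z) * h_prev + z * h_cand` are assumptions for illustration; some references swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, params):
    """One GRU time step: takes Xt and Ht-1, returns the new hidden state Ht."""
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])  # update gate
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    # Candidate state: the reset gate decides how much of the past to use
    h_cand = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r * h_prev) + params["b_h"])
    # Blend the old state and the candidate; there is no separate cell state
    return (1.0 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
    params[f"U_{gate}"] = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    params[f"b_{gate}"] = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):
    h = gru_step(x_t, h, params)  # Ht is passed on to the next time step

print(h.shape)
```

This sketch uses three weight blocks (update, reset, candidate) where an LSTM uses four, which is one way to see why a GRU of the same width has fewer parameters.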

## 5. Train a bidirectional LSTM on the IMDB movie sentiment dataset from Keras (following the tutorial at https://keras.io/examples/nlp/bidirectional_lstm_imdb/)

In [1]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

max_features = 30000  # Only consider the top 30k words
maxlen = 500  # Pad or truncate each movie review to 500 words

In [2]:
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=max_features)

print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")

x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
25000 Training sequences
25000 Validation sequences
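
To make the padding step concrete, here is a small NumPy sketch of what `pad_sequences` does with its default settings, which pad and truncate at the front of each sequence. The helper `pad_pre` is a hypothetical re-implementation for illustration, not the Keras function itself.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[-maxlen:]  # drop tokens from the front if too long
        if trimmed:
            out[row, -len(trimmed):] = trimmed  # right-align the tokens
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=4)
print(padded)
```

Under these defaults a review longer than `maxlen` keeps its final words rather than its first ones.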


In [3]:
# Input for variable-length sequences of integers
inputs = keras.Input(shape=(None,), dtype="int32")

# Embed each integer in a 256-dimensional vector
x = layers.Embedding(max_features, 256)(inputs)

# Add 2 bidirectional LSTMs
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(128))(x)

# Add a classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, None)]            0         
                                                                 
 embedding (Embedding)       (None, None, 256)         7680000   
                                                                 
 bidirectional (Bidirectiona  (None, None, 256)        394240    
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 256)              394240    
 nal)                                                            
                                                                 
 dense (Dense)               (None, 1)                 257       
                                                                 
Total params: 8,468,737
Trainable params: 8,468,737
Non-train
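
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout of four gate blocks, each with input weights, recurrent weights, and a bias; the helper name `lstm_params` is hypothetical.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM layer: four blocks of (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

max_features, embed_dim, units = 30000, 256, 128

embedding = max_features * embed_dim         # one vector per vocabulary entry
bi_lstm_1 = 2 * lstm_params(embed_dim, units)  # forward + backward LSTM
# The second layer receives the concatenated forward/backward outputs (2 * units)
bi_lstm_2 = 2 * lstm_params(2 * units, units)
dense = 2 * units * 1 + 1                    # weights plus one bias

total = embedding + bi_lstm_1 + bi_lstm_2 + dense
print(embedding, bi_lstm_1, bi_lstm_2, dense, total)
```

The computed values agree with the `Param #` column above: 7,680,000 for the embedding, 394,240 for each bidirectional layer, 257 for the dense layer, and 8,468,737 in total.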

In [None]:
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=50, validation_data=(x_val, y_val))

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50