### Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNNs) are a class of artificial neural networks specifically designed for processing sequential data. They are widely used in time-series forecasting, speech recognition, natural language processing (NLP), and more.

!["rnn"](../images/4/4-rnn.png)
<br><br>

---

#### What is Time-Series and Sequential Data?

Sequential data refers to any data where order matters. Examples include:

- Time-Series Data: Ordered data points indexed by time (e.g., stock prices, weather data).
- Natural Language Data: Words appear in a particular order in sentences.
- Biological Sequences: DNA sequences follow a specific pattern.

##### Examples of Sequential Data Applications

- Speech Recognition: Converting spoken language into text.
- Sentiment Classification: Analyzing text to determine emotional tone (positive/negative/neutral).
- DNA Sequence Analysis: Identifying patterns in genetic sequences.
  <br><br>

---

#### What is Sequential Dependency in Language and Time-Series Data?

Sequential dependency refers to the relationship between data points across time. For instance:

- In language, the meaning of a word depends on the context set by previous words.
- In time-series data, the next value often depends on previous values (e.g., tomorrow's temperature depends on today's and yesterday's temperatures).

Mathematically, sequential dependency can be expressed as:

$$
P(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1)
$$

that is, the distribution of each value \( x_t \) is conditioned on all the values that came before it.
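
As a rough illustration of this dependency, a time series can be reframed so that a few previous values become the inputs and the following value becomes the target. The sketch below assumes NumPy; the helper name `make_windows` and the temperature values are hypothetical and only serve the example.

```python
import numpy as np

def make_windows(series, n_lags):
    """Return (X, y): each row of X holds n_lags consecutive values, y holds the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i : i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# Hypothetical daily temperatures
temps = [20.0, 21.0, 19.0, 22.0, 23.0, 21.0]
X_lag, y_next = make_windows(temps, n_lags=2)

print(X_lag.shape, y_next.shape)
print(X_lag[0], "->", y_next[0])
```

Each row pairs the two preceding values with the value they are meant to predict, which is the simplest way to expose the ordering to a model.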
<br><br>

---

#### Why Standard Neural Networks Fail for Sequential Data

Standard feedforward neural networks (FNNs) map a fixed-size input vector to an output and keep no state between inputs, so they cannot capture dependencies between elements in a sequence. The main limitations are:

1. Fixed Input Size: Cannot handle variable-length sequences.
2. Lack of Memory: No mechanism to remember previous states.
3. Inefficiency in Sequence Processing: Handling different sequence lengths requires padding, truncation, or separate models, and no weights are shared across positions.

Due to these limitations, RNNs were introduced to effectively model sequential dependencies.
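
The fixed-input-size limitation can be seen in a small NumPy sketch. The weights and sequences here are random placeholders, and the scalar recurrent cell is a deliberately simplified stand-in for a real RNN layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Feedforward layer: the weight matrix fixes the input length at 5
W_dense = rng.normal(size=(3, 5))

def dense_forward(x):
    return np.tanh(W_dense @ x)

# Recurrent cell: the same two weights are reused at every step, so any length works
w_in, w_rec = 0.5, 0.8

def recurrent_forward(xs):
    h = 0.0
    for x in xs:
        h = np.tanh(w_in * x + w_rec * h)
    return h

short_seq = rng.normal(size=5)
long_seq = rng.normal(size=9)

dense_forward(short_seq)              # works: length matches W_dense
try:
    dense_forward(long_seq)           # 9 inputs do not fit a (3, 5) matrix
    dense_ok = True
except ValueError:
    dense_ok = False

h_short = recurrent_forward(short_seq)
h_long = recurrent_forward(long_seq)
print("dense layer accepted the longer sequence:", dense_ok)
print("recurrent states:", h_short, h_long)
```

The dense layer rejects the longer input because its shape is baked into the weights, while the recurrent loop simply runs for more steps.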
<br><br>

---

#### RNN Architecture and Working Mechanism

- RNNs introduce the concept of hidden states to retain memory over time.

##### Mathematical Representation:

Given an input sequence \( x_1, \dots, x_T \), an RNN updates its hidden state \( h_t \) at each time step \( t \) as follows:

$$
 h_t = f(W \cdot x_t + U \cdot h_{t-1} + b)
$$

where:

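- \( x_t \) is the current input.
- \( h_{t-1} \) is the previous hidden state (with \( h_0 \) typically initialized to zeros).
- \( W \) (input-to-hidden weights), \( U \) (hidden-to-hidden weights), and \( b \) (bias) are learnable parameters shared across all time steps.
- \( f \) is an activation function (commonly tanh or ReLU).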

The output at time step \( t \) can then be computed as:

$$
 y_t = g(V \cdot h_t + c)
$$

where \( V, c \) are additional learnable parameters and \( g \) is an output activation function (e.g., sigmoid for binary classification).
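
The two update equations can be sketched directly in NumPy. This is a minimal illustration rather than how Keras implements `SimpleRNN`: it assumes \( f = \tanh \), \( g = \) sigmoid, a zero initial state, and randomly drawn weights with arbitrary small dimensions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W, U, b, V, c):
    """Apply h_t = tanh(W x_t + U h_{t-1} + b) and y_t = sigmoid(V h_t + c) over a sequence."""
    h = np.zeros(U.shape[0])              # h_0: initial hidden state
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(W @ x_t + U @ h + b)  # hidden-state update
        hs.append(h)
        ys.append(sigmoid(V @ h + c))     # per-step output
    return np.array(hs), np.array(ys)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 3, 1, 6   # arbitrary sizes for the sketch

W = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
U = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)
V = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
c = np.zeros(output_dim)

xs = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(xs, W, U, b, V, c)
print(hs.shape, ys.shape)
```

Note that the same `W`, `U`, and `b` are reused at every step; only the hidden state `h` changes as the sequence is consumed.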
<br><br>

---

#### What is the Vanishing Gradient Problem?

When RNNs are trained on long sequences using backpropagation through time (BPTT), gradients may shrink exponentially as they flow back through the time steps, causing:

- Loss of long-term dependencies: The model cannot learn dependencies across distant time steps.
- Ineffective training: Weights stop updating due to near-zero gradients.

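Mathematically, the gradient of the loss \( L \) with respect to an earlier hidden state \( h_k \) chains together the Jacobians of every later step (gradients written as row vectors, later time steps to the left in the product):

$$
\frac{\partial L}{\partial h_k} = \frac{\partial L}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}, \qquad \frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}\big(f'(a_t)\big) \cdot U
$$

where \( a_t = W \cdot x_t + U \cdot h_{t-1} + b \) is the pre-activation at step \( t \). If the largest singular value of \( U \) is below 1 (and \( |f'| \le 1 \), as for tanh), the norm of this product decays exponentially in \( T - k \), leading to vanishing gradients.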
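
The decay can be observed numerically. The sketch below makes several simplifying assumptions: the recurrent (hidden-to-hidden) matrix is called `U`, it is built as 0.5 times an orthogonal matrix so that all its singular values equal 0.5, the activation is tanh, and the input term is replaced by random noise.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, T = 8, 30

# Recurrent matrix whose singular values are all 0.5 (scaled orthogonal matrix)
Q, _ = np.linalg.qr(rng.normal(size=(hidden_dim, hidden_dim)))
U = 0.5 * Q

h = np.zeros(hidden_dim)
jacobian_product = np.eye(hidden_dim)     # accumulates d h_t / d h_0
norms = []
for t in range(T):
    pre_activation = U @ h + rng.normal(size=hidden_dim)  # noise stands in for W x_t + b
    h = np.tanh(pre_activation)
    step_jacobian = np.diag(1.0 - h**2) @ U               # d h_t / d h_{t-1} for tanh
    jacobian_product = step_jacobian @ jacobian_product
    norms.append(np.linalg.norm(jacobian_product, 2))     # spectral norm

print("after 1 step:  ", norms[0])
print("after 10 steps:", norms[9])
print("after 30 steps:", norms[-1])
```

Because each step's Jacobian has spectral norm at most 0.5 in this setup, the norm after \( t \) steps is bounded by \( 0.5^t \), so the influence of the earliest state on the latest one becomes negligible.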

##### Solution: LSTM and GRU

- To address this, architectures like Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) introduce gating mechanisms to better retain long-term dependencies.
  <br><br>
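
To give a feel for what a gating mechanism does, here is a minimal NumPy sketch of a single GRU step. The parameter names in the dictionary are hypothetical, the weights are random, and the blending convention used here, \( (1 - z) \odot h_{t-1} + z \odot \tilde{h}_t \), is one of two equivalent conventions found in the literature and in libraries.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    """One GRU update; p is a dict of weight matrices and biases (hypothetical names)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                            # gated blend

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {}
for gate in "zrh":
    params["W" + gate] = rng.normal(size=(hidden_dim, input_dim))
    params["U" + gate] = rng.normal(size=(hidden_dim, hidden_dim))
    params["b" + gate] = np.zeros(hidden_dim)

# A strongly negative update-gate bias closes the gate, so the old state is carried over
params["bz"] = np.full(hidden_dim, -50.0)
h_prev = np.array([0.3, -0.7, 0.9])
h_next = gru_step(rng.normal(size=input_dim), h_prev, params)
print(h_prev, h_next)
```

When the update gate is closed, the previous state passes through almost unchanged, which gives gradients a path that is not repeatedly squashed by the recurrent matrix and the activation.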

---

#### Applications of RNN in NLP

RNNs are widely used in NLP for:

##### 1. Machine Translation

- Translating text from one language to another (e.g., English to French).

##### 2. Sentiment Analysis

- Classifying text into positive, negative, or neutral sentiments.

##### 3. Named Entity Recognition (NER)

- Identifying entities like names, locations, and dates in text.

##### 4. Speech-to-Text Conversion

- Recognizing spoken words and converting them into text.

##### 5. Text Generation

- Generating realistic and coherent text sequences.


---


#### Real-Life Application of RNN Using the IMDB Dataset

- The dataset link &rarr; [IMDB_Dataset.csv](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)


In [None]:
import string

import nltk
import numpy as np
import pandas as pd
from gensim.models import Word2Vec
from keras_preprocessing.sequence import pad_sequences
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.layers import Dense, Embedding, SimpleRNN
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.text import Tokenizer  # Deprecated; TextVectorization is the suggested replacement

In [2]:
data = pd.read_csv("../data/IMDB_Dataset.csv")
print(data.head())

                                              review sentiment
0  One of the other reviewers has mentioned that ...  positive
1  A wonderful little production. <br /><br />The...  positive
2  I thought this was a wonderful way to spend ti...  positive
3  Basically there's a family where a little boy ...  negative
4  Petter Mattei's "Love in the Time of Money" is...  positive


In [3]:
# Text preprocessing
nltk.download("stopwords")
stop_words = set(stopwords.words("english"))

data["review"] = [
    " ".join(
        [
            word.lower()
            for word in sentence.translate(
                str.maketrans("", "", string.punctuation)
            ).split()
            if word.lower() not in stop_words
        ]
    )
    for sentence in data["review"]
]

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\iscie\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
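
To see what the cleaning step does to a single review, the same logic can be applied to one sentence. This sketch uses a small hand-written stop-word set as a stand-in for NLTK's English list, and `clean_review` is a hypothetical helper name, not part of the notebook.

```python
import string

# Hypothetical mini stop-word list standing in for NLTK's English list
TOY_STOP_WORDS = {"the", "was", "a", "and", "it", "of"}

def clean_review(text, stop_words=TOY_STOP_WORDS):
    """Strip punctuation, lowercase, and drop stop words (same steps as the cell above)."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(
        word.lower() for word in no_punct.split() if word.lower() not in stop_words
    )

sample = "The movie was great, and the cast was a joy!"
cleaned_sample = clean_review(sample)
print(cleaned_sample)  # movie great cast joy
```

Only the content-bearing words survive, which shortens the sequences the RNN has to process.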


In [4]:
# `data` is already a DataFrame; continue working with it under the name `df`
df = pd.DataFrame(data)

In [6]:
# Tokenize the review data
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df["review"])
sequences = tokenizer.texts_to_sequences(df["review"])
word_index = tokenizer.word_index
print("Vocab size:", len(word_index))

# Padding process
maxlen = max(len(seq) for seq in sequences)
X = pad_sequences(sequences, maxlen=maxlen)
print("Max length:", maxlen)
print("X shape:", X.shape)

Vocab size: 181543
Max length: 1449
X shape: (50000, 1449)
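
What tokenization and padding do can be sketched in plain Python and NumPy. This is a simplified imitation of the idea, not the Keras implementation: words are numbered by frequency with 0 reserved for padding, and shorter sequences are left-padded with zeros (the default behaviour of `pad_sequences`). The helper names and toy reviews are hypothetical.

```python
from collections import Counter

import numpy as np

def build_word_index(texts):
    """Map each word to an integer, most frequent first; 0 stays reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def pad_pre(seqs, length):
    """Left-pad with zeros; longer sequences keep only their last `length` tokens."""
    padded = np.zeros((len(seqs), length), dtype=int)
    for row, seq in enumerate(seqs):
        kept = seq[-length:]
        padded[row, length - len(kept):] = kept
    return padded

toy_reviews = ["good movie", "bad movie really bad"]
toy_index = build_word_index(toy_reviews)
toy_seqs = [[toy_index[w] for w in text.split()] for text in toy_reviews]
toy_X = pad_pre(toy_seqs, length=4)

print(toy_index)
print(toy_X)
```

Every row ends up with the same length, so the batch can be stored as one rectangular integer array like the `X` above.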


In [8]:
# Label encoding
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df["sentiment"])
print("Y", y)
print("Y", y.shape)

Y [1 1 1 ... 0 0 0]
Y (50000,)


In [16]:
# Train-Test split
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)

In [17]:
# Word Embedding: Word2Vec, Embedding matrix
sentences = [text.split() for text in df["review"]]
word2vec_model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

embedding_dim = 100
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in word2vec_model.wv:
        embedding_matrix[i] = word2vec_model.wv[word]
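
The role of the embedding matrix can be illustrated on a toy scale: row `i` holds the vector for the word with index `i`, and looking up a batch of token ids is just row indexing. The word index and vectors below are made up for the example and stand in for the tokenizer and the Word2Vec model.

```python
import numpy as np

toy_dim = 3
# Hypothetical word index and pretrained vectors
toy_word_index = {"movie": 1, "great": 2, "boring": 3}
toy_vectors = {
    "movie": np.array([0.1, 0.2, 0.3]),
    "great": np.array([0.9, 0.8, 0.7]),
}  # "boring" is deliberately missing

toy_matrix = np.zeros((len(toy_word_index) + 1, toy_dim))
for word, i in toy_word_index.items():
    if word in toy_vectors:
        toy_matrix[i] = toy_vectors[word]

# A padded batch of token ids; 0 is the padding index
toy_batch = np.array([[0, 1, 2], [0, 0, 3]])
looked_up = toy_matrix[toy_batch]   # shape: (batch, timesteps, toy_dim)
print(looked_up.shape)
print(looked_up[0])
```

Both the padding index and any word without a pretrained vector map to an all-zero row, so they contribute no signal to the recurrent layer's input.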

In [18]:
# Build RNN Model
model = Sequential()
model.add(
    Embedding(
        input_dim=len(word_index) + 1,
        output_dim=embedding_dim,
        weights=[embedding_matrix],
        input_length=maxlen,
        trainable=False,
    )
)
model.add(SimpleRNN(units=100, return_sequences=False))
model.add(Dense(1, activation="sigmoid"))

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
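
As a sanity check on the model size, the parameter counts can be worked out by hand from the hidden-state equation: the recurrent layer holds an input-to-hidden matrix, a hidden-to-hidden matrix, and a bias. The helper functions below are hypothetical; the numbers come from the vocabulary size and layer settings shown above.

```python
def simple_rnn_params(input_dim, units):
    # W: units x input_dim, U: units x units, b: units
    return units * input_dim + units * units + units

def dense_params(input_dim, units):
    # weight matrix plus bias
    return input_dim * units + units

vocab_rows = 181543 + 1   # vocabulary size printed earlier, plus the padding index
emb_dim = 100
rnn_units = 100

frozen_embedding_params = vocab_rows * emb_dim
rnn_layer_params = simple_rnn_params(emb_dim, rnn_units)
output_layer_params = dense_params(rnn_units, 1)

print("frozen embedding:", frozen_embedding_params)   # 18154400
print("SimpleRNN:", rnn_layer_params)                 # 20100
print("Dense:", output_layer_params)                  # 101
```

Almost all parameters sit in the frozen embedding; only about twenty thousand weights are actually trained.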

In [30]:
# Train RNN Model
model.fit(X_train, Y_train, epochs=5, batch_size=64, validation_data=(X_test, Y_test))

Epoch 1/5
665/665 ━━━━━━━━━━━━━━━━━━━━ 121s 183ms/step - accuracy: 0.6754 - loss: 0.5888 - val_accuracy: 0.6803 - val_loss: 0.5808
Epoch 2/5
665/665 ━━━━━━━━━━━━━━━━━━━━ 126s 190ms/step - accuracy: 0.7011 - loss: 0.5611 - val_accuracy: 0.7116 - val_loss: 0.5516
Epoch 3/5
665/665 ━━━━━━━━━━━━━━━━━━━━ 122s 183ms/step - accuracy: 0.7258 - loss: 0.5353 - val_accuracy: 0.7627 - val_loss: 0.4926
Epoch 4/5
665/665 ━━━━━━━━━━━━━━━━━━━━ 124s 186ms/step - accuracy: 0.7599 - loss: 0.4950 - val_accuracy: 0.6300 - val_loss: 0.6542
Epoch 5/5
665/665 ━━━━━━━━━━━━━━━━━━━━ 121s 183ms/step - accuracy: 0.7538 - loss: 0.5045 - val_accuracy: 0.7924 - val_loss: 0.4545


<keras.src.callbacks.history.History at 0x13341eb5b80>
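
The training call above passes the test split as `validation_data`, so the validation and test figures describe the same samples. A common alternative is to carve a separate validation set out of the data. The sketch below shows one way to do this with index arrays in NumPy; the function name and the 15% fractions are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Return disjoint (train, validation, test) index arrays from a shuffled range."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test : n_test + n_val]
    train_idx = idx[n_test + n_val :]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(50_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays could then be used to slice the padded inputs and labels, with the validation slice passed to `validation_data` and the test slice kept aside for the final evaluation.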

In [31]:
# Evaluate RNN Model
loss, accuracy = model.evaluate(X_test, Y_test)
print("Test loss:", loss)
print("Test accuracy:", accuracy)

235/235 ━━━━━━━━━━━━━━━━━━━━ 11s 47ms/step - accuracy: 0.7935 - loss: 0.4556
Test loss: 0.4545172154903412
Test accuracy: 0.7924000024795532
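
The reported accuracy is the share of reviews whose thresholded prediction matches the label. The sketch below reproduces that computation by hand on made-up sigmoid outputs and labels, and also counts the four confusion-matrix cells, which accuracy alone does not reveal.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels (1 = positive, 0 = negative)
toy_probs = np.array([0.91, 0.40, 0.55, 0.08, 0.73])
toy_labels = np.array([1, 0, 0, 0, 1])

toy_preds = (toy_probs > 0.5).astype(int)          # same 0.5 threshold as binary accuracy
manual_accuracy = float((toy_preds == toy_labels).mean())

true_pos = int(((toy_preds == 1) & (toy_labels == 1)).sum())
false_pos = int(((toy_preds == 1) & (toy_labels == 0)).sum())
true_neg = int(((toy_preds == 0) & (toy_labels == 0)).sum())
false_neg = int(((toy_preds == 0) & (toy_labels == 1)).sum())

print("accuracy:", manual_accuracy)
print("TP, FP, TN, FN:", true_pos, false_pos, true_neg, false_neg)
```

Applied to the real model, the probabilities would come from `model.predict` on the test inputs instead of the hard-coded array.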


In [33]:
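# Classify a raw sentence with the trained model.
# Note: the Tokenizer lowercases and strips most punctuation by default, but the
# stop-word removal applied to the training reviews is not repeated here.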
def classify_sentence(sentence):
    seq = tokenizer.texts_to_sequences([sentence])
    padded_seq = pad_sequences(seq, maxlen=maxlen)

    prediction = model.predict(padded_seq)
    predicted_class = (prediction > 0.5).astype(int)
    label = "positive" if predicted_class[0][0] == 1 else "negative"
    return label

In [34]:
pos_sentence = "The movie was absolutely amazing! The storyline was captivating from start to finish, and the performances by the cast were outstanding. The visuals were stunning, and the soundtrack perfectly complemented the mood of each scene. It was an emotional rollercoaster, and I thoroughly enjoyed every moment of it. Highly recommend!"
pos_result = classify_sentence(pos_sentence)

print("Guess:", pos_result)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step
Guess: positive


In [35]:
neg_sentence = "The movie was a huge disappointment. The plot was predictable and lacked depth, and the characters felt one-dimensional. The pacing was slow, making it hard to stay engaged, and the special effects were underwhelming. Overall, it didn’t live up to the hype, and I wouldn't recommend it."
neg_result = classify_sentence(neg_sentence)

print("Guess:", neg_result)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
Guess: negative
