# RNN (Recurrent Neural Network)
**RNN (Recurrent Neural Network)** is a type of neural network designed for processing **sequential data**, such as time series, text, or speech. Unlike feedforward networks (e.g., CNNs), RNNs maintain a memory in the form of **hidden states**, which lets them retain information from previous time steps and use it to influence later outputs.

## Core Idea of RNN
RNNs use **recurrent connections**, meaning the output at each time step depends on:
* The **current input** (e.g., a word in a sentence).
* The **previous hidden state** (memory of past data). 
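
The two dependencies listed above can be written as a single update rule, `h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh)`. The sketch below unrolls that rule by hand and compares it with PyTorch's `nn.RNN`, reusing the layer's own weights. The sizes (`input_dim`, `hidden_dim`, `seq_len`) are arbitrary values chosen for illustration, not taken from the example that follows.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

input_dim, hidden_dim, seq_len = 4, 3, 5   # illustrative sizes (assumptions)
rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)

x = torch.randn(1, seq_len, input_dim)     # [batch=1, seq_len, input_dim]
h = torch.zeros(1, hidden_dim)             # initial hidden state h_0

# Unroll the recurrence by hand, reusing the layer's own weights.
manual_states = []
with torch.no_grad():
    for t in range(seq_len):
        h = torch.tanh(
            x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0   # current input
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0       # previous hidden state
        )
        manual_states.append(h)
    manual_out = torch.stack(manual_states, dim=1)  # [1, seq_len, hidden_dim]
    layer_out, layer_last = rnn(x)

# Should print True if the hand-written loop matches the layer.
print(torch.allclose(manual_out, layer_out, atol=1e-5))
```

The same weights are applied at every time step; only the hidden state `h` changes as the sequence is consumed.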

In [2]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Example
texts = ["I love this movie!", 
         "This film is terrible.", 
         "Great story!", 
         "Worst acting ever."]
labels = [1, 0, 1, 0]  # 1=positive, 0=negative

# Build the vocabulary from the training texts
word_to_idx = {"<PAD>": 0, "<UNK>": 1}  # index 0 = padding, index 1 = unknown word
for text in texts:
    for word in text.lower().split():
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)

print(word_to_idx)

# Convert text into a sequence of vocabulary indices (unknown words map to <UNK>)
def text_to_indices(text, word_to_idx):
    return [word_to_idx.get(word.lower(), word_to_idx["<UNK>"]) 
            for word in text.split()]

# Dataset
class TextDataset(Dataset):
    def __init__(self, texts, labels, word_to_idx):
        self.texts = [text_to_indices(text, word_to_idx) for text in texts]
        self.labels = labels
        
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        return torch.tensor(self.texts[idx]), torch.tensor(self.labels[idx])

# Pad every sequence in a batch to the length of the longest one
def collate_fn(batch):
    texts, labels = zip(*batch)
    texts_padded = torch.nn.utils.rnn.pad_sequence(texts, batch_first=True, padding_value=0)
    return texts_padded, torch.stack(labels)

dataset = TextDataset(texts, labels, word_to_idx)
dataloader = DataLoader(dataset, batch_size=2, collate_fn=collate_fn)

{'<PAD>': 0, '<UNK>': 1, 'i': 2, 'love': 3, 'this': 4, 'movie!': 5, 'film': 6, 'is': 7, 'terrible.': 8, 'great': 9, 'story!': 10, 'worst': 11, 'acting': 12, 'ever.': 13}
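
The printed vocabulary keeps punctuation attached to words (`'movie!'`, `'terrible.'`), so a later input such as `movie` would fall back to `<UNK>`. One possible alternative, shown here only as a sketch, is to tokenize with a regular expression. The helpers `tokenize` and `build_vocab` are hypothetical names and are not part of the notebook.

```python
import re

def tokenize(text):
    # Lowercase, then keep only runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(texts):
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for text in texts:
        for word in tokenize(text):
            # Assign the next free index only if the word is new.
            vocab.setdefault(word, len(vocab))
    return vocab

texts = ["I love this movie!", "This film is terrible.",
         "Great story!", "Worst acting ever."]
vocab = build_vocab(texts)
print(vocab)
```

If this tokenizer were adopted, the same function would also have to be used when converting text to indices, so that training and prediction see identical tokens.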


In [3]:
class SimpleRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        embedded = self.embedding(x)  # [batch, seq_len, embed_dim]
        _, hidden = self.rnn(embedded)  # hidden: [1, batch, hidden_dim]
        output = self.fc(hidden.squeeze(0))  # [batch, output_dim]
        return output

# Hyperparameters
VOCAB_SIZE = len(word_to_idx)
EMBED_DIM = 16
HIDDEN_DIM = 32
OUTPUT_DIM = 1  # a single logit for binary classification

model = SimpleRNN(VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM)
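
`SimpleRNN` reads the hidden state after the last position of the padded batch, so for shorter sequences that state has also been updated on the trailing `<PAD>` steps. A common way to avoid this is `pack_padded_sequence`. The class below is a hypothetical variant (`PackedRNN` is not part of the notebook) and assumes that index 0 is used only for padding and that every sequence contains at least one real token.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class PackedRNN(nn.Module):  # hypothetical variant of SimpleRNN
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # True sequence lengths, assuming index 0 is reserved for <PAD>.
        lengths = (x != 0).sum(dim=1).cpu()
        embedded = self.embedding(x)
        packed = pack_padded_sequence(
            embedded, lengths, batch_first=True, enforce_sorted=False
        )
        # hidden holds the state at each sequence's last real token.
        _, hidden = self.rnn(packed)
        return self.fc(hidden.squeeze(0))

torch.manual_seed(0)
packed_model = PackedRNN(vocab_size=14, embed_dim=16, hidden_dim=32, output_dim=1)
packed_model.eval()

short = torch.tensor([[9, 10]])          # a 2-token sequence
padded = torch.tensor([[9, 10, 0, 0]])   # the same sequence with padding
with torch.no_grad():
    # Padding should no longer change the result.
    print(torch.allclose(packed_model(short), packed_model(padded), atol=1e-5))
```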

In [4]:
# Loss function and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(10):
    for batch_texts, batch_labels in dataloader:
        optimizer.zero_grad()
        predictions = model(batch_texts).squeeze(1)  # logits, shape [batch]
        loss = criterion(predictions, batch_labels.float())
        loss.backward()
        optimizer.step()
    # Reports the loss of the last batch in the epoch, not the epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

Epoch 1, Loss: 0.7415
Epoch 2, Loss: 0.4603
Epoch 3, Loss: 0.2376
Epoch 4, Loss: 0.1139
Epoch 5, Loss: 0.0552
Epoch 6, Loss: 0.0289
Epoch 7, Loss: 0.0168
Epoch 8, Loss: 0.0107
Epoch 9, Loss: 0.0074
Epoch 10, Loss: 0.0054
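
To make the loss values easier to interpret: `nn.BCEWithLogitsLoss` applies a sigmoid to the logit and then computes binary cross-entropy, using a numerically stable formulation. The per-example quantity can be sketched in plain Python as follows; the function names are illustrative, and the default mean reduction over the batch is left out.

```python
import math

def bce_with_logits(logit, target):
    # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def naive_bce(logit, target):
    # Direct form: apply the sigmoid first, then cross-entropy.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

for z, y in [(2.0, 1), (2.0, 0), (-1.5, 0)]:
    print(z, y, round(bce_with_logits(z, y), 4), round(naive_bce(z, y), 4))

# A logit of 0 (probability 0.5) gives log(2), about 0.6931, for either label.
print(round(bce_with_logits(0.0, 1), 4))
```

The first-epoch loss of 0.7415 is close to log(2), which is roughly what an untrained model that outputs probabilities near 0.5 would produce.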


In [8]:
def predict_sentiment(model, word_to_idx, sentence):
    model.eval()
    indices = text_to_indices(sentence, word_to_idx)
    tensor = torch.tensor(indices, dtype=torch.long).unsqueeze(0)  # [1, seq_len]
    with torch.no_grad():
        output = torch.sigmoid(model(tensor))
    return "Positive" if output.item() > 0.5 else "Negative"

# Tests
print(predict_sentiment(model, word_to_idx, "This is great!")) 
print(predict_sentiment(model, word_to_idx, "I hate it"))      

Negative
Positive
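
Both predictions run against intuition. With only four training sentences this is not surprising, and one likely contributing factor is that several test words are absent from the vocabulary and therefore map to `<UNK>`. The check below copies the vocabulary from the printed output of the first cell and applies the same whitespace tokenization; `unknown_words` is a hypothetical helper added for illustration.

```python
# Vocabulary copied from the printed output of the first cell.
word_to_idx = {'<PAD>': 0, '<UNK>': 1, 'i': 2, 'love': 3, 'this': 4, 'movie!': 5,
               'film': 6, 'is': 7, 'terrible.': 8, 'great': 9, 'story!': 10,
               'worst': 11, 'acting': 12, 'ever.': 13}

def unknown_words(sentence, word_to_idx):
    # Same lowercase + whitespace tokenization as text_to_indices.
    return [w for w in sentence.lower().split() if w not in word_to_idx]

for sentence in ["This is great!", "I hate it"]:
    print(sentence, "->", unknown_words(sentence, word_to_idx))
# This is great! -> ['great!']
# I hate it -> ['hate', 'it']
```

The sentiment-bearing words in both sentences (`great!`, `hate`) are unknown to the model, so its outputs here say little about what an RNN can learn from a realistic dataset.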


## Pros & Cons
### Pros
1. **Sequence Modeling**
* RNNs are **natively designed** to process sequential data (e.g., time series, text, speech) by maintaining a **hidden state** that captures temporal dependencies.
2. **Variable-Length Inputs**
* Unlike feedforward networks with a fixed input size, RNNs can handle **inputs of varying lengths** (e.g., sentences of different lengths).
3. **Parameter Sharing**
* The same weights are reused across time steps, so the number of parameters does not grow with sequence length, unlike a fully connected network applied to the whole flattened sequence.
4. **Memory of Past Information**
* Hidden states act as a "memory" of previous inputs, summarizing the sequence seen so far in a fixed-size vector.
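
The parameter-sharing point can be checked with simple arithmetic. The sketch below counts the parameters of a single-layer recurrent cell (input-to-hidden weights, hidden-to-hidden weights, and two bias vectors, mirroring the layout of a one-layer `nn.RNN`) and compares it with one fully connected layer over the flattened sequence. Both function names are hypothetical; the sizes 16 and 32 match `EMBED_DIM` and `HIDDEN_DIM` from the example.

```python
def rnn_param_count(input_dim, hidden_dim):
    # Input-to-hidden weights, hidden-to-hidden weights, and two bias vectors.
    return input_dim * hidden_dim + hidden_dim * hidden_dim + 2 * hidden_dim

def dense_param_count(seq_len, input_dim, hidden_dim):
    # One fully connected layer over the flattened sequence (weights + bias).
    return seq_len * input_dim * hidden_dim + hidden_dim

for seq_len in (10, 100, 1000):
    print(seq_len, rnn_param_count(16, 32), dense_param_count(seq_len, 16, 32))
# 10 1600 5152
# 100 1600 51232
# 1000 1600 512032
```

The recurrent count stays at 1600 regardless of sequence length, while the dense layer grows linearly with it.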

### Cons
1. **Vanishing/Exploding Gradients**
* When backpropagating through long sequences, gradients are multiplied across many time steps and can **vanish** (become too small) or **explode** (become too large), making training difficult.
2. **Short-Term Memory**
* Standard RNNs struggle to retain **long-range dependencies** (e.g., a word at the start of a paragraph influencing the end).
3. **Computationally Slow**
* RNNs process data **sequentially** (one time step at a time), which prevents parallelization across the time steps of a sequence.
4. **Difficulty with Very Long Sequences**
* Even with gated variants such as LSTM and GRU, extremely long sequences (e.g., books, hour-long audio) remain challenging.
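
The vanishing and exploding gradient problem can be illustrated with a deliberately simplified model: a linear scalar recurrence `h_t = w * h_(t-1) + x_t`. This is an assumption made for clarity, not the full RNN. By the chain rule, the gradient of the final state with respect to the initial state picks up one factor of `w` per time step. In a `tanh` RNN each factor is additionally scaled by `1 - h_t^2`, which is at most 1.

```python
def gradient_through_time(w, steps):
    # d h_T / d h_0 for the linear scalar recurrence h_t = w * h_{t-1} + x_t.
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, 50))
```

After 50 steps the gradient is on the order of 1e-15 for `w = 0.5` and on the order of 1e8 for `w = 1.5`; only `w = 1.0` keeps it stable.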