## Understanding the Need for Sequence Modeling in NLP

Natural Language Processing (NLP) tasks often involve sequential data such as sentences, documents, or dialogues. Unlike fixed-size tabular data, sequences have an inherent order, and their elements depend on one another. For example, the meaning of a word often depends on the words before or after it ("bank" in "river bank" versus "bank account"). Models that treat their input as a single fixed-size, unordered feature vector struggle to capture these dependencies. This limits their usefulness for tasks like language modeling, machine translation, and sentiment analysis.

### Limitations of Conventional Approaches
1. **Fixed Input Size**: Standard neural networks like feedforward networks are designed for fixed-size inputs and outputs, making them unsuitable for variable-length sequences.
2. **Context Independence**: When tokens are fed in one at a time, each is processed independently, so the relationships between elements of the sequence are ignored.
3. **Memory Constraints**: There is no mechanism to retain or leverage the context of prior inputs for decision-making.

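To address these limitations, **Recurrent Neural Networks (RNNs)** were developed. An RNN processes a sequence one element at a time and carries a hidden state forward, so each step can use context from earlier inputs, whatever the sequence length.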



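### Working Around Fixed-Size Inputs

Consider a batch of token embeddings with shape `(bs, seq_len, emb_size) = (2, 256, 300)`. A feedforward network needs one fixed-size vector per example, which leaves two common options:

- **Case 1, pooling:** average (or max) over the sequence dimension, giving features of shape `(2, 300)`. This works for any `seq_len` but discards word order.
- **Case 2, no pooling (flattening):** concatenate all token embeddings, giving features of shape `(2, 256 * 300)`. This keeps the order, but the network is then tied to sequences of exactly 256 tokens.

A sequence model, by contrast, is a function over the whole tensor with one of two kinds of output:

- `f((bs, seq_len, emb_size)) -> (bs, seq_len, emb_size)`: one output vector per token. For an RNN, the last dimension is the hidden size rather than `emb_size`.
- `f((bs, seq_len, emb_size)) -> (bs, sent_emb)`: a single vector per sequence, such as a sentence embedding.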
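The two options for a `(2, 256, 300)` batch, pooling and flattening, can be checked directly. This sketch assumes mean pooling; max pooling behaves the same way with respect to shapes. The helpers `pool_sequence` and `flatten_sequence` are hypothetical names for illustration, not library functions. The sketch also shows the trade-off of each option:

- **Pooling** gives the same features when the tokens are shuffled.
- **Flattening** produces a feedforward layer that rejects any other sequence length.

```python
import torch

bs, seq_len, emb_size = 2, 256, 300
x = torch.rand(bs, seq_len, emb_size)  # a batch of random token embeddings


def pool_sequence(x: torch.Tensor) -> torch.Tensor:
    """Case 1 (hypothetical helper): mean over the sequence dimension -> (bs, emb_size)."""
    return x.mean(dim=1)


def flatten_sequence(x: torch.Tensor) -> torch.Tensor:
    """Case 2 (hypothetical helper): concatenate every token embedding -> (bs, seq_len * emb_size)."""
    return x.reshape(x.shape[0], -1)


pooled = pool_sequence(x)
flat = flatten_sequence(x)
print(pooled.shape, flat.shape)  # torch.Size([2, 300]) torch.Size([2, 76800])

# Pooling ignores order: shuffling the tokens gives (numerically) the same features
shuffled = x[:, torch.randperm(seq_len)]
order_invariant = torch.allclose(pool_sequence(shuffled), pooled, atol=1e-6)

# Flattening ties a feedforward layer to a single sequence length
ffn = torch.nn.Linear(seq_len * emb_size, 16)
ffn(flat)  # works for seq_len == 256
try:
    ffn(flatten_sequence(torch.rand(bs, 200, emb_size)))  # 200 tokens instead of 256
    length_mismatch_raised = False
except RuntimeError:
    length_mismatch_raised = True

print(order_invariant, length_mismatch_raised)
```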



## Recurrent Neural Networks (RNNs)

RNNs are a type of neural network designed to process sequential data by maintaining a hidden state that captures information from previous elements of the sequence. This makes them well-suited for modeling sequences in NLP and other domains like speech recognition and time-series forecasting.

### How RNNs Work
At each time step $t$, the RNN processes the current input $x_t$ and the hidden state from the previous step $h_{t-1}$, producing an updated hidden state $h_t$. This process can be described mathematically as:

$$
h_t = f(W_h h_{t-1} + W_x x_t + b)
$$

Here:
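- $x_t$ is the input vector at time $t$ (for example, a word embedding).
- $W_h$ and $W_x$ are weight matrices, shared across all time steps.
- $b$ is the bias term.
- $f$ is a nonlinear activation function, commonly $\tanh$.
- $h_t$ is the hidden state at time $t$, which serves as the "memory" of the network.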
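A single step of this recurrence is easy to compute by hand. This sketch makes two assumptions:

- The activation is $f = \tanh$.
- The parameters are small random tensors, illustrative values rather than trained weights.

The sketch also checks that applying one matrix $[W_x \; W_h]$ to the concatenated vector $[x_t; h_{t-1}]$ gives the same result. This is why an RNN cell can be implemented as a single linear layer over the concatenation.

```python
import torch

torch.manual_seed(0)
input_dim, hidden_dim = 3, 4

# Parameters of one RNN cell (random, for illustration only)
W_x = torch.randn(hidden_dim, input_dim)
W_h = torch.randn(hidden_dim, hidden_dim)
b = torch.randn(hidden_dim)

x_t = torch.randn(input_dim)      # current input
h_prev = torch.randn(hidden_dim)  # previous hidden state h_{t-1}

# h_t = f(W_h h_{t-1} + W_x x_t + b) with f = tanh
h_t = torch.tanh(W_h @ h_prev + W_x @ x_t + b)

# Equivalent form: one weight matrix applied to the concatenation [x_t; h_{t-1}]
W = torch.cat([W_x, W_h], dim=1)  # (hidden_dim, input_dim + hidden_dim)
h_t_concat = torch.tanh(W @ torch.cat([x_t, h_prev]) + b)

print(torch.allclose(h_t, h_t_concat, atol=1e-5))
```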

The output $y_t$ at each time step can be computed as:

$$
y_t = g(W_y h_t + c)
$$

where $W_y$ is the output weight matrix, $c$ is the output bias, and $g$ is the output activation function.
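Unrolling both equations over a short sequence shows that the same parameters are reused at every step. This sketch makes three assumptions:

- The hidden activation is $f = \tanh$.
- The output activation $g$ is a sigmoid, as for a per-step binary prediction.
- The dimensions and random weights are arbitrary and chosen only for illustration.

```python
import torch

torch.manual_seed(0)
seq_len, input_dim, hidden_dim, output_dim = 5, 3, 4, 2

# One set of parameters, shared by every time step
W_x = torch.randn(hidden_dim, input_dim) * 0.5
W_h = torch.randn(hidden_dim, hidden_dim) * 0.5
b = torch.zeros(hidden_dim)
W_y = torch.randn(output_dim, hidden_dim) * 0.5
c = torch.zeros(output_dim)

xs = torch.randn(seq_len, input_dim)  # one sequence x_1, ..., x_T
h = torch.zeros(hidden_dim)           # h_0

hs, ys = [], []
for x_t in xs:                              # iterate over time steps
    h = torch.tanh(W_h @ h + W_x @ x_t + b)  # h_t = f(W_h h_{t-1} + W_x x_t + b)
    y_t = torch.sigmoid(W_y @ h + c)        # y_t = g(W_y h_t + c), g = sigmoid (assumed)
    hs.append(h)
    ys.append(y_t)

hs = torch.stack(hs)  # (seq_len, hidden_dim)
ys = torch.stack(ys)  # (seq_len, output_dim)
print(hs.shape, ys.shape)
```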

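RNNs are trained with backpropagation through time (BPTT). The network is unrolled over the sequence, and gradients are propagated backward through every time step. Because the same $W_h$ is applied at each step, the gradient reaching early steps is repeatedly multiplied by terms involving $W_h$ and the derivative of $f$. It can therefore shrink toward zero (vanishing gradients) or grow very large (exploding gradients), which makes long-term dependencies hard to learn.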
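The effect can be seen in a deliberately simplified recurrence, an assumption chosen for illustration rather than a full RNN:

- There is no input.
- The recurrent weight is $W_h = s I$.
- A $\tanh$ can optionally follow each step.

Backpropagating through $T$ steps multiplies the gradient by $s$ at every step. In the linear case, $\partial h_T / \partial h_0$ therefore scales exactly like $s^T$: it vanishes for $s < 1$ and explodes for $s > 1$. The function name `grad_norm_through_time` is hypothetical.

```python
import torch


def grad_norm_through_time(scale: float, steps: int, hidden_dim: int = 4,
                           use_tanh: bool = False) -> float:
    """Norm of d(sum(h_T)) / d(h_0) for the toy recurrence h_t = act(scale * h_{t-1})."""
    h0 = torch.full((hidden_dim,), 0.1, requires_grad=True)
    W_h = scale * torch.eye(hidden_dim)
    h = h0
    for _ in range(steps):
        h = W_h @ h
        if use_tanh:
            h = torch.tanh(h)  # tanh'(z) <= 1, so it can only shrink the gradient further
    h.sum().backward()         # backpropagation through all `steps` time steps
    return h0.grad.norm().item()


for steps in (5, 20, 50):
    print(steps,
          grad_norm_through_time(0.5, steps),  # vanishing: proportional to 0.5**steps
          grad_norm_through_time(1.5, steps))  # exploding: proportional to 1.5**steps
```

With four hidden units the linear-case norm is exactly $2 s^T$, since each of the four gradient components equals $s^T$.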


## Implementing an RNN Model in PyTorch

### Step 1: Generate Random Data
We'll create a random batch of input sequences, standing in for word embeddings, plus an initial hidden state. The inputs follow the `(bs, seq_len, emb_size)` convention: 2 sequences of 10 tokens, each token a 300-dimensional vector.

```python
import torch

bs, seq_len, emb_size = 2, 10, 300
xt = torch.rand((bs, seq_len, emb_size))  # (batch, time, features)
print(xt.shape)
```

```
torch.Size([2, 10, 300])
```


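```python
hidden_dim = 256

# Initial hidden state h_0, one vector per sequence (random here; zeros are the usual default)
h0 = torch.rand((bs, hidden_dim))
print(h0.shape)
```

```
torch.Size([2, 256])
```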


### Step 2: Define an RNN Model
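Using PyTorch, we'll implement a basic RNN by hand. The model does not use separate matrices $W_x$ and $W_h$. Instead, it concatenates $x_t$ and $h_{t-1}$ and applies one linear layer (`linear1`), which is equivalent to $W_x x_t + W_h h_{t-1} + b$. A ReLU activation plays the role of $f$. A second linear layer (`linear2`) maps every hidden state to a prediction, playing the role of $W_y h_t + c$ with $g$ as the identity.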


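```python
import torch


class MyRNN(torch.nn.Module):
    """A minimal RNN: h_t = ReLU(W [x_t; h_{t-1}] + b), prediction_t = W_y h_t + c."""

    def __init__(self, input_dim, hidden_size, output_dim):
        super().__init__()
        # One linear layer over [x_t; h_{t-1}] stands in for W_x x_t + W_h h_{t-1} + b
        self.linear1 = torch.nn.Linear(input_dim + hidden_size, hidden_size)
        self.activation = torch.nn.ReLU()
        # Output layer applied to the hidden state at every time step
        self.linear2 = torch.nn.Linear(hidden_size, output_dim)

    def forward(self, xt, h):
        # xt: (bs, seq_len, input_dim); h: (bs, hidden_size) initial hidden state
        seq_len = xt.shape[1]

        hidden_outputs = []
        for t in range(seq_len):
            x = torch.cat([xt[:, t], h], dim=1)    # (bs, input_dim + hidden_size)
            h = self.activation(self.linear1(x))  # (bs, hidden_size)
            hidden_outputs.append(h)
        # stack -> (seq_len, bs, hidden_size); permute -> (bs, seq_len, hidden_size)
        hidden_states = torch.stack(hidden_outputs).permute(1, 0, 2)
        predictions = self.linear2(hidden_states)  # (bs, seq_len, output_dim)
        return hidden_states, h, predictions
```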

```python
rnn = MyRNN(emb_size, hidden_dim, 1)  # one output value per time step
hidden_states, h, predictions = rnn(xt, h0)
print(hidden_states.shape)
```

```
torch.Size([2, 10, 256])
```

```python
print(h.shape)  # final hidden state after the last time step
```

```
torch.Size([2, 256])
```

### Step 3: Use PyTorch's Built-in RNN Layer
Rather than writing the loop by hand, we can use `torch.nn.RNN`. It implements the recurrence $h_t = f(W_h h_{t-1} + W_x x_t + b)$, with $f = \tanh$ by default and two bias vectors in place of the single $b$, and runs the loop over time steps internally.


#### Understand Key Components

1. **RNN Layer**:
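   - `nn.RNN(input_size, hidden_size, num_layers=1, nonlinearity='tanh', batch_first=False, ...)`:
     - `input_size`: Number of features in each input vector $x_t$ (here `emb_size = 300`).
     - `hidden_size`: Number of features in the hidden state $h_t$ (here `hidden_dim = 256`).
     - `num_layers`: Number of stacked RNN layers (default 1).
     - `nonlinearity`: Activation $f$, either `'tanh'` (default) or `'relu'`.
     - `batch_first`: If `True`, the input and output tensors are shaped as `(batch, seq, feature)` instead of the default `(seq, batch, feature)`. It does not change the layout of the returned hidden state.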
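With `num_layers > 1` the layers are stacked, so the second layer reads the sequence of hidden states produced by the first. The following sketch assumes two layers and reuses this tutorial's shapes to show how stacking changes the returned tensors:

- `output` still holds one vector per time step, taken from the top layer.
- The final hidden state gains one entry per layer.

```python
import torch

bs, seq_len, emb_size, hidden_dim = 2, 10, 300, 256
x = torch.rand(bs, seq_len, emb_size)

# Two stacked layers: layer 2 consumes the hidden states produced by layer 1
rnn2 = torch.nn.RNN(input_size=emb_size, hidden_size=hidden_dim,
                    num_layers=2, batch_first=True)
output, h_n = rnn2(x)

print(output.shape)  # torch.Size([2, 10, 256]): top layer's h_t at every step
print(h_n.shape)     # torch.Size([2, 2, 256]): (num_layers, bs, hidden_dim)
```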


```python
# batch_first=True:  (BS, seq_len, emb_size)
# batch_first=False: (seq_len, BS, emb_size)  (the default)
```
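The two layouts can be compared directly. This sketch proceeds in three steps:

1. It copies the weights of a `batch_first=True` module into a default (`batch_first=False`) module.
2. It feeds each module the appropriately transposed input.
3. It checks that the outputs agree.

The final hidden state has the same `(num_layers, batch, hidden)` shape in both cases.

```python
import torch

bs, seq_len, emb_size, hidden_dim = 2, 10, 300, 256
x_batch_first = torch.rand(bs, seq_len, emb_size)  # (BS, seq_len, emb_size)
x_seq_first = x_batch_first.transpose(0, 1)        # (seq_len, BS, emb_size)

rnn_bf = torch.nn.RNN(input_size=emb_size, hidden_size=hidden_dim, batch_first=True)
rnn_sf = torch.nn.RNN(input_size=emb_size, hidden_size=hidden_dim, batch_first=False)
rnn_sf.load_state_dict(rnn_bf.state_dict())        # identical weights in both modules

out_bf, h_bf = rnn_bf(x_batch_first)
out_sf, h_sf = rnn_sf(x_seq_first)

print(out_bf.shape, out_sf.shape)  # batch-major vs time-major outputs
print(h_bf.shape, h_sf.shape)      # same (num_layers, BS, hidden_dim) shape for both
print(torch.allclose(out_bf, out_sf.transpose(0, 1), atol=1e-6))
```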

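```python
rnn = torch.nn.RNN(input_size=emb_size, hidden_size=hidden_dim, num_layers=1, batch_first=True)
```

```python
output = rnn(xt)    # no h_0 passed, so the initial hidden state defaults to zeros
print(len(output))  # nn.RNN returns a tuple: (output, h_n)
```

```
2
```

```python
print(output[0].shape)  # hidden state at every time step: (bs, seq_len, hidden_dim)
```

```
torch.Size([2, 10, 256])
```

```python
print(output[1].shape)  # final hidden state h_n: (num_layers, bs, hidden_dim), even with batch_first=True
```

```
torch.Size([1, 2, 256])
```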
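To connect `nn.RNN` back to the recurrence $h_t = f(W_h h_{t-1} + W_x x_t + b)$, we can recompute its output by hand from the module's own parameters. PyTorch stores them as follows:

- `weight_ih_l0` plays the role of $W_x$.
- `weight_hh_l0` plays the role of $W_h$.
- The two bias vectors `bias_ih_l0` and `bias_hh_l0` sum to the role of $b$.

When no `h0` argument is passed, the initial hidden state is zeros. This sketch uses a freshly initialized module and random input, so its numbers differ from the run above. For a single-layer, unidirectional RNN, the last time step of `output` should also equal `h_n`.

```python
import torch

torch.manual_seed(0)
bs, seq_len, emb_size, hidden_dim = 2, 10, 300, 256
xt = torch.rand(bs, seq_len, emb_size)
rnn = torch.nn.RNN(input_size=emb_size, hidden_size=hidden_dim, num_layers=1, batch_first=True)
output, h_n = rnn(xt)

# nn.RNN with the default nonlinearity computes
#   h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh), starting from h_0 = 0
W_ih, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0  # (256, 300), (256, 256)
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0      # (256,), (256,)

h = torch.zeros(bs, hidden_dim)
manual = []
with torch.no_grad():
    for t in range(seq_len):
        h = torch.tanh(xt[:, t] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        manual.append(h)
manual = torch.stack(manual, dim=1)  # (bs, seq_len, hidden_dim)

print(torch.allclose(manual, output, atol=1e-5))  # the hand-written loop matches nn.RNN
print(torch.allclose(output[:, -1], h_n[0]))      # last time step equals h_n for one layer
```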