    Recurrent Neural Network (RNN):
        A type of artificial neural network designed for processing sequential data or data with a temporal dimension.
        
        The output (hidden state) from the previous step is fed back as an input to the current step.

An RNN processes input sequences step-by-step, where at each time step $ t $, it takes an input $ x_t $ and produces an output $ h_t $, which also serves as the internal state or memory. This internal state at time $ t $ is influenced by the current input $ x_t $ as well as the previous state $ h_{t-1} $, thus allowing the network to maintain a form of memory across different time steps.
   
The basic formula for computing the output $ h_t $ of an RNN at time $ t $ is as follows:

$$ h_t = \text{Activation} (W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h) $$

Where:
- $ h_t $ is the output/internal state at time $ t $.
- $ x_t $ is the input at time $ t $.
- $ W_{xh} $ is the weight matrix applied to the current input $ x_t $.
- $ W_{hh} $ is the weight matrix applied to the previous state $ h_{t-1} $.
- $ b_h $ is the bias term.
- $ \text{Activation} $ is an activation function applied element-wise, commonly a hyperbolic tangent (tanh) or ReLU (Rectified Linear Unit) function.   
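
The recurrence above can be sketched in plain Python. This is a minimal illustration, assuming tanh as the activation; the function name `rnn_step`, the toy sizes (2 input features, 3 hidden units), and all weight values are made up for the example and are not from a trained model.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of h_t = tanh(W_xh . x_t + W_hh . h_{t-1} + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(w * x for w, x in zip(W_xh[i], x_t))     # input contribution
        pre += sum(w * h for w, h in zip(W_hh[i], h_prev))  # recurrent contribution
        h_t.append(math.tanh(pre))
    return h_t

# Hypothetical toy parameters: 3 hidden units, 2 input features.
W_xh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
b_h = [0.0, 0.1, -0.1]

h = [0.0, 0.0, 0.0]  # initial state h_0
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # the same weights are reused at every step
print(h)
```

The same three parameter sets are applied at every time step; only the state `h` changes as the sequence is consumed.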

    Limitations of basic RNNs:
        vanishing or exploding gradient problem
        
        training is very difficult as a result
        
        the network struggles to learn long-range dependencies
            due to difficulties in propagating information over many time steps
            
    To address these we have
        Long Short-Term Memory networks (LSTMs)
        Gated Recurrent Units (GRUs)
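
The vanishing and exploding gradient problem can be illustrated with a deliberately simplified case. The sketch below assumes a scalar recurrence with an identity activation, $h_t = w_{hh} \cdot h_{t-1}$, so the gradient of $h_T$ with respect to $h_0$ is just $w_{hh}^T$. The function name `grad_through_time` is hypothetical. With tanh, each factor is additionally multiplied by a derivative that is at most 1, which can only shrink the product further.

```python
def grad_through_time(w_hh, steps):
    """|d h_T / d h_0| for the scalar linear recurrence h_t = w_hh * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w_hh  # chain rule: one factor of w_hh per time step
    return abs(grad)

for w in (0.5, 1.0, 1.5):
    print(w, grad_through_time(w, 50))
```

A recurrent weight below 1 in magnitude drives the gradient toward zero over 50 steps, while a weight above 1 makes it grow without bound; only the boundary case keeps it stable.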

![image.png](attachment:image.png)

    Advantages of Recurrent Neural Network
        Its hidden state carries information forward through time, so earlier inputs can influence later outputs.

    Types of RNN
        One to One 
            behaves the same as any simple Neural network, also known as Vanilla Neural Network (VNN)
            only one input and one output.
            
        One to Many 
            a single input that generates a sequence of outputs.
            
            eg : generating a sequence of words given an image or a sentence
            
        Many to One 
            a sequence of inputs and produces a single output
            
            eg : sentiment analysis -> predicts the sentiment of the sentence
            
        Many to Many
            Sequence-to-sequence :
                a sequence of inputs and produces a sequence of outputs.
                
                eg : translation of one language to another language
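
The difference between many-to-one and many-to-many comes down to which hidden states are read out. The sketch below assumes a scalar tanh RNN with made-up weights; `run_rnn` is a hypothetical helper. It shows only the aligned form of many-to-many (one output per input step). Translation systems typically use an encoder-decoder arrangement in which input and output lengths can differ, which is not shown here.

```python
import math

def run_rnn(xs, w_xh=0.8, w_hh=0.5, b_h=0.0):
    """Run a scalar tanh RNN over xs and return the hidden state at every step."""
    h, states = 0.0, []
    for x_t in xs:
        h = math.tanh(w_xh * x_t + w_hh * h + b_h)
        states.append(h)
    return states

xs = [1.0, -0.5, 0.25, 2.0]
states = run_rnn(xs)

many_to_many = states     # one output per input step
many_to_one = states[-1]  # only the final state, e.g. passed to a sentiment classifier
print(many_to_many, many_to_one)
```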

# Variations of Recurrent Neural Network (RNN)

    Bidirectional Neural Network (BiNN)
    Long Short-Term Memory (LSTM)

### Bidirectional Neural Network (BiNN)

    A variant of RNN in which the input data is processed in both directions (forward and backward);
    the outputs of both directions are then combined to produce the output.
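
A bidirectional pass can be sketched as two independent RNNs, one reading the sequence left to right and one right to left. This is an illustration under assumptions: the scalar weights are made up, `run_rnn` and `run_bidirectional` are hypothetical names, and the two directions are combined by pairing their states (concatenation). Summing or averaging are other common choices.

```python
import math

def run_rnn(xs, w_xh, w_hh):
    """Scalar tanh RNN; returns the hidden state at every step."""
    h, states = 0.0, []
    for x_t in xs:
        h = math.tanh(w_xh * x_t + w_hh * h)
        states.append(h)
    return states

def run_bidirectional(xs, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair forward and backward states for each position in xs."""
    forward = run_rnn(xs, *fwd)
    # Process the reversed sequence, then flip back so index t lines up with xs[t].
    backward = run_rnn(xs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))

outputs = run_bidirectional([1.0, -0.5, 0.25])
print(outputs)
```

Each position's output then reflects both the inputs before it (forward state) and the inputs after it (backward state).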

### Long Short-Term Memory (LSTM)

    designed to address the vanishing gradient problem and effectively capture long-term dependencies in sequential data.
    
        Forget Gate: Determines what information to discard from the cell state.
        Input Gate: Modifies the cell state by adding new information.
        Output Gate: Determines the output based on the modified cell state.

    key components are the cell state and hidden state.
    
    
    The gates regulate the flow of information, enabling the LSTM to remember or forget information based on the input data.

1. **Cell State:** The cell state, often denoted as $C_t$, is the "memory" of the LSTM unit. It runs through the entire sequence and carries information over different time steps. The cell state allows the LSTM to retain information for long periods, selectively update or erase information through the gates, and pass relevant information to the next time step.

2. **Hidden State (or Output):** The hidden state, also known as the output state or $h_t$, carries information that will be passed to the next LSTM unit (if in a sequence) or used as the final output in tasks where only one output is needed. The hidden state is selectively updated based on the current input, the previous hidden state, and the cell state.

These components interact through various mechanisms within the LSTM unit:

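- **Forget Gate:** $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

- **Input Gate:**
  - $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
  - $\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$
  - $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

- **Output Gate:**
  - $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
  - $h_t = o_t \odot \tanh(C_t)$

In these formulas, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input.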

Here, 
- $f_t$ is the forget gate output.
- $i_t$ is the input gate output.
- $\tilde{C}_t$ is the candidate update for the cell state.
- $C_t$ is the updated cell state.
- $o_t$ is the output gate output.
- $h_t$ is the hidden state output.

These components work together within the LSTM unit to control the flow of information, manage long-term dependencies, and process sequential data effectively.
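
The gate equations can be put together into a single LSTM step. The following is a minimal sketch in plain Python, not a production implementation: `lstm_step`, `affine`, the `params` dictionary layout, and the constant weight value 0.1 are all assumptions made for illustration. Products between gates and states are taken element-wise.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W . v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]      # input gate
    c_hat = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]      # output gate
    # C_t = f_t * C_{t-1} + i_t * candidate, element-wise
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # h_t = o_t * tanh(C_t), element-wise
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical toy setup: 2 hidden units, 1 input feature, constant weights.
hidden, n_in = 2, 1
params = {k: [[0.1] * (hidden + n_in) for _ in range(hidden)]
          for k in ("W_f", "W_i", "W_c", "W_o")}
params.update({k: [0.0] * hidden for k in ("b_f", "b_i", "b_c", "b_o")})

h, c = [0.0] * hidden, [0.0] * hidden
for x_t in [[1.0], [0.5], [-1.0]]:
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Note that two states are carried between steps: the cell state `c` and the hidden state `h`.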



In more detail, the gates operate as follows:

Forget Gate: Determines which parts of the cell state to forget or discard. It takes as input the previous hidden state $h_{t-1}$ and the current input $x_t$ and outputs a forget gate value $f_t$, which is multiplied element-wise with the previous cell state $C_{t-1}$. It controls the information retention or deletion from the cell state.

Input Gate: Decides what new information to incorporate into the cell state. It consists of two parts: the input gate and a candidate update. The input gate decides which parts of the candidate update (derived from the current input and previous hidden state) will be added to the cell state. The candidate update is first calculated using a tanh activation function and then multiplied by the input gate's output to produce the update to the cell state.

Update Cell State: The cell state is updated by combining the information retained from the previous cell state (modified by the forget gate) and the new information chosen by the input gate. This step creates the updated cell state $C_t$.

Output Gate: Determines what part of the cell state will be output as the hidden state. The output gate's value is calculated from the current input and the previous hidden state. The hidden state is then obtained by passing the updated cell state through a tanh activation function and multiplying the result element-wise by the output gate's value.
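
To see how the forget gate controls retention, consider a simplified scalar cell in which the gate values are held constant across steps (an assumption for illustration only; in a real LSTM they are recomputed from the input and hidden state at every step). The function name `carry_cell_state` is hypothetical.

```python
def carry_cell_state(c0, f_t, i_t, candidate, steps):
    """Apply C_t = f_t * C_{t-1} + i_t * candidate repeatedly with fixed gate values."""
    c = c0
    for _ in range(steps):
        c = f_t * c + i_t * candidate
    return c

# Input gate closed (i_t = 0): nothing new is written to the cell.
kept = carry_cell_state(1.0, f_t=1.0, i_t=0.0, candidate=0.0, steps=20)
faded = carry_cell_state(1.0, f_t=0.5, i_t=0.0, candidate=0.0, steps=20)
print(kept, faded)
```

With the forget gate fully open the stored value survives all 20 steps unchanged, while a forget gate of 0.5 halves it at each step until almost nothing remains.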
