## Recurrent Neural Networks (RNNs): Architecture and Forward Propagation

### 1. The Necessity of RNNs

Standard neural network architectures, such as feed-forward ANNs and CNNs, often perform poorly on sequential data. Two limitations arise when handling sequences, and RNNs are designed to address both:

*   **Variable Length Input:** Sequential data (like movie reviews) can be of any length, so the number of words varies widely from one example to the next. ANNs generally require a fixed input size, which makes them a poor fit for such variability. Padding every sequence to the length of the longest one works around this, but it wastes computation on padding values and inflates the number of input weights.
*   **Sequential Meaning (Memory):** Sequential data contains inherent meaning, where subsequent elements depend on previous ones. If an ANN processes all elements simultaneously, the hidden information within the sequence is lost.
*   **RNN Solution:** RNNs are a specialised class of neural networks equipped with a **memory feature**. This allows them to remember past inputs and effectively process sequential data.

### 2. RNN Data Input Format

Data fed into an RNN must be structured by time steps and features.

*   **Standard Input Shape:** A single sequence is typically supplied in the form $(\text{Time Steps}, \text{Input Features})$.
*   **Example (Sentiment Analysis):** In a movie review sentiment analysis problem, if the vocabulary size is 5, each word can be represented by a one-hot vector of 5 numbers (e.g., $[1, 0, 0, 0, 0]$). If a review has 3 words, the input shape is $(3, 5)$ (3 time steps, 5 input features).
*   **Batched Input:** When sending multiple reviews (a batch) simultaneously, the shape becomes $(\text{Batch Size}, \text{Time Steps}, \text{Input Features})$.
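
The shapes above can be checked with a short NumPy sketch. The 5-word vocabulary and the helper `one_hot_encode` are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical 5-word vocabulary, chosen only for illustration.
vocab = ["movie", "was", "good", "bad", "not"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot_encode(words):
    """Return an array of shape (time_steps, input_features)."""
    encoded = np.zeros((len(words), len(vocab)))
    for t, word in enumerate(words):
        encoded[t, word_to_index[word]] = 1.0
    return encoded

# One 3-word review: 3 time steps, 5 input features.
review = one_hot_encode(["movie", "was", "good"])
print(review.shape)  # (3, 5)

# Stacking two equal-length reviews adds a leading batch dimension.
batch = np.stack([review, one_hot_encode(["movie", "was", "bad"])])
print(batch.shape)  # (2, 3, 5)
```

Stacking like this requires all reviews in the batch to have the same number of time steps.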

### 3. RNN Architecture Overview

RNNs share similarities with ANNs (Feed Forward Networks): they have an Input Layer, one or more Hidden Layers, and an Output Layer. The key distinctions are:

#### Processing Method
Unlike ANNs, which are purely feed-forward (information moves only from input to output), RNNs process the input **one by one** based on time steps ($T=1, T=2$, etc.), rather than feeding the entire input sequence at once.

#### The Concept of State and Recurrence
The defining feature of an RNN is the **concept of state** or **feedback**.

*   The hidden layer in an RNN sends information (a feedback loop) back to itself. This internal connection is what distinguishes an RNN.
*   The architecture setup is typically:
    *   **Input Layer:** The number of nodes corresponds to the number of input features (e.g., 5 nodes for a 5-feature word vector).
    *   **Hidden Layer:** Contains a chosen number of nodes (e.g., 3 nodes).
    *   **Output Layer:** For a binary classification problem (like positive/negative sentiment), this layer uses a single node with a Sigmoid activation function.

#### Weights and Biases
Like other neural networks, RNNs are fully connected and use weight matrices ($W_I$ for input-to-hidden, $W_H$ for hidden-to-hidden, and $W_O$ for hidden-to-output connections) together with a bias for each node. The output of a hidden node goes both to the output layer and back to the hidden layer for the next time step.
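
As a rough sketch of what these matrices look like, the example network from this section (5 input nodes, 3 hidden nodes, 1 output node) could be set up as follows. The bias names `b_H` and `b_O` and the random initialisation are assumptions made for illustration; the row/column orientation follows the convention of multiplying the input on the left ($X_t \cdot W_I$).

```python
import numpy as np

input_features, hidden_units, output_units = 5, 3, 1
rng = np.random.default_rng(seed=0)

# Weight matrices, oriented so that a row vector can be multiplied from the left.
W_I = rng.normal(size=(input_features, hidden_units))   # input  -> hidden
W_H = rng.normal(size=(hidden_units, hidden_units))     # hidden -> hidden (feedback)
W_O = rng.normal(size=(hidden_units, output_units))     # hidden -> output

# One bias per hidden node and per output node (names are illustrative).
b_H = np.zeros(hidden_units)
b_O = np.zeros(output_units)

total_parameters = sum(p.size for p in (W_I, W_H, W_O, b_H, b_O))
print(total_parameters)  # 15 + 9 + 3 + 3 + 1 = 31
```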

### 4. Forward Propagation: Unfolding Through Time

Prediction in an RNN is achieved through a process called **unfolding through time**. This mechanism explains how the recurrent layer (which acts as a loop) processes sequential information.

#### Key Steps in Calculation

The recurrent layer (often drawn as a box) receives two inputs at each time step, $X_t$ (the current input) and $H_{t-1}$ (the previous output, or state):

1.  **Current Input Operation:** The current input vector ($X_t$) is processed via a weighted connection ($W_I$).
2.  **Previous State Operation:** The output from the previous time step ($H_{t-1}$) is received as an input via the recurrent weighted connection ($W_H$).
3.  **Addition:** Inside the RNN, the results of the two weighted connections are added together.
4.  **Activation:** This summed result is then passed through an activation function (typically $\text{tanh}$ by default, but others like $\text{ReLU}$ are possible). This produces the current hidden state, $H_t$.
    In simplified form:
    $$H_t = \text{activation}(X_t \cdot W_I + H_{t-1} \cdot W_H + \text{bias})$$

<img src="https://www.appliedaicourse.com/blog/wp-content/uploads/2024/10/How-does-the-RNN-Neural-Network-Work-1024x657.webp">
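
The four steps above can be sketched as a single NumPy function. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `rnn_step`, the bias name `b_H`, and the random weights are all hypothetical, and the sizes (5 input features, 3 hidden nodes) are taken from the running example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_I, W_H, b_H):
    """Compute the hidden state H_t from X_t and H_{t-1}."""
    input_part = x_t @ W_I                       # step 1: current input operation
    recurrent_part = h_prev @ W_H                # step 2: previous state operation
    summed = input_part + recurrent_part + b_H   # step 3: addition (plus bias)
    return np.tanh(summed)                       # step 4: activation

rng = np.random.default_rng(seed=0)
W_I = rng.normal(scale=0.5, size=(5, 3))
W_H = rng.normal(scale=0.5, size=(3, 3))
b_H = np.zeros(3)

x_t = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # one-hot word vector
h_prev = np.zeros(3)                        # previous state (all zeros here)

h_t = rnn_step(x_t, h_prev, W_I, W_H, b_H)
print(h_t.shape)  # (3,)
```

Because $\text{tanh}$ is used, every entry of `h_t` lies strictly between -1 and 1.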

#### Processing Time Steps

*   **Time Step T=1 (Initial Step):** Only the first word/input ($X_{11}$) is sent. No previous output exists yet, so the initial state $H_0$ is set to a **zero vector** (e.g., a $1 \times 3$ vector of zeros for 3 hidden nodes). This lets the same calculation be used at every time step.
*   **Time Step T > 1:** The network uses the same weights ($W_I$ and $W_H$). It receives the current word ($X_t$) and the output ($H_{t-1}$) from the preceding time step as inputs.
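
Unfolding through time then amounts to a loop over the time steps that starts from a zero state. The sketch below assumes a $\text{tanh}$ activation and hypothetical names (`rnn_forward`, `b_H`); the weights are random placeholders rather than trained values.

```python
import numpy as np

def rnn_forward(sequence, W_I, W_H, b_H):
    """Return the hidden state at every time step, shape (time_steps, hidden_units)."""
    h_t = np.zeros(W_H.shape[0])   # H_0: zero vector, no previous output exists
    hidden_states = []
    for x_t in sequence:           # one time step at a time
        # The same W_I, W_H and b_H are reused at every step.
        h_t = np.tanh(x_t @ W_I + h_t @ W_H + b_H)
        hidden_states.append(h_t)
    return np.stack(hidden_states)

rng = np.random.default_rng(seed=0)
W_I = rng.normal(scale=0.5, size=(5, 3))
W_H = rng.normal(scale=0.5, size=(3, 3))
b_H = np.zeros(3)

sequence = np.eye(5)[[0, 1, 2]]    # a 3-word review as one-hot rows, shape (3, 5)
states = rnn_forward(sequence, W_I, W_H, b_H)
print(states.shape)  # (3, 3)
```

At the first step the recurrent term contributes nothing because $H_0$ is all zeros, so `states[0]` depends only on the first word.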

#### Output Calculation ($O_t$)

The calculated hidden state ($H_t$) for the current time step is used to compute the actual output ($O_t$):

1.  $H_t$ is multiplied by the output weight matrix ($W_O$).
2.  This result is passed through an output activation function.
    *   For binary classification, this is typically **Sigmoid**.
    *   For multi-class classification, this is typically **Softmax**.
    *   For regression, this is typically a **Linear** (identity) activation.
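
A minimal sketch of the output step for the three cases follows. The hidden state values, the output weights, and the output bias `b_O` are made-up numbers for illustration only; the text above mentions only the multiplication by $W_O$, so adding a bias here is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z)        # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

def linear(z):
    return z

def rnn_output(h_t, W_O, b_O, activation):
    """Compute O_t: multiply H_t by W_O, add a bias, apply the output activation."""
    return activation(h_t @ W_O + b_O)

h_t = np.array([0.2, -0.5, 0.7])   # hypothetical hidden state with 3 nodes

# Binary classification: a single output node with Sigmoid.
W_O_binary = np.array([[0.4], [-0.3], [0.8]])
o_binary = rnn_output(h_t, W_O_binary, np.zeros(1), sigmoid)

# Multi-class classification: one node per class (4 classes here) with Softmax.
W_O_multi = np.full((3, 4), 0.1)
o_multi = rnn_output(h_t, W_O_multi, np.zeros(4), softmax)

# Regression: a single output node with a linear activation.
o_regression = rnn_output(h_t, W_O_binary, np.zeros(1), linear)

print(o_binary.shape, o_multi.shape, o_regression.shape)  # (1,) (4,) (1,)
```

The Sigmoid output can be read as a probability between 0 and 1, and the Softmax outputs sum to 1 across the classes.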

### 5. Key RNN Concepts

#### Parameter Sharing (Weight Sharing)
Throughout the unfolding process, the **same weight values** ($W_I, W_H, W_O$) are used repeatedly across every time step. This concept is known as parameter sharing or weight sharing.
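
One consequence of parameter sharing is that the number of weights does not depend on the sequence length. The sketch below runs the same (randomly initialised, purely illustrative) weights over a 3-step and a 7-step sequence; `final_hidden_state` and `b_H` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
W_I = rng.normal(scale=0.5, size=(5, 3))
W_H = rng.normal(scale=0.5, size=(3, 3))
b_H = np.zeros(3)

def final_hidden_state(sequence):
    """Run the recurrence over a sequence of any length with the shared weights."""
    h_t = np.zeros(3)
    for x_t in sequence:
        h_t = np.tanh(x_t @ W_I + h_t @ W_H + b_H)
    return h_t

short_review = np.eye(5)[[0, 1, 2]]              # 3 time steps
long_review = np.eye(5)[[0, 1, 4, 2, 3, 1, 0]]   # 7 time steps

print(final_hidden_state(short_review).shape)  # (3,)
print(final_hidden_state(long_review).shape)   # (3,)

# The recurrent layer's parameter count is fixed regardless of sequence length.
print(W_I.size + W_H.size + b_H.size)  # 27
```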

#### Sequence Processing and Memory
RNNs can exploit the order-dependent information within a sequence. Because each hidden state is computed from the previous one, the output at the final step contains contributions from every earlier step, which lets the network store and utilise past information. This memory is short-lived in practice: as a rough rule of thumb, a simple RNN retains useful information from only about the past **10 time steps**.
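
To make the dependence on earlier steps concrete, the hidden-state formula can be unrolled for a three-step sequence. This is a sketch in the document's symbols, with $f$ standing for the activation function and $b$ for the bias (both are shorthand introduced here):

$$H_1 = f(X_1 W_I + H_0 W_H + b)$$

$$H_2 = f(X_2 W_I + H_1 W_H + b)$$

$$H_3 = f\big(X_3 W_I + f\big(X_2 W_I + f(X_1 W_I + H_0 W_H + b)\, W_H + b\big)\, W_H + b\big)$$

The first input $X_1$ appears inside $H_3$, but only after passing through $W_H$ and $f$ twice more, so its contribution is transformed repeatedly before it reaches the final state.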

***