<img src="../../../figs/holberton_logo.png" alt="logo" width="500"/>

# Recurrent Neural Networks

## 0. RNN Cell

<img src="figs/RNN.png" alt="logo" width="350"/>

### Overall idea

Our goal is to implement the `RNNCell` class, which represents a single cell of a simple Recurrent Neural Network (RNN). The class handles forward propagation for one time step: it computes the next hidden state from the previous hidden state and the current input, then generates the output from that new hidden state.

### Key Steps

#### 1. Initialization of Parameters (Weights and Biases)

- Initialize the weights applied to the concatenated hidden state and input (`Wh`, shape `(i + h, h)`).
- Initialize the weights for the output (`Wy`, shape `(h, o)`).
- Initialize the biases for the hidden state (`bh`) and the output (`by`) to zeros.

#### 2. Forward Propagation:

##### Concatenate Inputs:
- Combine previous hidden state (`h_prev`) and current input (`x_t`).

##### Compute Hidden State:
- Apply `tanh` activation to the linear combination of concatenated inputs and weights (`Wh`), plus bias (`bh`).

<img src="figs/tanh.png" alt="logo" width="300"/>


##### Generate Output:
- Apply softmax activation to the linear combination of the new hidden state and output weights (`Wy`), plus bias (`by`).

<img src="figs/softmax.png" alt="logo" width="150"/>


#### 3. Return Values:
- Provide the next hidden state (`h_next`) and the current output (`output_t`).
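
The softmax step above exponentiates the raw scores directly, which can overflow when the scores are large. A common remedy is to subtract each row's maximum before exponentiating; the result is mathematically unchanged. The sketch below illustrates this under that assumption. The name `stable_softmax` is hypothetical and is not part of the `RNNCell` class in this notebook.

```python
import numpy as np


def stable_softmax(x):
    """Row-wise softmax that subtracts each row's max before exponentiating."""
    shifted = x - np.max(x, axis=1, keepdims=True)  # largest entry per row becomes 0
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)


logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # np.exp(1000.0) alone would overflow
probs = stable_softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
```

Both rows produce the same probabilities, because softmax only depends on the differences between the scores in a row.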







In [1]:
#!/usr/bin/env python3
"""RNN Cell Module"""
import numpy as np


class RNNCell:
    """
        Represents a cell of a simple RNN:
    """

    def __init__(self, i, h, o):
        """
            Key concept: an RNN cell uses both the current input 
            and the previous hidden state to determine the next hidden state
            
            The concatenation of the input data and hidden state allows the cell 
            to process the combined information together
        """
        self.Wh = np.random.normal(size=(i+h, h)) # weight matrix for the concatenated input and hidden state
        self.Wy = np.random.normal(size=(h, o))   # weight matrix for the output
        self.bh = np.zeros((1, h))                # bias for the hidden state, initialized to zeros
        self.by = np.zeros((1, o))                # bias for the output, initialized to zeros

    def forward(self, h_prev, x_t):
        """
            Performs forward propagation for one time step
        """

        def softmax(x):
            """computes the softmax activation, used to convert the output logits into probabilities"""
            return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

        # Concatenation of Hidden State and Input
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # The next hidden state is calculated using the tanh activation function 
        # applied to the linear combination of h_x and the weights Wh, plus the bias bh.
        h_next = np.tanh(np.dot(h_x, self.Wh) + self.bh)
        
        # The output is computed by applying the softmax function to the 
        # linear combination of the new hidden state h_next and the weights Wy, plus the bias by
        output_t = softmax(np.dot(h_next, self.Wy) + self.by)

        # return the next hidden state and the output of the cell for the current time step
        return h_next, output_t

## 0. Main

In [2]:
import numpy as np

np.random.seed(0)
rnn_cell = RNNCell(10, 15, 5)
print("Wh:", rnn_cell.Wh)
print("Wy:", rnn_cell.Wy)
print("bh:", rnn_cell.bh)
print("by:", rnn_cell.by)
rnn_cell.bh = np.random.randn(1, 15)
rnn_cell.by = np.random.randn(1, 5)
h_prev = np.random.randn(8, 15)
x_t = np.random.randn(8, 10)
h, y = rnn_cell.forward(h_prev, x_t)
print(h.shape)
print(h)
print(y.shape)
print(y)

Wh: [[ 1.76405235  0.40015721  0.97873798  2.2408932   1.86755799 -0.97727788
   0.95008842 -0.15135721 -0.10321885  0.4105985   0.14404357  1.45427351
   0.76103773  0.12167502  0.44386323]
 [ 0.33367433  1.49407907 -0.20515826  0.3130677  -0.85409574 -2.55298982
   0.6536186   0.8644362  -0.74216502  2.26975462 -1.45436567  0.04575852
  -0.18718385  1.53277921  1.46935877]
 [ 0.15494743  0.37816252 -0.88778575 -1.98079647 -0.34791215  0.15634897
   1.23029068  1.20237985 -0.38732682 -0.30230275 -1.04855297 -1.42001794
  -1.70627019  1.9507754  -0.50965218]
 [-0.4380743  -1.25279536  0.77749036 -1.61389785 -0.21274028 -0.89546656
   0.3869025  -0.51080514 -1.18063218 -0.02818223  0.42833187  0.06651722
   0.3024719  -0.63432209 -0.36274117]
 [-0.67246045 -0.35955316 -0.81314628 -1.7262826   0.17742614 -0.40178094
  -1.63019835  0.46278226 -0.90729836  0.0519454   0.72909056  0.12898291
   1.13940068 -1.23482582  0.40234164]
 [-0.68481009 -0.87079715 -0.57884966 -0.31155253  0.05616534

## 1. RNN

The `rnn` function performs **forward propagation for a simple Recurrent Neural Network (RNN) over multiple time steps** using an instance of `RNNCell`. It processes a sequence of inputs, updating hidden states and generating outputs for each time step.

### Key Steps

#### 1. Initialize Hidden States and Outputs
- Set up arrays to store all hidden states and outputs

#### 2. Iterate Over Time Steps
- For each time step, update the hidden state and generate the output using the `RNNCell`

#### 3. Return results
- Return arrays containing all hidden states and outputs
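
The three steps above can be sketched with a toy update function to make the array shapes explicit. This is a minimal illustration, not the notebook's `rnn` function: `unroll`, `toy_step`, and the weights `W` and `b` are hypothetical names, and the output computation is left out so that only the hidden-state bookkeeping is shown.

```python
import numpy as np


def unroll(step, X, h_0):
    """Apply `step(h_prev, x_t) -> h_next` across the time axis of X."""
    t = X.shape[0]
    H = np.zeros((t + 1,) + h_0.shape)  # slot 0 holds the initial state
    H[0] = h_0
    for k in range(t):
        H[k + 1] = step(H[k], X[k])
    return H


rng = np.random.default_rng(0)
t, m, i, h = 4, 2, 3, 5            # time steps, batch size, input dim, hidden dim
W = rng.normal(size=(h + i, h))    # hypothetical weights for the toy step
b = np.zeros((1, h))


def toy_step(h_prev, x_t):
    """tanh update on the concatenation [h_prev, x_t]."""
    return np.tanh(np.concatenate((h_prev, x_t), axis=1) @ W + b)


H = unroll(toy_step, rng.normal(size=(t, m, i)), np.zeros((m, h)))
print(H.shape)  # (5, 2, 5)
```

The hidden-state array has one more entry along the time axis than the input, because it also stores the initial state.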

In [3]:
#!/usr/bin/env python3
"""RNN Module"""
import numpy as np


def rnn(rnn_cell, X, h_0):
    """
        Performs forward propagation for a simple RNN:
    """

    # Array to store all hidden states
    H = np.zeros((X.shape[0] + 1, h_0.shape[0], h_0.shape[1]))        # include initial hidden state
    H[0] = h_0                                                        # Set the first hidden state to h_0

    # Array to store all outputs
    Y = np.zeros((X.shape[0], X.shape[1], rnn_cell.by.shape[1]))      # to match the time steps, batch size, and output dimensionality 

    # Loop over each time step (i) and corresponding input (x_t)
    for i, x_t in enumerate(X):
        H[i + 1], Y[i] = rnn_cell.forward(H[i], x_t)

    # Return all hidden states and all outputs
    return H, Y

## 1. Main

In [4]:
np.random.seed(1)
rnn_cell = RNNCell(10, 15, 5)
rnn_cell.bh = np.random.randn(1, 15)
rnn_cell.by = np.random.randn(1, 5)
X = np.random.randn(6, 8, 10)
h_0 = np.zeros((8, 15))
H, Y = rnn(rnn_cell, X, h_0)
print(H.shape)
print(H)
print(Y.shape)
print(Y)

(7, 8, 15)
[[[ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.        ]
  [ 0.          0.          0.          0.          0.
    0.          0.          0.          0.          0.
    0.   

## 2. GRU Cell

<img src="figs/gru.png" alt="logo" width="400"/>

The `GRUCell` class represents a **single cell in a Gated Recurrent Unit (GRU) network**. A GRU is an advanced type of RNN that uses gating mechanisms (update and reset gates) to better capture dependencies in the data, mitigating issues like the vanishing gradient problem.

### Key Steps

#### Initialization:
- Initialize weights and biases for update gate, reset gate, intermediate hidden state, and output.

#### Forward Propagation:
- Compute the reset gate (`r_t`).
- Compute the update gate (`z_t`).
- Compute the candidate hidden state (`h_tilde`).
- Compute the next hidden state (`h_next`).
- Compute the output (`output_t`).

### Step by Step Explanation

Each gate in the GRU cell serves a specific purpose, controlling the flow of information through the network. The **reset gate determines how much of the previous hidden state should be forgotten, the update gate controls the balance between retaining the old hidden state and incorporating the new candidate hidden state, and the next hidden state is a combination of these elements**. The output is then generated based on the new hidden state, typically using a softmax function for classification tasks.

#### Compute Reset Gate

$$
 r_t = \frac{1}{1 + e^{\left(-(h_x \cdot W_r + b_r)\right)}} 
$$

##### Why

- **Purpose**: The reset gate (`r_t`) controls how much of the previous hidden state (`h_prev`) should be forgotten. It helps the network decide whether to ignore parts of the hidden state that are not relevant to predicting the next state.
- **Mechanism**: If `r_t` is close to `0`, the reset gate effectively "resets" the hidden state, forcing the network to focus on the new input (`x_t`). If `r_t` is close to `1`, it allows the network to keep the information from the previous hidden state.

##### How

- **Concatenation**: Combine the previous hidden state (`h_prev`) and the current input (`x_t`) into a single vector (`h_x`).
- **Linear Transformation**: Apply a linear transformation using the reset gate weights (`Wr`) and bias (`br`). This transformation projects the concatenated input-hidden state vector into the appropriate dimensionality for the reset gate.
- **Sigmoid Activation**: The sigmoid function ensures that the output values of `r_t` are between `0` and `1`, which is necessary for the gate mechanism. 

#### Compute Update Gate

$$
z_t = \frac{1}{1 + e^{\left(-(h_x \cdot W_z + b_z)\right)}}
$$


##### Why
- **Purpose**: The update gate (`z_t`) determines how much of the candidate hidden state ($h_{\text{tilde}}$) is used to form the next hidden state ($h_{\text{next}}$). It sets the balance between retaining existing information and incorporating new information.

- **Mechanism**: If `z_t` is close to `1`, the network updates most of the hidden state with the new candidate hidden state ($h_{\text{tilde}}$). If `z_t` is close to `0`, the network retains most of the previous hidden state.

##### How
- **Concatenation**: Combine `h_prev` and `x_t` into `h_x`.
- **Linear Transformation**: Apply a linear transformation using the update gate weights (`Wz`) and bias (`bz`). This transformation projects the concatenated input-hidden state vector into the appropriate dimensionality for the update gate.
- **Sigmoid Activation**: The sigmoid function ensures that the output values of `z_t` are between `0` and `1`, which is necessary for the gate mechanism.
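
As a quick check of the three steps above, the sketch below computes an update gate for random inputs and confirms that it has one value per hidden unit, each strictly between 0 and 1. The dimensions and random values are illustrative assumptions; `Wz` and `bz` only mirror the shapes used by the gate weights in this notebook.

```python
import numpy as np


def sigmoid(a):
    """Logistic function: maps any real number into (0, 1)."""
    return 1 / (1 + np.exp(-a))


rng = np.random.default_rng(0)
m, i, h = 4, 3, 5                      # batch size, input dim, hidden dim
h_prev = rng.normal(size=(m, h))
x_t = rng.normal(size=(m, i))
Wz = rng.normal(size=(i + h, h))       # illustrative weights of shape (i + h, h)
bz = np.zeros((1, h))

h_x = np.concatenate((h_prev, x_t), axis=1)   # shape (m, h + i)
z_t = sigmoid(h_x @ Wz + bz)                  # shape (m, h): one gate value per hidden unit
print(z_t.shape)
```

The reset gate is computed the same way, with `Wr` and `br` in place of `Wz` and `bz`.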


#### Compute Candidate Hidden State

$$
rh_x = \text{concatenate}((r_t * h_{\text{prev}}, x_t), \text{axis}=1)
$$

and also

$$
h_{\text{tilde}} = \tanh(rh_x \cdot W_h + b_h)
$$


##### Why

- **Purpose**: The candidate hidden state ($h_{\text{tilde}}$) represents the potential new hidden state, taking into account the reset gate's influence. It incorporates the new input and the reset-modified previous hidden state.

- **Mechanism**: The reset gate determines which parts of the previous hidden state should be considered (or ignored) when computing the candidate hidden state.


##### How
- **Reset Gate Application**: Modify the previous hidden state ($h_{\text{prev}}$) by multiplying it element-wise with the reset gate ($r_t$). The result, $r_t * h_{\text{prev}}$, selectively forgets parts of the hidden state.
- **Concatenation**: Combine $r_t * h_{\text{prev}}$ and $x_t$ into a single vector ($rh_x$).
- **Linear Transformation**: Apply a linear transformation using the weights for the candidate hidden state ($W_h$) and bias ($b_h$). This transformation projects the concatenated reset-modified hidden state and input vector into the appropriate dimensionality for the candidate hidden state.
- **Tanh Activation**: The tanh function keeps the values of $h_{\text{tilde}}$ between `-1` and `1`, providing a bounded, smooth non-linearity.
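
To see the effect of the reset gate concretely, the sketch below evaluates the candidate hidden state for two different previous hidden states. With the gate fully closed (all zeros) the candidate depends only on $x_t$; with the gate fully open (all ones) the previous hidden state changes the result. All values are chosen by hand for illustration, and the helper name `candidate` is hypothetical.

```python
import numpy as np

m, i, h = 2, 3, 4                  # batch size, input dim, hidden dim
x_t = np.ones((m, i))
Wh = np.full((i + h, h), 0.1)      # hand-picked weights, for illustration only
bh = np.zeros((1, h))


def candidate(r_t, h_prev):
    """h_tilde = tanh([r_t * h_prev, x_t] . Wh + bh)"""
    rh_x = np.concatenate((r_t * h_prev, x_t), axis=1)
    return np.tanh(rh_x @ Wh + bh)


h_a = np.zeros((m, h))             # two different previous hidden states
h_b = np.ones((m, h))

closed = np.zeros((m, h))          # reset gate fully closed
opened = np.ones((m, h))           # reset gate fully open

print(np.allclose(candidate(closed, h_a), candidate(closed, h_b)))  # True
print(np.allclose(candidate(opened, h_a), candidate(opened, h_b)))  # False
```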


#### Compute Next Hidden State

$$
h_{\text{next}} = (1 - z_t) * h_{\text{prev}} + z_t * h_{\text{tilde}}
$$

##### Why
- **Purpose**: The next hidden state ($h_{\text{next}}$) combines the previous hidden state ($h_{\text{prev}}$) and the candidate hidden state ($h_{\text{tilde}}$) based on the update gate ($z_t$). This helps the network decide how much of the new information should be mixed with the existing information.
- **Mechanism**: The update gate determines the weight of the contribution from the candidate hidden state versus the previous hidden state.

##### How
- **Update Gate Application**: Use the update gate ($z_t$) to create a weighted combination of $h_{\text{prev}}$ and $h_{\text{tilde}}$.

##### Linear Combination

The first term retains part of the previous hidden state:
$$
(1 - z_t) * h_{\text{prev}}
$$
and the second term incorporates part of the candidate hidden state:
$$
z_t * h_{\text{tilde}}
$$
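
A small worked example may help here. The values below are made up for illustration: the first hidden unit has its update gate at 0, the second at 1, and the third at 0.5.

```python
import numpy as np

h_prev = np.array([[0.5, -0.5, 0.2]])
h_tilde = np.array([[-1.0, 1.0, 0.8]])
z_t = np.array([[0.0, 1.0, 0.5]])   # one gate value per hidden unit

h_next = (1 - z_t) * h_prev + z_t * h_tilde
# unit 0 (z = 0):   keeps h_prev            -> 0.5
# unit 1 (z = 1):   takes h_tilde           -> 1.0
# unit 2 (z = 0.5): averages the two values -> 0.5
print(h_next)
```

Because the gate is applied element-wise, each hidden unit can choose independently how much of its old value to keep.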

#### Compute Output

$$
\text{output}_t = \text{softmax}(h_{\text{next}} \cdot W_y + b_y)
$$


##### Why

- **Purpose**: The output (`output_t`) is the GRU cell's prediction for the current time step, computed from the next hidden state $(h_{\text{next}})$.
- **Mechanism**: The softmax function is typically used for multi-class classification tasks to convert the logits into probabilities.

In [None]:
#!/usr/bin/env python3
"""GRU Cell Module"""
import numpy as np


class GRUCell:
    """
        Represents a gated recurrent unit:
    """

    def __init__(self, i, h, o):
        """
            Weight and Bias Initialization
            
            Weights for update gate, reset gate, and intermediate hidden state respectively (Wz, Wr, Wh)
            Weights for output (Wy)
            
        """
        self.Wz = np.random.normal(size=(i+h, h))
        self.Wr = np.random.normal(size=(i+h, h))
        self.Wh = np.random.normal(size=(i+h, h))
        self.Wy = np.random.normal(size=(h, o))
        
        # Biases for update gate, reset gate, and intermediate hidden state (bz, br, bh)
        # Bias for output (by)
        self.bz = np.zeros((1, h))
        self.br = np.zeros((1, h))
        self.bh = np.zeros((1, h))
        self.by = np.zeros((1, o))

    def forward(self, h_prev, x_t):
        """
            Performs forward propagation for one time step
        """

        def softmax(x):
            """Compute softmax activation function"""
            return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

        # Concatenate Input and Hidden State
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # Compute Reset Gate
        r_t = 1 / (1 + np.exp(-(np.dot(h_x, self.Wr) + self.br)))

        # Compute Update Gate 
        z_t = 1 / (1 + np.exp(-(np.dot(h_x, self.Wz) + self.bz)))

        # Compute Candidate Hidden State
        rh_x = np.concatenate((r_t * h_prev, x_t), axis=1)     # Combine reset-modified hidden state and input
        h_tilde = np.tanh(np.dot(rh_x, self.Wh) + self.bh)     # Apply the tanh activation to compute the candidate hidden state
        
        # Compute Next Hidden State
        h_next = (1 - z_t) * h_prev + z_t * h_tilde

        # Compute Output
        output_t = softmax(np.dot(h_next, self.Wy) + self.by)

        return h_next, output_t

## 2. Main

In [None]:
np.random.seed(2)
gru_cell = GRUCell(10, 15, 5)
print("Wz:", gru_cell.Wz)
print("Wr:", gru_cell.Wr)
print("Wh:", gru_cell.Wh)
print("Wy:", gru_cell.Wy)
print("bz:", gru_cell.bz)
print("br:", gru_cell.br)
print("bh:", gru_cell.bh)
print("by:", gru_cell.by)
gru_cell.bz = np.random.randn(1, 15)
gru_cell.br = np.random.randn(1, 15)
gru_cell.bh = np.random.randn(1, 15)
gru_cell.by = np.random.randn(1, 5)
h_prev = np.random.randn(8, 15)
x_t = np.random.randn(8, 10)
h, y = gru_cell.forward(h_prev, x_t)
print(h.shape)
print(h)
print(y.shape)
print(y)

## 3. LSTM Cell

An LSTM cell uses three gates (forget gate, input gate, output gate) and a memory cell (the cell state) to control the flow of information and selectively retain or discard information from previous time steps. This allows it to capture long-term dependencies and make predictions based on the current input and the past context.

<img src="figs/lstm.png" alt="logo" width="400"/>

Here are the key steps involved in implementing an LSTM cell:


1. **Input Transformation**: Concatenate the previous hidden state `h_prev` and the current input `x_t` to form an input vector `h_x`.


2. **Forget Gate**: Compute the activation of the forget gate `ft` by applying the sigmoid function to a linear transformation of `h_x` using weights `Wf` and biases `bf`.


3. **Input/Update Gate**: Compute the activation of the input/update gate `it` by applying the sigmoid function to a linear transformation of `h_x` using weights `Wu` and biases `bu`.


4. **Candidate Value**: Compute the candidate value `cct` by applying the hyperbolic tangent (tanh) function to a linear transformation of `h_x` using weights `Wc` and biases `bc`.


5. **Cell State Update**: Compute the next cell state `c_next` from the previous cell state `c_prev`, the candidate value `cct`, and the forget and input gate activations: multiply `c_prev` element-wise by `ft`, then add the element-wise product of `it` and `cct`.


6. **Output Gate**: Compute the activation of the output gate `ot` by applying the sigmoid function to a linear transformation of `h_x` using weights `Wo` and biases `bo`.


7. **Hidden State**: Compute the next hidden state `h_next` by multiplying `ot` element-wise with `tanh(c_next)`.


8. **Output**: Compute the output `y` by applying softmax to a linear transformation of `h_next` using weights `Wy` and biases `by`.
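
The cell state update in step 5 can be traced by hand on a few values. The numbers below are made up for illustration; the variable names follow the ones used in the steps above.

```python
import numpy as np

c_prev = np.array([[2.0, -1.0, 0.5]])
cct = np.array([[0.9, 0.9, -0.9]])   # candidate values, already in (-1, 1)
ft = np.array([[1.0, 0.0, 0.5]])     # forget gate
it = np.array([[0.0, 1.0, 0.5]])     # input/update gate

c_next = ft * c_prev + it * cct
# unit 0: old memory kept, candidate ignored     -> 2.0
# unit 1: old memory dropped, candidate written  -> 0.9
# unit 2: half of each                           -> 0.25 - 0.45 = -0.2
print(c_next)
```

Unlike the hidden state, the cell state is not passed through an activation here, so its values can fall outside the range `-1` to `1`, as the first unit shows.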

In [None]:
#!/usr/bin/env python3
"""LSTM cell for an RNN"""

import numpy as np


class LSTMCell:
    """ LSTMCell unit """

    def __init__(self, i, h, o):
        """
        Initializer constructor
        Args:
            i: the dimensionality of the data
            h: the dimensionality of the hidden state
            o: the dimensionality of the outputs
        """

        # weight for the cell
        self.Wf = np.random.normal(size=(i + h, h))
        self.Wu = np.random.normal(size=(i + h, h))
        self.Wc = np.random.normal(size=(i + h, h))
        self.Wo = np.random.normal(size=(i + h, h))
        self.Wy = np.random.normal(size=(h, o))

        # Bias of the cell
        self.bf = np.zeros((1, h))
        self.bu = np.zeros((1, h))
        self.bc = np.zeros((1, h))
        self.bo = np.zeros((1, h))
        self.by = np.zeros((1, o))

    def softmax(self, x):
        """softmax function"""
        return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

    def sigmoid(self, x):
        """Sigmoid function"""
        return 1 / (1 + np.exp(-x))

    def forward(self, h_prev, c_prev, x_t):
        """
        Forward propagation for one time step in an LSTM
        Args:
            h_prev: numpy.ndarray of shape (m, h) containing the previous
                    hidden state, where m is the batch size
            c_prev: numpy.ndarray of shape (m, h) containing the previous
                    cell state
            x_t: numpy.ndarray of shape (m, i) that contains the data
                 input for the cell
        Returns: h_next, c_next, y
                 h_next: the next hidden state
                 c_next: the next cell state
                 y: the output of the cell
        """
        # concatenate previous hidden state and input: shape (m, h + i)
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # forget gate activation vector
        ft = self.sigmoid((h_x @ self.Wf) + self.bf)

        # input/update gate activation vector
        it = self.sigmoid((h_x @ self.Wu) + self.bu)

        # candidate value
        cct = np.tanh((h_x @ self.Wc) + self.bc)

        # next cell state: keep part of c_prev, add part of the candidate
        c_next = ft * c_prev + it * cct

        # output gate
        ot = self.sigmoid((h_x @ self.Wo) + self.bo)

        # compute hidden state
        h_next = ot * np.tanh(c_next)

        # final output of the cell
        y = self.softmax((h_next @ self.Wy) + self.by)

        return h_next, c_next, y

## 3. Main

In [None]:
np.random.seed(3)
lstm_cell = LSTMCell(10, 15, 5)
print("Wf:", lstm_cell.Wf)
print("Wu:", lstm_cell.Wu)
print("Wc:", lstm_cell.Wc)
print("Wo:", lstm_cell.Wo)
print("Wy:", lstm_cell.Wy)
print("bf:", lstm_cell.bf)
print("bu:", lstm_cell.bu)
print("bc:", lstm_cell.bc)
print("bo:", lstm_cell.bo)
print("by:", lstm_cell.by)
lstm_cell.bf = np.random.randn(1, 15)
lstm_cell.bu = np.random.randn(1, 15)
lstm_cell.bc = np.random.randn(1, 15)
lstm_cell.bo = np.random.randn(1, 15)
lstm_cell.by = np.random.randn(1, 5)
h_prev = np.random.randn(8, 15)
c_prev = np.random.randn(8, 15)
x_t = np.random.randn(8, 10)
h, c, y = lstm_cell.forward(h_prev, c_prev, x_t)
print(h.shape)
print(h)
print(c.shape)
print(c)
print(y.shape)
print(y)

## 4. Deep RNN

The `deep_rnn` function performs forward propagation for a deep RNN (Recurrent Neural Network) composed of **multiple layers of `RNNCell`** instances.

### Initialization

- `H`: Array to store all hidden states across layers and time steps, initialized with zeros.
- `Y`: Array to store all outputs across time steps, initialized with zeros.

### Forward propagation loop

- Iterate over each time step and each layer:
    - `for i, x_t in enumerate(X)`: Loop over each time step `i` and its input `x_t`.
    - `for l, rnn_cell in enumerate(rnn_cells)`: Loop over each layer `l` and its `RNNCell`.
        - Run `rnn_cell.forward` on the layer's previous hidden state `H[i, l]` and the layer's input `x_t`.
        - Store the new hidden state in `H[i + 1, l]`.
        - Reassign `x_t = H[i + 1, l]`, so the next layer receives this layer's hidden state as its input.
        - Store the cell's output in `Y[i]`. Each layer overwrites it, so after the inner loop `Y[i]` holds the last layer's output.
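
The sketch below isolates how hidden states are chained between layers and what shape `H` ends up with. It is an illustration under simplifying assumptions: `TinyCell` is a hypothetical stand-in for `RNNCell` that only computes the hidden state, and the output array `Y` is omitted.

```python
import numpy as np


class TinyCell:
    """Hypothetical stand-in for RNNCell: hidden-state update only."""

    def __init__(self, i, h, rng):
        self.Wh = rng.normal(size=(i + h, h))
        self.bh = np.zeros((1, h))

    def forward(self, h_prev, x_t):
        h_x = np.concatenate((h_prev, x_t), axis=1)
        return np.tanh(h_x @ self.Wh + self.bh)


rng = np.random.default_rng(0)
t, m, i, h = 3, 2, 4, 5                      # time steps, batch size, input dim, hidden dim
cells = [TinyCell(i, h, rng),                # layer 0 reads the raw input (dim i)
         TinyCell(h, h, rng)]                # layer 1 reads layer 0's hidden state (dim h)

X = rng.normal(size=(t, m, i))
H = np.zeros((t + 1, len(cells), m, h))      # H[0] is the all-zero initial state

for step, x_t in enumerate(X):
    layer_input = x_t
    for layer, cell in enumerate(cells):
        H[step + 1, layer] = cell.forward(H[step, layer], layer_input)
        layer_input = H[step + 1, layer]     # feeds the next layer at this time step

print(H.shape)  # (4, 2, 2, 5)
```

Note that every layer after the first must be constructed with an input dimensionality equal to the hidden size of the layer below it.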

In [None]:
#!/usr/bin/env python3
"""Deep RNN Module"""
import numpy as np


def deep_rnn(rnn_cells, X, h_0):
    """ Performs forward propagation for a deep RNN """

    H = np.zeros((X.shape[0] + 1, h_0.shape[0],
                  h_0.shape[1], h_0.shape[2]))
    H[0] = h_0

    Y = np.zeros((X.shape[0], X.shape[1], rnn_cells[-1].by.shape[1]))

    for i, x_t in enumerate(X):
        for l, rnn_cell in enumerate(rnn_cells):
            H[i + 1, l], Y[i] = rnn_cell.forward(H[i, l], x_t)
            x_t = H[i + 1, l]

    return H, Y


## 4. Main

In [None]:
np.random.seed(1)
cell1 = RNNCell(10, 15, 1)
cell2 = RNNCell(15, 15, 1)
cell3 = RNNCell(15, 15, 5)
rnn_cells = [cell1, cell2, cell3]
for rnn_cell in rnn_cells:
    rnn_cell.bh = np.random.randn(1, 15)
cell3.by = np.random.randn(1, 5)
X = np.random.randn(6, 8, 10)
H_0 = np.zeros((3, 8, 15))
H, Y = deep_rnn(rnn_cells, X, H_0)
print(H.shape)
print(H)
print(Y.shape)
print(Y)

## 5. Bidirectional Cell Forward

### Forward and Backward Processing:

A bidirectional RNN consists of two RNNs:
- Forward RNN: Processes the sequence from start to end.
- Backward RNN: Processes the sequence from end to start.

Each RNN computes its hidden states from the information it sees in its own direction.

### Concatenated Outputs:

- At each time step, the hidden state of the bidirectional RNN is the concatenation of the hidden states from both the forward and backward RNNs.
- This concatenated hidden state encodes information from both directions of the sequence.
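
In array terms, the concatenation joins the two sets of hidden states along the last (feature) axis. The sketch below uses random placeholder arrays instead of real hidden states, and the dimensions are illustrative assumptions.

```python
import numpy as np

t, m, h = 6, 8, 15                      # time steps, batch size, hidden units per direction
rng = np.random.default_rng(0)
Hf = rng.normal(size=(t, m, h))         # placeholder forward hidden states
Hb = rng.normal(size=(t, m, h))         # placeholder backward hidden states

H = np.concatenate((Hf, Hb), axis=-1)   # join along the feature axis
print(H.shape)  # (6, 8, 30)
```

The first `h` features of each concatenated state come from the forward direction and the last `h` from the backward direction, which is why the output weights need `2 * h` rows.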

<img src="figs/birnn.png" alt="logo" width="700"/>

### Advantages

- **Contextual Understanding**: captures dependencies in both past and future contexts, improving the understanding of each sequence element.

- **Improved Prediction Accuracy**: utilizes information from both directions, leading to more accurate predictions, especially in tasks requiring comprehensive context.

- **Rich Feature Representation**: concatenates hidden states to provide richer feature representations, improving learning and extraction of nuanced sequential patterns.


In [None]:
#!/usr/bin/env python3
"""Bidirectional Cell Forward Module"""
import numpy as np


class BidirectionalCell:
    """
        Represents a bidirectional cell of a RNN:
    """

    def __init__(self, i, h, o):

        # Initialize weights for forward and backward RNNs
        self.Whf = np.random.normal(size=(i+h, h))      #  Weight matrix for forward RNN
        self.Whb = np.random.normal(size=(i+h, h))      #  Weight matrix for backward RNN
        self.Wy = np.random.normal(size=(h*2, o))       # Weight matrix for output layer (concatenated hidden states)

        # Initialize biases for forward and backward RNNs
        self.bhf = np.zeros((1, h))                     # forward RNN
        self.bhb = np.zeros((1, h))                     # backward RNN
        self.by = np.zeros((1, o))                      # output

    def forward(self, h_prev, x_t):
        """ Performs forward propagation for one time step """

        # Concatenate previous hidden state and current input
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # Compute next hidden state using tanh activation function
        h_next = np.tanh(np.dot(h_x, self.Whf) + self.bhf)

        return h_next


## 5. Main

In [None]:
np.random.seed(5)
bi_cell =  BidirectionalCell(10, 15, 5)
print("Whf:", bi_cell.Whf)
print("Whb:", bi_cell.Whb)
print("Wy:", bi_cell.Wy)
print("bhf:", bi_cell.bhf)
print("bhb:", bi_cell.bhb)
print("by:", bi_cell.by)
bi_cell.bhf = np.random.randn(1, 15)
h_prev = np.random.randn(8, 15)
x_t = np.random.randn(8, 10)
h = bi_cell.forward(h_prev, x_t)
print(h.shape)
print(h)

## 6. Bidirectional Cell Backward

The backward method in the BidirectionalCell class **computes the previous hidden state for the backward RNN**. 

It **concatenates the current input and the next hidden state**, then **applies a tanh activation function** to the weighted sum of this concatenation and biases. 

This process **generates the previous hidden state (`h_prev`)**, essential for capturing dependencies in the reverse direction of sequential data. 


In [None]:
#!/usr/bin/env python3
"""Bidirectional Cell Backward Module"""
import numpy as np


class BidirectionalCell:
    """
        Represents a bidirectional cell of a RNN:
    """

    def __init__(self, i, h, o):

        # Initialize weights for forward and backward RNNs
        self.Whf = np.random.normal(size=(i+h, h))      #  Weight matrix for forward RNN
        self.Whb = np.random.normal(size=(i+h, h))      #  Weight matrix for backward RNN
        self.Wy = np.random.normal(size=(h*2, o))       # Weight matrix for output layer (concatenated hidden states)

        # Initialize biases for forward and backward RNNs
        self.bhf = np.zeros((1, h))                     # forward RNN
        self.bhb = np.zeros((1, h))                     # backward RNN
        self.by = np.zeros((1, o))                      # output

    def forward(self, h_prev, x_t):
        """ Performs forward propagation for one time step """

        # Concatenate previous hidden state and current input
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # Compute next hidden state using tanh activation function
        h_next = np.tanh(np.dot(h_x, self.Whf) + self.bhf)

        return h_next


    def backward(self, h_next, x_t):
        """Calculates the hidden state in the backward direction """

        # Concatenate next hidden state and current input
        h_x = np.concatenate((h_next, x_t), axis=1)

        # Compute previous hidden state using tanh activation function
        h_prev = np.tanh(np.dot(h_x, self.Whb) + self.bhb)

        return h_prev


## 6. Main

In [None]:
np.random.seed(6)
bi_cell =  BidirectionalCell(10, 15, 5)
bi_cell.bhb = np.random.randn(1, 15)
h_next = np.random.randn(8, 15)
x_t = np.random.randn(8, 10)
h = bi_cell.backward(h_next, x_t)
print(h.shape)
print(h)

## 7. Bidirectional Output

The goal is to calculate the predictions of a bidirectional RNN for every time step.

The method initializes an output array `Y` of shape `(t, m, o)` to store the class probabilities for each time step and each example in the batch, based on the concatenated hidden states `H` of shape `(t, m, 2 * h)`.

For each time step, it computes raw scores by multiplying the hidden state `h` by the weights `Wy` and adding the bias `by`. The softmax function then converts these scores into probabilities over the output classes. This turns the sequence of hidden states into predictions, as needed for tasks such as sequence classification in natural language processing and time-series analysis.
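
The per-time-step loop can also be written as a single vectorized expression, since NumPy's matrix product broadcasts over leading axes. The sketch below compares the two forms on random data. It is an alternative shown for illustration, not the method used in this notebook; the shapes are assumptions, and the vectorized form additionally subtracts the per-row maximum for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)
t, m, h, o = 6, 8, 15, 5                 # time steps, rows per step, hidden units, outputs
H = rng.normal(size=(t, m, 2 * h))       # placeholder concatenated hidden states
Wy = rng.normal(size=(2 * h, o))
by = rng.normal(size=(1, o))

# Loop form: one softmax per time step
Y_loop = np.zeros((t, m, o))
for k, h_k in enumerate(H):
    scores_k = h_k @ Wy + by
    e_k = np.exp(scores_k)
    Y_loop[k] = e_k / e_k.sum(axis=1, keepdims=True)

# Vectorized form: the matrix product broadcasts over the time axis
scores = H @ Wy + by                                      # shape (t, m, o)
e = np.exp(scores - scores.max(axis=-1, keepdims=True))   # shift for stability
Y_vec = e / e.sum(axis=-1, keepdims=True)

print(np.allclose(Y_loop, Y_vec))  # True
```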

In [None]:
#!/usr/bin/env python3
"""Bidirectional Output Module"""
import numpy as np


class BidirectionalCell:
    """
        Represents a bidirectional cell of a RNN:
    """

    def __init__(self, i, h, o):

        # Initialize weights for forward and backward RNNs
        self.Whf = np.random.normal(size=(i+h, h))      #  Weight matrix for forward RNN
        self.Whb = np.random.normal(size=(i+h, h))      #  Weight matrix for backward RNN
        self.Wy = np.random.normal(size=(h*2, o))       # Weight matrix for output layer (concatenated hidden states)

        # Initialize biases for forward and backward RNNs
        self.bhf = np.zeros((1, h))                     # forward RNN
        self.bhb = np.zeros((1, h))                     # backward RNN
        self.by = np.zeros((1, o))                      # output

    def forward(self, h_prev, x_t):
        """ Performs forward propagation for one time step """

        # Concatenate previous hidden state and current input
        h_x = np.concatenate((h_prev, x_t), axis=1)

        # Compute next hidden state using tanh activation function
        h_next = np.tanh(np.dot(h_x, self.Whf) + self.bhf)

        return h_next


    def backward(self, h_next, x_t):
        """Calculates the hidden state in the backward direction """

        # Concatenate next hidden state and current input
        h_x = np.concatenate((h_next, x_t), axis=1)

        # Compute previous hidden state using tanh activation function
        h_prev = np.tanh(np.dot(h_x, self.Whb) + self.bhb)

        return h_prev

    def output(self, H):
        """Calculates all outputs for the RNN """
        
        # Initialize Y with zeros: H.shape[0] is the number of time steps,
        # H.shape[1] is the batch size, and self.Wy.shape[1] is the
        # output dimensionality.
        Y = np.zeros((H.shape[0], H.shape[1], self.Wy.shape[1]))

        # Iterate over each time step i and its hidden state h in H.
        for i, h in enumerate(H):
            # Raw output scores: dot product of h and Wy, plus bias by.
            outputs = np.dot(h, self.Wy) + self.by

            # Softmax converts the raw scores into probabilities:
            # exponentiate, then divide by the sum of exponentials over
            # the class axis (axis=1); keepdims=True keeps that axis so
            # the division broadcasts.
            Y[i] = np.exp(outputs) / np.sum(np.exp(outputs), axis=1,
                                            keepdims=True)

        return Y   # probabilities for each time step and batch example


## 7. Main

In [None]:
np.random.seed(7)
bi_cell =  BidirectionalCell(10, 15, 5)
bi_cell.by = np.random.randn(1, 5)
H = np.random.randn(6, 8, 30)
Y = bi_cell.output(H)
print(Y.shape)
print(Y)

## 8. Bidirectional RNN

In [None]:
#!/usr/bin/env python3
"""Bidirectional RNN Module"""
import numpy as np


def bi_rnn(bi_cell, X, h_0, h_t):
    """Performs forward propagation for a bidirectional RNN """

    Hf = np.zeros((X.shape[0], h_0.shape[0], h_0.shape[1]))
    Hb = np.zeros(Hf.shape)
    Hf[0] = bi_cell.forward(h_0, X[0])
    Hb[-1] = bi_cell.backward(h_t, X[-1])

    for i in range(1, len(X)):
        x_tf = X[i]
        x_tb = X[-(i + 1)]

        Hf[i] = bi_cell.forward(Hf[i - 1], x_tf)
        Hb[-(i + 1)] = bi_cell.backward(Hb[-i], x_tb)

    H = np.concatenate((Hf, Hb), axis=-1)

    return H, bi_cell.output(H)

## 8. Main

In [None]:
np.random.seed(8)
bi_cell =  BidirectionalCell(10, 15, 5)
X = np.random.randn(6, 8, 10)
h_0 = np.zeros((8, 15))
h_T = np.zeros((8, 15))
H, Y = bi_rnn(bi_cell, X, h_0, h_T)
print(H.shape)
print(H)
print(Y.shape)
print(Y)

### Happy Coding