# Chapter 5: Recurrent Neural Network

## 5.3 Implementing RNNs

### 5.3.1 Implementing RNN Layer

An RNN layer receives the input $x_{t}$ and the previous hidden state $h_{t-1}$, and returns the next hidden state $h_{t}$, as shown below.
![5.3.1](./fig/5_3_1.drawio.svg)

Forward propagation of the RNN layer is given by the following formula.
$$
    h_t = \tanh(h_{t-1} W_h + x_{t} W_{x} + b)
$$

$h_{t-1}$ and $W_{h}$ have shapes `(N, H)` and `(H, H)`, respectively.
Likewise, $x_{t}$ and $W_{x}$ have shapes `(N, D)` and `(D, H)`, so $h_t$ has shape `(N, H)`.
The bias $b$ has shape `(H,)` and is broadcast over the N samples.

Here, N is the number of samples (the batch size), H is the hidden size, and D is the input size.
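
As a quick sanity check of these shapes, here is a minimal NumPy sketch of one forward step. The concrete sizes (`N=4`, `D=3`, `H=5`) and the random initialization are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

N, D, H = 4, 3, 5  # illustrative sizes only (assumed, not from the text)
rng = np.random.default_rng(0)

x_t = rng.standard_normal((N, D))     # input at time t
h_prev = rng.standard_normal((N, H))  # previous hidden state h_{t-1}
Wx = rng.standard_normal((D, H))      # input-to-hidden weights
Wh = rng.standard_normal((H, H))      # hidden-to-hidden weights
b = np.zeros(H)                       # bias, broadcast across the N rows

# (N, H) @ (H, H) + (N, D) @ (D, H) + (H,) -> (N, H)
h_t = np.tanh(h_prev @ Wh + x_t @ Wx + b)
print(h_t.shape)  # (4, 5)
```

Both matrix products produce an `(N, H)` array, which is why they can be added before applying `tanh`.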

Backward propagation of the RNN layer is shown below.
![5.3.1_2](./fig/5_3_1_2.drawio.svg)
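
The computation in the figure can also be written out as equations. This is a sketch under assumed notation: $t = h_{t-1} W_h + x_t W_x + b$ is the pre-activation, $dh_t$ stands for the upstream gradient $\partial L / \partial h_t$ (and similarly for the other $d\cdot$ symbols), and $\odot$ is the element-wise product. The first line uses $\frac{d}{dt}\tanh(t) = 1 - \tanh^2(t) = 1 - h_t^2$.

$$
\begin{aligned}
    dt &= dh_t \odot (1 - h_t^2) \\
    db &= \sum_{n=1}^{N} dt_{n} \\
    dW_h &= h_{t-1}^{\top} \, dt, \qquad dh_{t-1} = dt \, W_h^{\top} \\
    dW_x &= x_t^{\top} \, dt, \qquad dx_t = dt \, W_x^{\top}
\end{aligned}
$$

The sum in $db$ runs over the N rows of $dt$ because $b$ is shared by every sample in the batch.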

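```python
import numpy as np


class RNN:
    """A single RNN step: h_next = tanh(h_prev @ Wh + x @ Wx + b)."""

    def __init__(self, Wx, Wh, b):
        # Wx: (D, H), Wh: (H, H), b: (H,)
        self.params = [Wx, Wh, b]
        self.grads = [np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)]
        self.cache = None  # values saved in forward() for use in backward()

    def forward(self, x, h_prev):
        # x: (N, D), h_prev: (N, H) -> h_next: (N, H)
        Wx, Wh, b = self.params
        t = np.matmul(h_prev, Wh) + np.matmul(x, Wx) + b
        h_next = np.tanh(t)

        self.cache = (x, h_prev, h_next)
        return h_next

    def backward(self, dh_next):
        # N: batch_size, D: input_size, H: hidden_size
        # dh_next: (N, H), the gradient flowing back from h_next
        Wx, Wh, b = self.params
        x, h_prev, h_next = self.cache

        dt = dh_next * (1 - h_next ** 2)  # through tanh: d tanh(t)/dt = 1 - tanh(t)^2
        db = np.sum(dt, axis=0)           # b is broadcast over N, so sum over the batch
        dWh = np.dot(h_prev.T, dt)        # (H, N) @ (N, H) -> (H, H)
        dh_prev = np.dot(dt, Wh.T)        # (N, H) @ (H, H) -> (N, H)
        dWx = np.dot(x.T, dt)             # (D, N) @ (N, H) -> (D, H)
        dx = np.dot(dt, Wx.T)             # (N, H) @ (H, D) -> (N, D)

        # Write in place so external references to self.grads stay valid.
        self.grads[0][...] = dWx
        self.grads[1][...] = dWh
        self.grads[2][...] = db

        return dx, dh_prev
```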
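
One way to gain confidence in a hand-written backward pass like this is a numerical gradient check. The sketch below is self-contained: `rnn_forward` and `rnn_backward` are hypothetical stand-alone functions that mirror the forward and backward formulas of this section, and `numerical_grad` is an assumed helper using central differences. The scalar loss `sum(h_next * dh_next)` is chosen only so that its gradient with respect to `h_next` equals `dh_next`.

```python
import numpy as np

def rnn_forward(x, h_prev, Wx, Wh, b):
    # Same formula as the forward pass: tanh(h_prev Wh + x Wx + b)
    return np.tanh(h_prev @ Wh + x @ Wx + b)

def rnn_backward(dh_next, x, h_prev, h_next, Wx, Wh):
    # Same formulas as the backward pass, returned instead of stored
    dt = dh_next * (1 - h_next ** 2)
    db = dt.sum(axis=0)
    dWh = h_prev.T @ dt
    dh_prev = dt @ Wh.T
    dWx = x.T @ dt
    dx = dt @ Wx.T
    return dx, dh_prev, dWx, dWh, db

def numerical_grad(f, a, eps=1e-5):
    """Central-difference gradient of the scalar f() w.r.t. array a.

    a is perturbed in place one element at a time and then restored.
    """
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        f_plus = f()
        a[idx] = old - eps
        f_minus = f()
        a[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H = 3, 4, 5  # small illustrative sizes
x = rng.standard_normal((N, D))
h_prev = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H)) * 0.5
Wh = rng.standard_normal((H, H)) * 0.5
b = rng.standard_normal(H) * 0.5
dh_next = rng.standard_normal((N, H))  # fixed upstream gradient

def loss():
    # d(loss)/d(h_next) == dh_next by construction
    return np.sum(rnn_forward(x, h_prev, Wx, Wh, b) * dh_next)

h_next = rnn_forward(x, h_prev, Wx, Wh, b)
analytic = rnn_backward(dh_next, x, h_prev, h_next, Wx, Wh)
numeric = [numerical_grad(loss, a) for a in (x, h_prev, Wx, Wh, b)]

# Largest absolute difference across all five gradients
max_err = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print(max_err)
```

If the backward formulas are right, `max_err` should be tiny compared with the gradient values themselves; a large value usually points to a transposed operand or a missing sum over the batch axis.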

### 5.3.2 Implementing Time RNN

A Time RNN layer connects T RNN layers in sequence, one per time step, and passes each hidden state $h_t$ on to the next step.
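
To make the idea concrete, here is a minimal forward-only sketch that applies one RNN step repeatedly over T time steps. Several things are assumptions for illustration and do not come from the text: the function names `rnn_step` and `time_rnn_forward`, the `(N, T, D)` input layout, the zero initial hidden state, and the reuse of the same `Wx`, `Wh`, and `b` at every step.

```python
import numpy as np

def rnn_step(x, h_prev, Wx, Wh, b):
    # One RNN step: (N, D), (N, H) -> (N, H)
    return np.tanh(h_prev @ Wh + x @ Wx + b)

def time_rnn_forward(xs, h0, Wx, Wh, b):
    """Run rnn_step over every time step.

    xs: (N, T, D) input sequence, h0: (N, H) initial hidden state.
    Returns hs: (N, T, H), the hidden state at each time step.
    """
    N, T, _ = xs.shape
    H = Wh.shape[0]
    hs = np.empty((N, T, H))
    h = h0
    for t in range(T):
        h = rnn_step(xs[:, t, :], h, Wx, Wh, b)  # h_t depends on h_{t-1}
        hs[:, t, :] = h
    return hs

rng = np.random.default_rng(0)
N, T, D, H = 2, 6, 3, 4  # illustrative sizes only
xs = rng.standard_normal((N, T, D))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = np.zeros(H)

hs = time_rnn_forward(xs, np.zeros((N, H)), Wx, Wh, b)
print(hs.shape)  # (2, 6, 4)
```

The loop cannot be vectorized over `t` because each step needs the hidden state produced by the previous one.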
