## Implementing forward propagation for SimpleRNN

In [6]:
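import numpy as np
def forward(xt,h):
  # xt: (batch_size, n_sequences, n_features), h: initial state (batch_size, n_nodes)
  # n_sequences, w_x, w_h and b are globals defined in the experiment cell below
  for i in range(n_sequences):
    at = xt[:,i,:] @ w_x  + h @ w_h + b  # pre-activation at time step i
    h = np.tanh(at)                       # new hidden state
  return h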

## Forward propagation experiment with a small sequence

In [7]:
x = np.array([[[1, 2], [2, 3], [3, 4]]])/100 # (batch_size, n_sequences, n_features)
w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]])/100 # (n_features, n_nodes)
w_h = np.array([[1, 3, 5, 7], [2, 4, 6, 8], [3, 5, 7, 8], [4, 6, 8, 10]])/100 # (n_nodes, n_nodes)
batch_size = x.shape[0] # 1
n_sequences = x.shape[1] # 3
n_features = x.shape[2] # 2
n_nodes = w_x.shape[1] # 4
h = np.zeros((batch_size, n_nodes)) # (batch_size, n_nodes)
b = np.array([1, 1, 1, 1]) # (n_nodes,)

In [24]:
forward(x,h)

array([[0.79494228, 0.81839002, 0.83939649, 0.85584174]])
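
Backpropagation through time needs the hidden state at every step, not only the last one. Below is a minimal sketch, assuming a hypothetical helper `forward_all` (not part of the original notebook) that takes the weights as arguments instead of reading globals and returns every state:

```python
import numpy as np

def forward_all(x, h0, w_x, w_h, b):
    """Return [h_0, h_1, ..., h_T] for a tanh SimpleRNN (hypothetical helper)."""
    hs = [h0]
    for t in range(x.shape[1]):
        a_t = x[:, t, :] @ w_x + hs[-1] @ w_h + b  # pre-activation at step t
        hs.append(np.tanh(a_t))                    # h_t = tanh(a_t)
    return hs

# Same values as the experiment above.
x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100
w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
w_h = np.array([[1, 3, 5, 7], [2, 4, 6, 8], [3, 5, 7, 8], [4, 6, 8, 10]]) / 100
b = np.array([1, 1, 1, 1])
h0 = np.zeros((x.shape[0], w_x.shape[1]))

hs = forward_all(x, h0, w_x, w_h, b)
print(len(hs))   # h_0 plus one state per time step
print(hs[-1])    # final state, the value forward() returns
```

The last entry of `hs` should agree with the array printed by `forward(x,h)`, while the earlier entries are what a backward pass would reuse.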

## Implementation of backpropagation

$\alpha$: learning rate

$\frac{\partial L}{\partial W_x}$: gradient of the loss $L$ with respect to $W_x$

$\frac{\partial L}{\partial W_h}$: gradient of the loss $L$ with respect to $W_h$

$\frac{\partial L}{\partial B}$: gradient of the loss $L$ with respect to $B$
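
The learning rate $\alpha$ enters through the parameter update. Below is a minimal sketch, assuming plain gradient descent (the update rule itself is not spelled out here) and a hypothetical helper `sgd_update`. The gradient used is a placeholder, not a value computed from the network:

```python
import numpy as np

def sgd_update(param, grad, alpha):
    """One gradient-descent step: param - alpha * grad (assumed update rule)."""
    return param - alpha * grad

w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
dWx = np.ones_like(w_x)                      # placeholder gradient, for illustration only
w_x_new = sgd_update(w_x, dWx, alpha=0.01)   # every entry moves down by alpha * 1.0
print(w_x_new)
```

The same call would apply to $W_h$ and $B$ with their own gradients.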

The backpropagation formulas for these gradients are:

$\frac{\partial L}{\partial a_t} = \frac{\partial L}{\partial h_t} \times (1-\tanh^2(a_t))$

$\frac{\partial L}{\partial B} = \frac{\partial L}{\partial a_t}$

$\frac{\partial L}{\partial W_x} = x_{t}^{T}\cdot \frac{\partial L}{\partial a_t}$

$\frac{\partial L}{\partial W_h} = h_{t-1}^{T}\cdot \frac{\partial L}{\partial a_t}$

With a batch, $\frac{\partial L}{\partial B}$ is summed over the batch axis, and each parameter gradient is accumulated over all time steps.

*$\frac{\partial L}{\partial h_t}$ is the sum of the output error at time $t$ and the state error passed back from time $t+1$, the step processed just before it in the backward pass. This is because $h_t$ is used both as the output and as the state passed to the next time step during forward propagation.

The errors sent to the previous time step and to the previous layer are:

$\frac{\partial L}{\partial h_{t-1}} = \frac{\partial L}{\partial a_t}\cdot W_{h}^{T}$

$\frac{\partial L}{\partial x_{t}} = \frac{\partial L}{\partial a_t}\cdot W_{x}^{T}$
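
The formulas above can be collected into a backward pass for a single time step. The sketch below is an illustration under assumptions, not the notebook's own code: `step_forward` and `step_backward` are hypothetical names, the array sizes are arbitrary, and the loss is assumed to be $L = \sum h_t$ so that the incoming error is all ones. It compares one entry of the error sent to the previous time step against a central finite difference:

```python
import numpy as np

def step_forward(x_t, h_prev, w_x, w_h, b):
    return np.tanh(x_t @ w_x + h_prev @ w_h + b)

def step_backward(dh, x_t, h_prev, h_t, w_x, w_h):
    """Backward pass for one time step, given dh = dL/dh_t."""
    dA = dh * (1 - h_t ** 2)   # gradient w.r.t. a_t, since tanh'(a_t) = 1 - h_t^2
    dB = dA.sum(axis=0)        # summed over the batch
    dWx = x_t.T @ dA
    dWh = h_prev.T @ dA
    dh_prev = dA @ w_h.T       # error sent to the previous time step
    dx = dA @ w_x.T            # error sent to the previous layer
    return dB, dWx, dWh, dh_prev, dx

rng = np.random.default_rng(0)
x_t = rng.normal(size=(2, 3))      # (batch_size, n_features), arbitrary sizes
h_prev = rng.normal(size=(2, 4))   # (batch_size, n_nodes)
w_x = rng.normal(size=(3, 4))
w_h = rng.normal(size=(4, 4))
b = rng.normal(size=4)

# Assumed loss for the check: L = sum(h_t), so dL/dh_t is all ones.
h_t = step_forward(x_t, h_prev, w_x, w_h, b)
dB, dWx, dWh, dh_prev, dx = step_backward(np.ones_like(h_t), x_t, h_prev, h_t, w_x, w_h)

# Central finite difference for one entry of h_{t-1}.
eps = 1e-6
hp, hm = h_prev.copy(), h_prev.copy()
hp[0, 1] += eps
hm[0, 1] -= eps
numeric = (step_forward(x_t, hp, w_x, w_h, b).sum()
           - step_forward(x_t, hm, w_x, w_h, b).sum()) / (2 * eps)
print(abs(numeric - dh_prev[0, 1]))  # should be close to zero
```

Writing the tanh derivative as `1 - h_t ** 2` reuses the stored hidden state, so $a_t$ does not have to be kept.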

In [38]:

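def backward(dh):
  # dh: dL/dh_T, gradient of the loss w.r.t. the final hidden state, (batch_size, n_nodes)
  # forward() returns only the last state, so recompute and cache h_0 ... h_T here
  hs = [np.zeros((batch_size, n_nodes))]
  for i in range(n_sequences):
    hs.append(np.tanh(x[:, i, :] @ w_x + hs[-1] @ w_h + b))
  # parameter gradients start at zero and accumulate over the time steps
  dB = np.zeros(b.shape)
  dWx = np.zeros(w_x.shape)
  dWh = np.zeros(w_h.shape)
  dx = np.zeros(x.shape)
  for i in range(n_sequences, 0, -1):
    dA = dh * (1 - hs[i]**2)     # dL/da_t, since tanh'(a_t) = 1 - h_t^2
    dB += np.sum(dA, axis=0)
    dWx += x[:, i-1, :].T @ dA
    dWh += hs[i-1].T @ dA
    dx[:, i-1, :] = dA @ w_x.T   # error sent to the previous layer
    dh = dA @ w_h.T              # error sent to the previous time step
  return dx, dh

In [39]:
# passing h_T as dh corresponds to the loss L = 0.5 * sum(h_T**2)
dx, dh0 = backward(forward(x,h))
dx.shape, dh0.shape

((1, 3, 2), (1, 4))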
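
The note on $\frac{\partial L}{\partial h_t}$ says that each step receives both an output error and a state error. The sketch below illustrates that sum and checks the resulting $W_h$ gradient numerically. It rests on assumptions that are not in the notebook: the loss is taken to be $L = \frac{1}{2}\sum_t \lVert h_t \rVert^2$, so the output error at step $t$ is simply $h_t$, and `loss_and_grads` is a hypothetical helper that takes the weights as arguments.

```python
import numpy as np

def loss_and_grads(x, h0, w_x, w_h, b):
    """Assumed loss L = 0.5 * sum_t ||h_t||^2; returns the loss and parameter gradients."""
    T = x.shape[1]
    hs = [h0]
    for t in range(T):
        hs.append(np.tanh(x[:, t, :] @ w_x + hs[-1] @ w_h + b))
    loss = 0.5 * sum(np.sum(h_t ** 2) for h_t in hs[1:])

    dWx, dWh, dB = np.zeros_like(w_x), np.zeros_like(w_h), np.zeros(b.shape)
    dh_state = np.zeros_like(h0)       # no state error arrives after the last step
    for t in range(T, 0, -1):
        dh = hs[t] + dh_state          # output error + state error from step t+1
        dA = dh * (1 - hs[t] ** 2)
        dB += dA.sum(axis=0)
        dWx += x[:, t - 1, :].T @ dA
        dWh += hs[t - 1].T @ dA
        dh_state = dA @ w_h.T          # passed on to step t-1
    return loss, dWx, dWh, dB

x = np.array([[[1, 2], [2, 3], [3, 4]]]) / 100
w_x = np.array([[1, 3, 5, 7], [3, 5, 7, 8]]) / 100
w_h = np.array([[1, 3, 5, 7], [2, 4, 6, 8], [3, 5, 7, 8], [4, 6, 8, 10]]) / 100
b = np.ones(4)
h0 = np.zeros((1, 4))

loss, dWx, dWh, dB = loss_and_grads(x, h0, w_x, w_h, b)

# Central finite differences over every entry of W_h.
eps = 1e-6
numeric = np.zeros_like(w_h)
for i in range(w_h.shape[0]):
    for j in range(w_h.shape[1]):
        wp, wm = w_h.copy(), w_h.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        numeric[i, j] = (loss_and_grads(x, h0, w_x, wp, b)[0]
                         - loss_and_grads(x, h0, w_x, wm, b)[0]) / (2 * eps)
print(np.max(np.abs(numeric - dWh)))   # should be close to zero
```

A finite-difference comparison like this is slow but catches mistakes such as applying the tanh derivative to the wrong quantity or forgetting to carry the state error between steps.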