
In [1]:
import numpy as np

The basic idea of an RNN is that the hidden-layer activations at the current time step depend on:
- the input at the current time step
- the hidden-layer activations from the previous time step
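Written as equations, this is a sketch in row-vector notation chosen to match the NumPy code (the symbol names are picked here for illustration). $W$ maps the input to the hidden layer, $U$ maps the previous hidden state to the current one, and $V$ maps the hidden layer to the output:

$$
h_t = f\left(x_t W + h_{t-1} U + b_h\right), \qquad y_t = g\left(h_t V + b_y\right)
$$

In a standard Elman network the same $W$, $U$, $V$, $b_h$ and $b_y$ are reused at every time step $t$. Here $f$ is the hidden activation and $g$ the output activation (they may be the same function).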

Things to keep in mind:
- at the first time step there is no previous hidden state, so what should the "previous activations" be?
- ...
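One common answer to the first-time-step question is to start from a zero vector, $h_0 = 0$. A minimal sketch of unrolling the recurrence over a whole sequence under that assumption; the helper name `unroll` and its parameters are hypothetical, and it assumes the same weights are reused at every step:

```python
import numpy as np

def unroll(xs, W, U, b, f=np.tanh):
  """Run an Elman-style recurrence over xs (shape: seq_len x input_size).

  Hypothetical helper: assumes W, U, b are shared across timesteps and h_0 = 0.
  Returns the hidden states, shape (seq_len, hidden_size).
  """
  h = np.zeros(U.shape[0])  # h_0: no previous activations exist, so start from zeros
  hs = []
  for x_t in xs:
    h = f(x_t @ W + h @ U + b)  # h_t depends on x_t and h_{t-1}
    hs.append(h)
  return np.stack(hs)
```

As a sanity check, with an identity activation, 1x1 weights equal to one and zero bias, the hidden state is just a running sum of the inputs: `unroll(np.array([[1.], [2.], [3.]]), np.ones((1, 1)), np.ones((1, 1)), np.zeros(1), f=lambda z: z)` gives `[[1.], [3.], [6.]]`.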

History
- The simple recurrent network (SRN) was introduced by Elman (1990); such networks are also called Elman networks.

In [None]:
class RNN():
  def __init__(self, input_size, seq_len, hidden_size, output_size, activation_fn, seed=None):
    # initialize weights
    # The same weights are reused at every timestep (weight sharing), so they
    # have no seq_len dimension. Small random values instead of zeros keep the
    # hidden units from all computing the same thing.
    rng = np.random.default_rng(seed)

    # x_t to hidden_t
    self.W = rng.normal(0.0, 0.01, (input_size, hidden_size))

    # hidden_t-1 to hidden_t
    self.U = rng.normal(0.0, 0.01, (hidden_size, hidden_size))

    # hidden bias (separate biases for the x_t and hidden_t-1 terms would
    # only ever be summed, so one vector suffices)
    self.b_h = np.zeros(hidden_size)

    # hidden_t to y_t
    self.V = rng.normal(0.0, 0.01, (hidden_size, output_size))
    self.b_y = np.zeros(output_size)

    # initialize containers for activations, one row per timestep
    self.hidden_activations = np.zeros((seq_len, hidden_size))
    self.outputs = np.zeros((seq_len, output_size))

    self.activation_fn = activation_fn

  def forward(self, x):
    # x has shape (seq_len, input_size)
    # h_0: there are no previous activations at the first timestep, so start from zeros
    h_prev = np.zeros(self.U.shape[0])
    for t in range(len(x)):  # iterate over timestep indices, not over the inputs themselves
      # matrix products (@), since * would be element-wise
      hidden_input = x[t] @ self.W + h_prev @ self.U + self.b_h
      self.hidden_activations[t] = self.activation_fn(hidden_input)
      h_prev = self.hidden_activations[t]

      self.outputs[t] = self.activation_fn(self.hidden_activations[t] @ self.V + self.b_y)
    return self.outputs
  
  def backward(self):
    # TODO
    return
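The `backward` method is still a TODO. One common way to train an RNN is backpropagation through time (BPTT): run the forward pass, then walk the sequence in reverse, accumulating gradients for the shared weights at every step. Below is a minimal standalone sketch of that idea, not the author's implementation, under these assumptions: weights shared across timesteps, a `tanh` hidden activation, an identity output layer, a squared-error loss, and $h_0 = 0$. The functions `rnn_forward` and `rnn_backward` and their parameter names are hypothetical and are not part of the class above.

```python
import numpy as np

# Hypothetical BPTT sketch. Assumptions: shared weights across timesteps,
# tanh hidden activation, identity output, loss = 0.5 * sum_t ||y_t - target_t||^2,
# and h_0 = 0. Row-vector convention: x_t is (input_size,), W is (input_size, hidden_size).

def rnn_forward(x, W, U, V, b_h, b_y):
  """Forward pass over x of shape (T, input_size); returns hidden states and outputs."""
  T, H = x.shape[0], U.shape[0]
  hs = np.zeros((T + 1, H))  # hs[0] is h_0 = 0; hs[t + 1] is the hidden state after input x[t]
  ys = np.zeros((T, V.shape[1]))
  for t in range(T):
    hs[t + 1] = np.tanh(x[t] @ W + hs[t] @ U + b_h)
    ys[t] = hs[t + 1] @ V + b_y
  return hs, ys

def rnn_backward(x, targets, W, U, V, b_h, b_y):
  """Returns (loss, grads), where grads maps each parameter name to dLoss/dParam."""
  hs, ys = rnn_forward(x, W, U, V, b_h, b_y)
  loss = 0.5 * np.sum((ys - targets) ** 2)
  dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
  db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
  dh_next = np.zeros(U.shape[0])  # gradient reaching this step's hidden state from the next step
  for t in reversed(range(x.shape[0])):
    dy = ys[t] - targets[t]           # dLoss/dy at this step
    dV += np.outer(hs[t + 1], dy)
    db_y += dy
    dh = dy @ V.T + dh_next           # total gradient w.r.t. this step's hidden state
    dz = dh * (1.0 - hs[t + 1] ** 2)  # back through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dW += np.outer(x[t], dz)
    dU += np.outer(hs[t], dz)         # hs[t] is the previous hidden state
    db_h += dz
    dh_next = dz @ U.T                # send gradient back to the previous hidden state
  return loss, {"W": dW, "U": dU, "V": dV, "b_h": db_h, "b_y": db_y}
```

Each entry in `grads` has the same shape as its parameter, so a plain gradient-descent step would subtract `lr * grads[name]` from each parameter. Because the weights are shared, their gradients are sums over all timesteps; a finite-difference gradient check on small random parameters is a quick way to confirm them.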