Understanding Recurrent Neural Networks

Sources:

Andrej Karpathy blog: The Unreasonable Effectiveness of Recurrent Neural Networks http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Stanford cs231n (spring 2017) lecture 10: Recurrent Neural Networks https://www.youtube.com/watch?v=6niqTuYFZLQ



### Context:

Neural networks like CNNs typically require some fixed-size input and produce a fixed-size output. 

RNNs can operate on every item of a sequence, so the length of that sequence can vary. This applies to the output as well as the input: both can vary in length.

This allows various combinations of input and output lengths, such as one-to-many, many-to-one, and many-to-many.

Even when the input is fixed-size, an RNN can still process it sequentially, one piece at a time.


![img](./img/rnn-in-out-size.jpg)
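
As a rough sketch of these combinations, the snippet below drives one toy recurrent step in three ways: many-to-one, many-to-many, and one-to-many. The `toy_step` function, the layer sizes, and the fixed random weights are illustrative assumptions, not something taken from the sources; a real RNN would learn its weights.

```python
import numpy as np

# Hypothetical toy cell: names, sizes, and weights are assumptions for illustration.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(4, 3))  # input (3) -> hidden (4)
W_hh = rng.normal(scale=0.1, size=(4, 4))  # hidden (4) -> hidden (4)
W_hy = rng.normal(scale=0.1, size=(3, 4))  # hidden (4) -> output (3)

def toy_step(h, x):
    """Return the new hidden state and the output for one input vector."""
    h = np.tanh(W_hh @ h + W_xh @ x)
    return h, W_hy @ h

def many_to_one(xs):
    """Read a whole sequence, keep only the final output."""
    h = np.zeros(4)
    y = None
    for x in xs:
        h, y = toy_step(h, x)
    return y

def many_to_many(xs):
    """Emit one output per input item."""
    h = np.zeros(4)
    ys = []
    for x in xs:
        h, y = toy_step(h, x)
        ys.append(y)
    return ys

def one_to_many(x, n_steps):
    """Start from a single input and feed each output back in as the next input."""
    h = np.zeros(4)
    ys = []
    for _ in range(n_steps):
        h, x = toy_step(h, x)
        ys.append(x)
    return ys

short_seq = [rng.normal(size=3) for _ in range(2)]
long_seq = [rng.normal(size=3) for _ in range(7)]
print(many_to_one(short_seq).shape, many_to_one(long_seq).shape)
print(len(many_to_many(short_seq)), len(many_to_many(long_seq)))
print(len(one_to_many(short_seq[0], 5)))
```

The same weights handle the 2-item and the 7-item sequence; only the number of loop iterations changes.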

### How does an RNN work?

Like a static variable in a class that gets updated every time some method is called, the hidden state in an RNN cell retains some of what it has seen and is updated by each new input.

The __Recurrent__ part of RNN:

Example: for every word in a sentence, run the word through the RNN function.

The function takes the first word `x` and a hidden state `h` (initialized to 0) as inputs, and produces an updated hidden state `h1`. `h1` is then fed into the same function together with the next word `x1` to produce a new hidden state `h2`. This process repeats until the end of the sentence.

Both `x` and `h` have their own set of weights (each represented as a fully connected layer).

If you want to produce an output at each step, you can introduce another set of weights for the calculation of `y`.
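
In equation form, the steps above could be written roughly as follows. The subscript $t$ for the position in the sentence is an assumption introduced here, and the `tanh` non-linearity and the weight names are taken from the code later in this note.

$$
h_0 = 0, \qquad
h_t = \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t\right), \qquad
y_t = W_{hy}\, h_t
$$

The same three weight matrices are reused at every step; only $h_{t-1}$ and $x_t$ change.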

The last hidden state can be considered a summary of the input sentence.

As a program, it looks like this:

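```python
import numpy as np

class RNN:
    '''A single recurrent cell'''
    def __init__(self, input_size, hidden_size, output_size):
        # assumed setup: small random weights, zero initial hidden state
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
        self.W_hy = np.random.randn(output_size, hidden_size) * 0.01
        self.h = np.zeros(hidden_size)
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector (if needed)
        y = np.dot(self.W_hy, self.h)
        return y
```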

Notice that there are __three sets of weights__! One (`W_hh`) for the hidden state; one (`W_xh`) for the input; and one (`W_hy`) for the output.

The hidden state `self.h` is initialized with a zero vector.

The `np.tanh` function applies a non-linearity that squashes the activations into the range `[-1, 1]`.
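
A quick check of that squashing behaviour, using a few hand-picked sample values (the inputs are arbitrary and chosen only for illustration):

```python
import numpy as np

# Arbitrary sample pre-activations, from strongly negative to strongly positive.
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
a = np.tanh(z)

print(a)
# Check that every activation stays within [-1, 1] and that tanh(0) is 0.
print(a.min() >= -1.0, a.max() <= 1.0, a[2] == 0.0)
```

Large positive or negative inputs land very close to 1 or -1, which keeps the hidden state bounded no matter how many steps are run.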

For each step in a sequence, you'd run the line below to update the hidden state:

```python
rnn.step(x)  # x is an input vector
```
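
Putting the pieces together, here is a minimal sketch of running a cell over a whole sequence. The constructor arguments, the layer sizes, the random weight scale, and the random vectors standing in for word embeddings are all assumptions made for illustration; the notes above only specify the `step` method and the zero initial hidden state.

```python
import numpy as np

class RNN:
    """A single recurrent cell, with an assumed constructor so the example runs."""
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: small random weights; a trained network would learn these.
        self.W_xh = rng.normal(scale=0.01, size=(hidden_size, input_size))
        self.W_hh = rng.normal(scale=0.01, size=(hidden_size, hidden_size))
        self.W_hy = rng.normal(scale=0.01, size=(output_size, hidden_size))
        self.h = np.zeros(hidden_size)  # hidden state starts as a zero vector

    def step(self, x):
        # Update the hidden state, then compute the output from it.
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        return np.dot(self.W_hy, self.h)

rnn = RNN(input_size=8, hidden_size=16, output_size=4)

# Stand-in for a 5-word sentence: one random 8-dimensional vector per word.
sentence = np.random.default_rng(1).normal(size=(5, 8))

outputs = [rnn.step(x) for x in sentence]  # one output per word
summary = rnn.h                            # hidden state after the final word

print(len(outputs), outputs[0].shape, summary.shape)
```

Because `step` keeps whatever `self.h` held from the previous call, the hidden state would need to be reset to zeros before processing a second, unrelated sequence.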