# RNNs
We want a layer that takes as input not a single vector $x$ but an ordered sequence of vectors $x_1,\ldots,x_n$, where $x_i \in R^d$.

One thing we could do is stack them on top of one another, to create a big vector $\vec x \in R^{nd}$, and then apply a fully connected layer.
But that ties the layer to one fixed sequence length $n$ and shares nothing between positions, so we want to be smarter than that.
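
For reference, the naive "one big vector" approach might look like the sketch below. The sizes `n`, `d`, `k` and the choice of output size `k` are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

n, d, k = 15, 7, 5            # assumed sizes: sequence length, input size, output size

xs = torch.randn(n, d)        # the sequence x_1, ..., x_n, one vector per row
x_flat = xs.reshape(n * d)    # one big vector in R^{nd}

fc = nn.Linear(n * d, k)      # fully connected layer; its weight matrix has k * n * d entries
y = fc(x_flat)                # a single output vector in R^k
```

Note that `fc` only accepts inputs of length exactly `n * d`, so a sequence of any other length cannot be fed to it.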

* An RNN cell is just a function $f(x,h)$ taking two inputs: the input vector $x \in R^d$ and the *hidden* vector $h \in R^k$.
* The output $y = f(x,h) \in R^k$ has the same size as the hidden vector.
* An RNN is built by chaining RNN cells: the same $f$ is applied at every step, and each output is fed in as the next hidden vector.
* Say the input is the sequence $(x_1,\ldots,x_n)$ with $x_i \in R^d$. We proceed step by step.
    * $h_0 := \vec 0$ (this can also be initialized differently)
    * $h_1 := f(x_1,h_0)$
    * $h_2 := f(x_2,h_1)$
    * $h_3 := f(x_3,h_2)$
    * ...
    * $h_n := f(x_n,h_{n-1})$
* The output of the RNN is the whole sequence $(h_1,\ldots,h_n)$.
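
The recurrence above can be sketched in a few lines. This is only an illustration, not the cell PyTorch uses: the concrete cell $f(x,h) = \tanh(Wx + Uh + b)$ and the random parameters `W`, `U`, `b` are assumptions chosen to make the loop runnable.

```python
import torch

torch.manual_seed(0)
n, d, k = 15, 7, 5  # sequence length, input size, hidden size

# Hypothetical cell parameters (assumed; any f mapping (R^d, R^k) to R^k would do).
W = torch.randn(k, d) * 0.1
U = torch.randn(k, k) * 0.1
b = torch.zeros(k)

def f(x, h):
    """One cell step: takes x in R^d and h in R^k, returns the new hidden vector in R^k."""
    return torch.tanh(W @ x + U @ h + b)

xs = [torch.randn(d) for _ in range(n)]  # the sequence x_1, ..., x_n

h = torch.zeros(k)   # h_0 := 0
hs = []
for x in xs:         # h_i := f(x_i, h_{i-1})
    h = f(x, h)
    hs.append(h)

H_manual = torch.stack(hs)  # the whole sequence (h_1, ..., h_n), shape (n, k)
```

The same `f` (and hence the same parameters) is reused at every step, which is why the loop works for any sequence length.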


## Let's see this in action using `pytorch`.

In [1]:
import torch
import torch.nn as nn

### We start by fixing n,k,d from above.
(since it's pytorch, we must also work in batches)

In [2]:
# n = seq_len = 15
seq_len = 15
# d = input_size = 7
input_size = 7
# k = hidden_size = 5
hidden_size = 5

# batch_len (aka batch_size) = 4
batch_len = 4

### We now declare the RNN layer (here a GRU, one of the RNN variants that `torch.nn` provides).

In [3]:
rnnlayer = nn.GRU(input_size=input_size,
                 hidden_size=hidden_size,
                 num_layers=1,
                 bias=True,
                 batch_first=True,
                 dropout=0,
                 bidirectional=False)

## Let's go over the parameters:

- `input_size` and `hidden_size` we know about
- `num_layers` says how many recurrent layers are stacked on top of one another, each taking the previous layer's output sequence as its input (the default is 1) [let's ignore this]
- `bias` is the usual: adding a constant term (default = True).
- `batch_first` decides if `input.shape` is (batch_size,seq_len,input_size) or (seq_len,batch_size,input_size). Same for outputs.
    - `batch_first` is FALSE by default!
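- `dropout` is the dropout probability applied to the outputs of every layer except the last, so it has no effect when `num_layers = 1` (default = 0) [let's ignore this]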
- `bidirectional` is if we want a bidirectional RNN, default = False. [let's ignore this]
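
To make the effect of `batch_first` concrete, here is a small sketch comparing the two layouts. The names `gru_bf`, `gru_sf`, `x_bf`, `x_sf` are made up for this example, and the sizes are the ones fixed earlier.

```python
import torch
import torch.nn as nn

seq_len, input_size, hidden_size, batch_len = 15, 7, 5, 4

gru_bf = nn.GRU(input_size, hidden_size, batch_first=True)
gru_sf = nn.GRU(input_size, hidden_size)  # batch_first=False, the default

x_bf = torch.randn(batch_len, seq_len, input_size)  # (batch, seq, feature)
x_sf = x_bf.transpose(0, 1).contiguous()            # (seq, batch, feature)

out_bf, hn_bf = gru_bf(x_bf)
out_sf, hn_sf = gru_sf(x_sf)

print(out_bf.shape)  # torch.Size([4, 15, 5])
print(out_sf.shape)  # torch.Size([15, 4, 5])
print(hn_bf.shape, hn_sf.shape)  # both torch.Size([1, 4, 5])
```

Only the sequence-shaped input and output change layout; the final hidden state keeps the batch in its second dimension in both cases.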

#### Let's define some bogus inputs.

In [5]:
# bogus input: a random batch of shape (batch_len, seq_len, input_size)
x = torch.randn(batch_len, seq_len, input_size)

In [4]:
x.shape

torch.Size([4, 15, 7])

## Inputs
If we look closely, `rnnlayer` actually takes two inputs:

- the first is the sequence of input vectors, whose shape is (batch_len, seq_len, input_size).
- the second is the zeroth hidden vector $h_0$, of shape (1, batch_len, hidden_size).

The reason for that funny 1 is that the real shape is (num_layers*num_directions, batch_len, hidden_size)
- for us `num_layers = 1`, `num_directions = 1` (as bidirectional=False).

If the second input is omitted, it defaults to all zeros.
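
A quick way to convince yourself of this default is to pass an explicit zero tensor and compare. The sketch below builds its own layer and input (`layer`, `x_demo`, `h0` are names made up for this check) so that it runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_size, hidden_size, batch_len = 15, 7, 5, 4

layer = nn.GRU(input_size, hidden_size, batch_first=True)
x_demo = torch.randn(batch_len, seq_len, input_size)

# Explicit zeroth hidden vector, shape (num_layers*num_directions, batch_len, hidden_size)
h0 = torch.zeros(1, batch_len, hidden_size)

out_default, _ = layer(x_demo)        # second input omitted
out_explicit, _ = layer(x_demo, h0)   # second input given as zeros

same = torch.allclose(out_default, out_explicit)
print(same)  # True
```

Note that the hidden state is always laid out as (1, batch_len, hidden_size) here, even though `batch_first=True` puts the batch first for the input.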

## Let's have a look at the output.

In [14]:
# output is a tuple
H, h_n = rnnlayer(x)

- `H` is the collection of all hidden vectors, with shape (batch_len, seq_len, hidden_size).

(the actual shape is (batch_len, seq_len, num_directions*hidden_size), but num_directions=1 for us)

In [16]:
H.shape

torch.Size([4, 15, 5])

- `h_n` is the hidden vector after the last time step, i.e. the output of the last cell
- `h_n` has shape (1,batch_len, hidden_size)

The true shape is (num_layers*num_directions, batch_len, hidden_size), but num_layers*num_directions = 1 for us.
- if you have more than one layer, this allows you to separate the output from each, which may be useful

In [17]:
h_n.shape

torch.Size([1, 4, 5])
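
To tie the layer back to the step-by-step recurrence, the sketch below checks two things on a fresh single-layer, unidirectional GRU: that the last time step of `H` coincides with `h_n`, and that unrolling an `nn.GRUCell` with the same weights by hand reproduces `H`. The names `layer`, `cell`, `x_demo`, `H_demo`, `h_last` are made up for this check, and the comparison uses a small tolerance since the two code paths may differ by floating-point rounding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, input_size, hidden_size, batch_len = 15, 7, 5, 4

layer = nn.GRU(input_size, hidden_size, batch_first=True)
x_demo = torch.randn(batch_len, seq_len, input_size)

with torch.no_grad():
    H_demo, h_last = layer(x_demo)

    # 1) The last time step of H is the same vector as h_n (for one layer, one direction).
    last_matches = torch.allclose(H_demo[:, -1, :], h_last[0])

    # 2) Unroll by hand: a GRUCell plays the role of f(x, h), sharing the layer's weights.
    cell = nn.GRUCell(input_size, hidden_size)
    cell.load_state_dict({
        "weight_ih": layer.weight_ih_l0,
        "weight_hh": layer.weight_hh_l0,
        "bias_ih": layer.bias_ih_l0,
        "bias_hh": layer.bias_hh_l0,
    })

    h = torch.zeros(batch_len, hidden_size)   # h_0 := 0
    steps = []
    for t in range(seq_len):                  # h_t := f(x_t, h_{t-1})
        h = cell(x_demo[:, t, :], h)
        steps.append(h)
    H_unrolled = torch.stack(steps, dim=1)    # (batch_len, seq_len, hidden_size)

    unroll_matches = torch.allclose(H_demo, H_unrolled, atol=1e-5)

print(last_matches, unroll_matches)
```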