In [1]:
%run Latex_macros.ipynb

<IPython.core.display.Latex object>

# RNN as a layer

During one time step $\tt$, the RNN
- Takes element $\tt$ of $\x$ as input: $\x_\tp$
- Computes a new latent state $\h_\tp$
- Optionally outputs element $\tt$ of the output: $\y_\tp$

$$
\h_\tp, \y_\tp = f(\x_\tp;  \h_{(\tt-1)})
$$

$\h_\tp$ is used in the next time step of the RNN but may not be externally visible.
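
As a minimal sketch of one time step, the function below implements one possible $f$. The `tanh` update, the linear read-out and the weight values are assumptions chosen for illustration; the text leaves $f$ abstract, and `rnn_step` is a hypothetical name.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One time step: map (x_t, h_prev) to (h_t, y_t)."""
    n_h = len(h_prev)
    # Assumed update rule: h_t = tanh(W_xh x_t + W_hh h_prev)
    h_t = [
        math.tanh(
            sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
            + sum(W_hh[i][k] * h_prev[k] for k in range(n_h))
        )
        for i in range(n_h)
    ]
    # Assumed read-out: y_t = W_hy h_t
    y_t = [sum(row[k] * h_t[k] for k in range(n_h)) for row in W_hy]
    return h_t, y_t

# Hypothetical weights: 2 input features, 3 latent units, 1 output
W_xh = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]
W_hy = [[1.0, -1.0, 0.5]]

h0 = [0.0, 0.0, 0.0]
h1, y1 = rnn_step([1.0, 2.0], h0, W_xh, W_hh, W_hy)   # step 1
h2, y2 = rnn_step([0.5, -1.0], h1, W_xh, W_hh, W_hy)  # step 2 reuses h1
```

The only link between the two calls is the latent state: `h1` is passed back in as `h_prev` for the second step.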

Let's describe the inputs/outputs of an RNN layer from the perspective of what is and is not visible.


If we draw a box around the unrolled RNN, we can see the "API":

<table>
    <tr>
        <th><center><strong>RNN many to one API</strong></center></th>
    </tr>
    <tr>
        <td><img src="images/RNN_layer_API_many_to_one.jpg"></td>
    </tr>
</table>


- The input sequence $\x$ of length $T$ is depicted as coming from below
- The output of the layer is $\y_{(T)}$, which becomes available only after the entire sequence $\x$ has been processed
- Everything inside the box is *not visible* from outside the layer

- Output $\y$ is available to be fed to another layer (i.e., not the same RNN layer)
- Latent state $\h$ is retained by the RNN layer

## Many to one

The above API was for an RNN layer computing a many to one function
- Sequence input, single vector as output

A many to one mapping is particularly useful
- If one considers $\y_{(T)}$ a fixed length summary of variable length sequence $[\x_{(1)}, \ldots, \x_{(T)} ]$
- Which is amenable for processing by a layer requiring a fixed length input
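
The fixed length summary can be sketched as follows. This is an illustration under stated assumptions: the weights are made up, the update is a plain `tanh` cell, and the output $\y_{(T)}$ is taken to equal the final latent state; `step` and `many_to_one` are hypothetical names.

```python
import math

# Hypothetical weights: 2 input features, 3 latent units
W_X = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_H = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]

def step(x_t, h_prev):
    """Assumed cell: h_t = tanh(W_X x_t + W_H h_prev)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_X[i], x_t))
            + sum(w * h for w, h in zip(W_H[i], h_prev))
        )
        for i in range(len(W_H))
    ]

def many_to_one(xs):
    """Consume the whole sequence, expose only the final output."""
    h = [0.0] * len(W_H)
    for x_t in xs:
        h = step(x_t, h)
    return h  # y_(T), assumed equal to h_(T) in this sketch

short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0], [0.3, 0.3]]

summary_short = many_to_one(short_seq)  # sequence of length 2
summary_long = many_to_one(long_seq)    # sequence of length 5
```

Both summaries have 3 elements regardless of the input length, which is what makes them usable by a layer that expects a fixed length input.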



## Many to many

We can show the API for an RNN layer computing a many to many function
- Sequence input, sequence of vectors as output

Essentially, the "internal" (inside the box) workings are exposed to the user, rather than hidden.

<table>
    <tr>
        <th><center>RNN many to many API</center></th>
    </tr>
    <tr>
        <td><img src="images/RNN_layer_API_many_to_many.jpg"></td>
    </tr>
</table>


To get Keras to implement the many to many API, 
optional arguments are used when constructing the layer
- `return_sequences`: return the output of every time step rather than only the last one
- `return_state`: additionally return the final latent state
- both default to `False` in Keras.

`return_sequences` controls whether the RNN layer returns the sequence
$$
[ \y_{(1)}, \ldots, \y_{(T)} ]
$$
or just
$$
\y_{(T)}
$$

When `return_state` is `True`, the final latent state $\h_{(T)}$ is returned alongside the output.
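
The effect of the two flags can be imitated in plain Python. The sketch below is not Keras code: `run_layer` is a hypothetical stand-in whose arguments mirror the Keras arguments `return_sequences` and `return_state`, the weights are made up, and the output at each step is assumed to equal the latent state.

```python
import math

# Hypothetical weights: 1 input feature, 2 latent units
W_X = [[0.5], [-0.3]]
W_H = [[0.1, 0.2], [0.0, -0.1]]

def step(x_t, h_prev):
    """Assumed cell: h_t = tanh(W_X x_t + W_H h_prev)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_X[i], x_t))
            + sum(w * h for w, h in zip(W_H[i], h_prev))
        )
        for i in range(len(W_H))
    ]

def run_layer(xs, return_sequences=False, return_state=False):
    """Imitate the flag semantics: full sequence vs last output, plus optional state."""
    h = [0.0] * len(W_H)
    ys = []
    for x_t in xs:
        h = step(x_t, h)
        ys.append(h)  # y_(t) is assumed equal to h_(t) here
    output = ys if return_sequences else ys[-1]
    return (output, h) if return_state else output

xs = [[1.0], [0.5], [-1.0]]
last = run_layer(xs)                                   # many to one: y_(T)
seq = run_layer(xs, return_sequences=True)             # many to many: [y_(1), ..., y_(T)]
seq_again, final_state = run_layer(xs, return_sequences=True, return_state=True)
```

With both flags left at `False` the caller sees only the last output; setting `return_sequences=True` exposes one output per time step.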

## One to many

It may seem strange to generate a sequence output from a single input, but consider
- Feeding the output of step $(\tt-1)$ as *input* to step $\tt \gt 1$
$$\x_\tp = \y_{(\tt-1)}$$

A picture should help

<table>
    <tr>
        <th><center>RNN one to many API</center></th>
    </tr>
    <tr>
        <td><img src="images/RNN_layer_API_one_to_many.png"></td>
    </tr>
</table>

This will be particularly useful when the outputs $\y_\tp$ have an element of randomness
- A new output sequence is generated even when the same input "seed" $\x$ is used

We will show how an architecture like this can be used to *generate*
- A story (sequence of words)
- From a single "seed" word (or a short sequence of seed words)
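
A rough sketch of the feedback loop with random sampling is given below. Everything specific in it is an assumption made for illustration: the five-word vocabulary, the untrained random weights, the softmax output and the names `step` and `generate`. With untrained weights the "story" is gibberish; the point is only the mechanics of feeding each output back as the next input.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "down", "."]  # hypothetical vocabulary
V = len(VOCAB)
N_H = 4  # number of latent units

# Hypothetical untrained weights, fixed by a seed so the sketch is reproducible
_init = random.Random(0)
W_X = [[_init.uniform(-1, 1) for _ in range(V)] for _ in range(N_H)]
W_H = [[_init.uniform(-1, 1) for _ in range(N_H)] for _ in range(N_H)]
W_Y = [[_init.uniform(-1, 1) for _ in range(N_H)] for _ in range(V)]

def step(token, h_prev):
    """One step on a one-hot input; returns new state and word probabilities."""
    # For a one-hot input, W_X x simply selects column `token`
    h = [
        math.tanh(W_X[i][token] + sum(W_H[i][k] * h_prev[k] for k in range(N_H)))
        for i in range(N_H)
    ]
    logits = [sum(W_Y[v][k] * h[k] for k in range(N_H)) for v in range(V)]
    z = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - z) for l in logits]
    total = sum(exps)
    return h, [e / total for e in exps]

def generate(seed_token, length, sampler):
    """Sample each output and feed it back as the next input."""
    h = [0.0] * N_H
    token = seed_token
    out = []
    for _ in range(length):
        h, probs = step(token, h)
        token = sampler.choices(range(V), weights=probs)[0]  # random y_(t)
        out.append(token)  # becomes x_(t+1) on the next iteration
    return out

story_a = generate(0, 6, random.Random(1))
story_b = generate(0, 6, random.Random(2))  # same seed word, different random draws
words_a = [VOCAB[t] for t in story_a]
```

Both calls start from the same seed word `"the"`; any difference between `story_a` and `story_b` comes only from the sampling step.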

# Combining RNN layers

There are some typical paradigms in which layers are combined.

## Stacked RNN layers

By feeding the output sequence into another RNN layer, we can achieve stacked layers
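
Stacking can be sketched by letting the lower layer expose its whole output sequence and passing that sequence to a second layer. The weights, the `tanh` cell and the helper `make_layer` are hypothetical, and the output at each step is assumed to equal the latent state.

```python
import math

def make_layer(W_x, W_h):
    """Build a toy RNN layer from (hypothetical) input and recurrent weights."""
    def layer(xs, return_sequences=False):
        h = [0.0] * len(W_h)
        ys = []
        for x_t in xs:
            h = [
                math.tanh(
                    sum(w * x for w, x in zip(W_x[i], x_t))
                    + sum(w * p for w, p in zip(W_h[i], h))
                )
                for i in range(len(W_h))
            ]
            ys.append(h)  # y_(t) is assumed equal to h_(t)
        return ys if return_sequences else ys[-1]
    return layer

# Lower layer: 2 input features -> 3 units; upper layer: 3 inputs -> 2 units
layer1 = make_layer(
    W_x=[[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]],
    W_h=[[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]],
)
layer2 = make_layer(
    W_x=[[0.2, -0.1, 0.4], [0.3, 0.3, -0.5]],
    W_h=[[0.1, -0.2], [0.2, 0.1]],
)

xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
lower_seq = layer1(xs, return_sequences=True)  # one output per time step
top = layer2(lower_seq)                        # upper layer consumes that sequence
```

The lower layer has to return a sequence; if it returned only its last output, the upper layer would have nothing to iterate over.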



<table>
    <tr>
        <th><center>RNN Stacked layers</center></th>
    </tr>
    <tr>
        <td><img src="images/RNN_layers_stacked.jpg" width=80%></td>
    </tr>
</table>

## Encoder/Decoder architecture

An Encoder/Decoder architecture has
- An Encoder RNN layer, implementing a many to one relationship
- Followed by a Decoder RNN layer, implementing a one to many relationship


<table>
    <tr>
        <th><center>RNN Encoder/Decoder</center></th>
    </tr>
    <tr>
        <td><img src="images/RNN_layer_API_Encoder_Decoder.png"></td>
    </tr>
</table>


- The input sequence $[\x_{(1)} \dots \x_{(\bar{T})}]$
- Is summarized by $\bar{\h}_{(\bar{T})}$, the final latent state of the Encoder RNN
- Which is used to seed the Decoder RNN
- Producing new sequence $[\hat{\y}_{(1)} \dots \hat{\y}_{(T)}]$

Note that $T$ is not necessarily equal to $\bar{T}$
- The Decoder is seeded by a singleton
- So the output length $T$ is no longer dependent on the length $\bar{T}$ of input $\x$
- Language translation: word $\tt$ of one language does not necessarily correspond to word $\tt$ of the other

Recall that $\bar{\h}_{(\bar{\tt})}$ is a fixed length encoding of the input prefix $\x_{(1)}, \ldots, \x_{(\bar{\tt})}$

So $\bar{\h}_{(\bar{T})}$, which initializes the Decoder, is a summary of the entire input sequence $\x$.
                                                                           
This fact enables us to decouple the Encoder from the Decoder
- The consumption of input $\x$ and the production of output $\hat{\y}$ do not have to be synchronized
- Allowing for the possibility that $T \ne \bar{T}$
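
The decoupling can be sketched with two toy layers: an encoder that reduces the input to its final latent state, and a decoder that starts from that state and runs for as many steps as requested. The weights, the all-zero start input and the names `encode` and `decode` are assumptions for illustration only.

```python
import math

def step(x_t, h_prev, W_x, W_h):
    """Assumed cell: h_t = tanh(W_x x_t + W_h h_prev)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_x[i], x_t))
            + sum(w * h for w, h in zip(W_h[i], h_prev))
        )
        for i in range(len(W_h))
    ]

# Hypothetical weights; both layers use 3 latent units
ENC_W_X = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
ENC_W_H = [[0.1, 0.0, 0.2], [0.0, 0.1, -0.1], [0.3, -0.2, 0.1]]
DEC_W_X = [[0.3], [-0.2], [0.1]]
DEC_W_H = [[0.2, 0.1, 0.0], [-0.1, 0.2, 0.1], [0.0, 0.3, -0.2]]
DEC_W_Y = [[1.0, -1.0, 0.5]]

def encode(xs):
    """Many to one: return only the final latent state."""
    h = [0.0] * len(ENC_W_H)
    for x_t in xs:
        h = step(x_t, h, ENC_W_X, ENC_W_H)
    return h

def decode(h_seed, T):
    """One to many: start from the encoder state, feed each output back in."""
    h = h_seed
    x_t = [0.0]  # hypothetical start-of-sequence input
    ys = []
    for _ in range(T):
        h = step(x_t, h, DEC_W_X, DEC_W_H)
        y_t = [sum(w * v for w, v in zip(row, h)) for row in DEC_W_Y]
        ys.append(y_t)
        x_t = y_t  # output of this step is the input of the next
    return ys

xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # input length 3
summary = encode(xs)
y_hat = decode(summary, T=5)               # output length 5
```

The decoder never sees `xs` directly, only `summary`, so the output length is a free choice of the caller rather than a property of the input.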


The combination of the two is used to solve a class of problems called *Sequence to Sequence*
- Transform one sequence to another 
- Language translation: sequence of English words to sequence of Mandarin symbols
- Captioning: sequence of image frames to sequence of words describing the movie

# Conclusion

We explained how an RNN may compute several types of relationships
- Many to one
- Many to many
- One to many

This variety arises because both input and output may be sequences.

Sequence to Sequence problems (a variant of "many to many") are a particularly important class of problems that can be solved with RNNs.

