# Chapter 8: Recurrent Neural Networks

## Introduction

In this chapter we introduce recurrent neural networks (RNNs), focusing on long short-term memory networks (LSTMs).

## Recurrent Neural Networks

Up to now we have been covering feedforward neural networks. These are networks whose graph representations contain no loops.

One major drawback of feedforward NNs is that their inputs must have a fixed dimension. They cannot, for example, handle sequences of arbitrary length.

To handle sequences of arbitrary length, one approach is to use a model that processes the sequence element by element. At each step, the model receives the current element of the input sequence together with its own output from the previous step.

## Recurrent Neural Networks

The simplest NN with this structure can be defined as follows. Suppose $x_1, x_2, \dots, x_n$ is the input sequence of vectors and we wish to produce an output sequence $o_1, o_2, \dots, o_n.$ Starting from an initial vector $o_0$ (typically the zero vector), we compute the output sequence recursively:

$$
  o_k = h(W_1x_k + W_0o_{k-1}),
$$
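where $W_1, W_0$ are weight matrices and $h$ is an activation function (for example $\tanh$). The same $W_1, W_0$ are used at every step, which is why the model can process a sequence of any length.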
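A minimal pure-Python sketch of this recurrence with $h = \tanh$. The helper names `matvec` and `rnn_forward`, the weight values and the zero initial vector are illustrative assumptions, not part of the text:

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v (list of numbers)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def rnn_forward(xs, W1, W0, o0, h=math.tanh):
    """Compute o_k = h(W1 x_k + W0 o_{k-1}) for every element x_k of xs."""
    outputs = []
    o_prev = o0
    for x in xs:
        # Combine the current input with the previous output, then apply h elementwise.
        pre = [a + b for a, b in zip(matvec(W1, x), matvec(W0, o_prev))]
        o_prev = [h(z) for z in pre]
        outputs.append(o_prev)
    return outputs


# Illustrative weights: 2-dimensional inputs and 2-dimensional outputs.
W1 = [[0.5, -0.2], [0.1, 0.3]]
W0 = [[0.4, 0.0], [0.0, 0.4]]
o0 = [0.0, 0.0]

short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = short_seq * 5  # five times longer, same weights

print(len(rnn_forward(short_seq, W1, W0, o0)))  # 2
print(len(rnn_forward(long_seq, W1, W0, o0)))   # 10
```

The two calls run one set of weights on sequences of length 2 and 10, producing one output vector per input element.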

## Recurrent Neural Networks

The following diagram illustrates this architecture:

![](../images/rnn.svg){fig-align="center"}

## Recurrent Neural Networks

Neural networks with this structure are called **recurrent neural networks** (RNNs).

The major advantage of RNNs over feedforward NNs is that they can handle inputs of arbitrary length. The major disadvantage is that they are less efficient to train.

For example, when we were training a transformer to generate text, we could compute the predictions for every position of the input sequence at once, in parallel. An RNN, in contrast, has to compute its predictions one at a time, sequentially, because each step needs the output of the previous one.

## Long Short-Term Memory

In practice, simple RNNs like the one above are rarely used. The problem is that such models have a very short memory: for example, by the time the model reaches the end of a sentence, it has already forgotten how the sentence began.
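One way to see where the short memory comes from (a standard explanation, not spelled out in the text above): in a scalar version of the recurrence with $h = \tanh$, the chain rule gives $\frac{\partial o_n}{\partial o_0} = \prod_{k=1}^{n} w_0 (1 - o_k^2)$. If $|w_0| < 1$, every factor is below 1 in magnitude, so the influence of early inputs, and the gradient used to learn from them, shrinks exponentially with $n$. The sketch below computes this product for illustrative weights $w_1 = 1$, $w_0 = 0.5$ (the function name is our own):

```python
import math


def sensitivity_to_start(xs, w1, w0, o0=0.0):
    """Return d o_n / d o_0 for the scalar RNN o_k = tanh(w1*x_k + w0*o_{k-1})."""
    o, grad = o0, 1.0
    for x in xs:
        o = math.tanh(w1 * x + w0 * o)
        # Chain rule: d o_k / d o_{k-1} = w0 * tanh'(.) = w0 * (1 - o_k**2)
        grad *= w0 * (1.0 - o * o)
    return grad


for n in (5, 20, 50):
    print(n, sensitivity_to_start([1.0] * n, w1=1.0, w0=0.5))
```

If the factors were larger than 1 in magnitude, the product would instead blow up (exploding gradients).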

To remedy this, **long short-term memory models** (LSTMs) were introduced.

## Long Short-Term Memory

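As the name suggests, LSTMs are meant to have a *long* short-term memory. They achieve this by carrying an extra vector, usually called the cell state, which is updated at every step and passed on to the next. Learned gates decide, at each step, how much of the cell state to keep, what new information to write to it, and how much of it to expose as the output.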
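For reference, one common way to write the LSTM update in this chapter's notation is given below. Here $o_k$ is the output (as in the simple RNN), $c_k$ is the extra vector carried between steps (usually called the cell state), $\sigma$ is the logistic sigmoid, $\odot$ is elementwise multiplication, and the matrices $W_\ast, U_\ast$ and biases $b_\ast$ are learned. The gate names are our own labels, with the output gate written $q_k$ to avoid a clash with $o_k$; the exact formulation varies slightly between papers and libraries.

$$
\begin{aligned}
  f_k &= \sigma(W_f x_k + U_f o_{k-1} + b_f) && \text{(forget gate)}\\
  i_k &= \sigma(W_i x_k + U_i o_{k-1} + b_i) && \text{(input gate)}\\
  q_k &= \sigma(W_q x_k + U_q o_{k-1} + b_q) && \text{(output gate)}\\
  \tilde{c}_k &= \tanh(W_c x_k + U_c o_{k-1} + b_c) && \text{(candidate values)}\\
  c_k &= f_k \odot c_{k-1} + i_k \odot \tilde{c}_k\\
  o_k &= q_k \odot \tanh(c_k)
\end{aligned}
$$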

An LSTM cell looks like this:

![](../images/lstm.png){fig-align="center"}
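A minimal pure-Python sketch of a single LSTM step, following the standard formulation with forget, input and output gates plus candidate values (the output gate is named `q` to avoid a clash with the output `o`). The parameter layout, the names and the values are illustrative assumptions; libraries such as PyTorch's `nn.LSTM` implement a vectorized, batched version of essentially the same computation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, o_prev):
    """Compute W x + U o_prev + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * oi for u, oi in zip(U_row, o_prev))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def lstm_step(x, o_prev, c_prev, params):
    """One LSTM step: return the new output o_k and the new cell state c_k."""
    f = [sigmoid(z) for z in affine(*params["f"], x, o_prev)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x, o_prev)]          # input gate
    q = [sigmoid(z) for z in affine(*params["q"], x, o_prev)]          # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, o_prev)]  # candidate values
    # Keep part of the old cell state and add part of the candidate values.
    c = [fj * cj + ij * tj for fj, cj, ij, tj in zip(f, c_prev, i, c_tilde)]
    # Expose part of the (squashed) cell state as the output.
    o = [qj * math.tanh(cj) for qj, cj in zip(q, c)]
    return o, c


# Illustrative 1-dimensional parameters, each gate given as (W, U, b).
# Forget and output gates have zero weights and bias 5, so they stay at sigmoid(5) ~ 0.993;
# the input gate and the candidate values depend only on the current input x.
params = {
    "f": ([[0.0]], [[0.0]], [5.0]),
    "i": ([[1.0]], [[0.0]], [0.0]),
    "q": ([[0.0]], [[0.0]], [5.0]),
    "c": ([[1.0]], [[0.0]], [0.0]),
}

o, c = [0.0], [0.0]
o, c = lstm_step([1.0], o, c, params)  # one informative input
c_after_first = c[0]
for _ in range(20):                    # followed by 20 steps of zero input
    o, c = lstm_step([0.0], o, c, params)
print(c_after_first, c[0])
```

With the forget gate close to 1, the value written into the cell at the first step is still largely present after 20 steps of zero input: it decays only by a factor of $\sigma(5)^{20} \approx 0.87$.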

## Extras

Another popular RNN architecture is called [gated recurrent units (GRUs)](https://en.wikipedia.org/wiki/Gated_recurrent_unit).

The formulas for backpropagation in RNNs (usually called backpropagation through time), and their derivation, are somewhat more complicated than in the feedforward case. If you are interested in the math, check out this expository [paper](https://arxiv.org/abs/1610.02583).

## Practice Task

Build an LSTM model that generates Lithuanian names character by character.

You can find the names dataset [here](https://github.com/jputrius/ml_intro/tree/main/data/names). Even better, try writing a script that downloads the names from [VLKK](https://vardai.vlkk.lt/) and cleans them. The [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/) package should come in handy for this.
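A minimal sketch of two pieces of plumbing the task needs, independent of the model itself: turning names into next-character training pairs, and a sampling loop for generation. The markers `^` and `$`, the sample names and the function `next_char_probs` are our own assumptions. With an LSTM you would normally feed one character per step and carry the state forward, rather than re-reading the whole prefix as this simplified interface suggests.

```python
import random

# Illustrative names only; in the task, load the full list from the dataset.
names = ["Jonas", "Ona", "Rūta"]

START, END = "^", "$"  # hypothetical start/end-of-name markers

# Vocabulary: every character that appears in a name, plus the two markers.
chars = sorted(set("".join(names)) | {START, END})
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}


def encode(name):
    """Return (input, target) index sequences; the target is the input shifted by one."""
    seq = [char_to_idx[ch] for ch in START + name + END]
    return seq[:-1], seq[1:]


def sample_name(next_char_probs, max_len=20):
    """Generate a name one character at a time until END is drawn.

    next_char_probs is a hypothetical stand-in for the trained model: it maps
    the prefix generated so far to a dict {character: probability}.
    """
    prefix = START
    while len(prefix) <= max_len:
        probs = next_char_probs(prefix)
        ch = random.choices(list(probs), weights=list(probs.values()))[0]
        if ch == END:
            break
        prefix += ch
    return prefix[len(START):]


inputs, targets = encode("Ona")
print("".join(idx_to_char[i] for i in inputs))   # ^Ona
print("".join(idx_to_char[i] for i in targets))  # Ona$


# A toy "model" for checking the loop: three a's, then the end marker.
def toy_probs(prefix):
    return {"a": 1.0} if len(prefix) < 4 else {END: 1.0}


print(sample_name(toy_probs))  # aaa
```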