# Recurrent Neural Networks (RNNs)

## Intuition
Ordinary neural networks have some limitations: they take a fixed-size input and give a fixed-size output. For example, they take in a fixed number of pixels, and give out a vector of predicted likelihoods, one per character.

Another problem is that they have no concept of memory or context. For example, if you used one to predict words in a sentence, it would just take in all of the words of the sentence at once and predict the next word, without its calculations taking into account which words came earlier. It would just predict that the next word should be a proper noun, or whatever.

RNNs address both problems. RNNs process sequences. Normal neural networks can be thought of as calculating an output based on some input. RNNs can be thought of as calculating an output based on the input BUT ALSO on the history of previous inputs.

Imagine a loop.

The first time you go through the loop, you feed in some input and get some output.

The next time you go through the loop, you feed in some input AND you feed in the previous output as input too.

Thus you can get predictions for sequences of data based on both what came before and the new input.
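The loop above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a trained model: the sizes, the random weights, and the names `W_x`, `W_h`, `rnn_step` and `run_rnn` are all made up for the example, and `tanh` is used as the squashing function.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # arbitrary sizes, chosen for illustration

# Untrained, random weights (an assumption; a real RNN would learn these).
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))   # acts on the new input
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # acts on the previous output
b = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One trip round the loop: combine the new input with the previous output."""
    return np.tanh(W_x @ x + W_h @ h_prev + b)

def run_rnn(inputs):
    """Feed a sequence through the loop, one item at a time."""
    h = np.zeros(hidden_size)  # nothing has been seen before the first step
    outputs = []
    for x in inputs:
        h = rnn_step(x, h)     # the previous output is fed back in as input
        outputs.append(h)
    return np.stack(outputs)

sequence = rng.normal(size=(5, input_size))  # 5 time steps of made-up data
outputs = run_rnn(sequence)
print(outputs.shape)  # one output vector per time step
```

Because the previous output is fed back in, giving the network the same input twice in a row will generally produce two different outputs.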

For this reason, RNNs have many applications, such as stock prediction, video frame captioning (taking previous frames into account), segment-to-segment machine translation (translating by grammatical segments instead of processing word-by-word), etc.

## Note on LSTMs
In practice, RNNs are often implemented as LSTMs (Long Short-Term Memory networks). This means that when generating predictions, they use some sub-networks to choose which data to ignore, which data to hold in memory, and which data to select for the prediction.
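To make the "sub-networks" idea a bit more concrete, here is a rough sketch of one step of an LSTM cell in its commonly used textbook form (forget, input and output gates plus a candidate value). This is an assumption about the formulation, which may label the gates differently from the video referenced later in these notes. The weights are random and untrained, and all names are invented for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3  # arbitrary sizes, chosen for illustration

# One small sub-network (weights + bias) per gate, each looking at [h_prev, x].
GATES = ("forget", "input", "candidate", "output")
W = {g: rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size)) for g in GATES}
b = {g: np.zeros(hidden_size) for g in GATES}

def lstm_step(x, h_prev, c_prev):
    """One LSTM step: returns the new output h and the new memory c."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W["forget"] @ z + b["forget"])        # how much old memory to keep
    i = sigmoid(W["input"] @ z + b["input"])          # how much new data to let in
    g = np.tanh(W["candidate"] @ z + b["candidate"])  # the new data itself
    o = sigmoid(W["output"] @ z + b["output"])        # what to select for the output
    c = f * c_prev + i * g                            # update the memory
    h = o * np.tanh(c)                                # output for this step
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # 5 time steps of made-up data
    h, c = lstm_step(x, h, c)
```

The gates use a sigmoid because a value between 0 and 1 works as a "how much to let through" dial when multiplied element-wise with the data.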

Since RNNs just take in a vector and spit out a vector, you can easily feed one RNN into another.
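As a small sketch of what feeding one RNN into another might look like, the example below stacks two plain tanh RNN layers. The layer sizes, the random untrained weights, and the helpers `make_layer` and `run_layer` are all hypothetical, chosen only so that the output size of the first layer matches the input size of the second.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_layer(input_size, hidden_size):
    """Random, untrained weights for one simple RNN layer (illustrative only)."""
    return {
        "W_x": rng.normal(scale=0.5, size=(hidden_size, input_size)),
        "W_h": rng.normal(scale=0.5, size=(hidden_size, hidden_size)),
        "b": np.zeros(hidden_size),
    }

def run_layer(layer, inputs):
    """Run a sequence of vectors through one layer, returning one vector per step."""
    h = np.zeros(layer["b"].shape)
    outputs = []
    for x in inputs:
        h = np.tanh(layer["W_x"] @ x + layer["W_h"] @ h + layer["b"])
        outputs.append(h)
    return np.stack(outputs)

layer1 = make_layer(input_size=4, hidden_size=8)
layer2 = make_layer(input_size=8, hidden_size=3)  # takes layer1's vectors as its input

sequence = rng.normal(size=(6, 4))                # 6 time steps of made-up data
stacked_out = run_layer(layer2, run_layer(layer1, sequence))
```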

## Note on tanh
RNNs use the hyperbolic tangent (tanh) as a squashing function to help against exploding values and gradients. I.e., it keeps the values between -1 and 1, which stops them blowing up to huge numbers as they go round the loop, as might happen if they were doubled each iteration.

It's very similar to the sigmoid function, which squashes values into the range 0 to 1 instead.
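A quick numerical check of the squashing idea, as a sketch only: the sample values and the "double it every iteration" loop are made up to mirror the description above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
tanh_vals = np.tanh(xs)     # always between -1 and 1
sigmoid_vals = sigmoid(xs)  # always between 0 and 1

# tanh is a stretched and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
identity_holds = np.allclose(tanh_vals, 2 * sigmoid(2 * xs) - 1)

# Double a value on every trip round the loop, with and without squashing.
raw, squashed = 1.0, 1.0
for _ in range(50):
    raw = 2.0 * raw                    # grows without bound
    squashed = np.tanh(2.0 * squashed) # squashed back into (-1, 1) each time

print(raw, squashed)
```

After 50 iterations the unsquashed value has grown to 2 to the power 50, while the squashed one is still below 1.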

<img src="https://raw.githubusercontent.com/pekoto/fast.ai/master/images/tanh.jpg" width=500 height=350>

## LSTM Example
Ref: https://www.youtube.com/watch?v=WCUNPb-5EYI

<img src="https://raw.githubusercontent.com/pekoto/fast.ai/master/images/lstm0.jpg" width=600 height=450>

First, we take our previous input, our new input, and our previous list of predictions. From these we get a list of candidate predictions: Doug or saw. Doug and saw are put into memory. Our trained weights predict that the most likely next word is saw. This gives us a new list of predictions: Doug, Spot, and Jane.

<img src="https://raw.githubusercontent.com/pekoto/fast.ai/master/images/lstm1.jpg" width=600 height=450>

We then repeat the process with our new predictions. When we get to the "ignore" layer, based on our previous memory, we know we've already seen Doug, so we ignore it, leaving us to choose either Jane or Spot.

<img src="https://raw.githubusercontent.com/pekoto/fast.ai/master/images/ltsm2.jpg" width=600 height=450>

In this way, RNNs can take account of what they previously saw when making new predictions.

## A very simple RNN
Ref: https://www.youtube.com/watch?v=UNmqTiOnRfg

Imagine we have three food types represented by one-hot encoded vectors. Each food gets cooked in a sequence, depending on the weather.

So in this example, food is our previous output, and weather is our new input. We need to remember the last food cooked, and combine it with the weather.
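Here is one way this could look in NumPy. The notes above don't spell out the exact rule, so this sketch assumes one: on a sunny day we cook the same food as last time, and on a rainy day we move on to the next food in the sequence. The food names, the weather encoding, and the matrix and function names are all made up for the example.

```python
import numpy as np

FOODS = ["apple pie", "burger", "chicken"]  # hypothetical food names
SUNNY = np.array([1, 0])                    # one-hot weather (assumed encoding)
RAINY = np.array([0, 1])

same = np.eye(3, dtype=int)          # maps a food to itself
nxt = np.roll(same, 1, axis=0)       # maps a food to the next one in the cycle

FOOD_MATRIX = np.vstack([same, nxt])                    # 6x3: [same food; next food]
WEATHER_MATRIX = np.array([[1, 0]] * 3 + [[0, 1]] * 3)  # 6x2: [sunny rows; rainy rows]
MERGE = np.hstack([same, same])                         # 3x6: fold both halves together

def cook(prev_food, weather):
    """Combine the last food cooked (previous output) with the weather (new input)."""
    combined = FOOD_MATRIX @ prev_food + WEATHER_MATRIX @ weather
    # Only the entry where the food half and the weather half agree reaches 2;
    # subtracting 1 and clipping at 0 keeps that entry and zeroes the rest.
    kept = np.maximum(combined - 1, 0)
    return MERGE @ kept

food = np.array([1, 0, 0])  # start with the first food
menu = []
for weather in (SUNNY, RAINY, RAINY, SUNNY, RAINY):
    food = cook(food, weather)  # the output is fed back in on the next step
    menu.append(FOODS[int(np.argmax(food))])
print(menu)
```

The weights here are written by hand rather than learned, which keeps the example small enough to follow every number through it.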

