### RNN vs LSTMs

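- Unlike a regular feedforward NN, an RNN feeds the hidden state from the previous time step back in as an extra input, so earlier elements of a sequence can improve the current prediction
- RNNs suffer from the vanishing gradient problem: gradients shrink as they are propagated back through many time steps, so long-range dependencies are hard to learn
- They handle short sequences well (roughly 8-10 time steps), but their effectiveness drops significantly after that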
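
To make the vanishing gradient point concrete, here is a minimal sketch using a scalar RNN of the form $h_t = \tanh(w \cdot h_{t-1} + x_t)$. The weight `w`, the constant inputs and the function name are assumptions chosen for illustration; they are not taken from any particular model.

```python
import math

def rnn_gradient_through_time(w, inputs, h0=0.0):
    """Run a scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    Returns the final hidden state and d h_T / d h_0.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return h, grad

w = 0.5  # hypothetical recurrent weight
_, grad_short = rnn_gradient_through_time(w, [0.1] * 5)
_, grad_long = rnn_gradient_through_time(w, [0.1] * 50)
print(grad_short, grad_long)
```

Each step multiplies the gradient by a factor of at most `w`, so with `w = 0.5` the influence of the first hidden state on the last one shrinks geometrically as the sequence grows.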


### Basics of LSTM (keyword: cell, gates)

A rough version of what an LSTM does is summarized in the image below:

<img src="part-4_images/lstm_basic.png" alt="LSTM basic" style="width: 650px;"/>

It is composed of 4 gates (learn, forget, remember and use), which perform the actual calculations.

A more accurate depiction of an LSTM cell involves several activation functions and an addition. The cell is responsible for passing long-term memory (LTM) and short-term memory (STM) on to the next cell, if there is one.

The cell has two memory inputs (LTM and STM) alongside the current event, and two outputs (the updated LTM and STM).
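
As a rough sketch of how the two memories are threaded from one cell to the next, the code below runs one cell over a short sequence. It assumes the learn/forget/remember/use formulation described in the sections that follow; the parameter dictionary `p`, the sizes and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(ltm_prev, stm_prev, event, p):
    """One cell step: takes (LTM, STM, event), returns the new (LTM, STM)."""
    x = np.concatenate([stm_prev, event])
    # Learn gate: new information scaled by the ignore factor
    learn = np.tanh(p["W_n"] @ x + p["b_n"]) * sigmoid(p["W_i"] @ x + p["b_i"])
    # Forget gate: old LTM scaled by the forget factor
    forget = ltm_prev * sigmoid(p["W_f"] @ x + p["b_f"])
    # Remember gate: addition gives the new LTM
    ltm = forget + learn
    # Use gate: multiplication gives the new STM (the output)
    stm = np.tanh(p["W_u"] @ forget + p["b_u"]) * sigmoid(p["W_v"] @ x + p["b_v"])
    return ltm, stm

rng = np.random.default_rng(0)
hidden, features = 4, 3  # hypothetical sizes

p = {}
for name in ("n", "i", "f", "v"):
    p["W_" + name] = rng.standard_normal((hidden, hidden + features)) * 0.1
    p["b_" + name] = np.zeros(hidden)
p["W_u"] = rng.standard_normal((hidden, hidden)) * 0.1
p["b_u"] = np.zeros(hidden)

ltm = np.zeros(hidden)
stm = np.zeros(hidden)
events = rng.standard_normal((6, features))
for event in events:
    # The outputs of one step become the memory inputs of the next
    ltm, stm = lstm_cell(ltm, stm, event, p)

print(ltm.shape, stm.shape)
```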

<img src="part-4_images/lstm_cell.png" alt="LSTM" style="width: 550px;"/>


### Learn Gate (multiplication, tanh, sigmoid, ignore factor)

This gate takes the short-term memory ($STM_{t-1}$) and the current event ($E_t$), combines them, and discards part of the result using an ignore factor ($i_{t}$).

- $STM_{t-1}$ and $E_t$ are combined (concatenated)
- the combined vector is multiplied by a weight matrix, a bias is added, and the result is squashed by a tanh function to give $N_{t}$
- $N_{t}$ is then multiplied element-wise by the ignore factor ($i_{t}$)
- $i_{t}$ comes from another small neural network with the same inputs ($STM_{t-1}$ and $E_t$) but its own weight matrix and bias, and a sigmoid activation function to keep it between 0 and 1
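
The steps above can be sketched in NumPy as follows, with $N_t = \tanh(W_n [STM_{t-1}, E_t] + b_n)$ and $i_t = \sigma(W_i [STM_{t-1}, E_t] + b_i)$. The names `W_n`, `b_n`, `W_i`, `b_i`, the sizes and the random values are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_gate(stm_prev, event, W_n, b_n, W_i, b_i):
    # Combine STM and the event into one vector
    combined = np.concatenate([stm_prev, event])
    # New information, squashed to (-1, 1)
    n_t = np.tanh(W_n @ combined + b_n)
    # Ignore factor, squashed to (0, 1)
    i_t = sigmoid(W_i @ combined + b_i)
    # Element-wise product: keep only the part that is not ignored
    return n_t * i_t

rng = np.random.default_rng(0)
hidden, features = 3, 2  # hypothetical sizes
stm_prev = rng.standard_normal(hidden)
event = rng.standard_normal(features)
W_n = rng.standard_normal((hidden, hidden + features))
W_i = rng.standard_normal((hidden, hidden + features))
b_n = np.zeros(hidden)
b_i = np.zeros(hidden)

learned = learn_gate(stm_prev, event, W_n, b_n, W_i, b_i)
print(learned)
```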

<img src="part-4_images/learn-gate.png" alt="Learn Gate" style="width: 450px"/>

### Forget Gate (multiplication, sigmoid, forget factor)

The forget gate keeps only the essential information from the LTM input by multiplying $LTM_{t-1}$ by a forget factor ($f_{t}$). The forget factor is computed by a small neural network from $STM_{t-1}$ and the event, with a sigmoid activation that keeps it between 0 and 1.
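
A minimal sketch of this gate, assuming the forget factor has the form $f_t = \sigma(W_f [STM_{t-1}, E_t] + b_f)$. The weights are set to zero and the biases are picked by hand (both hypothetical) so that the effect of the forget factor is easy to see.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(ltm_prev, stm_prev, event, W_f, b_f):
    combined = np.concatenate([stm_prev, event])
    # Forget factor in (0, 1), one value per LTM component
    f_t = sigmoid(W_f @ combined + b_f)
    # Scale the old LTM: values near 1 keep it, values near 0 erase it
    return ltm_prev * f_t

hidden, features = 3, 2  # hypothetical sizes
ltm_prev = np.array([2.0, 2.0, 2.0])
stm_prev = np.ones(hidden)
event = np.ones(features)

# Zero weights make the forget factor depend on the bias only (for illustration)
W_f = np.zeros((hidden, hidden + features))
b_f = np.array([10.0, -10.0, 0.0])

kept = forget_gate(ltm_prev, stm_prev, event, W_f, b_f)
print(kept)
```

With these biases the first LTM component is kept almost entirely, the second is almost erased, and the third is halved.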

<img src="part-4_images/forget-gate.png" alt="Forget Gate" style="width: 450px"/>

### Remember Gate (addition)

In this step, the outputs of the learn gate and the forget gate are added, which results in the new LTM.
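
In code the remember gate is a single element-wise addition. The input vectors below are made-up values standing in for the outputs of the forget gate and the learn gate.

```python
import numpy as np

def remember_gate(forget_out, learn_out):
    # New LTM = what survived forgetting + what was just learned
    return forget_out + learn_out

forget_out = np.array([1.5, 0.0, -0.4])  # hypothetical forget gate output
learn_out = np.array([0.2, 0.7, 0.1])    # hypothetical learn gate output

ltm_new = remember_gate(forget_out, learn_out)
print(ltm_new)
```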

<img src="part-4_images/remember-gate.png" alt="Remember Gate" style="width: 450px"/>

### Use Gate (tanh, sigmoid, multiplication, output)

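It takes the LTM coming out of the forget gate, passed through a small network with a tanh activation, and $STM_{t-1}$ combined with the event, passed through a small network with a sigmoid activation. The two results are multiplied to produce the new STM, which is also the output of the cell.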
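
A sketch of the use gate under the formulation used in these notes, where the tanh branch reads the forget gate output. Many textbook formulations apply the tanh to the new LTM instead, so treat this as one possible variant. The names `W_u`, `b_u`, `W_v`, `b_v` and all values are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def use_gate(forget_out, stm_prev, event, W_u, b_u, W_v, b_v):
    # Long-term branch: forget gate output squashed to (-1, 1)
    u_t = np.tanh(W_u @ forget_out + b_u)
    # Short-term branch: STM and event squashed to (0, 1)
    combined = np.concatenate([stm_prev, event])
    v_t = sigmoid(W_v @ combined + b_v)
    # The product is the new STM, which doubles as the cell output
    return u_t * v_t

rng = np.random.default_rng(1)
hidden, features = 3, 2  # hypothetical sizes
forget_out = rng.standard_normal(hidden)
stm_prev = rng.standard_normal(hidden)
event = rng.standard_normal(features)
W_u = rng.standard_normal((hidden, hidden))
W_v = rng.standard_normal((hidden, hidden + features))
b_u = np.zeros(hidden)
b_v = np.zeros(hidden)

stm_new = use_gate(forget_out, stm_prev, event, W_u, b_u, W_v, b_v)
print(stm_new)
```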

<img src="part-4_images/use-gate.png" alt="Use Gate" style="width: 450px"/>



### Other architectures

#### Gated Recurrent Unit (GRU)

<img src="part-4_images/gru.png" alt="GRU" style="width: 450px"/>

#### Peephole connections

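In the basic LSTM cell, the forget factor is computed from STM and the event only, so LTM has no say in what gets forgotten. Peephole connections address this by making LTM an input as well: the LTM is concatenated with the original inputs of the forget factor's network.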

<img src="part-4_images/peephole-connections.png" alt="Peephole connections" style="width: 450px"/>

This technique can be applied to any of the forget-type nodes, that is, the sigmoid networks that compute a gating factor.
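
A minimal sketch of a forget factor with a peephole connection, following the concatenation view described above: the weight matrix simply gains extra columns for the LTM. The constant weights and the inputs are hypothetical, chosen so that the effect of LTM is visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_factor_peephole(ltm_prev, stm_prev, event, W_f, b_f):
    # LTM is concatenated with the usual inputs (STM and event)
    combined = np.concatenate([ltm_prev, stm_prev, event])
    return sigmoid(W_f @ combined + b_f)

hidden, features = 3, 2  # hypothetical sizes
stm_prev = np.ones(hidden)
event = np.ones(features)

# W_f has hidden extra columns compared with the version without peepholes
W_f = np.full((hidden, 2 * hidden + features), 0.1)
b_f = np.zeros(hidden)

f_empty = forget_factor_peephole(np.zeros(hidden), stm_prev, event, W_f, b_f)
f_full = forget_factor_peephole(np.ones(hidden), stm_prev, event, W_f, b_f)
print(f_empty, f_full)
```

With the same STM and event, the two calls return different forget factors, which shows that the LTM now influences what gets forgotten.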

### Overview LSTM

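- Is much less affected by the vanishing gradient problem than a plain RNN
- An LSTM cell is composed of 4 gates which determine the LTM and the STM (output)
- The learn, forget and use gates use activation functions to squash their inputs to between 0 and 1 (sigmoid) or between -1 and 1 (tanh), while the remember gate is a plain addition. These functions are differentiable, which keeps the cell suitable for backpropagation.
- As information passes through these gates, they decide what to remember, what to discard, and which information is passed on to the next gate
- Addition and multiplication play different roles: multiplication by a factor between 0 and 1 scales information down, while the addition in the remember gate lets the LTM and its gradient pass from step to step largely unchanged, which helps mitigate the vanishing gradient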
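
The last point can be illustrated with a simplified calculation. Because the new LTM is $LTM_{t-1} \cdot f_t$ plus the learn gate output, the direct derivative of $LTM_t$ with respect to $LTM_{t-1}$ is just $f_t$ (treating the gate values as constants and ignoring the indirect paths through the gates). The forget factor of 0.99 and the comparison factor of 0.5 below are assumed values.

```python
def ltm_gradient(forget_factors):
    """Gradient along the direct LTM path: the product of the forget factors."""
    grad = 1.0
    for f in forget_factors:
        grad *= f
    return grad

steps = 50
# Hypothetical cell that has learned to keep its memory (f_t close to 1)
lstm_grad = ltm_gradient([0.99] * steps)
# For comparison: a recurrent path that shrinks the gradient by 0.5 per step
shrinking_grad = 0.5 ** steps

print(lstm_grad, shrinking_grad)
```

After 50 steps the gradient along the LTM path is still about 0.6, while the path that halves the gradient at every step has dropped below 1e-15.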

Article that provides an overview of [LSTM](https://skymind.ai/wiki/lstm#long)