# Long Short-Term Memory Networks (LSTMs)

Lesson outline:
- LSTM Overview
- LSTM Architecture and Gates
- Other Architectures

By the end of the lesson, you'll be able to:
- Explain how LSTMs overcome the limitations of RNNs
- Implement the LSTM architecture

## Intro to LSTMs

Problem: RNNs are good at storing short-term memory but struggle to keep long-term memory because of the vanishing gradient problem. LSTMs were introduced to address this.
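
To make the vanishing gradient problem concrete, here is a minimal sketch. It assumes a deliberately simplified scalar RNN, `h_t = tanh(w * h_{t-1})`, with no input term; the function name `gradient_through_time`, the weight `0.5`, and the sequence length `50` are illustrative assumptions, not part of the lesson. The gradient of the last state with respect to the first is a product of per-step factors, and when each factor is smaller than one the product shrinks toward zero.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return d h_t / d h_0 for t = 1..steps in the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    grads = []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
        grads.append(grad)
    return grads

# Hypothetical values chosen only to illustrate the effect.
grads = gradient_through_time(w=0.5, h0=0.8, steps=50)
print(f"gradient after 1 step:   {grads[0]:.3e}")
print(f"gradient after 50 steps: {grads[-1]:.3e}")
```

With `|w| < 1`, every factor is below one, so the signal from early time steps has almost no influence on the weight update.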

LSTM cells can decide:
- what goes into the cell
- what is retained within the cell
- what passes to the output

**Closeup of LSTM cell / neuron**: the LSTM cell allows a recurrent system to learn over many time steps with far less risk of losing information to the vanishing gradient problem. It is fully differentiable, so the weights can be updated with ordinary backpropagation.

![afbeelding.png](attachment:65882d18-271d-436f-b786-64b48b8552dc.png)

### Basic components of LSTMs

Each neuron has:
- input:
  - Long Term Memory (LTM)
  - Short Term Memory (STM)
  - Input Vector (event)
- gates:
  - Forget gate (takes LTM)
  - Learn gate (takes STM & event)
  - Remember gate (takes LTM, STM & event)
  - Use gate (takes LTM, STM & event)
- output:
  - New Long-Term Memory
  - New Short-Term Memory
 
In total, the LSTM architecture has the following parts:
- Inputs: the input vector (event), the LTM, and the STM
- Gates: the Forget, Learn, Remember, and Use gates
- Outputs: the new LTM and the new STM
- States: the Cell State (which carries the LTM) and the Hidden State (which carries the STM)

**Learn gate**
***Updates the short-term memory with new information.*** The learn gate takes the STM and the event, combines them, and then ignores part of the result to drop unimportant information. We do this by multiplying the combined result by an ignore factor:

![afbeelding.png](attachment:a5e1974f-10e7-47e4-8f87-60eb273aa15e.png)
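
The learn gate described above can be sketched in NumPy. This is a minimal sketch that assumes a common formulation: the combined information is `tanh(W_n [STM, E] + b_n)` and the ignore factor is `sigmoid(W_i [STM, E] + b_i)`. The names `learn_gate`, `W_n`, `W_i`, `b_n`, `b_i` and the randomly initialised weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_gate(stm_prev, event, W_n, b_n, W_i, b_i):
    """Combine STM and event, then scale the result by an ignore factor in (0, 1)."""
    x = np.concatenate([stm_prev, event])   # [STM, E]
    combined = np.tanh(W_n @ x + b_n)       # new candidate information
    ignore = sigmoid(W_i @ x + b_i)         # how much of each entry to keep
    return combined * ignore                # element-wise product

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, n_in = 3, 2
stm_prev, event = rng.normal(size=hidden), rng.normal(size=n_in)
W_n = rng.normal(size=(hidden, hidden + n_in))
W_i = rng.normal(size=(hidden, hidden + n_in))
b_n, b_i = np.zeros(hidden), np.zeros(hidden)

learned = learn_gate(stm_prev, event, W_n, b_n, W_i, b_i)
print(learned.shape)
```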

**Forget gate**
***Chooses which parts of the long-term memory are important.*** The forget gate takes the LTM and decides which parts to keep and which to forget. We do this by multiplying the LTM by a forget factor *f*:

![afbeelding.png](attachment:a5f4a06f-fd9c-45b8-9b92-53c0aa13230c.png)
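
As a hedged illustration of the forget gate, the sketch below assumes that the forget factor *f* is computed from the STM and the event as `sigmoid(W_f [STM, E] + b_f)`, which is a common formulation but is not spelled out in the text above. The names `forget_gate`, `W_f`, and `b_f` are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forget_gate(ltm_prev, stm_prev, event, W_f, b_f):
    """Scale each entry of the LTM by a forget factor f in (0, 1)."""
    x = np.concatenate([stm_prev, event])   # [STM, E]
    f = sigmoid(W_f @ x + b_f)              # near 0: forget, near 1: keep
    return ltm_prev * f

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, n_in = 3, 2
ltm_prev = rng.normal(size=hidden)
stm_prev, event = rng.normal(size=hidden), rng.normal(size=n_in)
W_f, b_f = rng.normal(size=(hidden, hidden + n_in)), np.zeros(hidden)

ltm_kept = forget_gate(ltm_prev, stm_prev, event, W_f, b_f)
print(ltm_kept.shape)
```

Because *f* lies strictly between 0 and 1, every entry of the kept LTM is smaller in magnitude than the corresponding entry of the original LTM.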

**Remember gate**
***Outputs the long-term memory.*** The remember gate takes the LTM coming out of the forget gate and the STM coming out of the learn gate and simply adds them together. Mathematically, this looks like:

![afbeelding.png](attachment:101fa55c-63cc-4d14-aded-c7fcd1a75af6.png)
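
The remember gate has no weights of its own, so a sketch of it is short. The function name `remember_gate` and the example vectors are assumptions chosen for illustration.

```python
import numpy as np

def remember_gate(ltm_kept, learned):
    """New LTM = output of the forget gate + output of the learn gate."""
    return ltm_kept + learned

# Hypothetical gate outputs, for illustration only.
ltm_kept = np.array([0.5, -0.2, 0.0])   # what survived the forget gate
learned = np.array([0.1, 0.3, -0.4])    # what the learn gate produced

new_ltm = remember_gate(ltm_kept, learned)
print(new_ltm)
```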

**Use gate** (or output gate)
***Outputs the short-term memory.*** The use gate takes the LTM that comes out of the forget gate and the STM that comes out of the learn gate to come up with a new STM, which is also the output of the cell. Mathematically, this looks like:

![afbeelding.png](attachment:e99f7456-e9af-464f-bd64-38971af20a27.png)
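
A possible sketch of the use gate follows. It assumes one formulation in which the forgotten LTM is passed through `tanh(W_u LTM + b_u)`, the STM and event are passed through `sigmoid(W_v [STM, E] + b_v)`, and the two results are multiplied. The names `use_gate`, `W_u`, `W_v`, `b_u`, `b_v` are assumptions, and other formulations of the LSTM apply `tanh` to the new cell state instead.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def use_gate(ltm_kept, stm_prev, event, W_u, b_u, W_v, b_v):
    """Produce the new STM (and output) from the kept LTM, the STM and the event."""
    x = np.concatenate([stm_prev, event])   # [STM, E]
    u = np.tanh(W_u @ ltm_kept + b_u)       # useful part of the long-term memory
    v = sigmoid(W_v @ x + b_v)              # how much of it to expose
    return u * v

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, n_in = 3, 2
ltm_kept = rng.normal(size=hidden)
stm_prev, event = rng.normal(size=hidden), rng.normal(size=n_in)
W_u, b_u = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W_v, b_v = rng.normal(size=(hidden, hidden + n_in)), np.zeros(hidden)

new_stm = use_gate(ltm_kept, stm_prev, event, W_u, b_u, W_v, b_v)
print(new_stm.shape)
```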

In summary, an LSTM looks like:

![afbeelding.png](attachment:1da15a13-6d79-47f4-8e9a-ebfc95efaabd.png)
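
Putting the four gates together, one time step of an LSTM cell could be sketched as below. This is a minimal NumPy sketch under stated assumptions, not a reference implementation: the gate formulas follow a common formulation, and the names `lstm_step`, `init_params`, the parameter keys, the sizes, and the random initialisation are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(hidden, n_in, rng):
    """Random weights and zero biases for the gates (illustrative initialisation)."""
    p = {}
    for name in ("n", "i", "f", "v"):       # these act on [STM, E]
        p[f"W_{name}"] = rng.normal(scale=0.5, size=(hidden, hidden + n_in))
        p[f"b_{name}"] = np.zeros(hidden)
    p["W_u"] = rng.normal(scale=0.5, size=(hidden, hidden))  # acts on the kept LTM
    p["b_u"] = np.zeros(hidden)
    return p

def lstm_step(ltm_prev, stm_prev, event, p):
    """One time step: returns (new LTM, new STM)."""
    x = np.concatenate([stm_prev, event])
    # Learn gate: combine STM and event, then ignore part of the result.
    learned = np.tanh(p["W_n"] @ x + p["b_n"]) * sigmoid(p["W_i"] @ x + p["b_i"])
    # Forget gate: scale the old LTM by the forget factor.
    kept = ltm_prev * sigmoid(p["W_f"] @ x + p["b_f"])
    # Remember gate: add the two results to get the new LTM.
    ltm_new = kept + learned
    # Use gate: derive the new STM (the cell output) from the kept LTM.
    stm_new = np.tanh(p["W_u"] @ kept + p["b_u"]) * sigmoid(p["W_v"] @ x + p["b_v"])
    return ltm_new, stm_new

rng = np.random.default_rng(0)
hidden, n_in = 4, 3
params = init_params(hidden, n_in, rng)

# Run the cell over a short random sequence of 5 events.
ltm, stm = np.zeros(hidden), np.zeros(hidden)
for event in rng.normal(size=(5, n_in)):
    ltm, stm = lstm_step(ltm, stm, event, params)
print(ltm.shape, stm.shape)
```

The same weights are reused at every time step; only the LTM and STM change as the sequence is processed.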

## Differences between LSTM and GRU

GRU:
- has only two gates: the Reset Gate and Update Gate
- GRU triggers
  - If the reset gate is close to zero, we ignore the previous hidden state
  - Gamma, the value of the update gate, decides whether we update the cell state with the current activation value or keep the previous state
- The LSTM uses the Forget Gate and Output Gate to perform the same set of operations.
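
The two GRU gates can be sketched in NumPy as follows. This sketch assumes one common convention, in which gamma close to one means the state is replaced by the current candidate activation and gamma close to zero means the previous state is kept; some references define the update gate the other way around. The names `gru_step`, `W_r`, `W_u`, `W_h` and the bias values used to force the gates are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x, p):
    """One GRU time step under the 'gamma near 1 means update' convention."""
    hx = np.concatenate([h_prev, x])
    reset = sigmoid(p["W_r"] @ hx + p["b_r"])   # near 0: ignore the previous state
    gamma = sigmoid(p["W_u"] @ hx + p["b_u"])   # update gate
    candidate = np.tanh(p["W_h"] @ np.concatenate([reset * h_prev, x]) + p["b_h"])
    return gamma * candidate + (1.0 - gamma) * h_prev

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
hidden, n_in = 3, 2
p = {name: rng.normal(size=(hidden, hidden + n_in)) for name in ("W_r", "W_u", "W_h")}
p.update({name: np.zeros(hidden) for name in ("b_r", "b_u", "b_h")})
h_prev, x = rng.normal(size=hidden), rng.normal(size=n_in)

h_new = gru_step(h_prev, x, p)

# Force gamma close to zero: the previous state is carried over almost unchanged.
p_keep = {**p, "b_u": np.full(hidden, -50.0)}
h_keep = gru_step(h_prev, x, p_keep)
print(np.allclose(h_keep, h_prev, atol=1e-6))
```

Unlike the LSTM sketch, there is a single state vector here: the GRU does not keep a separate long-term and short-term memory.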