# **LSTMs and RNNs**

# What is a defect of RNNs?

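RNNs struggle to retain information over many timesteps because they run into the problem of vanishing gradients. During backpropagation through time the gradient is multiplied by a similar factor at every timestep, so it shrinks roughly exponentially with sequence length and early inputs stop influencing learning.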
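
To make the shrinking gradient concrete, here is a minimal sketch using a hypothetical scalar RNN, $h_t = \tanh(w \cdot h_{t-1})$, with no input term. The function name, the weight `w = 0.5`, and the starting state are assumptions chosen purely for illustration; a real RNN uses weight matrices, but the repeated multiplication behaves in a similar way.

```python
import math

def gradient_through_time(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps in the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad, history = h0, 1.0, []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - h^2), with h the new state.
        grad *= w * (1.0 - h * h)
        history.append(abs(grad))
    return history

# Assumed toy values: each step multiplies the gradient by a factor below 0.5.
grads = gradient_through_time(w=0.5, h0=0.5, steps=50)
print("after 1 step:  ", grads[0])
print("after 10 steps:", grads[9])
print("after 50 steps:", grads[49])
```

Because every factor is smaller than 1, the influence of the initial state on the final state becomes negligible after a few dozen steps.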

# What are the inputs and outputs to an LSTM cell?

An LSTM cell has the following inputs:
1. Hidden State
2. Cell State
3. Actual Input

An LSTM cell has the following outputs: 
1. Cell State
2. Hidden State
3. Actual output

The hidden state and the actual output of an LSTM cell are the same.

# What corresponds to long term memory and what corresponds to short term memory?

Long term memory corresponds to the cell state, while short term memory corresponds to the hidden state. This hidden state is also the output of the LSTM cell.

# What are the various gates in an LSTM?

An LSTM has the following gates:
1. Forget
2. Input
3. Update (called the output gate in most references)

# What do each of the gates do?

**Forget Gate**

The forget gate modifies the cell state, which holds the long term memory (LTM). It scales down the entries of the LTM that are no longer needed, so some of the stored information is discarded.

**Input Gate**

The input gate decides how much new information, computed from the current input and the previous hidden state, is written into the LTM.

**Update Gate**

This final gate produces the hidden state for the next timestep by filtering the updated cell state.

# How does the forget gate work?

The forget gate requires both the input at the current timestep and the hidden state from the previous time step. 

Input : $x_t$

Hidden State : $h_{t-1}$

$f_t = \sigma(W_f \cdot [x_t, h_{t-1}] + b_f)$

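Each entry of $f_t$ lies between 0 and 1, so it can be considered a forget factor that is multiplied elementwise with the previous cell state $C_{t-1}$. Values near 0 erase the corresponding entry, while values near 1 keep it.

$\therefore$ the forget gate works as: $C_{t-1} * f_t$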
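
The forget step can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the helper names (`sigmoid`, `forget_gate`) and all numbers are assumptions, and list concatenation stands in for $[x_t, h_{t-1}]$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(W_f, b_f, x_t, h_prev):
    """f_t = sigmoid(W_f . [x_t, h_prev] + b_f), one value per cell-state entry."""
    concat = x_t + h_prev  # list concatenation plays the role of [x_t, h_{t-1}]
    return [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
            for row, b in zip(W_f, b_f)]

# Hypothetical numbers: 1 input feature, 2 hidden units.
x_t, h_prev = [1.0], [0.0, 0.0]
C_prev = [2.0, -3.0]
W_f = [[10.0, 0.0, 0.0],    # large positive pre-activation: f close to 1 (keep)
       [-10.0, 0.0, 0.0]]   # large negative pre-activation: f close to 0 (forget)
b_f = [0.0, 0.0]

f_t = forget_gate(W_f, b_f, x_t, h_prev)
kept = [c * f for c, f in zip(C_prev, f_t)]  # elementwise C_{t-1} * f_t
print("f_t: ", f_t)
print("kept:", kept)
```

With these assumed weights the first cell-state entry passes through almost unchanged, while the second is scaled to nearly zero.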

# How does the Input Gate work?

The input gate requires two input values:

1. Hidden State from previous timestep : $h_{t-1}$

2. Input at current timestep : $x_t$

This gate is slightly more involved and works as follows:

We first calculate two intermediate values, the input gate activation $i_t$ and the candidate cell state $\tilde{C_t}$:

$i_t = \sigma(W_i\cdot[x_t, h_{t-1}] + b_i)$

$\tilde{C_t} = \tanh({W_c \cdot [x_t, h_{t-1}] + b_c})$

The cell state is also modified at this stage, by combining the forget factor with the new candidate:

$C_t = f_t * C_{t-1} + i_t * \tilde{C_t}$

This is the modified cell state which is propagated to the next timestep.

Both products are elementwise: $f_t$ scales the previous cell state $C_{t-1}$, and $i_t$ scales the candidate $\tilde{C_t}$.
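
A minimal sketch of the cell state update, assuming the standard form $C_t = f_t * C_{t-1} + i_t * \tilde{C_t}$. The `params` dictionary layout and the helper names (`affine`, `update_cell_state`) are hypothetical and chosen only for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W . v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def update_cell_state(params, x_t, h_prev, C_prev):
    """Return C_t = f_t * C_prev + i_t * C_tilde (all products elementwise)."""
    concat = x_t + h_prev  # stands in for [x_t, h_{t-1}]
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]
    C_tilde = [math.tanh(z) for z in affine(params["W_c"], params["b_c"], concat)]
    return [f * c + i * g for f, c, i, g in zip(f_t, C_prev, i_t, C_tilde)]

# Assumed toy setup: 1 input feature, 2 hidden units, all weights and biases zero.
zeros_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = {"W_f": zeros_W, "b_f": zeros_b,
          "W_i": zeros_W, "b_i": zeros_b,
          "W_c": zeros_W, "b_c": zeros_b}

C_t = update_cell_state(params, x_t=[1.0], h_prev=[0.0, 0.0], C_prev=[2.0, -4.0])
print(C_t)
```

With all-zero parameters both gates evaluate to 0.5 and the candidate to 0, so the update simply halves the previous cell state. Non-zero weights let the network decide, per entry, how much to keep and how much to write.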

# How does the update gate work?

The purpose of the update gate is to produce the hidden state that is propagated to the next timestep. It requires two external inputs and one internal input.

**External Inputs :** 

1. Hidden State from previous timestep : $h_{t-1}$

2. Input from current timestep : $x_t$

**Internal Input :**

1. New Cell State : $C_t$

We calculate an intermediate value $o_t$, the gate activation, which is multiplied elementwise with the $\tanh$ of the cell state of the current timestep, $C_t$:

$o_t = \sigma ( W_o \cdot [x_t, h_{t-1}] + b_o)$

$\therefore h_t = o_t * \tanh(C_t)$
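
The final step can be sketched the same way. The function name `hidden_state` and the numbers are assumptions for illustration; the cell state `C_t` is taken as already computed by the previous step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_state(W_o, b_o, x_t, h_prev, C_t):
    """Return (o_t, h_t) with o_t = sigmoid(W_o . [x_t, h_prev] + b_o), h_t = o_t * tanh(C_t)."""
    concat = x_t + h_prev  # stands in for [x_t, h_{t-1}]
    o_t = [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
           for row, b in zip(W_o, b_o)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]  # elementwise product
    return o_t, h_t

# Assumed toy setup: 1 input feature, 2 hidden units, zero weights so o_t = 0.5.
W_o = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b_o = [0.0, 0.0]
o_t, h_t = hidden_state(W_o, b_o, x_t=[1.0], h_prev=[0.0, 0.0], C_t=[0.0, 1.0])
print("o_t:", o_t)
print("h_t:", h_t)
```

Since $\tanh$ bounds the cell state to $(-1, 1)$ and the gate scales it by a factor in $(0, 1)$, every entry of $h_t$ stays strictly between $-1$ and $1$, even when the cell state itself grows large.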

# What intuition can you obtain from the working of the gates in an LSTM?

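Whenever a sigmoid, $\sigma()$, activation function is used, its output lies between 0 and 1 and acts as a gate: it decides what fraction of a signal passes through, so any value below 1 corresponds to losing some information.

Likewise, whenever a tanh, $\tanh()$, activation is used, its output lies between $-1$ and $1$ and supplies the actual content, which corresponds to the cell learning something.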
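
A quick numeric check of the two output ranges, as a small sketch with arbitrarily chosen sample points:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

samples = (-4.0, -1.0, 0.0, 1.0, 4.0)  # arbitrary pre-activation values
for z in samples:
    # sigmoid stays in (0, 1): a scaling factor; tanh stays in (-1, 1): a signed value
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={math.tanh(z):+.3f}")
```

The sigmoid column never changes sign, which is why it suits the role of a multiplicative gate, whereas tanh can push a cell-state entry up or down.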

[Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)