## Tutorial: Building an LSTM using Keras

*Juarez Monteiro*<br> 
*Source: adventuresinmachinelearning.com*

---

## 1. Recurrent neural networks
---

An LSTM network is a kind of **Recurrent Neural Network** (RNN). An RNN is a neural network that **attempts to model time- or sequence-dependent behaviour**, such as language, stock prices, electricity demand and so on. This is done by feeding the output of a neural network layer at time $\mathbf{t}$ back into the input of the same layer at time $\mathbf{t + 1}$. It looks like this:

![](https://i0.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/09/Explicit-RNN.jpg?w=363&ssl=1)
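To make the feedback loop concrete, here is a minimal NumPy sketch of a single vanilla RNN step. It assumes a tanh activation and row-vector inputs. The names `rnn_step`, `U`, `V` and `b` are illustrative choices (picked to mirror the $U$, $V$, $b$ notation used later in this tutorial), not part of any library.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, V, b):
    """One time step of a vanilla RNN layer (minimal sketch, tanh assumed).

    The layer's own output from the previous step, h_prev, is fed back in
    next to the new input x_t; that feedback is what makes it recurrent.
    """
    return np.tanh(b + x_t @ U + h_prev @ V)

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
U = rng.normal(size=(input_size, hidden_size))   # input -> hidden weights
V = rng.normal(size=(hidden_size, hidden_size))  # hidden -> hidden (feedback) weights
b = np.zeros(hidden_size)                        # bias

h_prev = np.zeros(hidden_size)          # nothing to feed back before the first step
x_t = rng.normal(size=input_size)       # stand-in for the input at time t
h_t = rnn_step(x_t, h_prev, U, V, b)    # layer output at time t
h_next = rnn_step(x_t, h_t, U, V, b)    # h_t is fed back in at time t + 1
```

The same weights are used at both steps; only the fed-back state changes.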

RNNs are “unrolled” programmatically during training and prediction, so we get something like the following:

![](https://i2.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/09/Recurrent-neural-network.png?resize=768%2C251&ssl=1)

Here you can see that at each time step a new word is supplied, and the output of F from the previous time step (i.e. $h_{t-1}$) is also fed into the network.
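Unrolling is simply a loop over the sequence that reuses the same weights at every step. The sketch below assumes the same tanh cell as a plain RNN and uses random vectors as stand-ins for embedded words; `rnn_forward` is a hypothetical helper name.

```python
import numpy as np

def rnn_forward(xs, U, V, b):
    """Unroll a vanilla RNN over a sequence (minimal sketch).

    xs has shape (T, input_size); returns every hidden state, shape (T, hidden_size).
    U, V and b are shared across all T time steps.
    """
    h = np.zeros(V.shape[0])              # initial state h_0 = 0
    hs = []
    for x_t in xs:                        # one iteration per time step / word
        h = np.tanh(b + x_t @ U + h @ V)  # previous output h_{t-1} is fed back in
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(1)
T, input_size, hidden_size = 5, 4, 3
xs = rng.normal(size=(T, input_size))    # stand-in for 5 embedded words
U = rng.normal(size=(input_size, hidden_size))
V = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

hs = rnn_forward(xs, U, V, b)            # one hidden state per time step
```

Each row of `hs` corresponds to one copy of F in the unrolled diagram.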

The problem with vanilla RNNs, constructed from regular neural network nodes, is that when we try to model dependencies between words or sequence values that are separated by a significant number of other words, we run into the **vanishing gradient problem** (and sometimes the exploding gradient problem). **This is because small gradients or weights** (values less than 1) **are multiplied many times over through the time steps, so the gradients shrink exponentially towards zero**. As a result, errors from distant time steps (the earlier layers of the unrolled network) barely change the weights, and the network fails to learn long-term dependencies.
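A toy calculation shows how quickly repeated multiplication shrinks or blows up a signal. This is a deliberate simplification: it treats the per-step backpropagation factor as one scalar, whereas real backpropagation through time multiplies Jacobian matrices that include the activation derivatives. `gradient_scale` is a hypothetical helper.

```python
def gradient_scale(per_step_factor, steps):
    """Toy model: start from a gradient of 1.0 and multiply it by the same
    factor once for every time step it is propagated back through."""
    scale = 1.0
    for _ in range(steps):
        scale *= per_step_factor
    return scale

for steps in (1, 10, 50, 100):
    print(steps, gradient_scale(0.9, steps), gradient_scale(1.1, steps))
```

With a factor of 0.9, the gradient falls below 3e-5 after 100 steps (vanishing); with 1.1, it grows past 13,000 (exploding).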

LSTM networks are a way of solving this problem.

## 2. LSTM networks
---

In this Keras LSTM tutorial we will build an LSTM network for text prediction. **An LSTM network is a recurrent neural network that has LSTM cell blocks in place of standard neural network layers**. These cells have various components called the input gate, the forget gate and the output gate, which will be explained more fully later. Here is a graphical representation of the **LSTM cell**:

![](https://i2.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/09/LSTM-diagram.png?w=669&ssl=1)

Notice **first**, on the left-hand side, that our new word/sequence value $x_t$ is concatenated with the previous output from the cell, $h_{t-1}$. The first step for **this combined input is to be squashed via a tanh layer**. The **second step is that this input is passed through an input gate**. An **input gate is a layer of sigmoid-activated nodes** whose output is multiplied element-wise by the squashed input. These input gate sigmoids can act to “kill off” any elements of the input vector that aren't required. A sigmoid function outputs values between 0 and 1, so the weights connecting the input to these nodes can be trained to output values close to zero to “switch off” certain input values (or, conversely, values close to 1 to “pass through” others).

**The next step** in the flow of data through this cell is the **internal state / forget gate loop**. **LSTM cells** have an **internal state** variable $s_t$. This variable, **lagged one time step**, i.e. $s_{t-1}$, **is added to the gated input data to create an effective layer of recurrence**. Using addition here, instead of multiplication, helps to reduce the risk of vanishing gradients. However, this recurrence loop is controlled by a forget gate, which works the same way as the input gate but instead helps the network learn which state variables should be “remembered” or “forgotten”.
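Here is a minimal sketch of the state update, assuming the forget gate activations `f` and the gated input have already been computed (the values below are hand-picked for illustration). `update_state` is a hypothetical name.

```python
import numpy as np

def update_state(s_prev, f, gated_in):
    """Internal state update of an LSTM cell (minimal sketch).

    s_prev:   previous internal state s_{t-1}
    f:        forget gate activations in (0, 1), one per state element
    gated_in: squashed input already filtered by the input gate
    The old state is scaled by the forget gate and the new input is *added*.
    """
    return s_prev * f + gated_in

s_prev = np.array([2.0, 2.0, 2.0])
f = np.array([1.0, 0.5, 0.0])          # remember, half-forget, forget
gated_in = np.array([0.1, 0.1, 0.1])
s_t = update_state(s_prev, f, gated_in)
print(s_t)  # [2.1 1.1 0.1]
```

Along this direct path the derivative of $s_t$ with respect to $s_{t-1}$ is just `f`, so when the forget gate stays near 1 the gradient can flow back through many steps without being repeatedly shrunk by weight matrices.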

Finally, the internal state passes through an output tanh squashing function, whose output is controlled by an output gate. This gate determines which values are actually allowed through as the cell's output, $h_t$.
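The output step can be sketched the same way. As in the other gate sketches, this assumes the output gate is a sigmoid layer applied to the concatenated $[x_t, h_{t-1}]$; `cell_output`, `W_o` and `b_o` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cell_output(s_t, x_t, h_prev, W_o, b_o):
    """Output of an LSTM cell (minimal sketch).

    The internal state s_t is squashed with tanh, then filtered
    element-wise by a sigmoid output gate computed from [x_t, h_{t-1}].
    """
    z = np.concatenate([x_t, h_prev])  # combined input, as for the other gates
    o = sigmoid(z @ W_o + b_o)         # output gate, values in (0, 1)
    return np.tanh(s_t) * o            # h_t

rng = np.random.default_rng(4)
input_size, hidden_size = 4, 3
W_o = rng.normal(size=(input_size + hidden_size, hidden_size))
b_o = np.zeros(hidden_size)

s_t = np.array([3.0, -0.5, 0.0])       # example internal state
x_t = rng.normal(size=input_size)
h_prev = np.zeros(hidden_size)
h_t = cell_output(s_t, x_t, h_prev, W_o, b_o)
```

Because both factors are bounded, every element of $h_t$ lies strictly between -1 and 1.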

**Input**

First, the input is squashed between -1 and 1 using a tanh activation function. This can be expressed by:

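$$g = \tanh(b^g + x_t U^g + h_{t-1} V^g)$$

where $U^g$ and $V^g$ are the weights for the input $x_t$ and the previous cell output $h_{t-1}$, respectively, and $b^g$ is the input bias. The superscript $g$ is a label, not a power: it marks these as the input weights and bias (as opposed to those of the gates).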
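A quick numerical check shows that this equation is the same as applying one weight matrix to the concatenated input $[x_t, h_{t-1}]$ from the diagram: stacking $U^g$ on top of $V^g$ gives that matrix. The sizes and random values below are arbitrary, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
input_size, hidden_size = 4, 3

x_t = rng.normal(size=input_size)                  # input at time t (row vector)
h_prev = rng.normal(size=hidden_size)              # previous cell output h_{t-1}
U_g = rng.normal(size=(input_size, hidden_size))   # input weights U^g
V_g = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights V^g
b_g = rng.normal(size=hidden_size)                 # bias b^g

# g = tanh(b^g + x_t U^g + h_{t-1} V^g), term by term
g = np.tanh(b_g + x_t @ U_g + h_prev @ V_g)

# Same result with one matrix applied to the concatenation [x_t, h_{t-1}]
W_g = np.vstack([U_g, V_g])                        # shape (input_size + hidden_size, hidden_size)
g_concat = np.tanh(b_g + np.concatenate([x_t, h_prev]) @ W_g)

print(np.allclose(g, g_concat))  # True
```

The concatenated form is convenient in code because it needs one matrix product per layer; the split form makes it clearer which weights act on the new input and which act on the fed-back output.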

## 3. LSTM word embedding and hidden layer size
---

## 4. The Keras LSTM architecture
---

## 5. Building the Keras LSTM model
---

## 6. The text preprocessing code
---

## 7. Creating the Keras LSTM data generators
---

## 8. Creating the Keras LSTM structure
---

## 9. Compiling and running the Keras LSTM model
---

## 10. The Keras LSTM results
---