Recurrent Neural Networks

* Objectives:
    * Review of what neural networks are (Multilayer Perceptron)
    * Simple RNN vs MLP
    * Benefits of Intralayer Recurrent Connections
    * Example of RNN in text data
    * Multilayer RNNs
    * Keras Neural Network API For RNN
    * LSTM

1) Recurrent Neural Networks Basics
* What distinguishes recurrent neural networks (RNNs) from feedforward networks is that connections between neurons can form a **directed cycle**. This lets the network maintain a state based on previous inputs, so it can model **temporal, sequential** behavior
* RNN Use Cases:
    * Pattern recognition: handwriting recognition, image captioning
    * Sequential data: speech recognition, stock price prediction, text generation (e.g., news stories)

2) Comparison of Vanilla MLP and Vanilla RNN
![mlp_vs_rnn](mlp_vs_rnn.png)
* The double arrow in RNN indicates a weight in each direction (2 weights)
* How many weights are in each architecture?
    * Vanilla MLP weights:
        * $W_{h\rightarrow y} = 4$
        * $W_{x\rightarrow h} = 8$
    * Vanilla RNN weights:
        * $W_{h\rightarrow y} = 4$
        * $W_{h\rightarrow h} = 16$
        * $W_{x\rightarrow h} = 8$
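
The counts above are consistent with 2 inputs, 4 hidden nodes, and 1 output, with biases left out. Those layer sizes are an assumption inferred from the numbers, not read off the figure. Under that assumption, a small sketch reproduces the tallies (the `count_weights` helper is hypothetical):

```python
def count_weights(n_in, n_hidden, n_out, recurrent=False):
    """Count connection weights (biases excluded) for one hidden layer."""
    counts = {
        "x->h": n_in * n_hidden,
        "h->y": n_hidden * n_out,
    }
    if recurrent:
        # Every hidden node feeds every hidden node, itself included.
        counts["h->h"] = n_hidden * n_hidden
    return counts

# Assumed layer sizes: 2 inputs, 4 hidden nodes, 1 output.
mlp = count_weights(2, 4, 1)
rnn = count_weights(2, 4, 1, recurrent=True)

print(mlp, sum(mlp.values()))  # {'x->h': 8, 'h->y': 4} 12
print(rnn, sum(rnn.values()))  # {'x->h': 8, 'h->y': 4, 'h->h': 16} 28
```

The only difference between the two architectures is the $W_{h\rightarrow h}$ block, which grows with the square of the hidden layer size.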

3) Benefits of Intra-layer Recurrent connections
* The previous state of a node in a recurrent hidden layer (`H_prev`) can affect its own value, or the value of other nodes in the layer, at the current time step (it's a directed cycle)
* This gives the net the ability to model sequential data
* Feedforward and backpropagation work the same way as in an MLP, applied to the network unrolled over time steps (backpropagation through time)
* $W_{h\rightarrow h}$ is learned like all other weights. In a trained model all the weights are fixed; it is the activations of the nodes that change as the sequence changes
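
A minimal NumPy sketch of the point about fixed weights and changing activations. The layer sizes, the random (untrained) weights, the `tanh` activation, and the names `rnn_step` and `run` are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 4  # assumed sizes

# Weights are created once and never change below, as in a trained model.
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    """One time step: the new state depends on the input and on H_prev."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run(sequence):
    """Feed a sequence through the layer and return the final hidden state."""
    h = np.zeros(n_hidden)
    for x_t in sequence:
        h = rnn_step(x_t, h)
    return h

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

h_ab = run([a, b])  # same inputs...
h_ba = run([b, a])  # ...in a different order
print(np.allclose(h_ab, h_ba))
```

With the same weights, feeding the same two inputs in a different order leaves the layer in a different state, which is what lets the network model sequences.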

4) Example: learn Dr. Seuss text with RNN
* As the model trains, it will eventually write some new books!
* Determine the model architecture that works most effectively
* Look at how the model predicts new characters
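
One common way to set up character prediction is to slide a fixed-length window over the text and ask the model for the next character. The sketch below shows only that data preparation step. The corpus string is a placeholder (not the Dr. Seuss text), and the window length of 5 is an arbitrary assumption:

```python
# Placeholder corpus; the real exercise would load the Dr. Seuss text instead.
text = "the cat sat on the mat"

# Vocabulary: one integer index per distinct character.
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

window = 5  # hypothetical context length

# Each training pair is (window of characters, the character that follows).
pairs = [(text[i:i + window], text[i + window])
         for i in range(len(text) - window)]

def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec

# X: one one-hot vector per character in each window; y: target indices.
X = [[one_hot(char_to_idx[c], len(chars)) for c in context]
     for context, _ in pairs]
y = [char_to_idx[target] for _, target in pairs]

print(pairs[0])  # ('the c', 'a')
```

At generation time the same encoding is applied to the most recent window, and the predicted index is mapped back through `idx_to_char`.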

5) Moving into Multilayer RNNs
* **Multilayer RNNs** - multiple layers (and more nodes in each layer) allow more difficult sequences to be learned. 
    * (-) They are also harder to train
    * (-) Exploding and vanishing gradients cause convergence problems, too
* Vanilla RNN Example:
![vanilla_rnn](vanilla_rnn.png)
* Multilayer RNN Example:
![multilayer_rnn](multilayer_rnn.png)
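
A rough NumPy sketch of stacking, assuming two recurrent layers where each layer's hidden state is the input to the layer above at the same time step. The sizes, the random weights, and the names `make_layer` and `forward` are hypothetical, and the figures may show a different arrangement:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_layer(n_in, n_hidden):
    """Random (untrained) weights for one recurrent layer."""
    return {
        "W_in": rng.normal(scale=0.5, size=(n_hidden, n_in)),
        "W_hh": rng.normal(scale=0.5, size=(n_hidden, n_hidden)),
        "b": np.zeros(n_hidden),
    }

def forward(layers, sequence):
    """Run a sequence through stacked layers; return each layer's final state."""
    states = [np.zeros(layer["b"].shape) for layer in layers]
    for x_t in sequence:
        layer_input = x_t
        for i, layer in enumerate(layers):
            states[i] = np.tanh(layer["W_in"] @ layer_input
                                + layer["W_hh"] @ states[i]
                                + layer["b"])
            # The state of this layer is the input to the next layer up.
            layer_input = states[i]
    return states

# Assumed sizes: 2 inputs -> 4 hidden nodes -> 3 hidden nodes.
layers = [make_layer(2, 4), make_layer(4, 3)]
sequence = rng.normal(size=(6, 2))  # 6 time steps, 2 features each
final_states = forward(layers, sequence)
print([s.shape for s in final_states])  # [(4,), (3,)]
```

Every added layer brings its own recurrent weight matrix, which is part of why deeper RNNs are harder to train.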

6) Keras For RNN
* **Keras** - a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
    * Has since become TensorFlow's default high-level API (`tf.keras`)
    * Available Recurrent Layers:
        * Recurrent
        * SimpleRNN
        * Long Short-Term Memory (LSTM)
        * Gated Recurrent Unit (GRU)
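
A minimal sketch of a `SimpleRNN` model, assuming Keras is imported through a TensorFlow 2.x install (`from tensorflow import keras`). The layer sizes (2 features, 4 recurrent units, 1 output) are assumptions, and the input is random data:

```python
import numpy as np
from tensorflow import keras

# Variable-length sequences (None time steps) with 2 features per step.
model = keras.Sequential([
    keras.Input(shape=(None, 2)),
    keras.layers.SimpleRNN(4),   # returns the final hidden state only
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Random batch: 3 sequences, 5 time steps, 2 features (illustration only).
x = np.random.rand(3, 5, 2).astype("float32")
y_hat = model.predict(x, verbose=0)

print(y_hat.shape)           # (3, 1)
print(model.count_params())  # 33
```

The `SimpleRNN` layer holds 28 parameters here: 8 input weights, 16 recurrent weights, and 4 biases. Swapping in `keras.layers.LSTM` or `keras.layers.GRU` changes only that one line.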

7) **Long Short-Term Memory (LSTM)**
![lstm](lstm.png)
* LSTM is an architecture proposed in 1997
* An LSTM network is well suited to cases where important events are separated by time lags of unknown size and bound
* LSTM practical applications:
    * Natural language text compression
    * Handwriting recognition
    * Speech recognition
    * Translation
* Example: Use LSTM in Keras to predict stock prices
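
One possible shape for the stock price exercise, sketched under several assumptions: the series is synthetic (a sine wave plus a trend, not real market data), the window length of 20 and the 16 LSTM units are arbitrary choices, and Keras is imported through TensorFlow 2.x. A real experiment would also hold out the most recent part of the series for validation.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for a closing-price series (NOT real market data).
t = np.arange(200, dtype="float32")
prices = 100 + 10 * np.sin(t / 10) + 0.05 * t

# Standardize so the network sees values near zero.
mean, std = prices.mean(), prices.std()
scaled = (prices - mean) / std

# Each sample is `window` consecutive values; the target is the next value.
window = 20
X = np.stack([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]
X = X[..., np.newaxis]  # shape: (samples, time steps, features)

model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.LSTM(16),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

# Predict the value that follows the last observed window, then unscale it.
last_window = scaled[-window:].reshape(1, window, 1)
next_scaled = model.predict(last_window, verbose=0)
next_price = float(next_scaled[0, 0] * std + mean)
print(X.shape, y.shape)  # (180, 20, 1) (180,)
```

Two epochs are only enough to confirm that the pipeline runs; the prediction is not meaningful at that point.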