<img src="./pics/DL.png" width=110 align="left" style="margin-right: 10px">

# Introduction to Deep Learning

## 07. Recurrent Networks

---

## [Recurrent Neural Networks (RNN)](https://keras.io/layers/recurrent/)

<img src="https://upload.wikimedia.org/wikipedia/commons/b/b5/Recurrent_neural_network_unfold.svg" alt="Recurrent neural network unfold.svg" height="213" width="640"><br>By <a href="//commons.wikimedia.org/wiki/User:Ixnay" title="User:Ixnay">François Deloche</a> - <span class="int-own-work" lang="en">Own work</span>, <a href="https://creativecommons.org/licenses/by-sa/4.0" title="Creative Commons Attribution-Share Alike 4.0">CC BY-SA 4.0</a>, <a href="https://commons.wikimedia.org/w/index.php?curid=60109157">Link</a></p>

> _A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Unlike feedforward neural networks, RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition._ - [source](https://en.wikipedia.org/wiki/Recurrent_neural_network)

In short, the network has memory: at each time step it combines the current input with a hidden state that summarizes the inputs it has seen so far.  
> _A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop: as you can see above, this chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists._ - [source](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) 
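
The unrolled picture above can be sketched in a few lines of plain Python. This is a minimal illustration of a simple (Elman-style) recurrence, not the Keras implementation; the function name `rnn_forward` and the toy weights are assumptions made up for this example.

```python
import math

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple RNN over a sequence and return the hidden state at every step.

    inputs: list of input vectors (each a list of floats)
    W_x:    hidden_size x input_size matrix (list of rows)
    W_h:    hidden_size x hidden_size matrix (list of rows)
    b:      bias vector of length hidden_size
    """
    hidden_size = len(b)
    h = [0.0] * hidden_size  # initial state: nothing remembered yet
    states = []
    for x in inputs:
        # The same weights are reused at every step; only h changes.
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))
                + sum(W_h[i][k] * h[k] for k in range(hidden_size))
                + b[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states

# Toy weights, chosen by hand purely for illustration.
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]

# The two sequences differ only in their first element.
states_a = rnn_forward([[1.0], [0.0], [0.0]], W_x, W_h, b)
states_b = rnn_forward([[0.0], [0.0], [0.0]], W_x, W_h, b)
print(states_a[-1])
print(states_b[-1])
```

Although the last two inputs are identical, the final hidden states differ, because the first input is still carried along in the state.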

The main weakness of this setup is easy to see in a sentence such as: "I grew up in _France_... that's why I speak fluent _French_." The two related pieces of information are far apart, so the network can easily miss the connection. In theory RNNs can learn such long-range dependencies, but in practice they often fail to do so.  
A more complex variant, the LSTM, was designed to overcome this difficulty.
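
A commonly cited reason for this difficulty is the vanishing gradient: the training signal that links a late output to an early input is a product of one factor per time step, and that product tends to shrink quickly. The sketch below illustrates this for a deliberately tiny scalar RNN, `h_t = tanh(w * h_{t-1})`; the function name and the values of `w` and `h0` are assumptions chosen for the demonstration.

```python
import math

def gradient_through_time(w, steps, h0=0.5):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(w * h_prev)^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)

for steps in (1, 10, 50):
    print(steps, gradient_through_time(w=0.9, steps=steps))
```

With `w = 0.9` every factor is below 0.9, so after 50 steps the influence of the initial state on the final one is smaller than `0.9 ** 50`, which is already under one percent.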

#### [Long Short-Term Memory (LSTM) Networks](https://keras.io/layers/recurrent/#lstm)

LSTMs are built to retain information over long spans. They have the same chained structure as plain RNNs, but the repeating modules themselves are different.

<img src="https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" width="600"><br>By <a href="https://colah.github.io/about.html">Christopher Olah</a>, <a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/">Link</a>

> _Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections that make it a "general purpose computer" (that is, it can compute anything that a Turing machine can). It can not only process single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Bloomberg Business Week wrote: "These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music."_  
_A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell._  
_LSTM networks are well-suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models and other sequence learning methods in numerous applications._ - [source](https://en.wikipedia.org/wiki/Long_short-term_memory)


Its main building blocks are:
- Cell state: acts as a highway that transports relevant information along the sequence chain.
- Forget gate: decides which information in the cell state should be kept and which should be discarded.
- Input gate: decides which new information is written to the cell state.
- Output gate: decides what the next hidden state (which carries information about previous inputs) should be.
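
The four building blocks can be sketched as a single LSTM time step in plain Python. This follows the standard textbook formulation and is only an illustration, not Keras's internal code; the names `lstm_step`, `make_params` and the dictionary keys are assumptions made up for this example, and the weights are set to zero so that the biases alone control the gates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; `params` maps a gate name to (W, b) acting on [h_prev, x]."""
    z = h_prev + x  # list concatenation of previous hidden state and current input

    def gate(name, activation):
        W, b = params[name]
        return [activation(sum(w * v for w, v in zip(row, z)) + b_i)
                for row, b_i in zip(W, b)]

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much of the candidate to write
    g = gate("candidate", math.tanh)  # proposed new content
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

def make_params(forget_bias):
    """Hidden size 1, input size 1, zero weights: the gates depend on biases only."""
    zero_w = [[0.0, 0.0]]
    return {
        "forget": (zero_w, [forget_bias]),
        "input": (zero_w, [-10.0]),      # input gate almost closed
        "candidate": (zero_w, [0.0]),
        "output": (zero_w, [0.0]),
    }

def run(forget_bias, steps=20):
    h, c = [0.0], [0.8]  # start with something stored in the cell state
    for _ in range(steps):
        h, c = lstm_step([1.0], h, c, make_params(forget_bias))
    return c

c_open = run(forget_bias=10.0)     # forget gate close to 1: the cell state is kept
c_closed = run(forget_bias=-10.0)  # forget gate close to 0: the cell state is erased
print(c_open, c_closed)
```

With the forget gate held open, the stored value survives 20 steps almost unchanged, which is the "highway" behaviour of the cell state; with the gate closed it is wiped out almost immediately.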

#### Further reading:

- https://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://medium.com/datadriveninvestor/a-high-level-introduction-to-lstms-34f81bfa262d
- https://skymind.ai/wiki/lstm
- https://www.dlology.com/blog/how-to-use-return_state-or-return_sequences-in-keras/

---

### In Practice

#### Build a sentiment predictor on movie reviews

Based on [this tutorial](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/).

In [None]:
from keras.models import Sequential

from keras.layers import Dense
from keras.layers import Embedding
from keras.layers import LSTM

from keras.datasets import imdb

from keras.preprocessing import sequence

In [None]:
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)

In [None]:
max_review_length = 500
# pad_sequences pads (or truncates) every review in the corpus to the given length
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
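
What padding does to each review can be shown with a small stand-alone sketch. It mimics the default behaviour of `pad_sequences` (zeros added at the front, overly long sequences cut from the front); the helper name `pad_or_truncate` and the toy reviews are made up for this example.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Left-pad `seq` with `value`, or keep only its last `maxlen` items (maxlen >= 1)."""
    if len(seq) >= maxlen:
        return seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

# Two toy "reviews" encoded as word indices, one too short and one too long.
reviews = [[11, 12], [21, 22, 23, 24, 25, 26]]
padded = [pad_or_truncate(r, maxlen=4) for r in reviews]
print(padded)  # [[0, 0, 11, 12], [23, 24, 25, 26]]
```

After this step every review has the same length, so the whole corpus fits into one rectangular array.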

In [None]:
embedding_vector_length = 32

model = Sequential([
    Embedding(input_dim=top_words,                 # number of words in the vocabulary
              output_dim=embedding_vector_length,  # size of each embedding vector
              input_length=max_review_length),     # length of each padded review
    LSTM(units=100),
    Dense(1, activation='sigmoid')
])

In [None]:
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])
model.summary()
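
The parameter counts reported by `model.summary()` can be checked by hand. The sketch below recomputes them from the layer sizes used above, assuming the default layer settings (bias terms enabled); the helper function names are made up for this example.

```python
def embedding_params(vocab_size, embedding_dim):
    # One embedding vector per word in the vocabulary.
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # Four weight sets (forget, input, candidate, output), each with
    # input weights, recurrent weights and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units

counts = {
    "embedding": embedding_params(5000, 32),
    "lstm": lstm_params(32, 100),
    "dense": dense_params(100, 1),
}
print(counts)                # {'embedding': 160000, 'lstm': 53200, 'dense': 101}
print(sum(counts.values()))  # 213301
```

Most of the parameters sit in the embedding table, not in the LSTM itself.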

In [None]:
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)

In [None]:
score = model.evaluate(X_test, y_test, batch_size=16)
print('test loss: {}, test accuracy: {}'.format(*score))
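
To make the two reported numbers concrete, here is a minimal sketch of how binary cross-entropy and accuracy are computed from predicted probabilities. It is a plain Python illustration, not the Keras implementation; the clipping constant `eps`, the 0.5 threshold and the toy predictions are assumptions chosen for this example.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return hits / len(y_true)

# Toy labels (1 = positive review) and sigmoid outputs.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

loss = binary_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
print('loss: {:.4f}, accuracy: {}'.format(loss, acc))
```

The loss penalizes confident wrong predictions much more heavily than hesitant ones, whereas accuracy only looks at which side of the threshold each prediction falls on.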

#### Exercise: 

Work through [this tutorial](https://stackabuse.com/time-series-analysis-with-lstm-using-pythons-keras-library/) on time series analysis with LSTMs.