<div style="font-size:40px; font-weight:bold; margin:20px; margin-bottom:100px; text-align: justify;text-shadow: 1px 1px 1px #919191,
        1px 2px 1px #919191,
        1px 3px 1px #919191,
        1px 4px 1px #919191,
        1px 5px 1px #919191,
        1px 6px 1px #919191,
        1px 7px 1px #919191,
        1px 8px 1px #919191,
        1px 9px 1px #919191,
        1px 10px 1px #919191,
    1px 18px 6px rgba(16,16,16,0.4),
    1px 22px 10px rgba(16,16,16,0.2),
    1px 25px 35px rgba(16,16,16,0.2),
    1px 30px 60px rgba(16,16,16,0.4)">Long Short Term Memory Recurrent Networks</div>

<div style="font-style: italic; font-weight: bold; font-size:35px; text-align:center; font-family: Garamond">by Rubén Cañadas Rodríguez</div>

<div style="font-size: 30px; margin: 20px; margin-bottom: 40px; margin-left: 0px; line-height: 40pt">

<div style="font-size: 30px; font-family: Garamond; font-weight: bold; margin: 30px; margin-left: 0px; margin-bottom: 10px; ">Contents</div>
<ol>
<li>Introduction</li>
<li>Recurrent Neural Networks (RNNs)</li>
<li>Long Short Term Memory (LSTM)</li> 
<li>Text generation</li> 
<li>Coding</li> 
</ol>
</div>
<div style="font-size: 30px; font-weight: bold; margin-bottom: 20px; margin-top: 30px"> Introduction </div>
<div style="text-align:justify; font-family: Garamond; font-size:20px; margin: 20px; margin-left: 0px; line-height: 24pt">
In this tutorial we will address the problem of generating new data (a generative model) from previous data. This kind of generative model is quite different from autoencoder- and GAN-based methods: here each new element is predicted from the elements that precede it in a sequence.
This makes it possible to model time series and other sequential data, which vanilla (fully connected) or convolutional neural networks are not designed to handle. We are going to use Keras, a high-level library built on top of TensorFlow that allows sophisticated deep learning architectures to be coded in just a few lines. Since TensorFlow 2.0, Keras is included in TensorFlow as <code>tf.keras</code>! In other tutorials we will implement deep learning architectures using TensorFlow 2.0 and explain its new options.
</div>
<div style="font-size: 30px; font-weight: bold; margin-bottom: 20px; margin-top: 30px"> Recurrent Neural Networks (RNNs) </div>
<div style="text-align:justify; font-family: Garamond; font-size:20px; margin: 20px; margin-left: 0px; line-height: 24pt">
Recurrent networks are a deep learning architecture designed to work with sequential data. For example, we could use time series data to predict the next values in time. A very visual example: given the stock prices over a period of time, we want to predict future values based on past ones. RNNs can be used for such a task, as we can see in the following figure:
<img src="images/rnn_prediction.png" width="80%" style="text-align: center; margin: 30px; margin-bottom: 40px"> 
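To make the stock-price example concrete: Keras recurrent layers expect their input as a 3-D array of shape (samples, timesteps, features). Below is a minimal sketch of how a single price series could be cut into such input windows, each paired with the value that follows it. The function name make_windows and the toy prices are illustrative assumptions, not part of this tutorial's code.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (input window, next value) pairs.

    Returns X with shape (samples, window, 1), the 3-D layout that
    recurrent layers expect, and y with shape (samples,).
    """
    series = np.asarray(series, dtype=np.float32)
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    n_samples = len(series) - window
    # Each row of X is a run of `window` consecutive values...
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    # ...and its target is the value right after that run.
    y = series[window:]
    return X[..., np.newaxis], y

prices = [10.0, 10.5, 10.2, 10.8, 11.1, 11.0]  # toy values, not real stock data
X, y = make_windows(prices, window=3)
print(X.shape, y.shape)  # (3, 3, 1) (3,)
```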
How does this type of architecture make prediction possible? To achieve it, we have to redesign the traditional neural network. Conventional architectures take a fixed-size input vector and produce a fixed-size output vector. RNNs, in contrast, operate on sequences of vectors: they reuse the same weights at every time step and carry a hidden state from one step to the next. This allows us, for example, to perform sentiment analysis, where the input is a sequence (e.g. the words of a sentence) and the output is a single label (e.g. sad, happy, excited); this is known as a many-to-one architecture. RNNs can also be used for text generation (as we will see in this tutorial), where both the input and the output are sequences; this is a many-to-many architecture, and it allows us to generate meaningful text. In the next figure we can see the different types of RNNs depending on the form of the input/output data.
<img src="images/rnn_arc.jpg" width="80%" style="text-align: center; margin: 30px; margin-bottom: 40px; margin-leftt:100px"> 
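The recurrence itself is short. As a sketch in plain NumPy (with hypothetical weight names W_xh, W_hh and b_h and random weights, so it only illustrates the idea and is not how Keras implements it internally), a vanilla RNN updates its hidden state with h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). Keeping every state gives a many-to-many output; keeping only the last one gives many-to-one:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla (Elman) RNN over a sequence.

    xs: array of shape (timesteps, input_dim).
    Returns all hidden states, shape (timesteps, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in xs:
        # The same weights are reused at every step; h carries the past.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, input_dim, hidden_dim = 5, 3, 4
xs = rng.normal(size=(timesteps, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

states = rnn_forward(xs, W_xh, W_hh, b_h)
many_to_many = states      # one output per input step, shape (5, 4)
many_to_one = states[-1]   # only the final state, shape (4,)
```

In Keras the same distinction is controlled by the return_sequences argument of recurrent layers: return_sequences=True returns the state at every step, while the default (False) returns only the last one.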
RNNs work pretty well for short sequences. In practice, however, when long-term dependencies are needed across long sequences, conventional RNNs fail due to a lack of "memory". This happens because, during backpropagation through time, the gradient is multiplied by one factor per time step; when these factors are smaller than one, the gradients get smaller and smaller (the vanishing gradient problem), and the network becomes very hard to train on long-range dependencies. For more information on the problems of gradient descent when dealing with long-term dependencies, check this paper by Bengio and his co-workers: <a href="http://ai.dinfo.unifi.it/paolo//ps/tnn-94-gradient.pdf" style="text-decoration:none">Bengio, et al. (1994)</a>. Several variations of conventional RNNs have been proposed to solve this: Long Short Term Memory networks (LSTM) and Gated Recurrent Units (GRU).
</div>
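The vanishing gradient can be seen with a single scalar unit. For a hypothetical one-neuron RNN h_t = tanh(w h_{t-1} + u x_t), the chain rule gives dh_T/dh_0 as a product of T factors w(1 - h_t^2). Whenever |w| < 1, every factor is below one in magnitude, so the product shrinks exponentially with the sequence length. The sketch below (arbitrary illustrative values for w and the inputs) computes that product:

```python
import numpy as np

def gradient_through_time(w, xs, u=1.0, h0=0.0):
    """Return d h_T / d h_0 for a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t).

    By the chain rule, each step contributes a factor
    d h_t / d h_{t-1} = w * (1 - h_t**2), so the gradient is their product.
    """
    h, grad = h0, 1.0
    for x_t in xs:
        h = np.tanh(w * h + u * x_t)
        grad *= w * (1.0 - h ** 2)
    return grad

xs = np.ones(50)
for T in (5, 20, 50):
    # The gradient reaching the first step shrinks as the sequence grows.
    print(T, gradient_through_time(w=0.5, xs=xs[:T]))
```

Roughly speaking, the LSTM cell state sidesteps this repeated squashing: it is updated additively, so the gradient flowing along it is scaled by the forget gate at each step instead of passing through the recurrent weights and a tanh derivative.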
<div style="font-size: 30px; font-weight: bold; margin-bottom: 20px; margin-top: 30px"> Long Short Term Memory (LSTM)
</div>
<div style="text-align:justify; font-family: Garamond; font-size:20px; margin: 20px; margin-left: 0px; line-height: 24pt">
The key idea is that the network can learn what to store in the long-term state, what to throw away, and what to read from it. There are two different states: the cell state $C_{t}$, which is the long-term state, and the hidden state $h_{t}$, which is the short-term state.
As the long-term state traverses the cell from left to right, it first goes through a forget gate, which drops some of the memories, and then new memories are added to it via an addition operation; these new memories are selected by an input gate. Finally, an output gate decides which part of the (tanh-squashed) cell state is read out as the new hidden state $h_{t}$. This allows the network to learn which information from the past is relevant and which "memory" to preserve. The following figure shows a schematic picture of an LSTM cell.
</div>
<img src="lstm.png" width="80%" style="text-align: center; margin: 30px; margin-bottom: 40px; margin-leftt:100px"> 
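The gate mechanics described above fit in a few lines. The following NumPy sketch performs one LSTM time step using the standard formulation: forget gate f, input gate i, candidate memories g and output gate o, then C_t = f * C_{t-1} + i * g and h_t = o * tanh(C_t). Stacking the four gates' weights into a single matrix W in that order is a convention chosen for this sketch; Keras stores and orders its LSTM weights in its own way.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*hidden, hidden + input) and b has shape (4*hidden,);
    the four row blocks hold the forget, input, candidate and output
    parameters (this stacking order is a choice made for this sketch).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to drop from C
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: which new memories to add
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate memories
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: what to read out of C
    c_t = f * c_prev + i * g               # long-term (cell) state
    h_t = o * np.tanh(c_t)                 # short-term (hidden) state
    return h_t, c_t

# Usage: run the cell over a short random sequence.
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_dim))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c = lstm_step(x_t, h, c, W, b)
```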
<div style="font-size: 30px; font-weight: bold; margin-bottom: 20px; margin-top: 30px"> Text generation </div>
<div style="text-align:justify; font-family: Garamond; font-size:20px; margin: 20px; margin-left: 0px; line-height: 24pt">
One application of LSTMs is text generation: given a sequence of characters taken from the data, we train a model to predict the next character in the sequence. In this tutorial we are going to use a quotes dataset, which gathers thousands of quotes by important people throughout history, and we will try to build a model that can create quotes with logical meaning. The first step is to load the dataset and preprocess it so that the model can use the data: unwanted characters are removed, and runs of repeated spaces are collapsed into a single space. Then the training sentences are created by sliding a window of fixed length along each quote, moving it a fixed number of characters (the step) each time. For example, with a window of 2 characters and a step of 1, the sentence "I live in New York" produces the list ["I ", " l", "li", "iv", "ve", "e ", " i", "in", "n ", " N", "Ne", "ew", "w ", " Y", "Yo", "or"]. A second list stores the character that follows each window: ["l", "i", "v", "e", " ", "i", "n", " ", "N", "e", "w", " ", "Y", "o", "r", "k"]. The inputs are then one-hot encoded into a three-dimensional tensor X, where the first dimension is the number of sentences, the second is the number of characters per sentence (the window length), and the third is the size of the character vocabulary (each character is mapped to an index through a char-to-index dictionary). The targets form a two-dimensional tensor Y: the first dimension is the number of sentences, and the second is the one-hot encoding of the corresponding next character. The recurrent neural network with LSTM cells is then fitted on these tensors. Depending on the number of epochs and the batch size, we will obtain better or worse results.
</div>
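The windowing and one-hot encoding just described can be reproduced on the "I live in New York" example. This is a standalone sketch with hypothetical helper names (make_char_windows, one_hot); the DataPreparation class below does the same work on the whole quotes dataset.

```python
import numpy as np

def make_char_windows(text, max_length, step):
    """Slide a window of max_length characters over text, moving `step`
    characters at a time; the target of each window is the next character."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - max_length, step):
        sentences.append(text[i:i + max_length])
        next_chars.append(text[i + max_length])
    return sentences, next_chars

def one_hot(sentences, next_chars, chars):
    """Build X (sentences, window length, vocabulary) and Y (sentences, vocabulary)."""
    char_indices = {c: i for i, c in enumerate(chars)}
    x = np.zeros((len(sentences), len(sentences[0]), len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for s, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[s, t, char_indices[char]] = True
        y[s, char_indices[next_chars[s]]] = True
    return x, y

text = "I live in New York"
sentences, next_chars = make_char_windows(text, max_length=2, step=1)
chars = sorted(set(text))  # 13 distinct characters, including the space
x, y = one_hot(sentences, next_chars, chars)
print(sentences[:4], next_chars[:4])  # ['I ', ' l', 'li', 'iv'] ['l', 'i', 'v', 'e']
print(x.shape, y.shape)               # (16, 2, 13) (16, 13)
```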
<div style="font-size: 30px; font-weight: bold; margin-bottom: 20px; margin-top: 30px"> Coding </div>

In [None]:
import re  # regular expressions for cleaning the text
import numpy as np
import pandas as pd
# Keras ships inside TensorFlow 2.x as tf.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Activation, Dense, Dropout, LSTM, Bidirectional

In [None]:
class DataPreparation(object):
    
    def __init__(self, max_length, step):
        
        self._csv = "QUOTE.csv"
        self.__df = pd.read_csv(self._csv)
        self.__quotes = list(self.__df.quote.dropna() + "\n") #skip missing quotes and add a line break at the end of each quote!
        self.__chars_to_remove = ['#', '$', '%', '(', ')', '=', ';' ,':',  '*', '+', '£' , '—','’']
        self.__cleaned_quotes = []
        self._chars = None
        self._char_indices = None
        self._indices_char = None
        self._sentences = []
        self._next_chars = []
        self._max_length = max_length
        self._step = step #Number of characters the window moves between consecutive sentences, e.g. with
        #max_length=15 and step=6, "If you live to " is followed by " live to be a h" and then by "to be a hundred".
        #The windows overlap (they are character n-grams). This parameter can be tweaked.
        
    def __str__(self):  
        return "{}".format(self.__quotes)
    
    def __len__(self): 
        return len(self.__quotes)
        
    
    def __getitem__(self, index):
        return self.__quotes[index]
    
    @property
    def chars_to_remove(self):
        return self.__chars_to_remove #Getter method for obtaining the default chars to remove
    
    @chars_to_remove.setter
    def chars_to_remove(self, list_of_chars):
        self.__chars_to_remove = list_of_chars #Setter method for changing the chars to remove from sentences
        
        
    def __remove_unused_chars(self):
        
        """
        This method replaces the characters that we do not want to include in our model
        (stored in the self.__chars_to_remove attribute) with spaces, and then collapses
        runs of two or more spaces into a single one. The results are stored in
        self.__cleaned_quotes.
        """
        
        self.__cleaned_quotes = [] #reset so that calling the method twice does not duplicate quotes
        pattern = re.compile(r' {2,}') # two or more plain spaces (so the final "\n" of each quote is kept)
        for quote in self.__quotes:
            for char in self.__chars_to_remove:
                quote = quote.replace(char, ' ')
            quote = re.sub(pattern, ' ', quote)
            self.__cleaned_quotes.append(quote)
                  
    def __obtain_char_indices(self):
        
        self.__remove_unused_chars() #Fills self.__cleaned_quotes, which was initialized as an empty list
        text = ' '.join(self.__cleaned_quotes)
        self._chars = sorted(set(text)) #All distinct characters present in the quotes
        self._char_indices = dict((c, i) for i, c in enumerate(self._chars)) #To each character we assign a number
        self._indices_char = dict((i, c) for i, c in enumerate(self._chars)) #Conversely, to each number we assign a character
        
    
    def _generate_sentences(self):
        
        self._sentences, self._next_chars = [], [] #reset so that repeated calls do not duplicate data
        for quote in self.__cleaned_quotes:
            for i in range(0, len(quote) - self._max_length, self._step):
                self._sentences.append(quote[i: i + self._max_length]) #sentence of length max_length
                self._next_chars.append(quote[i + self._max_length]) #next char after the sentence
        #Optional: to reduce training time we keep only the first 100 sentences (and their targets)
        self._sentences = self._sentences[:100]
        self._next_chars = self._next_chars[:100]

    
    def _vectorization(self):
        
        if not self._sentences:
            raise ValueError("_generate_sentences method has to be applied before vectorizing!! Otherwise execute generate_and_vectorize ")
        
        x = np.zeros((len(self._sentences), self._max_length, len(self._chars)), dtype=bool) #Three-dimensional tensor: for each sentence and
        #each position in the sentence, a one-hot vector marks the index of the char found there
        y = np.zeros((len(self._sentences), len(self._chars)), dtype=bool) #Two-dimensional tensor: for each sentence (of max_length) a one-hot
        #vector marks the next char, which the LSTM will have to guess given x.
        for i, sentence in enumerate(self._sentences):
            for t, char in enumerate(sentence): 
                x[i, t, self._char_indices[char]] = 1 # Tensor[sentences, chars, indices_char]
            y[i, self._char_indices[self._next_chars[i]]] = 1 # Tensor [sentence, next_char_in_sentence]
            
        return x,y
        
    def generate_and_vectorize(self):
        
        self.__obtain_char_indices() #clean the quotes and build the char <-> index mappings
        self._generate_sentences()
        return self._vectorization() 


<table style="width:100%; margin: 20px; margin-left:-300px">
  <tr>
    <th>Predictors (X train)</th>
    <th>Labels (Y train)</th>
  </tr>
  <tr>
    <td>they</td>
    <td>are</td>
  </tr>
  <tr>
    <td>they are</td>
    <td>learning</td>
  </tr>
  <tr>
    <td>they are learning</td>
    <td>artificial</td>
  </tr>
      <tr>
    <td>they are learning artificial</td>
    <td>intelligence</td>
  </tr>
</table>

In [None]:
class TrainLSTM(DataPreparation):

    def __init__(self, epochs=5, batch_size=10000, max_length=15, step=1):
        super(TrainLSTM, self).__init__(max_length, step)
        self.__epochs = epochs
        self.__batch_size = batch_size
        self.__model = Sequential()
    
    
    def __str__(self):
        return "Model parameters: \n batch size: {}\n number of epochs: {}".format(self.__batch_size, self.__epochs)
    
    @property    
    def num_epochs(self):
        return self.__epochs
    
    @property
    def batch_size(self):
        return self.__batch_size
        
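    def model(self):
        """Builds and compiles the network. The vocabulary (self._chars) must already
        exist, i.e. generate_and_vectorize() has to be called first."""

        self.__model = Sequential() #start from an empty model so that calling this twice does not stack layers
        self.__model.add(Input(shape=(self._max_length, len(self._chars)))) #(timesteps, vocabulary size)
        self.__model.add(Bidirectional(LSTM(256, return_sequences=True), name='bidirectional'))
        self.__model.add(Dropout(0.1, name='dropout_bidirectional_lstm'))
        self.__model.add(LSTM(64, name='lstm')) #returns only the last state (many-to-one)
        self.__model.add(Dropout(0.1, name='drop_out_lstm'))
        self.__model.add(Dense(15 * len(self._chars), name='first_dense')) #no activation: a linear layer
        self.__model.add(Dropout(0.1, name='drop_out_first_dense'))
        self.__model.add(Dense(5 * len(self._chars), name='second_dense')) #no activation: a linear layer
        self.__model.add(Dropout(0.1, name='drop_out_second_dense'))
        self.__model.add(Dense(len(self._chars), name='last_dense'))
        self.__model.add(Activation('softmax', name='activation')) #probability of each possible next char
        self.__model.compile(optimizer='adam', loss='categorical_crossentropy')

    def train(self):
        
        x, y = self.generate_and_vectorize() #builds the vocabulary and the one-hot tensors
        self.model()
        return self.__model.fit(x.astype(np.float32), y.astype(np.float32),
                                batch_size=self.__batch_size, epochs=self.__epochs)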
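The notebook stops at training. To actually write new quotes, the trained network has to be applied repeatedly: feed it the last max_length characters, sample the next character from the softmax output, append it, and repeat. The sketch below is one common way of doing this, with a temperature parameter to control randomness. It is not part of the original code: sample_index and generate are hypothetical helpers, and the only assumption is a callable that maps a one-hot window of shape (1, max_length, vocabulary size) to a probability vector, so it can be tried without a trained model.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw a character index from a softmax output, reweighted by temperature.

    temperature < 1 sharpens the distribution (safer, more repetitive text);
    temperature > 1 flattens it (more varied text, more mistakes).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return int(rng.choice(len(weights), p=weights / weights.sum()))

def generate(predict_proba, seed, chars, max_length, n_chars, temperature=1.0, rng=None):
    """Extend `seed` one character at a time using `predict_proba`."""
    if len(seed) < max_length:
        raise ValueError("seed must contain at least max_length characters")
    char_indices = {c: i for i, c in enumerate(chars)}
    text = seed
    for _ in range(n_chars):
        # One-hot encode the last max_length characters, as in training.
        x = np.zeros((1, max_length, len(chars)), dtype=bool)
        for t, char in enumerate(text[-max_length:]):
            x[0, t, char_indices[char]] = True
        probs = predict_proba(x)
        text += chars[sample_index(probs, temperature, rng)]
    return text
```

With the TrainLSTM class, chars would be the sorted vocabulary stored in _chars and predict_proba could wrap the compiled Keras model, for example lambda x: keras_model.predict(x, verbose=0)[0]; since the class keeps its model in a private attribute, exposing it is left as an assumption.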
        
