# LSTM (Long Short-Term Memory)

There is a branch of deep learning dedicated to processing sequences such as time series. The networks used for this are **Recurrent Neural Networks (RNNs)**. LSTMs are one of the two most popular types of RNN; Gated Recurrent Units (GRUs) are the other.

The illustration below is from Christopher Olah's [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) (a highly recommended read).

![RNNs](../images/RNN-unrolled.png)
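The unrolled diagram shows one cell applied at every time step. As a rough sketch of what that cell computes in an LSTM, here is a NumPy forward pass for a single layer using the standard gate equations. The stacked gate order (input, forget, candidate, output), the small sizes and the random weights are assumptions for illustration, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b):
    """Run a single LSTM layer over a sequence (illustrative sketch).

    xs: (T, d) inputs, one row per time step
    W:  (d, 4h) input weights; U: (h, 4h) recurrent weights; b: (4h,) bias
    The four gate blocks are assumed to be stacked as input, forget, candidate, output.
    Returns the hidden state after every step, shape (T, h).
    """
    h_dim = U.shape[0]
    h = np.zeros(h_dim)   # hidden state
    c = np.zeros(h_dim)   # cell state
    hs = []
    for x_t in xs:                             # same weights at every step: the "unrolled" loop
        z = x_t @ W + h @ U + b                # all four gate pre-activations at once
        i = sigmoid(z[:h_dim])                 # input gate: how much new information to write
        f = sigmoid(z[h_dim:2 * h_dim])        # forget gate: how much of the old cell to keep
        g = np.tanh(z[2 * h_dim:3 * h_dim])    # candidate values to write
        o = sigmoid(z[3 * h_dim:])             # output gate: how much of the cell to expose
        c = f * c + i * g
        h = o * np.tanh(c)
        hs.append(h)
    return np.stack(hs)

rng = np.random.default_rng(0)
d, h_dim, T = 3, 5, 7
W = rng.normal(scale=0.1, size=(d, 4 * h_dim))
U = rng.normal(scale=0.1, size=(h_dim, 4 * h_dim))
b = np.zeros(4 * h_dim)
hs = lstm_forward(rng.normal(size=(T, d)), W, U, b)
print(hs.shape)  # (7, 5)
```

Because the loop runs for however many rows `xs` has, the same weights handle sequences of any length. The last row of `hs` corresponds to what a Keras `LSTM` layer returns by default, and the full `hs` to `return_sequences=True`.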

Pros:
- A powerful pattern-recognition system for time series.

Cons:
- Cannot deal with missing time steps.
- Time steps must be discrete, not continuous.

Also read [The Unreasonable Effectiveness of RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) by Andrej Karpathy. Finally, browse through this [Stack Overflow question](https://stackoverflow.com/questions/38714959/understanding-keras-lstms) on understanding Keras LSTMs.

```python
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization, LSTM, Embedding, TimeDistributed
```

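```python
def chr2val(ch):
    """Map a character to an integer: 'a'-'z' (any case) to 1-26, non-letters to 0."""
    ch = ch.lower()
    # note: isalpha() is also True for non-ASCII letters such as 'ï', which map above 26
    if ch.isalpha():
        return 1 + (ord(ch) - ord('a'))
    else:
        return 0

def val2chr(v):
    """Inverse of chr2val: 0 becomes a space, 1-26 become 'a'-'z'."""
    if v == 0:
        return ' '
    else:
        return chr(ord('a') + v - 1)
```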

```python
with open("./data/sonnets.txt") as f:
    text = f.read()

# encode every character of the corpus as an integer
text_num = np.array([chr2val(c) for c in text])
print(text[:100])
print(text_num[:100])
```

```
ï»¿The Project Gutenberg EBook of Shakespeare's Sonnets, by William Shakespeare

This eBook is for t
[143   0   0  20   8   5   0  16  18  15  10   5   3  20   0   7  21  20
   5  14   2   5  18   7   0   5   2  15  15  11   0  15   6   0  19   8
   1  11   5  19  16   5   1  18   5   0  19   0  19  15  14  14   5  20
  19   0   0   2  25   0  23   9  12  12   9   1  13   0  19   8   1  11
   5  19  16   5   1  18   5   0   0  20   8   9  19   0   5   2  15  15
  11   0   9  19   0   6  15  18   0  20]
```


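The encoded values span the following range:

```python
[min(text_num), max(text_num)]
```

```
[0, 143]
```

The maximum of 143 does not belong to the 27-symbol alphabet (0 for anything that is not a letter, 1-26 for the letters). The file starts with a UTF-8 byte-order mark, which was decoded as `ï»¿` because the file was read without an explicit encoding (see the start of the printed text). Since `'ï'.isalpha()` is `True`, `chr2val` maps it to `1 + ord('ï') - ord('a') = 143`. Any value of 27 or more is out of range for an `Embedding(27, ...)` layer; opening the file with `encoding='utf-8-sig'` strips the mark.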
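One way to avoid such out-of-range values is to treat only ASCII `a`-`z` as letters. The sketch below is a vectorised variant of `chr2val`/`val2chr` under that assumption; `encode_text` and `decode_values` are hypothetical names, not part of the original notebook.

```python
import numpy as np

def encode_text(text):
    """Vectorised variant of chr2val: ASCII a-z (any case) -> 1..26, anything else -> 0."""
    codes = np.fromiter((ord(c) for c in text), dtype=np.int64, count=len(text))
    # lower-case ASCII capitals only; non-ASCII code points are left untouched
    codes = np.where((codes >= ord('A')) & (codes <= ord('Z')), codes + 32, codes)
    is_letter = (codes >= ord('a')) & (codes <= ord('z'))
    return np.where(is_letter, codes - ord('a') + 1, 0)

def decode_values(values):
    """Inverse mapping: 0 -> ' ', 1..26 -> 'a'..'z'."""
    return ''.join(' ' if v == 0 else chr(ord('a') + int(v) - 1) for v in values)

vals = encode_text('ï»¿Shall I compare thee?')
print(vals.max() < 27)             # True: the mis-decoded mark maps to 0, not 143
print(repr(decode_values(vals)))   # '   shall i compare thee '
```

The encoding is lossy by design: case, punctuation and digits all collapse to either a lower-case letter or a space.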

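Prepare the data. Each training example is a window of `sentence_len` = 40 consecutive characters, and its target is the character that immediately follows the window.

```python
len_vocab = 27        # 0 for non-letters + 26 letters
sentence_len = 40     # characters of context per training example
num_chunks = len(text_num) - sentence_len

# chr2val can return values >= len_vocab for non-ASCII letters (e.g. the 143 above);
# map them to 0 so every index fits the Embedding layer
text_num = np.where(text_num < len_vocab, text_num, 0)

def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target (not used by the models below)
    :param int_text: Text with each character replaced by its id
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Lists of input and target batches, each of shape (batch_size, seq_length)
    """
    slice_size = batch_size * seq_length
    # leave one extra element for the one-step-shifted targets
    n_batches = (len(int_text) - 1) // slice_size
    x = int_text[: n_batches*slice_size]
    y = int_text[1: n_batches*slice_size + 1]

    x = np.split(np.reshape(x, (batch_size, -1)), n_batches, 1)
    y = np.split(np.reshape(y, (batch_size, -1)), n_batches, 1)
    return x, y

# many-to-one training pairs: 40 characters in, the next character out
x = np.zeros((num_chunks, sentence_len), dtype=int)
y = np.zeros(num_chunks, dtype=int)
for i in range(num_chunks):
    x[i,:] = text_num[i:i+sentence_len]
    y[i] = text_num[i+sentence_len]
```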
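The Python loop that builds `x` and `y` can also be written without a loop. A minimal sketch, assuming NumPy 1.20 or later for `sliding_window_view`; `make_windows` is a hypothetical helper and the demo uses a toy array instead of the sonnets.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def make_windows(text_num, sentence_len):
    """Loop-free many-to-one pairs.

    Each row of `windows` holds sentence_len + 1 consecutive values: the first
    sentence_len are the input, the last one is the target.
    """
    windows = sliding_window_view(text_num, sentence_len + 1)
    x = windows[:, :-1]   # shape (len(text_num) - sentence_len, sentence_len)
    y = windows[:, -1]    # shape (len(text_num) - sentence_len,)
    return x, y

demo = np.arange(10)
x_demo, y_demo = make_windows(demo, 4)
print(x_demo.shape, y_demo.shape)  # (6, 4) (6,)
print(x_demo[0], y_demo[0])        # [0 1 2 3] 4
```

The result has `len(text_num) - sentence_len` rows, the same count as `num_chunks`. The arrays are read-only views into the original data, so call `.copy()` before modifying them.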

```python
x.shape
```

```
(119711, 40)
```

## Many to One Model

```python
model = Sequential()
model.add(Embedding(len_vocab, 64))                 # each of the 27 ids -> a 64-dim vector
model.add(LSTM(64))                                 # reads the whole window, returns the last hidden state
model.add(Dense(len_vocab, activation='softmax'))   # probability of each possible next character

# targets are integer ids, hence the sparse variant of categorical cross-entropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 64)          1728      
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dense_2 (Dense)              (None, 27)                1755      
Total params: 36,507
Trainable params: 36,507
Non-trainable params: 0
_________________________________________________________________
```
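The parameter counts in the summary follow from the layer sizes. A small worked check, assuming the Keras defaults of one bias vector per LSTM gate and a bias on the `Dense` layer (the helper names are hypothetical):

```python
def embedding_params(vocab, dim):
    return vocab * dim                         # one dim-vector per symbol, no bias

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and one bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    return input_dim * units + units           # weights + bias

counts = [embedding_params(27, 64), lstm_params(64, 64), dense_params(64, 27)]
print(counts, sum(counts))   # [1728, 33024, 1755] 36507
```

Applied to the many-to-many model below (an `LSTM(256)` on 64-dimensional embeddings), the same formula gives 4 × (256 × (64 + 256) + 256) = 328,704 LSTM parameters.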



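`np.random.choice` is used to sample the next character from the model's softmax output. A quick check of how it behaves with a skewed probability vector:

```python
# draw 10 samples from range(3); index 0 has probability 0.99, so it dominates
np.random.choice(3, 10, p=[0.99, 0.01, 0])
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64)
```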
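A common refinement when sampling text from a softmax is a temperature that sharpens or flattens the distribution before drawing. This is a swapped-in technique, not something the notebook uses; `sample_next` is a hypothetical helper. It also renormalises in float64, since a float32 softmax does not always sum to exactly 1.

```python
import numpy as np

def sample_next(p, temperature=1.0, rng=None):
    """Draw one index from a probability vector p after applying a temperature.

    temperature < 1 sharpens the distribution (more conservative text),
    temperature > 1 flattens it (more surprising text); 1.0 samples from p as-is.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=np.float64).ravel()
    logits = np.log(np.clip(p, 1e-12, None)) / temperature
    q = np.exp(logits - logits.max())   # subtract the max for numerical stability
    q /= q.sum()                        # renormalise in float64
    return int(rng.choice(len(q), p=q))

rng = np.random.default_rng(0)
p = np.array([0.7, 0.2, 0.1], dtype=np.float32)
print([sample_next(p, 0.01, rng) for _ in range(5)])  # [0, 0, 0, 0, 0]
```

With a very low temperature the draw becomes effectively greedy (always the most likely character); with a high temperature it approaches a uniform choice.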

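Train one epoch at a time. After each epoch, pick a random 40-character seed, generate 100 characters by repeatedly sampling the next character and sliding the window forward, and print them above the 100 characters that actually follow the seed.

```python
for epoch in range(10):
    model.fit(x, y, batch_size=128, epochs=1)

    # seed: a random window that still has at least 100 real characters after it
    start = np.random.randint(num_chunks - 100)
    x_test = x[start:start+1]                      # shape (1, sentence_len)
    sentence = []
    for _ in range(100):
        p = model.predict(x_test, verbose=0).ravel().astype('float64')
        p /= p.sum()                               # float32 softmax may not sum to exactly 1
        idx2 = np.random.choice(len_vocab, 1, p=p)
        # slide the window: drop the oldest character, append the sampled one
        x_test = np.hstack([x_test[:, 1:], idx2[None, :]])
        sentence.append(val2chr(idx2[0]))

    print(''.join(sentence))                       # generated continuation
    print('-'*20)
    # the text that actually follows the seed
    print(''.join(val2chr(int(v)) for v in text_num[start+sentence_len:start+sentence_len+100]))
    print('='*40)
```

```
Epoch 1/1
```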

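```python
# shape of the last sampled index (a length-1 array)
idx2.shape
```

```python
# the softmax output used for the last draw
p
```

```python
# should be 1, up to floating-point rounding
sum(p.ravel())
```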

## Many to Many LSTM

In the previous model we predicted one time step given the last 40 steps. This time, however, we predict the 2nd to 41st characters given the first 40 characters. Another way of looking at this is that, at each **character input**, we predict the subsequent character.
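A toy illustration of how the two kinds of target relate, using a 5-character window on a short string instead of 40 characters on the full corpus (`make_pairs` is a hypothetical helper):

```python
def make_pairs(text, sentence_len):
    """Return (input, many-to-one target, many-to-many target) for every window."""
    pairs = []
    for i in range(len(text) - sentence_len):
        x = text[i:i + sentence_len]
        y_one = text[i + sentence_len]               # only the next character
        y_many = text[i + 1:i + sentence_len + 1]    # every position shifted by one
        pairs.append((x, y_one, y_many))
    return pairs

for x, y_one, y_many in make_pairs("shall i compare thee", 5)[:3]:
    print(repr(x), '->', repr(y_one), '|', repr(y_many))
# 'shall' -> ' ' | 'hall '
# 'hall ' -> 'i' | 'all i'
# 'all i' -> ' ' | 'll i '
```

The last character of each many-to-many target is exactly the many-to-one target, so the new model is trained on the old task plus a prediction at every earlier position.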

```python
len_vocab = 27
sentence_len = 40
num_chunks = len(text_num) - sentence_len

# each target row is the input row shifted one character ahead
x = np.zeros((num_chunks, sentence_len), dtype=int)
y = np.zeros((num_chunks, sentence_len), dtype=int)
for i in range(num_chunks):
    x[i,:] = text_num[i:i+sentence_len]
    y[i,:] = text_num[i+1:i+sentence_len+1]
# add a trailing axis so the targets have the same rank as the 3-D model output
y = y.reshape(y.shape + (1,))
```

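```python
# batch_size = 64
# (a stateful variant would fix the batch size and pass stateful=True to the LSTM)

model = Sequential()
model.add(Embedding(len_vocab, 64))  # , batch_size=batch_size
# return_sequences=True: output a hidden state at every time step, not just the last
model.add(LSTM(256, return_sequences=True))  # , stateful=True
# apply the same softmax layer independently at every time step
model.add(TimeDistributed(Dense(len_vocab, activation='softmax')))

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()
```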

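```python
for epoch in range(10):
    # sample before each training pass, so the first printout comes from the untrained model
    sentence = []
    letter = [np.random.choice(len_vocab)]  # choose a random starting letter
    for _ in range(100):
        sentence.append(val2chr(letter[-1]))
        # feed the whole sequence so far; the output at the last position predicts the next character
        p = model.predict(np.array(letter)[None, :], verbose=0)[0][-1].astype('float64')
        p /= p.sum()                         # float32 softmax may not sum to exactly 1
        letter.append(np.random.choice(len_vocab, p=p))
    print(''.join(sentence))
    print('='*100)
    model.fit(x, y, batch_size=128, epochs=1)
```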

```python
sentence = []                            # start a fresh sample instead of appending to the last one
letter = [np.random.choice(len_vocab)]   # choose a random starting letter
for _ in range(100):
    sentence.append(val2chr(letter[-1]))
    p = model.predict(np.array(letter)[None, :], verbose=0)[0][-1].astype('float64')
    p /= p.sum()
    letter.append(np.random.choice(len_vocab, p=p))
print(''.join(sentence))
print('='*100)
```

### Notes:
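1. `y` now holds one target per input position, so its first two dimensions match those of `x`; a trailing axis of size 1 is added so the targets have the same rank as the model's 3-D output.
2. In the sampling cells above, the prediction input did not need to be 40 characters long: generation starts from a single random letter and the sequence grows by one character per step. This works because the `Embedding` layer was created without a fixed `input_length`, so the time dimension is left as `None` and the model accepts sequences of any length, even though every training window has 40 characters, as the shape of `x` shows: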

```python
x.shape
```