# LSTM (Long Short-Term Memory)

One branch of Deep Learning is dedicated to processing sequences such as time series. The networks used for this are **Recurrent Neural Networks (RNNs)**. LSTMs are one of the most widely used types of RNN; Gated Recurrent Units (GRUs) are another popular type.

The illustration below is from http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (a highly recommended read):

![RNNs](./RNN-unrolled.png)

Pros:
- Really powerful pattern recognition system for time series

Cons:
- Cannot deal with missing time steps.
- Time must be discretised into steps; continuous time is not supported.

In [17]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, BatchNormalization, LSTM, Embedding, TimeDistributed

In [2]:
def chr2val(ch):
    ch = ch.lower()
    if ch.isalpha():
        return 1 + (ord(ch) - ord('a'))
    else:
        return 0
    
def val2chr(v):
    if v == 0:
        return ' '
    else:
        return chr(ord('a') + v - 1)

In [3]:
with open("sonnets.txt") as f:
    text = f.read()
    
text_num = np.array([chr2val(c) for c in text])
print(text[:100])
print(text_num[:100])

THE SONNETS
by William Shakespeare




I

From fairest creatures we desire increase,
That thereby be
[20  8  5  0 19 15 14 14  5 20 19  0  2 25  0 23  9 12 12  9  1 13  0 19  8
  1 11  5 19 16  5  1 18  5  0  0  0  0  0  9  0  0  6 18 15 13  0  6  1  9
 18  5 19 20  0  3 18  5  1 20 21 18  5 19  0 23  5  0  4  5 19  9 18  5  0
  9 14  3 18  5  1 19  5  0  0 20  8  1 20  0 20  8  5 18  5  2 25  0  2  5]


The numeric codes for the characters lie in the following range:

In [4]:
[min(text_num), max(text_num)]

[0, 26]
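The range `[0, 26]` follows directly from the mapping: letters become 1 to 26 and everything else becomes 0. The short sketch below restates that mapping in compact form (the `encode` and `decode` helper names are illustrative, not part of the notebook) to show that the encoding is lossy: case and punctuation are discarded, which is why the generated text further down contains only lower-case letters and spaces.

```python
def encode(text):
    """Letters -> 1..26 (case-insensitive); any other character -> 0."""
    return [1 + ord(c) - ord('a') if c.isalpha() else 0 for c in text.lower()]

def decode(values):
    """Inverse mapping: 0 -> space, 1..26 -> 'a'..'z'."""
    return ''.join(' ' if v == 0 else chr(ord('a') + v - 1) for v in values)

codes = encode("Thou art!")
roundtrip = decode(codes)
print(codes)      # numeric codes, one per character
print(roundtrip)  # lower-case text with the '!' replaced by a space
```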

## Many to One Model
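Prepare the data: every window of 40 consecutive characters is one input, and the character that follows the window is its target.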

In [13]:
len_vocab = 27
sentence_len = 40
# n_chars = len(text_num)//sentence_len*sentence_len
num_chunks = len(text_num)-sentence_len

x = np.zeros((num_chunks, sentence_len))
y = np.zeros(num_chunks)
for i in range(num_chunks):
    x[i,:] = text_num[i:i+sentence_len]
    y[i] = text_num[i+sentence_len]

# x = np.reshape(x, (num_chunks, sentence_len, 1))

In [14]:
x.shape

(95610, 40)
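To make the windowing concrete, here is a minimal sketch on a toy sequence, assuming a window length of 3 instead of 40. The names `seq`, `window`, `x_toy` and `y_toy` are illustrative stand-ins for `text_num`, `sentence_len`, `x` and `y`. It builds the same arrays as the loop above, using index arithmetic instead of a Python loop.

```python
import numpy as np

seq = np.arange(10, 17)   # toy stand-in for text_num: [10 11 12 13 14 15 16]
window = 3                # plays the role of sentence_len
n = len(seq) - window     # number of (input, target) pairs, as in num_chunks

# Row i of idx holds the positions i, i+1, ..., i+window-1.
idx = np.arange(n)[:, None] + np.arange(window)[None, :]
x_toy = seq[idx]          # shape (n, window): one sliding window per row
y_toy = seq[window:]      # the element that follows each window

for row, target in zip(x_toy, y_toy):
    print(row, '->', target)
```

Consecutive windows overlap in all but one position, so the array is roughly `window` times larger than the raw text.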

In [15]:
model = Sequential()
# TODO: 
# 1. Add an Embedding layer https://keras.io/layers/embeddings/ 
# (the first argument is len_vocab, the second is the embedding dimension)
# 2. Add an LSTM with a suitable number of hidden units
# 3. Add a final Dense layer, keep in mind that this is a multiclass classification problem
# (how many output classes are there and what is the activation?)
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 64)          1728      
_________________________________________________________________
lstm_2 (LSTM)                (None, 64)                33024     
_________________________________________________________________
dense_2 (Dense)              (None, 27)                1755      
Total params: 36,507.0
Trainable params: 36,507
Non-trainable params: 0.0
_________________________________________________________________
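The parameter counts in the summary can be checked by hand. The sketch below assumes the sizes that can be read off the output shapes: an embedding dimension of 64, 64 LSTM units and 27 output classes. An LSTM has four sets of weights (three gates plus the candidate cell state), each with an input kernel, a recurrent kernel and a bias.

```python
len_vocab = 27    # number of classes (0 = non-letter, 1..26 = letters)
embed_dim = 64    # embedding size, read from the summary's output shape
lstm_units = 64   # LSTM size, read from the summary's output shape

# One embed_dim-sized vector per vocabulary entry.
embedding_params = len_vocab * embed_dim

# 4 weight sets, each: input kernel + recurrent kernel + bias.
lstm_params = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)

# Fully connected layer from the LSTM output to the class scores, plus bias.
dense_params = lstm_units * len_vocab + len_vocab

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

These values match the `Param #` column and the `Total params` line above.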


In [77]:
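for epoch in range(10):
    model.fit(x,y, batch_size=128, epochs=1)
    sentence = []
    idx = np.random.choice(len(x),1)  # pick a random training window as the seed
    x_test = x[idx]
    if idx==len(x)-1:  # keep idx+1 in range for the reference line printed below
        idx -= 1
    for _ in range(100):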
        # TODO: 
        # 1. Given x_test predict the probability of each class
        # 2. Given the probability distribution RANDOMLY choose a class 
        # https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
        p = 
        idx2 = 
        x_test = np.hstack([x_test[:,1:], idx2[None,:]])
        sentence.append(val2chr(idx2[0]))

    print(''.join(sentence))
    print('-'*20)
    print(''.join([val2chr(int(v)) for v in x[idx+1,:].tolist()[0]]))
    print('='*40)

Epoch 1/1
esseeas ach co wiwsil  an  wingull   taur to dthuth fe lith fanl  thit no thecives veiss he heag tha
--------------------
feit  so that other mine thou wilt resto
Epoch 1/1
e koume pying copwist love wirnt toll were is my my rate  thoueene glioks of ghich stis arly     kea
--------------------
s have drain d his blood and fill d his 
Epoch 1/1
  banch deture  whor all  sweartay i   if liin love theeag   liln heautt s tha lether flove un thou 
--------------------
 made  that millions of strange shadows 
Epoch 1/1
o if yore cand acker a glace in be deach be the with  but tith his ase ade hime that one tooth ate a
--------------------
e mute    or  if they sing   tis with so
Epoch 1/1
t of tipen  knagy alals lacven sight besed thy swicken migh loke gacs  your by were make  which noem
--------------------
eart   xlvii  betwixt mine eye and heart
Epoch 1/1
 eypring   fithel  a take wriches eveming alip sim no loves  is wore suskces wost in that faist less
--------------------
tha
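The two TODO steps in the generation loop (predict a distribution, then sample from it) can be tried out without a trained model. The sketch below is one possible way to write them, not the notebook's reference solution. `fake_predict` is a hypothetical stand-in for `model.predict` that returns random probabilities. The renormalisation step is a precaution, since `np.random.choice` checks that `p` sums to 1 and `float32` outputs can miss that tolerance.

```python
import numpy as np

len_vocab = 27
rng = np.random.default_rng(0)

def fake_predict(x_batch):
    """Hypothetical stand-in for model.predict: one probability row per input row."""
    logits = rng.normal(size=(x_batch.shape[0], len_vocab))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return (e / e.sum(axis=1, keepdims=True)).astype(np.float32)

x_test = np.zeros((1, 40))   # seed window with shape (1, sentence_len)
generated = []
for _ in range(5):
    # Step 1: probability of each of the 27 classes for the next character.
    p = fake_predict(x_test)[0].astype(np.float64)
    p /= p.sum()             # make the probabilities sum to 1 in float64
    # Step 2: sample a class at random according to p (not argmax).
    idx2 = np.random.choice(len_vocab, 1, p=p)
    # Slide the window: drop the oldest character, append the sampled one.
    x_test = np.hstack([x_test[:, 1:], idx2[None, :]])
    generated.append(int(idx2[0]))

print(generated)
```

Sampling rather than taking the most likely class keeps the generated text from collapsing into a repeating loop of the same few characters.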

## Many to Many LSTM

In the previous model we predicted one time step given the preceding 40 steps. This time we predict the 2nd to 41st characters given the first 40 characters. Another way of looking at this is that, at each **character input**, we predict the subsequent character.

In [47]:
model = Sequential()
# TODO:
# 1. Add an Embedding and LSTM layer as before. However, set return_sequences=True in the LSTM
# 2. Add a TimeDistributed(Dense) layer instead of just Dense.
# See here as to why: https://keras.io/layers/wrappers/#timedistributed

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, None, 64)          1728      
_________________________________________________________________
lstm_5 (LSTM)                (None, None, 64)          33024     
_________________________________________________________________
time_distributed_3 (TimeDist (None, None, 27)          1755      
Total params: 36,507.0
Trainable params: 36,507
Non-trainable params: 0.0
_________________________________________________________________


In [48]:
x = np.zeros((num_chunks, sentence_len))
y = np.zeros((num_chunks, sentence_len))
for i in range(num_chunks):
    x[i,:] = text_num[i:i+sentence_len]
    y[i,:] = text_num[i+1:i+sentence_len+1]

y = np.reshape(y,(y.shape[0], y.shape[1], 1))
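A minimal sketch of the same construction on a toy sequence, assuming a window length of 3 instead of 40 (`seq`, `window`, `x_toy` and `y_toy` are illustrative names). Each target row is the input row shifted one position to the right, and the final reshape adds the trailing axis of size 1 as in the cell above.

```python
import numpy as np

seq = np.arange(10, 17)   # toy stand-in for text_num: [10 11 12 13 14 15 16]
window = 3                # plays the role of sentence_len
n = len(seq) - window     # number of windows, as in num_chunks

# Row i of idx holds the positions i, i+1, ..., i+window-1.
idx = np.arange(n)[:, None] + np.arange(window)[None, :]
x_toy = seq[idx]          # inputs:  characters i .. i+window-1
y_toy = seq[idx + 1]      # targets: characters i+1 .. i+window

# Same reshape as in the notebook: one class label per time step.
y_toy = y_toy.reshape((n, window, 1))

print(x_toy[0], '->', y_toy[0, :, 0])
```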

### Notes:
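1. `y` now has one target per time step, so its shape matches `x` (plus a trailing axis of size 1), as we are no longer predicting just one character.
2. In the following cell, note that the input to the prediction does not have to be 40 characters long. The time dimension of the model is `None` in the summary above, so a single character is a valid input. See below: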

In [None]:
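for epoch in range(10):
    model.fit(x, y, batch_size=128, epochs=1)

    sentence = []
    # Seed with one random letter, kept in the (batch, time) shape (1, 1)
    letter = np.random.choice(len_vocab, 1).reshape((1, 1))
    for _ in range(100):
        sentence.append(val2chr(int(letter[0, 0])))
        p = model.predict(letter)  # shape (1, 1, len_vocab)
        # Sample the next letter from the predicted distribution
        letter = np.random.choice(len_vocab, 1, p=p[0][0]).reshape((1, 1))
    print(''.join(sentence))
    print('='*40)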

Epoch 1/1
thenokinds aunin   wegoub sow ld ad t tn t osth cy wed yat d mieeigr t horumet n ws ghiuy  trw ashar
Epoch 1/1