# Sequence experiments with LSTM
Following the experiments described in https://machinelearningmastery.com/how-to-use-an-encoder-decoder-lstm-to-echo-sequences-of-random-integers/

First, we generate a random sequence of integers in the range [0, 100). The sequence is then one-hot encoded, because we treat this problem as classification rather than regression.

In [1]:
import numpy as np
import keras
import pandas as pd

Using TensorFlow backend.


In [69]:
UPPER_LIMIT = 100

def generate_sequence(length=25):
    # randint's upper bound is exclusive, so this draws values in [0, UPPER_LIMIT)
    return np.random.randint(0, UPPER_LIMIT, size=length)

def onehot_encode(sequence, dim=UPPER_LIMIT):
    encodings = np.zeros((len(sequence), dim))
    encodings[np.arange(len(sequence)), sequence] = 1
    return encodings

def onehot_decode(sequence, dim=UPPER_LIMIT):
    return np.argmax(sequence, axis=1)

def generate_subsequences(sequence, n_in, n_out):
    if n_out > n_in:
        raise ValueError("n_out must not be greater than n_in.")
    i_subseq = []
    o_subseq = []
    for i in range(len(sequence) - n_in + 1):
        i_subseq.append(sequence[i:i+n_in])
        o_subseq.append(sequence[i:i+n_out])
    return np.array(i_subseq), np.array(o_subseq)

def get_data(length=25, n_in=5, n_out=5):
    seq = generate_sequence(length)
    seq = onehot_encode(seq)
    i_s, o_s = generate_subsequences(seq, n_in, n_out)
    return i_s, o_s
    
_x, _y = get_data(25, 6, 6)
print(_x.shape, _y.shape)

(20, 6, 100) (20, 6, 100)
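
As a quick sanity check of the encoding helpers, the sketch below encodes a short hand-picked sequence and decodes it back. The helpers are redefined here only so the snippet runs on its own; the sample values (3, 0, 99, 42) are arbitrary and chosen for illustration.

```python
import numpy as np

UPPER_LIMIT = 100

def onehot_encode(sequence, dim=UPPER_LIMIT):
    # one row per element, with a single 1 at the column given by its value
    encodings = np.zeros((len(sequence), dim))
    encodings[np.arange(len(sequence)), sequence] = 1
    return encodings

def onehot_decode(encoded):
    # the position of the largest entry in each row is the original integer
    return np.argmax(encoded, axis=1)

seq = np.array([3, 0, 99, 42])  # arbitrary illustrative values
encoded = onehot_encode(seq)
decoded = onehot_decode(encoded)

print(encoded.shape)     # (4, 100)
print(decoded.tolist())  # [3, 0, 99, 42]
```

Since `argmax` is also applied to the softmax output of the network later on, the same decoder works for both targets and predictions.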


## Whole sequence echo
Now that the sequence generator is complete, we want an LSTM network to echo each input window in full.

In [86]:
from keras import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, RepeatVector

# Define keras network
model = Sequential()
model.add(LSTM(16, batch_input_shape=(5, 6, 100), return_sequences=True, stateful=True, use_bias=True)) # 16 memory units
model.add(TimeDistributed(Dense(100, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

Now we can start training the network with random sequences. To avoid overfitting, we generate different sequences at each epoch.
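
Each freshly generated sequence of length 25 yields 20 windows of 6 steps, i.e. four batches of 5. The sketch below only does this arithmetic; `n_windows` is a hypothetical helper, not part of the notebook. It assumes that the sample count has to be a multiple of the batch size fixed in `batch_input_shape`, which, as far as we know, a stateful Keras layer requires.

```python
def n_windows(length, n_in):
    """Number of sliding windows of n_in steps over a sequence of `length` items."""
    return length - n_in + 1

samples = n_windows(25, 6)
batch_size = 5  # first dimension of batch_input_shape=(5, 6, 100)

print(samples)                    # 20
print(samples % batch_size == 0)  # True
print(n_windows(25, 5))           # 21
```

With 5-step windows the same sequence length gives 21 samples, which is why a batch size of 21 fits that configuration.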

In [82]:
EPOCHS = 500
# train LSTM
for epoch in range(EPOCHS):
    # generate new random sequence
    X,y = get_data(25, 6, 6)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=5, verbose=int(epoch==EPOCHS-1)*2, shuffle=False)
    model.reset_states()

Epoch 1/1
 - 0s - loss: 1.7123 - acc: 1.0000


Now we can test the trained network on a newly generated sequence.

In [83]:
# evaluate LSTM
X,y = get_data(25, 6, 6)
yhat = model.predict(X, batch_size=5, verbose=0)
# decode all pairs
for i in range(len(X)):
    print('Expected:', onehot_decode(y[i]), 'Predicted', onehot_decode(yhat[i]))

Expected: [71  2 83 21 61 12] Predicted [71  2 83 48 61 12]
Expected: [ 2 83 21 61 12 39] Predicted [ 2 83 48 61 12 39]
Expected: [83 21 61 12 39 31] Predicted [83 21 61 12 39 31]
Expected: [21 61 12 39 31 25] Predicted [21 61 12 39 31 25]
Expected: [61 12 39 31 25 82] Predicted [61 12 39 31 25 82]
Expected: [12 39 31 25 82  5] Predicted [12 39 31 25 82  5]
Expected: [39 31 25 82  5 90] Predicted [39 31 25 82  5 90]
Expected: [31 25 82  5 90 50] Predicted [31 25 82  5 90 50]
Expected: [25 82  5 90 50 73] Predicted [25 82  5 90 50 73]
Expected: [82  5 90 50 73 17] Predicted [82  5 90 50 73 17]
Expected: [ 5 90 50 73 17 91] Predicted [ 5 90 50 73 17 91]
Expected: [90 50 73 17 91  8] Predicted [90 50 73 17 91  8]
Expected: [50 73 17 91  8 98] Predicted [50 73 17 91  8  2]
Expected: [73 17 91  8 98 74] Predicted [73 17 91  8  2 74]
Expected: [17 91  8 98 74 57] Predicted [17 91  8  2 74 57]
Expected: [91  8 98 74 57 33] Predicted [91  8  2 74 57 33]
Expected: [ 8 98 74 57 33 30] Predicted 

In [84]:
print(model.layers[0].get_weights()[0].shape) # Kernel (used on inputs)
print(model.layers[0].get_weights()[1].shape) # Recurrent kernel (used on internal state)
print(model.layers[0].get_weights()[2].shape) # Bias

(100, 64)
(16, 64)
(64,)


The second dimension of each weight array is 4 times the number of memory units (4 × 16 = 64), because Keras stores the weights of all four LSTM components side by side in a single matrix, in this order:
- input
- forget
- cell
- output
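
To make the 4x layout concrete, here is a minimal sketch that splits a kernel of shape (100, 64) into four blocks of shape (100, 16). It uses a random stand-in array instead of the trained weights, and it assumes the Keras column order input, forget, cell, output. It also recomputes the total number of parameters of the layer from the three shapes printed above.

```python
import numpy as np

units = 16        # memory units of the LSTM layer
n_features = 100  # size of the one-hot input

# Stand-in for model.layers[0].get_weights()[0]: random values, not trained weights
kernel = np.random.rand(n_features, 4 * units)

# Assumed Keras column order: input, forget, cell, output
k_i, k_f, k_c, k_o = np.split(kernel, 4, axis=1)
print(k_i.shape)  # (100, 16)

# kernel + recurrent kernel + bias, each holding four blocks
n_params = 4 * (n_features * units + units * units + units)
print(n_params)   # 7488
```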

## Seq2Seq
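This time we want an output sequence whose length differs from the input length. We can no longer rely on the LSTM predicting one value per input timestep; instead, we need an encoder-decoder structure: an encoder LSTM compresses the input window into a single vector, and a decoder LSTM unrolls that vector into the output sequence.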
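
The link between encoder and decoder is a repeat step: the encoder's single output vector is copied once per output timestep. The NumPy sketch below mimics what we understand `RepeatVector` to do to the tensor shape. The sizes (batch of 21, 150 units, 2 output steps) are illustrative assumptions, and the random array is only a stand-in for a real encoder output.

```python
import numpy as np

batch, units, n_out = 21, 150, 2  # illustrative sizes

# Stand-in for the encoder LSTM output: one vector per input window
encoded = np.random.rand(batch, units)

# Copy each vector n_out times along a new time axis
decoder_input = np.repeat(encoded[:, np.newaxis, :], n_out, axis=1)

print(decoder_input.shape)  # (21, 2, 150)
```

Every decoder timestep therefore sees the same input vector; only the decoder's internal state distinguishes the first output position from the second.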

In [92]:
model = Sequential()
model.add(LSTM(150, batch_input_shape=(21, 5, 100), stateful=True))
model.add(RepeatVector(2))
model.add(LSTM(150, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(100, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

In [93]:
EPOCHS = 5000
# train LSTM
for epoch in range(EPOCHS):
    # generate new random sequence
    X,y = get_data(25, 5, 2)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=21, verbose=int((epoch+1) % 250 == 0)*2, shuffle=False)
    model.reset_states()

Epoch 1/1
 - 0s - loss: 3.8996 - acc: 0.1667
Epoch 1/1
 - 0s - loss: 3.0983 - acc: 0.3571
Epoch 1/1
 - 0s - loss: 2.6107 - acc: 0.4762
Epoch 1/1
 - 0s - loss: 1.8789 - acc: 0.5000
Epoch 1/1
 - 0s - loss: 1.9074 - acc: 0.5000
Epoch 1/1
 - 0s - loss: 2.0884 - acc: 0.5238
Epoch 1/1
 - 0s - loss: 1.6594 - acc: 0.5714
Epoch 1/1
 - 0s - loss: 1.7970 - acc: 0.5714
Epoch 1/1
 - 0s - loss: 1.3525 - acc: 0.5000
Epoch 1/1
 - 0s - loss: 1.3176 - acc: 0.6429
Epoch 1/1
 - 0s - loss: 1.1246 - acc: 0.6429
Epoch 1/1
 - 0s - loss: 0.9714 - acc: 0.6429
Epoch 1/1
 - 0s - loss: 1.1233 - acc: 0.6190
Epoch 1/1
 - 0s - loss: 0.8686 - acc: 0.7619
Epoch 1/1
 - 0s - loss: 0.3903 - acc: 0.9048
Epoch 1/1
 - 0s - loss: 0.5007 - acc: 0.9048
Epoch 1/1
 - 0s - loss: 0.3501 - acc: 0.9048
Epoch 1/1
 - 0s - loss: 0.3696 - acc: 0.9762
Epoch 1/1
 - 0s - loss: 0.3066 - acc: 0.9286
Epoch 1/1
 - 0s - loss: 0.3368 - acc: 0.9286


In [94]:
# evaluate LSTM
X,y = get_data(25, 5, 2)
yhat = model.predict(X, batch_size=21, verbose=0)
# decode all pairs
for i in range(len(X)):
    print('Expected:', onehot_decode(y[i]), 'Predicted', onehot_decode(yhat[i]))

Expected: [29 90] Predicted [29 90]
Expected: [90 60] Predicted [90 60]
Expected: [60 82] Predicted [82 60]
Expected: [82 79] Predicted [82 79]
Expected: [79 62] Predicted [79 62]
Expected: [62 44] Predicted [62 44]
Expected: [44 40] Predicted [44 40]
Expected: [40 44] Predicted [40 44]
Expected: [44 13] Predicted [44 13]
Expected: [13 48] Predicted [13 48]
Expected: [48 29] Predicted [48 29]
Expected: [29 15] Predicted [29 15]
Expected: [15 24] Predicted [15 24]
Expected: [24 20] Predicted [24 20]
Expected: [20 73] Predicted [20 73]
Expected: [73 37] Predicted [73 37]
Expected: [37  8] Predicted [37  8]
Expected: [ 8 69] Predicted [ 8 69]
Expected: [69 25] Predicted [69 25]
Expected: [25 38] Predicted [25 38]
Expected: [38 15] Predicted [38 15]
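
Instead of comparing the pairs by eye, the per-position accuracy can be computed from the decoded arrays. This is a minimal sketch: `per_position_accuracy` is a hypothetical helper, and the example uses only the first four pairs printed above.

```python
import numpy as np

def per_position_accuracy(expected, predicted):
    """Fraction of correct predictions at each output timestep."""
    expected = np.asarray(expected)
    predicted = np.asarray(predicted)
    return (expected == predicted).mean(axis=0)

# First four pairs from the output above
expected = [[29, 90], [90, 60], [60, 82], [82, 79]]
predicted = [[29, 90], [90, 60], [82, 60], [82, 79]]

acc = per_position_accuracy(expected, predicted)
print(acc.tolist())  # [0.75, 0.75]
```

On the full output the same helper could be fed `np.array([onehot_decode(s) for s in y])` and the corresponding array built from `yhat`.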


## Longer sequences?

In [96]:
from tqdm import tnrange

LEN = 4
MEM = 300

model = Sequential()
model.add(LSTM(MEM, batch_input_shape=(21, 5, 100), stateful=True))
model.add(RepeatVector(LEN))
model.add(LSTM(MEM, return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(100, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

EPOCHS = 5000
# train LSTM
for epoch in tnrange(EPOCHS):
    # generate new random sequence
    X,y = get_data(25, 5, LEN)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=21, verbose=int((epoch+1) % 250 == 0)*2, shuffle=False)
    model.reset_states()

Epoch 1/1
 - 0s - loss: 3.6320 - acc: 0.1786
Epoch 1/1
 - 0s - loss: 2.9323 - acc: 0.2857
Epoch 1/1
 - 0s - loss: 2.4213 - acc: 0.2976
Epoch 1/1
 - 0s - loss: 2.4369 - acc: 0.3095
Epoch 1/1
 - 0s - loss: 2.1717 - acc: 0.2976
Epoch 1/1
 - 0s - loss: 2.0979 - acc: 0.3095
Epoch 1/1
 - 0s - loss: 2.1899 - acc: 0.3929
Epoch 1/1
 - 0s - loss: 2.0553 - acc: 0.2976
Epoch 1/1
 - 0s - loss: 2.0192 - acc: 0.2619
Epoch 1/1
 - 0s - loss: 1.8303 - acc: 0.3214
Epoch 1/1
 - 0s - loss: 1.8980 - acc: 0.3214
Epoch 1/1
 - 0s - loss: 1.7297 - acc: 0.2619
Epoch 1/1
 - 0s - loss: 1.6563 - acc: 0.3571
Epoch 1/1
 - 0s - loss: 1.6828 - acc: 0.2857
Epoch 1/1
 - 0s - loss: 1.6940 - acc: 0.4167
Epoch 1/1
 - 0s - loss: 4.2849 - acc: 0.1905
Epoch 1/1
 - 0s - loss: 3.7389 - acc: 0.2619
Epoch 1/1
 - 0s - loss: 3.2624 - acc: 0.2976
Epoch 1/1
 - 0s - loss: 3.2066 - acc: 0.3095
Epoch 1/1
 - 0s - loss: 2.8951 - acc: 0.3929



In [97]:
# evaluate LSTM
X,y = get_data(25, 5, 4)
yhat = model.predict(X, batch_size=21, verbose=0)
# decode all pairs
for i in range(len(X)):
    print('Expected:', onehot_decode(y[i]), 'Predicted', onehot_decode(yhat[i]))

Expected: [35 66 12 21] Predicted [35 35 66 66]
Expected: [66 12 21 97] Predicted [66 66 32 32]
Expected: [12 21 97 74] Predicted [12 12 62  8]
Expected: [21 97 74 15] Predicted [21 21 21 21]
Expected: [97 74 15 47] Predicted [97 97 97 10]
Expected: [74 15 47 37] Predicted [74 74 15 74]
Expected: [15 47 37 90] Predicted [15 15 15 15]
Expected: [47 37 90 24] Predicted [47 88 88 57]
Expected: [37 90 24 92] Predicted [90 37 37 37]
Expected: [90 24 92 43] Predicted [90 90 90 90]
Expected: [24 92 43  9] Predicted [24 24 24 24]
Expected: [92 43  9 89] Predicted [92 92 43 92]
Expected: [43  9 89 92] Predicted [43 43 51 43]
Expected: [ 9 89 92 89] Predicted [ 9  9 80  8]
Expected: [89 92 89 70] Predicted [89 89 80 89]
Expected: [92 89 70 67] Predicted [92 92 92 92]
Expected: [89 70 67 19] Predicted [89 89 70 89]
Expected: [70 67 19 18] Predicted [70 70 70 70]
Expected: [67 19 18 34] Predicted [67 67 67 67]
Expected: [19 18 34  3] Predicted [19 19 19 19]
Expected: [18 34  3 72] Predicted [18 18