# Bahdanau Attention Test 1

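Let's test our **Bahdanau Attention** layer with the following test code, adapted from [[1](https://machinelearningmastery.com/encoder-decoder-attention-sequence-to-sequence-prediction-keras/)]. The toy task is to echo the first `n_timesteps_out` values of a random integer sequence and pad the rest of the output with zeros.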

* Added a `BahdanauGRU` layer, followed by a `TimeDistributed(Dense)` layer.
* Replaced the LSTM layer with a bidirectional LSTM (BiLSTM), as in [[2](https://arxiv.org/pdf/1409.0473.pdf)].
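
For readers unfamiliar with the additive scoring described in [2], here is a minimal NumPy sketch of a single attention step. It is an illustration only and not the `keras_bahdanau` implementation: the function name `additive_attention`, the weight names `W_q`, `W_k`, `v`, and the attention size `d_a = 64` are assumptions made for this example. The other sizes (5 timesteps, 300 encoder features, 100 decoder units) mirror the model defined below.

```python
import numpy as np


def additive_attention(query, keys, W_q, W_k, v):
    """One step of additive (Bahdanau-style) attention.

    query: (d_q,)   current decoder state
    keys:  (T, d_k) encoder outputs, one row per timestep
    """
    # score_t = v . tanh(query @ W_q + keys[t] @ W_k), computed for all t at once
    scores = np.tanh(query @ W_q + keys @ W_k) @ v      # shape (T,)
    # numerically stable softmax over the T timesteps
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()     # shape (T,)
    # context vector: attention-weighted sum of the encoder outputs
    context = weights @ keys                            # shape (d_k,)
    return context, weights


# Hypothetical sizes; d_a is an arbitrary choice for this sketch.
T, d_k, d_q, d_a = 5, 300, 100, 64
rng = np.random.default_rng(0)
keys = rng.normal(size=(T, d_k))
query = rng.normal(size=(d_q,))
W_q = rng.normal(size=(d_q, d_a)) * 0.1
W_k = rng.normal(size=(d_k, d_a)) * 0.1
v = rng.normal(size=(d_a,))

context, weights = additive_attention(query, keys, W_q, W_k, v)
print(weights.shape, context.shape)
```

The weights form a probability distribution over the five input timesteps, so the context vector is a convex combination of the encoder outputs.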

In [None]:
# import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

from random import randint
from numpy import array
from numpy import argmax
from numpy import array_equal
from keras.models import Sequential
from keras.layers import Bidirectional, Dense, LSTM, TimeDistributed
from keras_bahdanau.recurrent import BahdanauGRU


# generate a sequence of random integers
def generate_sequence(length, n_unique):
    return [randint(0, n_unique-1) for _ in range(length)]


# one hot encode sequence
def one_hot_encode(sequence, n_unique):
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return array(encoding)


# decode a one hot encoded string
def one_hot_decode(encoded_seq):
    return [argmax(vector) for vector in encoded_seq]


# prepare data for the LSTM
def get_pair(n_in, n_out, cardinality):
    # generate random sequence
    sequence_in = generate_sequence(n_in, cardinality)
    sequence_out = sequence_in[:n_out] + [0 for _ in range(n_in-n_out)]
    # one hot encode
    X = one_hot_encode(sequence_in, cardinality)
    y = one_hot_encode(sequence_out, cardinality)
    # reshape as 3D
    X = X.reshape((1, X.shape[0], X.shape[1]))
    y = y.reshape((1, y.shape[0], y.shape[1]))
    return X,y


# configure problem
n_features = 50
n_timesteps_in = 5
n_timesteps_out = 2
# define model
model = Sequential()
model.add(Bidirectional(LSTM(150, return_sequences=True), input_shape=(n_timesteps_in, n_features)))
# model.add(LSTM(150, input_shape=(n_timesteps_in, n_features), return_sequences=True))
model.add(BahdanauGRU(100, return_sequences=True))
model.add(TimeDistributed(Dense(n_features, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()
# train the model on one freshly generated sample per iteration
for epoch in range(5000):
    # generate new random sequence
    X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, verbose=1)

# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
    X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
    yhat = model.predict(X, verbose=0)
    if array_equal(one_hot_decode(y[0]), one_hot_decode(yhat[0])):
        correct += 1
print('Accuracy: %.2f%%' % (float(correct)/float(total)*100.0))

# spot check some examples
for _ in range(10):
    X,y = get_pair(n_timesteps_in, n_timesteps_out, n_features)
    yhat = model.predict(X, verbose=0)
    print('Expected:', one_hot_decode(y[0]), 'Predicted', one_hot_decode(yhat[0]))

Here is a summary of the model:

In [None]:
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional_1 (Bidirection (None, 5, 300)            241200    
_________________________________________________________________
bahdanau_gru_1 (BahdanauGRU) (None, 5, 100)            266000    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 50)             5050      
=================================================================
Total params: 512,250
Trainable params: 512,250
Non-trainable params: 0
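
As a sanity check on the summary, the parameter counts of the standard layers can be reproduced by hand. The sketch below uses the usual formulas for an LSTM (four gates, each with an input kernel, a recurrent kernel, and a bias) and for a dense layer. The helper names `lstm_params` and `dense_params` are made up for this example, and the `BahdanauGRU` count is copied from the summary rather than derived, since it depends on the layer's internal weights.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # weight matrix plus bias
    return input_dim * units + units


n_features = 50

# Bidirectional wraps two independent LSTMs (forward and backward)
bilstm = 2 * lstm_params(150, n_features)

# TimeDistributed shares one Dense layer across all timesteps;
# its input is the 100-dimensional BahdanauGRU output
dense = dense_params(n_features, 100)

# Copied from the model summary, not derived here
bahdanau_gru = 266000

total = bilstm + bahdanau_gru + dense
print(bilstm, dense, total)  # 241200 5050 512250
```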

In [None]:
Accuracy: 100.00%
Expected: [14, 16, 0, 0, 0] Predicted [14, 16, 0, 0, 0]
Expected: [41, 49, 0, 0, 0] Predicted [41, 49, 0, 0, 0]
Expected: [39, 19, 0, 0, 0] Predicted [39, 19, 0, 0, 0]
Expected: [15, 15, 0, 0, 0] Predicted [15, 15, 0, 0, 0]
Expected: [14, 7, 0, 0, 0] Predicted [14, 7, 0, 0, 0]
Expected: [4, 9, 0, 0, 0] Predicted [4, 9, 0, 0, 0]
Expected: [17, 30, 0, 0, 0] Predicted [17, 30, 0, 0, 0]
Expected: [8, 12, 0, 0, 0] Predicted [8, 12, 0, 0, 0]
Expected: [37, 4, 0, 0, 0] Predicted [37, 4, 0, 0, 0]
Expected: [13, 30, 0, 0, 0] Predicted [13, 30, 0, 0, 0]

In [None]:
The Bahdanau layer solves this toy problem perfectly, scoring 100% on 100 freshly generated test sequences. However, we should still try it on harder problems and watch for overfitting.