Using a powerful **GPU** is highly recommended. You can get one for free by uploading this notebook to [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb).
<table align="center">
 <td align="center"><a target="_blank" href="https://colab.research.google.com/github/ezponda/intro_deep_learning_solutions_solutions/blob/main/class/RNN/Seq2seq.ipynb">
        <img src="https://i.ibb.co/2P3SLwK/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
  <td align="center"><a target="_blank" href="https://github.com/ezponda/intro_deep_learning_solutions_solutions/blob/main/class/RNN/Seq2seq.ipynb">
        <img src="https://i.ibb.co/xfJbPmL/github.png"  height="70px" style="padding-bottom:5px;"  />View Source on GitHub</a></td>
</table>

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## Introduction

In this example, we train a model to add two numbers that are provided as a single string.

**Example:**

- Input: "535+61"
- Output: "596"

[Notebook from Keras Tutorial](https://keras.io/examples/nlp/addition_rnn/)

## Seq to seq model

Keras `LSTM` layers accept a `return_state` argument that makes the layer return its final hidden state (`state_h`) and cell state (`state_c`) in addition to its output. Note that `LSTM` has two state tensors, while `GRU` has only one.

To set the initial state of a layer, call it with the additional keyword argument `initial_state`.
The shape of each state tensor must match the number of units of the layer, as in the
example below.
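
To see why an LSTM carries two state tensors whose last dimension equals the number of units, here is a minimal NumPy sketch of a single LSTM step. It is not the Keras implementation: the function `lstm_step`, the random weights, and the gate ordering are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, gate order chosen arbitrarily)."""
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b                 # shape: (batch, 4 * units)
    i = sigmoid(z[:, :units])                    # input gate
    f = sigmoid(z[:, units:2 * units])           # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])       # candidate cell values
    o = sigmoid(z[:, 3 * units:])                # output gate
    c = f * c_prev + i * g                       # new cell state
    h = o * np.tanh(c)                           # new hidden state
    return h, c

batch, features, units = 2, 5, 64
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

# Start from zero states and run 10 time steps of random input.
h = np.zeros((batch, units))
c = np.zeros((batch, units))
for _ in range(10):
    x_t = rng.normal(size=(batch, features))
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)  # (2, 64) (2, 64)
```

Both states have shape `(batch, units)`, which is why a decoder that receives them through `initial_state` needs the same number of units as the encoder. A GRU keeps only the hidden state, so it would pass a single tensor.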

In [None]:
(timesteps, features, output_timesteps) = (10, 5, 12)

# Encoder
encoder_input = tf.keras.Input(shape=(timesteps, features),
                               name='encoder_input')

# Return states in addition to output
_, state_h, state_c = layers.LSTM(64, return_state=True,
                                       name="encoder")(encoder_input)
# Encoded vector
encoder_state = [state_h, state_c]

# Decoder
decoder_input = tf.keras.Input(shape=(output_timesteps, 1),
                               name='decoder_input')

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(64, return_sequences=True,
                             name="decoder")(decoder_input,
                                             initial_state=encoder_state)
output = layers.TimeDistributed(layers.Dense(5))(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 10, 5)]      0                                            
__________________________________________________________________________________________________
decoder_input (InputLayer)      [(None, 12, 1)]      0                                            
__________________________________________________________________________________________________
encoder (LSTM)                  [(None, 64), (None,  17920       encoder_input[0][0]              
__________________________________________________________________________________________________
decoder (LSTM)                  (None, 12, 64)       16896       decoder_input[0][0]              
                                                                 encoder[0][1]                
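
The parameter counts in the summary above can be checked by hand. The sketch below uses the standard LSTM parameter formula (four gates, each with an input kernel, a recurrent kernel, and a bias); the helper name `lstm_param_count` is made up for this example.

```python
def lstm_param_count(input_dim, units):
    """Number of trainable weights in one LSTM layer (hypothetical helper)."""
    # Per gate: input kernel (input_dim x units), recurrent kernel
    # (units x units) and a bias vector (units). There are four gates.
    return 4 * (units * (input_dim + units) + units)

encoder_params = lstm_param_count(input_dim=5, units=64)   # 5 input features
decoder_params = lstm_param_count(input_dim=1, units=64)   # 1 input feature
print(encoder_params, decoder_params)  # 17920 16896
```

The same formula reproduces the LSTM counts reported in the other summaries of this notebook, for example `lstm_param_count(12, 16)` gives 1856.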

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, show_shapes=True)

('Failed to import pydot. You must `pip install pydot` and install graphviz (https://graphviz.gitlab.io/download/), ', 'for `pydotprint` to work.')


In [None]:
class Encoder(tf.keras.Model):
    def __init__(self, enc_units, batch_sz, max_len):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.max_len = max_len

        ##________ LSTM layer in Encoder ------- ##
        self.lstm_layer = tf.keras.layers.LSTM(self.enc_units,
                                               return_sequences=True,
                                               return_state=True)

    def call(self, encoder_input):
        _, state_h, state_c = self.lstm_layer(encoder_input)
        # Encoded vector
        encoder_state = [state_h, state_c]
        return encoder_state

class Decoder(tf.keras.Model):
    def __init__(self, dec_units, batch_sz, max_len):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.max_len = max_len
        self.decoder_input = tf.zeros((self.batch_sz, self.max_len, 1))

        self.lstm_layer = tf.keras.layers.LSTM(self.dec_units,
                                               return_sequences=True)

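    def call(self, encoder_state):
        x = self.lstm_layer(self.decoder_input, initial_state=encoder_state)
        # NOTE: this Dense layer is created inside call(), so the model does not
        # track its weights (the summary below counts only the 480 LSTM
        # parameters). Create it in __init__ if it should be trained.
        output = layers.TimeDistributed(layers.Dense(5))(x)
        return output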

encoder_input = tf.keras.Input(shape=(10, 10),
                               name='encoder_input')
encoder = Encoder(10, 10, 5)
encoder_state = encoder(encoder_input)

decoder= Decoder(10, 10, 5)
outputs = decoder(encoder_state)

model = keras.Model(encoder_input, outputs)
model.summary()


Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 10, 10)]     0                                            
__________________________________________________________________________________________________
encoder (Encoder)               [(None, 10), (None,  840         encoder_input[0][0]              
__________________________________________________________________________________________________
decoder (Decoder)               (10, 5, 5)           480         encoder[0][0]                    
                                                                 encoder[0][1]                    
Total params: 1,320
Trainable params: 1,320
Non-trainable params: 0
__________________________________________________________________________________________________


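You can also use [layers.RepeatVector](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RepeatVector), which repeats the encoder output once per output timestep so that the decoder receives it as a sequence.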
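
What `RepeatVector` does to its input can be sketched in plain NumPy. This is an illustration rather than the Keras code, and `repeat_vector` is a hypothetical helper name.

```python
import numpy as np

def repeat_vector(x, n):
    """Repeat each row of a (batch, dim) array n times along a new time axis."""
    return np.repeat(x[:, np.newaxis, :], n, axis=1)

# Pretend encoder output: batch of 2 vectors of size 3.
encoded = np.arange(6, dtype=float).reshape(2, 3)
decoder_in = repeat_vector(encoded, 4)

print(decoder_in.shape)  # (2, 4, 3)
print(decoder_in[0])     # four identical copies of encoded[0]
```

Every timestep of the decoder therefore sees the same encoded vector, instead of receiving it only once as an initial state.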

In [None]:
model = keras.Sequential()
# "Encode" the input sequence using an LSTM, producing an output of size 128.
model.add(layers.LSTM(128, input_shape=(timesteps, features)))
model.add(layers.RepeatVector(output_timesteps))
model.add(layers.LSTM(128, return_sequences=True))
# Apply a dense layer to every temporal slice of the input
model.add(layers.Dense(5, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 128)               68608     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 12, 128)           0         
_________________________________________________________________
lstm_3 (LSTM)                (None, 12, 128)           131584    
_________________________________________________________________
dense_1 (Dense)              (None, 12, 5)             645       
Total params: 200,837
Trainable params: 200,837
Non-trainable params: 0
_________________________________________________________________


In [None]:
plot_model(model, show_shapes=True)

('Failed to import pydot. You must `pip install pydot` and install graphviz (https://graphviz.gitlab.io/download/), ', 'for `pydotprint` to work.')


## Generate the data


In [None]:
max_digits = 3
max_int = 10**max_digits - 1
max_len = max_digits + 1 + max_digits
out_max_len = len(str(max_int + max_int))
print('max_digits : {0}, max_int: {1}, max_len: {2}, out_max_len: {3}'.format(
    max_digits, max_int, max_len, out_max_len))
print('max input length from {0}+{0} is {1}'.format(max_int,max_len))
print('max sum: {0}+{0}={1}'.format(max_int,max_int+max_int))

max_digits : 3, max_int: 999, max_len: 7, out_max_len: 4
max input length from 999+999 is 7
max sum: 999+999=1998


In [None]:
def generate_sample(max_len, max_int, out_max_len):
    # The upper bound of randint is exclusive, so add 1 to include max_int.
    a, b = np.random.randint(0, max_int + 1, size=2)
    sentence = '{0}+{1}'.format(a, b)
    sentence = sentence + ' ' * (max_len - len(sentence))  # padding
    result = str(a + b)
    result = result + ' ' * (out_max_len - len(result))  # padding
    return sentence, result


sentences = []
results = []
seen = set()
print("Generating data...")
while len(sentences) < 50000:
    sentence, result = generate_sample(max_len, max_int, out_max_len)
    if sentence in seen:
        continue
    seen.add(sentence)
    sentences.append(sentence)
    results.append(result)
print("Total sentences:", len(sentences))
print('Some examples:', list(zip(sentences[:3], results[:3])))

Generating data...
Total sentences: 50000
Some examples: [('788+484', '1272'), ('157+876', '1033'), ('540+996', '1536')]
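
The length bookkeeping and padding used above can be checked in isolation. This is a small pure-Python sketch; `sequence_lengths` and `pad` are hypothetical helper names that are not part of the notebook.

```python
def sequence_lengths(max_digits):
    """Return (max_int, max_len, out_max_len) for operands with max_digits digits."""
    max_int = 10 ** max_digits - 1
    max_len = max_digits + 1 + max_digits        # "a+b": digits, '+', digits
    out_max_len = len(str(max_int + max_int))    # digits of the largest sum
    return max_int, max_len, out_max_len

def pad(text, width):
    """Right-pad text with spaces to a fixed width."""
    return text + ' ' * (width - len(text))

max_int, max_len, out_max_len = sequence_lengths(3)
query = pad('22+573', max_len)
answer = pad(str(22 + 573), out_max_len)
print(repr(query), repr(answer))  # '22+573 ' '595 '
```

Fixed-width strings matter because every sample must vectorize to an array of the same shape.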


## Vectorize the data


In [None]:
chars = "0123456789+ "

char_indices = {c:i for i, c in enumerate(sorted(chars))}
print('char_indices', char_indices)
indices_char = {i:c for c,i in char_indices.items()}
print('indices_char', indices_char)

def vectorize_sentence(sentence, char_indices):
    x = np.zeros((len(sentence), len(char_indices)))
    for i, c in enumerate(list(sentence)):
        x[i, char_indices[c]] = 1
    return x

x = vectorize_sentence('13+11', char_indices)

print('sentence: 13+11')
print('vectorize_sentence inds:', x.argmax(-1))
print('vectorize_sentence :', x)

char_indices {' ': 0, '+': 1, '0': 2, '1': 3, '2': 4, '3': 5, '4': 6, '5': 7, '6': 8, '7': 9, '8': 10, '9': 11}
indices_char {0: ' ', 1: '+', 2: '0', 3: '1', 4: '2', 5: '3', 6: '4', 7: '5', 8: '6', 9: '7', 10: '8', 11: '9'}
sentence: 13+11
vectorize_sentence inds: [3 5 1 3 3]
vectorize_sentence : [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]


In [None]:
def vec_to_sentence(x, indices_char):
    return "".join(indices_char[i] for i in x)

def mat_to_sentence(x, indices_char):
    x = x.argmax(axis=-1)
    return "".join(indices_char[i] for i in x)

mat_to_sentence(x, indices_char)

'13+11'
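
As an alternative to filling the matrix in a loop, the same one-hot encoding can be obtained by indexing an identity matrix. This is a sketch under the same character set as above; `one_hot` and `decode` are hypothetical names used only here.

```python
import numpy as np

chars = "0123456789+ "
char_indices = {c: i for i, c in enumerate(sorted(chars))}
indices_char = {i: c for c, i in char_indices.items()}

def one_hot(sentence):
    """Encode a string as a (len(sentence), len(chars)) one-hot matrix."""
    ids = np.array([char_indices[c] for c in sentence])
    return np.eye(len(char_indices))[ids]   # row i of eye is the one-hot for index i

def decode(matrix):
    """Invert one_hot by taking the argmax of every row."""
    return "".join(indices_char[i] for i in matrix.argmax(axis=-1))

encoded = one_hot('13+11')
print(encoded.shape)    # (5, 12)
print(decode(encoded))  # 13+11
```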

In [None]:
print("Vectorization...")
# `np.bool` was removed in recent NumPy releases; the built-in `bool` is equivalent.
x = np.zeros((len(sentences), max_len, len(chars)), dtype=bool)
y = np.zeros((len(sentences), out_max_len, len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    x[i] = vectorize_sentence(sentence, char_indices)
for i, sentence in enumerate(results):
    y[i] = vectorize_sentence(sentence, char_indices)

# Split the data: 80% training, 10% validation and 10% test (never trained on).
val_split = int(0.8 * len(x))
test_split = int(0.9 * len(x))

(x_train, y_train) = x[:val_split], y[:val_split]
(x_val, y_val) = x[val_split:test_split], y[val_split:test_split]
(x_test, y_test) = x[test_split:], y[test_split:]

print("Training Data:")
print(x_train.shape)
print(y_train.shape)

print("Validation Data:")
print(x_val.shape)
print(y_val.shape)

print("Test Data:")
print(x_test.shape)
print(y_test.shape)

Vectorization...
Training Data:
(40000, 7, 12)
(40000, 4, 12)
Validation Data:
(5000, 7, 12)
(5000, 4, 12)
Test Data:
(5000, 7, 12)
(5000, 4, 12)


## Build the model


In [None]:
encoded_dim = 16

In [None]:
# Encoder
encoder_input = tf.keras.Input(
    shape=(max_len, len(chars)), name='encoder_input')

# Return states in addition to output
_, state_h, state_c = layers.LSTM(encoded_dim, return_state=True, name="encoder")(
    encoder_input
)

# Encoded vector
encoder_state = [state_h, state_c]




# Decoder
decoder_input = tf.keras.Input(
    shape=(out_max_len, 1), name='decoder_input')

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(encoded_dim, return_sequences=True, name="decoder")(
    decoder_input, initial_state=encoder_state
)
output = layers.TimeDistributed(layers.Dense(len(chars), activation='softmax'))(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 7, 12)]      0                                            
__________________________________________________________________________________________________
decoder_input (InputLayer)      [(None, 4, 1)]       0                                            
__________________________________________________________________________________________________
encoder (LSTM)                  [(None, 16), (None,  1856        encoder_input[0][0]              
__________________________________________________________________________________________________
decoder (LSTM)                  (None, 4, 16)        1152        decoder_input[0][0]              
                                                                 encoder[0][1]              

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


In [None]:
## The inputs of the decoder are zeros
decoder_input_data = np.zeros((len(x_train), out_max_len, 1))
decoder_input_data_val = np.zeros((len(x_val), out_max_len, 1))


In [None]:
epochs = 30
batch_size = 64

for epoch in range(1, epochs + 1):  # run all `epochs` iterations
    print()
    print("Iteration", epoch)
    model.fit(
        [x_train, decoder_input_data],
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=([x_val, decoder_input_data_val], y_val),
    )

    for i in range(5):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], 1*y_val[ind]
        preds = np.argmax(model.predict([rowx, decoder_input_data_val[[0],:]]), axis=-1).flatten()
        q = mat_to_sentence(rowx[0], indices_char)
        correct = mat_to_sentence(rowy, indices_char)
        guess = vec_to_sentence(preds, indices_char)
        print()
        print("Input: ", q, "Correct output", correct)
        print('Prediction')
        if correct == guess:
            print("☑ " + guess)
        else:
            print("☒ " + guess)


Iteration 1

Input:  300+584 Correct output 884 
Prediction
☒ 100 

Input:  140+259 Correct output 399 
Prediction
☒ 100 

Input:  783+244 Correct output 1027
Prediction
☒ 100 

Input:  748+797 Correct output 1545
Prediction
☒ 1100

Input:  22+573  Correct output 595 
Prediction
☒ 10  

Iteration 2

Input:  768+442 Correct output 1210
Prediction
☒ 101 

Input:  607+580 Correct output 1187
Prediction
☒ 1410

Input:  265+328 Correct output 593 
Prediction
☒ 100 

Input:  498+921 Correct output 1419
Prediction
☒ 1400

Input:  163+48  Correct output 211 
Prediction
☒ 402 

Iteration 3

Input:  259+77  Correct output 336 
Prediction
☒ 511 

Input:  203+830 Correct output 1033
Prediction
☒ 121 

Input:  693+221 Correct output 914 
Prediction
☒ 101 

Input:  523+609 Correct output 1132
Prediction
☒ 121 

Input:  624+133 Correct output 757 
Prediction
☒ 110 

Iteration 4

Input:  897+150 Correct output 1047
Prediction
☒ 111 

Input:  632+469 Correct output 1101
Prediction
☒ 111 

[... output truncated ...]

Input:  871+944 Correct output 1815
Prediction
☒ 1716

Input:  927+919 Correct output 1846
Prediction
☒ 1811

Input:  316+898 Correct output 1214
Prediction
☒ 1160

Input:  316+39  Correct output 355 
Prediction
☒ 261 

Input:  449+433 Correct output 882 
Prediction
☒ 860 

Iteration 22

Input:  50+686  Correct output 736 
Prediction
☒ 622 

Input:  859+918 Correct output 1777
Prediction
☒ 1808

Input:  199+996 Correct output 1195
Prediction
☒ 1190

Input:  927+48  Correct output 975 
Prediction
☒ 970 

Input:  385+454 Correct output 839 
Prediction
☒ 880 

Iteration 23

Input:  741+616 Correct output 1357
Prediction
☒ 1301

Input:  831+414 Correct output 1245
Prediction
☒ 1253

Input:  825+932 Correct output 1757
Prediction
☒ 1753

Input:  555+662 Correct output 1217
Prediction
☒ 1253

Input:  142+311 Correct output 453 
Prediction
☒ 381 

Iteration 24

Input:  457+843 Correct output 1300
Prediction
☒ 1262

Input:  204+588 Correct output 792 
Prediction
☒ 762 

[... output truncated ...]

In [None]:
decoder_input_data_test = np.zeros((len(x_test), out_max_len, 1))

results = model.evaluate([x_test, decoder_input_data_test], y_test, verbose=1)
print('Test Loss: {}'.format(results[0]))
print('Test Accuracy: {}'.format(results[1]))

Test Loss: 1.3093889951705933
Test Accuracy: 0.5070499777793884
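
The accuracy reported here is, as far as the metric is understood for a per-timestep softmax output, averaged over individual characters (padding included), whereas the ☑/☒ check in the training loop requires the whole string to match. The NumPy sketch below contrasts the two on toy data; both function names are hypothetical.

```python
import numpy as np

def char_accuracy(y_true, y_prob):
    """Fraction of individual characters predicted correctly."""
    return float((y_true.argmax(-1) == y_prob.argmax(-1)).mean())

def exact_match_accuracy(y_true, y_prob):
    """Fraction of samples whose full output string is correct."""
    matches = (y_true.argmax(-1) == y_prob.argmax(-1)).all(axis=-1)
    return float(matches.mean())

# Toy data: 2 samples, 4 timesteps, 12 classes; one character is wrong.
true_ids = np.array([[3, 4, 5, 0], [7, 8, 9, 0]])
pred_ids = np.array([[3, 4, 5, 0], [7, 8, 2, 0]])
y_true = np.eye(12)[true_ids]
y_prob = np.eye(12)[pred_ids]

print(char_accuracy(y_true, y_prob))         # 0.875
print(exact_match_accuracy(y_true, y_prob))  # 0.5
```

A model can therefore score well per character while still getting most complete sums wrong, which is worth keeping in mind when comparing the printed predictions with the reported metric.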


### Question 1: Find a model with test `accuracy > 0.9`


Study the influence of the dimension of the encoded vector (`encoded_dim`).

In [None]:
encoded_dim = 256

In [None]:
## Encoder
encoder_input = tf.keras.Input(
    shape=(max_len, len(chars)), name='encoder_input')

# Return states in addition to output
_, state_h, state_c = layers.LSTM(encoded_dim, return_state=True, name="encoder")(
    encoder_input
)

# Encoded vector
encoder_state = [state_h, state_c]


# Decoder
decoder_input = tf.keras.Input(
    shape=(out_max_len, 1), name='decoder_input')

decoder_output = layers.LSTM(encoded_dim, return_sequences=True, name="decoder")(
    decoder_input, initial_state=encoder_state
)
output = layers.TimeDistributed(layers.Dense(len(chars), activation='softmax'))(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 7, 12)]      0                                            
__________________________________________________________________________________________________
decoder_input (InputLayer)      [(None, 4, 1)]       0                                            
__________________________________________________________________________________________________
encoder (LSTM)                  [(None, 256), (None, 275456      encoder_input[0][0]              
__________________________________________________________________________________________________
decoder (LSTM)                  (None, 4, 256)       264192      decoder_input[0][0]              
                                                                 encoder[0][1]              

In [None]:
## The inputs of the decoder are zeros
decoder_input_data = np.zeros((len(x_train), out_max_len, 1))
decoder_input_data_val = np.zeros((len(x_val), out_max_len, 1))


In [None]:
epochs = 30
batch_size = 64

for epoch in range(1, epochs + 1):  # run all `epochs` iterations
    print()
    print("Iteration", epoch)
    model.fit(
        [x_train, decoder_input_data],
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=([x_val, decoder_input_data_val], y_val),
    )

    for i in range(5):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], 1*y_val[ind]
        preds = np.argmax(model.predict([rowx, decoder_input_data_val[[0],:]]), axis=-1).flatten()
        q = mat_to_sentence(rowx[0], indices_char)
        correct = mat_to_sentence(rowy, indices_char)
        guess = vec_to_sentence(preds, indices_char)
        print()
        print("Input: ", q, "Correct output", correct)
        print('Prediction')
        if correct == guess:
            print("☑ " + guess)
        else:
            print("☒ " + guess)


Iteration 1

Input:  828+594 Correct output 1422
Prediction
☑ 1422

Input:  312+764 Correct output 1076
Prediction
☒ 1011

Input:  367+362 Correct output 729 
Prediction
☒ 752 

Input:  693+988 Correct output 1681
Prediction
☒ 1652

Input:  870+557 Correct output 1427
Prediction
☒ 1422

Iteration 2

Input:  10+44   Correct output 54  
Prediction
☒ 11  

Input:  237+310 Correct output 547 
Prediction
☒ 566 

Input:  328+286 Correct output 614 
Prediction
☒ 616 

Input:  905+828 Correct output 1733
Prediction
☒ 1711

Input:  605+875 Correct output 1480
Prediction
☒ 1442

Iteration 3

Input:  321+640 Correct output 961 
Prediction
☒ 980 

Input:  767+762 Correct output 1529
Prediction
☒ 1522

Input:  494+210 Correct output 704 
Prediction
☒ 700 

Input:  707+924 Correct output 1631
Prediction
☒ 1610

Input:  358+418 Correct output 776 
Prediction
☒ 785 

Iteration 4

Input:  978+241 Correct output 1219
Prediction
☒ 1220

Input:  96+51   Correct output 147 
Prediction
☒ 100 

[... output truncated ...]

Input:  836+333 Correct output 1169
Prediction
☑ 1169

Input:  319+877 Correct output 1196
Prediction
☑ 1196

Input:  600+59  Correct output 659 
Prediction
☑ 659 

Iteration 21

Input:  754+705 Correct output 1459
Prediction
☑ 1459

Input:  518+300 Correct output 818 
Prediction
☑ 818 

Input:  792+315 Correct output 1107
Prediction
☑ 1107

Input:  46+90   Correct output 136 
Prediction
☒ 146 

Input:  60+548  Correct output 608 
Prediction
☒ 508 

Iteration 22

Input:  453+586 Correct output 1039
Prediction
☑ 1039

Input:  269+920 Correct output 1189
Prediction
☑ 1189

Input:  58+268  Correct output 326 
Prediction
☑ 326 

Input:  137+893 Correct output 1030
Prediction
☑ 1030

Input:  205+310 Correct output 515 
Prediction
☑ 515 

Iteration 23

Input:  422+975 Correct output 1397
Prediction
☑ 1397

Input:  327+479 Correct output 806 
Prediction
☑ 806 

Input:  593+572 Correct output 1165
Prediction
☑ 1165

Input:  618+589 Correct output 1207
Prediction
☑ 1207

[... output truncated ...]

In [None]:
decoder_input_data_test = np.zeros((len(x_test), out_max_len, 1))
results = model.evaluate([x_test, decoder_input_data_test], y_test, verbose=1)
print('Test Loss: {}'.format(results[0]))
print('Test Accuracy: {}'.format(results[1]))

Test Loss: 0.022692818194627762
Test Accuracy: 0.9933500289916992


## Practice

Create a similar model for integer division, rounded to 3 decimals:
```python
'999/7' -> '142.714'
'3/4' -> '0.75'
'1/3' -> '0.333'
```
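
The target strings for this exercise can be sketched as follows. This is a minimal illustration that assumes a fixed output width of 7 characters and uses Python's built-in `round`; `format_division` is a hypothetical helper, not part of the notebook.

```python
def format_division(a, b, out_max_len=7):
    """Return a / b rounded to 3 decimals, right-padded with spaces."""
    result = str(round(a / b, 3))
    return result + ' ' * (out_max_len - len(result))

print(repr(format_division(999, 7)))   # '142.714'
print(repr(format_division(3, 4)))     # '0.75   '
print(repr(format_division(490, 70)))  # '7.0    '
```

Note that trailing zeros are dropped, so the model has to learn targets of varying length. Formatting with `'{:.3f}'.format(a / b)` would instead always give three decimals, which is a different (also reasonable) design choice.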

In [None]:
max_digits = 3
max_int = 10**max_digits - 1
max_len = 7  # longest input, e.g. '999/999'
out_max_len = 7  # longest output, e.g. '142.714'
print('max_digits : {0}, max_int: {1}, max_len: {2}, out_max_len: {3}'.format(
    max_digits, max_int, max_len, out_max_len))

max_digits : 3, max_int: 999, max_len: 7, out_max_len: 7


In [None]:
np.random.randint(max_int)

256

In [None]:
def generate_sample(max_len, max_int, out_max_len):
    a = np.random.randint(max_int)
    b = np.random.randint(1, max_int)  # start at 1 to avoid division by zero
    sentence = '{0}/{1}'.format(a, b)
    sentence = sentence + ' ' * (max_len - len(sentence))  # padding
    result = str(np.round(a / b, 3))
    result = result + ' ' * (out_max_len - len(result))  # padding
    return sentence, result


sentences = []
results = []
seen = set()
print("Generating data...")
while len(sentences) < 100000:
    sentence, result = generate_sample(max_len, max_int, out_max_len)
    if sentence in seen:
        continue
    seen.add(sentence)
    sentences.append(sentence)
    results.append(result)
print("Total sentences:", len(sentences))
print('Some examples:', list(zip(sentences[:3], results[:3])))

Generating data...
Total sentences: 100000
Some examples: [('192/222', '0.865  '), ('549/991', '0.554  '), ('742/884', '0.839  ')]


In [None]:
## Data vectorization

chars = "0123456789/. "

char_indices = {c:i for i, c in enumerate(sorted(chars))}
print('char_indices', char_indices)
indices_char = {i:c for c,i in char_indices.items()}
print('indices_char', indices_char)

def vectorize_sentence(sentence, char_indices):
    x = np.zeros((len(sentence), len(char_indices)))
    for i, c in enumerate(list(sentence)):
        x[i, char_indices[c]] = 1
    return x

x = vectorize_sentence('13/11', char_indices)

print('sentence: 13/11')
print('vectorize_sentence inds:', x.argmax(-1))
print('vectorize_sentence :', x)

char_indices {' ': 0, '.': 1, '/': 2, '0': 3, '1': 4, '2': 5, '3': 6, '4': 7, '5': 8, '6': 9, '7': 10, '8': 11, '9': 12}
indices_char {0: ' ', 1: '.', 2: '/', 3: '0', 4: '1', 5: '2', 6: '3', 7: '4', 8: '5', 9: '6', 10: '7', 11: '8', 12: '9'}
sentence: 13/11
vectorize_sentence inds: [4 6 2 4 4]
vectorize_sentence : [[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]


In [None]:
def vec_to_sentence(x, indices_char):
    return "".join(indices_char[i] for i in x)

def mat_to_sentence(x, indices_char):
    x = x.argmax(axis=-1)
    return "".join(indices_char[i] for i in x)

mat_to_sentence(x, indices_char)

'13/11'

In [None]:
print("Vectorization...")
# `np.bool` was removed in recent NumPy releases; the built-in `bool` is equivalent.
x = np.zeros((len(sentences), max_len, len(chars)), dtype=bool)
y = np.zeros((len(sentences), out_max_len, len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    x[i] = vectorize_sentence(sentence, char_indices)
for i, sentence in enumerate(results):
    y[i] = vectorize_sentence(sentence, char_indices)

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print("Training Data:")
print(x_train.shape)
print(y_train.shape)

print("Validation Data:")
print(x_val.shape)
print(y_val.shape)

Vectorization...
Training Data:
(90000, 7, 13)
(90000, 7, 13)
Validation Data:
(10000, 7, 13)
(10000, 7, 13)


In [None]:
# Encoder
encoder_input = tf.keras.Input(
    shape=(max_len, len(chars)), name='encoder_input')

# Return states in addition to output
output, state_h, state_c = layers.LSTM(256, return_state=True, name="encoder_2")(
    encoder_input
)

# Encoded vector
encoder_state = [state_h, state_c]

# Decoder
decoder_input = tf.keras.Input(
    shape=(out_max_len, 1), name='decoder_input')

# Pass the 2 states to a new LSTM layer, as initial state
decoder_output = layers.LSTM(256, return_sequences=True, name="decoder")(
    decoder_input, initial_state=encoder_state
)
output = layers.TimeDistributed(layers.Dense(len(chars), activation='softmax'))(decoder_output)

model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 7, 13)]      0                                            
__________________________________________________________________________________________________
bidirectional_5 (Bidirectional) (None, 7, 256)       145408      encoder_input[0][0]              
__________________________________________________________________________________________________
decoder_input (InputLayer)      [(None, 7, 1)]       0                                            
__________________________________________________________________________________________________
encoder_2 (LSTM)                [(None, 256), (None, 525312      bidirectional_5[0][0]            
____________________________________________________________________________________________

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
decoder_input_data = np.zeros((len(x_train), out_max_len, 1))
decoder_input_data_val = np.zeros((len(x_val), out_max_len, 1))

In [None]:
epochs = 30
batch_size = 64

for epoch in range(1, epochs + 1):  # run all `epochs` iterations
    print()
    print("Iteration", epoch)
    model.fit(
        [x_train, decoder_input_data],
        y_train,
        batch_size=batch_size,
        epochs=1,
        validation_data=([x_val, decoder_input_data_val], y_val),
    )

    for i in range(5):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], 1*y_val[ind]
        preds = np.argmax(model.predict([rowx, decoder_input_data_val[[0],:]]), axis=-1).flatten()
        q = mat_to_sentence(rowx[0], indices_char)
        correct = mat_to_sentence(rowy, indices_char)
        guess = vec_to_sentence(preds, indices_char)
        print()
        print("Input: ", q, "Correct output", correct)
        print('Prediction')
        if correct == guess:
            print("☑ " + guess)
        else:
            print("☒ " + guess)


Iteration 1

Input:  526/809 Correct output 0.65   
Prediction
☒ 0.633  

Input:  305/878 Correct output 0.347  
Prediction
☒ 0.339  

Input:  175/109 Correct output 1.606  
Prediction
☒ 1.537  

Input:  844/625 Correct output 1.35   
Prediction
☒ 1.323  

Input:  438/740 Correct output 0.592  
Prediction
☒ 0.533  

Iteration 2

Input:  354/156 Correct output 2.269  
Prediction
☒ 2.391  

Input:  266/590 Correct output 0.451  
Prediction
☒ 0.491  

Input:  952/186 Correct output 5.118  
Prediction
☒ 5.191  

Input:  657/282 Correct output 2.33   
Prediction
☒ 2.321  

Input:  490/70  Correct output 7.0    
Prediction
☒ 7.221  

Iteration 3

Input:  982/448 Correct output 2.192  
Prediction
☒ 2.263  

Input:  529/106 Correct output 4.991  
Prediction
☒ 4.413  

Input:  589/7   Correct output 84.143 
Prediction
☒ 88.5   

Input:  391/656 Correct output 0.596  
Prediction
☒ 0.583  

Input:  94/21   Correct output 4.476  
Prediction
☒ 4..5   

Iteration 4
 204/1407 [===>..................

KeyboardInterrupt: 