__Libraries:__

Given the string "0+69", the model should return a prediction: "69".

In [2]:
import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TimeDistributed, Dense, Dropout, SimpleRNN, RepeatVector
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback

from termcolor import colored

__Generate Data:__

We store the allowed characters in a string, then build two dictionaries. The first tokenizes the characters: we enumerate over the character list and use each character as a key and its index as the value, which gives a numeric representation of every character.
The second dictionary is the inverse of the first: the indices are the keys and the characters are the corresponding values.

Next, we define a function that returns one example and its corresponding label.

In [3]:
all_chars = '0123456789+-*/'

In [4]:
num_of_features = len(all_chars)
print("Number of features: ", num_of_features)
char_to_index = dict((c, i) for i, c in enumerate(all_chars))
index_to_char = dict((i, c) for i, c in enumerate(all_chars))

Number of features:  14
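
As a quick check of the two dictionaries, the sketch below encodes the example string "0+69" into indices and decodes it back. The helper names `encode` and `decode` are illustrative and not part of the notebook.

```python
all_chars = '0123456789+-*/'
char_to_index = {c: i for i, c in enumerate(all_chars)}
index_to_char = {i: c for i, c in enumerate(all_chars)}

def encode(text):
    """Map each character to its integer index (hypothetical helper)."""
    return [char_to_index[c] for c in text]

def decode(indices):
    """Map integer indices back to characters (hypothetical helper)."""
    return ''.join(index_to_char[i] for i in indices)

tokens = encode('0+69')
print(tokens)          # [0, 10, 6, 9]
print(decode(tokens))  # 0+69
```

The digits occupy indices 0 to 9, so '+' lands on index 10.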


In [5]:
def generate_data():
    f = np.random.randint(0, 100)
    s = np.random.randint(0, 100)
    exp = str(f) + '+' + str(s)
    label = str(f+s)
    return exp, label

generate_data()    

('29+50', '79')

__Create the Model:__

Consider these two reviews:

Review 1: This movie is not terrible at all.

Review 2: This movie is pretty decent.

Unlike regular neural networks, RNNs allow us to input vectors of varied lengths as examples and potentially also get labels of varied lengths. This matters for tasks such as sentiment analysis of tweets or IMDb reviews, where the input sentences differ in length, like the two reviews above. With a regular neural network we would need to preprocess the input so that every sentence is padded with meaningless words to match the length of the longest one. An RNN does not need this step and can still give good results on inputs of different lengths.

Here we use Keras to create the RNN, which is an easy and efficient way of building an RNN model. We use Keras' SimpleRNN layer, a fully connected RNN layer whose outputs are fed back into the layer. It uses a tanh activation function by default, and we leave it as it is.

We use 128 hidden units in total. The two numbers in the input range from 0 to 99, so the maximum length of an input expression is 5 characters (for example "99+99"). This is also the maximum number of time steps in an input sequence.
The model has two sections. The first part, called the encoder, is a single SimpleRNN layer.
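
The length bound can be confirmed by brute force over every operand pair. This is only a sanity check, assuming operands in the range 0 to 99 as produced by `generate_data`; the variable names are illustrative.

```python
# Enumerate every pair of operands in 0..99 and measure string lengths.
pairs = [(a, b) for a in range(100) for b in range(100)]

max_input_len = max(len(f"{a}+{b}") for a, b in pairs)
max_label_len = max(len(str(a + b)) for a, b in pairs)

print(max_input_len)  # 5, e.g. "99+99"
print(max_label_len)  # 3, e.g. "198"
```

Labels never exceed three characters, so when they are padded to five time steps the first two positions always hold the padding character.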

We create the first layer, the SimpleRNN layer, with two arguments: the number of hidden units and the input shape, which in turn has two components, the number of time steps (left as None in the code) and the number of features defined above. The output of this layer is a single vector representation of the input. Because the decoder expects a sequence, we pass this vector through a RepeatVector layer and specify how many times it should be repeated, which is the maximum number of time steps. This completes the encoder, and its output is fed to the decoder.


In the decoder, we create another SimpleRNN layer. It takes the repeated vector representation as input and generates a predicted sequence. It is written similarly to the one above, but since we need the full sequence back we set the return_sequences argument to True.
The output of this layer goes into a Dense layer with a softmax activation function, because we need the probabilities of the possible characters at each time step.
We wrap the Dense layer inside a TimeDistributed layer so that the model applies the Dense layer to each time step individually, since the hidden state is different at each time step.
The Dense layer takes num_of_features as its number of units, because we want a probability distribution over our characters, and its activation is set to softmax.
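
To make the shapes concrete, here is a minimal NumPy sketch of what the repeat step and the per-time-step dense layer do to a tensor. It is not how Keras implements these layers: the decoder output is replaced by random values, and the weights `W` and `b` are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, hidden, features = 2, 5, 128, 14

# Encoder output: one vector per example.
encoded = rng.normal(size=(batch, hidden))

# Repeat step: copy that vector once per output time step.
repeated = np.repeat(encoded[:, np.newaxis, :], time_steps, axis=1)
print(repeated.shape)  # (2, 5, 128)

# Stand-in for the decoder RNN output (random values, not a real RNN).
decoded = rng.normal(size=(batch, time_steps, hidden))

# Per-time-step dense layer with softmax: the same W and b are applied
# independently at every time step.
W = rng.normal(size=(hidden, features)) * 0.01
b = np.zeros(features)
logits = decoded @ W + b
logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
probs = np.exp(logits)
probs /= probs.sum(axis=-1, keepdims=True)
print(probs.shape)  # (2, 5, 14)
```

Each of the five time steps ends up with its own probability distribution over the 14 characters.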

Then, as usual, we compile the model. We set the loss function to 'categorical_crossentropy' because this is a multi-class classification problem with one-hot encoded data, set the optimizer to 'adam', and use 'accuracy' as the training metric.

In [6]:
hidden_units = 128
max_time_steps = 5

model = Sequential([
    SimpleRNN(hidden_units, input_shape=(None, num_of_features)),
    RepeatVector(max_time_steps),
    SimpleRNN(hidden_units, return_sequences=True),
    TimeDistributed(Dense(num_of_features, activation='softmax'))
])

model.compile(
    loss='categorical_crossentropy', 
    optimizer='adam', 
    metrics=['accuracy']
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn (SimpleRNN)       (None, 128)               18304     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 5, 128)            0         
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 5, 128)            32896     
_________________________________________________________________
time_distributed (TimeDistri (None, 5, 14)             1806      
Total params: 53,006
Trainable params: 53,006
Non-trainable params: 0
_________________________________________________________________
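
The parameter counts in the summary can be reproduced by hand. A SimpleRNN layer has an input kernel, a recurrent kernel, and a bias; a Dense layer has a kernel and a bias. The function names below are illustrative.

```python
def simple_rnn_params(units, input_dim):
    # input kernel + recurrent kernel + bias
    return units * input_dim + units * units + units

def dense_params(units, input_dim):
    # kernel + bias
    return units * input_dim + units

encoder = simple_rnn_params(128, 14)   # 18304
decoder = simple_rnn_params(128, 128)  # 32896
output = dense_params(14, 128)         # 1806
print(encoder, decoder, output, encoder + decoder + output)
# 18304 32896 1806 53006
```

The RepeatVector layer only copies its input, so it contributes no parameters.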


__Vectorize and De-Vectorize Data:__

We have created the model we want to train, and we have a way to generate data, but the data is not yet in the format we need. We must vectorize the string data so that it can be used with the RNN model defined above.

We write a function to vectorize an example and label pair produced by the generate_data function above.
The function takes the example and the label as its arguments. It creates placeholders for the example and the label, each with shape (max time steps, number of features).
It then stores the difference between the max time steps and the length of the example, and between the max time steps and the length of the label.
Essentially, this is one-hot encoding: we loop through all the characters and set the position given by the character-to-index dictionary to 1 in the placeholders, offset by that difference so the text is right-aligned.
A separate loop fills any padding positions at the beginning with the character '0'.

To check that this worked, we need to de-vectorize the examples and labels back into strings. The model only needs the vectorized version, but we want to convert some test examples back to a human-readable format so that we can read and verify the results.
The function takes either an example or a label as its argument; the same code works for both.
We use the index-to-character dictionary to convert indices back into characters, and the argmax function to find the position of the maximum value in each vector, which in this case is the 1. Iterating over the vectors and joining the resulting characters gives the string.

In [7]:
def vectorize_example(exp, label):
    x = np.zeros((max_time_steps, num_of_features))
    y = np.zeros((max_time_steps, num_of_features))
    
    diff_x = max_time_steps - len(exp)
    diff_y = max_time_steps - len(label)
    
    for i, c in enumerate(exp):
        x[i + diff_x, char_to_index[c]] = 1
    for i in range(diff_x):
        x[i, char_to_index['0']] = 1
    for i, c in enumerate(label):
        y[i + diff_y, char_to_index[c]] = 1
    for i in range(diff_y):
        y[i, char_to_index['0']] = 1
    
    return x, y

e, l = generate_data()
print(e, l)
x, y = vectorize_example(e, l)
print(x.shape, y.shape)

15+2 17
(5, 14) (5, 14)


We are not printing x and y because they are just two arrays filled with 0s, except at the positions of our characters, which are marked with 1s.

In [8]:
def devectorize_example(exp):
    result = [index_to_char[np.argmax(vec)] for vec in exp]
    return ''.join(result)

devectorize_example(x)

'015+2'

In [9]:
devectorize_example(y)

'00017'
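
Because labels are left-padded with '0', a de-vectorized label such as '00017' still carries the padding. A small helper can strip it for display. This is a sketch: `strip_label_padding` is a hypothetical name, and it is only safe for labels, since an input such as "0+69" is padded to "00+69" and stripping every leading zero would remove the operand itself.

```python
def strip_label_padding(label):
    """Remove leading '0' padding from a de-vectorized label (hypothetical helper)."""
    stripped = label.lstrip('0')
    # A label made only of zeros represents the number 0.
    return stripped if stripped else '0'

print(strip_label_padding('00017'))  # 17
print(strip_label_padding('00000'))  # 0
print(strip_label_padding('00160'))  # 160
```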

__Create Dataset:__


In [33]:
def create_dataset(num_examples=2000):
    x = np.zeros((num_examples, max_time_steps, num_of_features))
    y = np.zeros((num_examples, max_time_steps, num_of_features))
    
    for i in range(num_examples):
        e, l = generate_data()
        e_v, l_v = vectorize_example(e, l)
        
        x[i] = e_v
        y[i] = l_v
        
    return x, y

x, y = create_dataset()
print(x.shape, y.shape)

(2000, 5, 14) (2000, 5, 14)


In [34]:
devectorize_example(x[0])

'73+14'

In [35]:
devectorize_example(y[0])

'00087'

__Training the Model:__

To train the model, we need some callbacks. The first simplifies logging: we may train the model for hundreds of epochs and want to keep track of the validation accuracy, but the default log is lengthy, so we use a lambda function to print only the validation accuracy after each epoch.
The second is an early stopping callback that monitors the validation loss with patience set to 10.
Then we simply fit the model.
Note that the accuracy we get is at the character level, not for the entire sequence.
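
Character-level accuracy can be noticeably higher than the fraction of fully correct answers, because a prediction with a single wrong digit still gets most of its characters right. The sketch below computes both metrics from de-vectorized strings. The function names and the sample strings are made up for illustration and are not output from the trained model.

```python
def char_accuracy(labels, preds):
    """Fraction of individual characters predicted correctly."""
    total = sum(len(y) for y in labels)
    correct = sum(a == b for y, p in zip(labels, preds) for a, b in zip(y, p))
    return correct / total

def sequence_accuracy(labels, preds):
    """Fraction of examples where every character is correct."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)

labels = ['00095', '00139', '00099', '00018']
preds = ['00095', '00139', '00009', '00018']

print(char_accuracy(labels, preds))      # 0.95
print(sequence_accuracy(labels, preds))  # 0.75
```

One wrong digit out of twenty characters costs 5% of character accuracy but 25% of sequence accuracy in this toy sample.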

In [36]:
l_cb = LambdaCallback(
    on_epoch_end = lambda e, l: print('{:.2f}'.format(l['val_accuracy']), end = ' _ ')
)

es_cb = EarlyStopping(monitor='val_loss', patience=10)

model.fit(x, y, epochs=500, batch_size=256, validation_split=0.2, 
         verbose=False, callbacks=[es_cb,l_cb])

0.88 _ 0.87 _ 0.89 _ 0.88 _ 0.92 _ 0.93 _ 0.95 _ 0.95 _ 0.96 _ 0.96 _ 0.96 _ 0.97 _ 0.96 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.97 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.97 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 0.98 _ 

<tensorflow.python.keras.callbacks.History at 0x14611bc8>

In [41]:
x_test, y_test = create_dataset(10)
preds = model.predict(x_test)

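for i, pred in enumerate(preds):
    # Use y_true so the dataset array y defined earlier is not overwritten
    y_true = devectorize_example(y_test[i])
    y_hat = devectorize_example(pred)
    # Green when the whole predicted sequence matches the label, red otherwise
    col = 'green' if y_true == y_hat else 'red'

    out = 'Input: ' + devectorize_example(x_test[i]) + ' Out: ' + y_true + ' Pred: ' + y_hat

    print(colored(out, col))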

Input: 62+33 Out: 00095 Pred: 00095
Input: 85+54 Out: 00139 Pred: 00139
Input: 53+50 Out: 00103 Pred: 00103
Input: 38+84 Out: 00122 Pred: 00122
Input: 25+81 Out: 00106 Pred: 00106
Input: 80+80 Out: 00160 Pred: 00160
Input: 77+22 Out: 00099 Pred: 00009
Input: 70+66 Out: 00136 Pred: 00136
Input: 37+92 Out: 00129 Pred: 00129
Input: 07+11 Out: 00018 Pred: 00018
