<a href="https://colab.research.google.com/github/sangeetsaurabh/PyTorch_Keras_Experiment/blob/master/Text_Number_Prediction/TextNumber_prediction_multiple_steps_keras_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Predict the next number in the sequence

Given a sequence of numbers written out in words, the goal of the model is to predict the next number in the sequence.

For example, the model can be given input like: eight thousand one , eight thousand two , eight thousand three , eight thousand four , eight thousand five , eight thousand six , eight thousand seven , eight thousand eight , eight thousand nine , eight thousand ten , eight thousand eleven , eight thousand twelve ...

Rather than predicting a single next word for the whole input, the model in this notebook predicts the next word after every word in the input (multi-step prediction). So if 20 words are given to the model, it returns 20 predictions, one for the word that follows each input word.
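To make the multi-step setup concrete, here is a minimal sketch of how a token stream can be cut into input windows and next-token labels. It assumes the labels are simply the inputs shifted by one position; the helper `make_windows` is hypothetical and is not the `HumanNumbers` loader used later in this notebook.

```python
import numpy as np

def make_windows(token_ids, bptt):
    """Split a 1-D token stream into (input, label) windows of length bptt.

    Hypothetical helper: the label at each position is the token that
    follows the corresponding input token.
    """
    token_ids = np.asarray(token_ids)
    n = (len(token_ids) - 1) // bptt           # number of full windows
    x = token_ids[: n * bptt].reshape(n, bptt)
    y = token_ids[1 : n * bptt + 1].reshape(n, bptt)
    return x, y

stream = np.arange(10)                         # stand-in for token ids 0..9
x, y = make_windows(stream, bptt=3)
print(x)   # inputs
print(y)   # labels: each entry is the token after the matching input
```

Every input position has its own label, which is why a window of 20 words yields 20 predictions.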



In [1]:
#### Make sure that the right version of torchtext is installed
!pip install torchtext==0.6.0
import torchtext
print(torchtext.__version__)

Collecting torchtext==0.6.0
  Downloading https://files.pythonhosted.org/packages/f2/17/e7c588245aece7aa93f360894179374830daf60d7ed0bbb59332de3b3b61/torchtext-0.6.0-py3-none-any.whl (64kB)
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Installing collected packages: sentencepiece, torchtext
  Found ex

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
#drive.flush_and_unmount

In [0]:
#### Setting up the right seed to make Keras result more consistent
import numpy as np
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)


In [0]:
#### Setting up path to import important data preparation Python module
import sys
import os
sys.path.append('/content/drive/My Drive/Colab Notebooks/torch_pipe/')

In [6]:
os.getcwd()

'/content'

In [0]:
#### Using torch utilities to prepare the features. Importing all the important files
import torch
import torch.nn as nn
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from Util.human_language_modeling import *
from torch.utils.data import DataLoader
import torch.nn.functional as F
import time
import logging

In [0]:
#### Enabling logging
import logging
logger = logging.getLogger()
fhandler = logging.FileHandler(filename='mylog.log', mode='a')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fhandler.setFormatter(formatter)
logger.addHandler(fhandler)
logger.setLevel(logging.DEBUG)

In [0]:
#### Setting up the batch size and length of the sequence
BATCH_SIZE = 64 ## defining the batch size
bptt = 20 ## sequence length used for backpropagation through time
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [0]:
### A simple Python function to show the text for an array of token ids
def show_text(input_vector):
    separator = ' '
    txt = separator.join([vocab.itos[i] for i in input_vector])
    return txt

#### Download the train and validation data

In [11]:
tokenizer = get_tokenizer("spacy")
train_dataset, valid_dataset = HumanNumbers(root='data',bptt=bptt,batch_size=BATCH_SIZE,data_select=('train', 'valid'))
vocab = train_dataset.get_vocab()

0lines [00:00, ?lines/s]

<function tokenizer at 0x7efd7de9a158>


8001lines [00:00, 18674.64lines/s]


51200
51200
torch.Size([51200, 20])
torch.Size([51200, 20])
14080
14080
torch.Size([14080, 20])
torch.Size([14080, 20])


#### Extract the features for the Keras/TensorFlow implementation

In [0]:
import tensorflow as tf

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Bidirectional, GRU, SimpleRNN
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers

In [0]:
#### Building input features and labels for machine learning models
train_x = train_dataset.input_data.numpy()
train_label = train_dataset.label_data.numpy().astype(int)
train_y = tf.keras.utils.to_categorical(train_label, num_classes=len(vocab.itos))

valid_x = valid_dataset.input_data.numpy()
valid_label = valid_dataset.label_data.numpy()
valid_y = tf.keras.utils.to_categorical(valid_label, num_classes=len(vocab.itos))

In [22]:
print(train_x.shape)
print (train_y.shape)
print(valid_x.shape)
print(valid_y.shape)

(51200, 20)
(51200, 20, 34)
(14080, 20)
(14080, 20, 34)
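The extra trailing dimension of 34 comes from one-hot encoding each label id over the vocabulary. The following NumPy sketch reproduces that shape change on a toy array; the label values are made up for illustration.

```python
import numpy as np

num_classes = 34                        # vocabulary size reported above
labels = np.array([[0, 5, 33],          # toy (batch=2, steps=3) label ids
                   [2, 2, 7]])

# Indexing an identity matrix with integer ids yields one-hot rows.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)                    # (2, 3, 34)

# argmax over the last axis recovers the original ids.
recovered = one_hot.argmax(axis=-1)
print((recovered == labels).all())      # True
```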


##### Create the batch data for Train and Validation set

In [0]:
#### Setting up Keras dataset to feed into machine learning models
BUFFER_SIZE = train_x.shape[0] ## Shuffling the data across entire dataset before building the batch

train_batch = tf.data.Dataset.from_tensor_slices((train_x, train_y))
train_batch = train_batch.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

val_batch = tf.data.Dataset.from_tensor_slices((valid_x, valid_y))
val_batch = val_batch.batch(BATCH_SIZE)

#### Simple DNN to do the prediction

In [24]:
from tensorflow import keras

model = keras.Sequential(
    [
        keras.layers.Embedding(len(vocab.itos), 100, input_length=train_x.shape[1]),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(len(vocab.itos), activation="softmax"),
        #keras.layers.Lambda(lambda x: x[:,-1])
    ]
)
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, 20, 100)           3400      
_________________________________________________________________
dense_4 (Dense)              (None, 20, 64)            6464      
_________________________________________________________________
batch_normalization_1 (Batch (None, 20, 64)            256       
_________________________________________________________________
dense_5 (Dense)              (None, 20, 34)            2210      
Total params: 12,330
Trainable params: 12,202
Non-trainable params: 128
_________________________________________________________________
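The parameter counts in this summary can be checked by hand. The short calculation below is only arithmetic on the layer sizes shown above (vocabulary of 34, embedding size 100, 64 hidden units).

```python
vocab_size, emb_dim, hidden = 34, 100, 64

embedding = vocab_size * emb_dim               # one vector per token
dense_1 = emb_dim * hidden + hidden            # weights + biases
batch_norm = 4 * hidden                        # gamma, beta, moving mean, moving variance
dense_2 = hidden * vocab_size + vocab_size     # weights + biases

total = embedding + dense_1 + batch_norm + dense_2
non_trainable = 2 * hidden                     # moving mean and moving variance
print(embedding, dense_1, batch_norm, dense_2)        # 3400 6464 256 2210
print(total, total - non_trainable, non_trainable)    # 12330 12202 128
```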


In [0]:
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

In [27]:
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1,validation_data=val_batch)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


This stack of dense layers didn't produce a good result. That was expected: the dense layers are applied to each position independently, so the model sees only the current word when predicting the next one and cannot use the rest of the 20-word sequence. Let's customize the DNN so that it carries information along the sequence.
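The weakness of a purely dense model on sequences can be shown in a few lines of NumPy. The sketch below uses small random weights (not the trained model, and without the batch normalization layer) to show that an embedding followed by a dense layer produces the same output at a position regardless of what came before it.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, hidden = 34, 8, 4

E = rng.normal(size=(vocab_size, emb_dim))     # embedding table
W = rng.normal(size=(emb_dim, hidden))         # dense weights
b = rng.normal(size=hidden)                    # dense biases

def dense_over_sequence(token_ids):
    """Embedding followed by a ReLU dense layer, applied to every position."""
    return np.maximum(E[token_ids] @ W + b, 0.0)

seq_a = np.array([1, 2, 3, 4])
seq_b = np.array([9, 9, 3, 4])                 # different history, same last two tokens

out_a = dense_over_sequence(seq_a)
out_b = dense_over_sequence(seq_b)

# Positions 2 and 3 give identical outputs even though the earlier tokens differ.
print(np.allclose(out_a[2:], out_b[2:]))       # True
```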

#### DNN to do the prediction

Here the DNN is customized to process one word at a time and to carry a hidden state from each word to the next. In effect this is a hand-written RNN.
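The recurrence used by this custom model can be sketched in plain NumPy before reading the Keras version. The weights below are random stand-ins and batch normalization is left out, so this only illustrates the data flow: add the current word's embedding to the hidden state, project it, and keep the state for every step.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, nh = 34, 4                         # nh: hidden-state size (toy value)

E = rng.normal(size=(vocab_size, nh))          # embedding table
W = rng.normal(size=(nh, nh))                  # projection weights
b = np.zeros(nh)                               # projection biases

def run_recurrence(token_ids):
    """Return the hidden state after each token of a single sequence."""
    h = np.zeros(nh)
    states = []
    for t in token_ids:
        h = np.maximum((h + E[t]) @ W + b, 0.0)    # add the new word, then project
        states.append(h)
    return np.stack(states)

states = run_recurrence([1, 2, 3, 4])
print(states.shape)                            # (4, 4)

# The state at step t depends only on tokens up to t: a shorter prefix
# reproduces the first rows exactly.
prefix = run_recurrence([1, 2])
print(np.allclose(states[:2], prefix))         # True
```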

In [28]:
from tensorflow.keras import layers
from tensorflow import keras

nh = 64  # hidden-state size; must match the units of projection_1 below

# Define a Functional model to do a softmax on final dense layer
inputs = keras.Input((bptt, nh))
outputs = layers.Dense(len(vocab.itos), activation="softmax")(inputs)
model = keras.Model(inputs, outputs)

class CustomRNN(tf.keras.Model):
    def __init__(self):
        super(CustomRNN, self).__init__()
        self.embedding1 = layers.Embedding(len(vocab.itos), nh, input_length=train_x.shape[1])
        self.projection_1 = layers.Dense(units=64, activation="relu")
        self.batchnormal = layers.BatchNormalization()
        # Our previously-defined Functional model
        self.classifier = model

    def call(self, inputs):
        # Start from a zero hidden state; use the dynamic batch size so that
        # batches of any size work.
        outputs = []
        h = tf.zeros(shape=(tf.shape(inputs)[0], nh))
        # Loop over the sequence, picking one word (time step) at a time
        for t in range(inputs.shape[1]):
            x = inputs[:, t]
            h = h + self.embedding1(x)
            h = self.batchnormal(self.projection_1(h))
            outputs.append(h)
        # Stack the per-step hidden states into (batch, bptt, nh)
        features = tf.stack(outputs, axis=1)
        return self.classifier(features)

rnn_model = CustomRNN()
rnn_model.predict(valid_x).shape


(14080, 20, 34)

In [0]:
adam = Adam(learning_rate=0.01)
rnn_model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

In [30]:
#### Train the model
#history = rnn_model.fit(train_x, train_y, epochs=20, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = rnn_model.fit(train_batch, epochs=20, verbose=1,validation_data=val_batch)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


#### See the results

In [0]:
validation_results = rnn_model.predict(val_batch)

In [32]:
validation_results[0].shape

(20, 34)

In [33]:
show_text(np.argmax(validation_results[20],axis=1))

'thousand nine hundred six hundred six thousand eight \n eight thousand eight \n eight thousand eight \n eight thousand ten'

In [34]:
#### Comparing the label and predictions
for i,word_idx in  enumerate(np.argmax(validation_results[20],axis=1)):
  print (f'Label: {repr(tokenizer(show_text(valid_label[20]))[i])} ---> Prediction: {repr(vocab.itos[word_idx])} ')

Label: '\n ' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'nine' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: 'hundred' 
Label: 'eight' ---> Prediction: 'six' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'nine' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'ten' 


In [35]:
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

As can be seen, performance improved quite a bit with the implementation of the custom RNN model. Let's try real RNN, GRU and LSTM models to see what happens.
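Comparing one sample by eye only goes so far. A minimal sketch of a per-token accuracy helper is shown below; `token_accuracy` is a hypothetical name and the probabilities are toy numbers, not model output.

```python
import numpy as np

def token_accuracy(probs, labels):
    """Fraction of positions where the arg-max class equals the label id."""
    predictions = probs.argmax(axis=-1)
    return float((predictions == labels).mean())

# Toy example: 1 sequence, 4 steps, 3 classes.
probs = np.array([[[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]]])
labels = np.array([[0, 1, 1, 0]])
print(token_accuracy(probs, labels))   # 0.75
```

With the arrays from this notebook, the same helper could presumably be called as `token_accuracy(validation_results, valid_label)`, since the predictions have shape `(14080, 20, 34)` and the labels `(14080, 20)`.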

#### Simple RNN

In [36]:
model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=(train_x.shape[1])))
model.add(Bidirectional(SimpleRNN(150,return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1,validation_data=val_batch)
#print model.summary()
print(model.summary())



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 20, 64)            2176      
_________________________________________________________________
bidirectional_1 (Bidirection (None, 20, 300)           64500     
_________________________________________________________________
dense_8 (Dense)              (None, 20, 34)            10234     
Total params: 76,910
Trainable params: 76,910
Non-trainable params: 0
_________________________________________________________________
None


#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [38]:
validation_results[0].shape

(20, 34)

In [39]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight hundred six \n eight thousand seven \n eight hundred eight \n eight hundred nine \n one thousand seven'

In [40]:
#### Comparing the label and predictions
for i,word_idx in  enumerate(np.argmax(validation_results[20],axis=1)):
  print (f'Label: {repr(tokenizer(show_text(valid_label[20]))[i])} ---> Prediction: {repr(vocab.itos[word_idx])} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'one' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'seven' 


In [41]:
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

As expected, performance is better with the bidirectional RNN: on this sample it gets 15 of 20 tokens right, against 13 of 20 for the custom RNN above.

#### GRU

In [42]:
model = Sequential()
model.add(Embedding(len(vocab.itos), output_dim=64, input_length=train_x.shape[1]))
model.add( Bidirectional(GRU(units=150,return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_6 (Embedding)      (None, 20, 64)            2176      
_________________________________________________________________
bidirectional_2 (Bidirection (None, 20, 300)           194400    
_________________________________________________________________
dense_9 (Dense)              (None, 20, 34)            10234     
Total params: 206,810
Trainable params: 206,810
Non-trainable params: 0
_________________________________________________________________
None


#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [44]:
validation_results[0].shape

(20, 34)

In [45]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight thousand six \n eight thousand seven \n eight thousand eight \n eight thousand nine \n eight thousand nine'

In [46]:
#### Comparing the label and predictions
for i,word_idx in  enumerate(np.argmax(validation_results[20],axis=1)):
  print (f'Label: {repr(tokenizer(show_text(valid_label[20]))[i])} ---> Prediction: {repr(vocab.itos[word_idx])} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'nine' 


In [47]:
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

Wow, the bidirectional GRU does very well: 19 of 20 tokens are right on this sample, and the only miss is the final token.

In [48]:
#### Let's try a stateful GRU (stateful layers need a fixed batch size, hence batch_input_shape)
model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=train_x.shape[1],batch_input_shape=(BATCH_SIZE,train_x.shape[1])))
model.add(Bidirectional(GRU(150,stateful=True,return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_7 (Embedding)      (64, 20, 64)              2176      
_________________________________________________________________
bidirectional_3 (Bidirection (64, 20, 300)             194400    
_________________________________________________________________
dense_10 (Dense)             (64, 20, 34)              10234     
Total params: 206,810
Trainable params: 206,810
Non-trainable params: 0
_________________________________________________________________
None


#### LSTM

Let's do one more experiment with an LSTM, trying both a unidirectional and a bidirectional network.

In [49]:
model = Sequential()
model.add(Embedding(len(vocab.itos), output_dim=64, input_length=train_x.shape[1]))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1,validation_data=val_batch)
print (model.summary())

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 20, 64)            2176      
_________________________________________________________________
lstm_1 (LSTM)                (None, 20, 64)            33024     
_________________________________________________________________
dense_11 (Dense)             (None, 20, 34)            2210      
Total params: 37,410
Trainable params: 37,410
Non-trainable params: 0
_________________________________________________________________
None
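The parameter counts of the recurrent layers in the summaries above follow from the usual per-gate formulas. The helper functions below are hypothetical names written for this check; the GRU variant with two bias vectors per gate is assumed because it is the one that reproduces the reported count.

```python
def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return input_dim * units + units * units + units

def gru_params(input_dim, units):
    # 3 gates, two bias vectors per gate (assumed layout, see lead-in)
    return 3 * (input_dim * units + units * units + 2 * units)

def lstm_params(input_dim, units):
    # 4 gates, one bias vector each
    return 4 * (input_dim * units + units * units + units)

emb_dim = 64
print(2 * simple_rnn_params(emb_dim, 150))   # 64500  (bidirectional SimpleRNN)
print(2 * gru_params(emb_dim, 150))          # 194400 (bidirectional GRU)
print(lstm_params(emb_dim, 64))              # 33024  (unidirectional LSTM)
```

A bidirectional wrapper holds two copies of the layer, one per direction, which is why the first two counts are doubled.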


In [50]:
#### Let's try the bidirectional LSTM (fixed batch size, not stateful)
model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=train_x.shape[1],batch_input_shape=(BATCH_SIZE,train_x.shape[1])))
model.add(Bidirectional( LSTM(units=64, return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_9 (Embedding)      (64, 20, 64)              2176      
_________________________________________________________________
bidirectional_4 (Bidirection (64, 20, 128)             66048     
_________________________________________________________________
dense_12 (Dense)             (64, 20, 34)              4386      
Total params: 72,610
Trainable params: 72,610
Non-trainable params: 0
_________________________________________________________________
None


#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [52]:
validation_results[0].shape

(20, 34)

In [53]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight thousand six \n eight thousand seven \n eight thousand eight \n eight thousand nine \n eight thousand nine'

In [54]:
#### Comparing the label and predictions
for i,word_idx in  enumerate(np.argmax(validation_results[20],axis=1)):
  print (f'Label: {repr(tokenizer(show_text(valid_label[20]))[i])} ---> Prediction: {repr(vocab.itos[word_idx])} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'nine' 


In [55]:
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

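Wow, bidirectional is the solution here. The bidirectional LSTM matches the bidirectional GRU on this sample: 19 of 20 tokens are right, and the only miss is the final token. The unidirectional network is not optimal.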
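One caveat is worth keeping in mind when reading the bidirectional results. If the labels are the inputs shifted by one position (an assumption about the `HumanNumbers` loader, not something shown above), then the backward direction of a bidirectional layer can read each label directly from the input window. The toy sketch below uses no neural network at all; it only shows that "peeking" one step ahead gets every position right except the last one, where there is no future token to look at.

```python
import numpy as np

tokens = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5])   # arbitrary token ids
x, y = tokens[:-1], tokens[1:]                    # inputs and next-token labels

# A "model" that peeks one step ahead in the input window, as the backward
# direction of a bidirectional layer is able to do.
peek = np.empty_like(x)
peek[:-1] = x[1:]      # copy the next input token
peek[-1] = x[-1]       # nothing to peek at for the last position: guess

correct = peek == y
print(correct[:-1].all())   # True: every position except the last is "predicted"
print(correct[-1])          # False: the last position has no future context
```

Under that assumption, a unidirectional model is the fairer test of next-word prediction, because it can use only the words it has already seen.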