### Predict the next number in the sequence

Given a sequence of numbers written out in words, the goal of the model is to predict the next number in the sequence.

For example, the model can be given an input such as: eight thousand one, eight thousand two, eight thousand three, eight thousand four, eight thousand five, eight thousand six, eight thousand seven, eight thousand eight, eight thousand nine, eight thousand ten, eight thousand eleven, eight thousand twelve, ...

The model predicts the next word at every position of the input (multi-step prediction), not only after the last word. So if a sequence of 20 words (tokens) is given to the model, it returns 20 predictions, i.e. the next word after each input word.
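A minimal sketch of how such input/target pairs can be built, assuming (as the description above implies) that the target window is the input window shifted one word to the left. The function `make_windows` and the toy token list are illustrative only; they are not taken from the notebook's data preparation module.

```python
def make_windows(tokens, bptt):
    """Split a token list into non-overlapping (input, target) windows of length bptt.

    The target at position t is the token that follows input position t,
    so each window of bptt inputs comes with bptt next-word targets.
    """
    pairs = []
    # Stop early enough that every window still has a full target,
    # which needs one token beyond the end of the input window.
    for start in range(0, len(tokens) - bptt, bptt):
        x = tokens[start:start + bptt]
        y = tokens[start + 1:start + bptt + 1]
        pairs.append((x, y))
    return pairs


tokens = ["eight", "thousand", "one", "\n",
          "eight", "thousand", "two", "\n",
          "eight", "thousand", "three"]

for x, y in make_windows(tokens, bptt=4):
    print(x, "->", y)
```

With `bptt=20`, as in the notebook, each window of 20 input words yields 20 next-word targets.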



In [0]:
#### Make sure that the right version of torchtext is installed
!pip install torchtext==0.6.0
import torchtext
print(torchtext.__version__)

0.6.0


In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
#drive.flush_and_unmount()

In [0]:
#### Set random seeds to make Keras results more reproducible
import numpy as np
import tensorflow as tf
import random as python_random

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
python_random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)


In [0]:
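#### Add the folder with the data preparation Python module to the import path.
#### The cells below use vocab, train_x, valid_x, valid_label, train_batch, val_batch,
#### BATCH_SIZE, bptt, tokenizer and show_text, which are not defined in this section;
#### presumably they come from that data preparation code.
import sys
import os
sys.path.append('/content/drive/My Drive/Colab Notebooks/torch_pipe/')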

In [0]:
os.getcwd()

'/content'

#### Simple DNN to do the prediction

In [0]:
from tensorflow import keras

model = keras.Sequential(
    [
        keras.layers.Embedding(len(vocab.itos), 100, input_length=train_x.shape[1]),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(len(vocab.itos), activation="softmax"),
        #keras.layers.Lambda(lambda x: x[:,-1])
    ]
)
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 20, 100)           3400      
_________________________________________________________________
dense_2 (Dense)              (None, 20, 64)            6464      
_________________________________________________________________
batch_normalization_1 (Batch (None, 20, 64)            256       
_________________________________________________________________
dense_3 (Dense)              (None, 20, 34)            2210      
Total params: 12,330
Trainable params: 12,202
Non-trainable params: 128
_________________________________________________________________


In [0]:
from tensorflow.keras.optimizers import Adam

adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

In [0]:
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1,validation_data=val_batch)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


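This sequential network didn't produce a good result. That was expected: when `Dense` is applied to a 3-D input of shape (batch, 20, features), it acts on each time step independently with the same weights, so the prediction at each position depends only on the current word and not on the words before it. Given only "thousand", for instance, the model cannot know which number comes next. A plain DNN is therefore not the right way to predict words 2 through 21 from a sequence of 20 words. Let's customize the DNN to process the sequence of 20 words step by step.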
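The position-wise behaviour can be seen in a small NumPy sketch that uses random weights in place of the trained ones (an assumption; only the structure matters here). An embedding lookup followed by Dense layers acting on the last axis uses the same weights at every time step and never mixes time steps, so changing the first word cannot change the predictions at later positions. BatchNormalization is left out; at inference time it applies a fixed per-feature transform, which is also position-wise.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, hidden, seq_len = 34, 100, 64, 20

# Random stand-ins for the trained Embedding and Dense weights.
E = rng.normal(size=(vocab_size, emb_dim))
W1, b1 = rng.normal(size=(emb_dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, vocab_size)), np.zeros(vocab_size)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dense_model(token_ids):
    """Embedding -> Dense(relu) -> Dense(softmax), acting on the last axis only."""
    h = E[token_ids]                        # (batch, time, emb_dim)
    h = np.maximum(h @ W1 + b1, 0.0)        # same weights at every time step
    return softmax(h @ W2 + b2)             # (batch, time, vocab_size)

x = rng.integers(0, vocab_size, size=(1, seq_len))
x_changed = x.copy()
x_changed[0, 0] = (x[0, 0] + 1) % vocab_size   # change only the first word

p, p_changed = dense_model(x), dense_model(x_changed)
print(p.shape)                                   # (1, 20, 34)
# Positions 1..19 are unaffected by the change at position 0.
print(np.allclose(p[:, 1:], p_changed[:, 1:]))   # True
```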

#### Custom DNN that processes the sequence one word at a time

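Here the DNN is customized to process the sequence one word at a time: a hidden state `h` starts at zero, the embedding of each word is added to it, and the sum goes through a Dense + BatchNormalization layer before the next word is read. Because `h` carries information from earlier words forward, this is essentially a hand-written RNN.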
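Stripped of Keras, the loop in `CustomRNN.call` is the recurrence h_t = relu((h_{t-1} + E[x_t]) W + b) with h_0 = 0. The NumPy sketch below assumes random weights and omits BatchNormalization. It shows the two properties that the Dense-only model lacks or has: the states are causal (the first 5 states do not depend on later words), and, unlike in the position-wise Dense model, a change to the first word is carried forward to later steps through `h`.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, nh, seq_len = 34, 64, 20

# Random stand-ins for the trained Embedding(vocab_size, nh) and Dense(nh, relu) weights.
E = rng.normal(size=(vocab_size, nh))
W = rng.normal(size=(nh, nh)) / np.sqrt(nh)
b = np.zeros(nh)

def custom_rnn_states(token_ids):
    """Return the hidden state after each word, shape (batch, time, nh)."""
    batch, steps = token_ids.shape
    h = np.zeros((batch, nh))              # h_0 = 0
    states = []
    for t in range(steps):
        h = h + E[token_ids[:, t]]         # add the embedding of word t
        h = np.maximum(h @ W + b, 0.0)     # Dense(relu); BatchNormalization omitted
        states.append(h)
    return np.stack(states, axis=1)

x = rng.integers(0, vocab_size, size=(2, seq_len))
s = custom_rnn_states(x)
print(s.shape)  # (2, 20, 64)

# Causal: the states for the first 5 words do not depend on later words.
print(np.allclose(custom_rnn_states(x[:, :5]), s[:, :5]))  # True

# Later states do depend on earlier words: change only the first word.
x_changed = x.copy()
x_changed[:, 0] = (x[:, 0] + 1) % vocab_size
diff = np.abs(custom_rnn_states(x_changed) - s).max(axis=(0, 2))
print(diff)  # size of the change at each step, carried forward through h
```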

In [0]:
from tensorflow.keras import layers
from tensorflow import keras

# Hidden size. It is also the embedding size, and it must match the units of
# projection_1 because the hidden state is added to each word embedding.
nh = 64

# Functional model that applies a softmax Dense layer at every time step
inputs = keras.Input((bptt, nh))
outputs = layers.Dense(len(vocab.itos), activation="softmax")(inputs)
classifier = keras.Model(inputs, outputs)

class CustomRNN(keras.Model):
    def __init__(self):
        super().__init__()
        self.embedding1 = layers.Embedding(len(vocab.itos), nh, input_length=train_x.shape[1])
        self.projection_1 = layers.Dense(units=nh, activation="relu")
        self.batchnormal = layers.BatchNormalization()
        # Our previously defined Functional model
        self.classifier = classifier

    def call(self, inputs):
        outputs = []
        # Initialize the hidden state with zeros; tf.shape also works when the
        # batch dimension is unknown (None) while the graph is being built
        h = tf.zeros((tf.shape(inputs)[0], nh))
        # Loop over the sequence, picking one word at a time
        for t in range(inputs.shape[1]):
            x = inputs[:, t]
            h = h + self.embedding1(x)
            h = self.batchnormal(self.projection_1(h))
            outputs.append(h)
        features = tf.stack(outputs, axis=1)  # (batch_size, bptt, nh)
        return self.classifier(features)

rnn_model = CustomRNN()
rnn_model.predict(valid_x).shape


(14080, 20, 34)

In [0]:
adam = Adam(learning_rate=0.01)
rnn_model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

In [0]:
#### Train the model
#history = rnn_model.fit(train_x, train_y, epochs=20, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = rnn_model.fit(train_batch, epochs=20, verbose=1,validation_data=val_batch)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


#### See the results

In [0]:
validation_results = rnn_model.predict(val_batch)

In [0]:
validation_results[0].shape

(20, 34)

In [0]:
show_text(np.argmax(validation_results[20],axis=1))

'thousand one hundred two hundred five thousand seven \n eight thousand eight \n five \n nine \n five thousand ten'

In [0]:
#### Compare the labels with the predictions for validation sample 20
label_tokens = tokenizer(show_text(valid_label[20]))
for label, word_idx in zip(label_tokens, np.argmax(validation_results[20], axis=1)):
    print(f'Label: {label!r} ---> Prediction: {vocab.itos[word_idx]!r} ')

Label: '\n ' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'one' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'six' ---> Prediction: 'two' 
Label: '\n ' ---> Prediction: 'hundred' 
Label: 'eight' ---> Prediction: 'five' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'five' 
Label: 'thousand' ---> Prediction: '\n' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'five' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'ten' 


In [0]:
# For reference: the label sequence of validation sample 0 (the comparison above is for sample 20)
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

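As can be seen, performance improved quite a bit with the custom RNN model: on sample 20 it gets about half of the words right, and most of its mistakes are in the first six positions, where it has seen little context. Let's try the built-in Keras RNN, GRU and LSTM layers to see what happens.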
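Reading a single sample is a noisy way to compare models. A small helper such as the hypothetical `token_accuracy` below averages over the whole validation set and also reports accuracy per position. It assumes predictions of shape `(samples, 20, vocab_size)`, as returned by `predict`, and integer label ids of shape `(samples, 20)`; if the labels are one-hot (as `categorical_crossentropy` suggests), take `np.argmax(labels, axis=-1)` first.

```python
import numpy as np

def token_accuracy(probs, labels):
    """Return (overall accuracy, accuracy at each time step).

    probs:  array of shape (samples, time, vocab_size), e.g. model.predict output
    labels: integer word ids of shape (samples, time)
    """
    preds = np.argmax(probs, axis=-1)    # (samples, time)
    correct = preds == labels            # boolean matrix
    return correct.mean(), correct.mean(axis=0)

# Toy check with 2 samples, 3 time steps and a 4-word vocabulary.
labels = np.array([[0, 1, 2],
                   [3, 1, 0]])
probs = np.eye(4)[np.array([[0, 1, 3],   # last word wrong
                            [3, 2, 0]])] # middle word wrong
overall, per_step = token_accuracy(probs, labels)
print(overall)   # 4 of 6 words correct
print(per_step)  # per-step accuracy: 1.0, 0.5, 0.5
```

Per-position accuracy makes it easy to see whether a model struggles at the start of the window (little context) or at the end.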

#### Simple RNN

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, Dense, Embedding, GRU, LSTM, SimpleRNN
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=train_x.shape[1]))
model.add(Bidirectional(SimpleRNN(150, return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1, validation_data=val_batch)
print(model.summary())



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_19"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_23 (Embedding)     (None, 20, 64)            2176      
_________________________________________________________________
bidirectional_5 (Bidirection (None, 20, 300)           64500     
_________________________________________________________________
dense_21 (Dense)             (None, 20, 34)            10234     
Total params: 76,910
Trainable params: 76,910
Non-trainable params: 0
_________________________________________________________________
None



#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [0]:
validation_results[0].shape

(20, 34)

In [0]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight hundred three \n four hundred seven \n four hundred eight \n eight hundred nine \n four thousand three'

In [0]:
#### Compare the labels with the predictions for validation sample 20
label_tokens = tokenizer(show_text(valid_label[20]))
for label, word_idx in zip(label_tokens, np.argmax(validation_results[20], axis=1)):
    print(f'Label: {label!r} ---> Prediction: {vocab.itos[word_idx]!r} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'six' ---> Prediction: 'three' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'four' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'four' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'hundred' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'four' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'three' 


In [0]:
# For reference: the label sequence of validation sample 0 (the comparison above is for sample 20)
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

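The bidirectional RNN captures the structure of the sequence better: on sample 20 it predicts every "\n" separator correctly and three of the five final digits, although it predicts "hundred" instead of "thousand" four times out of five.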

#### GRU

In [0]:
model = Sequential()
model.add(Embedding(len(vocab.itos), output_dim=64, input_length=train_x.shape[1]))
model.add(Bidirectional(GRU(units=150, return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_21"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_25 (Embedding)     (None, 20, 64)            2176      
_________________________________________________________________
bidirectional_7 (Bidirection (None, 20, 300)           194400    
_________________________________________________________________
dense_23 (Dense)             (None, 20, 34)            10234     
Total params: 206,810
Trainable params: 206,810
Non-trainable params: 0
_________________________________________________________________
None


#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [0]:
validation_results[0].shape

(20, 34)

In [0]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight thousand six \n eight thousand seven \n eight thousand eight \n eight thousand nine \n eight \n nine'

In [0]:
#### Compare the labels with the predictions for validation sample 20
label_tokens = tokenizer(show_text(valid_label[20]))
for label, word_idx in zip(label_tokens, np.argmax(validation_results[20], axis=1)):
    print(f'Label: {label!r} ---> Prediction: {vocab.itos[word_idx]!r} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: '\n' 
Label: 'ten' ---> Prediction: 'nine' 


In [0]:
# For reference: the label sequence of validation sample 0 (the comparison above is for sample 20)
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

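The bidirectional GRU gets every word of sample 20 right except the last two. Before concluding that bidirectionality is the solution, note a likely catch: each target word is simply the next input word, and the backward direction of a `Bidirectional` layer reads the window from the end, so at position t it has already seen input t+1, the very word it has to predict. That is consistent with the errors being concentrated at the end of the window, where little or no "future" input is left. In other words, the bidirectional models largely copy the next input rather than predict an unseen next number, and they could not extend a sequence past its last word.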
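One caveat about bidirectional layers in this setup can be checked without any model. Assuming, as described at the top of the notebook, that the target window is the input window shifted by one word, the target at position t is simply input t+1, which the backward direction of a `Bidirectional` layer has already read. The hypothetical `peek_ahead_predictor` below only copies the next input word; on a toy 20-word window it is right everywhere except the final position, the one place where no future input exists.

```python
def peek_ahead_predictor(x):
    """'Predict' position t by reading input t+1; nothing is available for the last step."""
    return [x[t + 1] if t + 1 < len(x) else None for t in range(len(x))]

tokens = ["\n", "eight", "thousand", "six", "\n", "eight", "thousand", "seven",
          "\n", "eight", "thousand", "eight", "\n", "eight", "thousand", "nine",
          "\n", "eight", "thousand", "ten", "\n"]
x, y = tokens[:-1], tokens[1:]      # 20 inputs and their 20 next-word targets

preds = peek_ahead_predictor(x)
correct = [p == t for p, t in zip(preds, y)]
print(sum(correct), "of", len(y))   # 19 of 20
print(correct[-1])                  # False: the last target is never visible
```

A fair comparison between architectures would therefore use unidirectional layers, or score only what a model can predict without seeing the answer.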

In [0]:
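#### Let's try a stateful bidirectional GRU. With stateful=True, the last state of the sample
#### at index i in a batch becomes the initial state of the sample at index i in the next
#### batch, so the batch size must be fixed via batch_input_shape.
model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=train_x.shape[1], batch_input_shape=(BATCH_SIZE, train_x.shape[1])))
model.add(Bidirectional(GRU(150, stateful=True, return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)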
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_22"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_26 (Embedding)     (64, 20, 64)              2176      
_________________________________________________________________
bidirectional_8 (Bidirection (64, 20, 300)             194400    
_________________________________________________________________
dense_24 (Dense)             (64, 20, 34)              10234     
Total params: 206,810
Trainable params: 206,810
Non-trainable params: 0
_________________________________________________________________
None
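Why carrying state over can help is easiest to see with a plain tanh RNN in NumPy (random weights, a deliberately simpler stand-in for the GRU above). Processing a long sequence in two consecutive chunks while carrying the state gives exactly the same final state as processing it in one go, which is what a stateful layer relies on; starting the second chunk from zeros, as a non-stateful layer does, generally gives a different state. This only pays off if sample i of one batch really continues sample i of the previous batch, and states are usually reset between epochs (in TF 2.x Keras, with `model.reset_states()`); the cell above does neither explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 8, 16
Wx = rng.normal(size=(n_in, n_h)) / np.sqrt(n_in)
Wh = rng.normal(size=(n_h, n_h)) / np.sqrt(n_h)

def run_rnn(xs, h):
    """Run a tanh RNN over xs of shape (time, n_in) starting from state h; return the final h."""
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh)
    return h

seq = rng.normal(size=(40, n_in))
h0 = np.zeros(n_h)

full = run_rnn(seq, h0)                               # whole sequence at once
carried = run_rnn(seq[20:], run_rnn(seq[:20], h0))    # two chunks, state carried (stateful)
reset = run_rnn(seq[20:], h0)                         # second chunk from zeros (not stateful)

print(np.allclose(full, carried))   # True
print(np.abs(full - reset).max())   # how far the non-stateful state drifts
```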


#### LSTM

Let's do one more experiment with LSTM layers, trying both a unidirectional and a bidirectional network.

In [0]:
model = Sequential()
model.add(Embedding(len(vocab.itos), output_dim=64, input_length=train_x.shape[1]))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=10, verbose=1,validation_data=val_batch)
print (model.summary())

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Model: "sequential_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_27 (Embedding)     (None, 20, 64)            2176      
_________________________________________________________________
lstm (LSTM)                  (None, 20, 64)            33024     
_________________________________________________________________
dense_25 (Dense)             (None, 20, 34)            2210      
Total params: 37,410
Trainable params: 37,410
Non-trainable params: 0
_________________________________________________________________
None
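The parameter counts in these summaries can be checked by hand. Per direction, with input width d and u units, a SimpleRNN has u(d+u+1) weights and an LSTM, with four gates, has 4u(d+u+1). A Keras GRU has three gates and, with `reset_after=True` (the TF2 default), two bias vectors per gate, giving 3u(d+u+2). `Bidirectional` doubles the count, and a Dense layer has (inputs+1)·outputs. The helper names below are illustrative.

```python
def simple_rnn_params(d, u):
    return u * (d + u + 1)

def lstm_params(d, u):
    return 4 * u * (d + u + 1)

def gru_params(d, u, reset_after=True):
    # reset_after=True keeps separate input and recurrent biases for each gate.
    return 3 * u * (d + u + (2 if reset_after else 1))

def dense_params(n_in, n_out):
    return (n_in + 1) * n_out

vocab_size, emb_dim = 34, 64
print(2 * simple_rnn_params(emb_dim, 150))  # 64500  (bidirectional SimpleRNN)
print(2 * gru_params(emb_dim, 150))         # 194400 (bidirectional GRU)
print(lstm_params(emb_dim, 64))             # 33024  (unidirectional LSTM)
print(2 * lstm_params(emb_dim, 64))         # 66048  (bidirectional LSTM)
print(dense_params(2 * 150, vocab_size))    # 10234  (softmax layer after 300 features)
```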


In [0]:
#### Bidirectional LSTM (fixed batch size via batch_input_shape; unlike the GRU above, not stateful)
model = Sequential()
model.add(Embedding(len(vocab.itos), 64, input_length=train_x.shape[1], batch_input_shape=(BATCH_SIZE, train_x.shape[1])))
model.add(Bidirectional(LSTM(units=64, return_sequences=True)))
model.add(Dense(len(vocab.itos), activation='softmax'))

### Compile the model
adam = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['categorical_accuracy'])

#earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
#history = model.fit(train_x, train_y, epochs=10, batch_size=64, verbose=1,validation_data=(valid_x,valid_y))
history = model.fit(train_batch, epochs=5, verbose=1,validation_data=val_batch)
print (model.summary())

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Model: "sequential_24"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_28 (Embedding)     (64, 20, 64)              2176      
_________________________________________________________________
bidirectional_9 (Bidirection (64, 20, 128)             66048     
_________________________________________________________________
dense_26 (Dense)             (64, 20, 34)              4386      
Total params: 72,610
Trainable params: 72,610
Non-trainable params: 0
_________________________________________________________________
None


#### See the results

In [0]:
validation_results = model.predict(val_batch)

In [0]:
validation_results[0].shape

(20, 34)

In [0]:
show_text(np.argmax(validation_results[20],axis=1))

'\n eight thousand six \n eight thousand seven \n eight thousand eight \n eight thousand nine \n eight thousand nine'

In [0]:
#### Compare the labels with the predictions for validation sample 20
label_tokens = tokenizer(show_text(valid_label[20]))
for label, word_idx in zip(label_tokens, np.argmax(validation_results[20], axis=1)):
    print(f'Label: {label!r} ---> Prediction: {vocab.itos[word_idx]!r} ')

Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'six' ---> Prediction: 'six' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'seven' ---> Prediction: 'seven' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'eight' ---> Prediction: 'eight' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'nine' ---> Prediction: 'nine' 
Label: '\n ' ---> Prediction: '\n' 
Label: 'eight' ---> Prediction: 'eight' 
Label: 'thousand' ---> Prediction: 'thousand' 
Label: 'ten' ---> Prediction: 'nine' 


In [0]:
# For reference: the label sequence of validation sample 0 (the comparison above is for sample 20)
show_text(valid_label[0])

'\n eight thousand one \n eight thousand two \n eight thousand three \n eight thousand four \n eight thousand five'

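The bidirectional LSTM gets every word of sample 20 right except the last one ("nine" instead of "ten"). As with the bidirectional GRU, this is most likely because the backward direction has already read the next input word, which is the target, at every position except the last. The unidirectional LSTM, the only LSTM here that cannot look ahead, is not evaluated above, so its predictions would need to be compared before concluding that unidirectional models are not optimal for this task.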