# Code 5-38 to Code 5-51 and Code 5-65

This is Google Colab code. Please run it in Google Colab with a GPU runtime. Colaboratory (https://colab.research.google.com/notebooks/) is a Google Research project created to help disseminate machine learning education and research. It is a Jupyter notebook environment that requires no setup and runs entirely in the cloud. You can get started by setting up a Colab Jupyter notebook.

In [5]:
import pandas as pd

Mount Google Drive. Make sure the files are in Google Drive before you mount it.

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Load the files prepared in 2_Letsgo_preprocess:
1. Template file - dict_templ.pkl
2. Preprocessed file for training - lets_go_model_set1.csv

In [7]:
import pickle
root_dir = "/content/drive/My Drive/Newbounty_resync/Apress/Chapters/Chapter5"
base_fl = root_dir + '/dict_templ.pkl'
# Load the template dictionary; the with-block closes the file afterwards
with open(base_fl, 'rb') as pkl_file:
    df_sents = pickle.load(pkl_file)

In [8]:
base_fl_csv = root_dir + '/lets_go_model_set1.csv'
t1 = pd.read_csv(base_fl_csv)

In [9]:
t1.head()

Unnamed: 0.1,Unnamed: 0,user_id,corrected_cust,corrected_bot1_shift,bots_templ_list_shift
0,0,2061123000,place_name at place_name time is it is the...,leaving from placename sep_sent is this co...,templ_473 templ_1190
1,1,2061123000,place_name,leaving from placename sep_sent is this co...,templ_473 templ_1190
2,2,2061123000,yes,leaving from placename sep_sent is this co...,templ_473 templ_1190 templ_211
3,3,2061123000,place_name place_name of num_th place_name,going to numth placename sep_sent is this ...,templ_902 templ_1190
4,4,2061123000,yes,going to numth placename sep_sent is this ...,templ_902 templ_1190 templ_1055


We add “start” and “end” tags to the bot template sequences. They mark where each sequence begins and ends, and they introduce the one-step lag between the decoder input and the decoder target.

In [10]:
t1["bots_templ_list_shift"] = 'start ' + t1["bots_templ_list_shift"] + ' end'
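To make the lag concrete, here is a minimal pure-Python sketch of the idea behind the tags. The helper `make_teacher_forcing_pair` is hypothetical and works on one unpadded sequence; the notebook itself keeps full-length padded arrays and produces the shifted target later with `np.roll`.

```python
def make_teacher_forcing_pair(templates):
    """Wrap a template string in start/end tags and return (decoder_input, target).

    Hypothetical helper for illustration: the target at step t is the
    decoder input at step t + 1, so the model learns to predict the next token.
    """
    tokens = ("start " + templates + " end").split()
    decoder_input = tokens[:-1]  # everything except the final 'end'
    target = tokens[1:]          # everything except the leading 'start'
    return decoder_input, target


dec_in, target = make_teacher_forcing_pair("templ_473 templ_1190")
print(dec_in)  # ['start', 'templ_473', 'templ_1190']
print(target)  # ['templ_473', 'templ_1190', 'end']
```

The “start” token gives the decoder something to condition on before any template has been produced, and “end” gives it a way to signal that the response is complete.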

We create two tokenizers: one for the encoder (customer text) and one for the decoder (bot templates).

In [11]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer()
tokenizer1 = Tokenizer()

In [12]:
en_col_tr = list(t1["corrected_cust"].str.split())
de_col_tr = list(t1["bots_templ_list_shift"].str.split())


In [13]:
de_col_tr[0]

['start', 'templ_473', 'templ_1190', 'end']

We now tokenize the customer text and the bot text (encoder input and decoder input, respectively) and pad each set to the maximum sequence length in its corpus.

In [14]:
tokenizer.fit_on_texts(en_col_tr)
en_tr1 = tokenizer.texts_to_sequences(en_col_tr)
tokenizer1.fit_on_texts(de_col_tr)
de_tr1 = tokenizer1.texts_to_sequences(de_col_tr)
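The `fit_on_texts` / `texts_to_sequences` calls above can be approximated in plain Python as below. This is a simplified re-implementation for illustration (`build_word_index` and `to_sequences` are hypothetical names, and lowercasing is skipped), not the Keras code: like the Keras `Tokenizer`, it ranks words by frequency and assigns ids starting at 1, leaving 0 free for padding. Passing pre-split token lists, as the notebook does, also means the Tokenizer's default `filters` (which include the underscore) are not applied, so tokens such as `templ_473` and `place_name` stay intact.

```python
from collections import Counter


def build_word_index(token_lists):
    """Map each token to an integer id, most frequent first, starting at 1.

    Simplified sketch of Tokenizer.fit_on_texts for pre-split input;
    id 0 is left unused so it can serve as the padding value.
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    # sorted() is stable, so ties keep first-seen order (Counter keeps insertion order)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


def to_sequences(token_lists, word_index):
    """Replace each token by its id, dropping tokens that are not in the index."""
    return [[word_index[t] for t in tokens if t in word_index] for tokens in token_lists]


docs = [["start", "templ_473", "templ_1190", "end"],
        ["start", "templ_902", "templ_1190", "end"]]
word_index = build_word_index(docs)
print(word_index)
print(to_sequences(docs, word_index))  # [[1, 4, 2, 3], [1, 5, 2, 3]]
```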


In [15]:
def get_max_len(list1):
    len_list = [len(i) for i in list1]
    return max(len_list)

In [16]:
max_len1 = get_max_len(en_tr1)
max_len2 = get_max_len(de_tr1)

en_tr2 = pad_sequences(en_tr1, maxlen=max_len1, dtype='int32', padding='post')
de_tr2 = pad_sequences(de_tr1, maxlen=max_len2, dtype='int32', padding='post')
de_tr2.shape,en_tr2.shape,max_len1,max_len2

((7760, 27), (7760, 28), 28, 27)
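Because `maxlen` is set to the longest sequence, `pad_sequences` never truncates here; with `padding='post'` it only appends zeros after each sequence, which is why the encoder array is 28 steps wide and the decoder array 27. A minimal pure-Python equivalent that combines `get_max_len` and the padding step (the helper `pad_post` is hypothetical and does not implement truncation):

```python
def pad_post(sequences, maxlen=None, value=0):
    """Right-pad each sequence with `value` up to `maxlen` (default: the longest)."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for s in sequences:
        if len(s) > maxlen:
            raise ValueError("sequence longer than maxlen; truncation is not handled in this sketch")
        padded.append(list(s) + [value] * (maxlen - len(s)))
    return padded


seqs = [[1, 4, 2, 3], [5, 2], [7]]
print(pad_post(seqs))  # [[1, 4, 2, 3], [5, 2, 0, 0], [7, 0, 0, 0]]
```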

The encoder and decoder inputs are two-dimensional (samples × timesteps). Because we feed them directly to the LSTM without an embedding layer, we need three-dimensional arrays (samples × timesteps × features), so we convert each word index into a one-hot vector.

In [17]:
from keras.utils import to_categorical
en_tr3 = to_categorical(en_tr2)
de_tr3 = to_categorical(de_tr2)
en_tr3.shape, de_tr3.shape


((7760, 28, 733), (7760, 27, 1506))
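What `to_categorical` does to a padded 2D array can be sketched in NumPy by indexing an identity matrix (the helper `one_hot_3d` is hypothetical). When `num_classes` is not given, the number of classes is inferred as the largest id + 1, which is why the last dimensions are 733 and 1506: each vocabulary size plus the padding id 0. Note that padding positions become a one-hot vector on column 0 rather than all zeros.

```python
import numpy as np


def one_hot_3d(padded, num_classes=None):
    """Turn an integer array of shape (samples, timesteps) into
    a one-hot array of shape (samples, timesteps, num_classes)."""
    padded = np.asarray(padded, dtype=np.int64)
    if num_classes is None:
        num_classes = int(padded.max()) + 1  # same default as to_categorical
    # Indexing the identity matrix with the ids picks one row per token
    return np.eye(num_classes, dtype=np.float32)[padded]


x = one_hot_3d([[1, 4, 2, 3], [5, 2, 0, 0]])
print(x.shape)  # (2, 4, 6)
```

One-hot inputs grow as samples × timesteps × vocabulary: the decoder array here holds 7760 × 27 × 1506 ≈ 3.2 × 10^8 values, about 1.26 GB as float32 (the `to_categorical` default dtype).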

The arrays are now three-dimensional. So far we have defined only the inputs; we now need to define the outputs. The target of the model is the decoder input shifted by one timestep (t + 1): given the previous word, we want to predict the next word.

In [18]:
import numpy as np
de_target3 = np.roll(de_tr3, -1,axis=1)
de_target3[:,-1,:]=0
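A small NumPy check of what the two lines above produce (the helper name `shift_targets` and the toy ids are assumptions for illustration). `np.roll` moves every timestep one place to the left and wraps the first step around to the end, so the last step is zeroed; an all-zero target adds nothing to the categorical cross-entropy at that position. `np.roll` returns a new array, so the decoder input is left unchanged.

```python
import numpy as np


def shift_targets(decoder_onehot):
    """Build teacher-forcing targets so that target[:, t] == input[:, t + 1]."""
    target = np.roll(decoder_onehot, -1, axis=1)  # returns a copy
    target[:, -1, :] = 0                          # drop the wrapped-around 'start' step
    return target


# One sample, 4 timesteps, vocabulary of 5 ids (0 = padding); ids are hypothetical
ids = np.array([[1, 3, 4, 2]])  # start, templ_a, templ_b, end
dec_in = np.eye(5, dtype=np.float32)[ids]
tgt = shift_targets(dec_in)
print(tgt.argmax(axis=-1))  # [[3 4 2 0]]
```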

Store the number of encoder and decoder tokens (the sizes of the one-hot vectors) to define the model inputs.

In [19]:
num_encoder_tokens = en_tr3.shape[2]
num_decoder_tokens = de_tr3.shape[2]

In [20]:
max_len1,num_decoder_tokens,num_encoder_tokens

(28, 1506, 733)

In [21]:
num_encoder_tokens,num_decoder_tokens,en_tr3.shape[2]

(733, 1506, 733)

In [22]:
import tensorflow as tf
tf.test.gpu_device_name()


'/device:GPU:0'

In the encoder-decoder architecture with teacher forcing, the training and inference steps differ. The code is derived from this article: https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html. First, we look at the training setup. The encoder is an LSTM layer with 100 hidden units that reads the encoder input (customer text) one timestep at a time. We keep the encoder's final hidden state and cell state and discard its per-timestep outputs. The decoder reads the bot text (decoder input) and uses the encoder states as its initial state; because those states have 100 dimensions, the decoder LSTM also has 100 hidden units. The decoder output at each timestep is passed to a dense softmax layer. The network is trained on the time-shifted decoder input: the encoder states and the decoder input at step t - 1 are used to predict the bot template at step t.

In [23]:
from keras.models import Model
from keras.layers import Input, LSTM, Dense

encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(100, dropout=.2,return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))

decoder_lstm = LSTM(100, return_sequences=True, return_state=True,dropout=0.2)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
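To see what `initial_state=encoder_states` does numerically, here is a minimal NumPy sketch of the same data flow with tiny, randomly initialised weights. All sizes and helper names are illustrative assumptions; nothing is trained, dropout is omitted, and the dense layer has no bias. It uses the standard LSTM equations with the gates stacked in the i, f, c, o layout that Keras uses for its kernels.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_lstm(input_dim, units):
    """Random weights for one LSTM layer, gates stacked as (i, f, c, o)."""
    return {
        "W": rng.normal(0.0, 0.1, (input_dim, 4 * units)),  # input kernel
        "U": rng.normal(0.0, 0.1, (units, 4 * units)),      # recurrent kernel
        "b": np.zeros(4 * units),
    }


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def run_lstm(params, x, h, c):
    """Run an LSTM over x of shape (batch, timesteps, input_dim).

    Returns the output at every step plus the final (h, c), similar to
    LSTM(return_sequences=True, return_state=True).
    """
    units = h.shape[1]
    outputs = []
    for t in range(x.shape[1]):
        z = x[:, t] @ params["W"] + h @ params["U"] + params["b"]
        i = sigmoid(z[:, :units])                  # input gate
        f = sigmoid(z[:, units:2 * units])         # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])     # candidate cell state
        o = sigmoid(z[:, 3 * units:])              # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), h, c


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


# Toy sizes (assumptions): batch 2, vocabularies of 7 and 9 ids, 4 hidden units
batch, enc_vocab, dec_vocab, units = 2, 7, 9, 4
enc_x = np.eye(enc_vocab)[rng.integers(0, enc_vocab, size=(batch, 5))]  # 5 encoder steps
dec_x = np.eye(dec_vocab)[rng.integers(0, dec_vocab, size=(batch, 3))]  # 3 decoder steps

enc = init_lstm(enc_vocab, units)
dec = init_lstm(dec_vocab, units)
W_dense = rng.normal(0.0, 0.1, (units, dec_vocab))

# Encoder: start from zero states, keep only the final (h, c)
zeros = np.zeros((batch, units))
_, state_h, state_c = run_lstm(enc, enc_x, zeros, zeros)

# Decoder: initial state = encoder states; each step's output goes through the softmax layer
dec_out, _, _ = run_lstm(dec, dec_x, state_h, state_c)
probs = softmax(dec_out @ W_dense)
print(probs.shape)  # (2, 3, 9)
```

The only link between the two halves is `(state_h, state_c)`, which is why the decoder's unit count has to match the encoder's.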



Next, we set up the models for inference.

Encoder Output
First, we build a model that returns the encoder states. These become the initial states of the decoder.

Decoder Input
During inference the decoder input is not known in advance, so the model has to predict one word at a time and use each predicted word to predict the next one. For the first word, the decoder is initialized with the encoder states and given the “start” token. The decoder_lstm layer is called with the decoder input and these initial states, and it returns two sets of outputs: the per-timestep outputs (decoder_outputs) and the final hidden and cell states (state_h, state_c). decoder_outputs is passed to the dense layer to get the predicted bot template, and the returned states become the initial states for the next run (next-word prediction). The predicted token is then fed in as the next decoder input, and the process repeats until the “end” token is produced or a maximum length is reached.


In [24]:
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(100,))
decoder_state_input_c = Input(shape=(100,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
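The inference loop described above can be sketched independently of Keras. In this greedy-decoding sketch, `encode` and `decode_step` are hypothetical callables standing in for `encoder_model.predict` and `decoder_model.predict`, and the toy stand-ins are assumptions that make the loop testable.

```python
def greedy_decode(source, encode, decode_step, start_id, end_id, max_len):
    """Greedy inference loop for an encoder-decoder model.

    encode(source) -> initial decoder states
    decode_step(token_id, states) -> (probabilities over ids, new states)
    """
    states = encode(source)
    token = start_id
    decoded = []
    for _ in range(max_len):
        probs, states = decode_step(token, states)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax
        if token == end_id:
            break
        decoded.append(token)  # the prediction becomes the next decoder input
    return decoded


# Toy stand-ins (assumptions): id 1 = start, id 2 = end; the "decoder"
# deterministically predicts 4 after start, 3 after 4, and end after 3.
NEXT = {1: 4, 4: 3, 3: 2}


def toy_encode(source):
    return {"steps": 0}


def toy_decode_step(token, states):
    probs = [0.0] * 5
    probs[NEXT[token]] = 1.0
    return probs, {"steps": states["steps"] + 1}


print(greedy_decode("any text", toy_encode, toy_decode_step, start_id=1, end_id=2, max_len=10))
# [4, 3]
```

With the notebook's models, `decode_step` would one-hot encode the token as an array of shape `(1, 1, num_decoder_tokens)`, call `decoder_model.predict([target_seq] + states)`, and return `output[0, -1]` together with `[h, c]`; `start_id` and `end_id` would be `tokenizer1.word_index['start']` and `tokenizer1.word_index['end']`.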

Next, we train the model.

In [25]:
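import os
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.optimizers import Adam

# Make sure the checkpoint folder exists before ModelCheckpoint writes to it
os.makedirs(root_dir + "/models", exist_ok=True)
fp = root_dir + "/models/best_collab_model.h5"

# Stop when val_loss has not improved for 2 epochs; keep the best weights on disk
callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath=fp, monitor='val_loss', save_best_only=True)]

# Alternative optimizer: optimizer=Adam(learning_rate=0.001)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note: the log below comes from a run that did not pass these callbacks, so all 30 epochs ran
model.fit([en_tr3, de_tr3], de_target3,
          batch_size=30,
          epochs=30,
          validation_split=0.2,
          callbacks=callbacks)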

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x7fb4f82c4cc0>
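When the callbacks are passed to `fit`, `EarlyStopping(monitor='val_loss', patience=2)` ends training once the validation loss has failed to improve on its best value for two consecutive epochs. The rule can be sketched in plain Python; `early_stopping_epoch` is a hypothetical, simplified model of the callback, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch after which training would stop, or None.

    Simplified model of the EarlyStopping rule: stop once val_loss has
    failed to improve on the best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss   # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


losses = [1.20, 0.95, 0.90, 0.91, 0.93, 0.85]  # hypothetical val_loss values
print(early_stopping_epoch(losses))  # 5
```

Because `restore_best_weights` defaults to `False`, the in-memory model keeps the weights from the last epoch; the best weights are the ones `ModelCheckpoint` wrote to `fp`.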

To test the model later, we save the encoder and decoder inference models. Set root_dir to your own folder here.

In [None]:
import os
dest_folder = root_dir + '/collab_models/'
os.makedirs(dest_folder, exist_ok=True)
# No file extension: in TF 2.x this saves in the TensorFlow SavedModel format
encoder_model.save(dest_folder + 'enc_model_collab_211_redo')
decoder_model.save(dest_folder + 'dec_model_collab_211_redo')

In [None]:
import pickle
dest_folder = root_dir + '/collab_models/'

# Pickle the fitted tokenizers (default protocol); the with-blocks flush and close the files
with open(dest_folder + 'tokenizer_en_redo_1.pkl', 'wb') as output:
    pickle.dump(tokenizer, output)

with open(dest_folder + 'tokenizer_de_redo.pkl', 'wb') as output1:
    pickle.dump(tokenizer1, output1)
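At inference time the tokenizers have to be loaded again to turn new customer text into ids and predicted ids back into templates. A minimal sketch of the save/load round trip with context managers; a plain dict stands in for the fitted `Tokenizer`, and the helper names and temporary path are assumptions.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Pickle obj to path; the with-block flushes and closes the file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dict stands in for the fitted Tokenizer in this sketch
word_index = {"start": 1, "templ_1190": 2, "end": 3}
path = os.path.join(tempfile.mkdtemp(), "tokenizer_de_demo.pkl")
save_pickle(word_index, path)
restored = load_pickle(path)
print(restored == word_index)  # True
```

With the files saved above, the decoder tokenizer would be restored with `load_pickle(dest_folder + 'tokenizer_de_redo.pkl')`. Unpickling a real `Tokenizer` requires Keras to be importable in that session.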


# Separate section: using a bidirectional LSTM instead of a unidirectional LSTM

To use a bidirectional LSTM encoder, the encoder and decoder model creation code above can be replaced with the code below.


Encoder
A bidirectional LSTM returns the final hidden and cell states of both the forward and the backward pass (forward_h, forward_c, backward_h, backward_c). The two hidden states are concatenated, as are the two cell states, to form the final encoder states.

Decoder
The decoder is a unidirectional LSTM, as before. However, its number of hidden units must equal the size of the concatenated encoder states. In our case each direction of the encoder LSTM has 100 units, so the decoder has 200 units.


In [None]:
from keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate
from keras.models import Model


n_units = 100
n_input = num_encoder_tokens
n_output = num_decoder_tokens

# encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = Bidirectional(LSTM(n_units, return_state=True))
encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder(encoder_inputs)
state_h = Concatenate()([forward_h, backward_h])
state_c = Concatenate()([forward_c, backward_c])
encoder_states = [state_h, state_c]

# decoder
decoder_inputs = Input(shape=(None, n_output))    
decoder_lstm = LSTM(n_units*2, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)


# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(n_units*2,))
decoder_state_input_c = Input(shape=(n_units*2,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
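The shape bookkeeping behind `LSTM(n_units*2)` and `Input(shape=(n_units*2,))` can be checked in NumPy. Random arrays stand in for the four states that `Bidirectional(LSTM(n_units, return_state=True))` returns (the batch size of 3 is an assumption), and `np.concatenate` on the last axis mirrors the default `axis=-1` of `Concatenate()`.

```python
import numpy as np

batch, n_units = 3, 100
rng = np.random.default_rng(0)

# Stand-ins for the forward/backward hidden and cell states of the encoder
forward_h, forward_c, backward_h, backward_c = (
    rng.normal(size=(batch, n_units)) for _ in range(4)
)

# Concatenate along the feature axis, as Concatenate() does by default (axis=-1)
state_h = np.concatenate([forward_h, backward_h], axis=-1)
state_c = np.concatenate([forward_c, backward_c], axis=-1)
print(state_h.shape, state_c.shape)  # (3, 200) (3, 200)

# The decoder's initial state must match its unit count, hence LSTM(n_units * 2)
decoder_units = n_units * 2
assert state_h.shape[-1] == decoder_units and state_c.shape[-1] == decoder_units
```

The first half of each concatenated state comes from the forward pass and the second half from the backward pass, so the decoder starts from a summary of the customer text read in both directions.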
