# Attempt 1
My first attempt was to build a simple LSTM model in order to understand how an LSTM works, what its input should look like, and how to measure the accuracy of the model. For this first attempt I did not use the actual dataset taken from the contact center. Instead, I used the IMDB review data set.

Machine learning and deep learning algorithms do not work with text directly, so the text must first be transformed into a numeric format that the model understands. Transforming text into numeric vectors is called vectorization, and word embedding is one such technique. While there are many ways of achieving this, for the first attempt I chose to use the facilities provided by Keras, which ships libraries for text pre-processing. First I used the Tokenizer, which converts words into integers.

Below is sample code which demonstrates how the Tokenizer is used.


In [3]:
from keras.preprocessing.text import Tokenizer
inputsample = ['The quick brown fox jumps over the lazy dog.', 'The fox is not lazy as the dog.']

# create a tokenizer limiting the maximum number of words to 1000
tokenizer = Tokenizer(num_words=1000)

# update the vocabulary based on the input samples
tokenizer.fit_on_texts(inputsample)

# Transforms each text in texts to a sequence of integers
isequences = tokenizer.texts_to_sequences(inputsample)

# Print the results
word_index = tokenizer.word_index
print('Unique token count: %s' % len(word_index))
print('\nWord index: ', word_index)
print('\nSequences: ', isequences)

Unique token count: 11

Word index:  {'the': 1, 'fox': 2, 'lazy': 3, 'dog': 4, 'quick': 5, 'brown': 6, 'jumps': 7, 'over': 8, 'is': 9, 'not': 10, 'as': 11}

Sequences:  [[1, 5, 6, 2, 7, 8, 1, 3, 4], [1, 2, 9, 10, 3, 11, 1, 4]]


As you can see above, each unique word in the sentences was assigned an integer. The word index maps each word to its integer, and the sequences are the integer-vector representations of the input sentences.
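To make the numbering rule concrete, here is a plain-Python sketch that reproduces the word index and sequences shown above. It is a simplified approximation of what the Keras Tokenizer does (lowercase, replace punctuation with spaces, rank words by frequency), not its actual implementation; the helper names `tokenize`, `build_word_index` and `texts_to_sequences` are my own.

```python
import string

def tokenize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.lower().translate(to_space).split()

def build_word_index(texts):
    """Give the most frequent word index 1, the next one index 2, and so on."""
    counts = {}
    for text in texts:
        for word in tokenize(text):
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so equally frequent words keep their first-seen order
    ranked = sorted(counts, key=lambda word: -counts[word])
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace every known word by its integer index."""
    return [[word_index[w] for w in tokenize(t) if w in word_index] for t in texts]

samples = ['The quick brown fox jumps over the lazy dog.',
           'The fox is not lazy as the dog.']
word_index = build_word_index(samples)
sequences = texts_to_sequences(samples, word_index)
print(word_index)
print(sequences)
```

Index 0 is never assigned to a word, which leaves it free to be used as a padding value later.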
When we use the tokenizer to generate a matrix in binary mode, it produces high-dimensional, sparse vectors: each text becomes a vector with one position per word in the vocabulary, holding 1 if the word is present and 0 otherwise. So there can be many 0s for each sentence.
To avoid generating such high-dimensional vectors, we can use the word embedding technique. A word embedding is a dense array that maps each word to a lower-dimensional vector.
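The difference between the two representations can be sketched in plain Python. This is only an illustration: the embedding size of 8 is an assumption, and the table is filled with random numbers, whereas a real embedding layer learns these values during training.

```python
import random

vocab_size = 1000    # same limit as Tokenizer(num_words=1000) above
embedding_dim = 8    # assumed size, chosen for illustration

def to_binary_vector(sequence, num_words):
    """Sparse form: one slot per vocabulary word, 1 if the word occurs."""
    vector = [0] * num_words
    for index in sequence:
        if index < num_words:
            vector[index] = 1
    return vector

random.seed(0)
# stand-in for learned weights: one small dense vector per word index
embedding_table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
                   for _ in range(vocab_size)]

def embed(sequence, table):
    """Dense form: look up one short vector per word in the sequence."""
    return [table[index] for index in sequence]

sequence = [1, 5, 6, 2, 7, 8, 1, 3, 4]   # first sample sentence
sparse = to_binary_vector(sequence, vocab_size)
dense = embed(sequence, embedding_table)

print(len(sparse), sparse.count(0))   # 1000 992
print(len(dense), len(dense[0]))      # 9 8
```

The sparse vector needs 1000 slots to describe nine words and loses their order, while the dense form keeps one 8-value vector per word position.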
To obtain word embeddings, we can either learn them from our own data set or use transfer learning with pre-trained word embeddings.
Keras provides an Embedding layer whose weights are learned through backpropagation together with the rest of the model. This matters because the embeddings should be fitted to the problem we are trying to solve. In the sample code below I demonstrate how an embedding layer is added to a model. I used the IMDB review data set for demonstration purposes.


In [5]:
from keras.layers import Embedding
from keras.datasets import imdb
from keras import preprocessing

# limit the max unique word count to 10,000
max_features = 10000
# define the max length of a review as 20 words. Longer reviews are truncated (by default pad_sequences keeps the last 20 words)
maxlen = 20

# load the imdb data set 
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# all sentences should be equal in length. Use padding to make all sentences same length
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
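As a rough illustration of what the padding step does to a single review, here is a plain-Python sketch that assumes the Keras defaults (pad and truncate at the front, pad value 0). The helper `pad_sequence` is hypothetical and handles one sequence only.

```python
def pad_sequence(sequence, maxlen, value=0):
    """Keep the last `maxlen` items and left-pad shorter sequences with `value`."""
    truncated = sequence[-maxlen:]
    return [value] * (maxlen - len(truncated)) + truncated

print(pad_sequence([1, 2, 3], maxlen=5))               # [0, 0, 1, 2, 3]
print(pad_sequence([1, 2, 3, 4, 5, 6, 7], maxlen=5))   # [3, 4, 5, 6, 7]
```

With `maxlen = 20`, every review therefore ends up as exactly 20 integers, which is what the embedding layer in the next step expects.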


Now we have a training data set and a test data set loaded from IMDB. The code below demonstrates how the word embedding layer is added. I build a Sequential model starting with an Embedding layer. The final output is a Dense layer with sigmoid activation, because the classification here is binary: 'positive' or 'negative'.

In [11]:
from keras.models import Sequential
from keras.layers import Flatten, Dense
import numpy as np

# set the random seed to 3 and keep it constant so the model is more reproducible
# (without this the result may vary and validation accuracy was low)
np.random.seed(3)

# create a sequential model
model = Sequential()

# add an embedding layer with input dimension 10000, output dimension 8, and a max input length of 20 for each review
model.add(Embedding(input_dim=10000, output_dim = 8, input_length=maxlen))

# after embedding, we have to flatten the 3D embeddings (samples, maxlen, 8) into a 2D shape (samples, maxlen * 8) before passing them to the next Dense layer
model.add(Flatten())

# add a dense layer with a single output and sigmoid activation (since this is a binary classification problem)
model.add(Dense(1, activation='sigmoid'))

# now we can compile the model with rmsprop optimizer and binary_crossentropy as the loss. Here we select binary_crossentropy 
# because this is a binary classification problem
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
# print the model summary
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
flatten_2 (Flatten)          (None, 160)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 161       
Total params: 80,161
Trainable params: 80,161
Non-trainable params: 0
_________________________________________________________________
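The parameter counts in this summary can be checked by hand. The short calculation below uses only the values from the model definition above (vocabulary 10000, embedding size 8, review length 20).

```python
vocab_size = 10000     # input_dim of the Embedding layer
embedding_dim = 8      # output_dim of the Embedding layer
maxlen = 20            # words kept per review

embedding_params = vocab_size * embedding_dim   # one 8-value vector per word
flatten_units = maxlen * embedding_dim          # 20 words * 8 values each
dense_params = flatten_units * 1 + 1            # one weight per input plus one bias

print(embedding_params)                  # 80000
print(flatten_units)                     # 160
print(dense_params)                      # 161
print(embedding_params + dense_params)   # 80161
```

Almost all of the trainable parameters sit in the embedding table, not in the classifier on top of it.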


Now we can train and validate the model using the IMDB data set.

In [13]:
# train and validate the model
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


Validation accuracy is close to 75% in the above example, where only 20 words from each review are used. Using more words from each review should help to increase this accuracy.
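The per-epoch numbers are stored in the `history` object returned by `fit`. The helper below is a small sketch for reading the best validation accuracy out of it; `best_validation_accuracy` is a hypothetical name, and the two key names it tries are an assumption about how different Keras versions label the metric.

```python
def best_validation_accuracy(history_dict):
    """Return (epoch, accuracy) for the best validation epoch, counting epochs from 1."""
    # older Keras versions report 'val_acc', newer ones 'val_accuracy'
    for key in ("val_acc", "val_accuracy"):
        if key in history_dict:
            values = history_dict[key]
            best = max(range(len(values)), key=values.__getitem__)
            return best + 1, values[best]
    raise KeyError("no validation accuracy found in history")

# usage after training (not run here):
# epoch, accuracy = best_validation_accuracy(history.history)
```

Looking at the best epoch rather than only the last one also gives a first hint of whether the model starts to overfit.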

The above sample was done to understand how an Embedding layer works. Now we can move to the next level by using the same Embedding concept for a multi-class classifier. In this project, we intend to find a class (Intent) for an input text.

The Reuters dataset from Keras is a good dataset for a multi-class classification problem. I will use the Reuters data to understand how word embedding and a sequential model work for a multi-class classifier.


In [81]:
import tensorflow as tf
from keras.datasets import reuters
from keras.preprocessing.text import Tokenizer

(x_train_reuters, y_train_reuters), (x_test_reuters, y_test_reuters) = reuters.load_data(num_words=None, test_split=0.2)

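# limit the vocabulary to 10,000 word indices
max_words = 10000

# labels are 0-based, so the class count is the largest label plus one
num_classes = max(y_train_reuters) + 1

# turn each integer sequence into a binary vector of length max_words
# (1 if the word occurs in the text, 0 otherwise)
tokenizer = Tokenizer(num_words=max_words)
x_train_reuters = tokenizer.sequences_to_matrix(x_train_reuters, mode='binary')
x_test_reuters = tokenizer.sequences_to_matrix(x_test_reuters, mode='binary')

# one-hot encode the labels, as required by categorical_crossentropy
y_train_reuters = tf.keras.utils.to_categorical(y_train_reuters, num_classes)
y_test_reuters = tf.keras.utils.to_categorical(y_test_reuters, num_classes)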

print('Number of classes %s' % num_classes)


Number of classes 46


We have 46 different classes in the Reuters dataset.
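The label conversion used above turns each class number into a vector with 46 positions. A plain-Python sketch of that one-hot encoding looks like this; `to_one_hot` is a hypothetical helper for a single label, and the label 3 is just an example value.

```python
def to_one_hot(label, num_classes):
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

encoded = to_one_hot(3, num_classes=46)
print(len(encoded))         # 46
print(encoded[:6])          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(encoded.index(1.0))   # 3
```

This layout matches a softmax output layer with 46 units, where each unit holds the predicted probability of one class.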

Now we can create a model.

In [85]:
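from keras.layers import Dense, Dropout, Activation, LSTM

# fully connected classifier on top of the binary word vectors
modelmc = Sequential()
# hidden layer: 512 units fed by the 10,000-dimensional input
modelmc.add(Dense(512, input_shape=(max_words,)))
modelmc.add(Activation('relu'))
# drop half of the activations during training to reduce overfitting
modelmc.add(Dropout(0.5))
# output layer: one unit per class, softmax turns the scores into probabilities
modelmc.add(Dense(num_classes))
modelmc.add(Activation('softmax'))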


In [86]:
modelmc.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(modelmc.metrics_names)
modelmc.summary()

['loss', 'acc']
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_30 (Dense)             (None, 512)               5120512   
_________________________________________________________________
activation_11 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_31 (Dense)             (None, 46)                23598     
_________________________________________________________________
activation_12 (Activation)   (None, 46)                0         
Total params: 5,144,110
Trainable params: 5,144,110
Non-trainable params: 0
_________________________________________________________________
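As a sanity check, the parameter counts in this summary follow directly from the layer sizes. The calculation below uses only the values from the model definition (10000 inputs, 512 hidden units, 46 classes).

```python
max_words = 10000    # length of each binary input vector
hidden_units = 512   # units in the first Dense layer
num_classes = 46     # units in the output layer

# a Dense layer has one weight per (input, unit) pair plus one bias per unit
hidden_params = max_words * hidden_units + hidden_units
output_params = hidden_units * num_classes + num_classes

print(hidden_params)                   # 5120512
print(output_params)                   # 23598
print(hidden_params + output_params)   # 5144110
```

More than 99% of the parameters belong to the first layer, a direct consequence of feeding 10,000-dimensional sparse vectors into it.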


In [88]:
batch_size = 32
epochs = 10

historymc = modelmc.fit(x_train_reuters, y_train_reuters, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
score = modelmc.evaluate(x_test_reuters, y_test_reuters, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 8083 samples, validate on 899 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 1.0783489140037543
Test accuracy: 0.8018699911483568


We have a test accuracy close to 80%.

Now I will try to modify the model by adding an LSTM layer.

In [110]:
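from keras.layers import Dense, Dropout, Activation, LSTM

modellstm = Sequential()
# map each input index to a 64-dimensional vector
modellstm.add(Embedding(input_dim=10000, output_dim=64))
# the LSTM reads the embedded sequence and returns its final 32-dimensional state
modellstm.add(LSTM(32))
# input_shape only has an effect on the first layer of a model, so it is omitted here
modellstm.add(Dense(64))
modellstm.add(Activation('relu'))
modellstm.add(Dropout(0.5))
# output layer: one unit per class with softmax activation
modellstm.add(Dense(num_classes))
modellstm.add(Activation('softmax'))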

In [111]:
modellstm.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

modellstm.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_28 (Embedding)     (None, None, 64)          640000    
_________________________________________________________________
lstm_14 (LSTM)               (None, 32)                12416     
_________________________________________________________________
dense_51 (Dense)             (None, 64)                2112      
_________________________________________________________________
activation_30 (Activation)   (None, 64)                0         
_________________________________________________________________
dropout_17 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_52 (Dense)             (None, 46)                2990      
_________________________________________________________________
activation_31 (Activation)   (None, 46)                0         
Total para
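The per-layer parameter counts listed in this summary can be reproduced by hand. The sketch below uses the standard LSTM formula (four gates, each with input weights, recurrent weights and a bias) together with the layer sizes from the model definition.

```python
vocab_size = 10000    # input_dim of the Embedding layer
embedding_dim = 64    # output_dim of the Embedding layer
lstm_units = 32
dense_units = 64
num_classes = 46

embedding_params = vocab_size * embedding_dim
# 4 gates, each with weights for the input and the previous state, plus a bias
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)
dense_params = lstm_units * dense_units + dense_units
output_params = dense_units * num_classes + num_classes

print(embedding_params)   # 640000
print(lstm_params)        # 12416
print(dense_params)       # 2112
print(output_params)      # 2990
```

The LSTM itself is small; as in the earlier models, the embedding table holds most of the weights.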

In [112]:
batch_size = 32
epochs = 10

historylstm = modellstm.fit(x_train_reuters, y_train_reuters, batch_size=batch_size, epochs=epochs, verbose=1, validation_split=0.1)
scorelstm = modellstm.evaluate(x_test_reuters, y_test_reuters, batch_size=batch_size, verbose=1)
print('Test loss:', scorelstm[0])
print('Test accuracy:', scorelstm[1])

Train on 8083 samples, validate on 899 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 2.419490618676027
Test accuracy: 0.36197684778237277


After adding the LSTM layer, training takes about 60 minutes per epoch.

The total accuracy achieved was close to 36%, which is very low. The way I used the LSTM is probably wrong. Next, I should find out how to increase the accuracy with LSTM.
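One likely cause, offered as a hypothesis rather than a confirmed diagnosis: the Reuters input was converted with `sequences_to_matrix(..., mode='binary')`, so each sample is a vector of 10,000 zeros and ones. An Embedding layer reads every entry as a word index, so the LSTM would see 10,000 time steps that contain only the indices 0 and 1. The plain-Python sketch below contrasts that input with a padded integer sequence; the toy sequence and `maxlen = 20` are assumptions for illustration.

```python
max_words = 10000
maxlen = 20   # hypothetical sequence length, chosen for illustration

sequence = [1, 5, 6, 2, 7, 8, 1, 3, 4]   # toy word indices for one text

# what binary mode produces for one sample: presence flags, word order lost
binary = [0] * max_words
for index in sequence:
    binary[index] = 1

# what left-padding to maxlen would produce: the word indices, order kept
padded = [0] * (maxlen - len(sequence)) + sequence[-maxlen:]

print(len(binary), sorted(set(binary)))   # 10000 [0, 1]
print(len(padded), sorted(set(padded)))   # 20 [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

If this is indeed the problem, feeding the raw integer sequences through `pad_sequences`, as was done for the IMDB data, might both shorten the training time and give the LSTM real word indices to work with.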
