# Complex Models

We will cover the following models:
* Embedding => LSTM => Class
* Embedding => GRU => Class
* Embedding => Seq2Seq => Class
* Embedding => Seq2Seq with Attention => Class

### Load, tokenize and prepare the data for the embedding layer
For details of this section, please see the [Models Notebook](Models.ipynb).

In [1]:
# read data
import pandas as pd
train_csv = './data/toxic-comments/train.csv'
train_df = pd.read_csv(train_csv)

rowsums=train_df.iloc[:,2:].sum(axis=1)
train_df['clean']=(rowsums==0)
train_texts = train_df['comment_text']
train_labels = train_df['clean']

In [2]:
# tokenization
from keras.preprocessing.text import Tokenizer
max_vocab_size = 10000
tokenizer = Tokenizer(num_words=max_vocab_size)
tokenizer.fit_on_texts(train_texts)
sequences = tokenizer.texts_to_sequences(train_texts)

Using TensorFlow backend.
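To make the tokenization step concrete, here is a simplified pure-Python sketch of what fitting a tokenizer and converting texts to sequences amounts to. It is not the Keras implementation: the helper names `fit_word_index` and `texts_to_sequences`, the regular expression used for splitting, and the toy sentences are assumptions made for illustration. The idea it shows is that words are ranked by frequency (index 1 is the most frequent word) and that, when `num_words` is set, only indices below `num_words` are kept.

```python
import re
from collections import Counter

WORD_PATTERN = r"[a-z0-9']+"  # assumption: a simple splitter, not the Keras filter list


def fit_word_index(texts):
    """Map each word to a rank: 1 = most frequent, 2 = next, and so on."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(WORD_PATTERN, text.lower()))
    # Index 0 is never assigned to a word; it stays free for padding.
    ranked = counts.most_common()
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index, num_words=None):
    """Replace words by their index, dropping words outside the vocabulary."""
    sequences = []
    for text in texts:
        words = re.findall(WORD_PATTERN, text.lower())
        ids = [word_index[w] for w in words if w in word_index]
        if num_words is not None:
            # Only indices strictly below num_words are kept.
            ids = [i for i in ids if i < num_words]
        sequences.append(ids)
    return sequences


toy_texts = ["the cat sat", "the cat ran", "the dog"]
toy_index = fit_word_index(toy_texts)
toy_sequences = texts_to_sequences(toy_texts, toy_index, num_words=3)
print(toy_index)      # {'the': 1, 'cat': 2, 'sat': 3, 'ran': 4, 'dog': 5}
print(toy_sequences)  # [[1, 2], [1, 2], [1]]
```

With `num_words=3` only the two most frequent words survive, which is why rare words simply disappear from the sequences.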


In [3]:
# batching, pre-processing for embedding layer
from keras import preprocessing
training_sequences = sequences[:10000]
training_labels = train_labels[:10000]
seq_max_len = 20
# training padded sequences
train_seq_pad = preprocessing.sequence.pad_sequences(sequences=training_sequences, maxlen=seq_max_len)

# testing padded sequences
testing_sequences = sequences[10000:11000]
testing_labels = train_labels[10000:11000]
test_seq_pad = preprocessing.sequence.pad_sequences(sequences=testing_sequences, maxlen=seq_max_len)
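The padding step can be pictured with the following sketch. It assumes the default behaviour of `pad_sequences` (padding and truncation are both applied at the start of the sequence, with 0 as the padding value); `pad_pre` is a hypothetical helper written for this illustration and is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences and keep only the last `maxlen` items of long ones.

    Assumes maxlen > 0.
    """
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]             # truncate from the front
        missing = maxlen - len(kept)
        padded.append([value] * missing + kept)  # pad at the front
    return padded


toy_sequences = [[5, 3], [7, 1, 4, 9, 2, 8]]
print(pad_pre(toy_sequences, maxlen=4))  # [[0, 0, 5, 3], [4, 9, 2, 8]]
```

With `seq_max_len = 20`, this means that only the last 20 tokens of a long comment reach the model.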

### Model 1: Embedding => LSTM => Class

#### Define the model 1
Model 1 is made of 3 layers:

- Layer 1 is Embedding layer
- Layer 2 is LSTM layer
- Layer 3 is classification (Dense) layer

In [4]:
# for details about layer 1 and layer 3 code, please check Models.ipynb

from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model_1 = Sequential()

vocab_size = 10000 
embedding_dim = 8 
seq_max_len = 20 
model_1.add(Embedding(vocab_size, embedding_dim, input_length=seq_max_len))
# LSTM layer: 128 is the dimensionality of its output (number of units)
model_1.add(LSTM(128))

model_1.add(Dense(1, activation='sigmoid'))

model_1.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

model_1.summary()

history_1 = model_1.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 20, 8)             80000     
_________________________________________________________________
lstm_1 (LSTM)                (None, 128)               70144     
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129       
Total params: 150,273
Trainable params: 150,273
Non-trainable params: 0
_________________________________________________________________


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
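The parameter counts reported by `model_1.summary()` can be checked by hand. The sketch below uses the standard formulas for an embedding table, an LSTM layer (four weight blocks, each with an input kernel, a recurrent kernel and a bias) and a dense layer; the function names are hypothetical helpers, not Keras functions.

```python
def embedding_params(vocab_size, embedding_dim):
    """One vector of size embedding_dim per vocabulary entry."""
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    """4 blocks (input, forget, output gate and cell candidate),
    each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units


counts = {
    "embedding": embedding_params(10000, 8),
    "lstm": lstm_params(8, 128),
    "dense": dense_params(128, 1),
}
print(counts)                # {'embedding': 80000, 'lstm': 70144, 'dense': 129}
print(sum(counts.values()))  # 150273
```

Most of the parameters sit in the embedding table and the LSTM layer; the classifier itself contributes only 129.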


#### Test the model 1


In [5]:
print(model_1.metrics_names)
model_1.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.21064355981349944, 0.9300000071525574]

We can extend the model by adding more LSTM layers in between. Each intermediate LSTM layer must return its full output sequence (`return_sequences=True`), because the next LSTM layer expects a sequence as input.
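The difference between returning the full sequence and returning only the last output can be seen in a small NumPy sketch. To keep it short it uses a plain tanh recurrent cell instead of an LSTM, and the function `simple_rnn`, the random weights and the shapes are assumptions chosen for illustration.

```python
import numpy as np


def simple_rnn(x, w_in, w_rec, bias, return_sequences=False):
    """Run a tanh RNN over x with shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, w_rec.shape[0]))
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec + bias)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return h                              # (batch, units): last step only


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 20, 16))  # 2 samples, 20 time steps, 16 features
w_in = rng.normal(size=(16, 32))
w_rec = rng.normal(size=(32, 32))
bias = np.zeros(32)

full = simple_rnn(x, w_in, w_rec, bias, return_sequences=True)
last = simple_rnn(x, w_in, w_rec, bias)
print(full.shape, last.shape)  # (2, 20, 32) (2, 32)
```

The full sequence still has a time axis, so it can be fed to another recurrent layer; the last output has none, so it can only go to a non-recurrent layer such as `Dense`.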

#### Extended model 1
Extended model 1 is made of 5 layers:

- Layer 1 is Embedding layer
- Layer 2 is LSTM RNN layer (return full sequence)
- Layer 3 is LSTM RNN layer (return full sequence)
- Layer 4 is LSTM RNN layer (return last output)
- Layer 5 is classification (Dense) layer 


In [10]:
# this cell was run after the Model 2 configuration cell (In [6]), where embedding_dim = 16;
# set it here explicitly so that the code matches the summary below
embedding_dim = 16
model_1_ext = Sequential()
model_1_ext.add(Embedding(vocab_size, embedding_dim))
# intermediate layers return the output at every time step (return_sequences=True),
# so that they form a sequence which is processed by the next RNN layer
model_1_ext.add(LSTM(32, return_sequences=True))
model_1_ext.add(LSTM(64, return_sequences=True))
# the final RNN layer returns only its last output,
# which is used by the next non-RNN layer, e.g. the Dense layer in this case
model_1_ext.add(LSTM(32))
model_1_ext.add(Dense(1, activation='sigmoid'))
model_1_ext.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model_1_ext.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, None, 16)          160000    
_________________________________________________________________
lstm_5 (LSTM)                (None, None, 32)          6272      
_________________________________________________________________
lstm_6 (LSTM)                (None, None, 64)          24832     
_________________________________________________________________
lstm_7 (LSTM)                (None, 32)                12416     
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 33        
Total params: 203,553
Trainable params: 203,553
Non-trainable params: 0
_________________________________________________________________


#### Train the extended model 1

In [11]:
history_1_ext = model_1_ext.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Test the extended model 1

In [12]:
print(model_1_ext.metrics_names)
model_1_ext.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.2308857423812151, 0.925000011920929]

### Model 2: Embedding => GRU => Class
In Model 2 we update Model 1 by replacing the LSTM layer with a GRU layer.


#### Define the model 2
Model 2 is made of 3 layers:

- Layer 1 is Embedding layer
- Layer 2 is GRU layer
- Layer 3 is classification (Dense) layer

In [6]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, GRU

# model configurations
vocab_size = 10000
seq_max_len = 20 # optional: the RNN layer that follows does not need a fixed input length
embedding_dim = 16

# model definition
model_2 = Sequential()
model_2.add(Embedding(vocab_size, embedding_dim, input_length=seq_max_len))
model_2.add(GRU(32))
model_2.add(Dense(1, activation='sigmoid'))
model_2.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model_2.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 20, 16)            160000    
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                4704      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33        
Total params: 164,737
Trainable params: 164,737
Non-trainable params: 0
_________________________________________________________________
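The GRU layer in the summary has fewer parameters than an LSTM layer of the same size because it has three weight blocks instead of four. The sketch below reproduces the 4,704 from the summary; the helper names are hypothetical. It also shows, as an assumption about newer Keras versions where the GRU option `reset_after` is enabled by default, that a second bias vector per block changes the count.

```python
def gru_params(input_dim, units, reset_after=False):
    """3 blocks (update gate, reset gate, candidate state).

    With reset_after=True each block carries two bias vectors instead of one.
    """
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)


def lstm_params(input_dim, units):
    """4 blocks, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


print(gru_params(16, 32))                    # 4704, as in the summary above
print(gru_params(16, 32, reset_after=True))  # 4800
print(lstm_params(16, 32))                   # 6272
```

If a newer Keras version reports 4,800 parameters for this layer, the extra bias vectors are the likely explanation.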


#### Train the model 2

In [7]:
history_2 = model_2.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Test the model 2

In [8]:
print(model_2.metrics_names)
model_2.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.22840861797332765, 0.9240000247955322]

The model above does not reach better accuracy than a much simpler model. Likely reasons are that we used only a small part of the data (10,000 comments for training) and that every comment, in both the training and the test set, is cut to `seq_max_len = 20` tokens.

We can extend the model by adding more RNN layers in between. In the model above we used only the last output of the RNN layer; when stacking, the intermediate RNN layers must return their full output sequence.

#### Extended model 2
Extended model 2 is made of 5 layers:

- Layer 1 is Embedding layer
- Layer 2 is GRU RNN layer (return full sequence)
- Layer 3 is GRU RNN layer (return full sequence)
- Layer 4 is GRU RNN layer (return last output)
- Layer 5 is classification (Dense) layer 


In [13]:
model_2_ext = Sequential()
model_2_ext.add(Embedding(vocab_size, embedding_dim))
# intermediate layers return the output at every time step (return_sequences=True),
# so that they form a sequence which is processed by the next RNN layer
model_2_ext.add(GRU(32, return_sequences=True))
model_2_ext.add(GRU(64, return_sequences=True))
# the final RNN layer returns only its last output,
# which is used by the next non-RNN layer, e.g. the Dense layer in this case
model_2_ext.add(GRU(32))
model_2_ext.add(Dense(1, activation='sigmoid'))
model_2_ext.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model_2_ext.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, None, 16)          160000    
_________________________________________________________________
gru_2 (GRU)                  (None, None, 32)          4704      
_________________________________________________________________
gru_3 (GRU)                  (None, None, 64)          18624     
_________________________________________________________________
gru_4 (GRU)                  (None, 32)                9312      
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 33        
Total params: 192,673
Trainable params: 192,673
Non-trainable params: 0
_________________________________________________________________


#### Train the extended model 2

In [14]:
history_2_ext = model_2_ext.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Test the extended model 2

In [15]:
print(model_2_ext.metrics_names)
model_2_ext.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.22118819653987884, 0.9269999861717224]

### Model 3: Embedding => Bidirectional RNN => Output
In Model 3 we extend Model 2 by wrapping the recurrent layer in a Bidirectional wrapper. The recurrent layer used here is a SimpleRNN.

#### Define the model 3
Model 3 is made of 3 layers:

- Layer 1 is Embedding layer
- Layer 2 is Bidirectional RNN layer (return last output)
- Layer 3 is classification (Dense) layer 

In [13]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, SimpleRNN
from keras.layers.wrappers import Bidirectional

# model configurations
vocab_size = 10000
seq_max_len = 20 # optional: the RNN layer that follows does not need a fixed input length
embedding_dim = 16

# model definition
model_3 = Sequential()
model_3.add(Embedding(vocab_size, embedding_dim, input_length=seq_max_len))
# [1] The Bidirectional wrapper creates two copies of the recurrent layer:
# one is fit on the input sequence as-is and one on a reversed copy of the input sequence.
# By default, the outputs of these two RNNs are concatenated.
model_3.add(Bidirectional(SimpleRNN(32)))
model_3.add(Dense(1, activation='sigmoid'))
model_3.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model_3.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, 20, 16)            160000    
_________________________________________________________________
bidirectional_1 (Bidirection (None, 64)                3136      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 65        
Total params: 163,201
Trainable params: 163,201
Non-trainable params: 0
_________________________________________________________________
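The output shape `(None, 64)` and the 3,136 parameters in the summary follow from running two independent 32-unit RNNs, one over the sequence as-is and one over the reversed sequence, and concatenating their last outputs. The NumPy sketch below illustrates this with a plain tanh cell; the function names, random weights and batch size are assumptions made for the example.

```python
import numpy as np


def rnn_last_state(x, w_in, w_rec, bias):
    """Run a tanh RNN over x (batch, timesteps, features); return the last state."""
    h = np.zeros((x.shape[0], w_rec.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec + bias)
    return h


def bidirectional_last_state(x, forward_weights, backward_weights):
    """Concatenate the last states of a forward and a backward RNN."""
    forward = rnn_last_state(x, *forward_weights)
    backward = rnn_last_state(x[:, ::-1, :], *backward_weights)  # reversed in time
    return np.concatenate([forward, backward], axis=-1)


def init_weights(rng, input_dim, units):
    """Input kernel, recurrent kernel and bias for one direction."""
    return (
        rng.normal(size=(input_dim, units)),
        rng.normal(size=(units, units)),
        np.zeros(units),
    )


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 20, 16))  # 2 samples, 20 time steps, 16 features
fw = init_weights(rng, 16, 32)
bw = init_weights(rng, 16, 32)    # separate weights for the backward direction

out = bidirectional_last_state(x, fw, bw)
n_params = sum(w.size for w in fw + bw)
print(out.shape, n_params)  # (2, 64) 3136
```

Because the two directions are concatenated, the following `Dense` layer sees 64 inputs, which explains its 65 parameters.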


#### Train model 3

In [14]:
history_3 = model_3.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Testing model 3

In [15]:
print(model_3.metrics_names)
model_3.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.4578404839038849, 0.8730000257492065]

Like model 2, model 3 can be extended by adding more bidirectional layers in between.

#### Extended model 3
Extended model 3 is made of 5 layers:

- Layer 1 is Embedding layer
- Layer 2 is Bidirectional RNN layer (return full sequence)
- Layer 3 is Bidirectional RNN layer (return full sequence)
- Layer 4 is Bidirectional RNN layer (return last output)
- Layer 5 is classification (Dense) layer 


In [16]:
from keras.models import Sequential
from keras.layers import Dense, Embedding, SimpleRNN
from keras.layers.wrappers import Bidirectional

# model configurations
vocab_size = 10000
seq_max_len = 20 # optional: the RNN layer that follows does not need a fixed input length
embedding_dim = 16

In [17]:
# model definition
model_3_ext = Sequential()
model_3_ext.add(Embedding(vocab_size, embedding_dim, input_length=seq_max_len))
model_3_ext.add(Bidirectional(SimpleRNN(32, return_sequences=True)))
model_3_ext.add(Bidirectional(SimpleRNN(64, return_sequences=True)))
model_3_ext.add(Bidirectional(SimpleRNN(32)))
model_3_ext.add(Dense(1, activation='sigmoid'))
model_3_ext.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model_3_ext.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 20, 16)            160000    
_________________________________________________________________
bidirectional_2 (Bidirection (None, 20, 64)            3136      
_________________________________________________________________
bidirectional_3 (Bidirection (None, 20, 128)           16512     
_________________________________________________________________
bidirectional_4 (Bidirection (None, 64)                10304     
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 65        
Total params: 190,017
Trainable params: 190,017
Non-trainable params: 0
_________________________________________________________________


#### Train the extended model 3

In [18]:
history_3_ext = model_3_ext.fit(train_seq_pad, training_labels, epochs=10, batch_size=32, validation_split=0.2)


Train on 8000 samples, validate on 2000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Test the extended model 3

In [19]:
print(model_3_ext.metrics_names)
model_3_ext.evaluate(x=test_seq_pad, y=testing_labels)

['loss', 'acc']


[0.8108287644386292, 0.8569999933242798]

### Plotting the above results

//ToDo: train the above m


In [20]:
import matplotlib.pyplot
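One possible way to compare the models is to plot the validation accuracy stored in the `History` objects returned by `fit`. The sketch below is only a suggestion: `plot_validation_accuracy` is a hypothetical helper, and it assumes the history dictionaries contain the key `val_acc`, which is the case when the model is compiled with `metrics=['acc']` and trained with a validation split as above.

```python
import matplotlib.pyplot as plt


def plot_validation_accuracy(histories, metric="val_acc"):
    """Plot one curve per model.

    histories: dict mapping a model name to a History.history dictionary.
    """
    fig, ax = plt.subplots()
    for name, history in histories.items():
        values = history[metric]
        # Epochs are numbered from 1, as in the training log.
        ax.plot(range(1, len(values) + 1), values, marker="o", label=name)
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.set_title("Validation accuracy per epoch")
    ax.legend()
    return ax
```

In this notebook it could be called as `plot_validation_accuracy({"model_1": history_1.history, "model_2": history_2.history, "model_3": history_3.history})`, followed by `plt.show()`.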

##### Ref.:
1. https://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/
