## Lab04
  
The [Conversation AI](https://conversationai.github.io/) team, a research initiative founded by [Jigsaw](https://jigsaw.google.com/) and Google (both part of Alphabet), is working on tools to help improve online conversation. One area of focus is the study of negative online behaviors, like toxic comments (i.e. comments that are rude, disrespectful or otherwise likely to make someone leave a discussion).   
  
Kaggle is currently hosting its [second competition](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge#description) on this research. The challenge is to create a model capable of detecting different types of toxicity, such as threats, obscenity, insults, and identity-based hate, better than Perspective’s current models. The competitions use a dataset of comments from Wikipedia’s talk page edits. Improvements to the current model will hopefully help online discussion become more productive and respectful.

We shall be using this dataset to benchmark a number of ML models. While the focus of the current competition is to mitigate bias, we will not be using the metric used in the competition. Instead we will focus on a simpler metric, [Area under the Curve (or AUC)](https://www.kaggle.com/learn-forum/53782), which is suitable for unbalanced binary datasets. Also, we shall not consider different levels of toxicity; we shall simply treat anything marked over the 0.5 level in the measured toxicity range as toxic, and anything at or below it as non-toxic. 
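To make the metric concrete, here is a minimal sketch of the 0.5 thresholding and of AUC computed as the fraction of (toxic, non-toxic) pairs that a model ranks in the right order. The helper names and the toy scores are made up for illustration; the lab itself uses `roc_auc_score` from scikit-learn.

```python
def binarize(scores, threshold=0.5):
    """Label a comment toxic (1) when its toxicity score is above the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def pairwise_auc(labels, preds):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toxicity annotations and model outputs, for illustration only.
raw_targets = [0.0, 0.1, 0.7, 0.5, 0.2, 0.9]
labels = binarize(raw_targets)            # a score of exactly 0.5 stays non-toxic
model_scores = [0.1, 0.4, 0.8, 0.3, 0.75, 0.7]
auc = pairwise_auc(labels, model_scores)  # 7 of the 8 pairs are ordered correctly
print(labels, auc)
```

Because AUC only looks at how positives are ranked against negatives, a model that predicts "non-toxic" for everything gains nothing from the class imbalance, whereas its plain accuracy could still look high.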

We have created a Jupyter notebook with some of the tools to model this problem in deep learning, using logistic regression, MLP, CNN and RNNs. Your challenge will be to fill in the models and benchmark the accuracy you achieve on different models.

We shall be using the Keras deep learning package. As you may know, this is an API into DL frameworks, and is most commonly backed by TensorFlow. [keras.io](keras.io) is a great source for documentation and examples on the layers and functionality available. 

**Have fun!!**


*Disclaimer: the dataset used contains text that may be considered profane, vulgar, or offensive.*

### Set up packages

In [1]:
import os
import tensorflow as tf
import keras
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
import pandas as pd
import numpy as np

from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

Using TensorFlow backend.


### Load data
Let's load our data and take a peek.   
The competition metrics and fields have too much detail to cover here.   
We will just focus on the users' comments and whether each was deemed toxic (target>0.5) or not.    

In [2]:
os.listdir()

['.ipynb_checkpoints',
 'glove.6B.300d.txt',
 'glove.6B.50d.txt',
 'glove.6B.100d.txt',
 'glove.6B.200d.txt',
 'glove.6B.zip',
 'w251_lab04_1dcnn.ipynb',
 'train.csv.zip']

In [3]:
train_df = pd.read_csv('train.csv.zip', 
                       dtype={'comment_text':str},
                       usecols=['comment_text', 'target'],
                       compression = 'zip')
train_df['target'] = (train_df['target']>0.5).astype(int)
print("Dataframe Number of (rows, columns) : {}".format(train_df.shape))

Dataframe Number of (rows, columns) : (1804874, 2)


In [4]:
pd.options.display.max_colwidth = 200
train_df.head(10)

Unnamed: 0,target,comment_text
0,0,"This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!"
1,0,"Thank you!! This would make my life a lot less anxiety-inducing. Keep it up, and don't let anyone get in your way!"
2,0,This is such an urgent design problem; kudos to you for taking it on. Very impressive!
3,0,Is this something I'll be able to install on my site? When will you be releasing it?
4,1,haha you guys are a bunch of losers.
5,1,ur a sh*tty comment.
6,0,hahahahahahahahhha suck it.
7,0,FFFFUUUUUUUUUUUUUUU
8,0,The ranchers seem motivated by mostly by greed; no one should have the right to allow their animals destroy public land.
9,0,It was a great show. Not a combo I'd of expected to be good together but it was.


### Create validation data set
Let's randomly split the data 66/33 into a training and a validation set.   
<font color='blue'>**No change needed here - note, please do not change the KFold split parameters, keeping it consistent will help us debug.**</font>

In [5]:
cv = KFold(n_splits=3, shuffle=True, random_state=42)
trn_ids, val_ids = next(cv.split(train_df))
x_train, x_valid = train_df['comment_text'][trn_ids], train_df['comment_text'][val_ids]
y_train, y_valid = train_df['target'].values[trn_ids], train_df['target'].values[val_ids]
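As a quick sanity check on the split, the sketch below works out the fold sizes a 3-way K-fold produces for the row count reported when the data was loaded. It relies on the documented scikit-learn behaviour that the first `n_samples % n_splits` folds receive one extra row; `kfold_sizes` is a hypothetical helper written for this illustration, not part of the lab code.

```python
def kfold_sizes(n_samples, n_splits):
    """Fold sizes for a K-fold split: the first n_samples % n_splits folds get one extra row."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_rows = 1804874               # row count printed after loading train.csv.zip
sizes = kfold_sizes(n_rows, 3)
n_valid = sizes[0]             # next(cv.split(...)) holds out the first fold
n_train = n_rows - n_valid     # the other two folds form the training set
print(sizes, n_train, n_valid)
```

Shuffling changes which rows land in each fold, not how many, so these counts should match the sample counts Keras reports when training starts.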

### 1D CNN

Now we shall demonstrate how to apply 1D convolutions over sentences to learn trends.  
As shown in the diagram from wildml.com, the first step is to create a vector representation (or embedding) of each word, or token, in the corpus. We do this by assigning a unique integer id to each token. We then create an array where each row is the vector for a single token, which can be indexed by the unique id. The id assignment is done below with Keras's text tokenizer.  
![Image](http://www.wildml.com/wp-content/uploads/2015/11/Screen-Shot-2015-11-06-at-8.03.47-AM.png)  
Keras does not allow dynamic graphs, so we need to define the graph in advance, including the length of the sequences. As the sequence length is fixed, we need to pad (or crop) each sentence to that length.   
There are ways to get around this in Keras which significantly increase speed; however, we do not cover them here. If you are interested, an example is [here](https://github.com/darraghdog/avito-demand/blob/d25c441e6c37557cb3ba1637df9487ca00b99822/nnet/nnet_2605.py#L230). 
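The two steps described above, assigning integer ids and then padding or cropping to a fixed length, can be sketched in plain Python. This is a simplified stand-in rather than the Keras implementation: it splits on whitespace only, breaks frequency ties alphabetically, and all names here are hypothetical. It does follow the `pad_sequences` defaults of padding and truncating at the front of the sequence, with 0 as the padding id.

```python
from collections import Counter


def build_word_index(texts):
    """Assign integer ids starting at 1, most frequent token first; 0 is kept for padding."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: i + 1 for i, (tok, _) in enumerate(ordered)}


def texts_to_padded(texts, word_index, max_len):
    """Map tokens to ids, then left-pad with 0 or keep only the last max_len ids."""
    rows = []
    for text in texts:
        ids = [word_index[t] for t in text.lower().split() if t in word_index]
        ids = ids[-max_len:]                            # crop from the front
        rows.append([0] * (max_len - len(ids)) + ids)   # pad at the front
    return rows


toy_texts = ["this is great", "this is not so great at all"]
toy_word_index = build_word_index(toy_texts)
toy_padded = texts_to_padded(toy_texts, toy_word_index, max_len=5)
print(toy_word_index)
print(toy_padded)
```

Every row now has the same length, which is what lets the model declare a fixed input shape up front.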

In [6]:
# We crop a lot of the texts by capping the length at MAX_LEN tokens
MAX_LEN = 30
tokenizer = keras.preprocessing.text.Tokenizer() 
tokenizer.fit_on_texts(list(x_train) + list(x_valid))
word_index = tokenizer.word_index
X_trn_seq = tokenizer.texts_to_sequences(list(x_train))
X_val_seq = tokenizer.texts_to_sequences(list(x_valid))
X_trn_seq = keras.preprocessing.sequence.pad_sequences(X_trn_seq, maxlen=MAX_LEN)
X_val_seq = keras.preprocessing.sequence.pad_sequences(X_val_seq, maxlen=MAX_LEN)

Now we have a 2D array being fed in for each sentence sequence, which is the `<Embedding Vectors> * <Number of tokens in the sequence>`. A sequence of 1D convolutions is applied to these input matrices, along with max pooling. 
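To see how convolution and pooling change the shape of each sentence matrix, here is a small arithmetic sketch. It assumes the layer sizes used in this lab's model (sequence length 30, 128-dimensional embeddings, 64 filters, kernel widths 2 and 3, pool size 5, `padding='same'`); the helper functions are hypothetical and only restate the standard formulas.

```python
import math


def conv1d_params(kernel_size, in_channels, filters):
    """Weights (kernel_size x in_channels x filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def pooled_length(length, pool_size):
    """Output length of max pooling with padding='same' and stride equal to pool_size."""
    return math.ceil(length / pool_size)


seq_len, embed_dim, filters = 30, 128, 64

conv1 = conv1d_params(2, embed_dim, filters)   # 'same' padding keeps the length at 30
len1 = pooled_length(seq_len, 5)               # 30 positions pooled in groups of 5
conv2 = conv1d_params(3, filters, filters)     # the second conv sees 64 input channels
len2 = pooled_length(len1, 5)                  # 6 positions need two pooling windows
flat = len2 * filters                          # size of the vector after Flatten

print(conv1, len1, conv2, len2, flat)
```

The convolution parameter counts and the pooled length of 6 can be compared against the `model.summary()` output for the model.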

In [7]:
inp = keras.Input(shape = (MAX_LEN,))
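# Note: len(X_trn_seq) is the number of training rows, not the vocabulary size;
# len(word_index) + 1 would give one embedding row per token id.
x = keras.layers.Embedding(len(X_trn_seq) + 1, 128)(inp)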
x = keras.layers.SpatialDropout1D(0.3)(x)
x = keras.layers.Conv1D(64, 2, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Conv1D(64, 3, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.1)(keras.layers.Dense(128, activation='relu') (x))
x = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(lr=1e-3), metrics=["accuracy"])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 30)                0         
_________________________________________________________________
embedding_1 (Embedding)      (None, 30, 128)           154016000 
_________________________________________________________________
spatial_dropout1d_1 (Spatial (None, 30, 128)           0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 30, 64)            16448     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 6, 64)             0         
_________________________________________________________________
conv1d_2 (Conv1D)    

In [8]:
model.fit(X_trn_seq, y_train, batch_size=2**10, epochs=1, verbose=1, validation_data=(X_val_seq, y_valid))
preds_cnn = model.predict(X_val_seq).flatten()
print('AUC score : {:.5f}'.format(roc_auc_score(y_valid, preds_cnn)))

Train on 1203249 samples, validate on 601625 samples
Epoch 1/1
AUC score : 0.92494


### CNN with Embeddings
Embeddings are vector representations of words. They are covered in week 7; here we will check whether pretrained word embeddings improve our results. When learning from randomly initialised embeddings, as above, the model builds embeddings of the words from the training corpus only. With pretrained embeddings, we can use word vectors trained on much larger datasets.  
  
For example, Stanford's [GloVe](https://nlp.stanford.edu/projects/glove/) project created the `glove.840B.300d.txt` embedding vectors from a Common Crawl of the web. It scanned 840 billion words (or tokens) and created embeddings for over 2 million tokens. Each token is represented by a vector of dimension 300, i.e. 300 floating point numbers for each of the 2 million tokens found. 

<font color='blue'>**In this task we will load pretrained embeddings from disk into memory and assign each token an embedding. This is done by creating a numpy array with a row for each index position in the `tokenizer` we created earlier.  
The width of the array will be the dimension of the vectors for each token.
Then you will load that embedding matrix into the Embedding layer in the keras model. As words are fed in from the array of tokenised sentences within the model, each token will be indexed into the embedding matrix to locate its equivalent embedding vector. Please set the embedding matrix to not trainable, so the pretrained embeddings do not change as the model learns.**</font>  
<font color='blue'>**To understand how to load the pretrained embeddings into the embedding layer you can leverage the approach taken in [this script](https://www.kaggle.com/jhoward/improved-lstm-baseline-glove-dropout/data#Improved-LSTM-baseline) put together by Jeremy Howard to load the `glove.6B.50d.txt`.
Once the pretrained embeddings are loaded, freeze them by setting `trainable=False` in the embedding layer.**</font>
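The task above can be sketched end to end on toy data. The snippet below is an illustrative assumption, not the reference solution: it parses two made-up lines in the GloVe text format from an in-memory buffer, builds a matrix with one row per token id (row 0 being the padding id), and leaves a random row for any token without a pretrained vector. The function names and the toy vocabulary are hypothetical.

```python
import io

import numpy as np


def load_vectors(handle):
    """Parse GloVe-style lines: a token followed by its space-separated vector."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, embed_size, seed=0):
    """One row per token id (row 0 is padding); unknown tokens keep a random row."""
    all_embs = np.stack(list(vectors.values()))
    rng = np.random.default_rng(seed)
    matrix = rng.normal(all_embs.mean(), all_embs.std(),
                        (len(word_index) + 1, embed_size))
    for word, i in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = vec
    return matrix.astype("float32")


# Made-up 3-dimensional vectors standing in for a real GloVe file.
toy_file = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
toy_vectors = load_vectors(toy_file)
toy_word_index = {"the": 1, "cat": 2, "zzz": 3}   # "zzz" has no pretrained vector
toy_matrix = build_embedding_matrix(toy_word_index, toy_vectors, embed_size=3)

# Indexing the matrix with token ids is what the Embedding layer does internally.
looked_up = toy_matrix[[2, 1]]
print(toy_matrix.shape)
print(looked_up)
```

A matrix built this way would then be handed to the Keras `Embedding` layer through its `weights` argument, with `trainable=False` to keep the pretrained vectors fixed.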

In [31]:
EMBEDDING_FILE

['glove.6B.300d.txt',
 'glove.6B.50d.txt',
 'glove.6B.100d.txt',
 'glove.6B.200d.txt',
 'glove.6B.zip']

In [22]:
'''Follow Jeremy's approach to load one of the embedding files below 
Test with different embedding files. Remember, each files has a different embedding dimension.
https://www.kaggle.com/jhoward/improved-lstm-baseline-glove-dropout/data#Improved-LSTM-baseline
'''
EMBEDDING_FILE = [f for f in os.listdir() if 'glove' in f]

def get_coefs(word,*arr): return word, np.asarray(arr, dtype='float32')
embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE[0]))
embeddings_index

{'george': array([-0.1176   , -0.085845 ,  0.22586  ,  0.067101 ,  0.11489  ,
        -0.032905 , -0.29119  , -0.48233  , -0.22767  , -0.83936  ,
         0.16669  , -0.25017  ,  0.056319 , -0.0022606,  0.41021  ,
         0.55337  , -0.054828 , -0.37209  ,  0.41163  ,  0.12591  ,
         0.31362  , -0.17306  ,  0.63346  ,  0.14013  ,  0.49404  ,
         0.1812   , -0.27215  , -0.12521  ,  0.57681  ,  0.17889  ,
         0.42946  , -0.0037585,  0.047973 ,  0.17374  , -1.0808   ,
         0.065725 , -0.28447  , -0.083942 , -0.12016  , -0.036501 ,
        -0.10537  ,  0.22979  , -0.12295  , -0.049347 , -0.19218  ,
         0.062323 ,  0.38778  , -0.29248  ,  0.21427  , -0.30515  ,
        -0.12015  ,  0.103    , -0.37617  ,  0.37951  ,  0.37333  ,
         0.39937  , -0.062949 ,  0.77966  ,  0.019877 , -0.35743  ,
         0.52675  ,  0.24657  ,  0.24714  , -0.039039 ,  0.58525  ,
        -0.95502  , -0.41566  ,  0.11964  , -0.23184  , -0.30868  ,
        -0.15137  , -0.080427 ,  0.264

In [27]:
embed_size = 300 # how big is each word vector
max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 100 # max number of words in a comment to use

all_embs = np.stack(list(embeddings_index.values()))  # np.stack expects a sequence, not a dict view
emb_mean,emb_std = all_embs.mean(), all_embs.std()
emb_mean,emb_std

word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
for word, i in word_index.items():
    if i >= max_features: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector



In [28]:
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation
from keras.layers import Bidirectional, GlobalMaxPool1D


inp = keras.Input(shape = (MAX_LEN,))
'''Students fill in how to load the embeddings from numpy array to Keras. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Again, follow Jeremy's approach to load the embeddings to the Embedding layer 
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'''
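# Pass trainable=False here to freeze the pretrained vectors, as the task asks; this run left them trainable.
# Caution: the tokenizer was built without num_words, so ids >= max_features can still reach this layer.
x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)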
x = Bidirectional(LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
x = keras.layers.SpatialDropout1D(0.3)(x)
x = keras.layers.Conv1D(64, 2, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Conv1D(64, 3, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.1)(keras.layers.Dense(128, activation='relu') (x))
x = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(lr=1e-3), metrics=["accuracy"])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, 30)                0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 30, 300)           6000000   
_________________________________________________________________
bidirectional_1 (Bidirection (None, 30, 100)           140400    
_________________________________________________________________
spatial_dropout1d_2 (Spatial (None, 30, 100)           0         
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 30, 64)            12864     
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 6, 64)             0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 6, 64)             12352     
__________

In [29]:
model.fit(X_trn_seq, y_train, batch_size=2**10, epochs=1, verbose=1, validation_data=(X_val_seq, y_valid))
preds_cnn = model.predict(X_val_seq).flatten()

Train on 1203249 samples, validate on 601625 samples
Epoch 1/1


In [30]:
print('AUC score : {:.5f}'.format(roc_auc_score(y_valid, preds_cnn)))

AUC score : 0.92154
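The parameter counts in the model summaries of this section can be reproduced with a little arithmetic. The sketch below assumes 20,000 embedding rows and a bidirectional LSTM with 50 units per direction, as in the cells here; the two helper functions are hypothetical and simply restate the standard formulas.

```python
def embedding_params(vocab_rows, embed_size):
    """One vector of length embed_size per row of the embedding matrix."""
    return vocab_rows * embed_size


def bilstm_params(input_dim, units):
    """Two directions, each with four gates of input and recurrent weights plus a bias."""
    per_direction = 4 * (units * (input_dim + units) + units)
    return 2 * per_direction


# (embedding parameters, bidirectional LSTM parameters) per GloVe dimension
counts = {d: (embedding_params(20000, d), bilstm_params(d, 50))
          for d in (50, 100, 200, 300)}
for dim, (emb, lstm) in counts.items():
    print(dim, emb, lstm)
```

Only the embedding layer and the LSTM input weights grow with the GloVe dimension; the convolution layers after the LSTM always see 100 channels, so their sizes stay the same across runs.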


In [32]:
# glove.6B.50d.txt
def get_coefs(word,*arr): return word, np.asarray(arr, dtype='float32')
embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE[1]))

embed_size = 50 # how big is each word vector
max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 100 # max number of words in a comment to use

all_embs = np.stack(list(embeddings_index.values()))  # np.stack expects a sequence, not a dict view
emb_mean,emb_std = all_embs.mean(), all_embs.std()
emb_mean,emb_std

word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
for word, i in word_index.items():
    if i >= max_features: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector
        
inp = keras.Input(shape = (MAX_LEN,))
'''Students fill in how to load the embeddings from numpy array to Keras. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Again, follow Jeremy's approach to load the embeddings to the Embedding layer 
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'''
x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)
x = Bidirectional(LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
x = keras.layers.SpatialDropout1D(0.3)(x)
x = keras.layers.Conv1D(64, 2, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Conv1D(64, 3, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.1)(keras.layers.Dense(128, activation='relu') (x))
x = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(lr=1e-3), metrics=["accuracy"])
model.summary()

model.fit(X_trn_seq, y_train, batch_size=2**10, epochs=1, verbose=1, validation_data=(X_val_seq, y_valid))
preds_cnn = model.predict(X_val_seq).flatten()

print('AUC score : {:.5f}'.format(roc_auc_score(y_valid, preds_cnn)))


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         (None, 30)                0         
_________________________________________________________________
embedding_3 (Embedding)      (None, 30, 50)            1000000   
_________________________________________________________________
bidirectional_2 (Bidirection (None, 30, 100)           40400     
_________________________________________________________________
spatial_dropout1d_3 (Spatial (None, 30, 100)           0         
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 30, 64)            12864     
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 6, 64)             0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 6, 64)             12352     
__________

In [33]:
# glove.6B.100d.txt
def get_coefs(word,*arr): return word, np.asarray(arr, dtype='float32')
embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE[2]))

embed_size = 100 # how big is each word vector
max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 100 # max number of words in a comment to use

all_embs = np.stack(list(embeddings_index.values()))  # np.stack expects a sequence, not a dict view
emb_mean,emb_std = all_embs.mean(), all_embs.std()
emb_mean,emb_std

word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
for word, i in word_index.items():
    if i >= max_features: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector
        
inp = keras.Input(shape = (MAX_LEN,))
'''Students fill in how to load the embeddings from numpy array to Keras. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Again, follow Jeremy's approach to load the embeddings to the Embedding layer 
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'''
x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)
x = Bidirectional(LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
x = keras.layers.SpatialDropout1D(0.3)(x)
x = keras.layers.Conv1D(64, 2, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Conv1D(64, 3, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.1)(keras.layers.Dense(128, activation='relu') (x))
x = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(lr=1e-3), metrics=["accuracy"])
model.summary()

model.fit(X_trn_seq, y_train, batch_size=2**10, epochs=1, verbose=1, validation_data=(X_val_seq, y_valid))
preds_cnn = model.predict(X_val_seq).flatten()

print('AUC score : {:.5f}'.format(roc_auc_score(y_valid, preds_cnn)))


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         (None, 30)                0         
_________________________________________________________________
embedding_4 (Embedding)      (None, 30, 100)           2000000   
_________________________________________________________________
bidirectional_3 (Bidirection (None, 30, 100)           60400     
_________________________________________________________________
spatial_dropout1d_4 (Spatial (None, 30, 100)           0         
_________________________________________________________________
conv1d_7 (Conv1D)            (None, 30, 64)            12864     
_________________________________________________________________
max_pooling1d_7 (MaxPooling1 (None, 6, 64)             0         
_________________________________________________________________
conv1d_8 (Conv1D)            (None, 6, 64)             12352     
__________

In [34]:
# glove.6B.200d.txt
def get_coefs(word,*arr): return word, np.asarray(arr, dtype='float32')
embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE[3]))

embed_size = 200 # how big is each word vector
max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 100 # max number of words in a comment to use

all_embs = np.stack(list(embeddings_index.values()))  # np.stack expects a sequence, not a dict view
emb_mean,emb_std = all_embs.mean(), all_embs.std()
emb_mean,emb_std

word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
for word, i in word_index.items():
    if i >= max_features: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector
        
inp = keras.Input(shape = (MAX_LEN,))
'''Students fill in how to load the embeddings from numpy array to Keras. 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# Again, follow Jeremy's approach to load the embeddings to the Embedding layer 
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
'''
x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)
x = Bidirectional(LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
x = keras.layers.SpatialDropout1D(0.3)(x)
x = keras.layers.Conv1D(64, 2, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Conv1D(64, 3, activation='relu', padding='same')(x)
x = keras.layers.MaxPooling1D(5, padding='same')(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.1)(keras.layers.Dense(128, activation='relu') (x))
x = keras.layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inp, outputs=x)
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(lr=1e-3), metrics=["accuracy"])
model.summary()

model.fit(X_trn_seq, y_train, batch_size=2**10, epochs=1, verbose=1, validation_data=(X_val_seq, y_valid))
preds_cnn = model.predict(X_val_seq).flatten()

print('AUC score : {:.5f}'.format(roc_auc_score(y_valid, preds_cnn)))


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_8 (InputLayer)         (None, 30)                0         
_________________________________________________________________
embedding_5 (Embedding)      (None, 30, 200)           4000000   
_________________________________________________________________
bidirectional_4 (Bidirection (None, 30, 100)           100400    
_________________________________________________________________
spatial_dropout1d_5 (Spatial (None, 30, 100)           0         
_________________________________________________________________
conv1d_9 (Conv1D)            (None, 30, 64)            12864     
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 6, 64)             0         
_________________________________________________________________
conv1d_10 (Conv1D)           (None, 6, 64)             12352     
__________