# GloVe Models

This notebook trains and runs every model that uses the GloVe embeddings (`glove.840B.300d`).

## Setup

In [1]:
import numpy as np
import pandas as pd

from architectures import BidAttentionLstm, BidMaxPoolGru
from helpers import make_df, make_glovevec, predict_and_save

from sklearn.model_selection import train_test_split

np.random.seed(7)

max_features = 100000
maxlen = 150
embed_size = 300
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult",
                "identity_hate"]

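# Load and preprocess the train/test CSVs; y holds the label columns in list_classes.
xtr, xte, y, word_index = make_df("./input/train.csv",
                                  "./input/test.csv",
                                  max_features, maxlen, list_classes)

# Build the embedding matrix from the pretrained GloVe vectors.
embedding_vector = make_glovevec("./input/glove.840B.300d.txt",
                                 max_features, embed_size, word_index)

# Hold out 10% of the training data for validation.
xtr, xval, y, yval = train_test_split(xtr, y, train_size=0.90, random_state=233)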


Using TensorFlow backend.


clean
Text is clean
clean
Text is clean
[[ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.27204001 -0.06203    -0.1884     ...,  0.13015001 -0.18317001  0.1323    ]
 [ 0.31924     0.06316    -0.27858001 ...,  0.082745    0.097801
   0.25044999]
 ..., 
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]
 [ 0.          0.          0.         ...,  0.          0.          0.        ]]
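
`make_glovevec` lives in `helpers` and is not shown in this notebook. The sketch below is one plausible way such a helper could produce a matrix like the one printed above. It assumes that each GloVe line holds a token followed by its vector components, that `word_index` maps words to 1-based integer ranks, and that words missing from GloVe keep all-zero rows. The function name `build_embedding_matrix` and the tiny in-memory "file" are hypothetical.

```python
import io
import numpy as np


def build_embedding_matrix(glove_lines, word_index, max_features, embed_size):
    """Hypothetical stand-in for helpers.make_glovevec (the real helper is not shown)."""
    # Row 0 is reserved for padding, so the matrix needs one extra row.
    num_rows = min(max_features, len(word_index) + 1)
    matrix = np.zeros((num_rows, embed_size), dtype=np.float32)

    for line in glove_lines:
        # Split from the right so a token that itself contains spaces stays intact.
        parts = line.rstrip().rsplit(" ", embed_size)
        if len(parts) != embed_size + 1:
            continue  # skip malformed lines
        word, values = parts[0], parts[1:]
        row = word_index.get(word)
        # Ignore words outside the vocabulary or beyond the max_features cut-off.
        if row is None or row >= num_rows:
            continue
        matrix[row] = np.asarray(values, dtype=np.float32)
    return matrix


# Tiny in-memory example with 3-dimensional vectors (made-up values).
fake_glove = io.StringIO("the 0.1 0.2 0.3\ncat -0.4 0.5 0.6\n")
word_index = {"the": 1, "cat": 2, "zzzunknown": 3}
emb = build_embedding_matrix(fake_glove, word_index, max_features=100, embed_size=3)
print(emb.shape)
```

Under these assumptions, the all-zero rows in the printed matrix would correspond to the padding index and to words that have no GloVe vector.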


## Callbacks

In [2]:
from helpers import RocAucEvaluation
from keras.callbacks import EarlyStopping, ModelCheckpoint

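file_path = "./modelckpts/.model.{epoch:02d}.hdf5"

# Save a checkpoint after every epoch (save_best_only is left at its default, False).
ckpt = ModelCheckpoint(file_path, monitor='val_loss', verbose=2,
                       mode='min')
# Early stopping is defined here but deliberately not passed to fit(),
# since it does not monitor the ROC-AUC score.
early = EarlyStopping(monitor="val_loss", mode="min", patience=3)
# Report ROC-AUC on the validation set after every epoch.
roc = RocAucEvaluation(validation_data=(xval, yval), interval=1)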
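
`RocAucEvaluation` is imported from `helpers` and its code is not shown here. As a rough illustration of the metric it prints, the sketch below computes ROC-AUC per label with the rank-sum identity and averages over the label columns. Whether the real callback averages per label in this way is an assumption, and the function names `roc_auc` and `mean_columnwise_auc` are hypothetical. In practice `sklearn.metrics.roc_auc_score` would normally be used instead of hand-written code.

```python
def roc_auc(y_true, y_score):
    """ROC AUC for one binary label via the rank-sum (Mann-Whitney U) identity."""
    # Sort sample indices by score, then assign 1-based ranks,
    # giving tied scores the average of the ranks they span.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and y_score[order[end + 1]] == y_score[order[start]]:
            end += 1
        avg_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def mean_columnwise_auc(y_true_rows, y_score_rows):
    """Average the per-label AUC over all label columns."""
    n_labels = len(y_true_rows[0])
    aucs = []
    for j in range(n_labels):
        col_true = [row[j] for row in y_true_rows]
        col_score = [row[j] for row in y_score_rows]
        aucs.append(roc_auc(col_true, col_score))
    return sum(aucs) / n_labels


# Two labels, four samples (made-up numbers for illustration).
y_true = [[1, 0], [0, 0], [1, 1], [0, 1]]
y_pred = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.7], [0.3, 0.1]]
print(mean_columnwise_auc(y_true, y_pred))
```

Unlike accuracy, this score depends only on how the predictions are ranked, which is why it can move differently from `val_loss` and `val_acc` from one epoch to the next.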

## Run Bidirectional LSTM model with Attention

In [None]:
model = BidAttentionLstm(maxlen, max_features, embed_size, embedding_vector)


In [5]:
model.fit(xtr, y, batch_size=512, epochs=30, validation_data=(xval, yval),
          callbacks=[ckpt, roc], verbose=2)

Train on 143613 samples, validate on 15958 samples
Epoch 1/30

Epoch 00001: saving model to ./modelckpts/.model.01.hdf5

 ROC-AUC - epoch: 1 - score: 0.990168 

 - 185s - loss: 0.0365 - acc: 0.9856 - val_loss: 0.0413 - val_acc: 0.9838
Epoch 2/30

Epoch 00002: saving model to ./modelckpts/.model.02.hdf5

 ROC-AUC - epoch: 2 - score: 0.989938 

 - 183s - loss: 0.0362 - acc: 0.9857 - val_loss: 0.0413 - val_acc: 0.9839
Epoch 3/30

Epoch 00003: saving model to ./modelckpts/.model.03.hdf5

 ROC-AUC - epoch: 3 - score: 0.989451 

 - 183s - loss: 0.0354 - acc: 0.9858 - val_loss: 0.0422 - val_acc: 0.9840
Epoch 4/30

Epoch 00004: saving model to ./modelckpts/.model.04.hdf5

 ROC-AUC - epoch: 4 - score: 0.989511 

 - 183s - loss: 0.0346 - acc: 0.9862 - val_loss: 0.0416 - val_acc: 0.9840
Epoch 5/30

Epoch 00005: saving model to ./modelckpts/.model.05.hdf5

 ROC-AUC - epoch: 5 - score: 0.989913 

 - 183s - loss: 0.0342 - acc: 0.9864 - val_loss: 0.0422 - val_acc: 0.9839
Epoch 6/30

Epoch 00006: savi

KeyboardInterrupt: 
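
The run was interrupted during epoch 6, and the validation ROC-AUC in the log above peaked at epoch 1. Because `ckpt` writes one file per epoch, the best checkpoint can be picked after the fact. The sketch below parses the ROC-AUC lines exactly as printed above and rebuilds the matching checkpoint path from the `file_path` template. The helper name `best_epoch_by_auc` is hypothetical.

```python
import re

# Lines copied from the training output above.
LOG = """\
 ROC-AUC - epoch: 1 - score: 0.990168
 ROC-AUC - epoch: 2 - score: 0.989938
 ROC-AUC - epoch: 3 - score: 0.989451
 ROC-AUC - epoch: 4 - score: 0.989511
 ROC-AUC - epoch: 5 - score: 0.989913
"""

FILE_PATH = "./modelckpts/.model.{epoch:02d}.hdf5"  # same template as file_path
ROC_LINE = re.compile(r"ROC-AUC - epoch: (\d+) - score: ([0-9.]+)")


def best_epoch_by_auc(log_text):
    """Return (epoch, score) for the highest validation ROC-AUC in the log."""
    scores = [(int(m.group(1)), float(m.group(2))) for m in ROC_LINE.finditer(log_text)]
    if not scores:
        raise ValueError("no ROC-AUC lines found in log")
    return max(scores, key=lambda pair: pair[1])


epoch, score = best_epoch_by_auc(LOG)
best_ckpt = FILE_PATH.format(epoch=epoch)
print(epoch, score, best_ckpt)
```

The selected file could then be restored with `keras.models.load_model`. If the attention layer is a custom class, it would likely need to be supplied through the `custom_objects` argument.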

In [6]:
#0.9828
predict_and_save(model, xte, '07', 'bidlstm_07_9897')

Predicting with model...
Saving to submission file...


## Run Bidirectional GRU model with Max Pooling

In [None]:
model = BidMaxPoolGru(maxlen, max_features, embed_size, embedding_vector)
model.fit(xtr, y, batch_size=1024, epochs=20, validation_data=(xval, yval),
          callbacks=[ckpt, roc], verbose=2)


## Predict with the Model and Save Submission to CSV

In [None]:
#0.9829
predict_and_save(model, xte, '05', 'bidgru_04')