(Notes from Ceshine: this is a fork of @tunguz's notebook. What I added is the LIME text explainer from the ELI5 library. From the output of the explainer, it seems that hyperlinks are interfering with the predictions, so removing them seems to be a good idea. I've also removed the part that demonstrates the embedding matrix, along with the pretrained embeddings dataset, which wasn't used in the actual prediction.)

I've been wanting to play with this dataset for a while. I've also been wanting to see how models built on the [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/) perform on non-competition "real world" data. Here I will use just one model, which was built inside of a [kernel](https://www.kaggle.com/tunguz/bi-gru-lstm-cnn-poolings-fasttext). The kernel scores in the 0.984x AUC range. It's a respectable score, but well below the top solutions, which scored in the 0.988x range.

Let's take a look. First, let's load all the required packages.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from functools import partial

import time
start_time = time.time()

from sklearn.model_selection import train_test_split
import sys, os, re, csv, codecs
np.random.seed(32)
os.environ["OMP_NUM_THREADS"] = "4"

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation, Conv1D, GRU
from keras.layers import Bidirectional, GlobalMaxPool1D, MaxPooling1D, Add, Flatten
from keras.layers import GlobalAveragePooling1D, GlobalMaxPooling1D, concatenate, SpatialDropout1D
from keras.models import Model, load_model
from keras import initializers, regularizers, constraints, optimizers, layers, callbacks
from keras import backend as K
from keras.engine import InputSpec, Layer
from keras.optimizers import Adam, RMSprop
from keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler
from keras.layers import BatchNormalization

import logging
from keras.callbacks import Callback

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

from eli5.lime import TextExplainer
# Any results you write to the current directory are saved as output.

Now, let's load the data.

In [None]:
tweets = pd.read_csv("../input/clinton-trump-tweets/tweets.csv")
tweets.head()

In [None]:
tweets.shape

In [None]:
sum(tweets.text.isnull())

In [None]:
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
raw_text = tweets["text"].str.lower()
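
The fork note at the top suggests that hyperlinks interfere with the predictions. A minimal sketch of how they could be stripped before tokenization is shown here. Note that `strip_urls` and `URL_PATTERN` are hypothetical names introduced for illustration, they are not part of the original notebook, and the pattern only covers plain `http(s)://` links.

```python
import re

# Hypothetical helper (not in the original notebook): matches http(s) links
# up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def strip_urls(text):
    """Return `text` with http(s) URLs removed and whitespace normalised."""
    without_urls = URL_PATTERN.sub(" ", text)
    # Collapse the extra spaces left behind by the removed links.
    return " ".join(without_urls.split())

example = "Great rally tonight! https://t.co/abc123 Thank you"
print(strip_urls(example))
```

Under these assumptions, something like `raw_text = raw_text.apply(strip_urls)` could be run before fitting the tokenizer.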

In [None]:
max_features = 130000
max_len = 220
tk = Tokenizer(num_words = max_features, lower = True)
tk.fit_on_texts(raw_text)
tweets["comment_seq"] = tk.texts_to_sequences(raw_text)
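
One thing worth keeping in mind: the Keras `Tokenizer` assigns integer indices by word frequency in the texts passed to `fit_on_texts`, with the most frequent word getting index 1. The indices above therefore reflect the tweet corpus. If the pretrained model was trained with a tokenizer fitted on different text, the same word could map to a different index, which may affect the predictions. The sketch below is a simplified pure-Python stand-in (not the Keras implementation; `build_word_index` is a hypothetical name) that shows the effect on two invented corpora.

```python
from collections import Counter

def build_word_index(texts):
    """Simplified stand-in for Tokenizer.fit_on_texts: rank words by frequency.

    Index 1 goes to the most frequent word; ties keep first-seen order.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so equally frequent words stay in first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}

# Invented example corpora, for illustration only.
comments_index = build_word_index(["you are rude", "you are wrong", "you lie"])
tweets_index = build_word_index(["great rally", "great crowd", "you are great"])

# The same word ends up with a different index in each vocabulary.
print(comments_index["you"], tweets_index["you"])
```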

In [None]:
tweets_pad_sequences = pad_sequences(tweets.comment_seq, maxlen = max_len)

In [None]:
tweets_pad_sequences.shape
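
For reference, `pad_sequences` with its default settings pads with zeros at the front and, for sequences longer than `maxlen`, drops tokens from the front. A rough pure-Python sketch of that behaviour for a single sequence (`pad_pre` is a hypothetical helper, and `maxlen` is assumed to be positive):

```python
def pad_pre(sequence, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence: pad and truncate at the front."""
    truncated = sequence[-maxlen:]                 # keep the last `maxlen` tokens
    padding = [value] * (maxlen - len(truncated))  # zeros go in front
    return padding + truncated

print(pad_pre([5, 2, 9], maxlen=5))           # [0, 0, 5, 2, 9]
print(pad_pre([1, 2, 3, 4, 5, 6], maxlen=4))  # [3, 4, 5, 6]
```

Since tweets are short, most rows of `tweets_pad_sequences` will consist largely of leading zeros.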

Now we'll load the actual trained model and make the predictions on our data.

In [None]:
model = load_model("../input/bi-gru-lstm-cnn-poolings-fasttext/best_model.hdf5")

In [None]:
pred = model.predict(tweets_pad_sequences, batch_size = 1024, verbose = 1)

In [None]:
pred.max()

In [None]:
toxic_predictions = pd.DataFrame(columns=list_classes, data=pred)

In [None]:
toxic_predictions.head()

In [None]:
toxic_predictions['id'] = tweets['id'].values
toxic_predictions['handle'] = tweets['handle'].values
toxic_predictions['text'] = tweets['text'].values

In [None]:
toxic_predictions.tail()

In [None]:
Hillary_predictions = toxic_predictions[toxic_predictions['handle'] == 'HillaryClinton']
Trump_predictions = toxic_predictions[toxic_predictions['handle'] == 'realDonaldTrump']

In [None]:
Hillary_predictions[list_classes].describe()

In [None]:
Trump_predictions[list_classes].describe()

Based on these summary statistics, it would seem that both of them score pretty low on average for all of the "Toxic" categories. However, there do seem to be a few notable "highly probable" problematic tweets in each one of the six categories, with the notable exception of "threat". Which, I think, is a good thing. For what it's worth (not much at all, IMHO), Hillary's tweets seem to be, on average, more toxic, severely toxic, and obscene, while Trump's tweets score higher on average for threat, insult, and identity hate.
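
The per-candidate averages can also be compared side by side rather than by reading two `describe()` tables. A minimal sketch on a toy frame follows; the numbers are invented for illustration and are not the notebook's results.

```python
import pandas as pd

# Toy stand-in for `toxic_predictions`; all scores are made up.
toy = pd.DataFrame({
    "handle": ["HillaryClinton", "HillaryClinton",
               "realDonaldTrump", "realDonaldTrump"],
    "toxic": [0.10, 0.30, 0.05, 0.15],
    "insult": [0.02, 0.04, 0.10, 0.20],
})

# Mean score per class for each handle.
mean_by_handle = toy.groupby("handle")[["toxic", "insult"]].mean()
print(mean_by_handle)

# Which handle has the higher mean score for each class?
print(mean_by_handle.idxmax())
```

On the real data, the same pattern would be `toxic_predictions.groupby("handle")[list_classes].mean()`.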

Let's see what the "worst offenders" are for both candidates. Let's start with the most toxic Hillary tweet.

### Define explainer helper function

In [None]:
def predict_texts(texts, class_idx):
    """Return an (n_texts, 2) array: [P(class), 1 - P(class)] for each text."""
    sequence = tk.texts_to_sequences(texts)
    sequence = pad_sequences(sequence, maxlen=max_len)
    preds = model.predict(sequence, batch_size=100, verbose=1)[:, class_idx]
    # Add the complement as a second column so that each row sums to 1
    preds = np.array([preds, 1 - preds]).transpose()
    return preds

def explain_text(text, class_idx):
    """Fit a LIME TextExplainer on one text and show the explanation for one class."""
    te = TextExplainer(random_state=42, n_samples=1000)
    te.fit(text, partial(predict_texts, class_idx=class_idx))
    print(te.metrics_)
    return te.show_prediction(target_names=[list_classes[class_idx], "None"])
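
A note on why `predict_texts` returns two columns: the explainer expects a function that maps a list of texts to a matrix of shape `(n_samples, n_classes)` holding class probabilities. The model here emits one independent score per label, so the helper picks a single label and adds its complement as a second "None" class. A small sketch with invented scores:

```python
import numpy as np

# Invented per-label scores for 3 texts x 6 labels (illustration only).
raw = np.array([
    [0.90, 0.10, 0.40, 0.01, 0.30, 0.02],
    [0.05, 0.01, 0.02, 0.00, 0.03, 0.01],
    [0.60, 0.20, 0.70, 0.05, 0.50, 0.10],
])

class_idx = 0                                           # "toxic"
positive = raw[:, class_idx]                            # shape (3,)
two_class = np.column_stack([positive, 1 - positive])   # shape (3, 2)

print(two_class.shape)
print(two_class.sum(axis=1))  # every row sums to 1
```

`np.column_stack` here gives the same result as the `np.array([...]).transpose()` form used in the helper.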

In [None]:
class_idx = 0
print(list_classes[class_idx])
explain_text(Hillary_predictions.loc[Hillary_predictions['toxic'].idxmax()]['text'], class_idx=class_idx)

In [None]:
Hillary_predictions.loc[Hillary_predictions['toxic'].idxmax()]

Meh, not really toxic. It seems that the word "mad", or the high frequency of special characters, has flagged this tweet as toxic. The same tweet was also the top tweet in both the "severe_toxic" and "obscene" categories.
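
To check quickly whether one tweet tops several categories at once, the position of the maximum can be taken per column. A minimal sketch on an invented score matrix (rows are tweets, columns follow `list_classes`):

```python
import numpy as np

list_classes = ["toxic", "severe_toxic", "obscene",
                "threat", "insult", "identity_hate"]

# Invented scores for 3 tweets (illustration only).
scores = np.array([
    [0.2, 0.01, 0.1, 0.00, 0.1, 0.0],
    [0.9, 0.30, 0.8, 0.01, 0.2, 0.1],
    [0.3, 0.02, 0.2, 0.05, 0.6, 0.3],
])

# Row position of the highest score in each column.
top_rows = scores.argmax(axis=0)
for name, row in zip(list_classes, top_rows):
    print(name, "-> row", row)
```

With the notebook's data frames, `Hillary_predictions[list_classes].idxmax()` would give the equivalent index labels in one call.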

Now let's look at "threats":

In [None]:
class_idx = 3
print(list_classes[class_idx])
explain_text(Hillary_predictions.loc[Hillary_predictions['threat'].idxmax()]['text'], class_idx=class_idx)

In [None]:
Hillary_predictions.loc[Hillary_predictions['threat'].idxmax()]

In [None]:
class_idx = 0
print(list_classes[class_idx])
explain_text(Hillary_predictions.loc[Hillary_predictions['threat'].idxmax()]['text'], class_idx=class_idx)

Yeah, not much going on there. As expected, the predicted probability of this actually being a threat is very low.

What's Hillary's worst insult?

In [None]:
Hillary_predictions.loc[Hillary_predictions['insult'].idxmax()]

In [None]:
class_idx = 4
print(list_classes[class_idx])
explain_text(Hillary_predictions.loc[Hillary_predictions['insult'].idxmax()]['text'], class_idx=class_idx)

Ouch. That's definitely below the belt, but in a more indirect kind of way. And yeah, insulting. Good job, predictive modeling!

Let's look at identity hate:

In [None]:
Hillary_predictions.loc[Hillary_predictions['identity_hate'].idxmax()]

In [None]:
class_idx = 5
print(list_classes[class_idx])
explain_text(Hillary_predictions.loc[Hillary_predictions['identity_hate'].idxmax()]['text'], class_idx=class_idx)

Hmm, that's interesting: it seems the algorithm has flagged Hillary's retweet of Trump's tweet. Seems like there is something deep going on here. Or the algorithm is just plain unreliable.

OK, let's move onto Trump. First, his most toxic tweet:

In [None]:
Trump_predictions.loc[Trump_predictions['toxic'].idxmax()]

In [None]:
class_idx = 0
print(list_classes[class_idx])
explain_text(Trump_predictions.loc[Trump_predictions['toxic'].idxmax()]['text'], class_idx=class_idx)

That's just weird: there is nothing toxic about it. The same tweet has been flagged as the most severely toxic and obscene tweet as well. Not very informative.

Now how about threats?

In [None]:
Trump_predictions.loc[Trump_predictions['threat'].idxmax()]

In [None]:
class_idx = 3
print(list_classes[class_idx])
explain_text(Trump_predictions.loc[Trump_predictions['threat'].idxmax()]['text'], class_idx=class_idx)

Massive tax increases? Yeah, I can see how this could be viewed as threatening.

How about the most insulting tweet?

In [None]:
Trump_predictions.loc[Trump_predictions['insult'].idxmax()]

In [None]:
class_idx = 4
print(list_classes[class_idx])
explain_text(Trump_predictions.loc[Trump_predictions['insult'].idxmax()]['text'], class_idx=class_idx)

Yeah, definitely insulting. On so many levels. I can't even ...

And what about identity hate?

In [None]:
Trump_predictions.loc[Trump_predictions['identity_hate'].idxmax()]

In [None]:
class_idx = 5
print(list_classes[class_idx])
explain_text(Trump_predictions.loc[Trump_predictions['identity_hate'].idxmax()]['text'], class_idx=class_idx)

That one really made me LOL. And think. Is he mocking him for his "identity" of being Jeb? Or mommy's boy? Or W's brother? Or a weakling? All of the above? So many choices ...

In the end, this exercise shows both the strengths and limitations of an algorithmic approach to toxic comment classification. Although the AUC scores are relatively high (almost 0.99 for the top models), it is most likely that in this case human insight is even more relevant than in most other ML areas. Furthermore, even though we had a pretty large dataset to work with, it is very likely that in order to get even close to human-level toxic text classification, we'd need a training set several orders of magnitude larger, and/or models with deeper natural language understanding.