# TensorFlow Deep Model

Ultimately we want to build the model directly in TensorFlow, since that gives us the most flexibility and lets us reuse some of the modules we built for SQuAD.

We'll proceed as follows:
0. Set-up.
1. Read in dataset.
2. Convert dataset into format for RNN.
3. Construct vocabulary for RNN.
4. Fit TF-RNN classifier.

In [1]:
# 0. Some initial set-up.
from collections import Counter
import numpy as np
import os
import pandas as pd
import random
from tf_rnn_classifier import TfRNNClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support, classification_report, roc_auc_score
import tensorflow as tf
import sst
from utils import evaluate, build_rnn_dataset
import utils


In [2]:
vsmdata_home = 'vsmdata'

glove_home = os.path.join(vsmdata_home, 'glove.6B')

In [3]:
from tensorflow.python.client import device_lib

device_lib.list_local_devices()

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality {
 }
 incarnation: 15321837663203353070, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 4917166080
 locality {
   bus_id: 1
   links {
   }
 }
 incarnation: 17279758261332026128
 physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"]

In [4]:
data_dir = "./data/"

## Read in data

In [5]:
train = pd.read_csv(data_dir + "train.csv").fillna(' ')
test = pd.read_csv(data_dir + "test.csv").fillna(' ')
test_labels = pd.read_csv(data_dir + "test_labels.csv")

In [6]:
label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
train_text = train['comment_text']
train_labels = train[label_cols]

## Formatting dataset for RNN

To format the dataset for the RNN, we represent it as a list of lists: the outer list holds the training examples, and each inner list holds the tokens of one example.
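The list-of-lists layout can be sketched as follows. This is only an illustration: `sketch_build_rnn_dataset` is a hypothetical stand-in for `build_rnn_dataset` (whose source is not shown here), and it assumes whitespace tokenization and that the second argument is the fraction of examples assigned to the training split.

```python
import random

LABEL_COLS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def sketch_build_rnn_dataset(rows, train_frac, seed=0):
    """Hypothetical stand-in for build_rnn_dataset.

    rows: list of dicts with a 'comment_text' string and one 0/1 entry per label.
    Returns (X, Y), each a dict with 'train' and 'dev' splits.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    X = {'train': [], 'dev': []}
    Y = {'train': [], 'dev': []}
    for i, row in enumerate(rows):
        split = 'train' if i < cut else 'dev'
        # Outer list: one entry per example; inner list: that example's tokens.
        X[split].append(row['comment_text'].split())
        # One 0/1 value per label column, in a fixed order.
        Y[split].append([int(row[c]) for c in LABEL_COLS])
    return X, Y

# Ten toy comments with made-up labels, purely for illustration.
toy_rows = [
    dict({'comment_text': 'comment number %d here' % i}, **{c: i % 2 for c in LABEL_COLS})
    for i in range(10)
]
X_toy, Y_toy = sketch_build_rnn_dataset(toy_rows, 0.9)
print(len(X_toy['train']), len(X_toy['dev']))  # 9 1
```

Each entry of `X_toy['train']` is a list of token strings, and the matching entry of `Y_toy['train']` is a six-element 0/1 label vector.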

In [7]:
X_rnn, Y_rnn = build_rnn_dataset(train, 0.9)

In [7]:
" ".join(X_rnn['train'][30][:50])

'tYOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE A MOTHJER FUCKER COCKSUCKER! YOU ARE'

In [20]:
Y_rnn['train'][30]

[1, 1, 1, 0, 1, 0]

## Get vocab for RNN

In [8]:
full_train_vocab = sst.get_vocab(X_rnn['train'])

In [10]:
print("sst_full_train_vocab has {:,} items".format(len(full_train_vocab)))

sst_full_train_vocab has 494,751 items


## Experiment #1: Vanilla LSTM


In [48]:
num_train = 100  # len(X_rnn["train"])

In [49]:
tf_rnn = TfRNNClassifier(
    full_train_vocab,
    embed_dim=50,
    hidden_dim=50,
    max_length=100,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=10,
    eta=0.1)

In [50]:
_ = tf_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

ValueError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [2,?,50], [50,6].
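One plausible reading of this error, assuming the classifier feeds the RNN's final state straight into the output layer: an LSTM's final state is a `(c, h)` pair, so treating it as a single tensor yields shape `[2, batch, 50]` (rank 3) instead of the `[batch, 50]` matrix that the `[50, 6]` output weights expect. The sketch below reproduces only the shape arithmetic with plain nested lists; `shape` and `matmul` are illustrative helpers, not TensorFlow calls.

```python
def shape(x):
    """Return the shape of a rectangular nested list/tuple as a tuple."""
    dims = []
    while isinstance(x, (list, tuple)):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def matmul(a, b):
    """Plain rank-2 matrix product; rejects anything that is not rank 2."""
    if len(shape(a)) != 2 or len(shape(b)) != 2:
        raise ValueError(
            "matmul needs rank-2 inputs, got %s and %s" % (shape(a), shape(b)))
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

batch, hidden_dim, n_labels = 4, 50, 6
c = [[0.0] * hidden_dim for _ in range(batch)]      # LSTM cell state
h = [[0.1] * hidden_dim for _ in range(batch)]      # LSTM hidden state
W = [[0.01] * n_labels for _ in range(hidden_dim)]  # output weights, 50 x 6

state = (c, h)                 # the final state of an LSTM is a (c, h) pair
print(shape(state))            # (2, 4, 50)
logits = matmul(state[1], W)   # use only h: (4, 50) x (50, 6)
print(shape(logits))           # (4, 6)
```

Passing `state` itself to `matmul` raises the same kind of rank error; selecting the hidden state `h` first gives one six-way logit vector per example.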

## Evaluate on dev set

In [79]:
tf_rnn_predictions = tf_rnn.predict(X_rnn['dev'])

In [81]:
evaluate(Y_rnn['dev'][:], tf_rnn_predictions)

CLASS: toxic
p, r, f1: 0.6711, 0.5515, 0.6054

CLASS: severe_toxic
p, r, f1: 0.5000, 0.0633, 0.1124

CLASS: obscene
p, r, f1: 0.6467, 0.5509, 0.5950

CLASS: threat
p, r, f1: 0.0000, 0.0000, 0.0000

CLASS: insult
p, r, f1: 0.5981, 0.4624, 0.5216

CLASS: identity_hate
p, r, f1: 0.2500, 0.0286, 0.0513

average F1 score: 0.314273
macro-averaged ROC-AUC score: 0.906019
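The averaged F1 above appears to be the unweighted mean of the six per-class F1 scores (the rounded values average to about 0.3143). A minimal sketch of that computation follows; `prf` and `macro_f1` are hypothetical helpers, not the `evaluate` function from `utils`, and they assume precision and recall are reported as 0 when their denominator is 0, as the `threat` row suggests.

```python
def prf(y_true, y_pred):
    """Precision, recall and F1 for one binary label (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(Y_true, Y_pred):
    """Unweighted mean of per-label F1 over the columns of two 0/1 matrices."""
    n_labels = len(Y_true[0])
    f1s = []
    for j in range(n_labels):
        col_true = [row[j] for row in Y_true]
        col_pred = [row[j] for row in Y_pred]
        f1s.append(prf(col_true, col_pred)[2])
    return sum(f1s) / len(f1s)

# Per-class F1 values copied from the output above.
reported = [0.6054, 0.1124, 0.5950, 0.0000, 0.5216, 0.0513]
print(round(sum(reported) / len(reported), 4))  # 0.3143
```

Because every class counts equally in this mean, the rare classes (`threat`, `identity_hate`, `severe_toxic`) pull the average well below the F1 of the frequent classes.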


## Experiment #2: Bidirectional RNN

We add a flag to the model that makes it bidirectional. At least initially, the backward (bw) cell uses the same hyperparameters as the forward (fw) cell. We use separate cells for the two directions rather than sharing weights: token order likely matters, so a cell that reads a comment right to left should be free to learn different parameters from one that reads it left to right.
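The idea of two independently parameterized cells can be sketched in plain Python. This is an illustration under stated assumptions, not the `TfRNNClassifier` implementation: it uses a simple tanh RNN cell in place of an LSTM, and it concatenates the two final states, which is one common way to combine directions but not necessarily what the classifier does. All function names are hypothetical.

```python
import math
import random

def make_cell(input_dim, hidden_dim, rng):
    """Random weights for a plain tanh RNN cell (a stand-in for an LSTM cell)."""
    Wx = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(input_dim)]
    Wh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
    return Wx, Wh

def run_cell(cell, inputs):
    """Run the cell over a sequence of vectors and return the final hidden state."""
    Wx, Wh = cell
    hidden_dim = len(Wh)
    h = [0.0] * hidden_dim
    for x in inputs:
        h = [
            math.tanh(
                sum(x[i] * Wx[i][j] for i in range(len(x)))
                + sum(h[k] * Wh[k][j] for k in range(hidden_dim))
            )
            for j in range(hidden_dim)
        ]
    return h

def bidirectional_final_state(fw_cell, bw_cell, inputs):
    """Read the sequence in both directions with separate weights, then concatenate."""
    h_fw = run_cell(fw_cell, inputs)        # left to right
    h_bw = run_cell(bw_cell, inputs[::-1])  # right to left
    return h_fw + h_bw

rng = random.Random(0)
fw_cell = make_cell(3, 4, rng)  # same hyperparameters ...
bw_cell = make_cell(3, 4, rng)  # ... but separate weights
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
state = bidirectional_final_state(fw_cell, bw_cell, sequence)
print(len(state))  # 8
```

With concatenation, the layer that follows sees a vector of size `2 * hidden_dim`, so its weight matrix must be sized accordingly.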


In [15]:
bidir_rnn = TfRNNClassifier(
    full_train_vocab,
    embed_dim=50,
    hidden_dim=50,
    max_length=50,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=50,
    bidir_rnn=True, # Bidirectional RNN!
    eta=0.01)

In [16]:
num_train = 1000  # len(X_rnn["train"])

_ = bidir_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

Iteration 50: loss: 0.035049978643655786

In [17]:
bidir_rnn_predictionsSMALL = bidir_rnn.predict(X_rnn['train'][:1000])

In [18]:
evaluate(Y_rnn['train'][:1000], bidir_rnn_predictionsSMALL)

CLASS: toxic
p, r, f1: 0.9388, 0.8846, 0.9109

CLASS: severe_toxic
p, r, f1: 1.0000, 0.0769, 0.1429

CLASS: obscene
p, r, f1: 0.8814, 0.8667, 0.8739

CLASS: threat
p, r, f1: 0.0000, 0.0000, 0.0000

CLASS: insult
p, r, f1: 0.8070, 0.8519, 0.8288

CLASS: identity_hate
p, r, f1: 1.0000, 0.1250, 0.2222

average F1 score: 0.496458
macro-averaged ROC-AUC score: 0.991964


In [24]:
bidir_rnn_predictions = bidir_rnn.predict(X_rnn['dev'])

In [25]:
evaluate(Y_rnn['dev'][:], bidir_rnn_predictions)

CLASS: toxic
p, r, f1: 0.7414, 0.6710, 0.7044

CLASS: severe_toxic
p, r, f1: 0.5000, 0.3165, 0.3876

CLASS: obscene
p, r, f1: 0.8255, 0.6527, 0.7290

CLASS: threat
p, r, f1: 0.4444, 0.1404, 0.2133

CLASS: insult
p, r, f1: 0.6961, 0.6128, 0.6518

CLASS: identity_hate
p, r, f1: 0.2444, 0.0786, 0.1189

average F1 score: 0.467509
macro-averaged ROC-AUC score: 0.934755


## Experiment #3: GloVe Pretrained Embeddings

Pretrained embeddings should inject useful syntactic and semantic information into our model, so we initialize the embedding layer from GloVe vectors.

In [55]:
glove_lookup = utils.glove2dict(
    os.path.join(vsmdata_home, 'glove.6B.100d.txt'))

In [56]:
glove_vocab = sorted(set(glove_lookup) & set(full_train_vocab))
print("Embedding matrix contains %d words." % len(glove_vocab))

Embedding matrix contains 55422 words.


In [57]:
glove_embedding = np.array([glove_lookup[w] for w in glove_vocab])

In [58]:
glove_vocab.append("$UNK")
glove_embedding = np.vstack(
    (glove_embedding, utils.randvec(glove_embedding.shape[1])))
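The extra `$UNK` row gives every token outside `glove_vocab` a shared vector. A minimal sketch of the lookup follows, assuming a simple dictionary-based index; `make_index` and `tokens_to_indices` are hypothetical helpers, not part of `utils` or `tf_rnn_classifier`. Note that the glove.6B vectors are uncased, so unless tokens are lowercased before lookup, an all-caps token such as `YOU` also falls back to `$UNK`.

```python
def make_index(vocab):
    """Map each vocabulary word to its row in the embedding matrix."""
    return {w: i for i, w in enumerate(vocab)}

def tokens_to_indices(tokens, index, unk="$UNK"):
    """Replace every token by its row index, using the $UNK row as a fallback."""
    unk_id = index[unk]
    return [index.get(tok, unk_id) for tok in tokens]

# Toy vocabulary built the same way as glove_vocab: sorted words, then $UNK last.
toy_vocab = sorted({"are", "you", "the"}) + ["$UNK"]
toy_index = make_index(toy_vocab)
print(tokens_to_indices(["you", "YOU", "zzyzx"], toy_index))  # [2, 3, 3]
```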

### w/o retraining GloVe

In [98]:
# Uses the limited vocab (only words that have a GloVe vector)
bidir_glove_rnn = TfRNNClassifier(
    glove_vocab,
    embedding=glove_embedding,
    embed_dim=100,
    hidden_dim=50,
    max_length=50,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=False,
    max_iter=10,
    bidir_rnn=True, # Bidirectional RNN!
    eta=0.01)

In [99]:
num_train = len(X_rnn['train'])
bidir_glove_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

Iteration 10: loss: 8.846844404935837

<tf_rnn_classifier.TfRNNClassifier at 0x7f363102d7f0>

In [100]:
bidir_glove_rnn_predictions = bidir_glove_rnn.predict(X_rnn['dev'])

In [101]:
evaluate(Y_rnn['dev'][:], bidir_glove_rnn_predictions)

CLASS: toxic
p: 0.7891 , r: 0.5411, f1: 0.6419

CLASS: severe_toxic
p: 0.4878 , r: 0.1266, f1: 0.2010

CLASS: obscene
p: 0.8003 , r: 0.5260, f1: 0.6348

CLASS: threat
p: 0.6923 , r: 0.1579, f1: 0.2571

CLASS: insult
p: 0.7036 , r: 0.4538, f1: 0.5517

CLASS: identity_hate
p: 0.4651 , r: 0.1429, f1: 0.2186

average F1 score: 0.417535
macro-averaged ROC-AUC score: 0.944858


### w/ retraining GloVe

In [69]:
# Uses the limited vocab (only words that have a GloVe vector)
bidir_glove_retrain_rnn = TfRNNClassifier(
    glove_vocab,
    embedding=glove_embedding,
    embed_dim=100,  # match the 100-d GloVe vectors in glove_embedding
    hidden_dim=50,
    max_length=50,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=10,
    bidir_rnn=True, # Bidirectional RNN!
    eta=0.01)

In [1]:
num_train = len(X_rnn['train'])
bidir_glove_retrain_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

NameError: name 'X_rnn' is not defined

In [72]:
bidir_glove_retrain_rnn_predictions = bidir_glove_retrain_rnn.predict(X_rnn['dev'])

In [74]:
evaluate(Y_rnn['dev'][:], bidir_glove_retrain_rnn_predictions)

CLASS: toxic
p: 0.6615 , r: 0.5711, f1: 0.6130

CLASS: severe_toxic
p: 0.4286 , r: 0.2468, f1: 0.3133

CLASS: obscene
p: 0.6743 , r: 0.4989, f1: 0.5735

CLASS: threat
p: 0.5600 , r: 0.2456, f1: 0.3415

CLASS: insult
p: 0.5993 , r: 0.4316, f1: 0.5018

CLASS: identity_hate
p: 0.3934 , r: 0.1714, f1: 0.2388

average F1 score: 0.430290
macro-averaged ROC-AUC score: 0.925147


## Experiment #4: GloVe + Full Vocab (random for others)

In [59]:
full_glove_vocab = sorted(set(full_train_vocab))
print("Embedding matrix contains %d words." % len(full_glove_vocab))

Embedding matrix contains 494751 words.


In [60]:
full_glove_embedding = np.array([
    glove_lookup[w] 
    if w in glove_lookup else utils.randvec(len(glove_lookup["hello"])) 
    for w in full_glove_vocab
])

In [61]:
full_glove_vocab.append("$UNK")
full_glove_embedding = np.vstack(
    (full_glove_embedding, utils.randvec(full_glove_embedding.shape[1])))

In [21]:
bidir_full_glove_rnn = TfRNNClassifier(
    full_glove_vocab,
    embedding=full_glove_embedding,
    embed_dim=100,
    hidden_dim=50,
    max_length=50,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=20,
    bidir_rnn=True, # Bidirectional RNN!
    eta=0.01)

In [125]:
num_train = len(X_rnn['train'])
bidir_full_glove_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

Iteration 20: loss: 0.11288791222614236

<tf_rnn_classifier.TfRNNClassifier at 0x7f362f565f28>

In [126]:
bidir_full_glove_predictions = bidir_full_glove_rnn.predict(X_rnn['dev'])

In [127]:
evaluate(Y_rnn['dev'][:], bidir_full_glove_predictions)

CLASS: toxic
p: 0.7413 , r: 0.6636, f1: 0.7003

CLASS: severe_toxic
p: 0.4352 , r: 0.2975, f1: 0.3534

CLASS: obscene
p: 0.8163 , r: 0.6686, f1: 0.7351

CLASS: threat
p: 0.5000 , r: 0.2456, f1: 0.3294

CLASS: insult
p: 0.6883 , r: 0.6017, f1: 0.6421

CLASS: identity_hate
p: 0.3152 , r: 0.2071, f1: 0.2500

average F1 score: 0.501711
macro-averaged ROC-AUC score: 0.929445


## Experiment #5: Stacked RNN

In [41]:
stacked_rnn = TfRNNClassifier(
    full_glove_vocab,
    embedding=full_glove_embedding,
    embed_dim=100,
    hidden_dim=50,
    max_length=100,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=10,
    bidir_rnn=True, # Bidirectional RNN!
    stacked=True, # Stacked RNN!
    eta=0.03)

In [42]:
num_train = len(X_rnn['train'])
stacked_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

Iteration 10: loss: 3.178255107253794

<tf_rnn_classifier.TfRNNClassifier at 0x7f34beaed198>

In [43]:
stacked_predictions = stacked_rnn.predict(X_rnn['dev'])

In [44]:
evaluate(Y_rnn['dev'][:], stacked_predictions)

CLASS: toxic
p: 0.7284 , r: 0.6703, f1: 0.6981

CLASS: severe_toxic
p: 0.5325 , r: 0.2595, f1: 0.3489

CLASS: obscene
p: 0.8439 , r: 0.6787, f1: 0.7524

CLASS: threat
p: 0.0000 , r: 0.0000, f1: 0.0000

CLASS: insult
p: 0.7327 , r: 0.5105, f1: 0.6017

CLASS: identity_hate
p: 0.3261 , r: 0.1071, f1: 0.1613

average F1 score: 0.427079
macro-averaged ROC-AUC score: 0.956077


In [45]:
stacked_predictions_train = stacked_rnn.predict(X_rnn['train'][100000:120000])

In [46]:
evaluate(Y_rnn['train'][100000:120000], stacked_predictions_train)

CLASS: toxic
p: 0.9524 , r: 0.9479, f1: 0.9501

CLASS: severe_toxic
p: 0.7674 , r: 0.4950, f1: 0.6018

CLASS: obscene
p: 0.9306 , r: 0.9090, f1: 0.9197

CLASS: threat
p: 0.0000 , r: 0.0000, f1: 0.0000

CLASS: insult
p: 0.8914 , r: 0.8467, f1: 0.8684

CLASS: identity_hate
p: 0.7119 , r: 0.4828, f1: 0.5753

average F1 score: 0.652574
macro-averaged ROC-AUC score: 0.994512


## Experiment #6: Character Embeddings w/ Char-Level CNN (w/ GloVe + random vectors for full vocab)

In [75]:
char_emb_rnn = TfRNNClassifier(
    full_glove_vocab,
    embedding=full_glove_embedding,
    embed_dim=100,
    hidden_dim=50,
    max_length=100,
    hidden_activation=tf.nn.tanh,
    cell_class=tf.nn.rnn_cell.GRUCell,
    train_embedding=True,
    char_embed=True,
    char_embed_dim=20,
    max_iter=50,
    word_length=12,
    bidir_rnn=True, # Bidirectional RNN!
    eta=0.01)

In [None]:
num_train = len(X_rnn['train'])
char_emb_rnn.fit(X_rnn['train'][:num_train], Y_rnn['train'][:num_train])

77
20
_define_embedding: 77, 20
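The `word_length=12` and `char_embed_dim=20` settings, together with the `77, 20` debug line, are consistent with a character vocabulary of 77 symbols embedded in 20 dimensions, with each word represented by a fixed number of characters. How the classifier actually encodes characters is not shown here; the sketch below only illustrates the usual truncate-and-pad step that produces fixed-length input for a character-level CNN. `build_char_index`, `encode_word`, and the `<pad>`/`<unk>` symbols are hypothetical.

```python
def build_char_index(words, pad="<pad>", unk="<unk>"):
    """Assign an integer id to every character seen, reserving 0 and 1."""
    chars = sorted({ch for w in words for ch in w})
    return {ch: i for i, ch in enumerate([pad, unk] + chars)}

def encode_word(word, char_index, word_length=12, pad="<pad>", unk="<unk>"):
    """Truncate or right-pad a word to exactly word_length character ids."""
    ids = [char_index.get(ch, char_index[unk]) for ch in word[:word_length]]
    return ids + [char_index[pad]] * (word_length - len(ids))

char_index = build_char_index(["you", "are"])
print(encode_word("you", char_index, word_length=5))  # [7, 4, 6, 0, 0]
```

Every word then maps to a `word_length`-long row of ids, so a comment of up to `max_length` tokens becomes a fixed-size grid that a convolution can slide over.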


In [24]:
char_preds = char_emb_rnn.predict(X_rnn['dev'])

In [25]:
evaluate(Y_rnn['dev'], char_preds)

CLASS: toxic
p, r, f1: 0.6063, 0.5398, 0.5712

CLASS: severe_toxic
p, r, f1: 1.0000, 0.0190, 0.0373

CLASS: obscene
p, r, f1: 0.8139, 0.3314, 0.4711

CLASS: threat
p, r, f1: 0.0000, 0.0000, 0.0000

CLASS: insult
p, r, f1: 0.7500, 0.2996, 0.4282

CLASS: identity_hate
p, r, f1: 0.0000, 0.0000, 0.0000

average F1 score: 0.251279
macro-averaged ROC-AUC score: 0.914805


In [None]:
char_preds_t = char_emb_rnn.predict(X_rnn['train'][:100])

In [None]:
evaluate(Y_rnn['train'][:100], char_preds_t)