
Here's most of our code for quickly testing different approaches. We still need to port the k-fold cross-validation functions from the original experiments and add the title-context functionality.


To use this notebook you will need to:
- mount your Google Drive in the 4th code cell (the project folder, which contains the StoredTensors folder, must sit at the top level of your Drive)
- upload CommentsDatasets.py and models.py through the file manager in the left sidebar

In [1]:
import torch
import torch.nn as nn 
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as torch_data

In [2]:
from torch import cuda

if cuda.is_available():
    device = 'cuda'
    seed = 4814
    torch.cuda.manual_seed_all(seed)
    print("running on GPU:", torch.cuda.get_device_name(0))
else:
    device = 'cpu'  # define device on the CPU path too; train() reads it
    print("running on CPU")

running on GPU: Tesla P100-PCIE-16GB
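
The cell above only seeds the CUDA generators, and only when a GPU is present. As far as we know, `DataLoader(shuffle=True)` draws its shuffle order from the default CPU generator, so batch order can still differ between runs. A minimal sketch of seeding everything in one place (the helper name `seed_everything` is ours, not something from CommentsDatasets.py or models.py):

```python
import random

import torch


def seed_everything(seed):
    """Seed Python's RNG and PyTorch's CPU and CUDA generators."""
    random.seed(seed)
    torch.manual_seed(seed)  # default CPU generator
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # every visible GPU


# Re-seeding should reproduce the same random draws.
seed_everything(4814)
a = torch.rand(3)
seed_everything(4814)
b = torch.rand(3)
print(torch.equal(a, b))
```

This does not make cuDNN kernels deterministic; it only fixes the random number streams.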


In [3]:
from CommentsDatasets import * # torch dataset setup
from models import * # all our LSTM based models

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [5]:
!ls '/content/drive/MyDrive/Natural Language Models for Hate Speech Classification'

'Classification with BERT.ipynb'  'LSTMs for Hate Speech Testing.ipynb'
'Embeddings from BERT.ipynb'	   StoredTensors


Note to self (and anybody else who wants to try this): the next two groups of cells load pregenerated Word2Vec (embed_dim = 300) and BERT (embed_dim = 768) embeddings respectively. Pick the one you want, and don't bother running the other.

Word2Vec Embeddings Results (from a lazy training loop)
- base model: Loss: 3.535007230937481
- bidirectional LSTM: Loss: 1.8555335476994514
- bidirectional LSTM with Attention: Loss: 0.5990170016884804

In [6]:
#Word2Vec 

embed_dim = 300

train_comments_array = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_train_comments_array.pt')
train_titles_array = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_train_titles_array.pt')
train_labels = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_train_labels.pt')

test_comments_array = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_test_comments_array.pt')
test_titles_array = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_test_titles_array.pt')
test_labels = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/w2v_test_labels.pt')

In [7]:
training_data = GeneralDataset(train_comments_array, train_labels)

The following cells correspond to loading data for pregenerated BERT (embed_dim = 768) embeddings.

BERT Embeddings Results (from a lazy training loop)
- base model: Loss: 858.4895858764648
- bidirectional LSTM: Loss: 341.25000190734863
- bidirectional LSTM with Attention: Loss: 0.0014144671586109325

These results are both intuitive and surprising. It is amusingly surprising that the LSTMs without attention are as ludicrously terrible as they are, but it kind of makes sense.

The enormous performance jump suggests that the attention mechanism (which in this application is just a very simple set of FC layers) is doing most of the work (even without being attention masked - which is something we need to fix pretty urgently). 
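
For reference, here is a rough sketch of what attention masking could look like for a simple FC-based attention layer. This is not the layer in models.py (which isn't shown here); the class name `MaskedAttentionPool` and its layer sizes are made up for illustration. The idea is to set the scores of padded positions to negative infinity before the softmax so they get zero weight. It assumes every sequence has at least one real token, otherwise the softmax would return NaN.

```python
import torch
import torch.nn as nn


class MaskedAttentionPool(nn.Module):
    """Hypothetical attention pooling over LSTM outputs that ignores padding."""

    def __init__(self, hidden_dim):
        super().__init__()
        # A small stack of FC layers that scores each time step.
        self.score = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, hidden, mask):
        # hidden: (batch, seq_len, hidden_dim); mask: (batch, seq_len), 1 = real token
        scores = self.score(hidden).squeeze(-1)                 # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))   # hide padding
        weights = torch.softmax(scores, dim=1)                  # padding gets weight 0
        pooled = torch.bmm(weights.unsqueeze(1), hidden).squeeze(1)
        return pooled, weights


torch.manual_seed(0)
pool = MaskedAttentionPool(hidden_dim=8)
hidden = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
pooled, weights = pool(hidden, mask)
print(pooled.shape)      # one pooled vector per sequence
print(weights[0, 3:])    # weights on the two padded positions
```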

I'd be willing to bet that when working on BERT embeddings we can just trivially slap a couple of linear layers on top and get really good performance. The suspicion here is that BERT more or less makes the LSTM obsolete - ironic.
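
A quick way to test that bet would be something like the sketch below: average the BERT token embeddings over the real (unmasked) tokens and feed the result to two linear layers. We have not run this on our data; `MeanPoolClassifier` and its hidden size are placeholders, and the demo uses tiny random tensors rather than the stored embeddings.

```python
import torch
import torch.nn as nn


class MeanPoolClassifier(nn.Module):
    """Hypothetical LSTM-free baseline: masked mean pooling + two linear layers."""

    def __init__(self, embed_dim=768, hidden_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, embeddings, mask):
        # embeddings: (batch, seq_len, embed_dim); mask: (batch, seq_len), 1 = real token
        mask = mask.unsqueeze(-1).to(embeddings.dtype)
        summed = (embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1.0)   # avoid dividing by zero
        pooled = summed / counts
        # Sigmoid output so it can be trained with nn.BCELoss like the LSTM models.
        return torch.sigmoid(self.head(pooled)).squeeze(-1)


torch.manual_seed(0)
clf = MeanPoolClassifier(embed_dim=16, hidden_dim=8)
emb = torch.randn(4, 6, 16)
mask = torch.ones(4, 6, dtype=torch.long)
mask[:, 4:] = 0                      # pretend the last two positions are padding
probs = clf(emb, mask)
print(probs.shape)
```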

Also, when we were cleaning up the dataset before feeding it into word2vec we originally did some classical preprocessing (removing stopwords, punctuation, etc.) that strips out important context - especially since some stopwords like "no", "never", "not" substantially change the meaning of a sentence - which probably also explains a good amount of why BERT performs so much better. Fixing this really obvious data processing mistake isn't too difficult, but we can do that later.
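
To make the stopword problem concrete, here is a toy sketch of the fix. The stopword list is a tiny made-up one (not the list we actually used), and `clean_tokens` is a hypothetical helper; the point is only that negations are kept out of the drop set.

```python
import string

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "this", "that", "of", "to", "and",
             "no", "not", "never"}
NEGATIONS = {"no", "not", "never"}


def clean_tokens(text, keep_negations=True):
    """Lowercase, strip punctuation, and drop stopwords (optionally keeping negations)."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    drop = STOPWORDS - NEGATIONS if keep_negations else STOPWORDS
    return [t for t in tokens if t not in drop]


print(clean_tokens("This is not a good idea.", keep_negations=False))  # ['good', 'idea']
print(clean_tokens("This is not a good idea.", keep_negations=True))   # ['not', 'good', 'idea']
```

With negations dropped, "not a good idea" and "a good idea" end up with identical tokens, which is exactly the information we were throwing away.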

In [6]:
embed_dim = 768

train_comments_array = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/BERT_train_comments_embeddings.pt')
train_comments_attention_masks = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/BERT_train_comments_attention_masks.pt')

train_labels = torch.load('drive/MyDrive/Natural Language Models for Hate Speech Classification/StoredTensors/BERT_train_labels.pt')

In [7]:
"""
quick sanity checks:
input data should be: (batch_size, max_length, embed_dim)
labels should be: (batch_size,)
"""
print(train_comments_array.shape, train_comments_attention_masks.shape, train_labels.shape)

torch.Size([1528, 512, 768]) torch.Size([1528, 512]) torch.Size([1528])
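
If we want the sanity check to fail loudly instead of relying on eyeballing the printed shapes, a small assertion helper along these lines would do it. `check_shapes` is a hypothetical name, and the demo uses small dummy tensors in place of the stored embeddings.

```python
import torch


def check_shapes(embeddings, masks, labels, embed_dim):
    """Assert that embeddings, attention masks, and labels agree with each other."""
    n, max_len, dim = embeddings.shape
    assert dim == embed_dim, f"expected embed_dim {embed_dim}, got {dim}"
    assert masks.shape == (n, max_len), f"mask shape {tuple(masks.shape)} != {(n, max_len)}"
    assert labels.shape == (n,), f"label shape {tuple(labels.shape)} != {(n,)}"
    return n, max_len


# Dummy tensors standing in for the real (1528, 512, 768) data.
emb = torch.zeros(4, 6, 8)
masks = torch.ones(4, 6)
labels = torch.zeros(4)
print(check_shapes(emb, masks, labels, embed_dim=8))  # (4, 6)
```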


In [8]:
training_data = BERTDataset(train_comments_array, train_comments_attention_masks,train_labels)

Now for a simple training loop experiment:

In [12]:
#It's optional to run the cell, the actual training loop cells automate the generation of the model.
model = Full_LSTM_Model(embed_dim=768, bidi=True,attention=True)


In [10]:
from tqdm import tqdm

def train(training_dataset, n_epochs, batch_size, modeltype, model=None, embed_dim=None, bidi=False, attention=False):
    # Build a fresh model unless one is passed in.
    # embed_dim = 768 for BERT embeddings, 300 for Word2Vec embeddings.
    if model is None:
        model = modeltype(embed_dim=embed_dim, bidi=bidi, attention=attention).to(device)
    else:
        model = model.to(device)

    opt = optim.Adam(model.parameters(), lr=0.001)

    loader = torch_data.DataLoader(training_dataset, batch_size=batch_size, shuffle=True)

    loss_fn = nn.BCELoss()  # expects probabilities in [0, 1]; averages over each batch

    losses = []
    for epoch in tqdm(range(n_epochs)):
        epoch_loss = 0
        # BERTDataset yields (embeddings, attention mask, label); the mask is not used yet.
        for context, context_am, label in loader:
        #for context, label in loader:  # use this line for the Word2Vec GeneralDataset
            context = context.to(device)
            label = label.to(device).type(torch.float32)

            preds = model(context)  # call the module rather than .forward() directly

            opt.zero_grad()
            loss = loss_fn(preds, label)
            loss.backward()
            opt.step()

            epoch_loss += loss.item()
        # Sum of the per-batch losses, so it scales with the number of batches.
        print('Loss:', epoch_loss)
        losses.append(epoch_loss)

    return model, losses
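
One thing to keep in mind when reading the numbers from this loop: it prints the sum of the per-batch losses, so the value depends on how many batches there are. A small sketch of converting it to a per-batch mean, assuming the 1528 training examples and batch size of 128 used in this notebook (`mean_batch_loss` is our own helper name). The last batch is smaller than the rest, so this is a per-batch average, not an exact per-example average. Also, `nn.BCELoss` clamps its log terms at -100, so no single example can contribute more than 100; a per-batch mean in the tens would be consistent with a model whose outputs saturate at 0 or 1 on many examples, though we have not verified that this is what happens here.

```python
import math


def mean_batch_loss(summed_loss, n_examples, batch_size):
    """Convert a loss summed over batches into a per-batch mean."""
    n_batches = math.ceil(n_examples / batch_size)
    return summed_loss / n_batches


n_batches = math.ceil(1528 / 128)
print(n_batches)  # 12

# Final-epoch sums reported for the three BERT runs.
for name, total in [
    ("base model", 858.4895858764648),
    ("bidirectional LSTM", 341.25000190734863),
    ("bidirectional LSTM with Attention", 0.0014144671586109325),
]:
    print(f"{name}: {mean_batch_loss(total, 1528, 128):.6f}")
```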

In [11]:
num_epochs = 30 #running on colab, use 30-40ish, see what works


In [12]:
trained_model,training_losses = train(training_data, num_epochs, 128, Full_LSTM_Model, embed_dim = embed_dim, bidi = False, attention = False)

  3%|▎         | 1/30 [00:00<00:26,  1.11it/s]

Loss: 736.4319438934326


  7%|▋         | 2/30 [00:01<00:22,  1.24it/s]

Loss: 858.2291717529297


 10%|█         | 3/30 [00:02<00:20,  1.30it/s]

Loss: 858.3854217529297


 13%|█▎        | 4/30 [00:03<00:18,  1.40it/s]

Loss: 858.3333358764648


 17%|█▋        | 5/30 [00:03<00:17,  1.46it/s]

Loss: 858.4895858764648


 20%|██        | 6/30 [00:04<00:16,  1.49it/s]

Loss: 858.2291717529297


 23%|██▎       | 7/30 [00:04<00:15,  1.51it/s]

Loss: 858.7500076293945


 27%|██▋       | 8/30 [00:05<00:14,  1.54it/s]

Loss: 858.28125


 30%|███       | 9/30 [00:06<00:13,  1.52it/s]

Loss: 858.5416717529297


 33%|███▎      | 10/30 [00:06<00:13,  1.54it/s]

Loss: 858.6458358764648


 37%|███▋      | 11/30 [00:07<00:12,  1.55it/s]

Loss: 858.6979217529297


 40%|████      | 12/30 [00:08<00:11,  1.56it/s]

Loss: 858.8020858764648


 43%|████▎     | 13/30 [00:08<00:10,  1.57it/s]

Loss: 858.1770858764648


 47%|████▋     | 14/30 [00:09<00:10,  1.57it/s]

Loss: 858.28125


 50%|█████     | 15/30 [00:10<00:09,  1.57it/s]

Loss: 858.6458358764648


 53%|█████▎    | 16/30 [00:10<00:08,  1.57it/s]

Loss: 858.2291717529297


 57%|█████▋    | 17/30 [00:11<00:08,  1.57it/s]

Loss: 858.5937576293945


 60%|██████    | 18/30 [00:11<00:07,  1.57it/s]

Loss: 857.96875


 63%|██████▎   | 19/30 [00:12<00:06,  1.58it/s]

Loss: 858.0208358764648


 67%|██████▋   | 20/30 [00:13<00:06,  1.58it/s]

Loss: 858.2291717529297


 70%|███████   | 21/30 [00:13<00:05,  1.58it/s]

Loss: 858.6458358764648


 73%|███████▎  | 22/30 [00:14<00:05,  1.58it/s]

Loss: 858.7500076293945


 77%|███████▋  | 23/30 [00:15<00:04,  1.58it/s]

Loss: 858.2291717529297


 80%|████████  | 24/30 [00:15<00:03,  1.58it/s]

Loss: 858.4375


 83%|████████▎ | 25/30 [00:16<00:03,  1.58it/s]

Loss: 858.3333358764648


 87%|████████▋ | 26/30 [00:16<00:02,  1.58it/s]

Loss: 858.28125


 90%|█████████ | 27/30 [00:17<00:01,  1.57it/s]

Loss: 858.4375


 93%|█████████▎| 28/30 [00:18<00:01,  1.57it/s]

Loss: 858.2291717529297


 97%|█████████▋| 29/30 [00:18<00:00,  1.57it/s]

Loss: 858.3854217529297


100%|██████████| 30/30 [00:19<00:00,  1.53it/s]

Loss: 858.4895858764648





In [14]:
trained_model,training_losses = train(training_data, num_epochs, 128, Full_LSTM_Model, embed_dim = embed_dim, bidi = True, attention = False)

  3%|▎         | 1/30 [00:01<00:30,  1.04s/it]

Loss: 321.42041778564453


  7%|▋         | 2/30 [00:02<00:29,  1.04s/it]

Loss: 375.810471534729


 10%|█         | 3/30 [00:03<00:28,  1.04s/it]

Loss: 315.74020195007324


 13%|█▎        | 4/30 [00:04<00:27,  1.04s/it]

Loss: 341.56250190734863


 17%|█▋        | 5/30 [00:05<00:26,  1.04s/it]

Loss: 341.40625190734863


 20%|██        | 6/30 [00:06<00:24,  1.04s/it]

Loss: 341.3541679382324


 23%|██▎       | 7/30 [00:07<00:23,  1.04s/it]

Loss: 341.875


 27%|██▋       | 8/30 [00:08<00:22,  1.04s/it]

Loss: 341.77083587646484


 30%|███       | 9/30 [00:09<00:21,  1.04s/it]

Loss: 341.3020839691162


 33%|███▎      | 10/30 [00:10<00:20,  1.04s/it]

Loss: 341.25000190734863


 37%|███▋      | 11/30 [00:11<00:19,  1.04s/it]

Loss: 341.6666679382324


 40%|████      | 12/30 [00:12<00:18,  1.04s/it]

Loss: 341.8229179382324


 43%|████▎     | 13/30 [00:13<00:17,  1.04s/it]

Loss: 341.56250190734863


 47%|████▋     | 14/30 [00:14<00:16,  1.04s/it]

Loss: 341.4583339691162


 50%|█████     | 15/30 [00:15<00:15,  1.04s/it]

Loss: 341.6145839691162


 53%|█████▎    | 16/30 [00:16<00:14,  1.04s/it]

Loss: 341.71875190734863


 57%|█████▋    | 17/30 [00:17<00:13,  1.04s/it]

Loss: 341.40625190734863


 60%|██████    | 18/30 [00:18<00:12,  1.03s/it]

Loss: 341.5104179382324


 63%|██████▎   | 19/30 [00:19<00:11,  1.03s/it]

Loss: 341.6666679382324


 67%|██████▋   | 20/30 [00:20<00:10,  1.03s/it]

Loss: 341.77083587646484


 70%|███████   | 21/30 [00:21<00:09,  1.03s/it]

Loss: 341.4583339691162


 73%|███████▎  | 22/30 [00:22<00:08,  1.03s/it]

Loss: 341.71875190734863


 77%|███████▋  | 23/30 [00:23<00:07,  1.03s/it]

Loss: 341.71875190734863


 80%|████████  | 24/30 [00:24<00:06,  1.03s/it]

Loss: 341.5104179382324


 83%|████████▎ | 25/30 [00:25<00:05,  1.04s/it]

Loss: 341.3541679382324


 87%|████████▋ | 26/30 [00:26<00:04,  1.04s/it]

Loss: 341.6666679382324


 90%|█████████ | 27/30 [00:27<00:03,  1.03s/it]

Loss: 341.0416679382324


 93%|█████████▎| 28/30 [00:29<00:02,  1.03s/it]

Loss: 341.6145839691162


 97%|█████████▋| 29/30 [00:30<00:01,  1.03s/it]

Loss: 341.6666679382324


100%|██████████| 30/30 [00:31<00:00,  1.04s/it]

Loss: 341.25000190734863





In [13]:
trained_model,training_losses = train(training_data, num_epochs, 128, Full_LSTM_Model, embed_dim = embed_dim, bidi = True, attention = True)

  3%|▎         | 1/30 [00:01<00:32,  1.11s/it]

Loss: 7.180750131607056


  7%|▋         | 2/30 [00:02<00:30,  1.10s/it]

Loss: 6.726059198379517


 10%|█         | 3/30 [00:03<00:29,  1.09s/it]

Loss: 6.02546301484108


 13%|█▎        | 4/30 [00:04<00:28,  1.09s/it]

Loss: 5.329063594341278


 17%|█▋        | 5/30 [00:05<00:27,  1.09s/it]

Loss: 4.747676312923431


 20%|██        | 6/30 [00:06<00:26,  1.09s/it]

Loss: 4.297030806541443


 23%|██▎       | 7/30 [00:07<00:25,  1.09s/it]

Loss: 3.5883412808179855


 27%|██▋       | 8/30 [00:08<00:23,  1.09s/it]

Loss: 3.0509005934000015


 30%|███       | 9/30 [00:09<00:22,  1.09s/it]

Loss: 2.1718843430280685


 33%|███▎      | 10/30 [00:10<00:21,  1.09s/it]

Loss: 1.3834800384938717


 37%|███▋      | 11/30 [00:11<00:20,  1.09s/it]

Loss: 1.0970782116055489


 40%|████      | 12/30 [00:13<00:19,  1.09s/it]

Loss: 0.8731239698827267


 43%|████▎     | 13/30 [00:14<00:18,  1.09s/it]

Loss: 0.3953817803412676


 47%|████▋     | 14/30 [00:15<00:17,  1.09s/it]

Loss: 0.5424669510684907


 50%|█████     | 15/30 [00:16<00:16,  1.09s/it]

Loss: 0.7307494673877954


 53%|█████▎    | 16/30 [00:17<00:15,  1.09s/it]

Loss: 0.3719218969345093


 57%|█████▋    | 17/30 [00:18<00:14,  1.09s/it]

Loss: 0.2213728162460029


 60%|██████    | 18/30 [00:19<00:13,  1.09s/it]

Loss: 0.0609927698969841


 63%|██████▎   | 19/30 [00:20<00:11,  1.08s/it]

Loss: 0.027710496506188065


 67%|██████▋   | 20/30 [00:21<00:10,  1.08s/it]

Loss: 0.016836508293636143


 70%|███████   | 21/30 [00:22<00:09,  1.09s/it]

Loss: 0.007693573512369767


 73%|███████▎  | 22/30 [00:23<00:08,  1.09s/it]

Loss: 0.00448762602172792


 77%|███████▋  | 23/30 [00:25<00:07,  1.08s/it]

Loss: 0.0036835895880358294


 80%|████████  | 24/30 [00:26<00:06,  1.09s/it]

Loss: 0.0030698949558427557


 83%|████████▎ | 25/30 [00:27<00:05,  1.09s/it]

Loss: 0.0025964675296563655


 87%|████████▋ | 26/30 [00:28<00:04,  1.09s/it]

Loss: 0.002277467036037706


 90%|█████████ | 27/30 [00:29<00:03,  1.09s/it]

Loss: 0.001995962389628403


 93%|█████████▎| 28/30 [00:30<00:02,  1.09s/it]

Loss: 0.0017738451060722582


 97%|█████████▋| 29/30 [00:31<00:01,  1.09s/it]

Loss: 0.0015760693859192543


100%|██████████| 30/30 [00:32<00:00,  1.09s/it]

Loss: 0.0014144671586109325



