**Cyberbullying Classifier**

We built this classifier on top of Google's pretrained BERT Base language model to leverage its latent understanding of language, which lets us train a more robust cyberbullying classifier with fewer training examples than would otherwise be necessary.

We obtained the following quantitative results on the held-out test set:

Accuracy: 89.20%; Precision: 91.27%; Recall: 96.18%
Precision: of all tweets the classifier flagged as harassment, 91.27% were true positives, i.e. actually harassment.
Recall: of all actual instances of harassment, 96.18% were correctly flagged.
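
As a quick sanity check, these metrics can be recomputed by hand from the confusion matrix printed in the Testing section. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels); the false positive rate is an extra figure that is not reported above, included only because it relates to the false positives discussed next. Values may differ in the last digit from the rounded figures quoted in the text.

```python
# Counts copied from the confusion matrix printed in the Testing section
# (rows are true labels, columns are predicted labels, as in scikit-learn).
tn, fp = 896, 728
fn, tp = 302, 7613

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)            # share of flagged tweets that really are harassment
recall = tp / (tp + fn)               # share of harassment tweets that get flagged
false_positive_rate = fp / (fp + tn)  # share of benign tweets that get flagged

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
print(f"false positive rate={false_positive_rate:.4f}")
```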

Qualitatively, our classifier is over-aggressive and generates many false positives. Inexplicably, messages like 'My mother fought 2 vote at the beginning of last century. Incredible women must still fight 4 equality' are labeled with concern scores over 90%. The classifier is also extremely reactive to content that contains curse words. For example, 'THIS FUCKING POTATO IS BLOWING MY MIND. Duck fat. You guys. FIGURATIVELY DYING OF BLISS,' which we clearly interpret as a joke, is labeled with a concern score of 86%.

Despite our decent performance metrics, we decided that Google's Perspective API would perform better in practice. Besides the more robust performance of Google's API, the primary reasons behind this decision were architectural. We had two options for running our homemade model with our Discord bot: API calls to a server of our own, or direct integration into the bot. The first option would require getting our own server up and running, which is feasible but would take additional development work and might cost money. The second option would mean running an instance of the model for every instance of our bot; given the large size of the BERT model, this choice is suboptimal. The Perspective API avoids both issues because it is run by Google on Google servers and can be called from a lightweight bot. Additionally, given Google’s ability to collect and train on vast amounts of data, we believe their model would outperform ours. Finally, the onus of maintaining, retraining, and updating the model is on Google rather than on our small dev team.
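
To illustrate why the hosted option keeps the bot lightweight, here is a minimal sketch of a Perspective API call using only the Python standard library. It is not our bot's actual code: the helper names (`build_request_body`, `extract_score`, `score_message`) are hypothetical, the API key is a placeholder you would supply yourself, and `TOXICITY` is just one of the attributes the API offers.

```python
import json
import urllib.request

# Public Perspective API endpoint; the API key is passed as a query parameter.
PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request_body(text, attribute="TOXICITY"):
    """Build the JSON body for a single-comment analysis request."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {attribute: {}},
    }


def extract_score(response, attribute="TOXICITY"):
    """Pull the summary score (0.0 to 1.0) out of a decoded API response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]


def score_message(text, api_key, attribute="TOXICITY"):
    """Send one message to the API and return its score (needs network access)."""
    request = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(build_request_body(text, attribute)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_score(json.load(reply), attribute)


# Offline check with a hand-written response in the shape the API returns.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.12, "type": "PROBABILITY"}}
    }
}
print(extract_score(sample_response))
```

The bot only needs to send a small JSON request and read one number back, so no model weights have to be loaded alongside it.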

If we were to continue developing this classifier, the most important next step would be to collect large amounts of up-to-date training data. Language develops very quickly on the internet, as trolls and deviants create new forms of hate speech in order to circumvent content moderation. If we could retrain the classifier regularly, for instance every month, on fresh data, we could feasibly keep up with the ever-changing landscape of hate speech and harassment.

# Loading


In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer, BertForSequenceClassification
from torch.utils.data import TensorDataset, DataLoader
from torch.optim import AdamW  # the AdamW shipped with transformers is deprecated
import torch

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
# Load the dataset from Kaggle
data = pd.read_csv('/content/drive/MyDrive/cyberbullying_tweets.csv')
data.head()

Unnamed: 0,tweet_text,cyberbullying_type
0,"In other words #katandandre, your food was cra...",not_cyberbullying
1,Why is #aussietv so white? #MKR #theblock #ImA...,not_cyberbullying
2,@XochitlSuckkks a classy whore? Or more red ve...,not_cyberbullying
3,"@Jason_Gio meh. :P thanks for the heads up, b...",not_cyberbullying
4,@RudhoeEnglish This is an ISIS account pretend...,not_cyberbullying


In [None]:
# Binary target: 0 for 'not_cyberbullying', 1 for every cyberbullying category
data['labels'] = (data['cyberbullying_type'] != 'not_cyberbullying').astype(int)
data.head(10)

Unnamed: 0,tweet_text,cyberbullying_type,labels
0,"In other words #katandandre, your food was cra...",not_cyberbullying,0
1,Why is #aussietv so white? #MKR #theblock #ImA...,not_cyberbullying,0
2,@XochitlSuckkks a classy whore? Or more red ve...,not_cyberbullying,0
3,"@Jason_Gio meh. :P thanks for the heads up, b...",not_cyberbullying,0
4,@RudhoeEnglish This is an ISIS account pretend...,not_cyberbullying,0
5,"@Raja5aab @Quickieleaks Yes, the test of god i...",not_cyberbullying,0
6,Itu sekolah ya bukan tempat bully! Ga jauh kay...,not_cyberbullying,0
7,Karma. I hope it bites Kat on the butt. She is...,not_cyberbullying,0
8,@stockputout everything but mostly my priest,not_cyberbullying,0
9,Rebecca Black Drops Out of School Due to Bully...,not_cyberbullying,0


In [None]:
# Split the dataset into input (tweets) and target (labels)
tweets = data['tweet_text'].values
labels = data['labels'].values

# Split the dataset into training, validation, and test sets
train_tweets, test_tweets, train_labels, test_labels = train_test_split(tweets, labels, test_size=0.2, random_state=42)
train_tweets, val_tweets, train_labels, val_labels = train_test_split(train_tweets, train_labels, test_size=0.2, random_state=42)
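
The two chained calls above yield roughly a 64/16/20 train/validation/test split. The sketch below reproduces the resulting sizes in plain Python, mirroring how `train_test_split` rounds a fractional `test_size` up. The helper `split_sizes` is hypothetical, and the row count of 47,692 is an assumption (it is consistent with the 9,539-row test set in the confusion matrix); replace it with `len(data)`.

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Return (kept, held_out) sizes; the held-out part is rounded up."""
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out


n_total = 47692  # assumed row count of the CSV; replace with len(data)
n_rest, n_test = split_sizes(n_total)   # first call: carve off the test set
n_train, n_val = split_sizes(n_rest)    # second call: carve off the validation set
print(n_train, n_val, n_test)
```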

# Training

In [None]:
# Load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# Tokenize and encode the training set
train_encodings = tokenizer.batch_encode_plus(
    train_tweets,
    add_special_tokens=True,
    max_length=512,
    truncation=True,
    padding=True,
    return_attention_mask=True,
    return_tensors='pt'
)

# Tokenize and encode the validation set
val_encodings = tokenizer.batch_encode_plus(
    val_tweets,
    add_special_tokens=True,
    max_length=512,
    truncation=True,
    padding=True,
    return_attention_mask=True,
    return_tensors='pt'
)



In [None]:
# Create PyTorch datasets
train_dataset = TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], torch.tensor(train_labels))
val_dataset = TensorDataset(val_encodings['input_ids'], val_encodings['attention_mask'], torch.tensor(val_labels))

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)


In [None]:
# Load the pre-trained BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [None]:
# Set up the optimizer
optimizer = AdamW(model.parameters(), lr=1e-5)

# Training loop
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

epochs = 2

for epoch in range(epochs):
    model.train()
    total_loss = 0

    for batch in train_loader:
        optimizer.zero_grad()

        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        total_loss += loss.item()

        loss.backward()
        optimizer.step()

    avg_train_loss = total_loss / len(train_loader)

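    # Evaluation on the validation set
    model.eval()
    val_correct = 0
    val_loss = 0

    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch[0].to(device)
            attention_mask = batch[1].to(device)
            labels = batch[2].to(device)

            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            val_loss += outputs.loss.item()
            # Count correct predictions (argmax over the two logits)
            val_correct += (outputs.logits.argmax(dim=1) == labels).sum().item()

    # Report per-epoch metrics
    avg_val_loss = val_loss / len(val_loader)
    val_accuracy = val_correct / len(val_dataset)
    print(f'Epoch {epoch + 1}/{epochs}: train loss {avg_train_loss:.4f}, '
          f'val loss {avg_val_loss:.4f}, val accuracy {val_accuracy:.4f}')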



In [None]:
# Save model
torch.save(model.state_dict(), '/content/drive/MyDrive/cyberbullying_model.pth')

# Testing

In [None]:
# Tokenize and encode the test set
test_encodings = tokenizer.batch_encode_plus(
    test_tweets,
    add_special_tokens=True,
    max_length=512,
    truncation=True,
    padding=True,
    return_attention_mask=True,
    return_tensors='pt'
)

# Create the test dataset
test_dataset = TensorDataset(test_encodings['input_ids'], test_encodings['attention_mask'], torch.tensor(test_labels))

# Create the test data loader
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

In [None]:
# Load the pre-trained BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.load_state_dict(torch.load('/content/drive/MyDrive/cyberbullying_model.pth'))


<All keys matched successfully>

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [None]:
import numpy as np

model.eval()
test_accuracy = 0
test_loss = 0
predictions = []
probs = []
true_labels = []

with torch.no_grad():
    for batch in test_loader:
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        logits = outputs.logits
        test_loss += outputs.loss.item()

        # Apply softmax to obtain probabilities
        probabilities = torch.softmax(logits, dim=1)
        probs.extend(probabilities.cpu().numpy())
        _, preds = torch.max(probabilities, dim=1)

        predictions.extend(preds.cpu().numpy())
        true_labels.extend(labels.cpu().numpy())

    # Compare element-wise; comparing the two Python lists directly yields a single bool
    test_accuracy = (np.array(predictions) == np.array(true_labels)).mean()
    test_loss = test_loss / len(test_loader)
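
For readers unfamiliar with the softmax step in the loop above, here is a small standalone sketch in plain Python. The logits are hypothetical, not taken from the model; the point is only to show how two logits become the two probabilities stored in `probs` and how the predicted label is the index of the larger one. Reading the class-1 probability as the "concern score" is our assumption about how the scores quoted in the introduction were derived.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one tweet: [not_cyberbullying, cyberbullying]
logits = [-0.9, 0.9]
probabilities = softmax(logits)
predicted_label = probabilities.index(max(probabilities))
print(probabilities, predicted_label)
```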

In [None]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(true_labels, predictions)
print("Confusion Matrix:")
print(cm)

Confusion Matrix:
[[ 896  728]
 [ 302 7613]]


In [None]:
# Recompute accuracy as the fraction of predictions that match the true labels
test_accuracy = np.mean(np.array(predictions) == np.array(true_labels))
print(test_accuracy)

0.8920222245518398


In [None]:
for i in range(len(predictions)):
  if predictions[i] == 1 and true_labels[i] == 0:
    print(str(probs[i]) + ' ' + test_tweets[i])

[0.44245297 0.5575471 ] @MajorPaulSmyth would love to go but got to be at the hospital with the daughter
[0.23573557 0.76426446] @FunkyreFresh @SuperSpacedad as a target of her delusions, yeah, i'm going to say these things about how she behaves towards me.
[0.41174752 0.5882524 ] @NNdabbour64 They kill him.
[0.34704334 0.65295666] @wilw magnificent.
[0.1447305 0.8552695] @ProErn man you bulling
[0.39245996 0.6075401 ] @kelli_nak Did you have a chance to do that thing we were discussing at #ladieswholunchslc?
[0.34773657 0.6522634 ] @DoctorAvenue selfies are rad. ^.^
[0.18208672 0.8179133 ] Gosh golly how dare she respond @Potatottamus @ProfessorF @chrisvcsefalvay
[0.335625 0.664375] RT @TheQuinnspiracy: "I am proud to announce this nonprofit organization""u r fat"God i love working on the internet.
[0.33016443 0.66983557] @8BitBecca possibly. Want to see how this evolves. Internet video conf may become a thing.
[0.37511417 0.62488586] @logicalmind11 http://t.co/PQtPUrEL3O
[0.41762468 
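
Many of the false positives listed above sit only slightly over the default 0.5 cut-off. The sketch below, which is not something this notebook evaluates, counts how many of the eleven fully printed examples would still be flagged at stricter thresholds. The helper `flagged` is hypothetical, and the scores are the class-1 probabilities from the output above, rounded to four places. Raising the threshold would also lower recall, so any real choice would need to be checked on the validation set.

```python
# Class-1 probabilities of the false positives printed above (rounded to 4 places).
false_positive_scores = [0.5575, 0.7643, 0.5883, 0.6530, 0.8553, 0.6075,
                         0.6523, 0.8179, 0.6644, 0.6698, 0.6249]


def flagged(scores, threshold):
    """Return the scores that would still be labeled cyberbullying at this threshold."""
    return [s for s in scores if s >= threshold]


for threshold in (0.5, 0.7, 0.9):
    print(threshold, len(flagged(false_positive_scores, threshold)))
```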

In [None]:
test_tweets[0]

'@Goree_JuhssGuns hahaha he ain\'t even worth my tweets dumb fuck don\'t knw the diff between "nigga" &amp; "nigger"'

In [None]:
print(test_loss)

0.23201145900341633


In [None]:
np.mean(predictions)

0.874410315546703

In [None]:
print(true_labels[:5])

[1, 1, 1, 1, 0]


In [None]:
(np.array(predictions) == np.array(true_labels)).mean()

In [None]:
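# Inspect the first 20 test tweets with their predicted and true labels
# (the loop body was left empty; printing these fields is an assumption)
for i in range(20):
  print(predictions[i], true_labels[i], test_tweets[i])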

# Save for GitHub

In [None]:
# Load the pre-trained BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model.load_state_dict(torch.load('/content/drive/MyDrive/cs246h4q4/cyberbullying_model.pth'))


<All keys matched successfully>

In [None]:
model.save_pretrained('/content/drive/MyDrive/')