<a href="https://colab.research.google.com/github/julrods/cyber-bullying-detector/blob/main/3_Evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Environment

## Libraries

In [None]:
!pip install transformers

In [2]:
import os
import tensorflow as tf
import pandas as pd
import numpy as np
import pickle
from sklearn.metrics import confusion_matrix, f1_score, classification_report
from transformers import TFBertModel, BertConfig, TFBertForSequenceClassification

In [3]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Functions

In [None]:
def load_vectors(dataset_name):
  pickle_inp_path = f'/content/gdrive/MyDrive/Cyber-bullying-project/data/3_tokenized_data/bert_inp_{dataset_name}.pkl'
  pickle_mask_path = f'/content/gdrive/MyDrive/Cyber-bullying-project/data/3_tokenized_data/bert_mask_{dataset_name}.pkl'

  # Use context managers so the pickle files are closed after loading
  with open(pickle_inp_path, 'rb') as f:
    input_ids = pickle.load(f)
  with open(pickle_mask_path, 'rb') as f:
    attention_masks = pickle.load(f)

  return input_ids, attention_masks

In [None]:
def bert_setup():
  base_model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
  
  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
  optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5,
                                       epsilon=1e-08)
  
  base_model.compile(loss = loss, optimizer = optimizer, metrics = [metric])
  
  return base_model

In [None]:
def evaluate_model(model_name, inputs, mask, base_model):
  model_save_path = f'/content/gdrive/MyDrive/Cyber-bullying-project/models/{model_name}.h5'
  base_model.load_weights(model_save_path)
  trained_model = base_model
  
  preds = trained_model.predict([inputs, mask],
                                batch_size=32)
  
  # preds[0] holds the logits; take the highest-scoring class for each comment
  pred_labels = [np.argmax(pred) for pred in preds[0]]

  return pred_labels

# Evaluation

In [None]:
base_model = bert_setup()

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
input_ids, attention_masks = load_vectors('eval_data')

In [None]:
pred_labels = evaluate_model('aggression_model_1epoch', input_ids, attention_masks, base_model)

In [7]:
eval_data_path = '/content/gdrive/MyDrive/Cyber-bullying-project/data/4_evaluation_data/clean_evaluation_data.csv'

In [None]:
eval_data = pd.read_csv(eval_data_path)

In [None]:
eval_data['label'] = pred_labels

In [None]:
labeled_eval_data = eval_data[['text', 'label']]

In [4]:
labeled_eval_data_path = '/content/gdrive/MyDrive/Cyber-bullying-project/data/4_evaluation_data/labeled_evaluation_data.csv'

In [None]:
labeled_eval_data.to_csv(labeled_eval_data_path, index = False)

I manually reviewed the comments that the model labeled as aggressive (class 1) and flagged those that were actually positive or neutral (class 0) by writing 1 in a `wrong_label` column.

In [5]:
labeled_eval_data_checked_path = '/content/gdrive/MyDrive/Cyber-bullying-project/data/4_evaluation_data/labeled_evaluation_data_checked.csv'

In [6]:
checked = pd.read_csv(labeled_eval_data_checked_path) 

In [10]:
checked.head()

Unnamed: 0,text,label,wrong_label
0,Rapist,1,
1,Racist pedo,1,
2,I so fucking love you Kevin! Happy to see you'...,1,1.0
3,fuck you asshole,1,
4,You are My perfec daddy ❤️😍😍,1,1.0


In [13]:
# I only wrote 1 for the mislabeled instances, so we have to fill the null values with 0 for the ones that are correctly labeled
checked['wrong_label'] = checked['wrong_label'].fillna(0).astype(int)

In [17]:
checked.head()

Unnamed: 0,text,label,wrong_label
0,Rapist,1,0
1,Racist pedo,1,0
2,I so fucking love you Kevin! Happy to see you'...,1,1
3,fuck you asshole,1,0
4,You are My perfec daddy ❤️😍😍,1,1


In [None]:
# Save the file with the filled values
checked.to_csv(labeled_eval_data_checked_path, index = False)

In [24]:
# Precision: share of the comments predicted as aggressive that the manual check confirmed
precision = 1 - checked['wrong_label'].sum() / len(checked)
precision_percent = precision * 100
print(f'Out of all the instances labelled as aggressive comments, {precision_percent:.2f}% were correct. The precision of the model is {precision:.2f}')

Out of all the instances labelled as aggressive comments, 85.64% were correct. The precision of the model is 0.86
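
Assuming `checked` contains only comments the model predicted as class 1, this figure is the precision for the aggressive class, TP / (TP + FP). Recall cannot be estimated from this file, because comments predicted as class 0 were not reviewed. A minimal sketch of the same calculation on a plain Python list, using the `wrong_label` values of the five rows shown by `checked.head()` as a toy sample (the helper name `precision_from_flags` is hypothetical):

```python
def precision_from_flags(wrong_flags):
    """Precision for the positive class, given one flag per predicted positive.

    A flag of 1 marks a false positive; a flag of 0 marks a true positive.
    """
    if not wrong_flags:
        raise ValueError("need at least one predicted positive")
    false_positives = sum(wrong_flags)
    true_positives = len(wrong_flags) - false_positives
    return true_positives / (true_positives + false_positives)


# Toy sample only: the five rows displayed by checked.head(), not the full file
toy_flags = [0, 0, 1, 0, 1]
toy_precision = precision_from_flags(toy_flags)
print(f"{toy_precision:.2f}")  # prints 0.60 (3 correct out of 5)
```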


In [24]:
# Sample of false positives: 
checked[checked['wrong_label']==1].sample(5)

Unnamed: 0,text,label,wrong_label
2460,Welcomback boss🔥,1,1
1120,You are fucking AWESOME!,1,1
47,Awww yissss ima watch this shit,1,1
2312,"Y’all crazy, accuser was anonymous and died. I...",1,1
2271,Shit men. Come back to house of cards.,1,1


Most of the false positives contain words that are usually negative but are used here in a "friendly" manner:  
- beast
- boss
- bullshit
- crack
- crap
- fuck/fucking
- goat
- motherfucker
- savage
- shit
- son of a bitch
- stupid
- sucks

Examples: 
- I fucking love you Kevin
- You're the a fucking God you son of a bitch
- Fuck yeah
- Hell yes
- House of Cards sucks without you
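
One way to check how often each of these words shows up among the false positives is to count matches per comment. The sketch below is only an illustration: it uses the word list and the example comments above rather than the real `checked` data, splits "fuck/fucking" into two entries, and the names `TRIGGER_WORDS` and `count_trigger_words` are hypothetical.

```python
import re
from collections import Counter

# Words listed above as negative terms that also appear in friendly comments
TRIGGER_WORDS = [
    "beast", "boss", "bullshit", "crack", "crap", "fuck", "fucking",
    "goat", "motherfucker", "savage", "shit", "son of a bitch",
    "stupid", "sucks",
]


def count_trigger_words(texts, words=TRIGGER_WORDS):
    """Count, for each word, how many texts contain it as a whole word or phrase."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for word in words:
            # \b keeps "fuck" from matching inside "fucking"
            if re.search(rf"\b{re.escape(word)}\b", lowered):
                counts[word] += 1
    return counts


# The example comments quoted above
examples = [
    "I fucking love you Kevin",
    "You're the a fucking God you son of a bitch",
    "Fuck yeah",
    "Hell yes",
    "House of Cards sucks without you",
]
trigger_counts = count_trigger_words(examples)
print(trigger_counts.most_common())
```

Note that "Hell yes" matches none of the listed words, so the list is not exhaustive.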

To improve the model, we could retrain it on sentences that contain swear words, some used in a truly aggressive way and some that are just friendly banter.
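
The manually checked file already contains examples of the second kind: every row flagged in `wrong_label` is a comment with harsh wording that is not aggressive. A rough sketch, assuming rows shaped like the `checked` table above, of turning the review flags into corrected labels that could be added to a training set (the helper `corrected_label` and the key `true_label` are hypothetical, and this does not cover the retraining itself):

```python
def corrected_label(label, wrong_label):
    """Flip the predicted binary label when the reviewer flagged it as wrong."""
    return 1 - label if wrong_label == 1 else label


# Toy rows copied from the tables above; the real data would come from `checked`
rows = [
    {"text": "Rapist", "label": 1, "wrong_label": 0},
    {"text": "fuck you asshole", "label": 1, "wrong_label": 0},
    {"text": "You are fucking AWESOME!", "label": 1, "wrong_label": 1},
]

# Keep the text and the corrected label for each reviewed comment
extra_training_rows = [
    {"text": row["text"],
     "true_label": corrected_label(row["label"], row["wrong_label"])}
    for row in rows
]

for row in extra_training_rows:
    print(row["true_label"], row["text"])
```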