# Toxicity classification using BERT

**Description:** This notebook fine-tunes BERT to build a multi-label classifier that tags each comment with any of six labels: 'toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'.

The training data was originally sourced from the [Kaggle Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge).

<a id = 'returnToTop'></a>

## Notebook Contents
  * 1. [Setup](#setup) 
  * 2. [Data](#data)  
  * 3. [Tokenization](#tokenization)
  * 4. [Model Definition and Training](#model)
  * 5. [Model Evaluation](#evaluation)


<a id = 'setup'></a>

## 1. Setup

Install required libraries

In [1]:
!pip install transformers==4.27.2 --quiet


In [2]:
!pip uninstall tensorflow --yes
!pip install tensorflow==2.11.0

Found existing installation: tensorflow 2.12.0
Uninstalling tensorflow-2.12.0:
  Successfully uninstalled tensorflow-2.12.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==2.11.0
  Downloading tensorflow-2.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)
Collecting protobuf<3.20,>=3.9.2
  Downloading protobuf-3.19.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting tensorboard<2.12,>=2.11
  Downloading tensorboard-2.11.2-py3-none-any.whl (6.0 MB)
Collecting keras<2.12,>=2.11.0
  Downloading keras-2.11.0-py2.py3

Import required libraries

In [3]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras

from sklearn.metrics import classification_report

import matplotlib.pyplot as plt

In [4]:
from transformers import BertTokenizer, TFBertModel

In [5]:
import transformers
print(transformers.__version__)
print(tf.__version__)

4.27.2
2.11.0


[Return to Top](#returnToTop)  
<a id = 'data'></a>

## 2. Data

The Jigsaw dataset was downloaded from Kaggle, cleaned, preprocessed, and split into train, validation, and test sets. The datasets are stored on Amazon S3 and are read directly from there.

In [6]:
LOAD_TEST_DATA = False  # True: load the small sample files from Google Drive; False: load the datasets from S3

In [7]:
if (LOAD_TEST_DATA):
  from google.colab import drive
  drive.mount('/content/drive')

  df_train = pd.read_csv("/content/drive/My Drive/Colab Notebooks/w266project/sample_train_data.csv")
  df_valid = pd.read_csv("/content/drive/My Drive/Colab Notebooks/w266project/sample_validation_data.csv")
  df_test = pd.read_csv("/content/drive/My Drive/Colab Notebooks/w266project/sample_test_data.csv")
  
else:
  df_train = pd.read_csv("https://adamhyman-public.s3.amazonaws.com/w266/for_modeling/augmented_and_balanced/train_data_balanced.csv")

  df_valid = pd.read_csv("https://adamhyman-public.s3.amazonaws.com/w266/for_modeling/augmented_and_balanced/validation_data_balanced.csv")

  df_test = pd.read_csv("https://adamhyman-public.s3.amazonaws.com/w266/for_modeling/test_data.csv")

In [8]:
df_train.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,f755931dc4dcc548,actually i think what you mean to say is this ...,0.0,0.0,0.0,0.0,0.0,0.0
1,9d739901c70b13b6,how to kill mozart wooooooooooooooooooooooooo...,1.0,0.0,0.0,1.0,0.0,0.0
2,ce5a99228b180b90,were have i edited another users comments tha...,0.0,0.0,0.0,0.0,0.0,0.0
3,b27340ae18c81148,when i wanted to create the article there we...,0.0,0.0,0.0,0.0,0.0,0.0
4,9d9684f671a003b8,page is done and up enjoy 2092122850,0.0,0.0,0.0,0.0,0.0,0.0


In [9]:
df_test = df_test.dropna(how='any', axis=0)  # drop test rows that contain missing values

In [10]:
# convert labels to integer
df_train['toxic'] = df_train['toxic'].astype(int)
df_train['severe_toxic'] = df_train['severe_toxic'].astype(int)
df_train['obscene'] = df_train['obscene'].astype(int)
df_train['threat'] = df_train['threat'].astype(int)
df_train['insult'] = df_train['insult'].astype(int)
df_train['identity_hate'] = df_train['identity_hate'].astype(int)
df_train.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,f755931dc4dcc548,actually i think what you mean to say is this ...,0,0,0,0,0,0
1,9d739901c70b13b6,how to kill mozart wooooooooooooooooooooooooo...,1,0,0,1,0,0
2,ce5a99228b180b90,were have i edited another users comments tha...,0,0,0,0,0,0
3,b27340ae18c81148,when i wanted to create the article there we...,0,0,0,0,0,0
4,9d9684f671a003b8,page is done and up enjoy 2092122850,0,0,0,0,0,0


In [11]:
#split input and output variables
train_comments, train_labels = df_train["comment_text"], df_train[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']]
valid_comments, valid_labels = df_valid["comment_text"], df_valid[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']]
test_comments, test_labels = df_test["comment_text"], df_test[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']]

In [12]:
# convert to tensors
train_comments, train_labels = tf.convert_to_tensor(train_comments), tf.convert_to_tensor(train_labels)
valid_comments, valid_labels = tf.convert_to_tensor(valid_comments), tf.convert_to_tensor(valid_labels)
test_comments, test_labels = tf.convert_to_tensor(test_comments), tf.convert_to_tensor(test_labels)

In [13]:
#verify input data
train_comments[:4]

<tf.Tensor: shape=(4,), dtype=string, numpy=
array([b'actually i think what you mean to say is this clostridium difficile infection can cause pseudomembraneous colitis which can lead in extreme cases to toxic megacolon the c diff infection often comes about as a result of taking a broadspectrum antibiotic such as clinamycin which destroys the normal gut flora enabling c diff if present or introduced to completely take over normally the beneficial species of microorganisms in the gut flora can overcome c diff',
       b'how to kill mozart  wooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo nothing',
       b' were have i edited another users comments that is a very serious accusation and if you do not provide me with any evidence i will take action against you for it pro ',
       b'  when i wanted to create the article there were two options 1 create in arti

In [14]:
# verify output labels
train_labels[:4]

<tf.Tensor: shape=(4, 6), dtype=int64, numpy=
array([[0, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])>
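
Each row of `train_labels` is a multi-hot vector: any number of the six positions can be 1 at the same time, which is what makes this a multi-label problem rather than a single-class one. As a minimal sketch in plain Python, the hypothetical helper `decode_labels` (not part of the notebook) maps a row back to label names, using two of the rows printed above.

```python
LABEL_NAMES = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def decode_labels(row, names=LABEL_NAMES):
    """Return the names of the labels that are switched on in a multi-hot row."""
    return [name for name, flag in zip(names, row) if flag == 1]

# Two rows copied from the train_labels output above.
rows = [
    [0, 0, 0, 0, 0, 0],   # clean comment: no label is set
    [1, 0, 0, 1, 0, 0],   # both 'toxic' and 'threat' are set
]

for row in rows:
    print(decode_labels(row))
```

A row of all zeros is a valid target and simply means the comment carries none of the six labels.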

[Return to Top](#returnToTop)  
<a id = 'tokenization'></a>
## 3. Tokenization

Get the pre-trained BERT model and tokenizer.

In [15]:
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
bert_model = TFBertModel.from_pretrained('bert-base-cased')

Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


In [16]:
# BERT Tokenization of training and validation data
MAX_SEQUENCE_LENGTH = 128                 # set max_length of the input sequence

train_examples = [x.decode('utf-8') for x in train_comments.numpy()]
valid_examples = [x.decode('utf-8') for x in valid_comments.numpy()]

x_train = bert_tokenizer(train_examples,
              max_length=MAX_SEQUENCE_LENGTH,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_train = train_labels

x_valid = bert_tokenizer(valid_examples,
              max_length=MAX_SEQUENCE_LENGTH,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_valid = valid_labels
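
The `padding='max_length'` and `truncation=True` arguments make every example come out at exactly `MAX_SEQUENCE_LENGTH` positions. The following is only a toy illustration of that shaping step, not the tokenizer's actual implementation: `pad_or_truncate` is a hypothetical helper, the token ids are invented, and the special-token ids (101, 102, 0) are passed as illustrative defaults rather than read from the vocabulary.

```python
def pad_or_truncate(token_ids, max_length, cls_id=101, sep_id=102, pad_id=0):
    """Toy version of padding='max_length' plus truncation=True for one sequence."""
    # Keep room for the two special tokens that wrap the sequence.
    body = token_ids[:max_length - 2]
    input_ids = [cls_id] + body + [sep_id]

    # Real tokens get attention, padding positions do not.
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    input_ids = input_ids + [pad_id] * n_pad
    attention_mask = attention_mask + [0] * n_pad

    # Single-sentence input: every position belongs to segment 0.
    token_type_ids = [0] * max_length
    return {'input_ids': input_ids,
            'token_type_ids': token_type_ids,
            'attention_mask': attention_mask}

short = pad_or_truncate([7, 8, 9], max_length=8)                  # gets padded
long = pad_or_truncate(list(range(1000, 1020)), max_length=8)     # gets truncated

print(short['input_ids'])
print(short['attention_mask'])
print(long['input_ids'])
```

The three returned keys mirror the three tensors that are later fed to the model: `input_ids`, `token_type_ids`, and `attention_mask`.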

[Return to Top](#returnToTop)  
<a id = 'model'></a>

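## 4. Model Definition and Training


Define the model: a frozen or partially trainable BERT encoder followed by a dense hidden layer, dropout, and a six-unit sigmoid output layer.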

In [17]:
def create_bert_classification_model(bert_model,
                                     num_train_layers=0,
                                     hidden_size=200, 
                                     dropout=0.3,
                                     learning_rate=0.00005):
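    """
    Build a simple multi-label classification model on top of BERT,
    using the pooler output as the input to the classification head.

    num_train_layers: number of top transformer layers to fine-tune
    (0 freezes all of BERT, 12 trains all of it).
    """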
    if num_train_layers == 0:
        # Freeze all layers of pre-trained BERT model
        bert_model.trainable = False

    elif num_train_layers == 12: 
        # Train all layers of the BERT model
        bert_model.trainable = True

    else:
        # Restrict training to the num_train_layers outer transformer layers
        retrain_layers = []

        for retrain_layer_number in range(num_train_layers):

            layer_code = '_' + str(11 - retrain_layer_number)
            retrain_layers.append(layer_code)
          
        
        print('retrain layers: ', retrain_layers)

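        for w in bert_model.weights:
            # Freeze every weight whose name matches none of the selected layer codes.
            # Note: this sets the private `_trainable` attribute directly.
            if not any(x in w.name for x in retrain_layers):
                #print('freezing: ', w)
                w._trainable = False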

    input_ids = tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int64, name='input_ids_layer')
    token_type_ids = tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int64, name='token_type_ids_layer')
    attention_mask = tf.keras.layers.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int64, name='attention_mask_layer')

    bert_inputs = {'input_ids': input_ids,
                   'token_type_ids': token_type_ids,
                   'attention_mask': attention_mask}      

    bert_out = bert_model(bert_inputs)

    pooler_token = bert_out[1]
    #cls_token = bert_out[0][:, 0, :]

    hidden = tf.keras.layers.Dense(hidden_size, activation='relu', name='hidden_layer')(pooler_token)

    hidden = tf.keras.layers.Dropout(dropout)(hidden)  

    classification = tf.keras.layers.Dense(6, activation='sigmoid',name='classification_layer')(hidden)
    
    classification_model = tf.keras.Model(inputs=[input_ids, token_type_ids, attention_mask], outputs=[classification])
    
    classification_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                                 loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                                 metrics=['accuracy'])
    
    return classification_model
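
The classification head above ends in six independent sigmoid units and is trained with binary cross-entropy, so each label is scored as its own yes/no decision. The sketch below works through that loss for a single example in plain Python. It is an illustration of the formula, not Keras's implementation: `binary_cross_entropy` is a hypothetical helper, the probabilities are invented, and the clipping constant `eps` is an assumption made to avoid `log(0)`.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over the labels of one example."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        # Clip so that log() never receives exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 0, 1, 0, 0]                     # 'toxic' and 'threat' are set
confident = [0.9, 0.1, 0.1, 0.9, 0.1, 0.1]      # invented, mostly correct probabilities
unsure = [0.5] * 6                              # invented, maximally uncertain probabilities

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)

print(loss_confident)
print(loss_unsure)
```

Because every label contributes its own term, a comment can be penalised for missing 'threat' even when 'toxic' is predicted correctly.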

In [18]:
bert_classification_model = create_bert_classification_model(bert_model, num_train_layers=4)

retrain layers:  ['_11', '_10', '_9', '_8']
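
The partial-freezing branch works by substring matching on weight names: a weight stays trainable only if its name contains one of the printed layer codes. The sketch below replays that rule in plain Python. The weight names are assumptions written in the style TFBertModel typically uses, not names read from the loaded model, and `layer_codes` and `is_trainable` are hypothetical helpers.

```python
def layer_codes(num_train_layers, last_layer=11):
    """Build the name fragments for the top `num_train_layers` transformer layers."""
    return ['_' + str(last_layer - i) for i in range(num_train_layers)]

def is_trainable(weight_name, codes):
    """A weight stays trainable only if its name contains one of the codes."""
    return any(code in weight_name for code in codes)

# Assumed example names; the real ones come from bert_model.weights.
weight_names = [
    'tf_bert_model/bert/embeddings/word_embeddings/weight:0',
    'tf_bert_model/bert/encoder/layer_._7/attention/self/query/kernel:0',
    'tf_bert_model/bert/encoder/layer_._8/attention/self/query/kernel:0',
    'tf_bert_model/bert/encoder/layer_._11/output/dense/kernel:0',
    'tf_bert_model/bert/pooler/dense/kernel:0',
]

codes = layer_codes(4)
status = {name: is_trainable(name, codes) for name in weight_names}

for name, trainable in status.items():
    print('train ' if trainable else 'freeze', name)
```

If the real weight names follow this pattern, the rule freezes the embeddings and the pooler along with the lower encoder layers, which is worth keeping in mind because the classification head reads the pooler output.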


In [19]:
# inspect the model; the summary reports trainable and non-trainable parameter counts
bert_classification_model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 attention_mask_layer (InputLay  [(None, 128)]       0           []                               
 er)                                                                                              
                                                                                                  
 input_ids_layer (InputLayer)   [(None, 128)]        0           []                               
                                                                                                  
 token_type_ids_layer (InputLay  [(None, 128)]       0           []                               
 er)                                                                                              
                                                                                              

In [20]:
bert_classification_model_history = bert_classification_model.fit(
    [x_train.input_ids, x_train.token_type_ids, x_train.attention_mask],
    y_train,
    validation_data=([x_valid.input_ids, x_valid.token_type_ids, x_valid.attention_mask], y_valid),
    batch_size=32,
    epochs=2
)  

Epoch 1/2




Epoch 2/2


[Return to Top](#returnToTop)  
<a id = 'evaluation'></a>

## 5. Model Evaluation

In [21]:
# run a quick sanity check on a single hand-written comment
test_comment = ['what a stupid useless creature']
test_tokens = bert_tokenizer(test_comment,
              max_length=MAX_SEQUENCE_LENGTH,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')

test_predictions = bert_classification_model.predict([test_tokens.input_ids, test_tokens.token_type_ids, test_tokens.attention_mask], batch_size=32)
test_pred = np.where(test_predictions>=0.5, 1, 0)
target_names = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

pred_df = pd.DataFrame(data = test_pred, columns = target_names)
print(pred_df)

   toxic  severe_toxic  obscene  threat  insult  identity_hate
0      1             0        1       0       1              0
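
The model outputs one probability per label, and the 0.5 cutoff in `np.where` turns those into hard 0/1 decisions. The sketch below shows the same step in plain Python with invented probabilities (they are not outputs of the trained model) and a hypothetical helper `apply_threshold`, to make visible how the choice of cutoff changes which labels fire.

```python
def apply_threshold(probabilities, cutoff=0.5):
    """Turn per-label probabilities into 0/1 decisions."""
    return [1 if p >= cutoff else 0 for p in probabilities]

# Invented probabilities for one comment, in the order of target_names.
probs = [0.80, 0.20, 0.55, 0.05, 0.45, 0.10]

default_pred = apply_threshold(probs)              # cutoff of 0.5, as in the notebook
lenient_pred = apply_threshold(probs, cutoff=0.4)  # a lower cutoff flags more labels

print(default_pred)
print(lenient_pred)
```

A lower cutoff tends to raise recall at the cost of precision, and a higher one does the opposite, so 0.5 is a convention rather than a requirement.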


In [22]:
#Prepare test data
test_examples = [x.decode('utf-8') for x in test_comments.numpy()]

x_test = bert_tokenizer(test_examples,
              max_length=MAX_SEQUENCE_LENGTH,
              truncation=True,
              padding='max_length', 
              return_tensors='tf')
y_test = test_labels

In [23]:
test_examples[:10]

['thank you for understanding i think very highly of you and would not revert without discussion',
 'dear god this site is horrible',
 ' somebody will invariably try to add religion  really  you mean the way people have invariably kept adding religion to the samuel beckett infobox  and why do you bother bringing up the longdead completely nonexistent influences issue  you are just flailing making up crap on the fly    for comparison the only explicit acknowledgement in the entire amos oz article that he is personally jewish is in the categories       ',
 '    it says it right there that it is a type the type of institution is needed in this case because there are three levels of suny schools   university centers and doctoral granting institutions   state colleges   community colleges    it is needed in this case to clarify that ub is a suny center it says it even in binghamton university university at albany state university of new york and stony brook university stop trying to say it 

In [24]:
# run the trained model on the test data (the model outputs probabilities)
#y_test_predictions = bert_classification_model(x_test)
y_test_predictions = bert_classification_model.predict([x_test.input_ids, x_test.token_type_ids, x_test.attention_mask], batch_size=32)

# apply the threshold function to create a 0, 1 outcome
y_test_pred = np.where(y_test_predictions>=0.5, 1, 0)
y_test_pred[:10] # first 10 only



array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])

In [25]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_test_pred, target_names=['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'], zero_division = 0))

               precision    recall  f1-score   support

        toxic       0.55      0.66      0.60      6090
 severe_toxic       0.10      0.80      0.17       367
      obscene       0.52      0.77      0.62      3691
       threat       0.19      0.84      0.31       211
       insult       0.49      0.71      0.58      3427
identity_hate       0.19      0.84      0.31       712

    micro avg       0.42      0.72      0.53     14498
    macro avg       0.34      0.77      0.43     14498
 weighted avg       0.49      0.72      0.57     14498
  samples avg       0.04      0.06      0.05     14498
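
The report's per-label rows and its micro and macro averages are all built from the same counts of true positives, false positives, and false negatives. The sketch below reproduces that arithmetic on a tiny invented two-label dataset in plain Python. It is not scikit-learn's implementation: `confusion_counts` and `precision_recall` are hypothetical helpers, and returning 0.0 for an empty denominator is meant to mirror `zero_division=0`.

```python
def confusion_counts(y_true, y_pred, label):
    """Count true positives, false positives and false negatives for one label column."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[label], p_row[label]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return tp, fp, fn

def precision_recall(tp, fp, fn):
    """Precision and recall, falling back to 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented data: label 0 is common, label 1 is rare and over-predicted.
y_true = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 0]]
y_pred = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

counts = [confusion_counts(y_true, y_pred, label) for label in range(2)]
per_label = [precision_recall(*c) for c in counts]

# Macro average: mean of the per-label scores, every label weighted equally.
macro_precision = sum(p for p, _ in per_label) / len(per_label)
macro_recall = sum(r for _, r in per_label) / len(per_label)

# Micro average: pool the counts across labels first, then compute the score once.
tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
micro_precision, micro_recall = precision_recall(tp, fp, fn)

print(per_label)
print(macro_precision, macro_recall)
print(micro_precision, micro_recall)
```

Macro averaging lets a rare, over-predicted label pull the average as hard as a common one, which is one way the macro and micro rows of a report can drift apart.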



In [26]:
from sklearn.metrics import hamming_loss
hamming_loss(y_test, y_test_pred)

0.048482148439942474
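
Hamming loss is the fraction of individual label cells that are predicted wrongly, counted over every comment and every label. A minimal plain-Python sketch of that definition, using invented data and a hypothetical helper `hamming_loss_sketch` rather than scikit-learn's function:

```python
def hamming_loss_sketch(y_true, y_pred):
    """Fraction of (sample, label) cells where prediction and truth disagree."""
    wrong = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            wrong += int(t != p)
            total += 1
    return wrong / total

# Invented example: 2 comments x 3 labels, with exactly one wrong cell.
y_true_small = [[1, 0, 0], [0, 1, 0]]
y_pred_small = [[1, 0, 1], [0, 1, 0]]

loss = hamming_loss_sketch(y_true_small, y_pred_small)
print(loss)
```

Since correctly predicted zeros count as correct cells, a dataset where most labels are 0 can show a low Hamming loss even when per-label precision is modest, so it is best read alongside the classification report.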

In [28]:
df_train.shape

(13144, 8)

In [29]:
df_train.sum()

id               f755931dc4dcc5489d739901c70b13b6ce5a99228b180b...
comment_text     actually i think what you mean to say is this ...
toxic                                                         4346
severe_toxic                                                  2018
obscene                                                       3132
threat                                                        1524
insult                                                        3072
identity_hate                                                 1951
dtype: object
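
The label totals above are easier to compare as shares of the 13,144 training rows. The sketch below does that arithmetic in plain Python, with the counts and the row count copied by hand from the two outputs above; the variable names are illustrative only.

```python
n_rows = 13144  # from df_train.shape

# Positive counts per label, copied from df_train.sum().
label_counts = {
    'toxic': 4346,
    'severe_toxic': 2018,
    'obscene': 3132,
    'threat': 1524,
    'insult': 3072,
    'identity_hate': 1951,
}

# Share of training rows that carry each label.
prevalence = {name: count / n_rows for name, count in label_counts.items()}

for name, share in prevalence.items():
    print(f'{name:<14} {share:.3f}')
```

The shares need not sum to 1, because a single comment can carry several labels or none.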