## Setting up the Environment

First, we need to install Hugging Face [transformers](https://huggingface.co/transformers/index.html) and the other libraries this notebook needs. Next, we import the required dependencies.

In [None]:
!pip install transformers
!pip install sentencepiece
!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension
!pip install torch


In [None]:
import torch
import transformers
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm.notebook import tqdm
from transformers import DebertaTokenizer, DebertaForSequenceClassification, Trainer, TrainingArguments
import pandas as pd
import numpy as np

from sklearn.metrics import classification_report
from sklearn.utils import class_weight

if not torch.cuda.is_available():
  print('WARNING: You may want to change the runtime to GPU for faster training!')
  DEVICE = 'cpu'
else:
  DEVICE = 'cuda:0'

## Getting the data

The first step is to get the training and test data. The training data is loaded from a pre-processed CSV file (the preprocessing techniques are described in the report). For testing, we use the official dev set: the official paragraph IDs are used to pull the dev data out of the full original dataset.

In [None]:
from dont_patronize_me import DontPatronizeMe
# Initialize a dpm (Don't Patronize Me) object.
# It takes two arguments as input:
# (1) Path to the directory containing the training set files, which is the root directory of this notebook.
# (2) Path to the test set, which will be released when the evaluation phase begins. In this example, 
# we use the dataset for Subtask 1, which the code will load without labels.
dpm = DontPatronizeMe('.', '.')
# This method loads the subtask 1 data
dpm.load_task1()
data = dpm.train_task1_df

In [None]:
# Read in the official dev-set paragraph IDs and labels, and cast the IDs to strings so they match the par_id column of the full dataset
teids = pd.read_csv('dev_semeval_parids-labels.csv')
teids.par_id = teids.par_id.astype(str)

In [None]:
# Use the official dev-set IDs to pull the matching paragraphs out of the original dataset into a pandas dataframe
rows = []  # will contain par_id, text and label
for parid in teids.par_id:
  # Select the matching row from the original dataset (first match, as before)
  row = dpm.train_task1_df.loc[dpm.train_task1_df.par_id == parid].iloc[0]
  rows.append({
      'par_id': parid,
      'text': row['text'],
      'label': row['label']
  })
tedf = pd.DataFrame(rows)
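The row-by-row lookup above can also be written as a single pandas merge on `par_id`. The sketch below uses a tiny made-up dataframe standing in for `dpm.train_task1_df` (the IDs and texts are hypothetical) and assumes each `par_id` appears only once in the full dataset; with duplicates, a merge would return every match, whereas the loop keeps only the first. Both key columns must have the same dtype, which is why the IDs are cast to `str` first.

```python
import pandas as pd

# Hypothetical stand-in for dpm.train_task1_df (the full labelled dataset)
full_df = pd.DataFrame({
    'par_id': ['1', '2', '3', '4'],
    'text': ['first paragraph', 'second paragraph', 'third paragraph', 'fourth paragraph'],
    'label': [0, 1, 0, 1],
})

# Hypothetical stand-in for the official dev-set IDs, which read_csv parses as integers
dev_ids = pd.DataFrame({'par_id': [4, 2]})
# Cast the IDs to strings so both join keys have the same dtype
dev_ids['par_id'] = dev_ids['par_id'].astype(str)

# An inner merge keeps only the dev-set paragraphs, in the order of dev_ids
dev_df = dev_ids.merge(full_df[['par_id', 'text', 'label']], on='par_id', how='inner')
print(dev_df)
```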

In [None]:
# Read in the (augmented) training dataset from a CSV file into a pandas dataframe
trdf = pd.read_csv("Augmented_set.csv")

## Loading and preprocessing the corpus 


Next, we preprocess the datasets and wrap them in DontPatronizeMe dataset objects for training.

In [None]:
# Make a CW reader function that returns a dictionary with text and labels as keys
def reader_CW(input_df):
    return {'texts':input_df["text"].tolist(), 'labels':input_df["label"].tolist()}

In [None]:
# Shuffle the training and test dataframes and convert each into a dictionary with reader_CW
trdf = trdf.sample(frac=1).reset_index(drop=True)
tedf = tedf.sample(frac=1).reset_index(drop=True)
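The two `sample(frac=1)` calls above draw a fresh random permutation on every run, so the order of the training rows (and therefore the exact training run) can differ between runs even though the training arguments set a seed; shuffling the dev set has no effect on the metrics. If a reproducible order is wanted, `random_state` can be passed, as in this sketch on a hypothetical toy dataframe (the value 420 just mirrors the training seed and is an arbitrary choice):

```python
import pandas as pd

# Hypothetical toy dataframe standing in for trdf / tedf
df = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'], 'label': [0, 1, 0, 1, 0]})

# The same random_state always produces the same permutation
shuffled_1 = df.sample(frac=1, random_state=420).reset_index(drop=True)
shuffled_2 = df.sample(frac=1, random_state=420).reset_index(drop=True)

print(shuffled_1.equals(shuffled_2))  # True: same seed, same order
```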

trainset = reader_CW(trdf)
testset = reader_CW(tedf)

In [None]:
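# Define a DontPatronizeMe dataset class that holds the texts, labels and tokenizer, plus a collate function used later for training.
# Note: this class shadows the DontPatronizeMe loader imported from dont_patronize_me above.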
class DontPatronizeMe(torch.utils.data.Dataset):

    def __init__(self, tokenizer, input_set):

        self.tokenizer = tokenizer
        self.texts = input_set['texts']
        self.labels = input_set['labels']
        
    def collate_fn(self, batch):

        texts = []
        labels = []

        for b in batch:
            texts.append(b['texts'])
            labels.append(b['labels'])
 
        # Tokenise the batch: texts longer than 256 tokens are truncated, and shorter ones are
        # padded only up to the longest text in this batch (padding=True), not to 256
        encodings = self.tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=256)
        encodings['labels'] = torch.tensor(labels)
        return encodings
    
    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
       
        item = {'texts': self.texts[idx],
                'labels': self.labels[idx]}
        return item
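With `padding=True`, the tokenizer pads each batch only to its longest sequence, and `truncation=True, max_length=256` caps sequences at 256 tokens. The pure-Python sketch below mimics that behaviour on already tokenised ID lists so it runs without a tokenizer. The function name `pad_batch` and pad ID 0 are illustrative assumptions, and the sketch ignores the special tokens the real tokenizer adds (which also count towards `max_length`).

```python
from typing import List, Tuple


def pad_batch(batch: List[List[int]], pad_id: int = 0,
              max_length: int = 256) -> Tuple[List[List[int]], List[List[int]]]:
    """Truncate every sequence to max_length, then right-pad to the longest one in the batch."""
    # Truncation: cut anything longer than max_length
    truncated = [ids[:max_length] for ids in batch]
    # Dynamic padding: the target length is the longest sequence in *this* batch
    target = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (target - len(ids)) for ids in truncated]
    # Attention mask: 1 for real tokens, 0 for padding
    attention_mask = [[1] * len(ids) + [0] * (target - len(ids)) for ids in truncated]
    return input_ids, attention_mask


ids, mask = pad_batch([[5, 6, 7], [8]], max_length=256)
print(ids)   # [[5, 6, 7], [8, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
```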

In [None]:
# Load the DeBERTa tokenizer and build the training and test DontPatronizeMe datasets
tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')
train_dataset = DontPatronizeMe(tokenizer, trainset)
test_dataset = DontPatronizeMe(tokenizer, testset)


## Fine-tuning a pre-trained DeBERTa model



To fine-tune the pre-trained model, we first define a Trainer subclass that specifies the loss computation, and then a main_training function that sets the hyperparameters, loads the model, passes in the training dataset and data collator, and runs the training loop. Callbacks can also be attached there if needed: during hyperparameter tuning we used transformers.EarlyStoppingCallback as an early-stopping mechanism, but it is not used here because the hyperparameters are now fixed. Finally, the function saves the trained model so that it can be loaded later for evaluation.
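The early-stopping idea mentioned above comes down to a patience counter on a validation metric. The sketch below is a simplified illustration of that logic, loosely modelled on the `early_stopping_patience` and `early_stopping_threshold` arguments of `transformers.EarlyStoppingCallback`; it is not the library's code, and the function name and example F1 values are hypothetical.

```python
from typing import List, Optional


def early_stop_index(val_scores: List[float], patience: int = 2,
                     threshold: float = 0.0) -> Optional[int]:
    """Return the evaluation index at which training would stop, or None if it never stops.

    A score counts as an improvement only if it beats the best score so far by more
    than `threshold`; training stops after `patience` consecutive non-improvements.
    """
    best = float('-inf')
    bad_evals = 0
    for i, score in enumerate(val_scores):
        if score > best + threshold:
            best = score   # new best score: reset the counter
            bad_evals = 0
        else:
            bad_evals += 1  # no improvement on this evaluation
            if bad_evals >= patience:
                return i
    return None


# Hypothetical validation F1 scores, one per evaluation
print(early_stop_index([0.52, 0.58, 0.57, 0.56, 0.60], patience=2))  # 3
```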

In [None]:
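# Define the Trainer subclass and its loss computation.
# The loss is the one the model itself returns when given labels (cross-entropy for this
# two-class setup), which is essentially what the default Trainer does; overriding
# compute_loss gives a single place to plug in a custom loss.
class Trainer_hate_speech(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that newer transformers versions pass to compute_loss

        labels = inputs.pop('labels')
        outputs = model(**inputs, labels=labels)
        loss = outputs.loss
        return (loss, outputs) if return_outputs else loss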
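`outputs.loss` is computed by DebertaForSequenceClassification itself when `labels` are passed; with two integer-labelled classes it is the mean cross-entropy between the softmax of the logits and the gold labels. The pure-Python sketch below spells that computation out. The optional `weights` argument is an illustrative assumption, not something the notebook uses: it mirrors the weighted mean of PyTorch's `CrossEntropyLoss(weight=...)`, which divides by the summed weights of the target classes. The `class_weight` module imported at the top of the notebook offers a "balanced" heuristic for such weights, n_samples / (n_classes × count of each class).

```python
import math
from typing import Optional, Sequence


def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Mean cross-entropy over a batch; with weights, a weighted mean as in CrossEntropyLoss(weight=...)."""
    if weights is None:
        weights = [1.0] * len(logits[0])  # unweighted: every class counts equally
    total, norm = 0.0, 0.0
    for row, y in zip(logits, labels):
        # log-sum-exp computed stably by subtracting the row maximum
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        nll = log_sum_exp - row[y]  # equals -log softmax(row)[y]
        total += weights[y] * nll
        norm += weights[y]  # weighted mean divides by the summed weights of the targets
    return total / norm


# Two made-up examples with two classes (0 = Not PCL, 1 = PCL)
logits = [[0.0, 0.0], [2.0, 0.0]]
labels = [1, 0]
print(cross_entropy(logits, labels))                      # plain mean cross-entropy
print(cross_entropy(logits, labels, weights=[1.0, 3.0]))  # PCL examples weighted 3x
```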

In [None]:
# Specify the main training loop with parameters and training of the model
def main_training():

    # Load the pre-trained DeBERTa-base checkpoint; a new, randomly initialised classification head is added on top
    model = DebertaForSequenceClassification.from_pretrained("microsoft/deberta-base")
    
    training_args = TrainingArguments(
        output_dir='./model/',
        learning_rate=2e-5,
        warmup_steps=100,
        lr_scheduler_type="linear",
        logging_steps=100,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=4,
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-6,
        max_grad_norm=1.0,
        seed=420,
        save_strategy="no"
    )
    

    trainer = Trainer_hate_speech(
        model=model,                         
        args=training_args,                 
        train_dataset=train_dataset,                   
        data_collator=train_dataset.collate_fn,
    )
    trainer.train()

    trainer.save_model('./model/')
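With 100 warmup steps and the linear scheduler, the learning rate ramps linearly from 0 up to 2e-5 over the first 100 steps and then decays linearly to 0 at the final training step. The sketch below reproduces that shape, written to match the behaviour of transformers' `get_linear_schedule_with_warmup`. The function name is hypothetical, and `total_steps` (which the Trainer derives from dataset size, batch size and epochs) is a free parameter here rather than this notebook's actual value.

```python
def linear_warmup_lr(step: int, base_lr: float = 2e-5, warmup_steps: int = 100,
                     total_steps: int = 1000) -> float:
    """Learning rate at a given optimiser step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# total_steps=1000 is a hypothetical value chosen for illustration
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_lr(s))
```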



In [None]:
# Start the training
main_training()


Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaForSequenceClassification: ['lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.dense.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.weight']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-base and are newly initialized: ['pooler.dense.weight', 'classi

| Step | Training Loss |
|-----:|--------------:|
| 100 | 0.647 |
| 200 | 0.4237 |
| 300 | 0.2815 |
| 400 | 0.3113 |
| 500 | 0.274 |
| 600 | 0.2176 |
| 700 | 0.1455 |
| 800 | 0.1665 |
| 900 | 0.1195 |
| 1000 | 0.1275 |




Training completed. Do not forget to share your model on huggingface.co/models =)


Saving model checkpoint to ./model/
Configuration saved in ./model/config.json
Model weights saved in ./model/pytorch_model.bin


### Evaluation
Once the model is trained, we evaluate it on the official dev set.

Let's define a helper function ``predict_labels`` that will extract the predicted labels.

In [None]:
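def predict_labels(texts, tokenizer, model):
  model.eval()
  # Tokenise exactly as during training: truncate to 256 tokens, pad to the longest text in the batch
  encodings = tokenizer(texts, return_tensors='pt', padding=True, truncation=True, max_length=256)
  # Put the inputs on the same device as the model (CPU or GPU)
  encodings = encodings.to(model.device)

  output = model(**encodings)
  # torch.max returns (values, indices): the highest logit per row and the predicted class
  preds = torch.max(output.logits, 1)

  return {'prediction': preds[1], 'confidence': preds[0]}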
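`torch.max(output.logits, 1)` yields the predicted class together with the largest raw logit. Logits are unbounded scores, so if a confidence between 0 and 1 is preferred, the logits can first be passed through a softmax (in PyTorch, `torch.softmax(output.logits, dim=1)`). The pure-Python sketch below shows that conversion for a single row; the function name and example logits are hypothetical.

```python
import math
from typing import List, Tuple


def logits_to_prediction(logits: List[float]) -> Tuple[int, float]:
    """Return (predicted class, softmax probability of that class) for one row of logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return pred, probs[pred]


label, conf = logits_to_prediction([-1.2, 0.8])
print(label, round(conf, 3))  # 1 0.881
```

The argmax of the softmax is always the argmax of the logits, so this changes only the confidence value, never the predicted label.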

Now let's define a function that will evaluate our model on the official dev set we prepared.

In [None]:
def evaluate(model, tokenizer, data_loader):

  preds = []       # predicted label for every paragraph
  tot_labels = []  # gold label for every paragraph

  with torch.no_grad():
    for data in tqdm(data_loader):

      texts = data['texts']
      labels = data['labels']

      pred = predict_labels(texts, tokenizer, model)

      # extend (rather than append) keeps both lists flat for any batch size
      preds.extend(pred['prediction'].tolist())
      tot_labels.extend(labels.tolist())

  # With the saved predictions and labels we can compute accuracy, precision, recall and F1-score
  report = classification_report(tot_labels, preds, target_names=["Not PCL", "PCL"], output_dict=True)

  return report

In [18]:
# Run the evaluation and print out the metrics
# To load the exact model that achieved the 61.7% F1 score stated in the report, set model_name below to './model/report_model/'
# If no model has been retrained above, './model/' currently holds the same parameter files

tokenizer = DebertaTokenizer.from_pretrained('microsoft/deberta-base')

# Path to your saved model
model_name = './model/'
model = DebertaForSequenceClassification.from_pretrained(model_name)

test_loader = DataLoader(test_dataset)

report = evaluate(model, tokenizer, test_loader)

print(report)

print("The total accuracy: ", report['accuracy'])
print("The f1 score of label 1: ", report['PCL']['f1-score'])
print("The precision of label 1: ", report['PCL']['precision'])
print("The recall of label 1: ", report['PCL']['recall'])


loading file https://huggingface.co/microsoft/deberta-base/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/ce0ac094af27cf80bbf403595a6d47f1fc632981bf1d4c5bf69968568cbea410.e8ad27cc324bb0dc448d4d95f63e48f72688fb318a4c4c3f623485621b0b515c
loading file https://huggingface.co/microsoft/deberta-base/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/05056f257c8d2b63ad16fd26f847c9ab9ee34e33cdfad926e132be824b237869.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
loading file https://huggingface.co/microsoft/deberta-base/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/microsoft/deberta-base/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/microsoft/deberta-base/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/c2bc27a1c7529c177696ff76b1e74cba8667be14e202359f20f9114e407f43e2.a39abb1c6179fb264c2db685f9a056b7cb8


{'Not PCL': {'precision': 0.9571577847439916, 'recall': 0.9667546174142481, 'f1-score': 0.9619322656865319, 'support': 1895}, 'PCL': {'precision': 0.65, 'recall': 0.5879396984924623, 'f1-score': 0.6174142480211081, 'support': 199}, 'accuracy': 0.9307545367717287, 'macro avg': {'precision': 0.8035788923719958, 'recall': 0.7773471579533552, 'f1-score': 0.7896732568538201, 'support': 2094}, 'weighted avg': {'precision': 0.9279675272635454, 'recall': 0.9307545367717287, 'f1-score': 0.9291915371691397, 'support': 2094}}
The total accuracy:  0.9307545367717287
The f1 score of label 1:  0.6174142480211081
The precision of label 1:  0.65
The recall of label 1:  0.5879396984924623
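The PCL scores are easier to interpret as counts. Working backwards from the printed report (199 PCL paragraphs out of 2,094, recall 0.5879, precision 0.65), the model found 117 of the 199 PCL paragraphs and produced 63 false positives. The snippet below recomputes the reported metrics from these back-calculated counts; they are inferred from the report, not taken from a logged confusion matrix.

```python
# Confusion counts for the PCL class, back-calculated from the report above:
# support 199 and recall 0.5879 give TP = 117; precision 0.65 means 180 predicted PCL, so FP = 63
tp, fp, fn = 117, 63, 199 - 117
tn = 2094 - tp - fp - fn  # the remaining, correctly rejected Not PCL paragraphs

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(precision, recall, f1, accuracy)
```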
