> **University of Pisa - M.Sc. Computer Science, Artificial Intelligence**  
> **Human Language Technologies - a.a. 2021/22**
>
> *September, 2022*
>
>**Authors** 
- Irene Pisani *i.pisani1@studenti.unipi.it* (560104)
- Alice Bergonzini *a.bergonzini1@studenti.unipi.it* (560680)


###### **FINAL PROJECT on KEY POINT ANALYSIS (KPA)**
# **Track 2: Key Point Matching**

***Abstract.*** This work describes simple approaches to the *Key Point Matching* (KPM) and *Key Point Generation* (KPG) tracks proposed at Argument Mining 2021 as part of the shared task on *Quantitative Summarization and Key Point Analysis* (KPA). \
The presented methods rely on fine-tuning state-of-the-art pre-trained language models for both the KPM and KPG subtasks. \
For the KPM task, all the explored models were validated with the hold-out technique, and their results were compared to analyze their effectiveness on the task. Leveraging the pre-trained DeBERTa transformer, our best model yields competitive performance: on the test set it achieves mAP Strict and mAP Relaxed scores of 0.7035 and 0.8857, respectively. \
For the KPG task, we provide a simple baseline based on abstractive summarization: our system uses the pre-trained Google mT5 transformer to generate several candidate key points, from which the final ones are selected.


## **Settings**

- Define the Colab GPU to use
- Download the TR, VL and TS sets from the official IBM repository
- Install the required tools and libraries

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Sun Aug 28 10:12:48 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
! git clone "https://github.com/IBM/KPA_2021_shared_task"

Cloning into 'KPA_2021_shared_task'...
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 44 (delta 14), reused 26 (delta 4), pack-reused 0
Unpacking objects: 100% (44/44), done.


In [3]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.21.2-py3-none-any.whl (4.7 MB)
     |████████████████████████████████| 4.7 MB 7.4 MB/s 
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
     |████████████████████████████████| 120 kB 50.0 MB/s 
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
     |████████████████████████████████| 6.6 MB 44.2 MB/s 
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.21.2


## **Argument Mining KPA 2021: Function provided for standard evaluation method**
- Load the datasets (TR, VL and TS)
- Load the predictions stored in a JSON file
- Evaluate the predictions with the official KPA-2021 metrics: mAP strict and mAP relaxed

In [4]:
import sys
import pandas as pd
from sklearn.metrics import precision_recall_curve, average_precision_score, precision_score
import numpy as np
import os
import json

# these functions were not written by us: they are provided by the ArgMining KPA 2021 shared task
# source
# we use them to align our evaluation method with the one used in the shared task

def get_ap(df, label_column, top_percentile=0.5):
    top = int(len(df)*top_percentile)
    df = df.sort_values('score', ascending=False).head(top)
    # after selecting the top percentile candidates, we set the score for the dummy kp to 0.99, to prevent it from increasing the precision.
    df.loc[df['key_point_id'] == "dummy_id", 'score'] = 0.99
    ap = average_precision_score(y_true=df[label_column], y_score=df["score"])
    # multiply by the number of positives in the top 50% and divide by the max number of positives within the top 50%, which is the number of top 50% instances
    positives_in_top_predictions = sum(df[label_column])
    max_num_of_positives = len(df)
    ap_retrieval = ap * positives_in_top_predictions/max_num_of_positives
    return ap_retrieval

def calc_mean_average_precision(df, label_column):
    precisions = [get_ap(group, label_column) for _, group in df.groupby(["topic", "stance"])]
    return np.mean(precisions)

def evaluate_predictions(merged_df):
    #print("\n** running evalution:")
    mAP_strict = calc_mean_average_precision(merged_df, "label_strict")
    mAP_relaxed = calc_mean_average_precision(merged_df, "label_relaxed")
    #print(f"mAP strict= {mAP_strict} ; mAP relaxed = {mAP_relaxed}")
    return mAP_strict, mAP_relaxed

def load_kpm_data(gold_data_dir, subset, submitted_kp_file=None):
    #print("\n** loading task data:")
    arguments_file = os.path.join(gold_data_dir, f"arguments_{subset}.csv")
    if not submitted_kp_file:
        key_points_file = os.path.join(gold_data_dir, f"key_points_{subset}.csv")
    else:
        key_points_file=submitted_kp_file
    labels_file = os.path.join(gold_data_dir, f"labels_{subset}.csv")


    arguments_df = pd.read_csv(arguments_file)
    key_points_df = pd.read_csv(key_points_file)
    labels_file_df = pd.read_csv(labels_file)


    for desc, group in arguments_df.groupby(["stance", "topic"]):
        stance = desc[0]
        topic = desc[1]
        key_points = key_points_df[(key_points_df["stance"] == stance) & (key_points_df["topic"] == topic)]
        #print(f"\t{desc}: loaded {len(group)} arguments and {len(key_points)} key points")
    return arguments_df, key_points_df, labels_file_df


def get_predictions(predictions_file, labels_df, arg_df, kp_df):
    #print("\n** loading predictions:")
    arg_df = arg_df[["arg_id", "topic", "stance"]]
    predictions_df = load_predictions(predictions_file, kp_df["key_point_id"].unique())

    #make sure each arg_id has a prediction
    predictions_df = pd.merge(arg_df, predictions_df, how="left", on="arg_id")
    #print(predictions_df[predictions_df.isna().any(axis=1)])
    #handle arguments with no matching key point
    predictions_df["key_point_id"] = predictions_df["key_point_id"].fillna("dummy_id")
    predictions_df["score"] = predictions_df["score"].fillna(0)

    #merge each argument with the gold labels
    merged_df = pd.merge(predictions_df, labels_df, how="left", on=["arg_id", "key_point_id"])

    merged_df.loc[merged_df['key_point_id'] == "dummy_id", 'label'] = 0
    merged_df["label_strict"] = merged_df["label"].fillna(0)
    merged_df["label_relaxed"] = merged_df["label"].fillna(1)

    return merged_df


"""
this method chooses the best key point for each argument
and generates a dataframe with the matches and scores
"""
def load_predictions(predictions_dir, correct_kp_list):
    arg =[]
    kp = []
    scores = []
    invalid_keypoints = set()
    with open(predictions_dir, "r") as f_in:
        res = json.load(f_in)
        for arg_id, kps in res.items():
            valid_kps = {key: value for key, value in kps.items() if key in correct_kp_list}
            invalid = {key: value for key, value in kps.items() if key not in correct_kp_list}
            for invalid_kp, _ in invalid.items():
                if invalid_kp not in invalid_keypoints:
                    #print(f"key point {invalid_kp} doesn't appear in the key points file and will be ignored")
                    invalid_keypoints.add(invalid_kp)
            if valid_kps:
                best_kp = max(valid_kps.items(), key=lambda x: x[1])
                arg.append(arg_id)
                kp.append(best_kp[0])
                scores.append(best_kp[1])
        #print(f"\tloaded predictions for {len(arg)} arguments")
        return pd.DataFrame({"arg_id" : arg, "key_point_id": kp, "score": scores})
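
The difference between the two official metrics can be made concrete with a small worked example. The sketch below is a simplified, dependency-free re-implementation written only for illustration: the helper `toy_ap_at_top` and the toy scores are assumptions, not part of the shared-task code, and the sketch assumes all scores are distinct and ignores the special handling of the dummy key point. It keeps the top 50% of the pairs by score, and treats undecided pairs as non-matching (strict) or matching (relaxed).

```python
def toy_ap_at_top(scores, labels, top_percentile=0.5):
    """Average precision over the top-scoring pairs, rescaled as in get_ap.

    Hypothetical helper: assumes distinct scores and 0/1 labels.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[: int(len(ranked) * top_percentile)]
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(top, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    # AP * positives / len(top) simplifies to precision_sum / len(top)
    return precision_sum / len(top) if top else 0.0


# Best-match score of six arguments and their gold label (None = undecided).
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
gold = [1, None, 0, 1, 1, 0]

strict = [0 if g is None else g for g in gold]   # undecided -> no match
relaxed = [1 if g is None else g for g in gold]  # undecided -> match

ap_strict = toy_ap_at_top(scores, strict)
ap_relaxed = toy_ap_at_top(scores, relaxed)
print(round(ap_strict, 4), round(ap_relaxed, 4))
```

Only the three highest-scoring pairs are evaluated here, so the undecided pair ranked second is what separates the strict value from the relaxed one.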

## **Parser class**

- Dataset pre-processing (lower-casing and punctuation removal)
- Create a first dataset with <argument, keypoint> pairs having an annotated label (0, 1)
- Create a second dataset with <argument, keypoint> pairs having an annotated label (0, 1, undecided)
- Tokenize both datasets
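
As a quick illustration of the first step, the sketch below applies lower-casing followed by punctuation removal to a single sentence, using the same regular expression as the parser class. The function names `clean_text` and `encode_stance` and the example sentence are assumptions made for this example only.

```python
import re


def clean_text(text):
    """Lower-case the text, then replace every punctuation character with a space."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def encode_stance(stance):
    """Map the dataset's stance values (-1 / 1) to 0 / 1."""
    return 0 if stance == -1 else 1


cleaned = clean_text("We should ban Zoos: they're cruel!")
normalized = " ".join(cleaned.split())  # optional: collapse the repeated spaces

print(repr(cleaned))
print(repr(normalized))
print(encode_stance(-1), encode_stance(1))
```

Replacing punctuation with a space rather than deleting it keeps words such as "they're" from merging, at the cost of leaving repeated and trailing spaces in the text.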

In [5]:

import re
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download("stopwords")
nltk.download("punkt")
nltk.download("wordnet")
nltk.download('omw-1.4')

import torch
from transformers import AutoTokenizer
from torch.utils.data import TensorDataset

class DatasetParser():
  
  def __init__(self, tokenizer_name):

    self.arguments = None     # dataframe of arguments: [arg_id, argument, topic, stance]
    self.keypoints = None     # dataframe of keypoints: [kp_id, key_point, topic, stance]
    self.labels    = None     # dataframe of labels:    [arg_id, kp_id, label]
    
    self.merged_dataset    = None     # dataframe of merged information: [arg_id, argument, kp_id, key_points, topic, stance, label]
    self.tokenized_dataset = None     # tensor dataset: [ids, token_types_info, attention_mask, stance, label]
    self.preds = None                 # dataframe dataset to store model predictions, also pairs with undecided label are reported here
    self.tokenized_preds = None       # tensor dataset: [ids, token_types_info, attention_mask, stance, label]

    self.lemmatizer = WordNetLemmatizer()               # lemmatizer object loaded from nltk 
    self.stemmer    = PorterStemmer()                   # Stemmer object loaded from nltk 
    self.stop_words = set(stopwords.words("english"))   # set of stopwords for english language

    # tokenizer object loaded from hugging face pretrained tokenizer
    self.tokenizer_name = tokenizer_name
    self.tokenizer      = AutoTokenizer.from_pretrained(self.tokenizer_name) 

  # -------------------- Execute preprocessing and data cleaning on textual information ----------------------------------------

  def remove_punctuations(self, text):
    return re.sub(r'[^\w\s]', ' ', text)

  def remove_stopwords(self,text):
    return ' '.join([word for word in nltk.word_tokenize(text) if word not in self.stop_words])
  
  def lemmatize_words(self, text): # not used
    return ' '.join(self.lemmatizer.lemmatize(word) for word in text.split())
  
  def stemming_words(self, text): # not used
    return ' '.join(self.stemmer.stem(word) for word in text.split())
  
  def lower_case(self, text):           
    return text.lower()
  
  def ohe_stance (self, stance):
    if stance == -1:
      return 0 
    else:
      return 1
  
  def preprocess_data(self):

    # preprocessing applied: lower-casing and punctuation removal
    
    # execute data preprocessing on arguments
    self.arguments["stance"] = self.arguments["stance"].apply(self.ohe_stance)               # encode stance as 0/1
    self.arguments["argument"] = self.arguments["argument"].apply(self.lower_case)           # transform to lower case
    #self.arguments["argument"] = self.arguments["argument"].apply(self.remove_stopwords)     # remove stop words 
    self.arguments["argument"] = self.arguments["argument"].apply(self.remove_punctuations)  # remove punctuation
    #self.arguments["argument"] = self.arguments["argument"].apply(self.lemmatize_words)      # lemmatize words
    self.arguments["topic"] = self.arguments["topic"].apply(self.lower_case)           # transform to lower case
    #self.arguments["topic"] = self.arguments["topic"].apply(self.remove_stopwords)     # remove stop words 
    self.arguments["topic"] = self.arguments["topic"].apply(self.remove_punctuations)  # remove punctuation
    #self.arguments["topic"] = self.arguments["topic"].apply(self.lemmatize_words)      # lemmatize words

    # execute data preprocessing on keypoints
    self.keypoints["stance"] = self.keypoints["stance"].apply(self.ohe_stance)                # encode stance as 0/1
    self.keypoints["key_point"] = self.keypoints["key_point"].apply(self.lower_case)          # transform to lower case
    #self.keypoints["key_point"] = self.keypoints["key_point"].apply(self.remove_stopwords)    # remove stop words 
    self.keypoints["key_point"] = self.keypoints["key_point"].apply(self.remove_punctuations) # remove punctuation
    #self.keypoints["key_point"] = self.keypoints["key_point"].apply(self.lemmatize_words)     # lemmatize words
    self.keypoints["topic"] = self.keypoints["topic"].apply(self.lower_case)          # transform to lower case
    #self.keypoints["topic"] = self.keypoints["topic"].apply(self.remove_stopwords)    # remove stop words 
    self.keypoints["topic"] = self.keypoints["topic"].apply(self.remove_punctuations) # remove punctuation
    #self.keypoints["topic"] = self.keypoints["topic"].apply(self.lemmatize_words)     # lemmatize words

    return 
  
  # -------------------- Create a dataset merging information from arguments, keypoints and labels ----------------------------------------

  def get_merged_dataset (self):

    # create a merged dataset with all <argument, keypoint> pairs for which an annotated label (0 or 1) exist
    # do not consider <argument, keypoint> pairs with undecided label
    
    self.merged_dataset = self.labels.merge(self.arguments, on="arg_id")
    self.merged_dataset = self.merged_dataset.merge(self.keypoints, on = "key_point_id")
    self.merged_dataset.drop(["stance_x", "topic_x"], axis=1, inplace=True)
    self.merged_dataset = self.merged_dataset.rename(columns={"stance_y":"stance", "topic_y":"topic"})

    return

  # -------------------- Tokenize dataset to properly feed it to the model -----------------------------------------------------------------
  
  def get_tokenized_dataset (self ): 
    
    input_ids = []
    input_tti = []
    input_mask = []
    input_stance = []
    input_label = []

    # tokenize every <(argument + topic), keypoint> pair of the merged dataset
    # note: argument and topic are concatenated without a separator
    for i in range(len(self.merged_dataset)):
      
      encoded_input = self.tokenizer(self.merged_dataset["argument"][i] + self.merged_dataset["topic"][i],
                                     self.merged_dataset["key_point"][i],
                                     add_special_tokens = True, 
                                     max_length = 80, 
                                     padding = "max_length")
      
      input_ids.append(encoded_input["input_ids"])
      if self.tokenizer_name.startswith("bert-") == True:
        input_tti.append(encoded_input["token_type_ids"])
      input_mask.append(encoded_input["attention_mask"])
      input_stance.append(self.merged_dataset["stance"][i])
      input_label.append(self.merged_dataset["label"][i])

    # transform to tensors
    input_ids = torch.tensor(input_ids).squeeze()  
    input_mask = torch.tensor(input_mask).squeeze()
    input_stance = torch.tensor(input_stance).squeeze()
    input_label = torch.tensor(input_label).squeeze()
    
    # use token type ids only if the tokenizer in use is the BERT tokenizer
    # create the final tokenized dataset made up of tensors 
    if self.tokenizer_name.startswith("bert-") == True:
      input_tti = torch.tensor(input_tti).squeeze()
      self.tokenized_dataset = TensorDataset(input_ids, input_tti, input_mask, input_stance, input_label)
    else:
      self.tokenized_dataset = TensorDataset(input_ids, input_mask, input_stance, input_label)

    return

    
  def get_tokenized_preds (self): 
    
    # apply the same tokenization procedure as above 
    # here the dataset contains all <argument, keypoint> pairs, including those labelled as undecided
    
    input_ids = []
    input_tti = []
    input_mask = []
    input_stance = []
  
    for i in range(len(self.preds)):
      
      encoded_input = self.tokenizer(self.preds["argument"][i] + self.preds["topic"][i],
                                     self.preds["key_point"][i],
                                     add_special_tokens = True, 
                                     max_length = 80, 
                                     padding = "max_length")
      
      input_ids.append(encoded_input["input_ids"])
      if self.tokenizer_name.startswith("bert-") == True:
        input_tti.append(encoded_input["token_type_ids"])
      input_mask.append(encoded_input["attention_mask"])
      input_stance.append(self.preds["stance"][i])

    input_ids = torch.tensor(input_ids).squeeze()  
    input_mask = torch.tensor(input_mask).squeeze()
    input_stance = torch.tensor(input_stance).squeeze()
    
    if self.tokenizer_name.startswith("bert-") == True:
      input_tti = torch.tensor(input_tti).squeeze()
      self.tokenized_preds = TensorDataset(input_ids, input_tti, input_mask, input_stance)
    else:
      self.tokenized_preds= TensorDataset(input_ids, input_mask, input_stance)

    return
  
  def get_preds(self):

    arg_pred = []
    key_point_pred = []

    arg_id_pred = []
    key_point_id_pred = []
    stance = []
    topic = []

    # create a dataset used to evaluate the model (not for training)
    # this dataset contains all <argument, keypoint> pairs, including those labelled as undecided 
    # pairs labelled as undecided have a NaN label

    for arg,arg_id,topic_arg,stance_arg  in zip(self.arguments['argument'],self.arguments['arg_id'],self.arguments['topic'],self.arguments['stance']):
      for kp,kp_id,topic_kp,stance_kp in zip(self.keypoints['key_point'],self.keypoints['key_point_id'],self.keypoints['topic'],self.keypoints['stance']):
        if ( topic_arg == topic_kp and stance_arg == stance_kp):
          
          arg_pred.append(arg)
          arg_id_pred.append(arg_id)
          key_point_pred.append(kp)
          key_point_id_pred.append(kp_id)
          topic.append(topic_arg)
          stance.append(stance_arg)

    self.preds = pd.DataFrame({'arg_id':arg_id_pred,'key_point_id':key_point_id_pred,'argument':arg_pred, 'key_point':key_point_pred, 'topic' : topic , 'stance': stance})
    
    return
  
  # -------------------- execute all preprocessing --------------------
  
  def get_data (self, data_directory, mode):

    # Load dataset using the official ArgMining function
    self.arguments, self.keypoints, self.labels = load_kpm_data(data_directory, mode)
    print(mode+" data has been loaded")

    # Execute data preprocessing and cleaning
    self.preprocess_data()
    print(mode+" data has been preprocessed")

    # Create a dataset of <argument, keypoint> pairs by merging information from arguments, keypoints and labels
    self.get_merged_dataset()
    print(mode+" dataset has been created")

    # tokenize the already created dataset
    self.get_tokenized_dataset()
    print(mode+" dataset has been tokenized and transformed to tensors")

    # Create a dataset of all <argument, keypoint> pairs by merging information from arguments and keypoints
    self.get_preds()
    print(mode+" predictions file has been created")

    # tokenize the already created dataset
    self.get_tokenized_preds()
    print(mode+" tokenized predictions file has been created\n")

    # return a dataset dictionary with each processed information
    data_dict = {}
    data_dict["arguments_df"] = self.arguments                      # arguments set 
    data_dict["keypoints_df"] = self.keypoints                      # keypoints set 
    data_dict["labels_df"]    = self.labels                         # labels set 

    data_dict["merged_dataset_df"]    = self.merged_dataset         # dataset 1: <argument, keypoint, label> set with label (0,1)
    data_dict["tokenized_dataset_tensor"] = self.tokenized_dataset  # tokenized dataset 1
    data_dict["preds"] = self.preds                                 # dataset 2: <argument, keypoint, label> set with label (0,1, undecided)
    data_dict["tokenized_preds"] = self.tokenized_preds             # tokenized dataset 2

    return data_dict

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
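
The candidate pairs built by `get_preds` are all the <argument, keypoint> combinations that share both topic and stance. As a minimal sketch of that pairing rule, the example below groups key points by (topic, stance) once and then looks each argument up, which yields the same pairs as the nested loop without comparing every argument against every key point. The toy records and the names `kps_by_group` and `candidate_pairs` are assumptions made for illustration.

```python
from collections import defaultdict

# Toy records: ids, topics and stances are made up for this example.
arguments = [
    {"arg_id": "arg_0", "topic": "t1", "stance": 1},
    {"arg_id": "arg_1", "topic": "t1", "stance": 0},
]
key_points = [
    {"key_point_id": "kp_0", "topic": "t1", "stance": 1},
    {"key_point_id": "kp_1", "topic": "t1", "stance": 1},
    {"key_point_id": "kp_2", "topic": "t1", "stance": 0},
    {"key_point_id": "kp_3", "topic": "t2", "stance": 0},
]

# Index the key points by (topic, stance) in a single pass.
kps_by_group = defaultdict(list)
for kp in key_points:
    kps_by_group[(kp["topic"], kp["stance"])].append(kp["key_point_id"])

# Pair each argument only with the key points of its own topic and stance.
candidate_pairs = [
    (arg["arg_id"], kp_id)
    for arg in arguments
    for kp_id in kps_by_group.get((arg["topic"], arg["stance"]), [])
]
print(candidate_pairs)
```

Key point `kp_3` belongs to a topic with no arguments, so it never appears in a pair.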


## **Architecture class**

- Define model's architecture (by using a pretrained transformer)
- Define feed-forward procedure 

In [6]:
from transformers import AutoModel
import torch.nn as nn

# define model architecture to execute feed forward computations

class Architecture (nn.Module):

  def __init__(self, model_name, drop_out, out_unit_1):
    super(Architecture, self).__init__()
    
    self.model_name = model_name    # bert, roberta, albert or deberta in base version
    # transformer model 
    self.transformer_layer = AutoModel.from_pretrained(self.model_name)
   
    # add dense layers with dropout
    self.dense_layer_1 = nn.Linear(769, out_unit_1)
    self.drop_out = nn.Dropout(drop_out)
    self.dense_layer_2 = nn.Linear(out_unit_1, 1)

    # apply sigmoid act. function in output (last dense layer)
    self.act_function = nn.Sigmoid()

  def forward(self, ids, mask, stance, tti=None):
    
    # execute feed-forward 

    # handle the transformer output according to the chosen transformer 
    if self.model_name.startswith("bert-") == True:
      x = self.transformer_layer(input_ids=ids, token_type_ids=tti, attention_mask = mask).pooler_output
    else:
      hidden_state = self.transformer_layer(input_ids=ids, attention_mask = mask)[0]
      x = hidden_state[:,0]
    
    # concatenate transformer output with stance
    stance = torch.reshape(stance, (len(stance), 1))
    concat = torch.cat((x, stance), dim=1)
    
    x1 = self.dense_layer_1(concat)
    x1 = self.drop_out(x1)
    x2 = self.dense_layer_2(x1)
    x2 = self.drop_out(x2)
    
    out = self.act_function(x2)
    
    return out
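
The input size of 769 in the first dense layer comes from appending the encoded stance to the 768-dimensional sentence representation produced by the base-size transformers. The sketch below reproduces only this classification head on a random tensor that stands in for the transformer output; the batch size, the 64 hidden units and the omission of dropout are assumptions made to keep the example small, not the settings used in our experiments.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, hidden_size = 4, 768                     # 768: hidden size of the base models
cls_vector = torch.randn(batch_size, hidden_size)    # stand-in for the transformer output
stance = torch.tensor([0, 1, 1, 0])                  # encoded stance, one value per pair

# Append the stance as one extra feature: 768 + 1 = 769 inputs.
features = torch.cat((cls_vector, stance.reshape(-1, 1).float()), dim=1)

# Two dense layers followed by a sigmoid, as in the Architecture class (dropout omitted).
head = nn.Sequential(
    nn.Linear(hidden_size + 1, 64),  # 64 is an arbitrary choice for this sketch
    nn.Linear(64, 1),
    nn.Sigmoid(),
)

with torch.no_grad():
    scores = head(features)

print(features.shape, scores.shape)
```

Each row of `scores` is a value between 0 and 1 that can be read as the matching score of one <argument, keypoint> pair.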

## **Metrics**
- Monitor performance on the TR set by computing the classification report and confusion matrix
- Monitor performance on the VL and TS sets by computing the classification report, confusion matrix and official KPA-2021 metrics (mAP strict and relaxed) 

In [7]:
from sklearn.metrics import classification_report, confusion_matrix


def compute_metrics (mode, df, filename, lbl_df, arg_df, kp_df):

  # ----------------------------- Save predictions in json format -----------------------------
  
  # save all model predictions in a json file
  # for each argument, save the matching score of every candidate key point
  args = {}
  for arg_, kp_, score in zip(df['arg_id'], df['key_point_id'], df['predictions']):
    args.setdefault(arg_, {})[kp_] = score
  with open(filename, 'w') as fp:
    json.dump(args, fp)

  merged_df = get_predictions(filename, lbl_df, arg_df, kp_df)  # dataframe with predictions (arg, kp, score, label)
  
  # choose metrics to evaluate quality of model's prediction 


  # ----------------------------- Metric to analyze TR performance -----------------------------
  if mode=="train":
    
    merged_df.to_csv("prediction_results_TRAINING.csv")
    
    # compute Classification Report (Accuracy, Precision, Recall, F1) and Confusion Matrix
    merged_df['score'] = np.where(merged_df['score'] < 0.5, 0, 1) # put threshold to 0.5
    cr = classification_report(merged_df["label"].astype(int), merged_df["score"])
    cm = confusion_matrix(merged_df["label"].astype(int), merged_df["score"])
    
    return cr
  # ----------------------------- Metric to analyze VL and TS performance -----------------------------
  else: #mode=="test" or "eval"
    
    merged_df.to_csv("prediction_results_TEST.csv")
    
    # compute mAP Strict and mAP Relaxed
    mAP_strict, mAP_relaxed = evaluate_predictions(merged_df)
    
    # compute Accuracy, Precision, Recall, F1, Confusion Matrix
    merged_df = merged_df.dropna() # drop pairs with an undecided label
    merged_df['score'] = np.where(merged_df['score'] < 0.5, 0, 1) # put threshold to 0.5
    cr = classification_report(merged_df["label"].astype(int), merged_df["score"]) 
    cm = confusion_matrix(merged_df["label"].astype(int), merged_df["score"])
    
    return cr, cm, mAP_strict, mAP_relaxed
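
The predictions file exchanged between `compute_metrics` and the official evaluation functions is a JSON object that maps each argument id to the scores of its candidate key points. The sketch below builds such an object from a few made-up rows, serializes it, and selects the best-scoring key point per argument in the same way as `load_predictions`; the ids and scores are invented for illustration.

```python
import json

# (arg_id, key_point_id, matching score): values are made up for this example.
rows = [
    ("arg_0", "kp_0", 0.91),
    ("arg_0", "kp_1", 0.12),
    ("arg_1", "kp_0", 0.35),
    ("arg_1", "kp_1", 0.48),
]

# Nested mapping: {arg_id: {key_point_id: score}}
predictions = {}
for arg_id, kp_id, score in rows:
    predictions.setdefault(arg_id, {})[kp_id] = score

serialized = json.dumps(predictions)
loaded = json.loads(serialized)

# Keep only the highest-scoring key point of each argument.
best = {arg_id: max(kps.items(), key=lambda kv: kv[1]) for arg_id, kps in loaded.items()}
print(serialized)
print(best)
```

Because only the best match of each argument survives this step, the mAP metrics are computed over one <argument, keypoint> pair per argument.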


## **Trainer class**
- Define a trainer object to perform the model's fine-tuning
  - Use the training function to fine-tune the model on the TR set
  - Use the evaluation function to evaluate the model's behaviour on the VL set during the fine-tuning steps
  - Use the fit function to run all of the procedures above
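
Fine-tuning minimizes the binary cross-entropy between the predicted matching scores and the 0/1 labels (the trainer uses `nn.BCELoss`). As a small worked example of what this loss rewards, the sketch below computes it by hand for two pairs; the helper `binary_cross_entropy` and the scores are assumptions made for illustration and are not part of the training code.

```python
import math


def binary_cross_entropy(scores, labels):
    """Mean binary cross-entropy for scores in (0, 1) and 0/1 labels."""
    total = 0.0
    for p, y in zip(scores, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(scores)


labels = [1, 0]                                       # first pair matches, second does not
loss_good = binary_cross_entropy([0.9, 0.1], labels)  # confident and correct
loss_bad = binary_cross_entropy([0.1, 0.9], labels)   # confident and wrong

print(round(loss_good, 4), round(loss_bad, 4))
```

Confident wrong scores are penalized far more heavily than confident correct ones are rewarded, which pushes the sigmoid output towards the annotated label.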

In [8]:
import torch.optim as optim
from torch.optim.lr_scheduler import LinearLR
from  torch.utils.data import DataLoader
from tqdm import tqdm
from sklearn.utils import shuffle, class_weight



class Trainer():
  
  def __init__(self, model_name, model_checkpoint, train_data_dict, val_data_dict, param):
    
    # initialize model and device
    self.model_name = model_name

    # get hyperparameter values
    self.epochs        = param["epochs"]
    self.batch_size    = param["batch_size"]
    self.learning_rate = param["learning_rate"]
    self.total_iters   = 5
    self.dropout       = param["drop_out"]
    self.n_out_unit_1  = param["unit_1"]

    self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    if model_checkpoint is None:
      # build the model architecture
      self.model = Architecture(self.model_name, self.dropout, self.n_out_unit_1)
    else:
      # load the model from a checkpoint 
      self.model = torch.load(model_checkpoint)

    self.model.to(self.device) # move the model to the selected device (GPU if available) 

    # define TRAINING DATA 
    self.tr_arguments = train_data_dict["arguments_df"]
    self.tr_keypoints = train_data_dict["keypoints_df"]
    self.tr_labels    = train_data_dict["labels_df"]
    self.tr_merged    = train_data_dict["merged_dataset_df"]
    self.tr_dataset   = train_data_dict["tokenized_dataset_tensor"]

    # define VALIDATION DATA
    if val_data_dict is not None: 
      self.vl_arguments = val_data_dict["arguments_df"]
      self.vl_keypoints = val_data_dict["keypoints_df"]
      self.vl_labels    = val_data_dict["labels_df"]
      self.vl_merged    = val_data_dict["merged_dataset_df"]
      self.vl_dataset   = val_data_dict["tokenized_dataset_tensor"]
      self.vl_merged_pred    = val_data_dict["preds"]
      self.vl_dataset_pred   = val_data_dict["tokenized_preds"]

    self.tr_dataloader = None  # define null dataloader object for training
    self.vl_dataloader = None  # define null dataloader object for validation
    self.vl_dataloader_pred = None  # define null dataloader object for validation, including pairs with undecided label
    
    self.loss_function = nn.BCELoss()
    self.optimizer = optim.AdamW(self.model.parameters(), 
                                 lr=self.learning_rate, 
                                 weight_decay = 0.01)
    self.scheduler = LinearLR(self.optimizer, total_iters=self.total_iters)

    self.tr_loss, self.tr_cr = None, None
    self.vl_loss, self.vl_cr, self.vl_map_strict, self.vl_map_relaxed = None, None, None, None

    self.history = {"tr_loss":[], "tr_cr":[],
                    "vl_loss":[], "vl_cr":[], "vl_map_strict":[], "vl_map_relaxed":[]}
      
    
  # ---------------------------------------- FINETUNING on TR set  ------------------------------
  def training(self):
    self.model.train()     # set model state in training mode
    print("Training started!!")
    running_loss = 0  # accumulate computed loss for each batch
    running_acc  = 0  # accumulate computed accuracy for each batch
    epoch_preds = []

    # iterate over batches
    for batch, dl in enumerate(self.tr_dataloader): 

      if self.model_name.startswith("bert-")==True:
        # unpack the input features of the current batch and move them to the selected device
        ids, tti, mask, stance, label = dl  
        ids = ids.to(self.device)        # feature 1 = ids
        tti = tti.to(self.device)        # feature 2 = token type ids
        mask = mask.to(self.device)      # feature 3 = attention mask
        stance = stance.to(self.device)  # feature 4 = stance
        label = label.to(self.device)    # target = label
      else:
        # non-BERT tokenizers provide no token type ids
        ids, mask, stance, label = dl  
        ids = ids.to(self.device)        # feature 1 = ids
        mask = mask.to(self.device)      # feature 2 = attention mask
        stance = stance.to(self.device)  # feature 3 = stance
        label = label.to(self.device)    # target = label

      self.optimizer.zero_grad() #clear previously computed gradients
      
      # ---------- FORWARD ----------
      # compute model's output (this is matching score for each sample in the current batch)
      if self.model_name.startswith("bert-")==True:
        output = self.model (ids, mask, stance, tti)     
      else:
        output = self.model(ids, mask, stance) 
     
      # compute loss for the current batches
      label = torch.reshape(label, (label.shape[0], 1)).float() 
      loss = self.loss_function(output, label)
      
      # ---------- BACKWARD ----------
      loss.backward() # backpropagate loss
      # clip gradient to prevent exploding gradients
      torch.nn.utils.clip_grad_norm_(self.model.parameters(), 1.0) 
      self.optimizer.step() # update model parameter
      self.scheduler.step() # update learning rate value
      
      # ---------- METRICS ----------
      # accumulate loss of each batch
      running_loss+=loss.item()
      # accumulate predictions (matching score) of each batch
      pred = output.detach().cpu().numpy()
      pred = np.hstack(pred)
      epoch_preds.append(pred)

    # compute loss and classification report for the whole training epoch
    epoch_preds = np.concatenate(epoch_preds, axis=0)
    self.tr_merged["predictions"] = epoch_preds
    self.tr_cr = compute_metrics("train",
                                 self.tr_merged,
                                 "predictions_tr.p.",
                                 self.tr_labels,
                                 self.tr_arguments, 
                                 self.tr_keypoints)
    
    self.tr_loss = running_loss/len(self.tr_dataloader)
    
    return
  # ---------------------------------------- EVALUATION on VL set  ------------------------------
  def evaluation(self):

    epoch_preds = []
    epoch_preds_map = []
    
    self.model.eval()
    print("Evaluating...")
    running_loss = 0  # accumulate computed loss for each batch
   
    with torch.no_grad():
      
      # use the dataset containing only pairs with a (0,1) label to compute the VL loss
      for batch, dl in enumerate(self.vl_dataloader):
        
        if self.model_name.startswith("bert-")==True:
          ids, tti, mask, stance, label = dl
          ids = ids.to(self.device)
          tti = tti.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)
          label = label.to(self.device)

          output = self.model (ids, mask, stance, tti)
        else:
          ids, mask, stance, label = dl
          ids = ids.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)
          label = label.to(self.device)

          output = self.model (ids, mask, stance)
        
        label = torch.reshape(label, (label.shape[0], 1)).float() 
        loss = self.loss_function(output, label)
        running_loss+=loss.item()
        
        pred = output.detach().cpu().numpy()
        pred = np.hstack(pred)
        epoch_preds.append(pred)
      
      # second pass: use the dataset containing pairs with (0,1,undecided) labels to compute mAP strict and mAP relaxed
      # note: running the model twice is not efficient, but it avoids rewriting the code structure
      # while still providing both the loss and the other metrics
      for batch, dl in enumerate(self.vl_dataloader_pred):

        if self.model_name.startswith("bert-"):
          ids, tti, mask, stance = dl
          ids = ids.to(self.device)
          tti = tti.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)

          output = self.model (ids, mask, stance, tti)
        else:
          ids, mask, stance = dl
          ids = ids.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)

          output = self.model (ids, mask, stance)
        
        pred = output.detach().cpu().numpy()
        pred = np.hstack(pred)
        epoch_preds_map.append(pred)
    
    # classification report and loss on the VL pairs with (0,1) labels
    epoch_preds = np.concatenate(epoch_preds, axis=0)
    self.vl_merged["predictions"] = epoch_preds
    self.vl_cr = compute_metrics("train",
                                 self.vl_merged,
                                 "predictions_tr.p.",
                                 self.vl_labels,
                                 self.vl_arguments,
                                 self.vl_keypoints)
    self.vl_loss = running_loss/len(self.vl_dataloader)
    # mAP strict and mAP relaxed on the VL pairs with (0,1,undecided) labels
    epoch_preds_map = np.concatenate(epoch_preds_map, axis=0)
    self.vl_merged_pred["predictions"] = epoch_preds_map
    _, _, self.vl_map_strict, self.vl_map_relaxed = compute_metrics("test",
                                                                    self.vl_merged_pred,
                                                                    "predictions_vl.p.",
                                                                    self.vl_labels,
                                                                    self.vl_arguments,
                                                                    self.vl_keypoints)
    return
  
  # ------------------------------------- RUNS fine tuning cycle ----------------------------------------------
  def fit(self, retrain=False):
    
    # for each training epoch 
    for epoch in range(self.epochs):

      print("Epoch " + str(epoch+1) + "/" + str(self.epochs) + " started...")
      
      # shuffle TR set and create batches 
      self.tr_merged, self.tr_dataset = shuffle (self.tr_merged, self.tr_dataset)
      self.tr_dataloader = DataLoader(self.tr_dataset, batch_size = self.batch_size)

      if not retrain:
        # this is not the final retraining of the model: a VL set is available

        # shuffle VL set and create batches 
        self.vl_merged, self.vl_dataset = shuffle (self.vl_merged, self.vl_dataset)
        self.vl_dataloader = DataLoader(self.vl_dataset, batch_size = self.batch_size)

        # shuffle VL set with also undecided labels and create batches 
        self.vl_merged_pred, self.vl_dataset_pred = shuffle (self.vl_merged_pred, self.vl_dataset_pred)
        self.vl_dataloader_pred = DataLoader(self.vl_dataset_pred, batch_size = self.batch_size)
        
        self.training()     # train model on TR set
        self.evaluation()   # monitor model behaviour by evaluating it on VL set
        # note: the model is monitored both on VL pairs with (0,1) labels and on VL pairs with (0,1,undecided) labels

        # store training and evaluation history 
        self.history["tr_loss"].append(self.tr_loss)
        self.history["tr_cr"].append(self.tr_cr)
        self.history["vl_loss"].append(self.vl_loss)
        self.history["vl_cr"].append(self.vl_cr)
        self.history["vl_map_strict"].append(self.vl_map_strict)
        self.history["vl_map_relaxed"].append(self.vl_map_relaxed)

      else:
        # this is the final retraining of the model: there is no separate VL set,
        # so the model is simply retrained on the TR+VL set
        self.training()
        
        # store training history 
        self.history["tr_loss"].append(self.tr_loss)
        self.history["tr_cr"].append(self.tr_cr)
      print("Epoch " + str(epoch+1) + "/" + str(self.epochs) + " complete!!!")
    
    return self.model, self.history


## **Predictor class**

- Use the fully fine-tuned and validated model to get predictions on the test set
- Compute evaluation metrics on the test set
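
As a rough illustration of the ranking metrics reported by the predictor, the sketch below computes the average precision (AP) of a ranked list of matching scores in plain Python. It is not the `compute_metrics` function used in this notebook: the helper names `average_precision` and `binarize` are hypothetical, undecided labels are represented here as `None`, and any additional filtering applied by the official evaluation is omitted. It assumes, following the shared-task description, that undecided pairs count as non-matches for the strict score and as matches for the relaxed one; mAP would then be the mean of such AP values over groups of pairs.

```python
def average_precision(scores, labels):
    """AP of a ranked list: mean of precision@k over the ranks k that hold a positive."""
    # rank pairs by decreasing matching score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def binarize(labels, undecided_as):
    """Map undecided labels (None, an assumption of this sketch) to 0 or 1."""
    return [undecided_as if label is None else label for label in labels]


# toy data (made up for illustration): four argument/key point pairs
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, None, 0, 1]

ap_strict = average_precision(scores, binarize(labels, undecided_as=0))
ap_relaxed = average_precision(scores, binarize(labels, undecided_as=1))
print(round(ap_strict, 4), round(ap_relaxed, 4))  # 0.75 0.9167
```

Since the relaxed variant only turns undecided pairs into positives, its score on the same ranking is never lower than the strict one in this toy setting.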


In [9]:

class Predictor():
  
  def __init__(self, model_name, model_checkpoint, test_data_dict, batch_size):

    # load the BEST MODEL from the given checkpoint
    # (the loaded model has been retrained on both the TR and VL sets)
    self.model_name = model_name
    self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    # map_location allows a checkpoint saved on GPU to be loaded on a CPU-only machine
    self.model  = torch.load(model_checkpoint, map_location=self.device)
    self.model.to(self.device)
      
    # define TEST DATA (optional)
    self.ts_arguments = test_data_dict["arguments_df"]
    self.ts_keypoints = test_data_dict["keypoints_df"]
    self.ts_labels    = test_data_dict["labels_df"]
    self.ts_merged    = test_data_dict["preds"]
    self.ts_dataset   = test_data_dict["tokenized_preds"]

    self.ts_dataloader = None
    self.batch_size = batch_size

    self.ts_cr, self.ts_cm, self.ts_map_strict, self.ts_map_relaxed = None, None, None, None
    self.history = {"ts_cr":[],"ts_cm":[], "ts_map_strict":[], "ts_map_relaxed":[]}
  
  # -------------------- Get prediction on test set and evaluate model on test set --------------------------------------------------

  def predict(self):
    
    # divide data into batches
    self.ts_dataloader = DataLoader(self.ts_dataset, batch_size = self.batch_size)

    preds = []
    
    self.model.eval() # set evaluation mode
   
    with torch.no_grad(): # no gradient computation is required
      
      # for each batch
      for batch, dl in enumerate(self.ts_dataloader):

        # compute model output (i.e., matching score)
        if self.model_name.startswith("bert-"):
          ids, tti, mask, stance = dl
          ids = ids.to(self.device)
          tti = tti.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)
          output = self.model (ids, mask, stance, tti)

        else:
          ids, mask, stance = dl
          ids = ids.to(self.device)
          mask = mask.to(self.device)
          stance = stance.to(self.device)
          output = self.model (ids, mask, stance)

        pred = output.detach().cpu().numpy()
        pred = np.hstack(pred)
        preds.append(pred)

    # save all predictions
    preds = np.concatenate(preds, axis=0)
    self.ts_merged["predictions"] = preds
    # compute metrics to evaluate predictions
    self.ts_cr, self.ts_cm, self.ts_map_strict, self.ts_map_relaxed = compute_metrics("test",
                                                                                      self.ts_merged,
                                                                                      "predictions_ts.p.", 
                                                                                      self.ts_labels, 
                                                                                      self.ts_arguments, 
                                                                                      self.ts_keypoints)
    # return all required info
    self.history["ts_cr"].append(self.ts_cr)
    self.history["ts_cm"].append(self.ts_cm)
    self.history["ts_map_strict"].append(self.ts_map_strict)
    self.history["ts_map_relaxed"].append(self.ts_map_relaxed)

    return self.history

## **Validator class**

- Get all the hyperparameter configurations you want to explore
- Run a model selection procedure with the hold-out validation technique
  - Choose the best model based on the validation loss
  - Retrain the best model on the full dataset (TR + VL)
  - Store TR and VL performance
- Run a model assessment procedure with the hold-out validation technique
  - Get predictions on TS set
  - Store TS performance
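
The selection steps listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the `Validator` implementation: `expand_grid`, `select_best` and `toy_validation_loss` are hypothetical names, and the toy loss merely stands in for "train on TR and return the last VL loss".

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def select_best(configs, validation_loss):
    """Hold-out selection: keep the configuration with the lowest validation loss."""
    losses = [validation_loss(cfg) for cfg in configs]
    best_index = min(range(len(configs)), key=losses.__getitem__)
    return configs[best_index], losses[best_index]


def toy_validation_loss(cfg):
    """Hypothetical stand-in for training a model and measuring its VL loss."""
    return abs(cfg["lr"] - 1e-5) * 1e4 + cfg["drop_out"]


# made-up grid for illustration; keys mirror the ones used in this notebook
grid = {"epoch": [2, 3],
        "lr": [1e-5, 3e-5],
        "batch_size": [32],
        "drop_out": [0.1, 0.2],
        "unit_1": [10]}

configs = expand_grid(grid)
best_config, best_loss = select_best(configs, toy_validation_loss)
print(len(configs))   # 8
print(best_config)
```

One design difference worth noting: `itertools.product` keeps each value's original type, whereas stacking a mixed grid into a single NumPy array converts every value to a common numeric type, which is why integer hyperparameters then need an explicit `int(...)` cast.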

In [10]:
from google.colab import files
import json   # used below to serialize the training history
import numpy as np



class Validator():
  
  def __init__(self, tr_data, vl_data, ts_data, hyperparam, model_name):
    
    self.hyperparams = hyperparam  # dictionary mapping each hyperparameter to the list of values to explore
    self.config = {}               # dictionary storing one configuration of hyperparameter values
    
    # get TR, VL and TS sets
    self.tr_dict = tr_data
    self.vl_dict = vl_data
    self.ts_dict = ts_data

    # define employed transformer
    self.model_name = model_name
    self.best_param = None
    self.checkpoint = None

    self.full_dict = {}


  # ---------------------- Get all possible configurations of hyperparameter values --------------------------------------------

  def explodeCombination(self):

    # transform hyperparam dictionary to a list of all possible combinations of hyperparameter's values
    mesh = np.array(np.meshgrid(*self.hyperparams.values()))
    self.hyperparams = mesh.T.reshape(-1, len(self.hyperparams))
    
    return 
  
  # ---------------------- Build the full dataset (TR + VL) for the final retraining --------------------------------------------

  def get_full_dict(self):

    # create a full dataset by concatenating TR and VL sets: use this full dataset for the final model retraining

    self.full_dict["arguments_df"] = pd.concat([self.tr_dict["arguments_df"],self.vl_dict["arguments_df"]])
    self.full_dict["keypoints_df"] = pd.concat([self.tr_dict["keypoints_df"],self.vl_dict["keypoints_df"]])
    self.full_dict["labels_df"] = pd.concat([self.tr_dict["labels_df"],self.vl_dict["labels_df"]])
    self.full_dict["merged_dataset_df"] = pd.concat([self.tr_dict["merged_dataset_df"], self.vl_dict["merged_dataset_df"]])
    self.full_dict["tokenized_dataset_tensor"] = torch.utils.data.ConcatDataset([self.tr_dict["tokenized_dataset_tensor"], self.vl_dict["tokenized_dataset_tensor"]])

    return 

  # ---------------------- MODEL SELECTION: Hold-Out Validation technique ------------------------------------------------------

  def modelSelection(self):

    self.explodeCombination()
    min_loss = float("inf")
    # define a list to store loss value of each model that will be trained
    all_model_loss = []

    i = 0
    # try out each of the possible hyperparameter configurations
    for configuration in self.hyperparams:
      
      # save current hyperparameter configuration and use it to train a model 
      self.config["epochs"]        = int(configuration[0])
      self.config["learning_rate"] = configuration[1]
      self.config["batch_size"]    = int(configuration[2])
      self.config["drop_out"] = configuration[3]
      self.config["unit_1"] = int(configuration[4])

      #  create a trainer object and start training the model on TR set
      model_trainer = Trainer(model_name = self.model_name, 
                              model_checkpoint = None,
                              train_data_dict = self.tr_dict, 
                              val_data_dict = self.vl_dict, 
                              param = self.config)
      model, history = model_trainer.fit()
      #if history["vl_loss"][-1] <= min_loss:
      #torch.save(model, str(self.model_name)+str(self.config)+"best_model.pt")
      all_model_loss.append(history["vl_loss"][-1])

      # append configuration and history to the results file (the "with" block closes the file)
      with open("model_selection_result.txt", "a") as file_result:
        file_result.write(str(self.config))
        file_result.write(json.dumps(history))
        file_result.write("\n")
      
      #if i%3 ==0:
        #files.download("/content/"+str(self.model_name)+"model_selection_result.txt")
      #i=i+1
    
    best_model_loss = min(all_model_loss)
    best_loss_idx = [idx for idx, val in enumerate(all_model_loss) if val==best_model_loss]
    self.best_param = self.hyperparams[best_loss_idx]

    # store the best hyperparameter configuration: it is used for the final retraining

    self.config["epochs"] = int(self.best_param[0][0])
    self.config["learning_rate"] = self.best_param[0][1]
    self.config["batch_size"] = int(self.best_param[0][2])
    self.config["drop_out"] = self.best_param[0][3]
    self.config["unit_1"] = int(self.best_param[0][4])
    
    #self.checkpoint = str(self.model_name) + str(self.config) + "best_model.pt"

    return 
  
  # ---------------------- MODEL ASSESSMENT: retrain on TR+VL and evaluate on TS set ------------------------------------------

  def modelAssestment(self):
    
    # build the full dataset (TR + VL) and use it
    self.get_full_dict()

    # retrain model on full dataset 
    model_trainer = Trainer( model_name = self.model_name, 
                             model_checkpoint = None,
                             train_data_dict = self.full_dict, 
                             val_data_dict = None,
                             param = self.config)
    model, history_retrain = model_trainer.fit(retrain = True)
    
    # save performance obtained on the full dataset during retraining
    with open("DEBERTA_retrain_results.txt", "a") as file_result:
      file_result.write(str(self.config))
      file_result.write(json.dumps(history_retrain))
    
    # save the retrained final model in a checkpoint and keep its full path,
    # so that the Predictor loads exactly the file written here
    self.checkpoint = "/content/" + "deberta" + str(self.config) + "_final_model.pt"
    torch.save(model, self.checkpoint)

    # use the Predictor class to get predictions on the test set
    model_predictor = Predictor(self.model_name, self.checkpoint, self.ts_dict, batch_size = int(self.best_param[0][2]))
    history_test = model_predictor.predict()
    print(history_test)
    
    # save info about model performance on the test set
    with open("DEBERTA_test_results.txt", "a") as file_result:
      file_result.write(str(self.config))
      file_result.write(str(history_test))
    
    
    #files.download("/content/DEBERTA_retrain_results.txt")
    #files.download("/content/DEBERTA_test_results.txt")
    
    return 




## **Main**


In [11]:
# -------------------- Choose the Hugging Face tokenizer and model you want to use for the task -----------------------------

# possible options: "bert-base-uncased", "roberta-base", "albert-base-v2"
used_tokenizer = "microsoft/deberta-base"
used_model     = "microsoft/deberta-base"

# -------------------- Prepare all dataset needed to solve the task: use DatasetParser class ---------------------------------

dataset_directory = "/content/KPA_2021_shared_task/kpm_data"  # directory for dataset used for training, finetuning and development
testset_directory = "/content/KPA_2021_shared_task/test_data" # directory for dataset used for testing

dataset_parser = DatasetParser(tokenizer_name = used_tokenizer)
tr_data_dict = dataset_parser.get_data(data_directory = dataset_directory, mode = "train")
vl_data_dict = dataset_parser.get_data(data_directory = dataset_directory, mode = "dev")
ts_data_dict = dataset_parser.get_data(data_directory = testset_directory, mode="test")


train data has been loaded
train data has been preprocessed
train dataset has been created
train dataset has been tokenized and transformed to tensors
train predictions file has been created
train tokenized predictions file has been created

dev data has been loaded
dev data has been preprocessed
dev dataset has been created
dev dataset has been tokenized and transformed to tensors
dev predictions file has been created
dev tokenized predictions file has been created

test data has been loaded
test data has been preprocessed
test dataset has been created
test dataset has been tokenized and transformed to tensors
test predictions file has been created
test tokenized predictions file has been created



In [13]:
# define the hyperparameter values you want to use
# (list several values per hyperparameter if you are exploring the hyperparameter space)
hyperparameter_value = {"epoch":[2], 
                        "lr":[1e-05],
                        "batch_size":[32], 
                        "drop_out":[0.2], 
                        "unit_1":[10]}

# -------------------- Run a validation process --------------------------------------------------

validator = Validator(tr_data = tr_data_dict,
                      vl_data = vl_data_dict,
                      ts_data = ts_data_dict, 
                      hyperparam = hyperparameter_value, 
                      model_name = used_model)

validator.modelSelection()    # runs model selection
validator.modelAssestment()   # runs model assessment


Some weights of the model checkpoint at microsoft/deberta-base were not used when initializing DebertaModel: ['lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.LayerNorm.weight', 'lm_predictions.lm_head.bias', 'lm_predictions.lm_head.LayerNorm.bias', 'lm_predictions.lm_head.dense.bias']
- This IS expected if you are initializing DebertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Epoch 1/2 started...
Training started!!
Evaluating...
Epoch 1/2 complete!!!
Epoch 2/2 started...
Training started!!
Evaluating...
Epoch 2/2 complete!!!




Epoch 1/2 started...
Training started!!
Epoch 1/2 complete!!!
Epoch 2/2 started...
Training started!!
Epoch 2/2 complete!!!
{'ts_cr': ['              precision    recall  f1-score   support\n\n           0       0.83      0.62      0.71       170\n           1       0.86      0.95      0.90       422\n\n    accuracy                           0.85       592\n   macro avg       0.84      0.78      0.80       592\nweighted avg       0.85      0.85      0.85       592\n'], 'ts_cm': [array([[105,  65],
       [ 22, 400]])], 'ts_map_strict': [0.7347296941803435], 'ts_map_relaxed': [0.8929432314475476]}
