In this notebook, we fine-tune the transformer model 'RoBERTa' on the Ethos dataset and track the model's performance over 10 to 15 training epochs. We also examine the performance of different interpretability techniques on RoBERTa. LIME is not included in the code.

In [None]:
#We first need to connect to our Drive in order to access the project's files and store results
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import sys
sys.path.append('/content/drive/MyDrive/Thesis')

In [None]:
#Now, it is time to install the appropriate version of the transformers library
!pip install transformers-interpret==0.5.2
!pip install transformers==4.15.0
!pip install lime==0.2.0.1 #this line is included in order for 'myExplainers.py' to load properly

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers-interpret==0.5.2
  Downloading transformers-interpret-0.5.2.tar.gz (29 kB)
  Preparing metadata (setup.py) ... done
Collecting transformers>=3.0.0 (from transformers-interpret==0.5.2)
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting captum>=0.3.1 (from transformers-interpret==0.5.2)
  Downloading captum-0.6.0-py3-none-any.whl (1.3 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers>=3.0.0->transformers-interpret==0.5.2)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)

In [None]:
#Imports of libraries required for finetuning and explaining RoBERTa
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, precision_score, recall_score, average_precision_score
from sklearn.model_selection import train_test_split
from helper import print_results, print_results_ap
from sklearn.preprocessing import maxabs_scale
from myModel import MyModel, MyDataset
from myEvaluation import MyEvaluation
from myExplainers import MyExplainer
from dataset import Dataset
import tensorflow as tf
from tqdm import tqdm
import pandas as pd
import numpy as np
import warnings
import datetime
import pickle
import torch
import time
import csv
import re

In [None]:
#defining the paths of the model and data
data_path = '/content/drive/MyDrive/Thesis/'
model_path = '/content/drive/MyDrive/Thesis/'
save_path = '/content/drive/MyDrive/Thesis/Results/'

Now, it is time to name the model and to define the parameters of 'MyModel'
class that loads transformer models.

In [None]:
model_name = 'roberta'
existing_rationales = False #no ground-truth rationales are available
task = 'multi_label' #multi-label Ethos
labels = 8 #violence, directed_vs_generalized, gender, race, national_origin, disability, religion, sexual_orientation

Now, let us load the Ethos dataset through the 'load_ethos' function of the 'Dataset' class in 'dataset.py'. It returns x (the instances), y (the labels) and label_names (the names of the labels, including 'hate speech').

In [None]:
hs = Dataset(path = data_path) #Dataset class is in 'dataset.py': parameters (path, x=None, y=None, rationales=None ,label_names=None)
x, y, label_names = hs.load_ethos() #function in Dataset class to load ethos dataset
label_names = label_names[1:] #Ethos multiclass labels(without 'hate speech')

In [None]:
indices = np.arange(len(y)) #len(y) -> 433

#initially, the training set holds 80% of the data
train_texts, test_texts, train_labels, test_labels, _, test_indexes = train_test_split(x, y, indices, test_size=.2, random_state=26) #reproducible results
#test size -> 20% of all data

#in our case there are no rationales in RoBERTa
if existing_rationales:
    test_rationales = [rationales[x] for x in test_indexes]

#We also need a validation dataset of roughly 10% of all data:
size = (0.1 * len(y)) / len(train_labels) #len(train_labels) -> 346
#43.3/346 -> ~0.125 of the remaining training instances
train_texts, validation_texts, train_labels, validation_labels = train_test_split(train_texts, train_labels, test_size=size, random_state=42)
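
The split sizes can be checked by hand. The sketch below is a minimal, standard-library illustration; `split_sizes` is a hypothetical helper, not part of the project, and it assumes that scikit-learn rounds the test part up for a float `test_size` and gives the remainder to the training part.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test part up (assumed sklearn behaviour)."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test

n_total = 433  # len(y) in the cell above

# first split: 80% train+validation, 20% test
n_train_val, n_test = split_sizes(n_total, 0.2)

# second split: carve roughly 10% of all data out of the remaining instances
val_fraction = (0.1 * n_total) / n_train_val
n_train, n_val = split_sizes(n_train_val, val_fraction)

print(n_train, n_val, n_test)
```

Under these assumptions the test set has 87 instances, which agrees with the 87 iterations shown by the progress bars later in the notebook.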

The dataset is not yet in a form that the transformer can process. We first need to define the model's tokenizer, which is then passed to the 'MyDataset' class in 'myModel.py'.

In [None]:
from transformers import RobertaTokenizerFast

#unlike BERT and DistilBERT, RoBERTa has no 'cased'/'uncased' variants
tokenizer = RobertaTokenizerFast.from_pretrained('roberta-base')

Now, it is time to transform the train, test and validation sets to the
appropriate form. We will use 'MyDataset' class from 'myModel.py'.

In [None]:
train_dataset = MyDataset(train_texts, train_labels, tokenizer)
validation_dataset = MyDataset(validation_texts, validation_labels, tokenizer)
#test_dataset = MyDataset(test_texts, test_labels, tokenizer)

But before using 'MyModel' class from 'myModel.py', RoBERTa should be finetuned!

In [None]:
from transformers import Trainer, TrainingArguments
from myTransformer import RobertaForMultilabelSequenceClassification as transformer_model


#calling the base pretrained RoBERTa model
model = transformer_model.from_pretrained('roberta-base',num_labels = len(label_names), output_attentions=True,
                              output_hidden_states=True)

#the training arguments that we will pass to the trainer of the transformers. 15 epochs were used for training
training_arguments = TrainingArguments(evaluation_strategy='epoch', save_strategy='epoch', logging_strategy='epoch',
                                                log_level='critical', output_dir='./results', num_train_epochs=15,
                                                per_device_train_batch_size=8, per_device_eval_batch_size=8,
                                                warmup_steps=200, weight_decay=0.01, logging_dir='./logs')

#passing to the trainer the model, the arguments and all train and validation instances
trainer = Trainer(model=model, args=training_arguments, train_dataset=train_dataset, eval_dataset=validation_dataset)

#Let's train the model!
trainer.train()

Downloading:   0%|          | 0.00/478M [00:00<?, ?B/s]

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForMultilabelSequenceClassification: ['roberta.pooler.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForMultilabelSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized

Epoch,Training Loss,Validation Loss
1,0.6867,0.650396
2,0.5325,0.488342
3,0.4365,0.348541
4,0.3105,0.289782
5,0.2187,0.258373
6,0.158,0.238242
7,0.1137,0.251526
8,0.0859,0.254701
9,0.0691,0.236028
10,0.0533,0.26155


TrainOutput(global_step=570, training_loss=0.18982381506970053, metrics={'train_runtime': 463.2673, 'train_samples_per_second': 9.778, 'train_steps_per_second': 1.23, 'total_flos': 1191957289943040.0, 'train_loss': 0.18982381506970053, 'epoch': 15.0})
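
As a sanity check on the `global_step=570` reported above, the number of optimizer steps can be derived from the training arguments. This is a rough sketch that assumes 302 training instances (346 minus the validation instances), a single device, no gradient accumulation, and that the last partial batch of each epoch is kept.

```python
import math

n_train = 302        # assumed: 346 instances minus 44 held out for validation
batch_size = 8       # per_device_train_batch_size
epochs = 15          # num_train_epochs
warmup_steps = 200   # warmup_steps

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
warmup_share = warmup_steps / total_steps

print(steps_per_epoch, total_steps, round(warmup_share, 2))
```

With these numbers the warmup covers roughly a third of all training steps.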

Next, the trained model is saved in a 'roberta_hs' folder and then loaded through 'MyModel' with the suitable parameters to make predictions. Note that RoBERTa has no 'cased' or 'uncased' variant, but this parameter is passed to 'MyModel' anyway, because other transformers use it.

First, we save the model in the 'roberta_hs' folder.

In [None]:
trainer.model.save_pretrained('/content/drive/MyDrive/Thesis/roberta_hs')

Now, we can load the model with 'MyModel' and then make predictions.

In [None]:
#new model
model = MyModel(model_path,'roberta_hs', model_name, task, labels, 'cased')

#the maximum number of tokens a single sentence can have, excluding special tokens (e.g. 510 for a 512-token model)
max_sequence_len = model.tokenizer.max_len_single_sentence

#again the tokenizer is RobertaTokenizerFast, that is selected through 'MyModel' and '__load_model__' function
tokenizer = model.tokenizer

#run the model on the GPU when one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.trainer.model.to(device)

RobertaForMultilabelSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
   

It is time to make predictions for the test instances.

In [None]:
predictions = []

#time for predictions
starting_prediction_time = time.time()

#make the predictions with the model that was trained
for test_instance in test_texts:
    outputs = model.my_predict(test_instance)
    predictions.append(outputs[0])

a = tf.constant(predictions, dtype = tf.float32)
b = tf.keras.activations.sigmoid(a)
predictions = b.numpy()

#printing the total time that predictions took
ending_prediction_time = time.time()
total_time = ending_prediction_time - starting_prediction_time
print('The total time for predictions is:' ,round(total_time,3),' seconds')

The total time for predictions is: 6.263  seconds
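
In the multi-label setting, each of the 8 logits is passed through a sigmoid independently, so several labels can be active for the same instance. The following is a minimal standard-library sketch of that step combined with the 0.5 threshold used in this notebook; the logits are made up and `logits_to_labels` is a hypothetical helper.

```python
import math

def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logits_to_labels(logits, threshold=0.5):
    """Activate every label whose probability reaches the threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]

example_logits = [2.0, -1.5, 0.0, -3.0]  # made-up values for four labels
example_labels = logits_to_labels(example_logits)
print(example_labels)
```

A logit of exactly 0 maps to a probability of 0.5 and therefore still counts as a positive label under the `>=` comparison.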


Let's print the average precision and F1 score of RoBERTa!

In [None]:
#labels of the predictions produced
pred_labels = []

for prediction in predictions:
    pred_labels.append([1 if i >= 0.5 else 0 for i in prediction]) #1 if the score for the label in the certain prediction is greater than or equal to 0.5

def average_precision_wrapper(y, y_pred, view):
    #helper for sparse predictions (not used below): converts them to a dense array first
    return average_precision_score(y, y_pred.toarray(), average=view)

#macro scores, computed on the thresholded labels; the values are fractions in [0, 1] despite the ' %' suffix
p_s = f"Average precision score: {round(average_precision_score(test_labels, pred_labels, average='macro'),4)} %"
f1 = f"f1 score score: {round(f1_score(test_labels, pred_labels, average='macro'),4)} %"

#printing results
print(p_s)
print(f1)

Average precision score: 0.6756 %
f1 score score: 0.7909 %
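
To make the 'macro' averaging concrete: the F1 score is computed separately for each label and the per-label scores are then averaged with equal weight. The sketch below is a plain-Python illustration on made-up data, not the scikit-learn implementation; `macro_f1` is a hypothetical helper that returns 0 for a label with no true or predicted positives.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores for binary indicator rows."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denominator = 2 * tp + fp + fn
        per_label.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_label) / n_labels

# three made-up instances with two labels each
toy_true = [[1, 0], [1, 1], [0, 1]]
toy_pred = [[1, 0], [0, 1], [0, 1]]
toy_score = macro_f1(toy_true, toy_pred)
print(round(toy_score, 4))
```

Here the first label scores 2/3 and the second scores 1, so the macro average is 5/6.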


We could also tune the training hyperparameters, but RoBERTa's performance is already satisfactory, so the focus shifts to the interpretations. Let's store the results in the 'Results' folder.

In [None]:
#the data to write in the file
data = (p_s, f1)
now = datetime.datetime.now()
file_name = save_path + 'ROBERTA_'+str(now.day) + '_' + str(now.month) + '_' + str(now.year)

#results in files
with open(file_name+ 'PERFORMANCE.pickle', 'wb') as handle:
    pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL) #data
    #pickle.dump(f1, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(file_name+'TIME.pickle', 'wb') as handle:
    pickle.dump(total_time, handle, protocol=pickle.HIGHEST_PROTOCOL)

Let's ensure that the results load properly from the files in which we stored them.

In [None]:
with open(file_name+'PERFORMANCE.pickle', 'rb') as handle:
    performance = pickle.load(handle)
    for score in performance:
        print(score)

#use a name other than 'time' so that the 'time' module is not shadowed
with open(file_name+'TIME.pickle', 'rb') as handle:
    saved_time = pickle.load(handle)
    print('The total time for predictions is:' ,round(saved_time,3),' seconds')

Average precision score: 0.6756 %
f1 score score: 0.7909 %
The total time for predictions is: 6.263  seconds


Now, let us initialize the explainers and the evaluation module, and define the metrics that will be used. The metric keys in the code correspond to:
* F=Faithfulness
* FTP=RFT (Ranked Faithful Truthfulness)
* NZW=Complexity

In [None]:
#layers are 12 this time
my_explainers = MyExplainer(label_names, model, layers=12)

#complexity, faithfulness, RFT
my_evaluators = MyEvaluation(label_names, model.my_predict, False, True, tokenizer=tokenizer) #parameters: (label_names, predict, sentence_level, evaluation_level_all=True)
my_evaluatorsP = MyEvaluation(label_names, model.my_predict, False, False, tokenizer=tokenizer)

evaluation =  {'F':my_evaluators.faithfulness, 'FTP': my_evaluators.faithful_truthfulness_penalty,
          'NZW': my_evaluators.nzw}
evaluationP = {'F':my_evaluatorsP.faithfulness, 'FTP': my_evaluatorsP.faithful_truthfulness_penalty,
          'NZW': my_evaluatorsP.nzw}

We will now measure the performance of Integrated Gradients (IG).

In [None]:
import time
with warnings.catch_warnings():

    #ignore the warnings
    warnings.simplefilter("ignore", category=RuntimeWarning)

    #date
    now = datetime.datetime.now()

    #saving results
    file_name = save_path + 'ETHOS_ROBERTA_IG_'+str(now.day) + '_' + str(now.month) + '_' + str(now.year)

    #metrics
    metrics = {'F':[], 'FTP':[], 'NZW':[]}
    metricsP = {'F':[], 'FTP':[], 'NZW':[]}

    #time_r = [[],[]]: sublists for each technique
    time_r = [ [] ] #now only ig is present

    #neighbours
    #my_explainers.neighbours = 2000

    #ig
    techniques = [my_explainers.ig]

    #for each test instance
    for ind in tqdm(range(0,len(test_texts))): #progress bar

        #to not run out of memory
        torch.cuda.empty_cache()

        #the instance of test set
        instance = test_texts[ind]

        #resetting the state memory
        my_evaluators.clear_states()
        my_evaluatorsP.clear_states()

        #prediction, attention matrix and hidden states. Here we care about predictions
        prediction, _, _ = model.my_predict(instance)

        #RobertaTokenizerFast
        enc = model.tokenizer([instance,instance], truncation=True, padding=True)[0] #Encoding object of the first sequence in the batch

        #real tokens or padding: extracting the mask
        mask = enc.attention_mask

        #token strings, including the special tokens
        tokens = enc.tokens

        interpretations = []
        kk = 0

        #only IG for now; the loop is kept general because other techniques will be included later
        for technique in techniques:
            ts = time.time()

            #returns interpretations
            temp = technique(instance, prediction, tokens, enc.ids, _, _) #no attention and hidden states

            #normalization in interpretations
            interpretations.append([np.array(i)/np.max(abs(np.array(i))) for i in temp])

            #append the time it took
            time_r[kk].append(time.time()-ts)
            kk = kk + 1

        #'F','FTP','NZW'
        for metric in metrics.keys():
            evaluated = []
            for interpretation in interpretations:

                #all parameters: interpretation, tweaked_interpretation, instance, prediction, tokens, hidden_states, t_hidden_states, rationales
                evaluated.append(evaluation[metric](interpretation, _, instance, prediction, tokens, _, _, _))

            #save evaluations in dict
            metrics[metric].append(evaluated)

        #copy of saved state
        my_evaluatorsP.saved_state = my_evaluators.saved_state.copy()

        #clear again all states
        my_evaluators.clear_states()

        for metric in metrics.keys():
            evaluatedP = []
            for interpretation in interpretations:

                #in a similar way as 'evaluation'
                evaluatedP.append(evaluationP[metric](interpretation, _, instance, prediction, tokens, _, _, _))

            #save evaluations
            metricsP[metric].append(evaluatedP)

        #write results to files
        with open(file_name+'(A).pickle', 'wb') as handle:
            pickle.dump(metrics, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+'(P).pickle', 'wb') as handle:
            pickle.dump(metricsP, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+'_TIME.pickle', 'wb') as handle:
            pickle.dump(time_r, handle, protocol=pickle.HIGHEST_PROTOCOL)

time_r = np.array(time_r)
time_r.mean(axis=1)

100%|██████████| 87/87 [03:42<00:00,  2.55s/it]


array([1.74626522])
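
The normalization applied to every interpretation in the loop above divides the weights by their largest absolute value, so that they fall in [-1, 1] while keeping their sign. A minimal plain-Python sketch follows; `max_abs_scale` is a hypothetical helper, and the guard for an all-zero vector is an addition that the notebook code does not have.

```python
def max_abs_scale(weights):
    """Scale weights into [-1, 1] by dividing by the largest absolute value."""
    largest = max(abs(w) for w in weights)
    if largest == 0:
        # assumption: leave an all-zero interpretation unchanged instead of dividing by zero
        return [0.0 for _ in weights]
    return [w / largest for w in weights]

scaled = max_abs_scale([0.2, -0.8, 0.4])  # made-up token weights
print(scaled)
```

Without such a guard, an all-zero interpretation produces NaN values and a RuntimeWarning, which is the warning category the loop silences.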

In [None]:
print(time_r)
print(time_r.mean(axis=1))

[[1.31893444 1.33769011 1.40460014 1.36466551 1.33518124 1.24277329
  1.33398581 1.75998497 1.24356627 1.77482724 1.85054612 1.79162192
  1.3898313  1.33179426 1.37649441 1.32211208 1.80220819 1.36521769
  2.40089607 2.31744409 1.42907691 1.77958608 1.57235146 1.34868598
  1.5073998  1.31763935 1.33617115 1.33087659 1.80887985 1.744807
  1.28214192 3.00801921 1.66237831 1.44230723 1.51115775 1.71406031
  1.31776309 1.54090023 4.43570399 1.55310607 1.32317424 1.26128888
  2.38433409 2.01465225 1.55646467 1.24797654 1.26046824 1.82316756
  1.52839518 1.33292866 1.38094306 1.34266639 1.35537744 1.51378632
  2.4644649  1.44261789 2.01437736 1.39239144 1.33971429 1.51242495
  1.50473189 1.5183568  3.13675046 1.51914215 1.51047182 1.49906969
  1.31907105 1.40593457 1.81319261 2.28925991 1.99164486 1.77896738
  1.84396577 1.94508815 1.31536436 3.07333517 2.00794172 1.24214888
  1.36410475 2.49327517 1.74712968 1.33194089 1.35196662 1.54331207
  8.88100004 1.29636335 2.00254226]]
[1.74626522]


Now, let us print the results for IG.

In [None]:
print_results(file_name+'(A)', [' IG  '], metrics, label_names)

F
 IG    0.07733999937772751 | 0.14601 0.0758 0.08001 0.06485 0.05166 0.01702 0.10601 0.07732
FTP
 IG    0.11517 | 0.20215 0.10487 0.1306 0.07225 0.1353 0.03924 0.12895 0.10804
NZW
 IG    1.0 | 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0




In [None]:
print_results(file_name+'(P)', [' IG  '], metricsP, label_names)

F
 IG    0.33882 | 0.363 0.15866 0.3713 0.35689 0.39341 0.132 0.41996 0.51535
FTP
 IG    0.34684 | 0.35052 0.09373 0.39807 0.35734 0.41183 0.13708 0.46494 0.56119
NZW
 IG    1.0 | 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0


We will now experiment with various attention setups.

In [None]:
conf = []
#'Mean', 'Multi', 0, 1, ..., 11
for ci in ['Mean', 'Multi'] + list(range(12)):

    #'Mean', 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
    for ce in ['Mean'] + list(range(12)):

        # Matrix: From, To, MeanColumns, MaxColumns (MeanRows and MaxRows are not used here)
        for cp in ['From', 'To', 'MeanColumns', 'MaxColumns']:

            # Selection: True: select layers per head, False: do not
            for cl in [False]:
                conf.append([ci, ce, cp, cl])

len(conf) #14*13*4*1

728
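
The same grid of configurations can be built with `itertools.product`, which makes the count of 14 * 13 * 4 * 1 = 728 easy to verify. This is only an equivalent sketch of the nested loops above; the list names are hypothetical.

```python
from itertools import product

first_options = ['Mean', 'Multi'] + list(range(12))            # 14 options
second_options = ['Mean'] + list(range(12))                    # 13 options
matrix_options = ['From', 'To', 'MeanColumns', 'MaxColumns']   # 4 options
selection_options = [False]                                    # 1 option

# product iterates the rightmost option fastest, like the innermost loop above
conf_sketch = [list(c) for c in product(first_options, second_options,
                                        matrix_options, selection_options)]
print(len(conf_sketch))
```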

In [None]:
import time
with warnings.catch_warnings():

    #ignore the warnings
    warnings.simplefilter("ignore", category=RuntimeWarning)

    #date
    now = datetime.datetime.now()

    #saving results
    file_name = save_path + 'ETHOS_ROBERTA_ATTENTION_'+str(now.day) + '_' + str(now.month) + '_' + str(now.year)

    #metrics
    metrics = {'FTP':[], 'F':[], 'NZW':[]}
    metricsP = {'FTP':[], 'F':[], 'NZW':[]}

    #times
    time_r = []
    time_b = []
    time_b2 = []

    #attentions setups
    for con in conf:
        time_r.append([])

    #for every test instance
    for ind in tqdm(range(0,len(test_texts))):

        #to not run out of memory
        torch.cuda.empty_cache()

        #one instance
        instance = test_texts[ind]

        #clear states of evaluators
        my_evaluators.clear_states()
        my_evaluatorsP.clear_states()

        #save calculated configurations
        my_explainers.save_states = {}

        #prediction, attention matrix and hidden states. Here we care about predictions and attention.
        prediction, attention, _ = model.my_predict(instance)

        #RobertaTokenizerFast
        enc = model.tokenizer([instance,instance], truncation=True, padding=True)[0]

        #real tokens or padding: extracting the mask
        mask = enc.attention_mask

        #token strings, including the special tokens
        tokens = enc.tokens

        interpretations = []
        kk = 0
        for con in conf:

            #time
            ts = time.time()

            #set configuration
            my_explainers.config = con

            #returns interpretations
            temp = my_explainers.my_attention(instance, prediction, tokens, mask, attention, _) #no hidden states

            #scaling interpretations
            interpretations.append([maxabs_scale(i) for i in temp])

            #append time
            time_r[kk].append(time.time()-ts)
            kk = kk + 1

        #'F','FTP','NZW'
        for metric in metrics.keys():
            evaluated = []
            k = 0

            for interpretation in interpretations:
                tt = time.time()

                #all parameters: interpretation, tweaked_interpretation, instance, prediction, tokens, hidden_states, t_hidden_states, rationales
                evaluated.append(evaluation[metric](interpretation, _, instance, prediction, tokens, _, _, _))
                k = k + (time.time()-tt) #time
            if metric == 'FTP':
                time_b.append(k)
            metrics[metric].append(evaluated)

        my_evaluatorsP.saved_state = my_evaluators.saved_state.copy()

        for metricP in metricsP.keys():
            evaluated = []
            k = 0

            for interpretation in interpretations:
                tt = time.time()

                #all parameters: interpretation, tweaked_interpretation, instance, prediction, tokens, hidden_states, t_hidden_states, rationales
                evaluated.append(evaluationP[metricP](interpretation, _, instance, prediction, tokens, _, _, _))
                k = k + (time.time()-tt)

            if metricP == 'FTP':
                time_b2.append(k)
            metricsP[metricP].append(evaluated)

        if(ind != 0):
            with open(file_name+' (A).pickle', 'rb') as handle:
                old_metrics = pickle.load(handle)
            with open(file_name+' (P).pickle', 'rb') as handle:
                old_metricsP = pickle.load(handle)

            #append new results
            for key in metrics.keys():
                old_metrics[key].append(metrics[key][0])
                old_metricsP[key].append(metricsP[key][0])
        else:
            old_metrics = metrics
            old_metricsP = metricsP

        #save metrics as below
        with open(file_name+' (A).pickle', 'wb') as handle:
            pickle.dump(old_metrics, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+' (P).pickle', 'wb') as handle:
            pickle.dump(old_metricsP, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+'_TIME.pickle', 'wb') as handle:
            pickle.dump(time_r, handle, protocol=pickle.HIGHEST_PROTOCOL)

        del old_metrics,old_metricsP
        metrics = {'FTP':[], 'F':[], 'NZW':[]}
        metricsP = {'FTP':[], 'F':[], 'NZW':[]}

#times
time_r = np.array(time_r)
time_r.mean(axis=1).min(),time_r.mean(axis=1).max(), time_r.mean(axis=1).mean(), time_r.sum(axis=1).mean(), np.mean(time_b), np.mean(time_b2)

100%|██████████| 87/87 [1:52:14<00:00, 77.41s/it] 


(0.0018719804698023304,
 0.006683179701881847,
 0.0020431114580198506,
 0.177750696847727,
 55.99537901494695,
 11.694893620480066)
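
The loop above keeps memory usage low by writing the metrics of each instance to disk and reloading the file on the next iteration. A minimal standard-library sketch of that append-to-pickle pattern is shown below; `append_result` and the file name are hypothetical, and unlike the notebook it checks whether the file exists instead of testing `ind != 0`.

```python
import os
import pickle
import tempfile

def append_result(path, new_metrics):
    """Load the stored metrics (if any), append one value per key, and save again."""
    if os.path.exists(path):
        with open(path, 'rb') as handle:
            stored = pickle.load(handle)
    else:
        stored = {key: [] for key in new_metrics}
    for key, value in new_metrics.items():
        stored[key].append(value)
    with open(path, 'wb') as handle:
        pickle.dump(stored, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return stored

# made-up scores for three instances, written to a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_metrics.pickle')
for score in (0.1, 0.2, 0.3):
    stored = append_result(demo_path, {'F': score, 'FTP': score * 2})

with open(demo_path, 'rb') as handle:
    reloaded = pickle.load(handle)
print(reloaded['F'])
```

Rewriting the whole file on every iteration is simple and survives an interrupted run, at the cost of repeated disk writes as the file grows.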

In [None]:
print(time_r)
print(time_r.mean(axis=1).min())
time_r.mean(axis=1).max()
time_r.sum(axis=1).mean()
print(time_b)
np.mean(time_b)
print(time_b2)
np.mean(time_b2)

[[0.0031631  0.00331402 0.00285316 ... 0.00708461 0.00264668 0.00271893]
 [0.00210071 0.00281692 0.00202346 ... 0.00413775 0.00185966 0.00203514]
 [0.00208426 0.0026989  0.00206923 ... 0.00427437 0.00177526 0.00213432]
 ...
 [0.00177693 0.00177336 0.00189734 ... 0.0045774  0.00176907 0.00179148]
 [0.00174618 0.00184512 0.00189376 ... 0.0049715  0.00174689 0.00182104]
 [0.00172853 0.00188494 0.00186753 ... 0.00406647 0.00177789 0.00181484]]
0.0018719804698023304
[18.000608921051025, 38.912978410720825, 41.171518087387085, 24.331618547439575, 34.26501727104187, 17.35102343559265, 36.63894701004028, 59.53835964202881, 23.71956181526184, 58.41548752784729, 62.89147472381592, 63.50250458717346, 34.04198455810547, 31.689624786376953, 47.89787244796753, 21.88539457321167, 58.87467360496521, 34.56255745887756, 100.50023698806763, 91.07251024246216, 27.273457765579224, 58.74209427833557, 53.275230169296265, 31.316423892974854, 43.74420404434204, 29.74014902114868, 36.26983332633972, 14.84141087

11.694893620480066

In [None]:
#print_results(file_name+' (A)', conf, metrics, label_names)

with open(file_name+' (A).pickle', 'rb') as handle:
    metrics = pickle.load(handle)

In [None]:
#print_results(file_name+' (P)', conf, metricsP, label_names)

with open(file_name+' (P).pickle', 'rb') as handle:
    metricsP = pickle.load(handle)

We calculate the best attention setup using Optimus variations (we do not use the Optimus implementation at this step).

In [None]:
print_results_ap(metrics, label_names, conf)



Baseline: 0.0137514943300328  and NZW: 1.0
Max Across: 0.03333355988806653  and NZW: 1.0
Per Label Per Instance: 0.12645743950058522  and NZW:  0.9926476899537244
Per Instance: 0.06521929940873479  and NZW:  1.0
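
The four rows are not defined in this notebook. One plausible reading, offered purely as an assumption about what 'print_results_ap' reports, is that they differ in how freely the best configuration may be chosen: a fixed first configuration, the single best configuration overall, the best configuration per instance, and the best configuration per instance and label. The NumPy sketch below illustrates these four selection rules on made-up scores; it is not the project's implementation.

```python
import numpy as np

# hypothetical scores with shape (instances, configurations, labels)
scores = np.array([
    [[0.1, 0.4], [0.3, 0.2]],
    [[0.5, 0.1], [0.2, 0.6]],
])

# assumed 'Baseline': always use the first configuration
baseline = scores[:, 0, :].mean()

# assumed 'Max Across': the single configuration with the best overall mean
max_across = scores.mean(axis=(0, 2)).max()

# assumed 'Per Instance': pick the best configuration separately for each instance
per_instance = scores.mean(axis=2).max(axis=1).mean()

# assumed 'Per Label Per Instance': pick the best configuration for each instance and label
per_label_per_instance = scores.max(axis=1).mean()

print(baseline, max_across, per_instance, per_label_per_instance)
```

Under this reading, the more freedom the selection has, the higher the score can be, which is consistent with the ordering of the reported values.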


In [None]:
print_results_ap(metricsP, label_names, conf)

Baseline: 0.3155989571335593  and NZW: 1.0
Max Across: 0.39349852286994225  and NZW: 1.0
Per Label Per Instance: 0.5690868685800059  and NZW:  0.9891859774212716
Per Instance: 0.47384973908107675  and NZW:  1.0


We repeat the process with the raw attention scores (A*), which can take negative values because the Softmax function is skipped. In these attention setups, we exclude the multiplication option across heads and layers, as a few combinations reach +/-inf.
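
To illustrate the difference between the two kinds of scores: for each head, the raw scores are the scaled dot products between queries and keys, and the usual attention weights are obtained by applying a softmax to each row. The NumPy sketch below uses random toy matrices and omits the bias terms of the real linear layers; `raw_attention_scores` and `softmax` are hypothetical helpers, not the project's code.

```python
import math
import numpy as np

def raw_attention_scores(hidden, w_query, w_key, n_heads):
    """Scaled dot-product scores per head, without the softmax."""
    queries = hidden @ w_query
    keys = hidden @ w_key
    head_dim = queries.shape[-1] // n_heads
    per_head = []
    for h in range(n_heads):
        q = queries[:, h * head_dim:(h + 1) * head_dim]
        k = keys[:, h * head_dim:(h + 1) * head_dim]
        per_head.append(q @ k.T / math.sqrt(head_dim))
    return np.stack(per_head)

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 8))   # 3 tokens, toy hidden size of 8
w_query = rng.normal(size=(8, 8))  # toy projection matrices
w_key = rng.normal(size=(8, 8))

a_star = raw_attention_scores(hidden, w_query, w_key, n_heads=2)  # may contain negatives
a = softmax(a_star)                                               # non-negative, rows sum to 1
print(a_star.shape, a.sum(axis=-1))
```

The softmax output is always non-negative, whereas the raw scores keep their sign, which is why they are treated as a separate variant here.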

In [None]:
conf = []
for ci in ['Mean'] + list(range(12)):
    for ce in ['Mean'] + list(range(12)):
        for cp in ['From', 'To', 'MeanColumns', 'MaxColumns']: # Matrix: From, To, MeanColumns, MeanRows, MaxColumns, MaxRows
            for cl in [False]: # Selection: True: select layers per head, False: do not
                conf.append([ci, ce, cp, cl])
len(conf)

676

In [None]:
import time
import math
with warnings.catch_warnings():

    warnings.simplefilter("ignore", category=RuntimeWarning)

    now = datetime.datetime.now()

    file_name = save_path + 'ETHOS_ROBERTA_A_ATTENTION_NO_SOFTMAX_'+str(now.day) + '_' + str(now.month) + '_' + str(now.year)

    metrics = {'FTP':[], 'F':[], 'NZW':[]}
    metricsP = {'FTP':[], 'F':[], 'NZW':[]}

    time_r = []
    time_b = []
    time_b2 = []

    for con in conf:
        time_r.append([])

    for ind in tqdm(range(0,len(test_texts))):
        torch.cuda.empty_cache()

        instance = test_texts[ind]

        my_evaluators.clear_states()
        my_evaluatorsP.clear_states()

        my_explainers.save_states = {}

        prediction, _, hidden_states = model.my_predict(instance)

        enc = model.tokenizer([instance,instance], truncation=True, padding=True)[0]

        mask = enc.attention_mask

        tokens = enc.tokens

        attention = []

        #recompute the attention scores of every layer and head without the softmax
        for la in range(12):
            our_new_layer = []
            attention_module = model.trainer.model.base_model.encoder.layer[la].attention
            layer_input = torch.tensor(hidden_states[la]).to('cuda') #hidden states used as input for layer 'la'
            keys = attention_module.self.key(layer_input)
            queries = attention_module.self.query(layer_input)
            for he in range(12):
                #each head uses a 64-dimensional slice (768 / 12) of the queries and keys
                attention_scores = torch.matmul(queries[:,he*64:(he+1)*64], keys[:,he*64:(he+1)*64].transpose(-1, -2))
                attention_scores = attention_scores / math.sqrt(64)
                our_new_layer.append(attention_scores.cpu().detach().numpy())
            attention.append(our_new_layer)
        attention = np.array(attention)

        interpretations = []
        kk = 0
        for con in conf:
            ts = time.time()
            my_explainers.config = con
            temp = my_explainers.my_attention(instance, prediction, tokens, mask, attention, _)
            interpretations.append([maxabs_scale(i) for i in temp])
            time_r[kk].append(time.time()-ts)
            kk = kk + 1
        for metric in metrics.keys():
            evaluated = []
            k = 0
            for interpretation in interpretations:
                tt = time.time()
                evaluated.append(evaluation[metric](interpretation, _, instance, prediction, tokens, _, _, _))
                k = k + (time.time()-tt)
            if metric == 'FTP':
                time_b.append(k)
            metrics[metric].append(evaluated)
        my_evaluatorsP.saved_state = my_evaluators.saved_state.copy()
        for metric in metrics.keys():
            evaluated = []
            k = 0
            for interpretation in interpretations:
                tt = time.time()
                evaluated.append(evaluationP[metric](interpretation, _, instance, prediction, tokens, _, _, _))
                k = k + (time.time()-tt)
            if metric == 'FTP':
                time_b2.append(k)
            metricsP[metric].append(evaluated)
        with open(file_name+' (A).pickle', 'wb') as handle:
            pickle.dump(metrics, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+' (P).pickle', 'wb') as handle:
            pickle.dump(metricsP, handle, protocol=pickle.HIGHEST_PROTOCOL)
        with open(file_name+'_TIME.pickle', 'wb') as handle:
            pickle.dump(time_r, handle, protocol=pickle.HIGHEST_PROTOCOL)
time_r = np.array(time_r)
time_r.mean(axis=1).min(),time_r.mean(axis=1).max(), time_r.mean(axis=1).mean(), time_r.sum(axis=1).mean(), np.mean(time_b), np.mean(time_b2)

100%|██████████| 87/87 [1:42:39<00:00, 70.80s/it] 


(0.0019015142287331067,
 0.003141545701300961,
 0.0020123858934420757,
 0.1750775727294606,
 51.73081672876731,
 10.793078184127808)

In [None]:
print_results(file_name+' (A)', conf, metrics, label_names)



FTP
['Mean', 'Mean', 'From', False]  0.01618 | -0.01241 -0.00865 0.00356 0.04074 -0.01405 -0.01213 0.05452 0.0779
['Mean', 'Mean', 'To', False]  0.01049 | -0.01482 -0.00577 -0.02024 0.03141 -0.02658 -0.0042 0.07259 0.05155
['Mean', 'Mean', 'MeanColumns', False]  0.0206 | 0.00046 0.04515 0.01817 0.03133 -0.0109 -0.01073 0.02365 0.06763
['Mean', 'Mean', 'MaxColumns', False]  0.02037 | 0.01205 0.08661 -0.00087 0.02095 -0.00902 0.00864 0.01559 0.029
['Mean', 0, 'From', False]  0.01288 | -0.02764 -0.02268 -0.00385 0.03976 0.00286 -0.01655 0.05329 0.07787
['Mean', 0, 'To', False]  0.01041 | -0.04313 -0.00979 -0.0205 0.05539 -0.02604 -0.01329 0.08 0.06067
['Mean', 0, 'MeanColumns', False]  0.02236 | 0.05018 0.02474 0.00959 0.02865 0.0108 -0.00464 0.00732 0.05222
['Mean', 0, 'MaxColumns', False]  0.01921 | 0.02297 0.03668 0.00195 0.03504 0.00991 0.00019 0.00829 0.03866
['Mean', 1, 'From', False]  0.01551 | -0.02743 -0.00293 -0.01599 0.04251 -0.00725 -0.01583 0.07914 0.07189
['Mean', 1, 'To', F

In [None]:
print_results(file_name+' (P)', conf, metricsP, label_names)

FTP
['Mean', 'Mean', 'From', False]  0.30974 | 0.1631 0.23266 0.23748 0.29964 0.48795 0.17011 0.26355 0.62341
['Mean', 'Mean', 'To', False]  0.25168 | 0.14237 0.12252 0.10843 0.21728 0.45916 0.17018 0.33696 0.45653
['Mean', 'Mean', 'MeanColumns', False]  0.28332 | 0.18717 0.29783 0.24088 0.25124 0.46858 0.13381 0.13584 0.55125
['Mean', 'Mean', 'MaxColumns', False]  0.20608 | 0.15659 0.29082 0.10312 0.17844 0.36759 0.16795 0.09577 0.28838
['Mean', 0, 'From', False]  0.31536 | 0.15039 0.20529 0.22588 0.29032 0.56005 0.17177 0.2594 0.65979
['Mean', 0, 'To', False]  0.29258 | 0.08088 0.159 0.2222 0.35735 0.43073 0.20673 0.34499 0.53875
['Mean', 0, 'MeanColumns', False]  0.25979 | 0.29211 0.23499 0.1892 0.22406 0.50011 0.11916 0.06939 0.44932
['Mean', 0, 'MaxColumns', False]  0.25106 | 0.20705 0.22391 0.14583 0.25185 0.57341 0.1409 0.10579 0.35971
['Mean', 1, 'From', False]  0.30868 | 0.1359 0.19666 0.21222 0.30176 0.48444 0.17391 0.35987 0.60465
['Mean', 1, 'To', False]  0.29191 | 0.15665 

We calculate the best attention setup using Optimus variations (we do not use the Optimus implementation script at this step).

In [None]:
print_results_ap(metrics, label_names, conf)

Baseline: 0.016184553026464526  and NZW: 1.0
Max Across: 0.03431235402967694  and NZW: 1.0
Per Label Per Instance: 0.20042428202728987  and NZW:  1.0
Per Instance: 0.08858631179198989  and NZW:  1.0


In [None]:
print_results_ap(metricsP, label_names, conf)

Baseline: 0.30973743971886836  and NZW: 1.0
Max Across: 0.38302775019284174  and NZW: 1.0
Per Label Per Instance: 0.58029645110402  and NZW:  1.0
Per Instance: 0.47833335912774183  and NZW:  1.0
