## Code to train the BERT-NLI Outgroup Hostility model
### **❗This notebook was copied from the Demo Notebook for Less Annotating, More Classifying. Most helpful comments in the code came from the Demo Notebook. The credit for the code and most comments belongs to the authors of this paper: https://doi.org/10.1017/pan.2023.20.❗**

Read more about the BERT-NLI approach used here: ["Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI"](https://github.com/MoritzLaurer/less-annotating-with-bert-nli) by Moritz Laurer, Wouter van Atteveldt, Andreu Casas, Kasper Welbers.


## Activate a GPU runtime

In order to run this notebook on a GPU, click on "Runtime" > "Change runtime type" > select "GPU" in the menu bar at the top left. Training a Transformer is much faster on a GPU. Given Google's usage limits for GPUs, it is advisable to first test your non-training code on a CPU (Hardware accelerator "None" instead of GPU) and only switch to the GPU once you know that everything is working.

## Install relevant packages

In [None]:
!pip install transformers[sentencepiece]==4.23
!pip install datasets==2.6
!pip install optuna==3.0




In [None]:
## Load general packages
# some more specialised packages are loaded in each sub section
import pandas as pd
import numpy as np
from google.colab.data_table import DataTable

In [None]:
df3 = pd.read_csv("/content/combined_utterances_2_final_csv - combined_utterances_2_final_csv (1).csv") #upload the file locally and set the path on the left
df3.head()

Unnamed: 0,id,subreddit,thread_title,convo,random_speaker,text,Parasocial_Language
0,dywzb12,NinjasHyper,Name of the song,Dj1312: Hi !\n\nCan someone say me the name of...,Dj1312,9,0
1,e2q0ijd,NinjasHyper,"So sad that ninja died if ligma, rip ninja he ...",DestroyerTheGuy: Wait is he dead? mdheintz21: ...,mdheintz21,Ya ligma got him so sad,1
2,duf5mqw,NinjasHyper,Raising 100k For Charity!! - Fortnite Battle R...,Potato_Private: https://www.twitch.tv/00flour,Potato_Private,https://www.twitch.tv/00flour\n,0
3,9jqg7i,NinjasHyper,5 ads in one YouTube video?,Fartmaster50000: I used to love watching uploa...,Fartmaster50000,I used to love watching uploads of Ninja's gam...,1
4,dxgam05,NinjasHyper,Leviathan Solo Squads! - Fortnite Battle Royal...,Ricmaniac: i love the way ninja plays... but t...,Ricmaniac,i love the way ninja plays... but the sellout ...,1


In [None]:
# set random seed for reproducibility
SEED_GLOBAL = 42
np.random.seed(SEED_GLOBAL)

## Download and prepare data

In [None]:
df_train = pd.read_csv("/content/combined_utterances_2_final_csv - combined_utterances_2_final_csv (1).csv")

In [None]:
df_train.columns

Index(['id', 'subreddit', 'thread_title', 'convo', 'random_speaker', 'text',
       'Parasocial_Language'],
      dtype='object')

In [None]:
from sklearn.model_selection import train_test_split
sample_size = 4000
df_train, df_test = train_test_split(df_train[:sample_size], test_size=.2, random_state=SEED_GLOBAL,stratify=df_train['Parasocial_Language'][:sample_size])
#note: because we stratified the data by the training label, the solidarity and hostility test sets are not the same
print("Length of training and test sets after sampling: ", len(df_train), " (train) ", len(df_test), " (test).")

Length of training and test sets after sampling:  3200  (train)  800  (test).


In [None]:
df_train['label_text'] = df_train.Parasocial_Language.apply(lambda x: 'Parasocial Language' if x else "Other")
df_test['label_text'] = df_test.Parasocial_Language.apply(lambda x: 'Parasocial Language' if x else "Other")
df_train['label'] = df_train.Parasocial_Language.apply(lambda x: 1 if x else 0)
df_test['label'] = df_test.Parasocial_Language.apply(lambda x: 1 if x else 0)

In [None]:
df_test.to_csv("/content/ Exporting Parasocial Language Test Set")

In [None]:
## inspect the data
# label distribution train set
print("Train set label distribution: ", df_train.label_text.value_counts())
# label distribution test set
print("Test set label distribution: ", df_test.label_text.value_counts())

# full training data table
DataTable(df_train, num_rows_per_page=5)


Train set label distribution:  Other                  2119
Parasocial Language    1081
Name: label_text, dtype: int64
Test set label distribution:  Other                  530
Parasocial Language    270
Name: label_text, dtype: int64


Unnamed: 0,id,subreddit,thread_title,convo,random_speaker,text,Parasocial_Language,label_text,label
2486,db2hx3g,Angory_Tom,Welcome!,AngMod: This is an Angor subreddit for any new...,AngMod,oops.exe,0,Other,0
1693,3s0yl7,KittyKatGaming,Editor Question/Idea?,"RedLikeRoses: Currently with the new outro, th...",RedLikeRoses,"Currently with the new outro, the video fades ...",1,Parasocial Language,1
1711,ct41bge,KittyKatGaming,Suzy...please just use the Salon (Kitty Powers...,PresidentPoogie: I understand that it's weird ...,PresidentPoogie,I apologize for seeming really ranty about thi...,1,Parasocial Language,1
1935,e1jioud,lilypichu,Drew Lily! :D,The_Rice_Tosser: https://i.redd.it/4h19867zn27...,lilypichu,Thanks so much!!!,1,Parasocial Language,1
485,dbporf7,Angory_Tom,Astroneer - MEGA STORM - Part #2,BaaruRaimu: This is quickly becoming my favour...,The_WubWub,Wish they did now of this game. Wish Tom would...,1,Parasocial Language,1
...,...,...,...,...,...,...,...,...,...
1212,e4qvtux,Amouranth,Shut up Tenkini,tenkini: Fucking Shitkini! 🤬 [deleted]...,Break_Bot72,"eat the largest dong that is available, pls.",0,Other,0
1148,7fddd0,Pokimane,Poki keeps it interesting.,swuni: I've been watching Poki for a while now...,swuni,I've been watching Poki for a while now even t...,1,Parasocial Language,1
686,dzw49dg,NakeyJakey,When BioShock Was My Only Friend,aspargus62: Finally! eyesack27: JAKE I LOVE YO...,eyesack27,JAKE I LOVE YOU &lt;3,1,Parasocial Language,1
472,devss5k,Angory_Tom,Warhammer - Underground Wars- Part #10,LordSwedish: So does anyone know how you deal ...,Smarthinus,The ghost friends did not win because the unde...,0,Other,0
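
The label distribution printed above can be converted into class shares to confirm that the stratified split kept the proportion of "Parasocial Language" roughly equal in both sets. This is a minimal sketch that only uses the counts from the output above; the helper name `class_share` is introduced here for illustration and is not part of the original notebook.

```python
# Counts copied from the label distribution output above.
train_counts = {"Other": 2119, "Parasocial Language": 1081}
test_counts = {"Other": 530, "Parasocial Language": 270}

def class_share(counts, label):
    """Return the fraction of rows that carry `label`."""
    return counts[label] / sum(counts.values())

train_share = class_share(train_counts, "Parasocial Language")
test_share = class_share(test_counts, "Parasocial Language")
print(f"train: {train_share:.4f}, test: {test_share:.4f}")  # train: 0.3378, test: 0.3375
```

Roughly one third of the texts in each set are labelled "Parasocial Language", so the "Other" class is the majority class in both.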


**If you want to run the notebook on your own dataset:**

You can load your own training and test data above to fine-tune your own BERT-NLI model. Your own dataframe only needs two columns to be compatible with the code below: (1) a "label_text" column with the label texts of your classes, (2) a "text_prepared" column with the texts for training (you might need to delete/adapt the text preparation code cell below for your dataset).

## Create NLI hypotheses

**Formulate a hypothesis, which verbalises the classes/task you are interested in.**

For this example, we base our task on the Manifesto Project codebook:  https://manifesto-project.wzb.eu/coding_schemes/mp_v4

We store the hypothesis in a dictionary: The keys of the dictionary should be the names of the respective label from the training dataframe ('label_text' column); the values of the dictionary should be your manually formulated hypothesis linked to the respective labels.

In [None]:
# dictionary mapping the dataset's label to manually formulated hypotheses based on the codebook
hypothesis_label_dic = {
    "Other": "The quote does not indicate the presence of a parasocial relationship or interaction with a gaming celebrity",
    "Parasocial Language": "The quote contains language which indicates the presence of a parasocial interaction or relationship with the target gaming celebrity, for example including language resemblant of PSI and ESPI scale items and language indicating a sense of identification, wishful identification and relationship closeness.",
}
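
Because the test set formatting further below maps each `label_text` value to the dictionary keys, a label without a matching key would silently produce missing values. The following is a small sanity check, sketched under the assumption that labels and keys must match exactly; `check_hypotheses` is a hypothetical helper, and the shortened hypothesis strings are placeholders for the full texts above.

```python
def check_hypotheses(label_texts, hypothesis_dic):
    """Raise if a label has no hypothesis or a hypothesis has no label."""
    labels = set(label_texts)
    keys = set(hypothesis_dic)
    missing = sorted(labels - keys)  # labels without a hypothesis
    unused = sorted(keys - labels)   # hypotheses whose label never occurs
    if missing or unused:
        raise ValueError(f"missing hypotheses: {missing}; unused hypotheses: {unused}")
    return True

# Toy stand-ins for df_train.label_text and hypothesis_label_dic
example_labels = ["Other", "Parasocial Language", "Other"]
example_hypotheses = {
    "Other": "placeholder hypothesis for Other",
    "Parasocial Language": "placeholder hypothesis for Parasocial Language",
}
ok = check_hypotheses(example_labels, example_hypotheses)
```

In the notebook, the equivalent call would be `check_hypotheses(df_train.label_text.unique(), hypothesis_label_dic)`.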

**Prepare the input text**

1.) We prepare the target texts by making them more naturally fit to the hypothesis. Here we simply wrap each target text into the string ' The quote: "{target_text}" - end of the quote. '

2.) We surround the target text by its preceding and following sentence. Adding context like this systematically increases performance.


In [None]:
import re
# mask URLs with a "<url>" placeholder and wrap each text in the quote template
df_train["text_prepared"] = 'The quote: "' + df_train.text.apply(lambda x: re.sub(r"http\S+", "<url>", str(x))) + '" - end of the quote.'
df_test["text_prepared"] = 'The quote: "' + df_test.text.apply(lambda x: re.sub(r"http\S+", "<url>", str(x))) + '" - end of the quote.'
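
To make the two preparation steps concrete, here is a minimal sketch on plain strings. The wrapping and URL masking follow the cell above; the optional `preceding` and `following` parameters are hypothetical and only illustrate how context from step 2 could be attached if your data contains it (the cell above does not add context). All example texts are toy inputs.

```python
import re

def prepare_text(text, preceding="", following=""):
    """Mask URLs and wrap the target text as a quote, optionally with context."""
    masked = re.sub(r"http\S+", "<url>", str(text))
    quoted = f'The quote: "{masked}" - end of the quote.'
    # Join only the non-empty parts so that no stray spaces are added
    return " ".join(part for part in (preceding, quoted, following) if part)

without_context = prepare_text("Watch here https://www.twitch.tv/00flour now")
with_context = prepare_text("Thanks so much!!!", preceding="Drew Lily! :D", following="You are welcome.")
print(without_context)  # The quote: "Watch here <url> now" - end of the quote.
print(with_context)
```

Masking URLs keeps long links from using up tokens that carry no information about the class.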


## Format the training and test datasets for NLI classification


**Format the training data**

1.) For each text with a specific class (label), the corresponding class-hypothesis needs to be added in the same row with the label 'true' (also expressed with the numeric label value 0).

2.) Adding 'false' examples: The NLI task consists of predicting, whether a hypothesis is true or false given a context.
If we only give 'true' hypothesis-context pairs to the algorithm, it will not learn the 'false' class properly.
For each text, we therefore also add a row where the text is matched with a random wrong class label and give it the NLI label 'false' (also expressed with the numeric label value 1). This increases the training data by up to 2x.

See the table below for the concrete format the training data takes after this pre-processing step.
Note that NLI can be formulated as a 3-class (entailment/neutral/contradiction) or 2-class (entailment/not-entailment) task. Both can be used here. We use the 2-class variant.
Note that the words entailment/neutral/contradiction and true/neutral/false are used interchangeably here. Both terminologies are used in the literature and coding instructions.


In [None]:
## function for reformatting the train set
def format_nli_trainset(df_train=None, hypo_label_dic=None, random_seed=42):
  print(f"Length of df_train before formatting step: {len(df_train)}.")
  length_original_data_train = len(df_train)

  df_train_lst = []
  for label_text, hypothesis in hypo_label_dic.items():
    ## entailment
    df_train_step = df_train[df_train.label_text == label_text].copy(deep=True)
    df_train_step["hypothesis"] = [hypothesis] * len(df_train_step)
    df_train_step["label"] = [0] * len(df_train_step)
    ## not_entailment
    df_train_step_not_entail = df_train[df_train.label_text != label_text].copy(deep=True)
    df_train_step_not_entail = df_train_step_not_entail.sample(n=min(len(df_train_step), len(df_train_step_not_entail)), random_state=random_seed)
    df_train_step_not_entail["hypothesis"] = [hypothesis] * len(df_train_step_not_entail)
    df_train_step_not_entail["label"] = [1] * len(df_train_step_not_entail)
    # append
    df_train_lst.append(pd.concat([df_train_step, df_train_step_not_entail]))
  df_train = pd.concat(df_train_lst)

  # shuffle
  df_train = df_train.sample(frac=1, random_state=random_seed)
  df_train["label"] = df_train.label.apply(int)
  df_train["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_train["label"]]  # adding this just to simplify readability

  print(f"After adding not_entailment training examples, the training data was augmented to {len(df_train)} texts.")
  print(f"Max augmentation could be: len(df_train) * 2 = {length_original_data_train*2}. It can also be lower, if there are more entail examples than not-entail for a majority class.")

  return df_train.copy(deep=True)


df_train_formatted = format_nli_trainset(df_train=df_train, hypo_label_dic=hypothesis_label_dic, random_seed=SEED_GLOBAL)

Length of df_train before formatting step: 3200.
After adding not_entailment training examples, the training data was augmented to 5362 texts.
Max augmentation could be: len(df_train) * 2 = 6400. It can also be lower, if there are more entail examples than not-entail for a majority class.
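
The reported size of 5362 can be reproduced from the class counts alone. This is a sketch of the arithmetic, assuming the counts from the train set label distribution above; `nli_trainset_size` is a hypothetical helper that mirrors the `min(...)` cap used when sampling not-entailment examples.

```python
def nli_trainset_size(class_counts):
    """Rows produced by pairing each class hypothesis with entail and sampled not-entail texts."""
    total = sum(class_counts.values())
    size = 0
    for label, n_entail in class_counts.items():
        n_other = total - n_entail             # texts that belong to any other class
        n_not_entail = min(n_entail, n_other)  # same cap as the .sample(n=min(...)) call
        size += n_entail + n_not_entail
    return size

train_counts = {"Other": 2119, "Parasocial Language": 1081}
augmented = nli_trainset_size(train_counts)
print(augmented)  # 5362
```

The "Other" hypothesis contributes 2119 + 1081 rows and the "Parasocial Language" hypothesis contributes 1081 + 1081 rows, which is why the result stays below the maximum of 6400.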


**Inspect reformatted training dataset**

Label 0 means that the hypothesis is 'true', label 1 means that the hypothesis is 'not-true'.


In [None]:
DataTable(df_train_formatted[["label", "label_nli_explicit", "hypothesis", "text_prepared"]], num_rows_per_page=5)

Unnamed: 0,label,label_nli_explicit,hypothesis,text_prepared
1280,1,Not-True,The quote contains language which indicates th...,"The quote: ""[mp4 link](<url>\n\n---\nThis mp4 ..."
1599,0,True,The quote contains language which indicates th...,"The quote: ""I'd fuck her"" - end of the quote."
1922,0,True,The quote does not indicate the presence of a ...,"The quote: ""Thanks!"" - end of the quote."
507,0,True,The quote contains language which indicates th...,"The quote: ""Hey Everyone,\n\nRecently Tom has ..."
1578,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""I've fapped to far worse. Most wom..."
...,...,...,...,...
2469,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""What about us silly internationals..."
2339,0,True,The quote contains language which indicates th...,"The quote: ""play project pokemon on roblox!\n""..."
552,1,Not-True,The quote contains language which indicates th...,"The quote: ""All hail Lijot the new king and he..."
1374,1,Not-True,The quote contains language which indicates th...,"The quote: ""Boing Boing Boing Boing Boing Boin..."


**Format the test set**

To know which class-hypothesis is true for a specific text, we need to test every possible class-hypothesis for each text. We therefore multiply the rows/texts in the test set by the number of hypotheses and pair each text with all possible hypotheses. The table below shows what the reformatted test set looks like.

In [None]:
## function for reformatting the test set
def format_nli_testset(df_test=None, hypo_label_dic=None):
  ## explode test dataset for N hypotheses
  hypothesis_lst = [value for key, value in hypo_label_dic.items()]
  print("Number of hypotheses/classes: ", len(hypothesis_lst))

  # label lists with 0 at alphabetical position of their true hypo, 1 for not-true hypos
  label_text_label_dic_explode = {}
  for key, value in hypo_label_dic.items():
    label_lst = [0 if value == hypo else 1 for hypo in hypothesis_lst]
    label_text_label_dic_explode[key] = label_lst

  df_test["label"] = df_test.label_text.map(label_text_label_dic_explode)
  df_test["hypothesis"] = [hypothesis_lst] * len(df_test)
  print(f"Original test set size: {len(df_test)}")

  # explode dataset to have K-1 additional rows with not_entail label and K-1 other hypotheses
  # ! after exploding, cannot sample anymore, because distorts the order to true label values, which needs to be preserved for evaluation code
  df_test = df_test.explode(["hypothesis", "label"])  # multi-column explode requires pd.__version__ >= '1.3.0'
  print(f"Test set size for NLI classification: {len(df_test)}\n")

  df_test["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_test["label"]]  # adding this just to simplify readability

  return df_test.copy(deep=True)


df_test_formatted = format_nli_testset(df_test=df_test, hypo_label_dic=hypothesis_label_dic)

Number of hypotheses/classes:  2
Original test set size: 800
Test set size for NLI classification: 1600
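
The expansion from 800 to 1600 rows can be illustrated without pandas. This is a simplified sketch with toy texts and placeholder hypotheses; `explode_for_nli` is a hypothetical helper that compares dictionary keys instead of hypothesis strings, which gives the same result as long as every hypothesis is unique.

```python
def explode_for_nli(texts, label_texts, hypothesis_dic):
    """Pair every text with every hypothesis; label 0 marks the true hypothesis."""
    rows = []
    for text, label_text in zip(texts, label_texts):
        for key, hypothesis in hypothesis_dic.items():
            rows.append({
                "text_prepared": text,
                "hypothesis": hypothesis,
                "label": 0 if key == label_text else 1,
            })
    return rows

hypotheses = {
    "Other": "placeholder hypothesis for Other",
    "Parasocial Language": "placeholder hypothesis for Parasocial Language",
}
rows = explode_for_nli(["text A", "text B"], ["Other", "Parasocial Language"], hypotheses)
labels = [row["label"] for row in rows]
print(len(rows), labels)  # 4 [0, 1, 1, 0]
```

The rows of one text stay next to each other and in dictionary order. The evaluation code relies on this ordering, which is why the exploded test set must not be shuffled.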



**Inspect the reformatted test dataset**


In [None]:
DataTable(df_test_formatted[["label", "label_nli_explicit", "hypothesis", "text_prepared"]].sort_values(["text_prepared", "hypothesis"]), num_rows_per_page=6, max_rows=10_000)

Unnamed: 0,label,label_nli_explicit,hypothesis,text_prepared
3713,1,Not-True,The quote contains language which indicates th...,"The quote: "" I think in saying what you said y..."
3713,0,True,The quote does not indicate the presence of a ...,"The quote: "" I think in saying what you said y..."
1742,1,Not-True,The quote contains language which indicates th...,"The quote: """"Slap on the slap! .... Slap.... o..."
1742,0,True,The quote does not indicate the presence of a ...,"The quote: """"Slap on the slap! .... Slap.... o..."
1745,0,True,The quote contains language which indicates th...,"The quote: """"We were supposed to throw the gol..."
...,...,...,...,...
1210,0,True,The quote does not indicate the presence of a ...,"The quote: ""you're very talented"" - end of the..."
1609,1,Not-True,The quote contains language which indicates th...,"The quote: ""you're welcome"" - end of the quote."
1609,0,True,The quote does not indicate the presence of a ...,"The quote: ""you're welcome"" - end of the quote."
2424,1,Not-True,The quote contains language which indicates th...,"The quote: ""yup, exactly what i thought when i..."


## Fine-tuning

We use [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) for loading and training our model. They provide great documentation and also a very good [course](https://huggingface.co/course/chapter1/1) on how to use Transformers.

**Loading an NLI model**

You can use any NLI model on the Hugging Face Hub. For normal English use-cases, we recommend this [base-size model](https://huggingface.co/MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c); for multilingual/non-English use-cases, we recommend this [multilingual model](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7); for best performance in English (but high compute and memory requirements) we recommend this [large model](https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli).


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

## load the BERT-NLI model and its tokenizer
# you can choose any of the NLI models here: https://huggingface.co/MoritzLaurer
model_name = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# use GPU (cuda) if available, otherwise use CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
model.to(device);


Device: cuda


**Tokenize data**

In [None]:
# convert pandas dataframes to Hugging Face dataset object to facilitate pre-processing
import datasets

dataset = datasets.DatasetDict({
    "train": datasets.Dataset.from_pandas(df_train_formatted),
    "test": datasets.Dataset.from_pandas(df_test_formatted)
})

# tokenize
def tokenize_nli_format(examples):
  return tokenizer(examples["text_prepared"], examples["hypothesis"], truncation=True, max_length=512)  #512 max_length can be reduced to e.g. 256 to increase speed, but long texts will be cut off
dataset["train"] = dataset["train"].map(tokenize_nli_format, batched=True)
dataset["test"] = dataset["test"].map(tokenize_nli_format, batched=True)

# remove unnecessary columns for model training 'id', 'subreddit', 'random_speaker', 'text'

dataset = dataset.remove_columns([ 'label_text','id', 'subreddit',  'thread_title', 'convo', 'random_speaker', 'text', 'Parasocial_Language', '__index_level_0__', ])

**Inspect processed data**

In [None]:
print("The overall structure of the pre-processed train and test sets:\n")
print(dataset)

print("\n\nAn example for a tokenized hypothesis-context pair:\n")
print(dataset["train"][0])

The overall structure of the pre-processed train and test sets:

DatasetDict({
    train: Dataset({
        features: ['label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 5362
    })
    test: Dataset({
        features: ['label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 1600
    })
})


An example for a tokenized hypothesis-context pair:

{'label': 1, 'text_prepared': 'The quote: "[mp4 link](<url>\n\n---\nThis mp4 version is 92.02% smaller than the gif (446.84 KB vs 5.47 MB).  \n\n\n---\n*Beep, I\'m a " - end of the quote.', 'hypothesis': 'The quote contains language which indicates the presence of a parasocial interaction or relationship with the target gaming celebrity, for example including language resemblant of PSI and ESPI scale items and language indicating a sense of identification, wishful identification and relationship

### Setting training arguments / hyperparameters

The following cell sets several important hyperparameters. We chose parameters that work well in general to avoid the need for hyperparameter search. Further below, we also provide code for hyperparameter search, if researchers want to try to increase performance by a few percentage points.

In [None]:
from transformers import TrainingArguments, Trainer, logging

# Set the directory to write the fine-tuned model and training logs to.
# With google colab, this will create a temporary folder, which will be deleted once you disconnect.
# You can connect to your personal google drive to save models and logs properly.
training_directory = "BERT-nli-ua-host"

# FP16 is a hyperparameter which can increase training speed and reduce memory consumption, but only on GPU and if batch-size > 8, see here: https://huggingface.co/transformers/performance.html?#fp16
# FP16 does not work on CPU or for multilingual mDeBERTa models
fp16_bool = True if torch.cuda.is_available() else False
if "mdeberta" in model_name.lower(): fp16_bool = False  # multilingual mDeBERTa does not support FP16 yet: https://github.com/microsoft/DeBERTa/issues/77
# in case of hyperparameter search at the end: FP16 has to be set to False. The integrated hyperparameter search with the Hugging Face Trainer can lead to errors otherwise.
fp16_bool = False

# Hugging Face tips to increase training speed and decrease out-of-memory (OOM) issues: https://huggingface.co/transformers/performance.html?
# Overview of all training arguments: https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments
train_args = TrainingArguments(
    output_dir=f'./results/{training_directory}',
    logging_dir=f'./logs/{training_directory}',
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # if you get an out-of-memory error, reduce this value to 8 or 4 and restart the runtime. Higher values increase training speed, but also increase memory requirements. Ideal values here are always a multiple of 8.
    per_device_eval_batch_size=32,  # if you get an out-of-memory error, reduce this value, e.g. to 16, and restart the runtime
    #gradient_accumulation_steps=4, # Can be used in case of memory problems: accumulates gradients over X steps and only then updates the weights, so a smaller per-device batch size can keep the same effective batch size. Decreases memory usage, but also slightly decreases speed. (!adapt/halve batch size accordingly)
    num_train_epochs=10,  # this can be increased, but higher values increase training time. Good values for NLI are between 3 and 20.
    warmup_ratio=0.25,  # a good normal default value is 0.06 for normal BERT-base models, but since we want to reuse prior NLI knowledge and avoid catastrophic forgetting, we set the value higher
    weight_decay=0.1,
    seed=SEED_GLOBAL,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    fp16=fp16_bool,  # Can speed up training and reduce memory consumption, but only makes sense at batch-size > 8. loads two copies of model weights, which creates overhead. https://huggingface.co/transformers/performance.html?#fp16
    fp16_full_eval=fp16_bool,
    evaluation_strategy="no", # options: "no"/"steps"/"epoch"
    #eval_steps=10_000,  # evaluate after n steps if evaluation_strategy=='steps'. defaults to logging_steps
    save_strategy = "no",  # options: "no"/"steps"/"epoch"
    #save_steps=10_000,              # Number of updates steps before two checkpoint saves.
    #save_total_limit=10,             # If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in output_dir
    #logging_strategy="steps",
    report_to="all",  # "all"  # logging
    #push_to_hub=False,
    #push_to_hub_model_id=f"{model_name}-finetuned-{task}",
)
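
To see what `warmup_ratio=0.25` means in practice, the number of optimizer steps can be estimated by hand. This is a sketch for a single device that follows the Trainer's step counting as far as we understand it; `training_schedule` is a hypothetical helper, not a Transformers function.

```python
import math

def training_schedule(n_examples, batch_size, epochs, warmup_ratio, grad_accum_steps=1):
    """Return (effective batch size, total optimizer steps, warmup steps) for one device."""
    effective_batch = batch_size * grad_accum_steps
    batches_per_epoch = math.ceil(n_examples / batch_size)
    # one optimizer update happens after every `grad_accum_steps` batches
    steps_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps

schedule = training_schedule(5362, 16, 10, 0.25)
print(schedule)  # (16, 3360, 840)
```

With the 5362 formatted training rows, the learning rate rises during the first 840 of 3360 steps. Calling the helper with `batch_size=4` and `grad_accum_steps=4` keeps the effective batch size at 16 while lowering the memory needed per step.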


In [None]:
# helper function to clean memory and reduce risk of out-of-memory error
import gc
def clean_memory():
  #del(model)
  if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
  gc.collect()

clean_memory()

### Custom function to compute metrics for NLI

We multiplied each text N times for each class in the test set and NLI can only predict 2 or 3 classes: true/not-true or true/neutral/false. This means that we cannot use standard functions for computing metrics. The following function reformats the model's output in a way that allows for the calculation of standard metrics like accuracy, F1-macro etc.

In [None]:
from sklearn.metrics import balanced_accuracy_score, precision_recall_fscore_support, accuracy_score, classification_report

def compute_metrics_nli_binary(eval_pred, label_text_alphabetical=None):
    predictions, labels = eval_pred

    ### reformat model output to enable calculation of standard metrics
    # split in chunks with predictions for each hypothesis for one unique premise
    def chunks(lst, n):  # Yield successive n-sized chunks from lst. https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
        for i in range(0, len(lst), n):
            yield lst[i:i + n]

    # for each chunk/premise, select the most likely hypothesis
    softmax = torch.nn.Softmax(dim=1)
    prediction_chunks_lst = list(chunks(predictions, len(set(label_text_alphabetical)) ))
    hypo_position_highest_prob = []
    for i, chunk in enumerate(prediction_chunks_lst):
        hypo_position_highest_prob.append(np.argmax(np.array(chunk)[:, 0]))  # only accesses the first column of the array, i.e. the entailment/true prediction logit of all hypos and takes the highest one

    label_chunks_lst = list(chunks(labels, len(set(label_text_alphabetical)) ))
    label_position_gold = []
    for chunk in label_chunks_lst:
        label_position_gold.append(np.argmin(chunk))  # argmin to detect the position of the 0 among the 1s

    print("Highest probability prediction per premise: ", hypo_position_highest_prob)
    print("Correct label per premise: ", label_position_gold)

    ### calculate standard metrics
    precision_macro, recall_macro, f1_macro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='macro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    precision_micro, recall_micro, f1_micro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='micro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    acc_balanced = balanced_accuracy_score(label_position_gold, hypo_position_highest_prob)
    acc_not_balanced = accuracy_score(label_position_gold, hypo_position_highest_prob)
    metrics = {'f1_macro': f1_macro,
               'f1_micro': f1_micro,
               'accuracy_balanced': acc_balanced,
               'accuracy_not_b': acc_not_balanced,
               #'precision_macro': precision_macro,
               #'recall_macro': recall_macro,
               #'precision_micro': precision_micro,
               #'recall_micro': recall_micro,
               #'label_gold_raw': label_position_gold,
               #'label_predicted_raw': hypo_position_highest_prob
               }
    print("Aggregate metrics: ", {key: metrics[key] for key in metrics if key not in ["label_gold_raw", "label_predicted_raw"]} )  # print metrics but without label lists
    print("Detailed metrics: ", classification_report(label_position_gold, hypo_position_highest_prob, labels=np.sort(pd.factorize(label_text_alphabetical, sort=True)[0]), target_names=label_text_alphabetical, sample_weight=None, digits=2, output_dict=True,
                                zero_division='warn'), "\n")
    return metrics

# Create alphabetically ordered list of the original dataset classes/labels
# This is necessary to be sure that the ordering of the test set labels and predictions is the same. Otherwise there is a risk that labels and predictions are in a different order and resulting metrics are wrong.
label_text_alphabetical = np.sort(df_train.label_text.unique())
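
The reformatting step inside the function can be followed on a toy example. This is a simplified pure-Python sketch of the same idea with made-up logits; `predicted_and_gold` is a hypothetical helper and omits the metric calculation.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

def predicted_and_gold(logits, labels, n_classes):
    """Collapse per-hypothesis NLI outputs back to one class index per text."""
    predicted, gold = [], []
    for logit_chunk, label_chunk in zip(chunks(logits, n_classes), chunks(labels, n_classes)):
        entail_logits = [row[0] for row in logit_chunk]  # column 0 = entailment logit
        predicted.append(entail_logits.index(max(entail_logits)))
        gold.append(label_chunk.index(0))  # position of the true hypothesis
    return predicted, gold

# Toy logits for 2 texts x 2 hypotheses, columns = [entailment, not-entailment]
logits = [[2.1, -1.0], [0.3, 0.5], [-0.7, 1.2], [1.5, -0.2]]
labels = [0, 1, 0, 1]
predicted, gold = predicted_and_gold(logits, labels, n_classes=2)
print(predicted, gold)  # [0, 1] [0, 0]
```

In this toy case the first text is classified correctly and the second is not, so standard accuracy on the collapsed lists would be 0.5.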


### Fine-tuning and evaluation  (IGNORE THIS AND PROCEED TO HP SEARCH)

Let's start fine-tuning the model!

If you get an 'out-of-memory' error, reduce the 'per_device_train_batch_size' to 8 or 4 in the TrainingArguments above and restart the runtime. If you don't restart your runtime (menu in the top left: 'Runtime' > 'Restart runtime') and rerun the entire script, the 'out-of-memory' error will probably not go away.

In [None]:
# training
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],  #.shard(index=1, num_shards=100),  # could shard data for faster testing https://huggingface.co/docs/datasets/processing.html#sharding-the-dataset-shard
    eval_dataset=dataset["test"],  #.shard(index=1, num_shards=100),
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
)

trainer.train()


The following columns in the training set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: label_nli_explicit, hypothesis, text_prepared. If label_nli_explicit, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 5362
  Num Epochs = 10
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 3360
You're using a DebertaV2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)) / torch.tensor(
  score += c2p_att / torch.tensor(scale, dtype=c2p_att.dtype)
  score += p2c_att

Step,Training Loss


KeyboardInterrupt: 

In [None]:
## Evaluate the fine-tuned model on the held-out test set
results = trainer.evaluate()

In [None]:
print(results)

## Hyperparameter Search

To increase performance, you can also conduct a hyperparameter search (hp-search), to try and find the best hyperparameters for your specific task and dataset. The trade-off is that hp-search is very compute intensive, but finding better hyperparameters for your task can increase performance. Make sure to conduct hp-search on a sub-set of the training set (i.e. validation set) and not the final test set to avoid data leakage of the test set before final testing.

Note that for small datasets, running the hp-search on only one train-validation split is not ideal. For datasets with fewer than around 2000 training data points, we recommend running the hp-search on two different random train-validation splits. We implemented this for our paper, but not in this notebook, as this would make the code harder to understand.

Documentation with more information on hp-search with Hugging Face Transformers is available [here](https://huggingface.co/docs/transformers/main/hpo_train).
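
The idea of using two different random train-validation splits can be sketched on plain index lists. This is an illustration under simplifying assumptions, not the implementation used for the paper; `stratified_split` is a hypothetical helper and the labels are toy data.

```python
import random
from collections import defaultdict

def stratified_split(labels, validation_share, seed):
    """Return (train_indices, validation_indices) with similar label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    train_idx, val_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * validation_share)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy NLI labels: 0 = entailment, 1 = not-entailment
toy_labels = [0] * 60 + [1] * 40
splits = [stratified_split(toy_labels, validation_share=0.4, seed=seed) for seed in (42, 43)]
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

One possible use is to run the hp-search once per split and average the objective for each hyperparameter configuration, so that a single lucky or unlucky split has less influence on the selected values.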

In [None]:
## train-validation split - test set should not be visible during hp-search
# https://huggingface.co/docs/datasets/v2.5.1/en/package_reference/main_classes#datasets.Dataset.train_test_split

# the ideal size of the validation set depends on the size of your training data. Each label should have at the very least a few dozen examples in the validation set (ideally several hundred)
validation_set_size = 0.4  # for a training data size of 1000 with 3 classes we use 40% of the training data for validating hyperparameters

# reformatting of label column to enable dataset stratification
from datasets import ClassLabel
new_features = dataset["train"].features.copy()
if len(model.config.label2id.keys()) == 2:  # for 2-class NLI model
  label_names = ["entailment", "not-entailment"]
elif len(model.config.label2id.keys()) == 3:  # for 3-class NLI model
  label_names = ["entailment", "neutral", "contradiction"]
else:  # fail early instead of hitting an undefined label_names below
  raise ValueError("Expected an NLI model with 2 or 3 classes.")
new_features['label'] = ClassLabel(names=label_names)
print(len(dataset['train']))
dataset = dataset.cast(new_features)
print(len(dataset['train']))

# train-validation split for hp-search
dataset_hp = dataset["train"].train_test_split(test_size=validation_set_size, seed=SEED_GLOBAL, shuffle=True, stratify_by_column="label")
print(dataset_hp)

5362


1000
DatasetDict({
    train: Dataset({
        features: ['label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 600
    })
    test: Dataset({
        features: ['label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 400
    })
})


In [None]:
## Reinitialize trainer for hp-search
# https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/10

def model_init():
  clean_memory()
  return AutoModelForSequenceClassification.from_pretrained(model_name).to(device)  # return_dict=True

trainer = Trainer(
    model_init=model_init,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset_hp["train"],
    eval_dataset=dataset_hp["test"],
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
);


loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--MoritzLaurer--mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/snapshots/b5113eb38ab63efdd7f280f8c144ea8b13f978ce/config.json
Model config DebertaV2Config {
  "_name_or_path": "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "neutral",
    "2": "contradiction"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "contradiction": 2,
    "entailment": 0,
    "neutral": 1
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_

**Define the hyperparameters you want to optimise**

For a detailed discussion of different hyperparameters, see the appendix of our paper.
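
As a rough illustration of how the number of epochs, the batch size and `warmup_ratio` interact, the sketch below computes the total number of optimisation steps and the number of warmup steps for one run. It assumes a single device, no gradient accumulation, and warmup steps rounded up; the function name `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_examples, batch_size, num_epochs, warmup_ratio):
    """Return (total optimisation steps, warmup steps) for one training run.

    Assumes a single device and no gradient accumulation.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# 600 training examples (the size of the hp-search training split) and batch size 8
for epochs in (6, 12, 16):
    total, warmup = training_schedule(600, 8, epochs, warmup_ratio=0.5)
    print(f"{epochs} epochs: {total} steps in total, {warmup} of them warmup")
```

With 600 training examples and a batch size of 8, a 6-epoch run has 450 steps and a 16-epoch run has 1200. This appears consistent with the training-loss tables in the logs of this notebook, which show rows at steps 500 and 1000 only for the 16-epoch trials.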

In [None]:
# we use Optuna for hp-search: https://optuna.readthedocs.io/en/stable/
def my_hp_space(trial):
    return {
        "learning_rate": trial.suggest_categorical("learning_rate", [9e-6, 2e-5, 4e-5]),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 4, 16, log=False, step=2),   # increasing the maximum number of epochs here could increase performance but will take (much) longer to train
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.1, 0.6, log=True),
        "per_device_train_batch_size": 8,  # lower this value in case of out-of-memory errors and restart the runtime
        #"per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32]),
    }


**Run HP search!**

Choose the number of hyperparameter configurations you want to test. In our experiments we found that after 10 to 15 trials with around 4 hyperparameters, performance is unlikely to increase meaningfully. 15 trials seems to be a safe value, but can take a while to run.
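
Optuna maximises one number per trial. If no `compute_objective` is passed to `Trainer.hyperparameter_search`, the default objective in Transformers returns the evaluation loss when no other metrics are reported and otherwise the sum of all reported metrics, which would explain trial values above 1 in the logs of this notebook. A minimal sketch of an objective that optimises a single metric instead follows; the key `eval_f1_macro` and the toy numbers are assumptions, so replace the key with one that your `compute_metrics` function actually returns (prefixed with `eval_`).

```python
def single_metric_objective(metrics, key="eval_f1_macro"):
    """Return one evaluation metric as the value the hp-search should maximise.

    `key` is a hypothetical metric name used for illustration only.
    """
    if key not in metrics:
        raise KeyError(f"'{key}' not found; available metrics: {sorted(metrics)}")
    return float(metrics[key])


# Toy metrics dictionary with made-up numbers, shaped like the output of trainer.evaluate()
example_metrics = {"eval_loss": 0.52, "eval_f1_macro": 0.61, "eval_accuracy": 0.83}
print(single_metric_objective(example_metrics))  # 0.61
```

The function could then be passed as `compute_objective=single_metric_objective` in the call to `trainer.hyperparameter_search`.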

In [None]:
import optuna

# number of different hp configurations to test
numer_of_trials = 16  # increasing this value can lead to better hyperparameters, but will take longer
# choose the sampler for sampling hp configurations
# n_startup_trials expects an integer, hence the floor division
optuna_sampler = optuna.samplers.TPESampler(seed=SEED_GLOBAL, consider_prior=True, prior_weight=1.0, consider_magic_clip=True, consider_endpoints=False, n_startup_trials=numer_of_trials // 2, n_ei_candidates=24, multivariate=False, group=False, warn_independent_sampling=True, constant_liar=False)  # https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.TPESampler.html#optuna.samplers.TPESampler

# Hugging Face Documentation: https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.hyperparameter_search
best_run = trainer.hyperparameter_search(
    n_trials=numer_of_trials,
    direction="maximize",
    hp_space=my_hp_space,
    backend='optuna',
    **{"sampler": optuna_sampler}
)

[I 2024-04-20 08:15:29,614] A new study created in memory with name: no-name-cf7b90f4-0a58-409a-b396-891c736318b7
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.13225317293225414}

Step,Training Loss
500,0.471




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32


[I 2024-04-20 08:27:26,149] Trial 0 finished with value: 2.293909355274401 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.13225317293225414}. Best is trial 0 with value: 2.293909355274401.
Trial: {'learning_rate': 4e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.3556211334079358}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.5302




[I 2024-04-20 08:40:28,785] Trial 1 finished with value: 2.1486278342177703 and parameters: {'learning_rate': 4e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.3556211334079358}. Best is trial 0 with value: 2.293909355274401.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.13851197621057668}


Highest probability prediction per premise:  [0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss




[I 2024-04-20 08:46:45,684] Trial 2 finished with value: 2.3451013814616757 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.13851197621057668}. Best is trial 2 with value: 2.3451013814616757.
Trial: {'learning_rate': 4e-05, 'num_train_epochs': 10, 'warmup_ratio': 0.16850792067367906}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.4172




[I 2024-04-20 09:03:28,248] Trial 3 finished with value: 2.3705419920478272 and parameters: {'learning_rate': 4e-05, 'num_train_epochs': 10, 'warmup_ratio': 0.16850792067367906}. Best is trial 3 with value: 2.3705419920478272.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 8, 'warmup_ratio': 0.22640782282355432}


Highest probability prediction per premise:  [0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.5836




[I 2024-04-20 09:16:03,995] Trial 4 finished with value: 2.413398066485753 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 8, 'warmup_ratio': 0.22640782282355432}. Best is trial 4 with value: 2.413398066485753.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.10867895322868468}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.5566




[I 2024-04-20 09:33:16,798] Trial 5 finished with value: 2.3792415974638086 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.10867895322868468}. Best is trial 4 with value: 2.413398066485753.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5641671230331442}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7597
1000,0.2687




[I 2024-04-20 09:50:38,125] Trial 6 finished with value: 2.464675941817558 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5641671230331442}. Best is trial 6 with value: 2.464675941817558.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.22004181237640907}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.6082




[I 2024-04-20 10:03:16,186] Trial 7 finished with value: 2.4005555555555556 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.22004181237640907}. Best is trial 6 with value: 2.464675941817558.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5681284201674024}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7601
1000,0.2712




[I 2024-04-20 10:20:36,885] Trial 8 finished with value: 2.5160238529718457 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5681284201674024}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5085111679683939}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7458
1000,0.2448




[I 2024-04-20 10:36:45,478] Trial 9 finished with value: 2.447576566951567 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5085111679683939}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.37971592370080687}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 10:40:42,283] Trial 10 finished with value: 2.3195338647670734 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.37971592370080687}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5655371299233337}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7599
1000,0.2712




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 10:56:04,170] Trial 11 finished with value: 2.477460253554301 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5655371299233337}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.38428515223315357}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7086
1000,0.199




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 11:13:44,137] Trial 12 finished with value: 2.383508133743863 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.38428515223315357}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 14, 'warmup_ratio': 0.5979965247422729}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7492
1000,0.2582




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 11:28:45,929] Trial 13 finished with value: 2.3963174438906254 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 14, 'warmup_ratio': 0.5979965247422729}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 4e-05, 'num_train_epochs': 14, 'warmup_ratio': 0.29821253572590256}


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.5263
1000,0.038




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 11:44:04,508] Trial 14 finished with value: 2.3365320738386446 and parameters: {'learning_rate': 4e-05, 'num_train_epochs': 14, 'warmup_ratio': 0.29821253572590256}. Best is trial 8 with value: 2.5160238529718457.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 14, 'warmup_ratio': 0.4551583046884898}


Highest probability prediction per premise:  [0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0


Step,Training Loss
500,0.7132
1000,0.207




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-20 12:01:45,132] Trial 15 finished with value: 2.3664659656264133 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 14, 'warmup_ratio': 0.4551583046884898}. Best is trial 8 with value: 2.5160238529718457.


Highest probability prediction per premise:  [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0]
Correct label per premise:  [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0

In [None]:
# show best hyperparameters based on hp-search
print(best_run)

BestRun(run_id='8', objective=2.5160238529718457, hyperparameters={'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5681284201674024})
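
An objective of about 2.52 cannot be a single F1 or accuracy score, since those are bounded by 1. One plausible explanation, assuming the search was run without a custom `compute_objective` (the search call is not shown in this section), is that the Hugging Face `Trainer` falls back to its default objective, which sums every reported evaluation metric except the loss, the epoch counter, and speed statistics. The sketch below re-implements that rule in plain Python. The function name `sum_of_metrics_objective` is hypothetical, and the input values are the test-set metrics printed further down in this notebook, used only as sample input.

```python
def sum_of_metrics_objective(metrics):
    """Sum all metrics that are not loss, epoch, or speed statistics.

    Illustrative re-implementation (an assumption about how the objective
    in this notebook was computed, not the notebook's own code).
    """
    kept = {
        name: value
        for name, value in metrics.items()
        if name not in ("eval_loss", "epoch")
        and not name.endswith(("_runtime", "_per_second"))
    }
    # Fall back to the loss when no other metric is reported
    return sum(kept.values()) if kept else metrics.get("eval_loss")


# Sample input: the evaluation dictionary printed later in this notebook
example_metrics = {
    "eval_loss": 1.7399128675460815,
    "eval_f1_macro": 0.7208289712860098,
    "eval_f1_micro": 0.7487500000000001,
    "eval_accuracy_balanced": 0.7222571628232006,
    "eval_accuracy_not_b": 0.74875,
    "eval_runtime": 42.1529,
    "eval_samples_per_second": 37.957,
    "eval_steps_per_second": 1.186,
    "epoch": 16.0,
}

objective = sum_of_metrics_objective(example_metrics)
print(round(objective, 4))  # 2.9406
```

Under this assumption, dividing the objective by the number of summed metrics (four here) gives a rough average score per metric, which is easier to compare across runs than the raw sum.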


**Training Time with optimised hyperparameters!**

Here we can use the original train and test set again.

In [None]:
# update the training arguments with the best hyperparameters
for k, v in best_run.hyperparameters.items():
    setattr(train_args, k, v)
print("\n", train_args)

# hyperparameter search with the Hugging Face Trainer caused errors with FP16;
# uncomment the lines below to disable FP16 if those errors occur
#setattr(train_args, "fp16", False)
#setattr(train_args, "fp16_full_eval", False)


 TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_n

In [None]:
dataset = datasets.DatasetDict({
    "train": datasets.Dataset.from_pandas(df_train_formatted),
    "test": datasets.Dataset.from_pandas(df_test_formatted)
})

# tokenize each text (premise) together with its hypothesis as one NLI input pair
def tokenize_nli_format(examples):
    # max_length=512 can be reduced (e.g. to 256) to increase speed, but long texts will be cut off
    return tokenizer(examples["text_prepared"], examples["hypothesis"], truncation=True, max_length=512)

dataset["train"] = dataset["train"].map(tokenize_nli_format, batched=True)
dataset["test"] = dataset["test"].map(tokenize_nli_format, batched=True)

# remove columns that are not needed for model training
dataset = dataset.remove_columns(['label_text', 'id', 'subreddit', 'thread_title', 'convo', 'random_speaker', 'text', 'Parasocial_Language', '__index_level_0__'])

  0%|          | 0/6 [00:00<?, ?ba/s]

  0%|          | 0/2 [00:00<?, ?ba/s]

In [None]:
# Training
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],  #.shard(index=1, num_shards=100),  # https://huggingface.co/docs/datasets/processing.html#sharding-the-dataset-shard
    eval_dataset=dataset["test"],  #.shard(index=1, num_shards=100),
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
)

trainer.train()


The following columns in the training set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 5362
  Num Epochs = 16
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 10736
  attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)) / torch.tensor(
  score += c2p_att / torch.tensor(scale, dtype=c2p_att.dtype)
  score += p2c_att / torch.tensor(scale, dtype=p2c_att.dtype)


Step,Training Loss
500,1.128
1000,0.658
1500,0.6152
2000,0.5653
2500,0.5397
3000,0.5057
3500,0.4668
4000,0.4445
4500,0.3857
5000,0.3702




Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=10736, training_loss=0.38833518817065904, metrics={'train_runtime': 16177.0341, 'train_samples_per_second': 5.303, 'train_steps_per_second': 0.664, 'total_flos': 6680210777143020.0, 'train_loss': 0.38833518817065904, 'epoch': 16.0})
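
The step count and throughput in the log can be reproduced from the other numbers it reports. The short check below is a sketch that assumes a single device, no gradient accumulation, and `dataloader_drop_last=False`, all of which match the values printed above. The variable names are illustrative.

```python
import math

# Values copied from the training log above
num_examples = 5362
batch_size = 8
num_epochs = 16
train_runtime_s = 16177.0341

# The last, partially filled batch still counts as a step
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

samples_per_second = num_examples * num_epochs / train_runtime_s
runtime_hours = train_runtime_s / 3600

print(steps_per_epoch)               # 671
print(total_steps)                   # 10736
print(round(samples_per_second, 3))  # 5.303
print(round(runtime_hours, 2))       # 4.49
```

The result matches `Total optimization steps = 10736` and `train_samples_per_second` in the output, and shows that the full run took roughly four and a half hours.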

In [None]:
## Evaluate the fine-tuned model on the held-out test set
results = trainer.evaluate()


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: hypothesis, label_nli_explicit, text_prepared. If hypothesis, label_nli_explicit, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 1600
  Batch size = 32


Highest probability prediction per premise:  [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 
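
The "Highest probability prediction per premise" list is produced by the `compute_metrics_nli_binary` helper, which is defined earlier in the notebook and not shown in this section. The sketch below illustrates the general idea under stated assumptions: each text (premise) is paired with one hypothesis per class, the rows for a premise are adjacent, and the predicted class is the hypothesis with the highest entailment logit (index 0 in the model config printed above). The function name and the toy logits are hypothetical.

```python
def predict_per_premise(logits, num_classes, entailment_index=0):
    """Pick, for each premise, the hypothesis with the highest entailment logit.

    `logits` holds one row per (premise, hypothesis) pair, grouped so that
    the `num_classes` rows belonging to one premise are adjacent.
    """
    predictions = []
    for start in range(0, len(logits), num_classes):
        chunk = logits[start:start + num_classes]
        entail_scores = [row[entailment_index] for row in chunk]
        predictions.append(entail_scores.index(max(entail_scores)))
    return predictions


# Toy logits in the order (entailment, neutral, contradiction)
toy_logits = [
    [2.1, 0.3, -1.0],   # premise A, hypothesis for class 0
    [-0.5, 0.2, 1.4],   # premise A, hypothesis for class 1
    [-1.2, 0.1, 2.0],   # premise B, hypothesis for class 0
    [1.7, 0.0, -0.9],   # premise B, hypothesis for class 1
]

preds = predict_per_premise(toy_logits, num_classes=2)
print(preds)  # [0, 1]
```

If the helper works this way, the 1600 evaluation examples reported above would correspond to 800 texts with two hypotheses each.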

In [None]:
print(results)

{'eval_loss': 1.7399128675460815, 'eval_f1_macro': 0.7208289712860098, 'eval_f1_micro': 0.7487500000000001, 'eval_accuracy_balanced': 0.7222571628232006, 'eval_accuracy_not_b': 0.74875, 'eval_runtime': 42.1529, 'eval_samples_per_second': 37.957, 'eval_steps_per_second': 1.186, 'epoch': 16.0}
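
To make the metric names easier to interpret, the following sketch computes accuracy, balanced accuracy, and macro F1 for binary labels in plain Python. It is not the notebook's `compute_metrics_nli_binary` helper, and the label lists are toy values. Note that for single-label classification micro F1 equals plain accuracy, which is consistent with `eval_f1_micro` and `eval_accuracy_not_b` both being 0.74875 above.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, balanced accuracy, and macro F1 for 0/1 labels."""
    f1_scores, recalls = [], []
    for cls in (0, 1):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        # Balanced accuracy is the mean of the per-class recalls
        "accuracy_balanced": sum(recalls) / len(recalls),
        # Macro F1 is the unweighted mean of the per-class F1 scores
        "f1_macro": sum(f1_scores) / len(f1_scores),
    }


# Toy labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because macro F1 and balanced accuracy weight both classes equally, they are more informative than plain accuracy when one class is much rarer than the other.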


## Save and load your fine-tuned model

This section provides code for saving the model to disk and for uploading it to the Hugging Face Hub.

In [None]:
## to save to Google Drive, first mount it (e.g. drive.mount("/content/drive")); this cell only changes the working directory on the Colab runtime
from google.colab import drive
import os
#drive.flush_and_unmount()

# insert the path where you want to save the model
os.chdir("/content/results/")
print(os.getcwd())


/content/results


In [None]:
# note: this path is relative, so it is resolved against the current working directory
model_custom_path = "content/results2"
trainer.save_model(output_dir=model_custom_path)

Saving model checkpoint to content/results2
Configuration saved in content/results2/config.json
Model weights saved in content/results2/pytorch_model.bin
tokenizer config file saved in content/results2/tokenizer_config.json
Special tokens file saved in content/results2/special_tokens_map.json
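
Because `model_custom_path` has no leading slash, it is interpreted relative to the current working directory. The check below is a small sketch of where the files end up, assuming the working directory is still `/content/results` as printed by the earlier cell.

```python
import posixpath

# Working directory printed by os.getcwd() in the earlier cell
cwd = "/content/results"
# Relative path passed to trainer.save_model
model_custom_path = "content/results2"

# A relative path is joined onto the working directory
resolved = posixpath.normpath(posixpath.join(cwd, model_custom_path))
print(resolved)  # /content/results/content/results2
```

Using an absolute path such as `/content/results2` would avoid the nested directory, if that is what is intended.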


In [None]:
### Push to hub
# install necessary dependencies
!sudo apt-get install git-lfs
!huggingface-cli login

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    

In [None]:
# load the fine-tuned model from disk
model = AutoModelForSequenceClassification.from_pretrained(model_custom_path)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)  # we load the tokenizer from the original BERT-NLI model

# https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.push_to_hub
repo_id = 'mellonn/Parasocial-Language-Classifier'
model.push_to_hub(repo_id=repo_id)
tokenizer.push_to_hub(repo_id=repo_id)


loading configuration file content/results2/config.json
Model config DebertaV2Config {
  "_name_or_path": "content/results2",
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "neutral",
    "2": "contradiction"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "contradiction": 2,
    "entailment": 0,
    "neutral": 1
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_a

pytorch_model.bin:   0%|          | 0.00/1.12G [00:00<?, ?B/s]

tokenizer config file saved in /tmp/tmpbcjk0euh/tokenizer_config.json
Special tokens file saved in /tmp/tmpbcjk0euh/special_tokens_map.json
Uploading the following files to mellonn/Parasocial-Language-Classifier: spm.model,tokenizer_config.json,tokenizer.json,special_tokens_map.json,added_tokens.json


Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

spm.model:   0%|          | 0.00/4.31M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/16.3M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/mellonn/Parasocial-Language-Classifier/commit/3da78deb64edd36b82438beeebee939a250477d0', commit_message='Upload tokenizer', commit_description='', oid='3da78deb64edd36b82438beeebee939a250477d0', pr_url=None, pr_revision=None, pr_num=None)