## Code to train the BERT-NLI Outgroup Hostility model
### **❗This notebook was copied from the Demo Notebook for Less Annotating, More Classifying. Most helpful comments in the code came from the Demo Notebook. The credit for the code and most comments belongs to the authors of this paper: https://doi.org/10.1017/pan.2023.20.❗**

Read more about the BERT-NLI approach used here: ["Less Annotating, More Classifying: Addressing the Data Scarcity Issue of Supervised Machine Learning with Deep Transfer Learning and BERT-NLI"](https://github.com/MoritzLaurer/less-annotating-with-bert-nli) by Moritz Laurer, Wouter van Atteveldt, Andreu Casas, Kasper Welbers.


## Activate a GPU runtime

To run this notebook on a GPU, click on "Runtime" > "Change runtime type" > select "GPU" in the menu bar at the top left. Training a Transformer is much faster on a GPU. Given Google's usage limits for GPUs, it is advisable to first test your non-training code on a CPU (Hardware accelerator "None" instead of GPU) and only use the GPU once you know that everything is working.

## Install relevant packages

In [None]:
!pip install transformers[sentencepiece]==4.23
!pip install datasets==2.6
!pip install optuna==3.0


Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 3108, in _dep_map
    return self.__dep_map
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pkg_resources/__init__.py", line 2901, in __getattr__
    raise AttributeError(attr)
AttributeError: _DistInfoDistribution__dep_map

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
    status = run_func(*args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/req_command.py", line 242, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/commands/install.py", line 441, in run
    conflicts = self._determine_conflicts(to_install)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/commands/install.py", line 

In [None]:
## Load general packages
# some more specialised packages are loaded in each sub section
import pandas as pd
import numpy as np
from google.colab.data_table import DataTable

In [None]:
df3 = pd.read_csv("sampled_utterances_final_BIN.csv") #upload the file locally and set the path on the left
df3.head()

Unnamed: 0,id,subreddit,thread_title,convo,random_speaker,text,Parasocial_Language
0,dywzb12,NinjasHyper,Name of the song,Dj1312: Hi !\n\nCan someone say me the name of...,Dj1312,9,0
1,e2q0ijd,NinjasHyper,"So sad that ninja died if ligma, rip ninja he ...",DestroyerTheGuy: Wait is he dead? mdheintz21: ...,mdheintz21,Ya ligma got him so sad,1
2,duf5mqw,NinjasHyper,Raising 100k For Charity!! - Fortnite Battle R...,Potato_Private: https://www.twitch.tv/00flour,Potato_Private,https://www.twitch.tv/00flour\n,0
3,9jqg7i,NinjasHyper,5 ads in one YouTube video?,Fartmaster50000: I used to love watching uploa...,Fartmaster50000,I used to love watching uploads of Ninja's gam...,1
4,dxgam05,NinjasHyper,Leviathan Solo Squads! - Fortnite Battle Royal...,Ricmaniac: i love the way ninja plays... but t...,Ricmaniac,i love the way ninja plays... but the sellout ...,1


In [None]:
# set random seed for reproducibility
SEED_GLOBAL = 42
np.random.seed(SEED_GLOBAL)

## Download and prepare data

In [None]:
df_train = pd.read_csv("/content/drive/MyDrive/Dissertation/sample/sampled_utterances_final_BIN.csv")

Unnamed: 0,id,subreddit,thread_title,convo,random_speaker,text,Parasocial_Language
0,dywzb12,NinjasHyper,Name of the song,Dj1312: Hi !\n\nCan someone say me the name of...,Dj1312,9,0
1,e2q0ijd,NinjasHyper,"So sad that ninja died if ligma, rip ninja he ...",DestroyerTheGuy: Wait is he dead? mdheintz21: ...,mdheintz21,Ya ligma got him so sad,1
2,duf5mqw,NinjasHyper,Raising 100k For Charity!! - Fortnite Battle R...,Potato_Private: https://www.twitch.tv/00flour,Potato_Private,https://www.twitch.tv/00flour\n,0
3,9jqg7i,NinjasHyper,5 ads in one YouTube video?,Fartmaster50000: I used to love watching uploa...,Fartmaster50000,I used to love watching uploads of Ninja's gam...,1
4,dxgam05,NinjasHyper,Leviathan Solo Squads! - Fortnite Battle Royal...,Ricmaniac: i love the way ninja plays... but t...,Ricmaniac,i love the way ninja plays... but the sellout ...,1


In [None]:
df_train.columns

Index(['id', 'subreddit', 'thread_title', 'convo', 'random_speaker', 'text',
       'Parasocial_Language'],
      dtype='object')

In [None]:
from sklearn.model_selection import train_test_split
sample_size = 2000
df_train, df_test = train_test_split(df_train[:sample_size], test_size=.2, random_state=SEED_GLOBAL, stratify=df_train['Parasocial_Language'][:sample_size])
# note: because we stratified the data by the training label here, the solidarity and hostility test sets are not the same
print("Length of training and test sets after sampling: ", len(df_train), " (train) ", len(df_test), " (test).")

Length of training and test sets after sampling:  1600  (train)  400  (test).


In [None]:
df_train['label_text'] = df_train.Parasocial_Language.apply(lambda x: 'Parasocial Language' if x else "Other")
df_test['label_text'] = df_test.Parasocial_Language.apply(lambda x: 'Parasocial Language' if x else "Other")
df_train['label'] = df_train.Parasocial_Language.apply(lambda x: 1 if x else 0)
df_test['label'] = df_test.Parasocial_Language.apply(lambda x: 1 if x else 0)

In [None]:
#df_test.to_csv("/content/drive/MyDrive/Dissertation/ Exporting Parasocial Language Test Set")

In [None]:
## inspect the data
# label distribution train set
print("Train set label distribution: ", df_train.label_text.value_counts())
# label distribution test set
print("Test set label distribution: ", df_test.label_text.value_counts())

# full training data table
DataTable(df_train, num_rows_per_page=5)

#Train set label distribution:  Other                                   1290
#Hostility towards Russia or Russians     310
#Name: label_text, dtype: int64
#Test set label distribution:  Other                                   322
#Hostility towards Russia or Russians     78
#Name: label_text, dtype: int64

Train set label distribution:  Other                  982
Parasocial Language    618
Name: label_text, dtype: int64
Test set label distribution:  Other                  246
Parasocial Language    154
Name: label_text, dtype: int64


Unnamed: 0,id,subreddit,thread_title,convo,random_speaker,text,Parasocial_Language,label_text,label
1936,dplawij,lilypichu,"【LoL】 LCS Ready! ~ ft. Doublelift, Marc Me...",27brian: i fucked up the title sorry,27brian,i fucked up the title sorry,0,Other,0
1296,dtntjc9,Amouranth,Pokemon Painting is turning out GREAT!,D3vil_0: Its SO DAMN GOOD. iSuckMore: No wonde...,iSuckMore,No wonder it's posted by dyeoxyde. *rolls eyes...,1,Parasocial Language,1
1308,dqro967,Amouranth,Christmas Harley Bunny Hops,anti-gif-bot: [mp4 link](https://g.redditmedia...,D3vil_0,Boing Boing Boing Boing Boing Boing LUL,0,Other,0
399,dyfewj6,DanTDM,Dan's new videos kinda suck...,keenonthedaywalker: I was a fan ever since his...,keenonthedaywalker,Welcome friend. I do really hate complaining a...,1,Parasocial Language,1
853,9qex2t,LazarBeam,watch this,lazarbeamfan1234: fkdiofhieqaskhicfbuowvtruiwj...,lazarbeamfan1234,fkdiofhieqaskhicfbuowvtruiwjosu,0,Other,0
...,...,...,...,...,...,...,...,...,...
1025,dvv3tly,Pokimane,SKRATTAR DU FÖLORAR DU MANNEEEN!!!,Wantya: Our goddess is now a living meme\n\nEd...,breadtheripper,"""Reuploaded due to network claim of streamer c...",1,Parasocial Language,1
1991,e21o3z0,lilypichu,Lily's Voice「 Edit 」- League of Legends,Command-0: WAIT WHEN DID SHE SING THIS IM SCRE...,Rennoku,ITS AT THE END OF HER KOREA VLOG!! :D,0,Other,0
1073,7ii80z,Pokimane,Poki's twin flame,"TwinFlame2: Hi, my name is Luis. Spirit is tel...",TwinFlame2,"Hi, my name is Luis. Spirit is telling me that...",0,Other,0
1612,cr18tas,KittyKatGaming,Super glad Arin helped Suzy in todays Huniepop...,WoodPlanking: She learned gifts and to get mor...,Liquidshredder,"Ya, and it's always nice to see Suzy and Arin ...",1,Parasocial Language,1




**If you want to run the notebook on your own dataset:**

You can load your own training and test data above to fine-tune your own BERT-NLI model. Your own dataframe only needs two columns to be compatible with the code below: (1) a "label_text" column with the label texts of your classes, (2) a "text_prepared" column with the texts for training (you might need to delete/adapt the text preparation code cell below for your dataset).

## Create NLI hypotheses

**Formulate a hypothesis, which verbalises the classes/task you are interested in.**

For this example, we base our task on the Manifesto Project codebook:  https://manifesto-project.wzb.eu/coding_schemes/mp_v4

We store the hypothesis in a dictionary: The keys of the dictionary should be the names of the respective label from the training dataframe ('label_text' column); the values of the dictionary should be your manually formulated hypothesis linked to the respective labels.

In [None]:
# dictionary mapping the dataset's label to manually formulated hypotheses based on the codebook
hypothesis_label_dic = {
    "Other": "The quote does not indicate the presence of a parasocial relationship or interaction with a gaming celebrity",
    "Parasocial Language": "The quote contains language which indicates the presence of a parasocial interaction or relationship with the target gaming celebrity, for example including language resemblant of PSI and ESPI scale items and language indicating a sense of identification, wishful identification and relationship closeness.",
}
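
Because the dictionary keys must match the values of the 'label_text' column exactly, a quick consistency check can catch typos before training. The following is a minimal sketch, not part of the original demo: the helper name `check_hypotheses` and the toy labels and placeholder hypotheses are assumptions made for illustration.

```python
def check_hypotheses(label_texts, hypo_label_dic):
    """Compare the labels found in the data with the keys of the hypothesis dictionary.

    Returns (missing, unused): labels without a hypothesis, and
    hypothesis keys that never occur as a label in the data.
    """
    labels = set(label_texts)
    keys = set(hypo_label_dic)
    missing = sorted(labels - keys)
    unused = sorted(keys - labels)
    return missing, unused


# toy data for illustration only (hypothetical, not the real dataset)
example_labels = ["Other", "Parasocial Language", "Other"]
example_hypotheses = {
    "Other": "placeholder hypothesis for the 'Other' class",
    "Parasocial Language": "placeholder hypothesis for the 'Parasocial Language' class",
}

missing, unused = check_hypotheses(example_labels, example_hypotheses)
print("Labels without hypothesis:", missing)
print("Hypotheses without label:", unused)
```

With the real data, the first argument would presumably be `df_train.label_text` and the second `hypothesis_label_dic`; both lists should come back empty.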

**Prepare the input text**

1.) We prepare the target texts so that they fit the hypothesis more naturally. Here we simply wrap each target text into the string ' The quote: "{target_text}" - end of the quote. '

2.) Surrounding the target text with its preceding and following sentence systematically increases performance. Note that the code cell below does not add this context: it only replaces URLs with a `<url>` placeholder and applies the wrapper from step 1.


In [None]:
import re
# replace URLs with a <url> placeholder, then wrap each text in the quote template
df_train["text_prepared"] = 'The quote: "' + df_train.text.apply(lambda x: re.sub(r"http\S+", "<url>", str(x))) + '" - end of the quote.'
df_test["text_prepared"] = 'The quote: "' + df_test.text.apply(lambda x: re.sub(r"http\S+", "<url>", str(x))) + '" - end of the quote.'
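
To see what this preparation does to a single text, here is a minimal sketch that applies the same regular expression and wrapper to one string. The function name `prepare_text` and the example sentence are assumptions made for illustration; they are not part of the original notebook.

```python
import re


def prepare_text(text):
    """Replace URLs with a <url> placeholder and wrap the text in the quote template."""
    # str() mirrors the cell above: non-string values (e.g. numbers or NaN) are cast to text first
    cleaned = re.sub(r"http\S+", "<url>", str(text))
    return 'The quote: "' + cleaned + '" - end of the quote.'


example = prepare_text("watch https://example.com now")
print(example)
```

Note that `http\S+` stops at the first whitespace character, so a trailing newline after a URL is kept inside the quote.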


## Format the training and test datasets for NLI classification


**Format the training data**

1.) For each text with a specific class (label), the corresponding class-hypothesis needs to be added in the same row with the label 'true' (also expressed with the numeric label value 0).

2.) Adding 'false' examples: The NLI task consists of predicting whether a hypothesis is true or false given a context.
If we only give 'true' hypothesis-context pairs to the algorithm, it will not learn the 'false' class properly.
For each text, we therefore also add a row where the text is matched with a random wrong class label and give it the NLI label 'false' (also expressed with the numeric label value 1). This increases the training data by up to 2x.

See the table below for the concrete format the training data takes after this pre-processing step.
Note that NLI can be formulated as a 3-class (entailment/neutral/contradiction) or 2-class (entailment/not-entailment) task. Both can be used here. We use the 2-class variant.
Note that the words entailment/neutral/contradiction and true/neutral/false are used interchangeably here. Both terminologies are used in the literature and coding instructions.


In [None]:
## function for reformatting the train set
def format_nli_trainset(df_train=None, hypo_label_dic=None, random_seed=42):
  print(f"Length of df_train before formatting step: {len(df_train)}.")
  length_original_data_train = len(df_train)

  df_train_lst = []
  for label_text, hypothesis in hypo_label_dic.items():
    ## entailment
    df_train_step = df_train[df_train.label_text == label_text].copy(deep=True)
    df_train_step["hypothesis"] = [hypothesis] * len(df_train_step)
    df_train_step["label"] = [0] * len(df_train_step)
    ## not_entailment
    df_train_step_not_entail = df_train[df_train.label_text != label_text].copy(deep=True)
    df_train_step_not_entail = df_train_step_not_entail.sample(n=min(len(df_train_step), len(df_train_step_not_entail)), random_state=random_seed)
    df_train_step_not_entail["hypothesis"] = [hypothesis] * len(df_train_step_not_entail)
    df_train_step_not_entail["label"] = [1] * len(df_train_step_not_entail)
    # append
    df_train_lst.append(pd.concat([df_train_step, df_train_step_not_entail]))
  df_train = pd.concat(df_train_lst)

  # shuffle
  df_train = df_train.sample(frac=1, random_state=random_seed)
  df_train["label"] = df_train.label.apply(int)
  df_train["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_train["label"]]  # adding this just to simplify readability

  print(f"After adding not_entailment training examples, the training data was augmented to {len(df_train)} texts.")
  print(f"Max augmentation could be: len(df_train) * 2 = {length_original_data_train*2}. It can also be lower, if there are more entail examples than not-entail for a majority class.")

  return df_train.copy(deep=True)


df_train_formatted = format_nli_trainset(df_train=df_train, hypo_label_dic=hypothesis_label_dic, random_seed=SEED_GLOBAL)

Length of df_train before formatting step: 1600.
After adding not_entailment training examples, the training data was augmented to 2836 texts.
Max augmentation could be: len(df_train) * 2 = 3200. It can also be lower, if there are more entail examples than not-entail for a majority class.
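
The figure of 2836 rows can be reproduced by hand from the label distribution of the training set (982 'Other', 618 'Parasocial Language'). The sketch below only redoes the arithmetic of the sampling rule in `format_nli_trainset`; the helper name `nli_trainset_size` is an assumption made for illustration.

```python
def nli_trainset_size(class_counts):
    """Number of rows produced by the entail / not-entail formatting step.

    For each class, all of its texts are kept as 'true' examples, and an equally
    sized sample of texts from the other classes (capped by how many exist) is
    added as 'not-true' examples for the same hypothesis.
    """
    total_texts = sum(class_counts.values())
    total_rows = 0
    for label, n_entail in class_counts.items():
        n_other = total_texts - n_entail
        n_not_entail = min(n_entail, n_other)
        total_rows += n_entail + n_not_entail
    return total_rows


counts = {"Other": 982, "Parasocial Language": 618}
print(nli_trainset_size(counts))  # 982 + 618 for "Other", 618 + 618 for "Parasocial Language" -> 2836
```

The majority class ('Other') cannot be matched with 982 not-true examples because only 618 texts belong to another class, which is why the result stays below the maximum of 3200.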


**Inspect reformatted training dataset**

Label 0 means that the hypothesis is 'true', label 1 means that the hypothesis is 'not-true'.


In [None]:
DataTable(df_train_formatted[["label", "label_nli_explicit", "hypothesis", "text_prepared"]], num_rows_per_page=5)

Unnamed: 0,label,label_nli_explicit,hypothesis,text_prepared
43,0,True,The quote does not indicate the presence of a ...,"The quote: ""It’s “Hello” by Galantis"" - ..."
1588,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""and fuck off that cuck idiot named..."
1425,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""I want to put it in her raw from b..."
728,1,Not-True,The quote contains language which indicates th...,"The quote: ""wow, you did. Thanks man!"" - end o..."
78,0,True,The quote contains language which indicates th...,"The quote: ""Love this ninja "" - end of the quote."
...,...,...,...,...
1072,0,True,The quote contains language which indicates th...,"The quote: ""uSe YoUr BrAiN a LiTtLe."" - end of..."
378,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""Feel free to subscribe and share i..."
873,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""Can I please play Fortnite with yo..."
1085,1,Not-True,The quote does not indicate the presence of a ...,"The quote: ""She posted a video of her being in..."


**Format the test set**

To know which class-hypothesis is true for a specific text, we need to test every possible class-hypothesis for each text. We therefore multiply the rows/texts in the test set by the number of hypotheses and pair each text with all possible hypotheses. The table below shows what the reformatted test set looks like.

In [None]:
## function for reformatting the test set
def format_nli_testset(df_test=None, hypo_label_dic=None):
  ## explode test dataset for N hypotheses
  hypothesis_lst = [value for key, value in hypo_label_dic.items()]
  print("Number of hypotheses/classes: ", len(hypothesis_lst))

  # label lists with 0 at alphabetical position of their true hypo, 1 for not-true hypos
  label_text_label_dic_explode = {}
  for key, value in hypo_label_dic.items():
    label_lst = [0 if value == hypo else 1 for hypo in hypothesis_lst]
    label_text_label_dic_explode[key] = label_lst

  df_test["label"] = df_test.label_text.map(label_text_label_dic_explode)
  df_test["hypothesis"] = [hypothesis_lst] * len(df_test)
  print(f"Original test set size: {len(df_test)}")

  # explode dataset to have K-1 additional rows with not_entail label and K-1 other hypotheses
  # ! after exploding, cannot sample anymore, because distorts the order to true label values, which needs to be preserved for evaluation code
  df_test = df_test.explode(["hypothesis", "label"])  # multi-column explode requires pd.__version__ >= '1.3.0'
  print(f"Test set size for NLI classification: {len(df_test)}\n")

  df_test["label_nli_explicit"] = ["True" if label == 0 else "Not-True" for label in df_test["label"]]  # adding this just to simplify readability

  return df_test.copy(deep=True)


df_test_formatted = format_nli_testset(df_test=df_test, hypo_label_dic=hypothesis_label_dic)

Number of hypotheses/classes:  2
Original test set size: 400
Test set size for NLI classification: 800
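
The pairing logic can be illustrated without pandas. The following is a simplified sketch in plain Python, not the notebook's implementation (which uses `DataFrame.explode`); the function name `pair_with_all_hypotheses` and the toy texts and hypotheses are assumptions made for illustration.

```python
def pair_with_all_hypotheses(rows, hypo_label_dic):
    """Pair every (text, label_text) row with every hypothesis.

    The hypothesis belonging to the row's own label gets NLI label 0 ('true'),
    all other hypotheses get NLI label 1 ('not-true'). The hypotheses keep the
    order of the dictionary, so each text yields one contiguous block of rows.
    """
    paired = []
    for text, label_text in rows:
        for key, hypothesis in hypo_label_dic.items():
            paired.append({
                "text": text,
                "hypothesis": hypothesis,
                "label": 0 if key == label_text else 1,
            })
    return paired


# toy data for illustration only (hypothetical)
toy_hypotheses = {"Other": "hypothesis A", "Parasocial Language": "hypothesis B"}
toy_rows = [("first text", "Other"), ("second text", "Parasocial Language")]

toy_test = pair_with_all_hypotheses(toy_rows, toy_hypotheses)
for row in toy_test:
    print(row)
```

Two texts and two hypotheses give four rows, which matches the growth from 400 to 800 rows reported above. The block order per text matters because the evaluation code later splits the predictions into chunks of this size.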



**Inspect the reformatted test dataset**


In [None]:
DataTable(df_test_formatted[["label", "label_nli_explicit", "hypothesis", "text_prepared"]].sort_values(["text_prepared", "hypothesis"]), num_rows_per_page=6, max_rows=10_000)

Unnamed: 0,label,label_nli_explicit,hypothesis,text_prepared
1438,1,Not-True,The quote contains language which indicates th...,"The quote: "" Very nice tush"" - end of the quote."
1438,0,True,The quote does not indicate the presence of a ...,"The quote: "" Very nice tush"" - end of the quote."
874,1,Not-True,The quote contains language which indicates th...,"The quote: ""&amp;#x200B;\n\n<url>"" - end of th..."
874,0,True,The quote does not indicate the presence of a ...,"The quote: ""&amp;#x200B;\n\n<url>"" - end of th..."
990,1,Not-True,The quote contains language which indicates th...,"The quote: ""&amp;#x200B;\n\n[SMASH LIKE !!!!](..."
...,...,...,...,...
1210,0,True,The quote does not indicate the presence of a ...,"The quote: ""you're very talented "" - end of th..."
1538,1,Not-True,The quote contains language which indicates th...,"The quote: ""youre a good guy"" - end of the quote."
1538,0,True,The quote does not indicate the presence of a ...,"The quote: ""youre a good guy"" - end of the quote."
1122,1,Not-True,The quote contains language which indicates th...,"The quote: ""😀😀🎧👍🍎"" - end of the..."


## Fine-tuning

We use [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) for loading and training our model. They provide great documentation and also a very good [course](https://huggingface.co/course/chapter1/1) on how to use Transformers.

**Loading an NLI model**

You can use any NLI model on the Hugging Face Hub. For normal English use-cases, we recommend this [base-size model](https://huggingface.co/MoritzLaurer/DeBERTa-v3-base-mnli-fever-docnli-ling-2c); for multilingual/non-English use-cases, we recommend this [multilingual model](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7); for best performance in English (but high compute and memory requirements) we recommend this [large model](https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli).


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

## load the BERT-NLI model and its tokenizer
# you can choose any of the NLI models here: https://huggingface.co/MoritzLaurer
model_name = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# use GPU (cuda) if available, otherwise use CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
model.to(device);


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Device: cuda


**Tokenize data**

In [None]:
# convert pandas dataframes to Hugging Face dataset object to facilitate pre-processing
import datasets

dataset = datasets.DatasetDict({
    "train": datasets.Dataset.from_pandas(df_train_formatted),
    "test": datasets.Dataset.from_pandas(df_test_formatted)
})

# tokenize
def tokenize_nli_format(examples):
  return tokenizer(examples["text_prepared"], examples["hypothesis"], truncation=True, max_length=512)  #512 max_length can be reduced to e.g. 256 to increase speed, but long texts will be cut off
dataset["train"] = dataset["train"].map(tokenize_nli_format, batched=True)
dataset["test"] = dataset["test"].map(tokenize_nli_format, batched=True)

# remove unnecessary columns for model training

dataset = dataset.remove_columns(['id', 'subreddit', 'convo', 'random_speaker', 'Parasocial_Language', '__index_level_0__', ])


**Inspect processed data**

In [None]:
print("The overall structure of the pre-processed train and test sets:\n")
print(dataset)

print("\n\nAn example for a tokenized hypothesis-context pair:\n")
print(dataset["train"][0])

The overall structure of the pre-processed train and test sets:

DatasetDict({
    train: Dataset({
        features: ['thread_title', 'text', 'label_text', 'label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 2836
    })
    test: Dataset({
        features: ['thread_title', 'text', 'label_text', 'label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 800
    })
})


An example for a tokenized hypothesis-context pair:

{'thread_title': 'Does anyone know this song at 16:56?', 'text': 'It’s “Hello” by Galantis', 'label_text': 'Other', 'label': 0, 'text_prepared': 'The quote: "It’s “Hello” by Galantis" - end of the quote.', 'hypothesis': 'The quote does not indicate the presence of a parasocial relationship or interaction with a gaming celebrity', 'label_nli_explicit': 'True', 'input_ids': [1, 487, 66640, 268, 314, 11781, 1292,

### Setting training arguments / hyperparameters

The following cell sets several important hyperparameters. We chose parameters that work well in general to avoid the need for hyperparameter search. Further below, we also provide code for hyperparameter search, if researchers want to try to increase performance by a few percentage points.

In [None]:
from transformers import TrainingArguments, Trainer, logging

# Set the directory to write the fine-tuned model and training logs to.
# With google colab, this will create a temporary folder, which will be deleted once you disconnect.
# You can connect to your personal google drive to save models and logs properly.
training_directory = "BERT-nli-ua-host"

# FP16 is a hyperparameter which can increase training speed and reduce memory consumption, but only on GPU and if batch-size > 8, see here: https://huggingface.co/transformers/performance.html?#fp16
# FP16 does not work on CPU or for multilingual mDeBERTa models
fp16_bool = True if torch.cuda.is_available() else False
if "mdeberta" in model_name.lower(): fp16_bool = False  # multilingual mDeBERTa does not support FP16 yet: https://github.com/microsoft/DeBERTa/issues/77
# in case of hyperparameter search at the end: FP16 has to be set to False. The integrated hyperparameter search with the Hugging Face Trainer can lead to errors otherwise.
fp16_bool = False

# Hugging Face tips to increase training speed and decrease out-of-memory (OOM) issues: https://huggingface.co/transformers/performance.html?
# Overview of all training arguments: https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments
train_args = TrainingArguments(
    output_dir=f'./results/{training_directory}',
    logging_dir=f'./logs/{training_directory}',
    learning_rate=2e-5,
    per_device_train_batch_size=8,  # if you get an out-of-memory error, reduce this value (e.g. to 4) and restart the runtime. Higher values increase training speed, but also increase memory requirements. Ideal values here are always a multiple of 8.
    per_device_eval_batch_size=32,  # if you get an out-of-memory error, reduce this value, e.g. to 16, and restart the runtime
    #gradient_accumulation_steps=4, # Can be used in case of memory problems: accumulates gradients over X steps and only then updates the weights, so a smaller per-device batch size gives the same effective batch size. Decreases memory usage, but also slightly decreases speed. (!adapt/halve batch size accordingly)
    num_train_epochs=10,  # this can be increased, but higher values increase training time. Good values for NLI are between 3 and 20.
    warmup_ratio=0.25,  # a good normal default value is 0.06 for normal BERT-base models, but since we want to reuse prior NLI knowledge and avoid catastrophic forgetting, we set the value higher
    weight_decay=0.1,
    seed=SEED_GLOBAL,
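    load_best_model_at_end=True,  # note: no checkpoints are saved or compared while evaluation_strategy and save_strategy are both "no" (see below)
    metric_for_best_model="accuracy",  # note: compute_metrics_nli_binary below reports "accuracy_not_b" and "accuracy_balanced", not "accuracy"; adapt this name if you enable evaluation during training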
    fp16=fp16_bool,  # Can speed up training and reduce memory consumption, but only makes sense at batch-size > 8. loads two copies of model weights, which creates overhead. https://huggingface.co/transformers/performance.html?#fp16
    fp16_full_eval=fp16_bool,
    evaluation_strategy="no", # options: "no"/"steps"/"epoch"
    #eval_steps=10_000,  # evaluate after n steps if evaluation_strategy=='steps'. defaults to logging_steps
    save_strategy = "no",  # options: "no"/"steps"/"epoch"
    #save_steps=10_000,              # Number of updates steps before two checkpoint saves.
    #save_total_limit=10,             # If a value is passed, will limit the total amount of checkpoints. Deletes the older checkpoints in output_dir
    #logging_strategy="steps",
    report_to="all",  # "all"  # logging
    #push_to_hub=False,
    #push_to_hub_model_id=f"{model_name}-finetuned-{task}",
)
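
To get a feeling for what `num_train_epochs=10`, `per_device_train_batch_size=8` and `warmup_ratio=0.25` mean for the 2836 formatted training rows, the number of weight updates can be estimated by hand. This is a rough sketch under stated assumptions: a single device, no gradient accumulation, an incomplete last batch that is kept, and warmup steps rounded up. The helper name `training_schedule` is hypothetical.

```python
import math


def training_schedule(n_rows, batch_size, epochs, warmup_ratio, grad_accum_steps=1):
    """Estimate the total number of weight updates and how many of them are warmup."""
    batches_per_epoch = math.ceil(n_rows / batch_size)  # last, smaller batch is kept
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    total_updates = math.ceil(epochs * updates_per_epoch)
    warmup_updates = math.ceil(total_updates * warmup_ratio)
    return total_updates, warmup_updates


total_updates, warmup_updates = training_schedule(
    n_rows=2836, batch_size=8, epochs=10, warmup_ratio=0.25
)
print(f"Total updates: {total_updates}, of which warmup: {warmup_updates}")
```

Under these assumptions, the learning rate climbs towards 2e-5 during the first quarter of training, which is the intended effect of the high warmup ratio.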


In [None]:
# helper function to clean memory and reduce risk of out-of-memory error
import gc
def clean_memory():
  #del(model)
  if torch.cuda.is_available():
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()
  gc.collect()

clean_memory()

### Custom function to compute metrics for NLI

In the test set, each text appears N times (once per class hypothesis), while an NLI model can only predict 2 or 3 classes: true/not-true or true/neutral/false. This means that we cannot use standard functions for computing metrics directly. The following function reformats the model's output in a way that allows for the calculation of standard metrics like accuracy, F1-macro etc.

In [None]:
from sklearn.metrics import balanced_accuracy_score, precision_recall_fscore_support, accuracy_score, classification_report

def compute_metrics_nli_binary(eval_pred, label_text_alphabetical=None):
    predictions, labels = eval_pred

    ### reformat model output to enable calculation of standard metrics
    # split in chunks with predictions for each hypothesis for one unique premise
    def chunks(lst, n):  # Yield successive n-sized chunks from lst. https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
        for i in range(0, len(lst), n):
            yield lst[i:i + n]

    # for each chunk/premise, select the most likely hypothesis
    softmax = torch.nn.Softmax(dim=1)
    prediction_chunks_lst = list(chunks(predictions, len(set(label_text_alphabetical)) ))
    hypo_position_highest_prob = []
    for i, chunk in enumerate(prediction_chunks_lst):
        hypo_position_highest_prob.append(np.argmax(np.array(chunk)[:, 0]))  # only accesses the first column of the array, i.e. the entailment/true prediction logit of all hypos and takes the highest one

    label_chunks_lst = list(chunks(labels, len(set(label_text_alphabetical)) ))
    label_position_gold = []
    for chunk in label_chunks_lst:
        label_position_gold.append(np.argmin(chunk))  # argmin to detect the position of the 0 among the 1s

    print("Highest probability prediction per premise: ", hypo_position_highest_prob)
    print("Correct label per premise: ", label_position_gold)

    ### calculate standard metrics
    precision_macro, recall_macro, f1_macro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='macro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    precision_micro, recall_micro, f1_micro, _ = precision_recall_fscore_support(label_position_gold, hypo_position_highest_prob, average='micro')  # https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html
    acc_balanced = balanced_accuracy_score(label_position_gold, hypo_position_highest_prob)
    acc_not_balanced = accuracy_score(label_position_gold, hypo_position_highest_prob)
    metrics = {'f1_macro': f1_macro,
               'f1_micro': f1_micro,
               'accuracy_balanced': acc_balanced,
               'accuracy_not_b': acc_not_balanced,
               #'precision_macro': precision_macro,
               #'recall_macro': recall_macro,
               #'precision_micro': precision_micro,
               #'recall_micro': recall_micro,
               #'label_gold_raw': label_position_gold,
               #'label_predicted_raw': hypo_position_highest_prob
               }
    print("Aggregate metrics: ", {key: metrics[key] for key in metrics if key not in ["label_gold_raw", "label_predicted_raw"]} )  # print metrics but without label lists
    print("Detailed metrics: ", classification_report(label_position_gold, hypo_position_highest_prob, labels=np.sort(pd.factorize(label_text_alphabetical, sort=True)[0]), target_names=label_text_alphabetical, sample_weight=None, digits=2, output_dict=True,
                                zero_division='warn'), "\n")
    return metrics

# Create alphabetically ordered list of the original dataset classes/labels
# This is necessary to be sure that the ordering of the test set labels and predictions is the same. Otherwise there is a risk that labels and predictions are in a different order and resulting metrics are wrong.
label_text_alphabetical = np.sort(df_train.label_text.unique())
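
The core of the reformatting step is easier to follow on a toy example. The sketch below is a simplified plain-Python version of the chunking logic, written for illustration only: the function name `reduce_nli_predictions` and the logit values are made up, and it assumes that column 0 of the model output is the entailment/true logit, as in the function above.

```python
def reduce_nli_predictions(logits, nli_labels, n_classes):
    """Turn per-hypothesis NLI outputs into one predicted and one gold class per text.

    logits: list of [entailment_logit, not_entailment_logit], one entry per (text, hypothesis) row
    nli_labels: list of 0 (true) / 1 (not-true), in the same row order
    n_classes: number of hypotheses per text (rows of one text are contiguous)
    """
    predicted, gold = [], []
    for start in range(0, len(logits), n_classes):
        chunk_logits = logits[start:start + n_classes]
        chunk_labels = nli_labels[start:start + n_classes]
        # predicted class = position of the hypothesis with the highest entailment logit
        predicted.append(max(range(len(chunk_logits)), key=lambda i: chunk_logits[i][0]))
        # gold class = position of the single 0 among the 1s
        gold.append(min(range(len(chunk_labels)), key=lambda i: chunk_labels[i]))
    return predicted, gold


# two texts, two hypotheses each (made-up logits)
toy_logits = [[2.0, -1.0], [0.5, 0.3], [-1.2, 1.0], [1.7, -0.4]]
toy_labels = [0, 1, 1, 0]

toy_predicted, toy_gold = reduce_nli_predictions(toy_logits, toy_labels, n_classes=2)
print("Predicted class positions:", toy_predicted)
print("Gold class positions:", toy_gold)
```

Once predictions and gold labels are expressed as class positions like this, standard metrics such as accuracy or macro F1 can be computed on them as for any other classifier.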


### Fine-tuning and evaluation  (IGNORE THIS AND PROCEED TO HP SEARCH)

Let's start fine-tuning the model!

If you get an 'out-of-memory' error, reduce the 'per_device_train_batch_size' in the TrainingArguments above (e.g. from 8 to 4), restart the runtime (menu at the top left: 'Runtime' > 'Restart runtime') and rerun the entire script. Without a restart, the 'out-of-memory' error will probably not go away.

In [None]:
# training
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],  #.shard(index=1, num_shards=100),  # could shard data for faster testing https://huggingface.co/docs/datasets/processing.html#sharding-the-dataset-shard
    eval_dataset=dataset["test"],  #.shard(index=1, num_shards=100),
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
)

trainer.train()


In [None]:
## Evaluate the fine-tuned model on the held-out test set
results = trainer.evaluate()

In [None]:
print(results)

## Hyperparameter Search

To increase performance, you can also conduct a hyperparameter search (hp-search) to find the best hyperparameters for your specific task and dataset. The trade-off is that hp-search is very compute intensive. Make sure to conduct hp-search on a validation set held out from the training set, and not on the final test set, to avoid leaking test data before final testing.

Note that for small datasets, running the hp-search on only one train-validation split is not ideal. For datasets with fewer than around 2000 training data points, we recommend running the hp-search on two different random train-validation splits. We implemented this for our paper, but not in this notebook, as this would make the code harder to understand.
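
A minimal sketch of what two random train-validation splits could look like, written in plain Python rather than with the `datasets` API used in this notebook. The helper `stratified_split`, the toy labels and the seeds 42 and 43 are illustrative assumptions, not the implementation from the paper.

```python
import random
from collections import defaultdict

def stratified_split(labels, validation_share, seed):
    """Return (train_idx, val_idx) so that each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        idxs = by_label[label][:]
        rng.shuffle(idxs)  # a different seed gives a different random split
        n_val = round(len(idxs) * validation_share)
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)

# toy data (assumption): 70 examples of class 0 and 30 examples of class 1
labels = [0] * 70 + [1] * 30

# one split per seed; hp-search would then be run once per split
splits = {seed: stratified_split(labels, validation_share=0.4, seed=seed) for seed in (42, 43)}

for seed, (train_idx, val_idx) in splits.items():
    n_pos = sum(labels[i] for i in val_idx)
    print(f"seed={seed}: train={len(train_idx)}, validation={len(val_idx)}, class 1 in validation={n_pos}")
```

One option is to run the hp-search once per split and prefer the configuration that performs well on both, for example by averaging the validation metric across the two splits.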

Documentation with more information on hp-search with Hugging Face Transformers is available [here](https://huggingface.co/docs/transformers/main/hpo_train).

In [None]:
## train-validation split - test set should not be visible during hp-search
# https://huggingface.co/docs/datasets/v2.5.1/en/package_reference/main_classes#datasets.Dataset.train_test_split

# the ideal size of the validation set depends on the size of your training data. Each label should have at the very least a few dozen examples in the validation set (ideally several hundred)
validation_set_size = 0.4  # for a training data size of 1000 with 3 classes we use 40% of the training data for validating hyperparameters

# reformatting of label column to enable dataset stratification
from datasets import ClassLabel
new_features = dataset["train"].features.copy()
if len(model.config.label2id.keys()) == 2:  # for 2-class NLI model
  label_names = ["entailment", "not-entailment"]
elif len(model.config.label2id.keys()) == 3:  # for 3-class NLI model
  label_names = ["entailment", "neutral", "contradiction"]
else:  # fail early instead of hitting a NameError on label_names below
  raise ValueError("Expected an NLI model with 2 or 3 labels")
new_features['label'] = ClassLabel(names=label_names)
print(len(dataset['train']))
dataset = dataset.cast(new_features)
print(len(dataset['train']))

# train-validation split for hp-search
dataset_hp = dataset["train"].train_test_split(test_size=validation_set_size, seed=SEED_GLOBAL, shuffle=True, stratify_by_column="label")
print(dataset_hp)

2836


1000
DatasetDict({
    train: Dataset({
        features: ['thread_title', 'text', 'label_text', 'label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 600
    })
    test: Dataset({
        features: ['thread_title', 'text', 'label_text', 'label', 'text_prepared', 'hypothesis', 'label_nli_explicit', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 400
    })
})


In [None]:
## Reinitialize trainer for hp-search
# https://discuss.huggingface.co/t/using-hyperparameter-search-in-trainer/785/10

def model_init():
  clean_memory()
  return AutoModelForSequenceClassification.from_pretrained(model_name).to(device)  # return_dict=True

trainer = Trainer(
    model_init=model_init,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset_hp["train"],
    eval_dataset=dataset_hp["test"],
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
);


loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--MoritzLaurer--mDeBERTa-v3-base-xnli-multilingual-nli-2mil7/snapshots/b5113eb38ab63efdd7f280f8c144ea8b13f978ce/config.json
Model config DebertaV2Config {
  "_name_or_path": "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "neutral",
    "2": "contradiction"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "contradiction": 2,
    "entailment": 0,
    "neutral": 1
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_

**Define the hyperparameters you want to optimise**

For a detailed discussion of different hyperparameters, see the appendix of our paper.

In [None]:
# we use Optuna for hp-search: https://optuna.readthedocs.io/en/stable/
def my_hp_space(trial):
    return {
        "learning_rate": trial.suggest_categorical("learning_rate", [9e-6, 2e-5, 4e-5]),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 4, 16, log=False, step=2),   # increasing the maximum number of epochs here could increase performance but will take (much) longer to train
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.1, 0.6, log=True),
        "per_device_train_batch_size": 8,  # lower this value in case of out-of-memory errors and restart the runtime
        #"per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [8, 16, 32]),
    }
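
To get a feel for what a given `warmup_ratio` means in practice, the sketch below converts it into a number of warmup steps. It assumes a single device, no gradient accumulation, and that the number of warmup steps is the total number of optimizer steps times the ratio, rounded up. The function name `warmup_steps` and the example values are hypothetical.

```python
import math

def warmup_steps(n_train_examples, batch_size, num_train_epochs, warmup_ratio):
    """Return (warmup steps, total optimizer steps) under the assumptions stated above."""
    steps_per_epoch = math.ceil(n_train_examples / batch_size)
    total_steps = steps_per_epoch * num_train_epochs
    return math.ceil(total_steps * warmup_ratio), total_steps

# example values (assumption): 600 training rows, batch size 8, 12 epochs, warmup_ratio 0.25
n_warmup, n_total = warmup_steps(n_train_examples=600, batch_size=8, num_train_epochs=12, warmup_ratio=0.25)
print(f"{n_warmup} warmup steps out of {n_total} total steps")
```

With a ratio near the upper end of the search space, more than half of training happens while the learning rate is still ramping up, which is worth keeping in mind when combining high warmup ratios with few epochs.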


**Run HP search!**

Choose the number of hyperparameter configurations you want to test. In our experiments we found that after 10 to 15 trials with around 4 hyperparameters, performance is unlikely to increase meaningfully. 15 trials seems to be a safe value, but can take a while to run.
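
Note that the trial values reported in the log further down are larger than 1. As far as we understand the default behaviour of Hugging Face Transformers, when no `compute_objective` is passed to `hyperparameter_search`, the objective is the sum of all evaluation metrics returned by `compute_metrics` (excluding loss and speed metrics). If you prefer to optimise a single metric, a sketch like the following could be used. The key `eval_f1_macro` is an assumption about the metric names and must match a key your `compute_metrics` function returns, prefixed with `eval_`.

```python
def objective_f1_macro(metrics):
    """Use one metric as the hp-search objective instead of a sum of all metrics."""
    # "eval_f1_macro" is an assumed key name; adapt it to your own metrics dictionary
    return metrics["eval_f1_macro"]

# toy metrics dictionary with made-up values, only to show the behaviour
example_metrics = {"eval_loss": 0.4, "eval_f1_macro": 0.5, "eval_accuracy": 0.75}
print(objective_f1_macro(example_metrics))

# possible usage (not run here):
# best_run = trainer.hyperparameter_search(
#     n_trials=numer_of_trials, direction="maximize", hp_space=my_hp_space,
#     backend="optuna", compute_objective=objective_f1_macro,
# )
```

Optimising one metric makes the reported trial values directly interpretable, at the cost of ignoring the other metrics during the search.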

In [None]:
import optuna

# number of different hp configurations to test
numer_of_trials = 16  # increasing this value can lead to better hyperparameters, but will take longer
# choose the sampler for sampling hp configurations
# n_startup_trials expects an integer, so use integer division
optuna_sampler = optuna.samplers.TPESampler(seed=SEED_GLOBAL, consider_prior=True, prior_weight=1.0, consider_magic_clip=True, consider_endpoints=False, n_startup_trials=numer_of_trials // 2, n_ei_candidates=24, multivariate=False, group=False, warn_independent_sampling=True, constant_liar=False)  # https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.TPESampler.html#optuna.samplers.TPESampler

# Hugging Face Documentation: https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.hyperparameter_search
best_run = trainer.hyperparameter_search(
    n_trials=numer_of_trials,
    direction="maximize",
    hp_space=my_hp_space,
    backend='optuna',
    **{"sampler": optuna_sampler}
)

[I 2024-04-17 07:52:48,293] A new study created in memory with name: no-name-1ee5a3c3-5be0-480b-8af1-5b4a7a87b3d6
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.13225317293225414}

Step,Training Loss
500,0.7212
1000,0.2953
1500,0.0864




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32


[I 2024-04-17 07:59:49,295] Trial 0 finished with value: 2.5135216335407433 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.13225317293225414}. Best is trial 0 with value: 2.5135216335407433.
Trial: {'learning_rate': 4e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.3556211334079358}


Highest probability prediction per premise:  [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.7536
1000,0.41
1500,0.1006




[I 2024-04-17 08:06:41,723] Trial 1 finished with value: 2.4604013878981843 and parameters: {'learning_rate': 4e-05, 'num_train_epochs': 12, 'warmup_ratio': 0.3556211334079358}. Best is trial 0 with value: 2.5135216335407433.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.13851197621057668}


Highest probability prediction per premise:  [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6748




[I 2024-04-17 08:10:00,642] Trial 2 finished with value: 2.5470892676469097 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.13851197621057668}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 4e-05, 'num_train_epochs': 10, 'warmup_ratio': 0.16850792067367906}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.7053
1000,0.3008
1500,0.0699




[I 2024-04-17 08:15:35,349] Trial 3 finished with value: 2.4604013878981843 and parameters: {'learning_rate': 4e-05, 'num_train_epochs': 10, 'warmup_ratio': 0.16850792067367906}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 8, 'warmup_ratio': 0.22640782282355432}


Highest probability prediction per premise:  [0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.7817
1000,0.4639




[I 2024-04-17 08:19:57,018] Trial 4 finished with value: 2.407266598176106 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 8, 'warmup_ratio': 0.22640782282355432}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.10867895322868468}


Highest probability prediction per premise:  [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.7576
1000,0.4419
1500,0.1987




[I 2024-04-17 08:26:41,668] Trial 5 finished with value: 2.4938146153317664 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.10867895322868468}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5641671230331442}


Highest probability prediction per premise:  [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.9334
1000,0.5888
1500,0.4299
2000,0.1161




[I 2024-04-17 08:35:27,115] Trial 6 finished with value: 2.5135216335407433 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 16, 'warmup_ratio': 0.5641671230331442}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.22004181237640907}


Highest probability prediction per premise:  [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.8168
1000,0.4936
1500,0.2206




[I 2024-04-17 08:42:04,604] Trial 7 finished with value: 2.52933224770544 and parameters: {'learning_rate': 9e-06, 'num_train_epochs': 12, 'warmup_ratio': 0.22004181237640907}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.10136148299152041}


Highest probability prediction per premise:  [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6374




[I 2024-04-17 08:44:24,049] Trial 8 finished with value: 2.5312282946301496 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.10136148299152041}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.3444424132703427}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.7034




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 08:46:39,306] Trial 9 finished with value: 2.5037453277725468 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.3444424132703427}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.16282832714933196}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6852




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 08:49:58,959] Trial 10 finished with value: 2.5135216335407433 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.16282832714933196}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.10166866394803553}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6392




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 08:52:12,371] Trial 11 finished with value: 2.5312282946301496 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.10166866394803553}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.1411231410646433}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6798




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 08:55:35,626] Trial 12 finished with value: 2.511574255497903 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.1411231410646433}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.10075581658748772}


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6724




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 08:58:53,948] Trial 13 finished with value: 2.3639925895982756 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 6, 'warmup_ratio': 0.10075581658748772}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.18346456415665668}


Highest probability prediction per premise:  [1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6702




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 09:01:07,439] Trial 14 finished with value: 2.4170574480238214 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 4, 'warmup_ratio': 0.18346456415665668}. Best is trial 2 with value: 2.5470892676469097.
Trial: {'learning_rate': 2e-05, 'num_train_epochs': 8, 'warmup_ratio': 0.12716540662926828}


Highest probability prediction per premise:  [1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0


Step,Training Loss
500,0.6965
1000,0.2803




Training completed. Do not forget to share your model on huggingface.co/models =)


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 400
  Batch size = 32
[I 2024-04-17 09:05:33,645] Trial 15 finished with value: 2.6259969929485414 and parameters: {'learning_rate': 2e-05, 'num_train_epochs': 8, 'warmup_ratio': 0.12716540662926828}. Best is trial 15 with value: 2.6259969929485414.


Highest probability prediction per premise:  [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1]
Correct label per premise:  [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0

In [None]:
# show best hyperparameters based on hp-search
print(best_run)

BestRun(run_id='15', objective=2.6259969929485414, hyperparameters={'learning_rate': 2e-05, 'num_train_epochs': 8, 'warmup_ratio': 0.12716540662926828})
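
The objective values reported by the search are larger than 1. That would be consistent with the Hugging Face `Trainer` default objective, which sums all reported evaluation metrics when metrics other than the loss are available. Whether this notebook relied on that default is an assumption, because the `hyperparameter_search` call is not shown in this part. A minimal sketch of that summation, applied to the final test-set metrics printed at the end of this notebook (the helper name `summed_objective` is hypothetical):

```python
def summed_objective(metrics):
    """Sum evaluation metrics, skipping loss, epoch, and speed entries.

    Hypothetical helper that mirrors the documented behaviour of the
    Trainer's default objective; it is not the library's own code.
    """
    skipped = {"eval_loss", "epoch"}
    return sum(
        value
        for name, value in metrics.items()
        if name not in skipped
        and not name.endswith("_runtime")
        and not name.endswith("_per_second")
    )


# Final test-set metrics as printed by print(results) in this notebook.
final_metrics = {
    "eval_loss": 2.1639678478240967,
    "eval_f1_macro": 0.6690538194444445,
    "eval_f1_micro": 0.695,
    "eval_accuracy_balanced": 0.6658219828951537,
    "eval_accuracy_not_b": 0.695,
    "eval_runtime": 6.5102,
    "eval_samples_per_second": 122.884,
    "eval_steps_per_second": 3.84,
    "epoch": 8.0,
}

objective = summed_objective(final_metrics)
print(round(objective, 4))
```

Under this assumption, the best search objective of about 2.63 corresponds to an average of roughly 0.66 across the four metrics. The final test-set sum is not directly comparable to the search values, since the search was evaluated on 400 examples and the final model on 800.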


**Training Time with optimised hyperparameters!**

Here we can use the original train and test set again.

In [None]:
# update the training arguments with the best hyperparameters
for k,v in best_run.hyperparameters.items():
    setattr(train_args, k, v)
print("\n", train_args)

# hp-search with Hugging Face caused errors with FP16; uncomment the lines below to disable mixed precision if needed
#setattr(train_args, "fp16", False)
#setattr(train_args, "fp16_full_eval", False)


 TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_n
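
The update loop in the cell above relies on plain attribute assignment. A minimal sketch of the same pattern on a stand-in dataclass follows. `ArgsStandIn` is hypothetical and only mimics three fields of `TrainingArguments`; its starting values are placeholders, not the settings used in this notebook.

```python
from dataclasses import dataclass


@dataclass
class ArgsStandIn:
    # Placeholder defaults for illustration only.
    learning_rate: float = 5e-5
    num_train_epochs: int = 3
    warmup_ratio: float = 0.0


# Best hyperparameters as reported by the search in this notebook.
best_hyperparameters = {
    "learning_rate": 2e-05,
    "num_train_epochs": 8,
    "warmup_ratio": 0.12716540662926828,
}

args = ArgsStandIn()
for name, value in best_hyperparameters.items():
    # Fail loudly on a misspelled key instead of silently adding a new attribute.
    if not hasattr(args, name):
        raise AttributeError(f"unknown argument: {name}")
    setattr(args, name, value)

print(args)
```

The `hasattr` check is an optional safeguard: `setattr` alone would accept any key, including a misspelled one, without complaint.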

In [None]:
dataset = datasets.DatasetDict({
    "train": datasets.Dataset.from_pandas(df_train_formatted),
    "test": datasets.Dataset.from_pandas(df_test_formatted)
})

# tokenize
def tokenize_nli_format(examples):
    # max_length=512 can be reduced to e.g. 256 to increase speed, but long texts will be cut off
    return tokenizer(examples["text_prepared"], examples["hypothesis"], truncation=True, max_length=512)

dataset["train"] = dataset["train"].map(tokenize_nli_format, batched=True)
dataset["test"] = dataset["test"].map(tokenize_nli_format, batched=True)

# remove unnecessary columns for model training
dataset = dataset.remove_columns(['id', 'subreddit', 'convo', 'random_speaker', 'Parasocial_Language', '__index_level_0__'])

  0%|          | 0/3 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

In [None]:
# Training
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=train_args,
    train_dataset=dataset["train"],  #.shard(index=1, num_shards=100),  # https://huggingface.co/docs/datasets/processing.html#sharding-the-dataset-shard
    eval_dataset=dataset["test"],  #.shard(index=1, num_shards=100),
    compute_metrics=lambda eval_pred: compute_metrics_nli_binary(eval_pred, label_text_alphabetical=label_text_alphabetical)
)

trainer.train()


The following columns in the training set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 2836
  Num Epochs = 8
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 5672
You're using a DebertaV2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  attention_scores = torch.bmm(query_layer, key_layer.transpose(-1, -2)) / torch.tensor(
  score += c2p_att 

Step,Training Loss
500,0.0944
1000,0.1686
1500,0.1661


In [None]:
## Evaluate the fine-tuned model on the held-out test set
results = trainer.evaluate()


The following columns in the evaluation set don't have a corresponding argument in `DebertaV2ForSequenceClassification.forward` and have been ignored: thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared. If thread_title, label_text, label_nli_explicit, text, hypothesis, text_prepared are not expected by `DebertaV2ForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 800
  Batch size = 32


Highest probability prediction per premise:  [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 

In [None]:
print(results)

{'eval_loss': 2.1639678478240967, 'eval_f1_macro': 0.6690538194444445, 'eval_f1_micro': 0.695, 'eval_accuracy_balanced': 0.6658219828951537, 'eval_accuracy_not_b': 0.695, 'eval_runtime': 6.5102, 'eval_samples_per_second': 122.884, 'eval_steps_per_second': 3.84, 'epoch': 8.0}
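
In the results above, `eval_f1_micro` and `eval_accuracy_not_b` are both 0.695. This is expected: for single-label classification, micro-averaged F1 equals plain accuracy. The sketch below recomputes accuracy, macro-F1, and balanced accuracy in pure Python on a small hypothetical label list. It is an illustration of the metric definitions, not the notebook's `compute_metrics_nli_binary`, and the function name `binary_metrics` and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, macro-F1, and balanced accuracy for 0/1 labels."""
    f1_scores, recalls = [], []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
        recalls.append(recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,                    # equals micro-F1 for single-label tasks
        "f1_macro": sum(f1_scores) / 2,          # unweighted mean of per-class F1
        "accuracy_balanced": sum(recalls) / 2,   # unweighted mean of per-class recall
    }


# Hypothetical toy labels, for illustration only.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Macro-F1 and balanced accuracy weight both classes equally, which is why they sit below plain accuracy when the model does worse on the rarer class.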


## Save and load your fine-tuned model

This section provides code for saving the model to your hard disk or for uploading it to the Hugging Face hub.

In [None]:
## Google Drive is only needed if you want to keep the model beyond the Colab session;
## in that case, mount it first, e.g. with drive.mount("/content/drive")
from google.colab import drive
import os
#drive.flush_and_unmount()

# change into the directory where you want to save the model
os.chdir("/content/results/")
print(os.getcwd())


/content/results


In [None]:
# note: this path is relative to the current working directory set above (/content/results)
model_custom_path = "content/results"
trainer.save_model(output_dir=model_custom_path)

Saving model checkpoint to content/results
Configuration saved in content/results/config.json
Model weights saved in content/results/pytorch_model.bin
tokenizer config file saved in content/results/tokenizer_config.json
Special tokens file saved in content/results/special_tokens_map.json


In [None]:
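### Push to hub
# install necessary dependencies
!sudo apt-get install git-lfs
# log in with a token generated at https://huggingface.co/settings/tokens
# if the interactive prompt fails in the notebook, huggingface_hub.notebook_login() is an alternative
!huggingface-cli login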

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: Traceback (most recent call last):
  File "/usr/lib/python3.10/getpass.py", line 77, in unix_get

In [None]:
# load your models from disk
model = AutoModelForSequenceClassification.from_pretrained(model_custom_path)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, model_max_length=512)  # we load the tokenizer from the original BERT-NLI model

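# https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.push_to_hub
# replace the placeholder with a valid repo id such as "your-username/your-model-name";
# the literal placeholder below is rejected with an HFValidationError
repo_id = 'enter your repo id'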
model.push_to_hub(repo_id=repo_id)
tokenizer.push_to_hub(repo_id=repo_id)


loading configuration file content/results/config.json
Model config DebertaV2Config {
  "_name_or_path": "content/results",
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "entailment",
    "1": "neutral",
    "2": "contradiction"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "contradiction": 2,
    "entailment": 0,
    "neutral": 1
  },
  "layer_norm_eps": 1e-07,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 768,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att

HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'enter your repo id'.
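
The error above is raised because the placeholder string contains spaces. A rough pre-check of a repo id can be sketched from the rules quoted in the error message. This is a hypothetical approximation, not the validator used by `huggingface_hub`; the function name, the example ids, and the assumption that an id has at most one `/` (namespace and name) are illustrative only.

```python
import re

# Allowed characters per the error message: alphanumerics, '-', '_', '.'.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")


def looks_like_valid_repo_id(repo_id):
    """Approximate the naming rules quoted in the HFValidationError message."""
    if not repo_id or len(repo_id) > 96:
        return False
    parts = repo_id.split("/")
    if len(parts) > 2:  # assumption: at most "namespace/name"
        return False
    for part in parts:
        if not _NAME_PATTERN.fullmatch(part):
            return False
        if part[0] in "-." or part[-1] in "-.":
            return False
        if "--" in part or ".." in part:
            return False
    return True


print(looks_like_valid_repo_id("enter your repo id"))
print(looks_like_valid_repo_id("my-username/outgroup-hostility-nli"))
```

Running such a check before `push_to_hub` catches a forgotten placeholder without first loading the model and tokenizer.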