# NLP Standard Project - Identifying Human Values behind Arguments

*Objective*: given a textual argument and a human value category, classify whether or not the argument draws on that category.

Arguments are given as premise text, conclusion text, and binary
stance of the premise to the conclusion (“in favor of” or “against”). 20 value categories compiled from the social science literature. It’s your choice to focus on one, a subset, or all values in arguments.

## Imports

*N.B.* `zenodo-get` is a Python package that provides a simple way to download data from the Zenodo reository. 

In [1]:
!pip install transformers
!pip install evaluate
!pip install zenodo-get
!zenodo_get 10.5281/zenodo.7550385

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.4-py3-none-any.whl (6.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.8/6.8 MB[0m [31m39.9 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m200.1/200.1 KB[0m [31m12.3 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m69.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.4 tokenizers-0.13.3 transformers-4.27.4
Looking in indexes: https://pypi.org/simple, http

In [2]:
import evaluate, torch, shutil, random, re, nltk, transformers
import pandas as pd
import numpy as np
import torch.nn.functional as F
from nltk.corpus import stopwords
from functools import reduce
from transformers import AutoTokenizer, AutoModelForSequenceClassification, EvalPrediction, TrainingArguments, Trainer
from datasets import Dataset, DatasetDict, load_dataset
from typing import List, Callable

We set a random seed to repeat experiments.

In [3]:
random_seed = 42
transformers.set_seed(random_seed)
torch.manual_seed(random_seed)

<torch._C.Generator at 0x7f2eba3690b0>

We check whether a GPU is available on the system and, if so, we run the code on the GPU by assigning `cuda` to the device variable; otherwise, we assign `cpu`.



In [4]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cuda


## Arranging the Data

Once the data have been downloaded from the Zenodo repository, we merge the arguments and their respective labels into a single dataset. We do this for both training and test set.

*N.B.* The test set has not been labelled so we adopt the validation set as test one.

In [5]:
arguments_train_path = 'arguments-training.tsv'
arguments_val_path = 'arguments-validation.tsv'
arguments_test_path = 'arguments-validation-zhihu.tsv'

labels_train_path = 'labels-training.tsv'
labels_val_path = 'labels-validation.tsv'
labels_test_path = 'labels-validation-zhihu.tsv'

In [6]:
df_arguments_train = pd.read_csv(arguments_train_path, sep='\t')
df_labels_train = pd.read_csv(labels_train_path, sep='\t')
df_train = pd.merge(df_arguments_train, df_labels_train, on='Argument ID')

df_arguments_val = pd.read_csv(arguments_val_path, sep='\t')
df_labels_val = pd.read_csv(labels_val_path, sep='\t')
df_val = pd.merge(df_arguments_val, df_labels_val, on='Argument ID')

df_arguments_test = pd.read_csv(arguments_test_path, sep='\t')
df_labels_test = pd.read_csv(labels_test_path, sep='\t')
df_test = pd.merge(df_arguments_test, df_labels_test, on='Argument ID')

In [7]:
df_train.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,A01002,We should ban human cloning,in favor of,we should ban human cloning as it will only ca...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,A01005,We should ban fast food,in favor of,fast food should be banned because it is reall...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,A01006,We should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,A01007,We should abolish capital punishment,against,capital punishment is sometimes the only optio...,0,0,0,0,0,0,...,0,1,0,0,0,0,1,0,0,0
4,A01008,We should ban factory farming,against,factory farming allows for the production of c...,0,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,0


In [8]:
df_val.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,A01001,Entrapment should be legalized,in favor of,if entrapment can serve to more easily capture...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,A01012,The use of public defenders should be mandatory,in favor of,the use of public defenders should be mandator...,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,A02001,Payday loans should be banned,in favor of,payday loans create a more impoverished societ...,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,A02002,Surrogacy should be banned,against,Surrogacy should not be banned as it is the wo...,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,A02009,Entrapment should be legalized,against,entrapment is gravely immoral and against huma...,0,0,0,0,0,0,...,0,1,0,0,0,0,1,0,0,1


In [9]:
df_test.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,C26001,We should abolish the 996 overtime system,in favor of,China's 996 overtime system is very inefficien...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,C26002,We should abolish the 996 overtime system,in favor of,China's 996 overtime system leaves you with no...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,C26003,We should abolish the 996 overtime system,against,"For the poor people, if they can go to the Int...",0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,C26004,We should abolish the 996 overtime system,in favor of,China's 996 overtime system violates labor laws.,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,C26005,We should abolish the 996 overtime system,against,Corporate management seeks to maximize profits...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


## Exploring the Data

In [10]:
# check the distribution of the data

## Preprocessing the Data

In [11]:
replace_by_space_re = re.compile('[/(){}\[\]\|@,;]')
good_symbols_re = re.compile('[^0-9a-z #+_]')
replace_multiple_spaces_re = re.compile(' +')
good_stopwords = ['favor','against']

try:
    stopwords = set(stopwords.words('english'))
    print(stopwords)
    stopwords = stopwords - set(good_stopwords) # removing word: against and favor from stopwords
    print(stopwords)
except LookupError:
    nltk.download('stopwords')
    stopwords = set(stopwords.words('english'))
    stopwords = stopwords - set(good_stopwords) # removing word: against and favor from stopwords

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [12]:
def lower(text: str) -> str:
    """
    Transforms given text to lower case.
    """
    return text.lower()
 
def replace_special_characters(text: str) -> str:
    """
    Replaces special characters, such as paranthesis, with spacing character
    """
    return replace_by_space_re.sub(' ', text)

def replace_br(text: str) -> str:
    """
    Replaces br characters
    """
    return text.replace('br', '')

def filter_out_uncommon_symbols(text: str) -> str:
    """
    Removes any special character that is not in the good symbols list (check regular expression)
    """
    return good_symbols_re.sub('', text)
 
def remove_stopwords(text: str) -> str:
    return ' '.join([x for x in text.split() if x and x not in stopwords])
 
def strip_text(text: str) -> str:
    """
    Removes any left or right spacing (including carriage return) from text.
    """
    return text.strip()

def replace_double_spaces(text: str) -> str:
    """
    Replaces special characters, such as paranthesis, with spacing character
    """
    return replace_multiple_spaces_re.sub(' ', text)

In [13]:
preprocessing_pipeline = [
                          lower,
                          replace_special_characters,
                          replace_br,
                          filter_out_uncommon_symbols,
                          #remove_stopwords,
                          #strip_text,
                          #replace_double_spaces
                          ]

def text_prepare(text: str,
                 filter_methods: List[Callable[[str], str]] = None) -> str:
    """
    Applies a list of pre-processing functions in sequence (reduce).
    Note that the order is important here!
    """
    filter_methods = filter_methods if filter_methods is not None else preprocessing_pipeline
    return reduce(lambda txt, f: f(txt), filter_methods, text)

In [14]:
# Replace each sentence with its pre-processed version
df_train['Conclusion'] = df_train['Conclusion'].apply(lambda txt: text_prepare(txt))
df_train['Stance'] = df_train['Stance'].apply(lambda txt: text_prepare(txt))
df_train['Premise'] = df_train['Premise'].apply(lambda txt: text_prepare(txt))

df_val['Conclusion'] = df_val['Conclusion'].apply(lambda txt: text_prepare(txt))
df_val['Stance'] = df_val['Stance'].apply(lambda txt: text_prepare(txt))
df_val['Premise'] = df_val['Premise'].apply(lambda txt: text_prepare(txt))

df_test['Conclusion'] = df_test['Conclusion'].apply(lambda txt: text_prepare(txt))
df_test['Stance'] = df_test['Stance'].apply(lambda txt: text_prepare(txt))
df_test['Premise'] = df_test['Premise'].apply(lambda txt: text_prepare(txt))

In [15]:
df_train.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,A01002,we should ban human cloning,in favor of,we should ban human cloning as it will only ca...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,A01005,we should ban fast food,in favor of,fast food should be banned because it is reall...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,A01006,we should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,A01007,we should abolish capital punishment,against,capital punishment is sometimes the only optio...,0,0,0,0,0,0,...,0,1,0,0,0,0,1,0,0,0
4,A01008,we should ban factory farming,against,factory farming allows for the production of c...,0,0,0,0,0,0,...,0,0,0,0,1,0,1,0,0,0


In [16]:
df_val.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,A01001,entrapment should be legalized,in favor of,if entrapment can serve to more easily capture...,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,A01012,the use of public defenders should be mandatory,in favor of,the use of public defenders should be mandator...,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,A02001,payday loans should be banned,in favor of,payday loans create a more impoverished societ...,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,A02002,surrogacy should be banned,against,surrogacy should not be banned as it is the wo...,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,A02009,entrapment should be legalized,against,entrapment is gravely immoral and against huma...,0,0,0,0,0,0,...,0,1,0,0,0,0,1,0,0,1


In [17]:
df_test.head()

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,C26001,we should abolish the 996 overtime system,in favor of,chinas 996 overtime system is very inefficient...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,C26002,we should abolish the 996 overtime system,in favor of,chinas 996 overtime system leaves you with no ...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,C26003,we should abolish the 996 overtime system,against,for the poor people if they can go to the int...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
3,C26004,we should abolish the 996 overtime system,in favor of,chinas 996 overtime system violates labor laws,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,C26005,we should abolish the 996 overtime system,against,corporate management seeks to maximize profits...,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


### Tokenization

In [None]:
max_length = 94

In [None]:
model_name = "distilbert-base-uncased" #"bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, truncation=True, max_length=max_length)
#add_special_tokens=False
#use_auth_token=True

Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

## Define evaluation metrics

In [None]:
columns = df_train.columns.tolist()
columns

['Argument ID',
 'Conclusion',
 'Stance',
 'Premise',
 'Self-direction: thought',
 'Self-direction: action',
 'Stimulation',
 'Hedonism',
 'Achievement',
 'Power: dominance',
 'Power: resources',
 'Face',
 'Security: personal',
 'Security: societal',
 'Tradition',
 'Conformity: rules',
 'Conformity: interpersonal',
 'Humility',
 'Benevolence: caring',
 'Benevolence: dependability',
 'Universalism: concern',
 'Universalism: nature',
 'Universalism: tolerance',
 'Universalism: objectivity']

In [None]:
"""
def tokenize_and_encode(samples):
    #Tokenizes each arguments "Premise"
    print(samples)
    return tokenizer(samples['Premise'], truncation=True)
"""

def tokenize_and_encode(samples):   # max_len = 512
    """Tokenizes each arguments "Premise" """
    #print(type(samples))
    #print(samples.get('Premise'))
    input_list = [samples.get(key) for key in ['Conclusion', 'Stance', 'Premise']]
    #print(input_list)
    input = ' '.join(input_list)
    #print(input)
    return tokenizer(input, truncation=True) #add_special_tokens=False


def convert_to_dataset(train_dataframe, test_dataframe, labels):
    """
        Converts pandas DataFrames into a DatasetDict

        Parameters
        ----------
        train_dataframe : pd.DataFrame
            Arguments to be listed as "train"
        test_dataframe : pd.DataFrame
            Arguments to be listed as "test"
        labels : list[str]
            The labels in both DataFrames

        Returns
        -------
        tuple(DatasetDict, list[str])
            a `DatasetDict` with attributes "train" and "test" for the listed arguments,
            a `list` with the contained labels
        """
#    column_intersect = [x for x in (['Premise'] + labels) if x in train_dataframe.columns.values]

    train_dataset = Dataset.from_dict((df_train[labels]).to_dict('list'))
    test_dataset = Dataset.from_dict((df_test[labels]).to_dict('list')) # QUA C'è IL VALIDATION :)

    ds = DatasetDict()
    ds['train'] = train_dataset
    ds['test'] = test_dataset

    # put the classes in labels
    ds = ds.map(lambda x: {"labels": [float(x[c]) for c in ds['train'].column_names if                  # le labels devono essere float (?)
                                      c not in ['Argument ID', 'Conclusion', 'Stance', 'Premise']]})
    
    #print(ds['train']['labels'])
    cols = ds['train'].column_names
    #print(cols)
    cols.remove('labels')
    #print(cols)

    #print("Sono qui")
    #ds_enc = ds.map(tokenize_and_encode, batched=True, remove_columns=cols)
    ds_enc = ds.map(tokenize_and_encode, remove_columns=cols)

    #print(cols)
  
    cols = [ele for ele in cols if ele not in {'Argument ID', 'Conclusion', 'Stance', 'Premise'}]
    #cols.remove('Premise')
    #print(cols)

    return ds_enc, cols
    #return 1

In [None]:
ds_enc, labels = convert_to_dataset(df_train, df_test, columns)

Map:   0%|          | 0/5393 [00:00<?, ? examples/s]

Map:   0%|          | 0/1896 [00:00<?, ? examples/s]

Map:   0%|          | 0/5393 [00:00<?, ? examples/s]

Map:   0%|          | 0/1896 [00:00<?, ? examples/s]

In [None]:
ds_enc # dataset formattato da dare input al modello

DatasetDict({
    train: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 5393
    })
    test: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 1896
    })
})

In [None]:
# Computing the max length of the token
length = []
for i in range(0,len(ds_enc["train"])):
  length.append(len(ds_enc["train"][i]["input_ids"]))

print("ARGMAX:",np.argmax(length))
print("ARGMIN:",np.argmin(length))
max_length =np.max(length)
print(max_length)

ARGMAX: 4758
ARGMIN: 3208
94


## Training of the model

In [None]:
ds_enc["train"].format

{'type': None,
 'format_kwargs': {},
 'columns': ['labels', 'input_ids', 'attention_mask'],
 'output_all_columns': False}

In [None]:
#ds_enc.set_format("torch")
print(len(ds_enc["train"][0]["input_ids"]))
print(ds_enc["train"][0]["input_ids"])
print(len(tokenizer.decode(ds_enc["train"][0]["input_ids"])))
print(tokenizer.decode(ds_enc["train"][0]["input_ids"]))
print(len(ds_enc["train"]))

19
[101, 7221, 2529, 18856, 13369, 5684, 7221, 2529, 18856, 13369, 3426, 4121, 3314, 9129, 4286, 2770, 2105, 3772, 102]
106
[CLS] ban human cloning favor ban human cloning cause huge issues bunch humans running around acting [SEP]
5393


In [None]:
ds_enc["train"].format

{'type': None,
 'format_kwargs': {},
 'columns': ['labels', 'input_ids', 'attention_mask'],
 'output_all_columns': False}

In [None]:
num_labels = len(labels)

In [None]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, 
                                                           problem_type="multi_label_classification", 
                                                           num_labels=num_labels
                                                           #use_auth_token=True
                                                           )

Downloading pytorch_model.bin:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

In [None]:
model = model.to(device)

In [None]:
model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [None]:
batch_size_train = 128 # max dimension : 128
batch_size_eval = 128

In [None]:
'''
from sklearn.metrics import classification_report

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    
    # Apply a threshold of 0.5 to the predicted labels to obtain binary predictions
    binary_pred_labels = np.where(preds > 0.5, 1, 0)
    
    # Get the list of target names from the label encoder
    target_names = p.label_ids.dtype.names
    
    # Generate a classification report
    report = classification_report(p.label_ids, binary_pred_labels, target_names=target_names, zero_division=1)
    
    return {"classification_report": report}
'''

'\nfrom sklearn.metrics import classification_report\n\ndef compute_metrics(p: EvalPrediction):\n    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions\n    \n    # Apply a threshold of 0.5 to the predicted labels to obtain binary predictions\n    binary_pred_labels = np.where(preds > 0.5, 1, 0)\n    \n    # Get the list of target names from the label encoder\n    target_names = p.label_ids.dtype.names\n    \n    # Generate a classification report\n    report = classification_report(p.label_ids, binary_pred_labels, target_names=target_names, zero_division=1)\n    \n    return {"classification_report": report}\n'

In [None]:
args = TrainingArguments(
                         model_name,
                         evaluation_strategy = "epoch",
                         #save_strategy = "epoch",
                         logging_strategy='epoch',
                         #logging_steps = 100,
                         learning_rate=2e-5, #1e-5
                         per_device_train_batch_size=batch_size_train,
                         per_device_eval_batch_size=batch_size_eval,
                         num_train_epochs=10,
                         save_strategy='no',
                         weight_decay=0.1, #0.01
                         #load_best_model_at_end=True,
                         seed=42,
                         #metric_for_best_model=metric_name,
                         #push_to_hub=True,
                         )

In [None]:
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds_enc["train"],
    eval_dataset=ds_enc["test"], # validation set
    tokenizer=tokenizer
)

In [None]:
trainer

<transformers.trainer.Trainer at 0x7f5d09377e80>

In [None]:
trainer.train()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss
1,0.5441,0.453324
2,0.4375,0.414445
3,0.412,0.39607
4,0.388,0.382573
5,0.3692,0.372705
6,0.3534,0.367469
7,0.3414,0.360666
8,0.3292,0.356246
9,0.3202,0.353014
10,0.3129,0.353194


TrainOutput(global_step=774, training_loss=0.3398753777338861, metrics={'train_runtime': 557.626, 'train_samples_per_second': 174.084, 'train_steps_per_second': 1.388, 'total_flos': 1748439683552760.0, 'train_loss': 0.3398753777338861, 'epoch': 18.0})

In [None]:
print(ds_enc['train']['labels'][2])

[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]


The values of "0.00" in the classification report indicate that the model did not predict any positive examples for those particular classes. In other words, the precision, recall, and F1-score for those classes are all zero because there were no true positives or false positives predicted by the model. This can happen for a variety of reasons, such as a lack of training data for those classes or a lack of predictive power for the features associated with those classes. It's important to consider why the model is failing to predict positive examples for certain classes and whether additional data or feature engineering could improve performance on those classes.

## Evaluation

In [None]:
trainer.evaluate()

{'eval_loss': 0.3533085584640503,
 'eval_runtime': 2.3351,
 'eval_samples_per_second': 811.97,
 'eval_steps_per_second': 6.424,
 'epoch': 18.0}

## Inference

In [None]:
print((ds_enc['train']['input_ids'][2]))
print(tokenizer.decode(ds_enc['train']['input_ids'][2]))

[101, 2203, 2224, 3171, 17147, 2114, 2823, 3171, 17147, 2518, 2131, 13593, 6867, 2202, 2895, 102]
[CLS] end use economic sanctions against sometimes economic sanctions thing get corrupt governments take action [SEP]


In [None]:
id2label = {
 0:'Self-direction: thought',
 1:'Self-direction: action',
 2:'Stimulation',
 3:'Hedonism',
 4:'Achievement',
 5:'Power: dominance',
 6:'Power: resources',
 7:'Face',
 8:'Security: personal',
 9:'Security: societal',
 10:'Tradition',
 11:'Conformity: rules',
 12:'Conformity: interpersonal',
 13:'Humility',
 14:'Benevolence: caring',
 15:'Benevolence: dependability',
 16:'Universalism: concern',
 17:'Universalism: nature',
 18:'Universalism: tolerance',
 19:'Universalism: objectivity'}

In [None]:
example = random.randint(1,5000)
print("Number example:", example)


device = torch.device("cuda")
text = tokenizer.decode(ds_enc['train']['input_ids'][example])
print("Input text:\n", text)
inputs = tokenizer(text, return_tensors = 'pt')
for key in inputs:
    inputs[key] = inputs[key].to(device)

with torch.no_grad():
  outputs = model(inputs['input_ids'])
  outputs = {key: output.cpu() for key, output in outputs.items()}

probabilities = F.softmax(outputs['logits'], dim=1)

threshold = 0.2

predictions = (probabilities > threshold).float()

pred_labels = torch.where(predictions == 1.0)
#trasform in a list of indeces
pred_labels = pred_labels[1].tolist()
pred_labels = [id2label[idx] for idx in pred_labels]
print("The predicted labels are:\n", pred_labels)

true_labels = ds_enc['train']['labels'][example]
indexes = []
index = 0
for i in true_labels:
  if i == 1.0:
    indexes.append(index)
    index += 1
  else:
    index += 1
true_labels = [id2label[idx] for idx in indexes]
print("The actual labels are:\n", true_labels)

Number example: 768
Input text:
 [CLS] ban missionary work favor missionary work dangerous necessary [SEP]
The predicted labels are:
 ['Security: personal', 'Security: societal']
The actual labels are:
 ['Achievement', 'Security: personal', 'Security: societal']


In [None]:
'''
from sklearn.metrics import classification_report

def compute_metrics(p: EvalPrediction):
    
    #EvalPrediction
    #predictions (np.ndarray) — Predictions of the model.
    #label_ids (np.ndarray) — Targets to be matched.
    #inputs (np.ndarray, optional) —
    
    print("inputs: \n", p.inputs)
    print("label_ids: \n", p.label_ids)
    print("predictions: \n", p.predictions)
    print("type_inputs: \n", type(p.inputs))
    print("type_label_ids: \n", type(p.label_ids))
    print("type_predictions: \n", type(p.predictions))
   


    predictions = F.softmax(p.predictions, dim=1)
    print('predictions: \n', predictions)
    # Apply a threshold of 0.2 to the predicted labels to obtain binary predictions
    threshold = 0.2
    predictions = (probabilities > threshold).float()
    predictions = torch.where(predictions == 1.0)
    predictions = predictions[1].tolist()

    # Get the list of target names from the label encoder
    #target_names = p.label_ids.dtype.names
    
    # Generate a classification report
    report = classification_report(y_true=p.label_ids, y_pred=predictions, target_names=labels, zero_division=0)
    
    return {"classification_report": report}
'''

'\nfrom sklearn.metrics import classification_report\n\ndef compute_metrics(p: EvalPrediction):\n    \n    #EvalPrediction\n    #predictions (np.ndarray) — Predictions of the model.\n    #label_ids (np.ndarray) — Targets to be matched.\n    #inputs (np.ndarray, optional) —\n    \n    print("inputs: \n", p.inputs)\n    print("label_ids: \n", p.label_ids)\n    print("predictions: \n", p.predictions)\n    print("type_inputs: \n", type(p.inputs))\n    print("type_label_ids: \n", type(p.label_ids))\n    print("type_predictions: \n", type(p.predictions))\n   \n\n\n    predictions = F.softmax(p.predictions, dim=1)\n    print(\'predictions: \n\', predictions)\n    # Apply a threshold of 0.2 to the predicted labels to obtain binary predictions\n    threshold = 0.2\n    predictions = (probabilities > threshold).float()\n    predictions = torch.where(predictions == 1.0)\n    predictions = predictions[1].tolist()\n\n    # Get the list of target names from the label encoder\n    #target_names = p.l