# Clickbait Challenge at SemEval 2023 - Clickbait Spoiling

https://pan.webis.de/semeval23/pan23-web/clickbait-challenge.html

Clickbait posts link to web pages and advertise their content by arousing curiosity instead of providing informative summaries. Clickbait spoiling aims at generating short texts that satisfy the curiosity induced by a clickbait post.

## Tasks
### Task 1: Spoiler Type Classification
The input is the clickbait post and the linked document. The task is to classify the spoiler type that the clickbait post warrants (either "phrase", "passage", "multi"). For each input, an output like `{"uuid": "<UUID>", "spoilerType": "<SPOILER-TYPE>"}` has to be generated where `<SPOILER-TYPE>` is either `phrase`, `passage`, or `multi`.
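As a minimal sketch of the Task 1 output format described above, the following writes one JSON object per line. The helper name `format_task1_prediction` and the UUID values are hypothetical placeholders, not part of the task definition.

```python
import json

# The three spoiler types named in the task description.
VALID_SPOILER_TYPES = ("phrase", "passage", "multi")


def format_task1_prediction(uuid, spoiler_type):
    """Return one JSON line in the Task 1 output format."""
    if spoiler_type not in VALID_SPOILER_TYPES:
        raise ValueError(f"unknown spoiler type: {spoiler_type!r}")
    return json.dumps({"uuid": uuid, "spoilerType": spoiler_type})


# Hypothetical predictions; the UUIDs are placeholders, not dataset entries.
predictions = [("uuid-0001", "phrase"), ("uuid-0002", "multi")]
output_lines = [format_task1_prediction(u, t) for u, t in predictions]
print("\n".join(output_lines))
```

Validating the label before serializing catches typos in the spoiler type early, before a submission file is written.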

### Task 2: Spoiler Generation
The input is the clickbait post and the linked document (and, optionally, the spoiler type if your approach uses this field). The task is to generate the spoiler for the clickbait post. For each input, an output like `{"uuid": "<UUID>", "spoiler": "<SPOILER>"}` has to be generated where `<SPOILER>` is the spoiler for the clickbait post.

For each entry in the training and validation dataset, the following fields are available (https://aclanthology.org/2022.acl-long.484.pdf):

* ``uuid``: The uuid of the dataset entry.
* ``postText``: The text of the clickbait post which is to be spoiled.
* ``targetParagraphs``: The main content of the linked web page to classify the spoiler type (task 1) and to generate the spoiler (task 2). Consists of the paragraphs of manually extracted main content.
* ``targetTitle``: The title of the linked web page to classify the spoiler type (task 1) and to generate the spoiler (task 2).
* ``targetUrl``: The URL of the linked web page.
* ``humanSpoiler``: The human-generated (abstractive) spoiler for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* ``spoiler``: The human-extracted spoiler for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* ``spoilerPositions``: The position of the human-extracted spoiler for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* ``tags``: The spoiler type (might be "phrase", "passage", or "multi") that is to be classified in task 1 (spoiler type classification). For task 1, this field is only available in the training and validation dataset (not during test). For task 2, this field is always available and can be used.
* Some fields contain additional metainformation about the entry but are unused: postId, postPlatform, targetDescription, targetKeywords, targetMedia.
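To make the field list concrete, here is a small sketch that parses one JSON line and reads the fields used for Task 1. The entry is made up for illustration; only the field names come from the list above. In the training file, `postText`, `targetParagraphs`, and `tags` are lists of strings, while `targetTitle` is a plain string. The lists are joined with a space here, which is an illustrative choice.

```python
import json

# A made-up entry with the documented fields; all values are placeholders.
raw_line = json.dumps({
    "uuid": "uuid-0001",
    "postText": ["You won't believe what this city banned"],
    "targetParagraphs": ["First paragraph.", "Second paragraph."],
    "targetTitle": "City bans plastic bags",
    "tags": ["phrase"],
})

entry = json.loads(raw_line)

# List-valued fields are flattened to single strings.
post = " ".join(entry["postText"])
document = " ".join(entry["targetParagraphs"])
# 'tags' holds a single spoiler type wrapped in a list.
spoiler_type = entry["tags"][0]

print(post, "|", entry["targetTitle"], "|", spoiler_type)
```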

## First steps


### Installations

In [1]:
!pip install transformers


In [2]:
!pip install datasets

In [3]:
!pip install evaluate


In [4]:
!pip install spacy
!python -m spacy download en_core_web_sm


### Imports

In [5]:
import pandas as pd
import json
import random
import numpy as np
import os
import evaluate
import spacy

from datasets import ClassLabel, Dataset, DatasetDict, Features, Value
from datasets import load_dataset

import transformers
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

import torch
import tensorflow as tf

### Google Drive and GPU

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
torch.cuda.get_device_name(0)

'Tesla T4'

### Initializations
Here we choose the general parameters for our model.

In [9]:
# Name of the model chosen
model_checkpoint = "distilbert-base-uncased"

# True: clean the target paragraphs with spaCy. False: leave them unchanged.
spacy_check = True

# True: add [CLS] and [SEP] manually when concatenating. False: concatenate the columns without special tokens.
tokens_check = False

# True: load the tokenizer with add_special_tokens=False. False: let the tokenizer add special tokens.
tokenizer_check = False

## Creating the right data

### Load in Pandas Dataframe

In [10]:
# Load the train data
train_data = pd.read_json('/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/data/train.jsonl', lines=True)
train_data = train_data.drop(columns=['postId', 'postPlatform', 'targetDescription', 'targetKeywords', 'targetMedia', 'targetUrl', 'provenance', 'spoiler', 'spoilerPositions'])

# Load the validation data
val_data = pd.read_json('/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/data/validation.jsonl', lines=True)
val_data = val_data.drop(columns=['postId', 'postPlatform', 'targetDescription', 'targetKeywords', 'targetMedia', 'targetUrl', 'provenance', 'spoiler', 'spoilerPositions'])

In [11]:
train_data.head()

Unnamed: 0,uuid,postText,targetParagraphs,targetTitle,tags
0,0af11f6b-c889-4520-9372-66ba25cb7657,"[Wes Welker Wanted Dinner With Tom Brady, But ...",[It’ll be just like old times this weekend for...,"Wes Welker Wanted Dinner With Tom Brady, But P...",[passage]
1,b1a1f63d-8853-4a11-89e8-6b2952a393ec,[NASA sets date for full recovery of ozone hole],[2070 is shaping up to be a great year for Mot...,Hole In Ozone Layer Expected To Make Full Reco...,[phrase]
2,008b7b19-0445-4e16-8f9e-075b73f80ca4,[This is what makes employees happy -- and it'...,"[Despite common belief, money isn't the key to...",Intellectual Stimulation Trumps Money For Empl...,[phrase]
3,31ecf93c-3e21-4c80-949b-aa549a046b93,[Passion is overrated — 7 work habits you need...,"[It’s common wisdom. Near gospel really, and n...","‘Follow your passion’ is wrong, here are 7 hab...",[multi]
4,31b108a3-c828-421a-a4b9-cf651e9ac859,[The perfect way to cook rice so that it's per...,"[Boiling rice may seem simple, but there is a ...",Revealed: The perfect way to cook rice so that...,[phrase]


### Convert to strings

In [12]:
# To strings (run just once)
train_data['tags'] = train_data['tags'].apply(lambda x: ','.join(x))
print(type(train_data.loc[0, 'tags']))
train_data['postText'] = train_data['postText'].apply(lambda x: ','.join(x))
print(type(train_data.loc[0, 'postText']))
train_data['targetParagraphs'] = train_data['targetParagraphs'].apply(lambda x: ','.join(x))
print(type(train_data.loc[0, 'targetParagraphs']))
print(type(train_data.loc[0, 'targetTitle']))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


In [13]:
# To strings (run just once)
val_data['tags'] = val_data['tags'].apply(lambda x: ','.join(x))
print(type(val_data.loc[0, 'tags']))
val_data['postText'] = val_data['postText'].apply(lambda x: ','.join(x))
print(type(val_data.loc[0, 'postText']))
val_data['targetParagraphs'] = val_data['targetParagraphs'].apply(lambda x: ','.join(x))
print(type(val_data.loc[0, 'targetParagraphs']))
print(type(val_data.loc[0, 'targetTitle']))

<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>


In [14]:
train_data.head()

Unnamed: 0,uuid,postText,targetParagraphs,targetTitle,tags
0,0af11f6b-c889-4520-9372-66ba25cb7657,"Wes Welker Wanted Dinner With Tom Brady, But P...",It’ll be just like old times this weekend for ...,"Wes Welker Wanted Dinner With Tom Brady, But P...",passage
1,b1a1f63d-8853-4a11-89e8-6b2952a393ec,NASA sets date for full recovery of ozone hole,2070 is shaping up to be a great year for Moth...,Hole In Ozone Layer Expected To Make Full Reco...,phrase
2,008b7b19-0445-4e16-8f9e-075b73f80ca4,This is what makes employees happy -- and it's...,"Despite common belief, money isn't the key to ...",Intellectual Stimulation Trumps Money For Empl...,phrase
3,31ecf93c-3e21-4c80-949b-aa549a046b93,Passion is overrated — 7 work habits you need ...,"It’s common wisdom. Near gospel really, and no...","‘Follow your passion’ is wrong, here are 7 hab...",multi
4,31b108a3-c828-421a-a4b9-cf651e9ac859,The perfect way to cook rice so that it's perf...,"Boiling rice may seem simple, but there is a v...",Revealed: The perfect way to cook rice so that...,phrase


### Cleaning the texts

In [15]:
# Load Spacy
nlp = spacy.load("en_core_web_sm")

# Lemmatize and lowercase the text, dropping stop words, punctuation, and non-alphabetic tokens.
def clean_text(text):
    doc = nlp(text)
    clean_tokens = [token.lemma_.lower().strip() for token in doc if not token.is_stop and not token.is_punct and token.is_alpha]
    return " ".join(clean_tokens)

In [16]:
if spacy_check:
  train_data['targetParagraphs'] = train_data['targetParagraphs'].apply(clean_text)
  val_data['targetParagraphs'] = val_data['targetParagraphs'].apply(clean_text)

### Concatenate the fields

In [17]:
if tokens_check:
  train_data['text'] = '[CLS]' + ' ' + train_data['postText'] + ' ' + '[SEP]' + ' ' + train_data['targetParagraphs'] + ' ' + '[SEP]' + ' ' + train_data['targetTitle']
  val_data['text'] = '[CLS]' + ' ' + val_data['postText'] + ' ' + '[SEP]' + ' ' + val_data['targetParagraphs'] + ' ' + '[SEP]' + ' ' + val_data['targetTitle']
else:
  train_data['text'] = train_data['postText'] + ' ' + train_data['targetParagraphs'] + ' ' + train_data['targetTitle']
  val_data['text'] = val_data['postText'] + ' ' + val_data['targetParagraphs'] + ' ' + val_data['targetTitle']

In [18]:
train_data.head()
# val_data.head()

Unnamed: 0,uuid,postText,targetParagraphs,targetTitle,tags,text
0,0af11f6b-c889-4520-9372-66ba25cb7657,"Wes Welker Wanted Dinner With Tom Brady, But P...",like old time weekend tom brady wes welker rev...,"Wes Welker Wanted Dinner With Tom Brady, But P...",passage,"Wes Welker Wanted Dinner With Tom Brady, But P..."
1,b1a1f63d-8853-4a11-89e8-6b2952a393ec,NASA sets date for full recovery of ozone hole,shape great year mother earth nasa scientist p...,Hole In Ozone Layer Expected To Make Full Reco...,phrase,NASA sets date for full recovery of ozone hole...
2,008b7b19-0445-4e16-8f9e-075b73f80ca4,This is what makes employees happy -- and it's...,despite common belief money key employee happi...,Intellectual Stimulation Trumps Money For Empl...,phrase,This is what makes employees happy -- and it's...
3,31ecf93c-3e21-4c80-949b-aa549a046b93,Passion is overrated — 7 work habits you need ...,common wisdom near gospel young people founder...,"‘Follow your passion’ is wrong, here are 7 hab...",multi,Passion is overrated — 7 work habits you need ...
4,31b108a3-c828-421a-a4b9-cf651e9ac859,The perfect way to cook rice so that it's perf...,boiling rice simple fine line cook crunchy gra...,Revealed: The perfect way to cook rice so that...,phrase,The perfect way to cook rice so that it's perf...


### Create new dataframes with the columns we want

In [19]:
train_set = pd.DataFrame(train_data[['uuid', 'text', 'tags']])
# Rename 'tags' to 'labels', the column name the Trainer expects for the targets.
train_set = train_set.rename(columns={'tags': 'labels'})

val_set = pd.DataFrame(val_data[['uuid', 'text', 'tags']])
val_set = val_set.rename(columns={'tags': 'labels'})

val_set, test_set = train_test_split(val_set, test_size=0.2)

# train_set.info()
# train_set.head()
# val_set.info()
# val_set.head()
test_set.info()
test_set.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 160 entries, 21 to 693
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   uuid    160 non-null    object
 1   text    160 non-null    object
 2   labels  160 non-null    object
dtypes: object(3)
memory usage: 5.0+ KB


Unnamed: 0,uuid,text,labels
21,080bd61d-86f7-41f8-801e-3efc956b42aa,Is It Safe To Take Melatonin Pills To Help You...,passage
527,834a9942-a03b-4b60-ab11-12087683d844,What The Heck Was This Smelly Pink Blob Floati...,phrase
105,32b0206a-4e69-4715-a630-82b324609f7c,Just a slight flaw with this argument... presi...,passage
440,890fd675-4629-410b-a574-759848b92dce,Blocking this color light may help you sleep b...,phrase
368,5cf96771-c43e-4589-aeb6-bbc33915f8ff,Well this is awkward (PHOTOS) conventional wis...,passage
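The split above is not seeded, so the 160 held-out rows differ between runs. A minimal sketch of a reproducible 80/20 split using only the standard library follows; the function name `split_indices` and the seed value 42 are assumptions for illustration. With scikit-learn, passing `random_state` to `train_test_split` serves the same purpose.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and cut off a test portion."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    # The first n_test shuffled indices form the test set, the rest stay in validation.
    return indices[n_test:], indices[:n_test]


# 800 rows matches the size of the validation file before splitting (640 + 160).
val_idx, test_idx = split_indices(800)
print(len(val_idx), len(test_idx))
```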


### Save as jsonl files to load as datasets type

In [20]:
data_path = '/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/data/'

# save the json data to a file
with open(data_path + 'train_new.jsonl', 'w') as f:
  for index, row in train_set.iterrows():
    json.dump(row.to_dict(), f)
    f.write('\n')

with open(data_path + 'val_new.jsonl', 'w') as f:
  for index, row in val_set.iterrows():
    json.dump(row.to_dict(), f)
    f.write('\n')

with open(data_path + 'test_new.jsonl', 'w') as f:
  for index, row in test_set.iterrows():
    json.dump(row.to_dict(), f)
    f.write('\n')

## Preprocessing the data

In [21]:
datasets = load_dataset('json', data_files = {
    "train": data_path + 'train_new.jsonl',
    "validation": data_path + 'val_new.jsonl',
    "test": data_path + 'test_new.jsonl'
})


In [22]:
datasets

DatasetDict({
    train: Dataset({
        features: ['uuid', 'text', 'labels'],
        num_rows: 3200
    })
    validation: Dataset({
        features: ['uuid', 'text', 'labels'],
        num_rows: 640
    })
    test: Dataset({
        features: ['uuid', 'text', 'labels'],
        num_rows: 160
    })
})

In [None]:
datasets["test"][0]

{'uuid': '563a7892-96ce-4237-ba75-69dec9d496a7',
 'text': '@iamdiddy lost HOW MUCH to Rick Ross playing Craps?! million dollar pal rick ross play game dice yesterday specifically craps make bad diddy end owe ross million dollar rap mogul snap video event instagram caption lose million dollar be nothin suckmydickbitch turn post diddy iou note instagram write puff write contract win rollin dice dreamteam boutdatlife vegas family subsequently post photo caption vegas come meet club rain palm cirocboyz Diddy Loses A Million Dollars To Rick Ross In A Game Of Craps, Treats It Like Chump Change',
 'labels': 'phrase'}

In [23]:
# We convert the labels to ClassLabels (integers)
label_map = ClassLabel(names=['multi', 'passage', 'phrase'])
datasets = datasets.map(lambda example: {'uuid': example['uuid'], 
                                         'text': example['text'], 
                                         'labels': label_map.str2int(example['labels'])})


### Tokenizing

In [None]:
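# Note: add_special_tokens is an argument of the tokenizer call itself, so passing it
# to from_pretrained may not actually disable the special tokens.
if tokenizer_check:
  tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, add_special_tokens=False)
else:
  tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)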

In [None]:
def preprocess_function(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

In [None]:
tokenized_datasets = datasets.map(preprocess_function, batched=True)


## Fine-tuning the model

### Model

In [None]:
# Create a dictionary with the ids and the relevant label.
label_list = ['multi', 'passage', 'phrase']
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {v: k for k, v in id2label.items()}

# Download the model.
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=len(label_list), id2label=id2label, label2id=label2id)

# Set the DataCollator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_projector.weight']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias', 'pre_classifier

### Training arguments

In [24]:
# We set configuration to save the models in the corresponding directory.
configuration = ""
if spacy_check:
  configuration = configuration + "Spacy_"
else:
  configuration = configuration + "False_"

if tokens_check:
  configuration = configuration + "Tokens_"
else:
  configuration = configuration + "False_"

if tokenizer_check:
  configuration = configuration + "Tokenizer_"
else:
  configuration = configuration + "False_"

configuration

'Spacy_False_False_'

In [26]:
model_name = model_checkpoint.split("/")[-1]

training_args = TrainingArguments(
    output_dir="/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/models/" + configuration + model_name,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    optim="adamw_torch",
    load_best_model_at_end=True,
)

### Metrics

In [None]:
accuracy = evaluate.load("accuracy")

In [None]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
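To illustrate what `compute_metrics` does, here is a dependency-free sketch of the same two steps: take the argmax of each row of logits, then compare against the reference labels. The toy logits and labels are invented for the example, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Three toy examples over three classes; the predicted classes are 1, 0 and 2.
toy_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [-0.5, 0.0, 0.4]]
toy_labels = [1, 0, 1]
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
print(f"{toy_accuracy:.3f}")
```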

### Trainer

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

In [None]:
trainer.train()

You're using a DistilBertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.892285,0.589063
2,No log,0.755582,0.673438
3,0.840300,0.781053,0.66875
4,0.840300,0.867232,0.664062
5,0.397300,1.028234,0.673438
6,0.397300,1.16611,0.657813
7,0.397300,1.314376,0.66875
8,0.125300,1.479267,0.659375
9,0.125300,1.556238,0.665625
10,0.042700,1.593402,0.6625


TrainOutput(global_step=2000, training_loss=0.3514093017578125, metrics={'train_runtime': 1710.4165, 'train_samples_per_second': 18.709, 'train_steps_per_second': 1.169, 'total_flos': 4239032352768000.0, 'train_loss': 0.3514093017578125, 'epoch': 10.0})

In [None]:
trainer.evaluate()

{'eval_loss': 0.7555822134017944,
 'eval_accuracy': 0.6734375,
 'eval_runtime': 11.4217,
 'eval_samples_per_second': 56.034,
 'eval_steps_per_second': 3.502,
 'epoch': 10.0}
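The numbers in the training log can be cross-checked with a little arithmetic. The sketch below assumes a single GPU and no gradient accumulation, which is consistent with the reported `global_step=2000`. It also finds the epoch with the lowest validation loss, which matches the `eval_loss` of about 0.7556 returned by `trainer.evaluate()` with `load_best_model_at_end=True`.

```python
import math

train_rows = 3200   # size of the train split
batch_size = 16     # per_device_train_batch_size
epochs = 10         # num_train_epochs

steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs

# Validation loss per epoch, copied from the training log above.
val_loss = {
    1: 0.892285, 2: 0.755582, 3: 0.781053, 4: 0.867232, 5: 1.028234,
    6: 1.16611, 7: 1.314376, 8: 1.479267, 9: 1.556238, 10: 1.593402,
}
best_epoch = min(val_loss, key=val_loss.get)
best_step = best_epoch * steps_per_epoch

print(steps_per_epoch, total_steps, best_epoch, best_step)
```

Validation loss rises steadily after epoch 2 while training loss keeps falling, which suggests overfitting. Under the assumptions above, the best model corresponds to training step 400, whereas step 2000 is the end of epoch 10.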

## Testing

In [28]:
# checkpoint-2000 is the final training step (end of epoch 10); by validation loss,
# the best epoch in the training log was epoch 2.
tokenizer = AutoTokenizer.from_pretrained(
    f"/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/models/{configuration}{model_name}/checkpoint-2000")
model = AutoModelForSequenceClassification.from_pretrained(
    f"/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/models/{configuration}{model_name}/checkpoint-2000")

num_correct = 0
total_examples = 0
true_labels = []
predicted_labels = []

for example in datasets["test"]:
    inputs = tokenizer(example['text'], return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    predicted_label = model.config.id2label[predicted_class_id]
    predicted_labels.append(predicted_label)
    true_label = example['labels']
    if isinstance(true_label, int):
        true_label = model.config.id2label[true_label]
    true_labels.append(true_label)
    uuid = example['uuid']
    print(f"UUID: {uuid}", f"True label: {true_label}", f"Predicted label: {predicted_label}")
    if predicted_label == true_label:
        num_correct += 1
    total_examples += 1

# Compute the metrics (precision, recall and F1 are macro-averaged over the classes).
# test_accuracy avoids overwriting the `accuracy` metric object loaded earlier.
test_accuracy = num_correct / total_examples
precision, recall, f1_score, _ = precision_recall_fscore_support(true_labels, predicted_labels, average='macro')
print(f"Accuracy: {test_accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1-score: {f1_score:.3f}")

UUID: 080bd61d-86f7-41f8-801e-3efc956b42aa True label: passage Predicted label: passage
UUID: 834a9942-a03b-4b60-ab11-12087683d844 True label: phrase Predicted label: passage
UUID: 32b0206a-4e69-4715-a630-82b324609f7c True label: passage Predicted label: passage
UUID: 890fd675-4629-410b-a574-759848b92dce True label: phrase Predicted label: phrase
UUID: 5cf96771-c43e-4589-aeb6-bbc33915f8ff True label: passage Predicted label: passage
UUID: 26c0a5c0-9348-4dd0-9136-9103fd78402a True label: passage Predicted label: multi
UUID: 3f3fe676-0cd0-43f2-9cb5-ec77f7463cd3 True label: passage Predicted label: passage
UUID: 6176bc74-00ba-43e6-860b-234127227d2a True label: phrase Predicted label: phrase
UUID: 9b641d40-8b86-4906-b4c4-be8777f490fd True label: phrase Predicted label: phrase
UUID: ab7c24f3-f96f-4a7a-86b0-2bf3ef63da59 True label: passage Predicted label: passage
UUID: dfd4f870-0370-4889-9e30-979c86a7aeee True label: passage Predicted label: passage
UUID: fe427180-ad89-4e2c-9720-409753aa28f
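To show what macro averaging means in the metrics call above, here is a standard-library sketch that computes per-class precision, recall, and F1 and then averages them with equal weight per class. The function name is hypothetical. The inputs are only the first six rows printed above, so the resulting numbers illustrate the computation and are not the notebook's test scores.

```python
def macro_precision_recall_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class precision, recall and F1."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Undefined ratios are treated as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# First six (true, predicted) pairs from the printed test output.
true_six = ["passage", "phrase", "passage", "phrase", "passage", "passage"]
pred_six = ["passage", "passage", "passage", "phrase", "passage", "multi"]
macro_p, macro_r, macro_f1 = macro_precision_recall_f1(true_six, pred_six)
print(f"{macro_p:.3f} {macro_r:.3f} {macro_f1:.3f}")
```

Because every class counts equally, a rare class such as `multi` can pull the macro scores down even when overall accuracy looks reasonable.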

## Predicting

In [29]:
tokenizer = AutoTokenizer.from_pretrained(
    f"/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/models/{configuration}{model_name}/checkpoint-2000")
model = AutoModelForSequenceClassification.from_pretrained(
    f"/content/drive/My Drive/Colab Notebooks/2023-ILTAPP/APP1_Assignment/models/{configuration}{model_name}/checkpoint-2000")

dataset_to_pred = [
    {"uuid": "24545-445654-78786", "text": "Text to try if this works correctly"},
    {"uuid": "46768-448646-46456", "text": "She went to the beach in February"},
    {"uuid": "79785-113215-49989", "text": "This is the input text for example 246845"}
]

example_dataset = Dataset.from_pandas(pd.DataFrame(dataset_to_pred))

for example in example_dataset:
    inputs = tokenizer(example['text'], return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    predicted_class_id = logits.argmax().item()
    predicted_label = model.config.id2label[predicted_class_id]
    print(f"uuid: {example['uuid']}, spoilerType: {predicted_label}")


uuid: 24545-445654-78786, spoilerType: passage
uuid: 46768-448646-46456, spoilerType: phrase
uuid: 79785-113215-49989, spoilerType: passage
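The loop above reports only the top label. If a confidence value is also wanted, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on invented logits; the label order follows the `id2label` mapping defined earlier in the notebook. With PyTorch tensors, `torch.softmax(logits, dim=-1)` performs the same computation.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "multi", 1: "passage", 2: "phrase"}

# Invented logits for a single example.
toy_logits = [-1.2, 2.3, 0.4]
probs = softmax(toy_logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 3))
```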
