# Clickbait Challenge at SemEval 2023 - Clickbait Spoiling

Task 1 on Spoiler Type Classification: The input is the clickbait post and the linked document. The task is to classify the spoiler type that the clickbait post warrants (one of "phrase", "passage", or "multi"). For each input, an output of the form ```{"uuid": "<UUID>", "spoilerType": "<SPOILER-TYPE>"}``` has to be generated, where <SPOILER-TYPE> is phrase, passage, or multi.
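As a rough illustration of the output format described above, the following sketch serializes a couple of predictions. The UUIDs are made up, and writing one JSON object per line is an assumption; `to_jsonl` and `VALID_TYPES` are hypothetical helper names, not part of the task tooling.

```python
import json

# Hypothetical predictions: uuid -> predicted spoiler type (values are made up).
predictions = {
    "example-uuid-1": "phrase",
    "example-uuid-2": "passage",
}

# The three spoiler types named in the task description.
VALID_TYPES = {"phrase", "passage", "multi"}


def to_jsonl(preds):
    """Serialize predictions as one JSON object per line (assumed layout)."""
    lines = []
    for uuid, spoiler_type in preds.items():
        if spoiler_type not in VALID_TYPES:
            raise ValueError(f"unknown spoiler type: {spoiler_type}")
        lines.append(json.dumps({"uuid": uuid, "spoilerType": spoiler_type}))
    return "\n".join(lines)


output = to_jsonl(predictions)
print(output)
```

The validation step is there only to catch typos in the label names before a submission file is written.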
    
For each entry in the training and validation dataset, the following fields are available:

* uuid: The uuid of the dataset entry.
* postText: The text of the clickbait post which is to be spoiled.
* **targetParagraphs**: The main content of the linked web page to classify the spoiler type ***(task 1)*** and to generate the spoiler (task 2). Consists of the paragraphs of manually extracted main content.
* **targetTitle**: The title of the linked web page to classify the spoiler type ***(task 1)*** and to generate the spoiler (task 2).
* targetUrl: The URL of the linked web page.
* humanSpoiler: The human generated spoiler (abstractive) for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* spoiler: The human extracted spoiler for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* spoilerPositions: The position of the human extracted spoiler for the clickbait post from the linked web page. This field is only available in the training and validation dataset (not during test).
* **tags**: The spoiler type (might be "phrase", "passage", or "multi") that is to be classified in ***task 1*** (spoiler type classification). For task 1, this field is only available in the training and validation dataset (not during test). For task 2, this field is always available and can be used.

Some fields contain additional metainformation about the entry but are unused: postId, postPlatform, targetDescription, targetKeywords, targetMedia.

In [1]:
import pandas as pd
import numpy as np
import torch
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [33]:
from transformers import TrainingArguments, Trainer
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import AutoTokenizer, AutoModelForSequenceClassification
torch.cuda.is_available()

True

### 1. Read data
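Keep only the necessary columns: `postText` and `targetParagraphs` are concatenated into a single `document` string, and the list-valued `tags` field is joined into a string. Entries tagged "multi" are dropped when the splits are loaded below, so the models are trained on a binary phrase-vs-passage problem.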

In [3]:
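def create_df_from_jsonl(path):
    # Each line of the file is one JSON object (JSON Lines format)
    df = pd.read_json(path, lines=True)
    # postText and targetParagraphs are lists of strings; flatten each into one string.
    # Note: the two parts are joined without a separator between them.
    df['document'] = df['postText'].apply(', '.join) + df['targetParagraphs'].apply(' '.join)
    # tags is a list as well; join it into a single label string
    df['tags'] = df['tags'].apply(', '.join)
    return df[['document', 'tags']]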
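To see what this joining produces without loading the dataset, here is a small pandas-free sketch on a made-up record. `build_document` and its `sep` parameter are hypothetical; `sep=""` mirrors the function above, which is why the printed example further down reads "everAustralians" with no space between post and article.

```python
# Made-up record mirroring the list-valued fields of the dataset (illustrative only).
record = {
    "postText": ["You won't believe what this cat did"],
    "targetParagraphs": ["It slept all day.", "Then it slept some more."],
    "tags": ["phrase"],
}


def build_document(rec, sep=""):
    """Join the post text and the paragraphs; sep="" reproduces the notebook's joining."""
    post = ", ".join(rec["postText"])
    body = " ".join(rec["targetParagraphs"])
    return post + sep + body


glued = build_document(record)            # same joining as create_df_from_jsonl
spaced = build_document(record, sep=" ")  # hypothetical variant with a separator
label = ", ".join(record["tags"])

print(glued)
print(spaced)
print(label)
```

Whether a separator helps the classifier is not something this notebook tests; the variant is shown only to make the difference visible.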

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
from sklearn.preprocessing import LabelEncoder

In [5]:
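train_df = create_df_from_jsonl('data/train.jsonl')
# Drop "multi" entries so the problem becomes binary: phrase vs. passage
train_df = train_df[train_df.tags != "multi"]
X_train = list(train_df["document"])
print(X_train[162])
y_train = list(train_df["tags"])
print(y_train[162])
# lb = LabelEncoder()
# y_train = lb.fit_transform(y_train)
# One-hot encode and keep only the 'phrase' column: phrase -> 1, passage -> 0.
# Cast to int64 so each label is an integer class index regardless of the pandas dummy dtype.
y_train = list(pd.get_dummies(y_train, drop_first=True)['phrase'].astype('int64'))
print(y_train[162])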

Videos show the most delightful protest everAustralians know how to protest. Hundreds of people gathered Saturday local time at Parliament House in Canberra to make their way down a hill in a mass protest roll. The government plans to build a security fence to block access to the hill and other capital grounds. Protesters opposed to the fence rolled down the grassy slope just as many visitors to Parliament House often do. Even dogs got in on the democratic action. The event was organized by Lester Yao, an architect, on Facebook and delightful videos of the roll-a-thon were shared widely on social media. "It was only going to be about 20 friends and families, and now we had more than 600 or 700 people," Yao told the Sydney Morning Herald. "Unfortunately, kids might not be able to do this again and they're just enjoying themselves." The fence became a matter of debate after demonstrators breached security at Parliament House earlier this year. Lawmakers had even tossed around the idea of

In [6]:
validation_df = create_df_from_jsonl('data/validation.jsonl')
# Drop "multi" entries, as for the training split
validation_df = validation_df[validation_df.tags != "multi"]
X_test = list(validation_df["document"])
y_test = list(validation_df["tags"])
# lb = LabelEncoder()
# y_test = lb.fit_transform(y_test)
# phrase -> 1, passage -> 0, cast to int64 so the labels are integer class indices
y_test = list(pd.get_dummies(y_test, drop_first=True)['phrase'].astype('int64'))


In [7]:
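# Caution: this replaces the validation split with the first 1000 training
# examples, so any evaluation run after this cell measures fit on training data.
X_test = X_train[:1000]
y_test = y_train[:1000]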

In [7]:
phrase = len(validation_df[validation_df.tags == "phrase"])
passage = len(validation_df[validation_df.tags == "passage"])
phrase/(phrase+passage)

0.5098934550989346
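The ratio above gives a reference point for the accuracies reported later. A small sketch, assuming the 657 validation examples reported by the evaluation log and the dataset summary elsewhere in this notebook, recovers the class counts and the accuracy of a trivial always-"phrase" classifier:

```python
# Figures taken from the notebook output: the phrase share printed above
# and the 657 validation examples left after dropping "multi".
phrase_share = 0.5098934550989346
num_examples = 657

# Recover the integer class counts from the share.
phrase = round(phrase_share * num_examples)
passage = num_examples - phrase

# Accuracy of a trivial classifier that always answers with the larger class.
majority_baseline = max(phrase, passage) / num_examples

print(phrase, passage, round(majority_baseline, 4))
```

Any model whose validation accuracy sits near this baseline is doing little better than always guessing the larger class.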

In [8]:
# Create torch dataset
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings  # tokenizer output: dict-like with input_ids, attention_mask, ...
        self.labels = labels        # optional list of class indices

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        # Compare against None so that an empty label list is not mistaken for "no labels"
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

In [9]:
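def compute_metrics(p):
    # p is an EvalPrediction that unpacks into (logits, label ids); the print is a debugging aid
    print(type(p))
    pred, labels = p
    # Predicted class = index of the largest logit
    pred = np.argmax(pred, axis=1)

    # Precision, recall and F1 are computed for the positive class (label 1 = phrase)
    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred)
    precision = precision_score(y_true=labels, y_pred=pred)
    f1 = f1_score(y_true=labels, y_pred=pred)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}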
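For readers who want to see what these four numbers mean, here is a dependency-free sketch that computes them from confusion counts on made-up logits. `binary_metrics` and `argmax` are hypothetical helpers, and returning 0.0 for an undefined ratio is a choice made here for simplicity.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def binary_metrics(logits, labels):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    preds = [argmax(row) for row in logits]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up logits for four examples; the predicted classes are [0, 1, 1, 0].
toy_logits = [[2.0, 0.1], [0.3, 1.2], [0.2, 0.9], [1.5, 0.4]]
toy_labels = [0, 1, 0, 1]

metrics = binary_metrics(toy_logits, toy_labels)
print(metrics)
```

With one true positive, one false positive, one false negative and one true negative, every metric in this toy case comes out at 0.5.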

## BERT

In [26]:
model_name = 'bert-base-uncased'
# model_name = "output/checkpoint-3000"
tokenizer = BertTokenizer.from_pretrained(model_name)
# To start from the pretrained weights instead:
# model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
# Load a model previously saved to output/bert_training
model = AutoModelForSequenceClassification.from_pretrained("output/bert_training")

loading file vocab.txt from cache at C:\Users\Kubi/.cache\huggingface\hub\models--bert-base-uncased\snapshots\0a6aa9128b6194f4f3c4db429b6cb4891cdb421b\vocab.txt
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at None
loading file tokenizer_config.json from cache at C:\Users\Kubi/.cache\huggingface\hub\models--bert-base-uncased\snapshots\0a6aa9128b6194f4f3c4db429b6cb4891cdb421b\tokenizer_config.json
loading configuration file config.json from cache at C:\Users\Kubi/.cache\huggingface\hub\models--bert-base-uncased\snapshots\0a6aa9128b6194f4f3c4db429b6cb4891cdb421b\config.json
Model config BertConfig {
  "_name_or_path": "bert-base-uncased",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps

In [22]:
X_train_tokenized = tokenizer(X_train, truncation=True, padding=True, max_length=512)
X_val_tokenized = tokenizer(X_test, truncation=True, padding=True, max_length=512)


In [23]:
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_test)

In [48]:
args = TrainingArguments(
    output_dir="output",
    num_train_epochs=30,
    per_device_train_batch_size=8
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [28]:
trainer.train()

***** Running training *****
  Num examples = 2641
  Num Epochs = 30
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 9930
  Number of trainable parameters = 109483778


Step,Training Loss
500,0.7034
1000,0.7084
1500,0.7028
2000,0.7016
2500,0.7025
3000,0.7017
3500,0.7019
4000,0.7005
4500,0.7033
5000,0.7008


Saving model checkpoint to output\checkpoint-500
Configuration saved in output\checkpoint-500\config.json
Model weights saved in output\checkpoint-500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1000
Configuration saved in output\checkpoint-1000\config.json
Model weights saved in output\checkpoint-1000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1500
Configuration saved in output\checkpoint-1500\config.json
Model weights saved in output\checkpoint-1500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2000
Configuration saved in output\checkpoint-2000\config.json
Model weights saved in output\checkpoint-2000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2500
Configuration saved in output\checkpoint-2500\config.json
Model weights saved in output\checkpoint-2500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-3000
Configuration saved in output\checkpoint-3000\config.json
Model weights saved in output\check

TrainOutput(global_step=9930, training_loss=0.7001936652269009, metrics={'train_runtime': 3616.2325, 'train_samples_per_second': 21.91, 'train_steps_per_second': 2.746, 'total_flos': 2.08462889161728e+16, 'train_loss': 0.7001936652269009, 'epoch': 30.0})
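One way to read the loss column above is to compare it with the loss of a model that assigns probability 0.5 to each of the two classes. A minimal sketch of that comparison, using the `training_loss` value from the output above:

```python
import math

reported_loss = 0.7001936652269009  # training_loss from the TrainOutput above
chance_loss = -math.log(0.5)        # cross-entropy of a 50/50 guess on two classes

gap = reported_loss - chance_loss
print(f"chance-level loss: {chance_loss:.4f}, reported: {reported_loss:.4f}, gap: {gap:.4f}")
```

A training loss this close to ln 2 over the whole run suggests, though it does not prove, that the model made little progress beyond guessing during this run.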

In [49]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 657
  Batch size = 8


<class 'transformers.trainer_utils.EvalPrediction'>


{'eval_loss': 1.673085331916809,
 'eval_accuracy': 0.6666666666666666,
 'eval_precision': 0.6920529801324503,
 'eval_recall': 0.6238805970149254,
 'eval_f1': 0.6562009419152276,
 'eval_runtime': 9.93,
 'eval_samples_per_second': 66.163,
 'eval_steps_per_second': 8.359}

In [55]:
model = AutoModelForSequenceClassification.from_pretrained("output/bert-best")

loading configuration file output/bert-best\config.json
Model config BertConfig {
  "_name_or_path": "output/bert-best",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file output/bert-best\pytorch_model.bin
All model checkpoint weights were used when initializing BertForSequenceClassification.

All the weights of BertForSequenceClassification

In [56]:
from transformers import pipeline
evaluate_model = pipeline('text-classification', model=model, tokenizer=tokenizer)

In [57]:
text1 = "This Is How Many People Police Have Killed So Far In 2016In the first half of 2016, police have killed 532 people — many of whom were unarmed, mentally ill, and people of color. Going by the Going by the Guardian’s count , 261 white people were killed by police — the highest total out of any racial group. But data also shows that black people and Native Americans are being killed at higher rates than any other group. The slight discrepancies in numbers between Killed by Police and The Guardian reflect differences in how each outlet collects data about police killings. Killed by Police is mainly open-sourced and also relies on The slight discrepancies in numbers between Killed by Police and The Guardian reflect differences in how each outlet collects data about police killings. Killed by Police is mainly open-sourced and also relies on corporate news reports for its data on people killed by police. For its database, The Guardian relies on traditional reporting on police reports and witness statements, while also culling data from verified crowdsourced information using regional news outlets, research groups, and reporting projects that include Killed by Police. There has always been a high volume of police killings, although damning videos, photos, and news reports highlight officer violence — especially against people of color — now more than ever. But what’s become an even more alarming trend is the number of officers involved in these killings who receive minor to no punishment. According to the According to the Wall Street Journal , 2015 saw the highest number of police officers being charged for deadly, on-duty shootings in a decade: 12 as of September 2015. Still, in a year when approximately 1,200 people were killed by police, zero officers were convicted of murder or manslaughter, painting the picture that officers involved in killing another person will not be held accountable for their actions."
text0 ="Videos show the most delightful protest everAustralians know how to protest. Hundreds of people gathered Saturday local time at Parliament House in Canberra to make their way down a hill in a mass protest roll. The government plans to build a security fence to block access to the hill and other capital grounds. Protesters opposed to the fence rolled down the grassy slope just as many visitors to Parliament House often do. Even dogs got in on the democratic action. The event was organized by Lester Yao, an architect, on Facebook and delightful videos of the roll-a-thon were shared widely on social media. It was only going to be about 20 friends and families, and now we had more than 600 or 700 people, Yao told the Sydney Morning Herald. Unfortunately, kids might not be able to do this again and they're just enjoying themselves. The fence became a matter of debate after demonstrators breached security at Parliament House earlier this year. Lawmakers had even tossed around the idea of digging a moat around the slope, but that was sanely rejected."

In [59]:
evaluate_model(text0)

[{'label': 'LABEL_0', 'score': 0.9987004995346069}]

## RoBERTa

In [52]:
model_name = 'roberta-large-mnli'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,num_labels=2, ignore_mismatched_sizes=True)

Could not locate the tokenizer configuration file, will try to use the model config instead.
loading configuration file config.json from cache at C:\Users\Kubi/.cache\huggingface\hub\models--roberta-large-mnli\snapshots\0dcbcf20673c006ac2d1e324954491b96f0c0015\config.json
Model config RobertaConfig {
  "_name_or_path": "roberta-large-mnli",
  "_num_labels": 3,
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "CONTRADICTION",
    "1": "NEUTRAL",
    "2": "ENTAILMENT"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "CONTRADICTION": 0,
    "ENTAILMENT": 2,
    "NEUTRAL": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_tok

In [61]:
X_train_tokenized = tokenizer(X_train, truncation=True, padding=True, max_length=512)
X_val_tokenized = tokenizer(X_test, truncation=True, padding=True, max_length=512)

In [62]:
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_test)

In [63]:
args = TrainingArguments(
    output_dir="output",
    num_train_epochs=15,
    per_device_train_batch_size=8
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [53]:
trainer.train()

***** Running training *****
  Num examples = 2641
  Num Epochs = 15
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 4965
  Number of trainable parameters = 355361794


Step,Training Loss
500,0.7086
1000,0.7068
1500,0.7003
2000,0.7144
2500,0.706
3000,0.702
3500,0.6979
4000,0.699
4500,0.698


Saving model checkpoint to output\checkpoint-500
Configuration saved in output\checkpoint-500\config.json
Model weights saved in output\checkpoint-500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1000
Configuration saved in output\checkpoint-1000\config.json
Model weights saved in output\checkpoint-1000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1500
Configuration saved in output\checkpoint-1500\config.json
Model weights saved in output\checkpoint-1500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2000
Configuration saved in output\checkpoint-2000\config.json
Model weights saved in output\checkpoint-2000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2500
Configuration saved in output\checkpoint-2500\config.json
Model weights saved in output\checkpoint-2500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-3000
Configuration saved in output\checkpoint-3000\config.json
Model weights saved in output\check

TrainOutput(global_step=4965, training_loss=0.7031860474493329, metrics={'train_runtime': 965.5412, 'train_samples_per_second': 41.029, 'train_steps_per_second': 5.142, 'total_flos': 2667935655185220.0, 'train_loss': 0.7031860474493329, 'epoch': 15.0})

In [64]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8


<class 'transformers.trainer_utils.EvalPrediction'>


{'eval_loss': 0.7037405967712402,
 'eval_accuracy': 0.528,
 'eval_precision': 0.5261569416498993,
 'eval_recall': 0.9980916030534351,
 'eval_f1': 0.6890645586297759,
 'eval_runtime': 3.7893,
 'eval_samples_per_second': 263.904,
 'eval_steps_per_second': 32.988}
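The metrics above pin down the underlying confusion counts. The sketch below searches for integer counts over the 1000 evaluated examples that reproduce the reported accuracy, precision and recall; `recover_confusion` is a hypothetical helper written for this illustration.

```python
def recover_confusion(n, accuracy, precision, recall, tol=1e-9):
    """Find all (tp, fp, fn, tn) summing to n that match the reported metrics."""
    matches = []
    for tp in range(1, n + 1):
        for fn in range(0, n - tp + 1):
            if abs(tp / (tp + fn) - recall) > tol:
                continue
            for fp in range(0, n - tp - fn + 1):
                if abs(tp / (tp + fp) - precision) > tol:
                    continue
                tn = n - tp - fn - fp
                if abs((tp + tn) / n - accuracy) <= tol:
                    matches.append((tp, fp, fn, tn))
    return matches


# Values copied from the evaluation output above.
matches = recover_confusion(
    n=1000,
    accuracy=0.528,
    precision=0.5261569416498993,
    recall=0.9980916030534351,
)
print(matches)
```

The single match has 994 of the 1000 examples predicted as the positive class (phrase), so the high recall here reflects an almost constant prediction rather than a useful decision boundary.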

## DeBERTa

In [14]:
model_name = 'microsoft/deberta-large-mnli'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,num_labels=2, ignore_mismatched_sizes=True)

Some weights of the model checkpoint at microsoft/deberta-large-mnli were not used when initializing DebertaForSequenceClassification: ['config']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaForSequenceClassification were not initialized from the model checkpoint at microsoft/deberta-large-mnli and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([3, 1024]) in the checkpoint and torch.Size([2, 1024]) in the model instantiated
- classifier.bias: found shape torch.Size(

In [21]:
X_train_tokenized = tokenizer(X_train, truncation=True, padding=True, max_length=512)
X_val_tokenized = tokenizer(X_test, truncation=True, padding=True, max_length=512)

In [22]:
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_test)

In [23]:
args = TrainingArguments(
    output_dir="output",
    num_train_epochs=5,
    per_device_train_batch_size=1
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).


In [16]:
trainer.train()

***** Running training *****
  Num examples = 2641
  Num Epochs = 5
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 13205
  Number of trainable parameters = 406214658


Step,Training Loss
500,0.9208
1000,0.8098
1500,1.0695
2000,1.3262
2500,1.3339
3000,1.2096
3500,1.1289


Saving model checkpoint to output\checkpoint-500
Configuration saved in output\checkpoint-500\config.json
Model weights saved in output\checkpoint-500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1000
Configuration saved in output\checkpoint-1000\config.json
Model weights saved in output\checkpoint-1000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-1500
Configuration saved in output\checkpoint-1500\config.json
Model weights saved in output\checkpoint-1500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2000
Configuration saved in output\checkpoint-2000\config.json
Model weights saved in output\checkpoint-2000\pytorch_model.bin
Saving model checkpoint to output\checkpoint-2500
Configuration saved in output\checkpoint-2500\config.json
Model weights saved in output\checkpoint-2500\pytorch_model.bin
Saving model checkpoint to output\checkpoint-3000
Configuration saved in output\checkpoint-3000\config.json
Model weights saved in output\check

KeyboardInterrupt: 

In [24]:
trainer.evaluate()

***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8


<class 'transformers.trainer_utils.EvalPrediction'>


{'eval_loss': 1.2268038988113403,
 'eval_accuracy': 0.524,
 'eval_precision': 0.524,
 'eval_recall': 1.0,
 'eval_f1': 0.6876640419947507,
 'eval_runtime': 4.5971,
 'eval_samples_per_second': 217.526,
 'eval_steps_per_second': 27.191}
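A recall of 1.0 combined with precision equal to accuracy is what a classifier produces when it labels every example as positive. The sketch below checks this against the numbers above, assuming 524 of the 1000 evaluated examples are positive (inferred from the reported precision, not stated in the output).

```python
# Assumption: 524 positives among the 1000 evaluated examples, inferred from
# eval_precision = 0.524 together with eval_recall = 1.0.
n, positives = 1000, 524

# Confusion counts of a classifier that always predicts the positive class.
tp, fp, fn, tn = positives, n - positives, 0, 0

accuracy = (tp + tn) / n
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

The resulting F1 agrees with the reported `eval_f1`, which is consistent with the interrupted run having collapsed to a single predicted class.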

In [15]:
model = AutoModelForSequenceClassification.from_pretrained("output/checkpoint-1000")

## Better way: custom classification head and manual training loop


In [12]:
from datasets import load_dataset, Dataset, DatasetDict
from transformers import (
    DataCollatorWithPadding, AutoModelForSequenceClassification, Trainer,
    TrainingArguments, AutoTokenizer, AutoModel, AutoConfig,
)
from transformers.modeling_outputs import TokenClassifierOutput
import torch
import torch.nn as nn
import pandas as pd

In [13]:
def create_df_from_jsonl(path):
    df = pd.read_json(path, lines=True)
    df['input'] = df['postText'].apply(', '.join) + df['targetParagraphs'].apply(' '.join)
    df['label'] = df['tags'].apply(', '.join)
    return df[['input', 'label']]
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

### Data

In [14]:
train_df = create_df_from_jsonl('data/train.jsonl')
# Keep only the binary phrase-vs-passage problem; copy to avoid assigning into a slice
train_df = train_df[train_df.label != "multi"].copy()
# phrase -> 1, passage -> 0, cast to int64 so the labels are integer class indices
train_df['label'] = pd.get_dummies(train_df['label'], drop_first=True)['phrase'].astype('int64')
test_df = create_df_from_jsonl('data/validation.jsonl')
test_df = test_df[test_df.label != "multi"].copy()
test_df['label'] = pd.get_dummies(test_df['label'], drop_first=True)['phrase'].astype('int64')

train_df = Dataset.from_pandas(train_df)
test_df = Dataset.from_pandas(test_df)

# from_pandas keeps the filtered DataFrame index as an extra column; drop it
train_df = train_df.remove_columns(['__index_level_0__'])
test_df = test_df.remove_columns(['__index_level_0__'])
data = DatasetDict({
    'train': train_df,
    'test': test_df})
data


DatasetDict({
    train: Dataset({
        features: ['input', 'label'],
        num_rows: 2641
    })
    test: Dataset({
        features: ['input', 'label'],
        num_rows: 657
    })
})

### Model

In [15]:
checkpoint = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.model_max_length = 512

In [16]:
def tokenize(batch):
    return tokenizer(batch["input"], truncation=True,max_length=512)

tokenized_dataset = data.map(tokenize, batched=True)
tokenized_dataset
tokenized_dataset.set_format("torch",columns=["input_ids", "attention_mask", "label"])
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

  0%|          | 0/3 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]
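Unlike the earlier cells, which pad everything at tokenization time, this cell leaves the sequences unpadded and lets the collator pad each batch to its own longest member. The following toy sketch shows the idea in plain Python; `pad_batch` is a hypothetical stand-in, and the token ids and `pad_id` value are made up for illustration.

```python
def pad_batch(sequences, pad_id):
    """Pad every sequence to the length of the longest one in this batch."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = longest - len(seq)
        input_ids.append(seq + [pad_id] * padding)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * padding)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up token id sequences of different lengths.
batch = pad_batch([[0, 713, 2], [0, 713, 16, 10, 2]], pad_id=1)
print(batch)
```

Batches made of short documents then stay short, which saves compute compared with padding every example to one fixed length.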

In [17]:
class CustomModel(nn.Module):
    def __init__(self, checkpoint, num_labels):
        super().__init__()
        self.num_labels = num_labels

        # Load the pretrained encoder (the "body") for the given checkpoint
        config = AutoConfig.from_pretrained(checkpoint, output_attentions=True, output_hidden_states=True)
        self.model = AutoModel.from_pretrained(checkpoint, config=config)
        self.dropout = nn.Dropout(0.1)
        # Size the new head from the encoder config (1024 for roberta-large, not 768)
        self.classifier = nn.Linear(self.model.config.hidden_size, num_labels)

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        # Extract outputs from the body
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)

        # Add custom layers
        sequence_output = self.dropout(outputs[0])  # outputs[0] = last hidden state

        # Classify from the hidden state of the first token
        logits = self.classifier(sequence_output[:, 0, :])

        # Compute the loss only when labels are provided
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        return TokenClassifierOutput(loss=loss, logits=logits, hidden_states=outputs.hidden_states, attentions=outputs.attentions)

In [18]:
from torch.utils.data import DataLoader
from transformers import get_scheduler
from datasets import load_metric  # deprecated in newer `datasets` releases in favor of the `evaluate` package

print(torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CustomModel(checkpoint=checkpoint, num_labels=2).to(device)

train_dataloader = DataLoader(
    tokenized_dataset["train"], shuffle=True, batch_size=32, collate_fn=data_collator
)
eval_dataloader = DataLoader(
    tokenized_dataset["test"], batch_size=32, collate_fn=data_collator
)

# torch.optim.AdamW replaces the deprecated transformers.AdamW;
# weight_decay=0.0 keeps the default of the class it replaces
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)
print(num_training_steps)
metric = load_metric("f1")

True


Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaModel: ['classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


249
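The step counts printed in this notebook follow from the dataset size, batch size and number of epochs. A small sketch, assuming a single device, no gradient accumulation and that the last partial batch is kept (`optimization_steps` is a hypothetical helper):

```python
import math


def optimization_steps(num_examples, batch_size, epochs):
    """Batches per epoch (last partial batch kept) times the number of epochs."""
    return math.ceil(num_examples / batch_size) * epochs


# (examples, batch size, epochs) as configured in the runs of this notebook.
runs = {
    "BERT": (2641, 8, 30),
    "RoBERTa": (2641, 8, 15),
    "DeBERTa": (2641, 1, 5),
    "manual loop": (2641, 32, 3),
}

steps = {name: optimization_steps(*cfg) for name, cfg in runs.items()}
print(steps)

# Evaluation batches for the manual loop: 657 validation examples, batch size 32, 3 epochs.
eval_batches = optimization_steps(657, 32, 3)
print(eval_batches)
```

These values line up with the "Total optimization steps" lines in the training logs and with the progress bar totals of the manual loop.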


In [19]:
from tqdm.auto import tqdm

progress_bar_train = tqdm(range(num_training_steps))
progress_bar_eval = tqdm(range(num_epochs * len(eval_dataloader)))


for epoch in range(num_epochs):
    model.train()
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar_train.update(1)

    model.eval()
    for batch in eval_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)

        logits = outputs.logits
        predictions = torch.argmax(logits, dim=-1)
        metric.add_batch(predictions=predictions, references=batch["labels"])
        progress_bar_eval.update(1)
    
    print(metric.compute())

      

  0%|          | 0/249 [00:00<?, ?it/s]

  0%|          | 0/63 [00:00<?, ?it/s]

You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 8.00 GiB total capacity; 7.22 GiB already allocated; 0 bytes free; 7.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
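A back-of-the-envelope estimate helps explain why this run exhausts an 8 GiB GPU. The sketch assumes full fp32 training with AdamW, which stores two moment estimates per parameter, and reuses the trainable parameter count reported for `roberta-large-mnli` earlier in this notebook. The custom head differs slightly, so the figure is approximate, and activations are not included.

```python
# Trainable parameters reported in the RoBERTa training log of this notebook.
num_params = 355_361_794

# Assumed fp32 training: 4 bytes for the weight, 4 for its gradient,
# and 8 for the two moment estimates kept by AdamW.
bytes_per_param = 4 + 4 + 8

total_bytes = num_params * bytes_per_param
total_gib = total_bytes / 2**30

print(f"{total_gib:.2f} GiB before activations")
```

Under these assumptions well over half of the card is taken before a single activation is stored, and a batch of 32 sequences of up to 512 tokens, with all hidden states and attention maps returned, has to fit in what remains.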