# BERT fine-tuning for the Change My View dataset

Fine-tuning BERT on the Change My View dataset with our 'combined contextual and structural features as text' sentence representation.

## Packages

In [1]:
!pip install pandas==1.3.4
!pip install transformers==4.12.5
!pip install datasets==1.15.1
!pip install ipywidgets

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


## Imports

In [2]:
import os
import pickle

from collections import Counter

from sklearn.metrics import classification_report

import numpy as np
import torch
import torch.nn as nn

import transformers
from transformers import BertTokenizer
from transformers import BertForSequenceClassification
from transformers import Trainer, TrainingArguments
from transformers.data.data_collator import DataCollatorWithPadding

import datasets
from datasets import Dataset
from datasets import ClassLabel
from datasets import load_metric

## File and folder paths

In [3]:
# Your paths go here:

DATA_FILE = '../datasets/dataset_change_my_view_icann_v2.pt'
RESULTS_FOLDER = '../results/'

## GPU

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [5]:
device

device(type='cuda')

## Load dataset

In [6]:
dataset = torch.load(DATA_FILE)

In [7]:
dataset

DatasetDict({
    train: Dataset({
        features: ['split', 'component', 'labels', 'topic', 'full_sentence', 'topic_and_full_sentence', 'context_full_sentence_structural_fts_as_txt', 'context_full_sentence_structural_fts_as_txt_combined', 'feature_tensor', 'sentence_structural_fts_as_text'],
        num_rows: 2720
    })
    test: Dataset({
        features: ['split', 'component', 'labels', 'topic', 'full_sentence', 'topic_and_full_sentence', 'context_full_sentence_structural_fts_as_txt', 'context_full_sentence_structural_fts_as_txt_combined', 'feature_tensor', 'sentence_structural_fts_as_text'],
        num_rows: 763
    })
})

## Tokenizer

In [9]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

In [10]:
label_names = set(dataset['train']['labels'])
label_nb = len(label_names)
labels = ClassLabel(num_classes=label_nb, names=label_names)

In [11]:
labels

ClassLabel(num_classes=2, names={'claim', 'premise'}, names_file=None, id=None)
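Because `label_names` is built as a Python `set`, its iteration order is not guaranteed to be the same in every interpreter session, which matters when each class has to keep the same integer id across runs. A minimal sketch of one way to pin the order, assuming alphabetical sorting is acceptable (the helper name `build_label_mapping` is hypothetical and not part of the notebook):

```python
def build_label_mapping(raw_labels):
    """Return (names, str2int) with a deterministic, alphabetical order."""
    names = sorted(set(raw_labels))  # sorted list instead of a bare set
    str2int = {name: idx for idx, name in enumerate(names)}
    return names, str2int


# Toy label column; the real one would be dataset['train']['labels'].
names, str2int = build_label_mapping(['premise', 'claim', 'premise', 'claim'])
print(names)    # ['claim', 'premise']
print(str2int)  # {'claim': 0, 'premise': 1}
```

A sorted list like `names` could be passed as `names=` to `ClassLabel`, so that `claim` always maps to 0 and `premise` to 1.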

To reproduce the results for a different sentence representation, change `column_name` in the cell below. The column names correspond to the following sentence representations:
1. sentence representation: component only, column_name = component 
2. sentence representation: sentence, column_name = full_sentence
3. sentence representation: topic + sent, column_name = topic_and_full_sentence
4. sentence representation: topic + sent + struct fts, column_name = context_full_sentence_structural_fts_as_txt
5. sentence representation: topic + sent + struct fts (abbreviated), column_name = context_full_sentence_structural_fts_as_txt_combined
6. sentence representation: sent + struct fts only, column_name = sentence_structural_fts_as_text
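The list above can also be kept as a small lookup table, so that a representation is selected by a short key rather than by retyping a long column name. This is only a sketch: the short keys and the helper `get_column_name` are hypothetical, while the column names are the ones listed above.

```python
# Hypothetical short keys mapped to the dataset column names listed above.
REPRESENTATION_COLUMNS = {
    'component': 'component',
    'sentence': 'full_sentence',
    'topic_sent': 'topic_and_full_sentence',
    'topic_sent_struct': 'context_full_sentence_structural_fts_as_txt',
    'topic_sent_struct_abbrev': 'context_full_sentence_structural_fts_as_txt_combined',
    'sent_struct': 'sentence_structural_fts_as_text',
}


def get_column_name(representation):
    """Look up the dataset column for a representation key, failing loudly on typos."""
    try:
        return REPRESENTATION_COLUMNS[representation]
    except KeyError:
        valid = ', '.join(sorted(REPRESENTATION_COLUMNS))
        raise ValueError(f'Unknown representation {representation!r}; expected one of: {valid}') from None


column_name = get_column_name('sent_struct')
print(column_name)  # sentence_structural_fts_as_text
```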

In [12]:
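column_name = 'sentence_structural_fts_as_text'

def tokenize(batch):
    # Truncate to BERT's 512-token limit and pad each mapped batch to its longest sequence.
    tokens = tokenizer(batch[column_name], truncation=True, padding=True, max_length=512)
    # Convert the string labels ('claim', 'premise') to integer class ids.
    tokens['labels'] = labels.str2int(batch['labels'])
    return tokens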
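To make the effect of `truncation=True, padding=True, max_length=512` concrete, here is a toy re-implementation on plain lists of token ids. It is an illustration only, not the tokenizer's actual code: the helper `pad_and_truncate` is hypothetical, the token ids are made up, and real truncation also takes care to keep the special tokens, which this sketch does not.

```python
def pad_and_truncate(batch_ids, max_length=512, pad_id=0):
    """Cut every sequence to max_length, then pad all to the longest one left."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {'input_ids': input_ids, 'attention_mask': attention_mask}


out = pad_and_truncate([[101, 7592, 102], [101, 102]], max_length=4)
print(out['input_ids'])       # [[101, 7592, 102], [101, 102, 0]]
print(out['attention_mask'])  # [[1, 1, 1], [1, 1, 0]]
```

Note that padding is relative to the longest sequence in the batch handed to the function, not to `max_length`; with `dataset.map(..., batched=True)` that batch is the chunk of rows passed to `tokenize`.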

In [13]:
dataset = dataset.map(tokenize, batched=True)

  0%|          | 0/3 [00:00<?, ?ba/s]

  0%|          | 0/1 [00:00<?, ?ba/s]

In [14]:
dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])

In [15]:
dataset

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'component', 'context_full_sentence_structural_fts_as_txt', 'context_full_sentence_structural_fts_as_txt_combined', 'feature_tensor', 'full_sentence', 'input_ids', 'labels', 'sentence_structural_fts_as_text', 'split', 'token_type_ids', 'topic', 'topic_and_full_sentence'],
        num_rows: 2720
    })
    test: Dataset({
        features: ['attention_mask', 'component', 'context_full_sentence_structural_fts_as_txt', 'context_full_sentence_structural_fts_as_txt_combined', 'feature_tensor', 'full_sentence', 'input_ids', 'labels', 'sentence_structural_fts_as_text', 'split', 'token_type_ids', 'topic', 'topic_and_full_sentence'],
        num_rows: 763
    })
})

## Train, validation and test splits

In [16]:
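# The test split is used as provided (shuffling only changes the row order).
test_dataset = dataset['test'].shuffle(seed=42)

# Hold out 20% of the training split for validation; train_test_split shuffles by default,
# so a separate shuffle of the training split is not needed.
train_val_datasets = dataset['train'].train_test_split(train_size=0.8, seed=42)
train_dataset = train_val_datasets['train']
val_dataset = train_val_datasets['test']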
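The 80/20 split of the 2720 training rows gives the 2176 training and 544 validation examples that appear later in the training logs. A minimal sketch of the same idea on row indices, using only the standard library (the helper `split_indices` is hypothetical, and its permutation will not match the one produced by `datasets` for the same seed):

```python
import random


def split_indices(n_rows, train_size=0.8, seed=42):
    """Shuffle row indices with a fixed seed and cut them into train/validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_size)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(2720)
print(len(train_idx), len(val_idx))  # 2176 544
```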

In [17]:
dataset_d = {}
dataset_d['train'] = train_dataset
dataset_d['test'] = test_dataset
dataset_d['val'] = val_dataset

## Model

In [22]:
NUM_LABELS = labels.num_classes
BATCH_SIZE = 32
NB_EPOCHS = 12

In [23]:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=NUM_LABELS)
model.to(device)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

## Trainer

In [24]:
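class CustomTrainer(Trainer):
    # Overrides the loss so that class weights can be plugged in later.
    # Without weights this is the same cross-entropy loss the model computes by default.
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get('logits')
        # To weight the classes, use nn.CrossEntropyLoss(weight=class_weights);
        # class_weights is not defined in this notebook.
        loss_fct = nn.CrossEntropyLoss()
        loss = loss_fct(logits, labels)
        return (loss, outputs) if return_outputs else loss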
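For reference, the quantity that `nn.CrossEntropyLoss` computes with its default mean reduction can be written out in plain Python, including the optional per-class weights hinted at in the commented-out argument. This is a sketch for intuition rather than the PyTorch implementation; the function name, the toy logits and the weights `[1.0, 3.0]` are assumptions made for the example.

```python
import math


def cross_entropy(logits, targets, class_weights=None):
    """Mean cross-entropy over a batch, optionally weighted per class."""
    n_classes = len(logits[0])
    if class_weights is None:
        class_weights = [1.0] * n_classes
    total, weight_sum = 0.0, 0.0
    for row, target in zip(logits, targets):
        # log-sum-exp with the maximum subtracted for numerical stability
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_norm - row[target]  # negative log-probability of the true class
        total += class_weights[target] * nll
        weight_sum += class_weights[target]
    # The weighted mean divides by the sum of the weights of the targets.
    return total / weight_sum


logits = [[0.0, 0.0], [2.0, 0.0]]
targets = [0, 1]
print(cross_entropy(logits, targets))
print(cross_entropy(logits, targets, class_weights=[1.0, 3.0]))
```

With the weights above, the misclassified second example (true class 1) counts three times as much as the first, which raises the loss.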

In [25]:
metric = load_metric('f1')

def compute_metrics(eval_pred):
    
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    
    return metric.compute(predictions=predictions, references=labels, average='macro')
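Macro F1 is the unweighted mean of the per-class F1 scores, so a model that always predicts the same class scores poorly even when that class is the majority. The sketch below computes it by hand; the class counts (225 and 319) are an assumption chosen for illustration, not values read from the dataset. The result coincides with the F1 logged at steps 20 and 40 of the training run, which would be consistent with a model that still predicts a single class early in training.

```python
def f1_per_class(predictions, references, label):
    """F1 score of one class, treated as the positive class."""
    pairs = list(zip(predictions, references))
    tp = sum(p == label and r == label for p, r in pairs)
    fp = sum(p == label and r != label for p, r in pairs)
    fn = sum(p != label and r == label for p, r in pairs)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken to be 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(predictions, references, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_per_class(predictions, references, l) for l in labels) / len(labels)


# Hypothetical validation set: 225 examples of class 0 and 319 of class 1.
references = [0] * 225 + [1] * 319
constant_predictions = [1] * len(references)  # always predict class 1
print(round(macro_f1(constant_predictions, references), 6))  # 0.369641
```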

In [26]:
training_args = TrainingArguments(
    
    # output
    output_dir=RESULTS_FOLDER,          
    
    # params
    num_train_epochs=NB_EPOCHS,               # nb of epochs
    per_device_train_batch_size=BATCH_SIZE,   # batch size per device during training
    per_device_eval_batch_size=BATCH_SIZE,    # cf. paper Sun et al.
    learning_rate=3e-5,#2e-5,                 # cf. paper Sun et al.
#     warmup_steps=500,                         # number of warmup steps for learning rate scheduler
    warmup_ratio=0.1,                         # cf. paper Sun et al.
    weight_decay=0.01,                        # strength of weight decay
    
    # eval
    evaluation_strategy="steps",              # cf. paper Sun et al.
    eval_steps=20,                            # cf. paper Sun et al.
    
    # log
    logging_dir="/notebooks/Results/bert_sequence_classification/tb_logs",  
    logging_strategy='steps',
    logging_steps=20,
    
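    # save
    save_strategy='steps',
    save_total_limit=2,
    # save_steps=20,                          # left at the default of 500: checkpoints are written only every
    #                                         # 500 steps, so the best model is chosen among those checkpoints
    load_best_model_at_end=True,              # cf. paper Sun et al.
    # metric_for_best_model='eval_loss'
    metric_for_best_model='f1'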
)
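The step counts implied by these arguments can be checked by hand. The sketch below assumes a single device, no gradient accumulation and the default linear learning-rate schedule; the helper names are hypothetical. With 2176 training examples it reproduces the 816 optimization steps reported in the training log, and `warmup_ratio=0.1` then corresponds to 82 warmup steps.

```python
import math


def schedule_summary(n_train, batch_size, n_epochs, warmup_ratio):
    """Return (steps per epoch, total optimization steps, warmup steps)."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    total_steps = steps_per_epoch * n_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return steps_per_epoch, total_steps, warmup_steps


def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


steps_per_epoch, total_steps, warmup_steps = schedule_summary(2176, 32, 12, 0.1)
print(steps_per_epoch, total_steps, warmup_steps)  # 68 816 82

# Learning rate at a few points of the run (peak value is reached after warmup).
for step in (0, 41, 82, 449, 816):
    print(step, linear_lr(step, 3e-5, warmup_steps, total_steps))
```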

In [27]:
trainer = CustomTrainer( # Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
    # callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
)

In [28]:
results = trainer.train()

The following columns in the training set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: full_sentence, topic, split, component, sentence_structural_fts_as_text, feature_tensor, context_full_sentence_structural_fts_as_txt_combined, context_full_sentence_structural_fts_as_txt, topic_and_full_sentence.
***** Running training *****
  Num examples = 2176
  Num Epochs = 12
  Instantaneous batch size per device = 32
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 1
  Total optimization steps = 816


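| Step | Training Loss | Validation Loss | F1 |
|------|---------------|-----------------|----------|
| 20 | 0.6952 | 0.672373 | 0.369641 |
| 40 | 0.6803 | 0.66166 | 0.369641 |
| 60 | 0.6614 | 0.623695 | 0.689498 |
| 80 | 0.5983 | 0.663522 | 0.62722 |
| 100 | 0.611 | 0.581593 | 0.644554 |
| 120 | 0.6015 | 0.587558 | 0.69116 |
| 140 | 0.5587 | 0.546184 | 0.715244 |
| 160 | 0.4691 | 0.537232 | 0.74714 |
| 180 | 0.4224 | 0.592042 | 0.685372 |
| 200 | 0.484 | 0.570069 | 0.734015 |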
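Since `metric_for_best_model='f1'` selects on validation F1 rather than validation loss, it can be useful to see which evaluation step each criterion would favour. A small sketch over the ten rows shown above only (later steps are not included in this output, so this says nothing about the full run):

```python
# Rows copied from the log above: (step, training loss, validation loss, F1)
rows = [
    (20, 0.6952, 0.672373, 0.369641),
    (40, 0.6803, 0.66166, 0.369641),
    (60, 0.6614, 0.623695, 0.689498),
    (80, 0.5983, 0.663522, 0.62722),
    (100, 0.611, 0.581593, 0.644554),
    (120, 0.6015, 0.587558, 0.69116),
    (140, 0.5587, 0.546184, 0.715244),
    (160, 0.4691, 0.537232, 0.74714),
    (180, 0.4224, 0.592042, 0.685372),
    (200, 0.484, 0.570069, 0.734015),
]

best_by_f1 = max(rows, key=lambda row: row[3])    # higher F1 is better
best_by_loss = min(rows, key=lambda row: row[2])  # lower validation loss is better

print(best_by_f1[0], best_by_f1[3])      # 160 0.74714
print(best_by_loss[0], best_by_loss[2])  # 160 0.537232
```

Among these rows both criteria point to step 160, although in general they need not agree.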


The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: full_sentence, topic, split, component, sentence_structural_fts_as_text, feature_tensor, context_full_sentence_structural_fts_as_txt_combined, context_full_sentence_structural_fts_as_txt, topic_and_full_sentence.
***** Running Evaluation *****
  Num examples = 544
  Batch size = 32

## Results

In [29]:
model.eval()

In [30]:
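# A plain Trainer is enough for inference; the collator pads each batch dynamically.
test_trainer = Trainer(model, data_collator=DataCollatorWithPadding(tokenizer))
# predict() returns (predictions, label_ids, metrics); the predictions are raw logits.
test_raw_preds, test_labels, _ = test_trainer.predict(test_dataset)
test_preds = np.argmax(test_raw_preds, axis=1)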

No `TrainingArguments` passed, using `output_dir=tmp_trainer`.
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
The following columns in the test set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: full_sentence, topic, split, component, sentence_structural_fts_as_text, feature_tensor, context_full_sentence_structural_fts_as_txt_combined, context_full_sentence_structural_fts_as_txt, topic_and_full_sentence.
***** Running Prediction *****
  Num examples = 763
  Batch size = 8


In [32]:
target_name = labels.int2str([0,1])
print(classification_report(test_labels, test_preds, target_names=target_name))

              precision    recall  f1-score   support

       claim       0.80      0.71      0.75       322
     premise       0.80      0.87      0.84       441

    accuracy                           0.80       763
   macro avg       0.80      0.79      0.79       763
weighted avg       0.80      0.80      0.80       763
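The two average rows of the report differ only in how the per-class scores are combined: `macro avg` is a plain mean over classes, while `weighted avg` weights each class by its support. A short sketch recomputing both for the F1 column from the per-class rows above; because the inputs are already rounded to two decimals, the results only approximate the averages printed in the report.

```python
# Per-class rows copied from the report above: (precision, recall, f1, support)
report = {
    'claim': (0.80, 0.71, 0.75, 322),
    'premise': (0.80, 0.87, 0.84, 441),
}


def averages(report, column):
    """Return (macro, support-weighted) average of one score column."""
    values = [row[column] for row in report.values()]
    supports = [row[3] for row in report.values()]
    macro = sum(values) / len(values)
    weighted = sum(v * s for v, s in zip(values, supports)) / sum(supports)
    return macro, weighted


macro_f1, weighted_f1 = averages(report, 2)
print(f'{macro_f1:.3f} {weighted_f1:.3f}')  # 0.795 0.802
```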

