## CQ1. Apply a token classification or NER pipeline to identify named entities in the input text

In [2]:
from transformers import pipeline


Create the appropriate pipeline to obtain the expected results below:

In [3]:
MODEL_ID = "dslim/bert-base-NER"
ner_pipe = pipeline('ner', model=MODEL_ID)

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [4]:
example = "My name is Pablo, and I live in Madrid although I am from Murcia"
result = ner_pipe(example)
result

[{'entity': 'B-PER',
  'score': 0.99889094,
  'index': 4,
  'word': 'Pablo',
  'start': 11,
  'end': 16},
 {'entity': 'B-LOC',
  'score': 0.9995328,
  'index': 10,
  'word': 'Madrid',
  'start': 32,
  'end': 38},
 {'entity': 'B-LOC',
  'score': 0.9991641,
  'index': 15,
  'word': 'Mu',
  'start': 58,
  'end': 60},
 {'entity': 'I-LOC',
  'score': 0.9830967,
  'index': 16,
  'word': '##rcia',
  'start': 60,
  'end': 64}]

Notice that several tokens can correspond to the same named entity. You will need to post-process the output to make it more intuitive. Luckily, each token also comes with its start and end character offsets, so you can join adjacent intervals and slice the corresponding characters from the original string. In this case, the `B-LOC` token at `[58, 60)` and the `I-LOC` token at `[60, 64)` should be joined into a single `LOC` entity, `Murcia`.
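The joining step described above can be sketched as follows. This is a minimal illustration rather than the exercise's reference solution: `merge_entities` is a hypothetical helper, the sample `tokens` are copied from the output above (scores rounded), and the rule that an `I-` token continues the previous entity only when it starts at most one character after it is an assumption, chosen so that entities split by a single space are also merged.

```python
def merge_entities(text, tokens, max_gap=1):
    """Merge token-level NER predictions into whole-entity spans.

    An `I-` token extends the previous entity when it carries the same
    label and starts at most `max_gap` characters after that entity ends.
    """
    merged = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")  # 'B-LOC' -> 'B', 'LOC'
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prefix == "I"
            and prev["label"] == label
            and tok["start"] - prev["end"] <= max_gap
        ):
            # Same entity: extend the span and keep the most pessimistic score.
            prev["end"] = tok["end"]
            prev["score"] = min(prev["score"], tok["score"])
        else:
            merged.append(
                {"label": label, "start": tok["start"],
                 "end": tok["end"], "score": tok["score"]}
            )
    # Take the surface form from the original string, not from the word pieces.
    for ent in merged:
        ent["word"] = text[ent["start"]:ent["end"]]
    return merged


example = "My name is Pablo, and I live in Madrid although I am from Murcia"
tokens = [
    {"entity": "B-PER", "score": 0.9989, "word": "Pablo", "start": 11, "end": 16},
    {"entity": "B-LOC", "score": 0.9995, "word": "Madrid", "start": 32, "end": 38},
    {"entity": "B-LOC", "score": 0.9992, "word": "Mu", "start": 58, "end": 60},
    {"entity": "I-LOC", "score": 0.9831, "word": "##rcia", "start": 60, "end": 64},
]

entities = merge_entities(example, tokens)
for ent in entities:
    print(ent["label"], ent["word"], (ent["start"], ent["end"]))
```

Recent versions of `transformers` can also perform this grouping inside the pipeline through the `aggregation_strategy` argument (for example `aggregation_strategy="simple"`), which may be the simpler route if the manual merge is not required.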

## CQ2. Fine-tune a RoBERTa model for Natural Language Inference (NLI) on the SNLI dataset

**Natural Language Inference (NLI)** is the task of determining the logical relationship between a pair of sentences—typically labeled as **entailment**, **contradiction**, or **neutral**. Given a **premise** (a statement) and a **hypothesis** (another statement), the goal is to predict whether the hypothesis is true (entailment), false (contradiction), or undetermined (neutral) based on the premise.  

**The SNLI Dataset (Stanford Natural Language Inference)** is a widely used benchmark for NLI. It consists of roughly 570,000 human-written sentence pairs, each labeled as entailment, contradiction, or neutral.  
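To make the three relations concrete, here is a small illustrative sketch. The sentence pairs are invented for illustration and are not taken from SNLI. The integer ids follow the label order of the Hugging Face `snli` dataset as I understand it (0 = entailment, 1 = neutral, 2 = contradiction); it is worth confirming this against `dataset['train'].features['label'].names` in your own environment.

```python
# Assumed label order of the Hugging Face `snli` dataset.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]

# Invented examples that share one premise and differ only in the hypothesis.
premise = "A man is playing a guitar on stage."
examples = [
    {"premise": premise, "hypothesis": "A person is making music.", "label": 0},
    {"premise": premise, "hypothesis": "The man is a famous musician.", "label": 1},
    {"premise": premise, "hypothesis": "The man is asleep in bed.", "label": 2},
]

for ex in examples:
    print(f"{LABEL_NAMES[ex['label']]:>13}: {ex['hypothesis']}")
```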

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pytorch_lightning as pl


from transformers import AutoModel, AutoTokenizer
from torchmetrics import Accuracy
from datasets import load_dataset


In [2]:
def freeze_layers_for_bert_based_models(model, num_frozen_layers):
    # Freeze the first `num_frozen_layers` layers of the model
    for layer in model.encoder.layer[:num_frozen_layers]:
        for param in layer.parameters():
            param.requires_grad = False
    # Freeze initial embeddings
    for param in model.embeddings.parameters():
        param.requires_grad = False

The main change over basic text classification is the tokenization. Instead of tokenizing texts, you tokenize pairs of texts! Complete the following `CollateFn`.

In [3]:
class CollateFn():
    def __init__(self, tokenizer_name, max_length):
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        self.max_length = max_length

    def __call__(self, batch):
        premises = [example['premise'] for example in batch]
        hypotheses = [example['hypothesis'] for example in batch]

        # Each element is a (premise, hypothesis) pair; the tokenizer encodes every pair as one sequence
        texts = list(zip(premises, hypotheses))
        labels = [example['label'] for example in batch]

        encoded_input = self.tokenizer(
            texts,
            max_length=self.max_length,
            padding=True,
            truncation=True,
            return_tensors='pt',
        )

        labels = torch.tensor(labels)

        return encoded_input, labels
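To see what tokenizing a pair means in practice, the sketch below mimics the layout that a tokenizer produces for a batch of pairs. It is a toy stand-in, not the real tokenizer: `toy_pair_batch` is a hypothetical helper that splits on whitespace instead of using subwords, and it assumes the usual special-token templates (`<s> A </s></s> B </s>` for RoBERTa, `[CLS] A [SEP] B [SEP]` for BERT).

```python
def toy_pair_batch(pairs, style="roberta"):
    """Lay out (premise, hypothesis) pairs as one sequence each, padded to the longest."""
    pad_token = "<pad>" if style == "roberta" else "[PAD]"
    sequences = []
    for premise, hypothesis in pairs:
        a, b = premise.split(), hypothesis.split()
        if style == "roberta":
            seq = ["<s>", *a, "</s>", "</s>", *b, "</s>"]
        else:  # BERT-style template
            seq = ["[CLS]", *a, "[SEP]", *b, "[SEP]"]
        sequences.append(seq)

    longest = max(len(seq) for seq in sequences)
    tokens = [seq + [pad_token] * (longest - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return tokens, attention_mask


pairs = [
    ("A man plays guitar", "A person makes music"),
    ("Two dogs run", "Animals move"),
]
tokens, attention_mask = toy_pair_batch(pairs)
for row, mask in zip(tokens, attention_mask):
    print(row, mask)
```

The real tokenizer additionally truncates to `max_length` and, for BERT-style models, returns `token_type_ids` marking which segment each token belongs to. RoBERTa tokenizers do not return them by default, which is presumably why `forward` accepts `token_type_ids=None`.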

The `LightningDataModule` is very basic, so there is nothing to complete here:

In [4]:
class SNLIDataModule(pl.LightningDataModule):
    def __init__(self, tokenizer_name, max_length, batch_size):
        super().__init__()
        self.tokenizer_name = tokenizer_name
        self.max_length = max_length
        self.batch_size = batch_size

    def setup(self, stage=None):
        dataset = load_dataset("snli")

        filter_missing_labels = lambda example: example['label'] != -1
        self.train = dataset['train'].filter(filter_missing_labels)
        self.val = dataset['validation'].filter(filter_missing_labels)
        self.test = dataset['test'].filter(filter_missing_labels)

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
            self.train,
            batch_size=self.batch_size,
            shuffle=True,
            collate_fn=CollateFn(self.tokenizer_name, self.max_length),
            num_workers=4,
        )

    def val_dataloader(self):
        return torch.utils.data.DataLoader(
            self.val,
            batch_size=self.batch_size,
            collate_fn=CollateFn(self.tokenizer_name, self.max_length),
            num_workers=4,
        )
    
    def test_dataloader(self):
        return torch.utils.data.DataLoader(
            self.test,
            batch_size=self.batch_size,
            collate_fn=CollateFn(self.tokenizer_name, self.max_length),
            num_workers=4,
        )

Now, you need to complete the following `LightningModule` code. It is essentially an adaptation of the one seen during the lecture, except that it is now a multiclass problem with three labels (contradiction, neutral and entailment).

In [9]:
class TextNLIClassifier(pl.LightningModule):
    def __init__(self, model_name, optimizer_params, pooling="mean", frozen_layers=0):
        """
        model_name: The name of the model to use
        optimizer_params: Parameters to pass to the optimizer
        pooling: The pooling strategy to use. Either 'cls' or 'mean'
        frozen_layers: The number of layers to freeze in the pre-trained model
        """
        super().__init__()
        self.model = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.model.config.hidden_size, 3)  # Contradiction, Neutral, Entailment
        freeze_layers_for_bert_based_models(self.model, frozen_layers)
        
        assert pooling in ["cls", "mean"], "Pooling must be either 'cls' or 'mean'"
        self.pooling = pooling

        self.accuracy = Accuracy(task="multiclass", num_classes=3)
        self.optimizer_params = optimizer_params

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        # input_ids: (batch_size, seq_length)
        # attention_mask: (batch_size, seq_length)

        outputs = self.model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        last_hidden_state = outputs.last_hidden_state  # (batch_size, seq_length, hidden_size)

        if self.pooling == "cls":
            # NOTE Option 1: Use the CLS token
            pool_output = last_hidden_state[:, 0, :]  # (batch_size, hidden_size)
        else:
            # NOTE Option 2: Use the mean of all tokens
            mean_coeffs = attention_mask.float() / attention_mask.float().sum(dim=1, keepdim=True)  # (batch_size, seq_length)
            pool_output = torch.einsum("bld,bl->bd", last_hidden_state, mean_coeffs)  # (batch_size, hidden_size)

        logits = self.classifier(pool_output)  # (batch_size, 3)
        return logits

    def training_step(self, batch, batch_idx):
        loss, accuracy = self._step(batch)
        self.log('train_loss', loss, prog_bar=True, on_step=True, on_epoch=True)
        self.log('train_accuracy', accuracy, prog_bar=True, on_step=True, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        loss, accuracy = self._step(batch)
        self.log('val_loss', loss, prog_bar=True, on_step=False, on_epoch=True)
        self.log('val_accuracy', accuracy, prog_bar=True, on_step=False, on_epoch=True)
        return loss

    def test_step(self, batch, batch_idx):
        loss, accuracy = self._step(batch)
        self.log('test_loss', loss, prog_bar=True, on_step=False, on_epoch=True)
        self.log('test_accuracy', accuracy, prog_bar=True, on_step=False, on_epoch=True)
        return loss

    def _step(self, batch):
        encoded_input, labels = batch

        logits = self(**encoded_input)  # (batch_size, 3)
        loss = F.cross_entropy(logits, labels)
        accuracy = self.accuracy(logits, labels)
        
        return loss, accuracy

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), **self.optimizer_params)
        return optimizer
    
    def configure_callbacks(self):
        return super().configure_callbacks() + [
            pl.callbacks.ModelCheckpoint(monitor='val_loss', mode='min'),
        ]
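The mean-pooling branch in `forward` weights each token by `attention_mask / attention_mask.sum()`, so padded positions contribute nothing to the sentence vector. The framework-free sketch below reproduces that arithmetic on plain Python lists for a single sequence. `masked_mean` is a hypothetical helper written only to make the computation visible, and the numbers are made up.

```python
def masked_mean(hidden_states, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0.

    hidden_states: list of seq_length vectors (each a list of floats)
    attention_mask: list of seq_length values, 1 for real tokens and 0 for padding
    """
    total = sum(attention_mask)
    coeffs = [m / total for m in attention_mask]  # plays the role of `mean_coeffs`
    dim = len(hidden_states[0])
    return [
        sum(c * vec[d] for c, vec in zip(coeffs, hidden_states))
        for d in range(dim)
    ]


hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last position is padding
mask = [1, 1, 0]

pooled = masked_mean(hidden, mask)
print(pooled)  # [2.0, 3.0]
```

A plain average over all three positions would be dominated by the padding vector, which is why the mask is folded into the coefficients before summing.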

Finally, you can train the model, modifying any parameters you want.

In [13]:
MAX_LENGTH = 512
BATCH_SIZE = 128

# MODEL = "bert-base-uncased"
MODEL = "FacebookAI/roberta-base"
POOLING = "mean"  # ["cls", "mean"]
NUM_FROZEN_LAYERS = 10  # Leave two layers unfrozen
OPTIMIZER_PARAMS = {
    'lr': 2e-5,
    'weight_decay': 0.01,
}

In [15]:
data_module = SNLIDataModule(MODEL, MAX_LENGTH, BATCH_SIZE)
model = TextNLIClassifier(MODEL, OPTIMIZER_PARAMS, POOLING, NUM_FROZEN_LAYERS)

data_module.setup()
trainer = pl.Trainer(max_epochs=2, accelerator="gpu", devices=[1], precision="16-mixed")

Some weights of RobertaModel were not initialized from the model checkpoint at FacebookAI/roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


In [16]:
trainer.fit(model, data_module)

The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]

  | Name       | Type               | Params
--------------------------------------------------
0 | model      | RobertaModel       | 124 M 
1 | classifier | Linear             | 2.3 K 
2 | accuracy   | MulticlassAccuracy | 0     
--------------------------------------------------
14.8 M    Trainable params
109 M     Non-trainable params
124 M     Total params
498.592   Total estimated model params size (MB)
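The parameter counts in this summary can be checked by hand. The sketch below assumes the published `roberta-base` configuration (hidden size 768, feed-forward size 3072, 12 layers, vocabulary of 50265, 514 positions, 1 token type) together with the settings used above: 10 frozen layers plus frozen embeddings, with the pooler (not used by `forward`'s pooling, but still trainable) and the 3-way classifier left unfrozen.

```python
# Assumed roberta-base configuration values.
HIDDEN, FFN, LAYERS = 768, 3072, 12
VOCAB, POSITIONS, TOKEN_TYPES = 50265, 514, 1
# Settings taken from the training cell above.
FROZEN_LAYERS, NUM_CLASSES = 10, 3


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift

per_layer = (
    3 * linear(HIDDEN, HIDDEN)  # query, key, value projections
    + linear(HIDDEN, HIDDEN)    # attention output projection
    + layer_norm
    + linear(HIDDEN, FFN)       # feed-forward expansion
    + linear(FFN, HIDDEN)       # feed-forward contraction
    + layer_norm
)
embeddings = (VOCAB + POSITIONS + TOKEN_TYPES) * HIDDEN + layer_norm
pooler = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_CLASSES)

trainable = (LAYERS - FROZEN_LAYERS) * per_layer + pooler + classifier
frozen = embeddings + FROZEN_LAYERS * per_layer
total = trainable + frozen

print(f"classifier: {classifier}")                   # 2307, shown as 2.3 K
print(f"trainable:  {trainable / 1e6:.1f} M")         # 14.8 M
print(f"frozen:     {frozen / 1e6:.1f} M")            # 109.9 M
print(f"total:      {total / 1e6:.1f} M")             # 124.6 M
print(f"size (MB):  {total * 4 / 1e6:.3f}")           # 498.592 at 4 bytes per parameter
```

Under these assumptions the trainable, total, and size figures match the summary; the non-trainable count comes out at about 109.9 M, which the summary appears to display as `109 M`.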


Epoch 1: 100%|██████████| 4292/4292 [04:46<00:00, 14.96it/s, v_num=28, train_loss_step=0.498, train_accuracy_step=0.790, val_loss=0.358, val_accuracy=0.864, train_loss_epoch=0.453, train_accuracy_epoch=0.822]

`Trainer.fit` stopped: `max_epochs=2` reached.



In [17]:
trainer.test(ckpt_path="best", datamodule=data_module)

The following callbacks returned in `LightningModule.configure_callbacks` will override existing callbacks passed to Trainer: ModelCheckpoint
Restoring states from the checkpoint path at /home/pablo/classes/MP_DL-DL_NLP/Lecture 2 - Text Classification/lightning_logs/version_28/checkpoints/epoch=1-step=8584.ckpt
/home/pablo/.micromamba/envs/mdl-dl_nlp/lib/python3.11/site-packages/lightning_fabric/utilities/cloud_io.py:55: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are expl

Testing DataLoader 0: 100%|██████████| 77/77 [00:03<00:00, 21.65it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      test_accuracy         0.8611563444137573
        test_loss           0.36514949798583984
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


[{'test_loss': 0.36514949798583984, 'test_accuracy': 0.8611563444137573}]