# Fine-tuning BERT for multi-label text classification

In this notebook, we are going to fine-tune BERT to predict one or more labels for a given piece of text. The notebook uses the `bert-base-uncased` checkpoint, but you can fine-tune a RoBERTa, DeBERTa, DistilBERT, or CANINE checkpoint (among others) in the same way.

All of these models work the same way: a linear layer is added on top of the base model and produces a tensor of shape (batch_size, num_labels) containing the unnormalized score of every label for every example in the batch.
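
To make the shape concrete, here is a minimal, framework-free sketch of such a head. The sizes (`hidden_size = 4`, a batch of 2) and the random values are made up for illustration; in the real BERT model the head is a `torch.nn.Linear` layer applied to the pooled output of the base model.

```python
import random

random.seed(0)

batch_size, hidden_size, num_labels = 2, 4, 6  # toy sizes, chosen for illustration

# Stand-in for the base model's pooled output: one vector per example.
pooled = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(batch_size)]

# Randomly initialized head: a weight matrix (num_labels x hidden_size) and a bias.
weight = [[random.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(num_labels)]
bias = [0.0] * num_labels

def linear_head(features):
    """Return one unnormalized score (logit) per label for every example."""
    return [
        [sum(w * x for w, x in zip(row, example)) + b for row, b in zip(weight, bias)]
        for example in features
    ]

logits = linear_head(pooled)
print(len(logits), len(logits[0]))  # batch_size, num_labels
```

Each row of `logits` holds one score per label, and the scores are independent of each other, which is what allows several labels to be predicted for the same text.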



## Set-up environment

First, we install the libraries we'll use: Hugging Face Transformers and Datasets.

In [None]:
!pip install -q transformers[torch] datasets

In [None]:
from huggingface_hub import notebook_login

In [None]:
notebook_login()


## Load dataset



In [None]:
from datasets import load_dataset

encoded_dataset = load_dataset("zcamz/toxic")



In [None]:
labels = ["toxic","severe_toxic","obscene", "threat", "insult", "identity_hate"]
id2label = { id : label for id,label in enumerate(labels)}
label2id = { label : id for id,label in enumerate(labels)}

In [None]:
encoded_dataset.set_format("torch")

In [None]:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np
import torch

# Extract labels from the dataset
labels_np = np.array(encoded_dataset["train"]["labels"])

# Calculate class weights for each class
class_weights = []
for i in range(labels_np.shape[1]):
    class_weight = compute_class_weight('balanced', classes=np.unique(labels_np[:, i]), y=labels_np[:, i])
    class_weights.append(class_weight[1])  # Get weight for the positive class (1)

# Convert class weights to a tensor
class_weights_tensor = torch.tensor(class_weights, dtype=torch.float32)

print("Class Weights:", class_weights_tensor)

Class Weights: tensor([  5.2156,  50.1005,   9.4785, 157.9901,  10.1913,  57.4509])
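
As a rough guide to reading these numbers: scikit-learn's `'balanced'` heuristic gives each class the weight `n_samples / (n_classes * count(class))`, so for a 0/1 label column the positive weight is `n_samples / (2 * n_positive)`. The sketch below reproduces that arithmetic in plain Python on a made-up column (`toy_column` and `balanced_weights` are illustrative names, not part of the notebook) and inverts the formula to estimate the positive rate implied by a weight.

```python
def balanced_weights(column):
    """Weights per class as n_samples / (n_classes * count), for a 0/1 label column."""
    n_samples = len(column)
    counts = {cls: column.count(cls) for cls in sorted(set(column))}
    return {cls: n_samples / (len(counts) * cnt) for cls, cnt in counts.items()}

toy_column = [0] * 9 + [1]  # made-up column: 1 positive out of 10 examples
weights = balanced_weights(toy_column)
print(weights[1])  # 10 / (2 * 1) = 5.0

# Inverting the formula recovers the positive rate implied by a weight.
toxic_weight = 5.2156  # value printed above for the "toxic" label
implied_positive_rate = 1 / (2 * toxic_weight)
print(round(implied_positive_rate, 3))
```

Note that this is one possible weighting. PyTorch's documentation for `BCEWithLogitsLoss` describes the ratio of negative to positive examples as a typical `pos_weight`, which for rare labels is roughly twice the balanced weight used here.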


## Define model

Here we define a model consisting of a pre-trained base (i.e. the weights from bert-base-uncased are loaded) with a randomly initialized classification head (a linear layer) on top. The head should be fine-tuned together with the pre-trained base on a labeled dataset.

This is also printed by the warning.

We set the `problem_type` to be "multi_label_classification", as this will make sure the appropriate loss function is used (namely [`BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html)). We also make sure the output layer has `len(labels)` output neurons, and we set the id2label and label2id mappings.
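
For intuition, the following plain-Python sketch computes the same quantity as `BCEWithLogitsLoss` with its default mean reduction: every (example, label) entry is treated as an independent binary decision. The logits and targets are made up, and the function name `bce_with_logits` is only illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over every (example, label) entry."""
    losses = []
    for row_logits, row_targets in zip(logits, targets):
        for x, y in zip(row_logits, row_targets):
            p = sigmoid(x)
            losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Made-up logits and multi-hot targets for one example with three labels.
logits = [[2.0, -1.0, 0.0]]
targets = [[1.0, 0.0, 1.0]]
loss = bce_with_logits(logits, targets)
print(round(loss, 4))
```

Because the targets are multi-hot vectors rather than a single class index, two of the three labels are active for the same example here, which a softmax-based cross-entropy loss could not express.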

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")




In [None]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           problem_type="multi_label_classification",
                                                           num_labels=len(labels),
                                                           id2label=id2label,
                                                           label2id=label2id)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Train the model!

We are going to train the model using Hugging Face's Trainer API. This requires us to define two things:

* `TrainingArguments`, which specify the training hyperparameters. All options can be found in the [docs](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments). Below, for example, we specify that we want to evaluate and save the model after every epoch, and we set the learning rate, the batch size for training and evaluation, the number of epochs, and so on.
* a `Trainer` object (docs can be found [here](https://huggingface.co/transformers/main_classes/trainer.html#id1)).

In [None]:
batch_size = 64
metric_name = "f1"

In [None]:
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    "bert-finetuned-toxic",  # output directory for checkpoints
    evaluation_strategy = "epoch",
    save_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=10,
    weight_decay=0.02,
    load_best_model_at_end=True,
    metric_for_best_model=metric_name,
    #push_to_hub=True,
)

We are also going to compute metrics while training. For this, we need to define a `compute_metrics` function, that returns a dictionary with the desired metric values.

In [None]:
from sklearn.metrics import f1_score, roc_auc_score, accuracy_score, classification_report
from transformers import EvalPrediction
import torch
import numpy as np
targets_names = ["toxic","severe_toxic","obscene", "threat", "insult", "identity_hate", 'overall_non_toxic']
# source: https://jesusleal.io/2021/04/21/Longformer-multilabel-classification/
def multi_label_metrics(predictions, labels, threshold=0.5):
    # first, apply sigmoid on predictions which are of shape (batch_size, num_labels)
    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(torch.Tensor(predictions))
    # next, use threshold to turn them into integer predictions
    y_pred = np.zeros(probs.shape)
    y_pred[np.where(probs >= threshold)] = 1
    # finally, compute metrics
    y_true = labels

    overall_non_toxic = np.zeros(y_pred.shape[0])
    overall_non_toxic[y_pred.sum(axis=1) == 0] = 1
    y_pred = np.hstack([y_pred, overall_non_toxic.reshape(-1,1)])

    overall_non_toxic = np.zeros(y_true.shape[0])
    overall_non_toxic[y_true.sum(axis=1) == 0] = 1
    y_true = np.hstack([y_true, overall_non_toxic.reshape(-1,1)])

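    # macro average: unweighted mean of the per-label F1 scores (rare labels count as much as frequent ones)
    f1_macro_average = f1_score(y_true=y_true, y_pred=y_pred, average='macro')
    # note: roc_auc_score receives the thresholded 0/1 predictions here, not the probabilities,
    # so it reflects a single operating point rather than the full ROC curve
    roc_auc = roc_auc_score(y_true, y_pred, average = 'macro')
    # subset accuracy: an example counts as correct only if all of its labels match
    accuracy = accuracy_score(y_true, y_pred)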
    # return as dictionary
    metrics = {'f1': f1_macro_average,
               'roc_auc': roc_auc,
               'accuracy': accuracy}
    print(classification_report(y_true, y_pred, target_names=targets_names))
    return metrics

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions,
            tuple) else p.predictions
    result = multi_label_metrics(
        predictions=preds,
        labels=p.label_ids)
    return result
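
The derived `overall_non_toxic` column and the macro average can be traced by hand on a small example. The sketch below uses made-up predictions for 4 examples and 2 labels, and plain-Python helpers (`add_non_toxic_column`, `f1_for_column`) that are illustrative stand-ins for the NumPy and scikit-learn calls above.

```python
def add_non_toxic_column(rows):
    """Append 1 when no label is active in the row, else 0."""
    return [row + [1 if sum(row) == 0 else 0] for row in rows]

def f1_for_column(y_true, y_pred, col):
    """F1 of a single label column, computed from true/false positives and negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[col] == 1 and p[col] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[col] == 0 and p[col] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[col] == 1 and p[col] == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Made-up data: 4 examples, 2 labels.
y_true = add_non_toxic_column([[1, 0], [0, 0], [0, 1], [0, 0]])
y_pred = add_non_toxic_column([[1, 1], [0, 0], [0, 0], [1, 0]])

per_label = [f1_for_column(y_true, y_pred, c) for c in range(3)]
macro_f1 = sum(per_label) / len(per_label)
print(per_label, macro_f1)
```

Since the macro average weights every column equally, a single poorly predicted rare label lowers the overall score as much as a frequent one would.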

Let's verify a batch as well as a forward pass:

In [None]:
encoded_dataset['train'][0]['labels'].type()

In [None]:
encoded_dataset['train']['input_ids'][0]

tensor([  101, 13055, 28774, 15544,  2323,  6402,  1999, 11669, 13055, 28774,
        15544,  2003, 11669,  1012,  1045,  5223, 13055, 28774, 15544,  1012,
         1042,  1003,  1003,  1047,  2014,  2000,  3109,   999,  1012,  1012,
         1012,   102,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0])

In [None]:
# forward pass on a single example (unsqueeze adds the batch dimension)
example = encoded_dataset['train'][0]
outputs = model(input_ids=example['input_ids'].unsqueeze(0),
                labels=example['labels'].unsqueeze(0))
outputs


Let's start training!

In [None]:
class MyTrainer(Trainer):
    def __init__(self, class_weights, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # You pass the class weights when instantiating the Trainer
        self.class_weights = class_weights

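    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        """
        Compute the loss with a class-weighted BCEWithLogitsLoss instead of the model's default loss.
        Adapted from Trainer.compute_loss; **kwargs absorbs extra arguments (such as
        num_items_in_batch) that newer Transformers releases pass to this method.
        """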
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)
        # Save past state if it exists
        # TODO: this needs to be fixed and made cleaner later.
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We don't use .loss here since the model may return tuples instead of ModelOutput.

            # Changes start here
            # loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
            logits = outputs['logits']
            self.class_weights = self.class_weights.to(logits.device)
            criterion = torch.nn.BCEWithLogitsLoss(pos_weight=self.class_weights)
            loss = criterion(logits, inputs['labels'])
            # Changes end here

        return (loss, outputs) if return_outputs else loss
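
The effect of `pos_weight` can be seen on a single logit: it multiplies only the positive term of the binary cross-entropy, so a missed positive becomes more expensive while negatives are unchanged. The sketch below is a plain-Python illustration (the helper `weighted_bce` and the example logit are assumptions, not part of the notebook).

```python
import math

def weighted_bce(logit, target, pos_weight=1.0):
    """Binary cross-entropy on one logit, with the positive term scaled by pos_weight."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(pos_weight * target * math.log(p) + (1 - target) * math.log(1 - p))

threat_weight = 157.9901  # weight printed earlier for the "threat" label

# A confident miss on a positive example (logit -2 although the label is 1).
plain = weighted_bce(-2.0, 1.0)
weighted = weighted_bce(-2.0, 1.0, pos_weight=threat_weight)
print(weighted / plain)  # the ratio equals pos_weight (up to floating-point rounding)

# Negative examples are unaffected by pos_weight.
assert weighted_bce(-2.0, 0.0, pos_weight=threat_weight) == weighted_bce(-2.0, 0.0)
```

With weights this large, the model is pushed strongly toward predicting the rare labels, which would be consistent with a pattern of high recall and low precision on those labels.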

In [15]:
trainer = MyTrainer(
    class_weights_tensor,
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

In [None]:
trainer.train()

| Epoch | Training Loss | Validation Loss | F1 | Roc Auc | Accuracy |
|---|---|---|---|---|---|
| 1 | No log | 0.174019 | 0.564632 | 0.954362 | 0.861946 |
| 2 | 0.296200 | 0.159531 | 0.599356 | 0.955068 | 0.869121 |
| 3 | 0.154500 | 0.17145 | 0.632172 | 0.950864 | 0.877644 |
| 4 | 0.121800 | 0.188298 | 0.641236 | 0.946665 | 0.887044 |
| 5 | 0.097600 | 0.249747 | 0.680768 | 0.926484 | 0.903682 |
| 6 | 0.080700 | 0.241068 | 0.67423 | 0.933127 | 0.891744 |
| 7 | 0.068200 | 0.295496 | 0.692201 | 0.918259 | 0.899514 |
| 8 | 0.059700 | 0.320664 | 0.703232 | 0.914293 | 0.906878 |
| 9 | 0.053300 | 0.32067 | 0.697741 | 0.915831 | 0.904402 |
| 10 | 0.048700 | 0.340676 | 0.702846 | 0.909141 | 0.907285 |


                   precision    recall  f1-score   support

            toxic       0.62      0.94      0.75      3056
     severe_toxic       0.17      0.99      0.30       321
          obscene       0.54      0.97      0.69      1715
           threat       0.16      0.93      0.28        74
           insult       0.46      0.97      0.62      1614
    identity_hate       0.21      0.93      0.34       294
overall_non_toxic       0.99      0.94      0.97     28671

        micro avg       0.81      0.95      0.87     35745
        macro avg       0.45      0.95      0.56     35745
     weighted avg       0.90      0.95      0.91     35745
      samples avg       0.90      0.94      0.92     35745

                   precision    recall  f1-score   support

            toxic       0.64      0.94      0.76      3056
     severe_toxic       0.22      0.98      0.37       321
          obscene       0.66      0.95      0.78      1715
           threat       0.16      0.95      0.27    

TrainOutput(global_step=4990, training_loss=0.10273209115068516, metrics={'train_runtime': 6170.832, 'train_samples_per_second': 206.87, 'train_steps_per_second': 0.809, 'total_flos': 8.397227791208448e+16, 'train_loss': 0.10273209115068516, 'epoch': 10.0})
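
Because the training arguments set `load_best_model_at_end=True` with `metric_for_best_model="f1"`, the checkpoint kept at the end should be the one with the highest validation F1 (assuming `greater_is_better` is left at its default). The short sketch below picks that epoch from the values in the training log above and contrasts it with the epoch that has the lowest validation loss.

```python
# (epoch, validation loss, validation F1) copied from the training log above.
history = [
    (1, 0.174019, 0.564632),
    (2, 0.159531, 0.599356),
    (3, 0.17145, 0.632172),
    (4, 0.188298, 0.641236),
    (5, 0.249747, 0.680768),
    (6, 0.241068, 0.67423),
    (7, 0.295496, 0.692201),
    (8, 0.320664, 0.703232),
    (9, 0.32067, 0.697741),
    (10, 0.340676, 0.702846),
]

best_by_f1 = max(history, key=lambda row: row[2])
best_by_loss = min(history, key=lambda row: row[1])
print(best_by_f1[0], best_by_loss[0])  # 8 2
```

The two criteria disagree: validation loss is lowest after epoch 2 and rises afterwards while the thresholded F1 keeps improving, so the choice of `metric_for_best_model` determines which checkpoint is reloaded.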

## Evaluate

After training, we evaluate our model on the validation set.

In [16]:
trainer.evaluate()

OutOfMemoryError: CUDA out of memory. Tried to allocate 10.08 GiB. GPU 0 has a total capacity of 22.17 GiB of which 10.04 GiB is free. Process 42874 has 12.12 GiB memory in use. Of the allocated memory 10.57 GiB is allocated by PyTorch, and 1.32 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

## Inference

Let's test the model on a new sentence:

In [None]:
text = "Hi fuck"

encoding = tokenizer(text, return_tensors="pt")
encoding = {k: v.to(trainer.model.device) for k,v in encoding.items()}

outputs = trainer.model(**encoding)

The logits that come out of the model are of shape (batch_size, num_labels). As we are only forwarding a single sentence through the model, the `batch_size` equals 1. The logits are a tensor that contains the (unnormalized) scores for every individual label.

In [None]:
logits = outputs.logits
logits.shape

torch.Size([1, 6])

To turn them into actual predicted labels, we first apply a sigmoid function independently to every score, so that every score becomes a number between 0 and 1 that can be interpreted as a "probability" of how certain the model is that a given label applies to the input text.

Next, we use a threshold (typically, 0.5) to turn every probability into either a 1 (which means, we predict the label for the given example) or a 0 (which means, we don't predict the label for the given example).

In [None]:
# apply sigmoid + threshold
sigmoid = torch.nn.Sigmoid()
probs = sigmoid(logits.squeeze().cpu())
predictions = np.zeros(probs.shape)
predictions[np.where(probs >= 0.5)] = 1
# turn predicted ids into actual label names
predicted_labels = [id2label[idx] for idx, label in enumerate(predictions) if label == 1.0]
print(predicted_labels)

['toxic', 'severe_toxic', 'obscene', 'insult']
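
The 0.5 threshold is a choice rather than a fixed rule. As a hedged illustration, the sketch below applies two thresholds to made-up logits (`toy_logits` and `predict_labels` are illustrative, not outputs of the trained model) to show how a stricter threshold shrinks the set of predicted labels.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, names, threshold=0.5):
    """Return the names of all labels whose sigmoid score reaches the threshold."""
    return [name for x, name in zip(logits, names) if sigmoid(x) >= threshold]

names = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
toy_logits = [3.1, -0.4, 1.2, -5.0, 0.0, -2.2]  # made-up scores

default_labels = predict_labels(toy_logits, names)               # threshold 0.5
strict_labels = predict_labels(toy_logits, names, threshold=0.9)
print(default_labels)  # ['toxic', 'obscene', 'insult']
print(strict_labels)   # ['toxic']
```

A threshold of 0.5 on the probability is the same as a threshold of 0 on the logit, since the sigmoid of 0 is exactly 0.5. Raising the threshold generally trades recall for precision, which may be worth exploring on the validation set when precision on rare labels is low.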


In [None]:
trainer.save_model("./my_model")

In [None]:
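# Note: the first positional argument of push_to_hub is the commit message, not the repository name;
# by default the repository is derived from the output directory set in TrainingArguments.
trainer.push_to_hub(commit_message="zcamz/bert-classifier-toxic")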

CommitInfo(commit_url='https://huggingface.co/zcamz/bert-finetuned-toxic/commit/6f63129ee8a36b75e2c00368e43143f0dab6fbce', commit_message='zcamz/bert-classifier-toxic', commit_description='', oid='6f63129ee8a36b75e2c00368e43143f0dab6fbce', pr_url=None, pr_revision=None, pr_num=None)