# SetFit for Multilabel Text Classification

In this notebook, we'll try out text classification on a multilabel dataset with SetFit.\
SetFit is known to work well for few-shot learning. It fine-tunes a pretrained embedding (Sentence Transformer) model with contrastive learning on pairs of examples, then trains a classification head on the resulting embeddings.
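To make the contrastive objective concrete: `CosineSimilarityLoss` from Sentence Transformers (passed as `loss_class` to the trainer) computes the mean squared error between the cosine similarity of each pair's embeddings and a target of 1.0 for a similar pair or 0.0 for a dissimilar one. The pure-Python sketch below reproduces that formula on hand-made 2-D vectors. The helper names and vectors are illustrative only; the real loss works on batched PyTorch tensors produced by the model body.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between each pair's cosine similarity and its target.

    `pairs` holds (embedding_a, embedding_b, target) tuples, where the target is
    1.0 for a positive (similar) pair and 0.0 for a negative (dissimilar) pair.
    """
    errors = [(cosine_similarity(a, b) - target) ** 2 for a, b, target in pairs]
    return sum(errors) / len(errors)


# Toy "embeddings" (hypothetical values, not model outputs)
pairs = [
    ([1.0, 0.0], [2.0, 0.0], 1.0),  # same direction, cos = 1, target 1 -> error 0
    ([1.0, 0.0], [0.0, 3.0], 0.0),  # orthogonal, cos = 0, target 0 -> error 0
    ([1.0, 0.0], [1.0, 1.0], 0.0),  # cos ~ 0.707, target 0 -> error ~ 0.5
]
print(cosine_similarity_loss(pairs))
```

Training lowers this loss by moving the embeddings of same-label texts closer together and those of different-label texts apart, which is what makes the simple classification head effective afterwards.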

We expect it to perform well on our problem, since each of our labels is a binary classification task.

In [1]:
from datasets import load_dataset

# Candidate embedding models; only the uncommented assignment is used
# model_id = "sentence-transformers/paraphrase-mpnet-base-v2"
# model_id = "sentence-transformers/all-MiniLM-L6-v2"
model_id = "BAAI/bge-small-en-v1.5"
# model_id = "jinaai/jina-embedding-s-en-v1"
# model_id = "avsolatorio/GIST-all-MiniLM-L6-v2"
# model_id = "mixedbread-ai/mxbai-embed-large-v1"
# model_id = "WhereIsAI/UAE-Large-V1"




## Loading the dataset

In [2]:
import pandas as pd
from datasets import Dataset, DatasetDict

train_df = pd.read_csv('../data/processed/clean_train.csv')
valid_df = pd.read_csv('../data/processed/clean_valid.csv')


# dataset = load_dataset("ethos", "multilabel")

ds_dict = {'train' : Dataset.from_pandas(train_df),
           'valid' : Dataset.from_pandas(valid_df)}

dataset = DatasetDict(ds_dict)
dataset

DatasetDict({
    train: Dataset({
        features: ['clean_content', 'cyber_label', 'environmental_issue'],
        num_rows: 1008
    })
    valid: Dataset({
        features: ['clean_content', 'cyber_label', 'environmental_issue'],
        num_rows: 252
    })
})

In [3]:
import numpy as np

features = dataset["train"].column_names
features.remove("clean_content")
features

['cyber_label', 'environmental_issue']

We encode the two binary label columns into a single `'labels'` feature: a list holding one 0/1 value per label.

In [4]:
def encode_labels(record):
    return {"labels": [record[feature] for feature in features]}


dataset = dataset.map(encode_labels)



In [5]:
train_dataset = dataset["train"]
eval_dataset = dataset["valid"]
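For intuition, here is what `encode_labels` returns for a single row. The record below is invented for illustration; when used with `Dataset.map`, the returned `labels` key is added to each row alongside the existing columns.

```python
# Same label columns as in the notebook
features = ["cyber_label", "environmental_issue"]


def encode_labels(record):
    # Collect the 0/1 value of every label column, in a fixed order, into one list
    return {"labels": [record[feature] for feature in features]}


# A hypothetical record shaped like a row of the training set
record = {
    "clean_content": "new ransomware strain hits utilities",
    "cyber_label": 1,
    "environmental_issue": 0,
}
print(encode_labels(record))  # {'labels': [1, 0]}
```

The order of `features` fixes the meaning of each position in the list, which is why the same list is reused later to turn predictions back into label names.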

Now that we have the dataset, let's load and train a model!

## Fine-tuning the model

To train a SetFit model, we download a pretrained checkpoint from the Hub using the `from_pretrained()` method of the `SetFitModel` class.

**Note that the `multi_target_strategy` parameter signals to both the model and the trainer to expect a multilabel dataset.** With `"one-vs-rest"`, the head trains one independent binary classifier per label.
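To illustrate what `"one-vs-rest"` means for the head: SetFit wraps scikit-learn's `LogisticRegression` in a `OneVsRestClassifier`, so each label gets its own binary logistic regression over the sentence embeddings. The sketch below reimplements that idea in pure Python, with plain gradient descent standing in for scikit-learn's solver; the 2-D vectors are made up and stand in for real embeddings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=1.0, epochs=1000):
    """Fit one binary logistic regression with full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x_i, y_i in zip(X, y):
            # Derivative of the log loss with respect to the logit
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)) + b) - y_i
            for j in range(d):
                grad_w[j] += err * x_i[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def one_vs_rest_fit(X, Y):
    """Train an independent binary classifier for every label column of Y."""
    return [train_logistic_regression(X, [row[k] for row in Y]) for k in range(len(Y[0]))]


def one_vs_rest_predict(models, x, threshold=0.5):
    """Switch each label on when its own classifier's probability exceeds the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > threshold) for w, b in models]


# Hypothetical "embeddings": dimension 0 tracks label 0, dimension 1 tracks label 1
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0], [0.0, 0.0]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
models = one_vs_rest_fit(X, Y)
print([one_vs_rest_predict(models, x) for x in X])
# [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
```

Because the classifiers are independent, a text can receive both labels or none, which matches predictions such as `[0, 0]` later in the notebook.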

In [6]:
from setfit import SetFitModel

model = SetFitModel.from_pretrained(model_id, multi_target_strategy="one-vs-rest")
model.device

model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.


device(type='cuda', index=0)

Here, we've downloaded a pretrained Sentence Transformer from the Hub and added a logistic regression classification head to create the SetFit model. As indicated in the message, we need to train this model on some labeled examples. We can do so with the `SetFitTrainer` class as follows:

In [7]:
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitTrainer

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    num_iterations=5,
    column_mapping={"clean_content": "text", "labels": "label"},
    batch_size=3,
    num_epochs=2,
)

Applying column mapping to the training dataset
Applying column mapping to the evaluation dataset


The main arguments to notice in the trainer are the following:

* `loss_class`: The loss function to use for contrastive learning with the Sentence Transformer body
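* `num_iterations`: Controls how many text pairs are generated for contrastive learning: `num_iterations` × (number of training examples) positive pairs and as many negative pairs, i.e. 5 × 1008 × 2 = 10080 pairs in the training log below
* `column_mapping`: The `SetFitTrainer` expects the inputs in `text` and `label` columns. This mapping automatically renames the columns of the training and evaluation datasets for us.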
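The pair construction behind `num_iterations` can be sketched as follows. This is a simplified illustration, not SetFit's implementation: it assumes two examples form a positive pair when they share at least one active label and a negative pair otherwise, and it samples `num_iterations × n` pairs from each pool with replacement. The toy texts and labels are made up.

```python
import itertools
import random


def shares_a_label(a, b):
    # Assumed multilabel rule: positive iff both label vectors have a 1 in the same position
    return any(x == 1 and y == 1 for x, y in zip(a, b))


def generate_pairs(texts, labels, num_iterations, seed=0):
    """Sample num_iterations * len(texts) positive pairs and as many negative pairs."""
    rng = random.Random(seed)
    positives, negatives = [], []
    # Split every unordered pair of distinct examples into a positive or a negative pool
    for i, j in itertools.combinations(range(len(texts)), 2):
        pool = positives if shares_a_label(labels[i], labels[j]) else negatives
        pool.append((texts[i], texts[j]))
    n_each = num_iterations * len(texts)
    # Sample each pool with replacement; targets are 1.0 (similar) and 0.0 (dissimilar)
    pairs = [(a, b, 1.0) for a, b in rng.choices(positives, k=n_each)]
    pairs += [(a, b, 0.0) for a, b in rng.choices(negatives, k=n_each)]
    rng.shuffle(pairs)
    return pairs


texts = [
    "phishing wave hits banks",
    "new ransomware strain",
    "oil spill near the coast",
    "ransomware shuts down a water plant",
    "local football results",
]
labels = [[1, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
pairs = generate_pairs(texts, labels, num_iterations=2)
print(len(pairs))  # 20 = 2 iterations * 5 examples * 2 pair types
```

The resulting `(text_a, text_b, target)` triples are exactly the kind of input a cosine-similarity loss consumes.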

Now that we've created a trainer, we can train it!

In [8]:
trainer.train(max_length=256)

***** Running training *****
  Num unique pairs = 10080
  Batch size = 3
  Num epochs = 2
  Total optimization steps = 6720
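The numbers in this log follow directly from the trainer arguments. A small sketch of the arithmetic, assuming 2 × `num_iterations` × n pairs and one optimization step per batch of pairs:

```python
import math


def contrastive_training_plan(num_examples, num_iterations, batch_size, num_epochs):
    # Positive and negative pairs: num_iterations * num_examples of each
    num_pairs = 2 * num_iterations * num_examples
    # One step per batch; a final partial batch is assumed to count as a step
    # (irrelevant here, since 10080 is divisible by 3)
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return num_pairs, steps_per_epoch * num_epochs


print(contrastive_training_plan(num_examples=1008, num_iterations=5, batch_size=3, num_epochs=2))
# (10080, 6720)
```

With `batch_size=3`, every epoch takes 3360 steps, so raising the batch size is the most direct way to shorten this phase.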




The final step is to compute the model's performance with the `evaluate()` method. For a multilabel model, the default metric is subset accuracy: the fraction of samples for which all labels are predicted correctly.
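Subset accuracy is stricter than the per-label accuracies computed at the end of this notebook: a sample counts only if every one of its labels is right. A toy comparison with made-up label vectors shows the difference (scikit-learn's `accuracy_score` computes subset accuracy when given 2-D 0/1 arrays):

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Accuracy of each label column taken on its own."""
    n_labels = len(y_true[0])
    return [
        sum(t[k] == p[k] for t, p in zip(y_true, y_pred)) / len(y_true)
        for k in range(n_labels)
    ]


y_true = [[1, 0], [0, 1], [0, 0], [1, 1]]
y_pred = [[1, 0], [0, 0], [0, 0], [1, 0]]
print(subset_accuracy(y_true, y_pred))     # 0.5
print(per_label_accuracy(y_true, y_pred))  # [1.0, 0.5]
```

This is why the subset accuracy reported below can be lower than the accuracy of either label on its own.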

In [9]:
metrics = trainer.evaluate()
metrics

***** Running evaluation *****


{'accuracy': 0.8214285714285714}

Let's try the model on two short example sentences.

In [10]:
preds = model(
    [
        "Daily cyber topics",
        "This is shouldn't be be assigned any labels?"
    ]
)
preds

tensor([[1, 0],
        [0, 0]])

In [11]:
# Map each row of 0/1 predictions back to label names, using the `features` list defined earlier
[[f for f, p in zip(features, ps) if p] for ps in preds]

[['cyber_label'], []]

In [12]:
from sklearn.metrics import accuracy_score, classification_report

# Raw validation texts and the ground-truth 0/1 label columns (same order as `features`)
X_valid = valid_df['clean_content'].tolist()
y_valid = valid_df[features]

# One row of 0/1 predictions per text, one column per label
y_pred = model(X_valid)

# Evaluate each label separately as a binary classification problem
for i, label in enumerate(features):
    print(f"Accuracy for {label}: {accuracy_score(y_valid.iloc[:, i], y_pred[:, i])}")
    print(f"Classification Report for {label}:\n", classification_report(y_valid.iloc[:, i], y_pred[:, i]))


Accuracy for cyber_label: 0.9325396825396826
Classification Report for cyber_label:
               precision    recall  f1-score   support

           0       0.97      0.96      0.96       235
           1       0.50      0.53      0.51        17

    accuracy                           0.93       252
   macro avg       0.73      0.75      0.74       252
weighted avg       0.93      0.93      0.93       252

Accuracy for environmental_issue: 0.8690476190476191
Classification Report for environmental_issue:
               precision    recall  f1-score   support

           0       0.93      0.90      0.92       200
           1       0.66      0.75      0.70        52

    accuracy                           0.87       252
   macro avg       0.80      0.82      0.81       252
weighted avg       0.88      0.87      0.87       252
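The minority-class rows of these reports follow from the confusion-matrix counts. The notebook does not print those counts; the ones below are reconstructed as the unique integer counts that reproduce the reported accuracy, supports, and rounded scores, so treat them as a derived illustration rather than logged output.

```python
def binary_report(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Reconstructed counts (17 positives / 235 negatives for cyber_label,
# 52 positives / 200 negatives for environmental_issue)
for label, counts in {
    "cyber_label": dict(tp=9, fp=9, fn=8, tn=226),
    "environmental_issue": dict(tp=39, fp=20, fn=13, tn=180),
}.items():
    p, r, f, a = binary_report(**counts)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} accuracy={a:.4f}")
```

For `cyber_label`, only 17 validation texts are positive, so the 93% accuracy is dominated by the negative class: half of the texts flagged as cyber are false alarms.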

