# Arabic Intent Classification using SetFit - Efficient Few-shot Learning with Sentence Transformers

Hugging Face and Intel recently published a paper on SetFit, a few-shot learning approach built on Sentence Transformers. SetFit is a simple and efficient method: it fine-tunes a sentence-embedding model on a small number of labeled examples and then trains a classification head on the resulting embeddings. In this notebook, we show how to use SetFit for Arabic intent classification.


In [1]:
%pip install setfit



In [2]:
from datasets import load_dataset
import numpy as np
import pandas as pd
import plotly.express as px
from sentence_transformers.losses import BatchAllTripletLoss

from setfit import SetFitModel, SetFitTrainer, sample_dataset

## Loading the MASSIVE Arabic Intent Dataset from Hugging Face

In [3]:
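# Load the Arabic (ar-SA) split of the MASSIVE intent dataset
dataset = load_dataset("SetFit/amazon_massive_intent_ar-SA")
# Sample up to 128 labeled examples per intent class for few-shot training
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=128)
# Evaluate on the full test split
eval_dataset = dataset["test"]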
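To make the few-shot sampling step concrete, here is a minimal pure-Python sketch of per-class sampling. It is not the `setfit` implementation of `sample_dataset`; the helper name `sample_per_class` and the toy data are hypothetical, and the sketch assumes that a class with fewer examples than `num_samples` simply contributes everything it has.

```python
import random
from collections import defaultdict


def sample_per_class(examples, label_key="label", num_samples=8, seed=42):
    """Keep at most `num_samples` randomly chosen examples for each label.

    Hypothetical helper for illustration only; not part of setfit.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # A class smaller than num_samples contributes all of its examples.
        sampled.extend(group[:num_samples])
    return sampled


# Toy data: label 0 has 5 examples, label 1 has only 2.
toy = [{"text": f"utterance {i}", "label": 0} for i in range(5)]
toy += [{"text": f"query {i}", "label": 1} for i in range(2)]

few_shot = sample_per_class(toy, num_samples=3)
counts = defaultdict(int)
for example in few_shot:
    counts[example["label"]] += 1
print(dict(counts))  # {0: 3, 1: 2}
```

Under this assumption, the total training set can be smaller than `num_samples` times the number of classes when some intents are rare.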






## Setting up the SetFit Trainer and training the model

In [4]:
# Load a pretrained Arabic BERT checkpoint from the Hub as the SetFit model body
model = SetFitModel.from_pretrained("aubmindlab/bert-large-arabertv02")

# Create trainer
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=BatchAllTripletLoss,
    metric="accuracy",
    batch_size=64,
    num_iterations=20,  # Number of text pairs to generate per example for contrastive learning (used by pair-based losses)
    num_epochs=15,  # Number of epochs for fine-tuning the embedding model
    warmup_proportion=0.2,  # Fraction of the training steps used for learning-rate warmup
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate()


Some weights of the model checkpoint at /root/.cache/torch/sentence_transformers/aubmindlab_bert-large-arabertv02 were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
model_head.pkl not found on HuggingFace Hub, initialising classific

Epoch:   0%|          | 0/15 [00:00<?, ?it/s]

Iteration:   0%|          | 0/96 [00:00<?, ?it/s]

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
***** Running evaluation *****
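The trainer above is given `BatchAllTripletLoss` as its `loss_class`. As a rough illustration of the idea behind a batch-all triplet loss, the pure-Python sketch below enumerates every valid (anchor, positive, negative) triplet in a batch and averages the losses that remain positive after applying a margin. The function names, the Euclidean distance, the margin value, and the averaging over non-zero triplets are assumptions made for this sketch; it is not the `sentence_transformers` implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Average hinge loss over all valid triplets with a positive loss.

    A triplet (a, p, n) is valid when a != p, labels[a] == labels[p],
    and labels[n] != labels[a]. Illustrative sketch only.
    """
    losses = []
    n_items = len(labels)
    for a in range(n_items):
        for p in range(n_items):
            if p == a or labels[p] != labels[a]:
                continue
            for n in range(n_items):
                if labels[n] == labels[a]:
                    continue
                d_pos = euclidean(embeddings[a], embeddings[p])
                d_neg = euclidean(embeddings[a], embeddings[n])
                loss = d_pos - d_neg + margin
                if loss > 0:
                    losses.append(loss)
    return sum(losses) / len(losses) if losses else 0.0


# Well-separated classes: every triplet already satisfies the margin.
separated = batch_all_triplet_loss(
    [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], [0, 0, 1, 1]
)

# A negative sits between two same-class points, so the loss is non-zero.
overlapping = batch_all_triplet_loss(
    [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0)], [0, 0, 1]
)
print(separated, overlapping)  # 0.0 2.5
```

Minimizing a loss of this kind pulls utterances with the same intent together and pushes different intents apart in embedding space, which is what makes the downstream classification head easier to fit.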


In [5]:
print(metrics)

{'accuracy': 0.7787491593813046}
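The reported accuracy of about 0.779 means that roughly 78% of the test utterances were assigned the correct intent. As a reminder of what the metric measures, here is a minimal sketch of the definition; the `accuracy` function and the toy label ids are hypothetical and are not the code path the trainer uses internally.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == r) for p, r in zip(predictions, references))
    return correct / len(references)


# Toy intent ids: three of the four predictions match the references.
toy_predictions = [3, 1, 2, 2]
toy_references = [3, 1, 0, 2]
toy_accuracy = accuracy(toy_predictions, toy_references)
print(toy_accuracy)  # 0.75
```

Because accuracy weights every test example equally, frequent intents influence the score more than rare ones.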


In [6]:

# Push the trained model to the Hub
trainer.push_to_hub("fathyshalab/massive-ar-SA")

# Download the model from the Hub
model = SetFitModel.from_pretrained("fathyshalab/massive-ar-SA")
# Run inference on two Arabic utterances:
# "Turn on the lights" and "How is the weather on Tuesday the thirteenth"
preds = model(["تشغيل الأضواء", "كيف هو الطقس يوم الثلاثاء الثالث عشر"])

TypeError: ignored