[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/education-toolkit/blob/main/02_ml-demos-with-gradio.ipynb)



💡 **Welcome!**

This notebook provides a short walkthrough of text classification using few-shot learning with [SetFit](https://github.com/huggingface/setfit).
It can be found at [https://bit.ly/raj_setfit](https://bit.ly/raj_setfit) or in my [Hugging Face demos repo](https://github.com/rajshah4/huggingface-demos/tree/main/SetFit).


You can find a deeper dive into text classification with SetFit on [Philipp's blog](https://www.philschmid.de/getting-started-setfit).

In [None]:
!python -m pip install setfit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting setfit
  Downloading setfit-0.3.0-py3-none-any.whl (21 kB)
Collecting evaluate==0.2.2
  Downloading evaluate-0.2.2-py3-none-any.whl (69 kB)
Collecting datasets==2.3.2
  Downloading datasets-2.3.2-py3-none-any.whl (362 kB)
Collecting sentence-transformers==2.2.2
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting multiprocess
  Downloading multiprocess-0.70.14-py37-none-any.whl (115 kB)
Collecting xxhash
  Downloading xxhash-3.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)

## Load Dataset

In [None]:
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss

from setfit import SetFitModel, SetFitTrainer

In [None]:
# Load a dataset from the Hugging Face Hub
dataset = load_dataset("SetFit/SentEval-CR")



Downloading and preparing dataset json/SetFit--SentEval-CR to /root/.cache/huggingface/datasets/SetFit___json/SetFit--SentEval-CR-3d6d995d44023096/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5...



Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/SetFit___json/SetFit--SentEval-CR-3d6d995d44023096/0.0.0/da492aad5680612e4028e7f6ddc04b1dfcec4b64db470ed7cc5f2bb265b9b6b5. Subsequent calls will reuse this data.



## Simulate the few-shot regime by sampling 8 examples per class

In [None]:
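num_classes = 2
# Note: shuffling and taking the first 16 rows yields about 8 examples per class
# on average; an exact 8/8 split is not guaranteed.
train_dataset = dataset["train"].shuffle(seed=42).select(range(8 * num_classes))
eval_dataset = dataset["test"]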
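If an exact 8-per-class split is wanted, one option is to pick row indices per label and pass them to `select`. The sketch below is a minimal illustration on a toy label list; the helper name `sample_per_class` is hypothetical and not part of SetFit or `datasets`.

```python
import random
from collections import defaultdict


def sample_per_class(labels, num_samples=8, seed=42):
    """Return sorted row indices with exactly `num_samples` rows per label.

    Hypothetical helper for illustration only.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        # random.sample raises ValueError if a class has fewer rows than requested
        chosen.extend(rng.sample(by_class[label], num_samples))
    return sorted(chosen)


# Toy, imbalanced label column standing in for dataset["train"]["label"]
toy_labels = [0, 1] * 20 + [1] * 10
indices = sample_per_class(toy_labels, num_samples=8)
print(len(indices))
```

With the real dataset, the same indices could be used as `dataset["train"].select(sample_per_class(dataset["train"]["label"]))`, assuming the label column is named `label`.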



In [None]:
train_dataset['text']

['* slick-looking design and improved interface',
 "the day finally arrived when i was sure i 'd leave sprint .",
 'as for bluetooth , no problems at all .',
 '2 ) storage capacity',
 "neither message was answered ( they ask for 24 hours before replying - i 've been waiting 27 days . )",
 "for a price that 's still less than even the lowest level ipod i was able to get this 40gb monster , and the best part is it works as great as it was advertised to and then some .",
 'i bought the player this week and i like it by far .',
 'only problem is that is a bit heavy .',
 'i love the slim design ; . the weight would only be an issue if it were bulky .',
 'it fits into a hand well , it has a removable battery ( this is important ) , great sound quality , fm stereo , recorder , smooth ui , and a feature that most uni pods lack . . . char ! .',
 'once a depth is locked , it will jump off a little while working .',
 'the thought of not having to buy refills and just using regular bags is awesome

## Load a SetFit model from Hub

In [None]:
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.


## Create Trainer

In [None]:
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # Number of pair-generation iterations for contrastive learning
    column_mapping={"text": "text", "label": "label"},  # Map dataset columns to the text/label names expected by the trainer
)
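To make `num_iterations` more concrete, here is a rough sketch of how contrastive sentence pairs can be built from a handful of labeled examples: in each iteration, every sentence is paired once with a same-class sentence (target similarity 1.0) and once with a different-class sentence (target similarity 0.0). This is an illustration under those assumptions, not SetFit's actual implementation; `generate_pairs` and the toy sentences are made up.

```python
import random


def generate_pairs(texts, labels, num_iterations, seed=0):
    """Build (sentence_a, sentence_b, similarity) triples for contrastive training.

    Illustrative sketch: assumes at least two classes and at least two
    examples per class.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            same = [j for j, other in enumerate(labels) if other == label and j != i]
            diff = [j for j, other in enumerate(labels) if other != label]
            pairs.append((text, texts[rng.choice(same)], 1.0))  # positive pair
            pairs.append((text, texts[rng.choice(diff)], 0.0))  # negative pair
    return pairs


toy_texts = ["great sound", "love the design", "too heavy", "battery died"]
toy_labels = [1, 1, 0, 0]
pairs = generate_pairs(toy_texts, toy_labels, num_iterations=20)
print(len(pairs))  # 4 sentences * 20 iterations * 2 pairs each
```

The pairs are then used to fine-tune the sentence embedding model with `CosineSimilarityLoss`, so that same-class sentences end up close together in embedding space.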

## Train and evaluate

In [None]:
trainer.train()

Applying column mapping to training dataset
***** Running training *****
  Num examples = 640
  Num epochs = 1
  Total optimization steps = 40
  Total train batch size = 16
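The numbers in this log follow from the trainer settings. As a quick check, assuming two pairs (one positive, one negative) are generated per training example per iteration:

```python
import math

num_train_examples = 16   # 8 * num_classes rows selected above
num_iterations = 20       # passed to SetFitTrainer
batch_size = 16           # passed to SetFitTrainer
pairs_per_example = 2     # assumption: one positive and one negative pair per iteration

num_pairs = num_train_examples * num_iterations * pairs_per_example
steps_per_epoch = math.ceil(num_pairs / batch_size)

print(num_pairs, steps_per_epoch)
```

This gives 640 pairs and 40 optimization steps for one epoch, matching `Num examples` and `Total optimization steps` in the log.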



In [None]:
metrics = trainer.evaluate()

Applying column mapping to evaluation dataset
***** Running evaluation *****


In [None]:
metrics

{'accuracy': 0.8632138114209827}

## Log in to the Hugging Face Hub

In [None]:
from huggingface_hub import notebook_login
notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token


In [None]:
!huggingface-cli whoami
!git config --global credential.helper store

rajistics
orgs: huggingface,spaces-explorers,demo-org,HF-test-lab,qualitydatalab,FinanceInc,inferenceendpoints,vendorabc


In [None]:
trainer.push_to_hub(repo_path_or_name="rajistics/setfit-model", use_auth_token=True)

Cloning https://huggingface.co/rajistics/setfit-model into local empty directory.



remote: Scanning LFS files for validity, may be slow...        
remote: LFS file scan complete.        
To https://huggingface.co/rajistics/setfit-model
   b3453a3..ca8dc8f  main -> main




'https://huggingface.co/rajistics/setfit-model/commit/ca8dc8fec289777c59cc34e238f45da4c727d04c'

## Download the model for local inference

In [None]:
modelt = SetFitModel.from_pretrained("rajistics/setfit-model")
# Run inference
preds = modelt(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

In [None]:
preds

array([1, 0])
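The predictions are integer class ids. A small sketch for turning them into readable labels, assuming 1 means positive and 0 means negative for this customer-review dataset (check the dataset card before relying on this mapping):

```python
# Assumed mapping from class id to label name; verify against the dataset card
id2label = {0: "negative", 1: "positive"}

preds = [1, 0]  # stands in for the array returned by the model above
pred_labels = [id2label[int(p)] for p in preds]
print(pred_labels)
```

`int(p)` is used so the lookup works the same way for plain Python ints and NumPy integer types.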

## Hugging Face Inference Endpoints

Inference Endpoints are Hugging Face's managed service for deploying models to production. Example endpoint model: https://huggingface.co/philschmid/setfit-ag-news-endpoint

Sample request once the endpoint is created:

In [None]:
import json
import requests as r

ENDPOINT_URL = ""  # URL of your endpoint
HF_TOKEN = ""  # your Hugging Face access token

# sample payload
regular_payload = {"inputs": "Coming to The Rescue Got a unique problem? Not to worry: you can find a financial planner for every specialized need"}

# HTTP headers for authorization
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json"
}

# send the request
response = r.post(ENDPOINT_URL, headers=headers, json=regular_payload)
classified = response.json()

print(classified)
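Rather than pasting the token into the notebook, it can be read from an environment variable. The sketch below only builds the headers and payload, so it runs without network access; `build_request` is a hypothetical helper, and the `HF_TOKEN` environment variable name is an assumption.

```python
import os


def build_request(text, token=None):
    """Return (headers, payload) for a text-classification request.

    Hypothetical helper: falls back to the HF_TOKEN environment variable
    when no token is passed explicitly.
    """
    token = token or os.environ.get("HF_TOKEN", "")
    if not token:
        raise ValueError("No token provided and HF_TOKEN is not set")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {"inputs": text}
    return headers, payload


headers, payload = build_request("great battery life", token="hf_example_token")
print(payload)
```

The result can then be sent with `r.post(ENDPOINT_URL, headers=headers, json=payload, timeout=30)`, and calling `response.raise_for_status()` before `response.json()` surfaces HTTP errors instead of silently parsing an error body.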