<a href="https://colab.research.google.com/github/joynaomi81/Zero-shot-Prediction/blob/main/Zero_shot_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zero-Shot Classification Documentation

## What is Zero-Shot Classification?

Zero-shot classification is a natural language processing (NLP) technique that allows a model to assign labels or categories to text without having been explicitly trained on examples of those labels.

In simple terms, it means the model can understand and classify text into categories it has never seen before, using only the meaning of the labels.


## How Does It Work?

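Zero-shot models build on a pretrained language model (such as BART or RoBERTa) that has been fine-tuned for natural language inference (NLI). The input text is treated as the premise, and each candidate label is turned into a hypothesis (for example, "This example is travel."). The model's estimate of how strongly the premise entails that hypothesis becomes the score for the label.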
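
As a rough illustration of the NLI framing, the sketch below builds the premise/hypothesis pairs that would be passed to an NLI model. The helper `build_nli_pairs` is hypothetical and not part of any library. The template `"This example is {}."` is, to my understanding, the default used by the Transformers zero-shot pipeline, but treat it as an assumption here.

```python
# Hypothetical helper (not a library function): pairs the input text (premise)
# with one hypothesis sentence per candidate label.
def build_nli_pairs(sequence, candidate_labels, hypothesis_template="This example is {}."):
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs("one day I will see the world", ["travel", "cooking", "dancing"])
for premise, hypothesis in pairs:
    print(f"premise={premise!r}  hypothesis={hypothesis!r}")
```

An NLI model would then score each pair for entailment, and those entailment scores are what get turned into label scores.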



## Why Use Zero-Shot Classification?
* You don’t need labeled training data for your specific categories.  
* You can quickly test new labels or categories.  


## Key Terms Explained

- **Sequence (`sequence`)** → The input text being classified.
- **Labels (`labels`)** → The list of candidate categories provided by the user.
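- **Scores (`scores`)** → Confidence values (between 0 and 1) showing how likely the model thinks the text belongs to each label. In the default single-label mode the scores sum to 1 across all labels; with `multi_label=True` each label is scored independently, so the scores need not sum to 1.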
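
Whether scores sum to 1 depends on how the raw NLI outputs are normalized. The sketch below uses made-up logits (not real model output) to contrast two normalizations: a softmax over the entailment logits of all labels, and a per-label softmax over that label's contradiction and entailment logits. This mirrors my understanding of how the pipeline behaves in single-label and multi-label mode, and is not a copy of its implementation.

```python
import math


def softmax(values):
    # Subtract the max for numerical stability before exponentiating.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up (contradiction, entailment) logits per label, for illustration only.
logits = {
    "travel": (-2.0, 3.0),
    "exploration": (-1.0, 2.0),
    "cooking": (2.5, -3.0),
}

# Single-label style: labels compete, so the scores sum to 1.
single = dict(zip(logits, softmax([entail for _, entail in logits.values()])))

# Multi-label style: each label is judged on its own, so the scores are independent.
multi = {label: softmax([contra, entail])[1] for label, (contra, entail) in logits.items()}

print("single-label:", single)
print("multi-label: ", multi)
```

With the multi-label normalization, both `travel` and `exploration` can score close to 1 at the same time, which is not possible when the labels compete.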




In [1]:
!pip install -q transformers

Load model

In [2]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

Device set to use cuda:0


In [9]:
# Input text and candidate labels
sequence_to_classify = "one day I will see the world"  # text to classify
candidate_labels = ['travel', 'cooking', 'dancing']

classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9938651323318481, 0.0032737809233367443, 0.002861025743186474]}

In [10]:
# Same text with one more candidate label; multi_label=True scores each label independently
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'exploration', 'dancing', 'cooking'],
 'scores': [0.9945111274719238,
  0.9383887052536011,
  0.005706210620701313,
  0.0018193126888945699]}
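
In multi-label mode more than one label can apply, so a common follow-up step is to keep every label whose score clears a cutoff. The sketch below applies that to the result shown above. The helper `labels_above` and the 0.5 threshold are illustrative assumptions, not part of the pipeline.

```python
# Result copied from the multi-label output above.
result = {
    "sequence": "one day I will see the world",
    "labels": ["travel", "exploration", "dancing", "cooking"],
    "scores": [0.9945111274719238, 0.9383887052536011,
               0.005706210620701313, 0.0018193126888945699],
}


def labels_above(result, threshold=0.5):
    # Hypothetical helper: keep labels whose score meets the threshold.
    return [label for label, score in zip(result["labels"], result["scores"])
            if score >= threshold]


selected = labels_above(result)
print(selected)  # ['travel', 'exploration']
```

The right threshold depends on the task; 0.5 is only a starting point.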

In [11]:
# input text and candidate labels
sequence_to_classify = "Donald Trump will be next president"
candidate_labels = ['science', 'politics', 'history']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'Donald Trump will be next president',
 'labels': ['politics', 'history', 'science'],
 'scores': [0.8404946327209473, 0.15548016130924225, 0.004025156144052744]}

In [7]:
# input text and candidate labels
sequence_to_classify = "The James Webb Space Telescope has made groundbreaking discoveries in astrophysics."
candidate_labels = ['space', 'sports', 'politics']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'The James Webb Space Telescope has made groundbreaking discoveries in astrophysics.',
 'labels': ['space', 'sports', 'politics'],
 'scores': [0.984025239944458, 0.009652308188378811, 0.006322517525404692]}

In [8]:
# input text and candidate labels
sequence_to_classify = "Regular exercise and a balanced diet can improve mental health."
candidate_labels = ['health', 'technology', 'finance']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'Regular exercise and a balanced diet can improve mental health.',
 'labels': ['health', 'technology', 'finance'],
 'scores': [0.9278359413146973, 0.04332251474261284, 0.028841543942689896]}
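
In single-label mode one often wants just the top label, with a fallback when the model is not confident. A minimal sketch using the result above follows. The helper `top_label`, the `min_score` cutoff, and the `"uncertain"` fallback are assumptions made for illustration.

```python
# Result copied from the single-label output above.
result = {
    "sequence": "Regular exercise and a balanced diet can improve mental health.",
    "labels": ["health", "technology", "finance"],
    "scores": [0.9278359413146973, 0.04332251474261284, 0.028841543942689896],
}


def top_label(result, min_score=0.5, fallback="uncertain"):
    # Hypothetical helper: pick the highest-scoring label, or the fallback
    # when even the best score is below min_score.
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= min_score else fallback


print(top_label(result))                  # health
print(top_label(result, min_score=0.95))  # uncertain
```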