# An Overview of Zero-Shot Learning in NLP



### Introduction



### A Ready-made Zero-Shot Text Classifier

Recently [Yin et al.](https://arxiv.org/abs/1909.00161) proposed a method which uses a pre-trained MNLI sequence-pair classifier as an out-of-the-box text/label compatibility function.

Natural Language Inference (NLI) considers two sentences: a "premise" and a "hypothesis". The task is to determine whether the hypothesis is true (entailment), false (contradiction), or undetermined (neutral) given the premise. The following examples come from [NLP Progress](http://nlpprogress.com/english/natural_language_inference.html).

| Premise | Label | Hypothesis |
|-|-|-|
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |

The idea is to take the sequence we want to classify as the "premise" and turn each candidate label into a "hypothesis." If the model predicts that the premise "entails" the hypothesis, we take the label to be true. This gives us a ready-made compatibility function that works reasonably well on certain tasks without any task-specific training. The code snippet below shows how easily this can be done with 🤗 Transformers.

```python
# load model pretrained on MNLI
# ('facebook/bart-large-mnli' is the current Hub ID for the checkpoint
# formerly loaded as 'bart-large-mnli')
from transformers import BartForSequenceClassification, BartTokenizer
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-mnli')
model = BartForSequenceClassification.from_pretrained('facebook/bart-large-mnli')

# pose sequence as an NLI premise and label (politics) as a hypothesis
premise = "Who are you voting for in 2020?"
hypothesis = 'This text is about politics.'

# run through model pre-trained on MNLI
input_ids = tokenizer.encode(premise, hypothesis, return_tensors='pt')
logits = model(input_ids)[0]

# we throw away "neutral" (dim 1), keep "contradiction" (0) and
# "entailment" (2), and take the probability of "entailment" as the
# probability of the label being true
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
true_prob = probs[:,1].item() * 100  # column 1 is now "entailment"
print(f'Probability that the label is true: {true_prob:0.2f}%')
```

```
Probability that the label is true: 99.04%
```
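
The same recipe extends naturally to several candidate labels: build one hypothesis per label, score each against the premise, and rank the results. The sketch below illustrates this under a few assumptions. The names `entailment_probability`, `rank_labels`, and `toy_nli_logits` are hypothetical helpers introduced here, not part of 🤗 Transformers, and `toy_nli_logits` is a hard-coded stand-in for a real MNLI model so that the example runs without downloading any weights. The logit order `[contradiction, neutral, entailment]` is assumed to match the indexing used in the snippet above.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumed logit order: [contradiction, neutral, entailment]
CONTRADICTION, ENTAILMENT = 0, 2


def entailment_probability(logits: Sequence[float]) -> float:
    """Drop "neutral" and softmax over contradiction vs. entailment."""
    c, e = logits[CONTRADICTION], logits[ENTAILMENT]
    m = max(c, e)  # subtract the max for numerical stability
    exp_c, exp_e = math.exp(c - m), math.exp(e - m)
    return exp_e / (exp_c + exp_e)


def rank_labels(
    sequence: str,
    labels: List[str],
    nli_logits: Callable[[str, str], Sequence[float]],
    template: str = "This text is about {}.",
) -> Dict[str, float]:
    """Score each label independently and return them best first."""
    scores = {
        label: entailment_probability(nli_logits(sequence, template.format(label)))
        for label in labels
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


def toy_nli_logits(premise: str, hypothesis: str) -> List[float]:
    """Hypothetical stand-in for a real MNLI model, hard-coded for this demo."""
    if "politics" in hypothesis:
        return [-2.0, 0.0, 3.0]  # strongly favors entailment
    return [2.0, 0.0, -1.0]      # favors contradiction


ranked = rank_labels(
    "Who are you voting for in 2020?",
    ["sports", "politics", "cooking"],
    toy_nli_logits,
)
for label, prob in ranked.items():
    print(f"{label}: {prob:.2%}")
```

To use a real model, `nli_logits` could wrap the tokenizer and model call from the snippet above and return the three logits for a single premise/hypothesis pair as a list. Because each label is scored independently, the probabilities need not sum to one, which suits multi-label settings; for single-label classification one could instead softmax the entailment logits across labels.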


In their paper, the authors report an F1 of 37.9 on Yahoo Answers using the smallest version of BERT fine-tuned only on the Multi-Genre NLI (MNLI) corpus. By simply using the larger and more recent BART model pre-trained on MNLI, we were able to bring this number up to NUMBER.