## Zero-Shot Text Difficulty Classification

### Introduction

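Zero-shot and few-shot NLP models make it possible to tackle a task when little or no labeled data is available. This note explores whether such a model can classify text by difficulty, first through the Hugging Face zero-shot classification pipeline and then through a manual PyTorch forward pass of the same model.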

#### Pipeline

In [1]:
# Load facebook/bart-large-mnli through the zero-shot-classification pipeline.
from transformers import pipeline
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

In [13]:
# Use this pipeline to classify sequences into any of the class names you specify.
sequence_to_classify = ["When Japan was added back to the F1 schedule ten years later , it went to Suzuka instead .",
                       "Before Persephone was released to Hermes , who had been sent to retrieve her , Hades tricked her into eating pomegranate seeds , -LRB- six or three according to the telling -RRB- which forced her to return to the underworld for a period each year ."]
candidate_labels = ['0','1']

classifier(sequence_to_classify, candidate_labels, multi_label=False)
# Returns one dict per input sequence; 'labels' is sorted by descending 'scores'.


[{'sequence': 'When Japan was added back to the F1 schedule ten years later , it went to Suzuka instead .',
  'labels': ['1', '0'],
  'scores': [0.587661623954773, 0.41233840584754944]},
 {'sequence': 'Before Persephone was released to Hermes , who had been sent to retrieve her , Hades tricked her into eating pomegranate seeds , -LRB- six or three according to the telling -RRB- which forced her to return to the underworld for a period each year .',
  'labels': ['1', '0'],
  'scores': [0.6243764758110046, 0.37562352418899536]}]
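
Because the pipeline returns the labels sorted by descending score, the predicted class is simply the first entry of `labels`. A minimal sketch of that lookup, using the two results above with the sequences shortened for brevity (the helper name `top_prediction` is hypothetical, not part of the library):

```python
# Results copied from the output above; the sequences are abbreviated here.
results = [
    {"sequence": "When Japan was added back ...", "labels": ["1", "0"],
     "scores": [0.587661623954773, 0.41233840584754944]},
    {"sequence": "Before Persephone was released ...", "labels": ["1", "0"],
     "scores": [0.6243764758110046, 0.37562352418899536]},
]

def top_prediction(result):
    """Return (label, score) for the highest-scoring candidate label."""
    # Labels are sorted by descending score, so index 0 is the winner.
    return result["labels"][0], result["scores"][0]

predictions = [top_prediction(r) for r in results]
print(predictions)
```

Both sentences are assigned label `'1'`, with scores only slightly above 0.5.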

In [5]:
# Multi-label mode: each candidate label is scored independently,
# so the scores do not have to sum to 1.
# Note: the output below is for the single string 'one day I will see the world',
# not for the two sentences defined above; presumably this cell was executed
# before sequence_to_classify was reassigned.
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)


{'sequence': 'one day I will see the world',
 'labels': ['travel', 'exploration', 'dancing', 'cooking'],
 'scores': [0.994511067867279,
  0.9383885264396667,
  0.005706145893782377,
  0.0018192846328020096]}
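
The difference between the two modes comes down to where the softmax is applied. The sketch below illustrates this with plain Python, as I understand the pipeline's behaviour; the logits are made-up numbers for illustration, not model output.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical (contradiction, entailment) logits for each candidate label.
nli_logits = {
    "travel": (-2.0, 3.0),
    "exploration": (-1.0, 1.5),
    "cooking": (2.5, -3.0),
}
labels = list(nli_logits)

# multi_label=False: softmax over the entailment logits of all labels,
# so the labels compete and the scores sum to 1.
single = dict(zip(labels, softmax([nli_logits[l][1] for l in labels])))

# multi_label=True: softmax over (contradiction, entailment) separately
# for each label, so every score is an independent probability.
multi = {l: softmax(list(nli_logits[l]))[1] for l in labels}

print(single)
print(multi)
```

This matches the output above, where `travel` and `exploration` both score above 0.9 in multi-label mode.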

In [29]:
import pandas as pd
from sklearn.model_selection import train_test_split

train_data_path="./01_data/WikiLarge_Train.csv"
train_data=pd.read_csv(train_data_path)

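# Shuffle the full dataset (multiplier 1); lower the multiplier to work on a subsample.
size=round(len(train_data)*1)
r_train=train_data.sample(n=size)
texts=list(r_train["original_text"])
labels=list(r_train["label"])

# Hold out 1% for testing, then split the remainder 90/10 into train and dev.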
rest_texts, test_texts, rest_labels, test_labels = train_test_split(texts, labels, test_size=0.01, random_state=1)
train_texts, dev_texts, train_labels, dev_labels = train_test_split(rest_texts, rest_labels, test_size=0.1, random_state=1)

print("Train size:", len(train_texts))
print("Dev size:", len(dev_texts))
print("Test size:", len(test_texts))

Train size: 371340
Dev size: 41260
Test size: 4168
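
The printed sizes can be checked by hand. The sketch below assumes that `train_test_split` rounds a fractional `test_size` up to a whole number of rows and gives the remainder to the other split.

```python
import math

total = 371340 + 41260 + 4168  # sizes printed above

# First split: 1% of all rows go to the test set.
n_test = math.ceil(0.01 * total)
n_rest = total - n_test

# Second split: 10% of the remaining rows go to the dev set.
n_dev = math.ceil(0.1 * n_rest)
n_train = n_rest - n_dev

print(n_train, n_dev, n_test)  # 371340 41260 4168
```

Overall this gives roughly 89% train, 10% dev and 1% test.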


In [35]:
from tqdm import trange

candidate_labels = ['0','1']

# Classify the test set in 20 chunks so that progress can be tracked;
# the last chunk also takes the leftover sentences.
n_chunks=20
chunk_size=len(test_texts)//n_chunks

test_result=[]
for i in trange(n_chunks):
    start=i*chunk_size
    end=len(test_texts) if i==n_chunks-1 else start+chunk_size
    test_result+=classifier(test_texts[start:end],candidate_labels, multi_label=False)

  0%|          | 0/20 [00:00<?, ?it/s]

In [78]:
# Keep only the top-ranked label (as an int) and its score for each sentence.
df_test_result=pd.DataFrame.from_dict(test_result)
df_test_result['labels']=df_test_result['labels'].apply(lambda x: int(x[0]))
df_test_result['scores']=df_test_result['scores'].apply(lambda x: float(x[0]))

In [79]:
from sklearn.metrics import classification_report, precision_recall_fscore_support
import numpy as np

test_correct=test_labels
test_predicted=df_test_result['labels']
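# With micro averaging on a single-label task, precision, recall and F1 all equal accuracy.
print("Test performance:", precision_recall_fscore_support(test_correct, test_predicted, average="micro"))

# Accuracy of the BART zero-shot classifier on the test set.
bart_accuracy = np.mean(test_predicted == test_correct)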

print(classification_report(test_correct, test_predicted))

Test performance: (0.4882437619961612, 0.4882437619961612, 0.4882437619961612, None)
              precision    recall  f1-score   support

           0       0.39      0.03      0.06      2096
           1       0.49      0.95      0.65      2072

    accuracy                           0.49      4168
   macro avg       0.44      0.49      0.35      4168
weighted avg       0.44      0.49      0.35      4168
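
To put the accuracy in context, it helps to compare it with a classifier that always predicts the most frequent class. A short sketch using only the numbers from the report above:

```python
# Class supports taken from the classification report above.
support = {0: 2096, 1: 2072}
reported_accuracy = 0.4882437619961612  # micro-averaged score printed above

total = sum(support.values())
majority_label = max(support, key=support.get)
majority_baseline = support[majority_label] / total

print(f"majority baseline:  {majority_baseline:.4f}")
print(f"zero-shot accuracy: {reported_accuracy:.4f}")
```

The zero-shot classifier scores below this trivial baseline. The recall figures (0.95 for class 1, 0.03 for class 0) show that it assigns label 1 to almost every sentence.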



#### Manual PyTorch

In [6]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
# Pose the sequence as an NLI premise and the label as a hypothesis.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

In [16]:
nli_model.to(device)

premise = sequence_to_classify[0]
label="1"
hypothesis = f'This example is {label}.'

# run through model pre-trained on MNLI
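# truncation='only_first' shortens the premise (not the hypothesis) if the pair is too long;
# it replaces the deprecated truncation_strategy argument.
x = tokenizer.encode(premise, hypothesis, return_tensors='pt',
                     truncation='only_first')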
logits = nli_model(x.to(device))[0]

# we throw away "neutral" (dim 1) and take the probability of
# "entailment" (2) as the probability of the label being true 
entail_contradiction_logits = logits[:,[0,2]]
probs = entail_contradiction_logits.softmax(dim=1)
prob_label_is_true = probs[:,1]




In [17]:
prob_label_is_true

tensor([0.5768], device='cuda:0', grad_fn=<SelectBackward0>)
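
The hypothesis used here, "This example is 1.", gives an NLI model very little to reason about, which may be one reason the scores stay close to 0.5. One option worth trying is to replace the numeric labels with descriptive names and map the prediction back to an integer afterwards. The sketch below only builds the hypotheses; the label names and the assumption that 1 means "hard" are hypothetical and not taken from the dataset documentation, and whether this improves accuracy is untested.

```python
# Hypothetical mapping from descriptive names to the dataset's integer labels.
# Which integer means "hard" is an assumption, not stated in this note.
label_names = {"easy to read": 0, "hard to read": 1}

# Same template as the manual example above: 'This example is {label}.'
template = "This example is {}."

hypotheses = {name: template.format(name) for name in label_names}
for name, hypothesis in hypotheses.items():
    print(label_names[name], "->", hypothesis)

def to_int_label(predicted_name):
    """Map a predicted descriptive name back to the integer label."""
    return label_names[predicted_name]
```

With the pipeline, the same idea amounts to passing `list(label_names)` as `candidate_labels` and, if a different wording is wanted, setting the `hypothesis_template` argument.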