# Exploring the ANLI Dataset

The ANLI dataset (https://huggingface.co/datasets/facebook/anli) was introduced in "Adversarial NLI: A New Benchmark for Natural Language Understanding" (Nie et al., ACL 2020, https://aclanthology.org/2020.acl-main.441/) as a hard natural language inference (NLI) dataset.

It was created by collecting (premise, hypothesis) pairs that existing models failed to classify correctly as entailment, contradiction, or neutral. For these pairs, human annotators were asked to explain their labeling decision.



In [1]:
from datasets import load_dataset

dataset = load_dataset("facebook/anli")
# Keep only the examples that come with a non-empty annotator explanation.
dataset = dataset.filter(lambda x: x['reason'] is not None and x['reason'] != "")

In [2]:
dataset

DatasetDict({
    train_r1: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 2923
    })
    dev_r1: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 1000
    })
    test_r1: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 1000
    })
    train_r2: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 4861
    })
    dev_r2: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 1000
    })
    test_r2: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 1000
    })
    train_r3: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 13375
    })
    dev_r3: Dataset({
        features: ['uid', 'premise', 'hypothesis', 'label', 'reason'],
        num_rows: 1200


In [3]:
dataset['test_r1'][0]

{'uid': '4aae63a8-fcf7-406c-a2f3-50c31c5934a9',
 'premise': 'Ernest Jones is a British jeweller and watchmaker. Established in 1949, its first store was opened in Oxford Street, London. Ernest Jones specialises in diamonds and watches, stocking brands such as Gucci and Emporio Armani. Ernest Jones is part of the Signet Jewelers group.',
 'hypothesis': 'The first Ernest Jones store was opened on the continent of Europe.',
 'label': 0,
 'reason': "The first store was opened in London, which is in Europe. It may have been difficult for the system because continents weren't mentioned."}
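
The `label` field is an integer. As a minimal sketch, the mapping below assumes the class order published on the dataset card for `facebook/anli` (0 = entailment, 1 = neutral, 2 = contradiction), which is consistent with the example above, where label 0 is an entailment. The names `LABEL_NAMES` and `label_name` are hypothetical helpers introduced here for illustration.

```python
# Assumption: label ids follow the class order listed on the facebook/anli dataset card.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def label_name(label_id: int) -> str:
    """Map an integer ANLI label to its class name."""
    return LABEL_NAMES[label_id]


# A record shaped like the one shown above (only the field we need).
example = {"label": 0}
print(label_name(example["label"]))
```

With the dataset loaded, the same names should also be available from `dataset['test_r1'].features['label'].names`, which avoids hard-coding the order.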

In https://aclanthology.org/2023.findings-eacl.162/ (Kavumba et al., Findings of EACL 2023), experiments show that when LLMs are prompted to classify a (premise, hypothesis) pair as entailment, contradiction, or neutral, they perform better if the prompt also asks for an explanation that justifies the selected label.
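
To make the two conditions concrete, here is a rough sketch of a label-only prompt and a label-plus-explanation prompt. The function `build_prompt` and the prompt wording are assumptions made for illustration; they are not the templates used in the paper.

```python
def build_prompt(premise: str, hypothesis: str, with_explanation: bool) -> str:
    """Build an NLI prompt (hypothetical wording, not the paper's template)."""
    lines = [
        f"Premise: {premise}",
        f"Hypothesis: {hypothesis}",
        "Does the premise entail the hypothesis, contradict it, or neither?",
    ]
    if with_explanation:
        # Condition 2: the model must also justify the label it picks.
        lines.append(
            "Answer with one label (entailment, neutral, or contradiction), "
            "then give a one-sentence explanation."
        )
    else:
        # Condition 1: label only.
        lines.append(
            "Answer with exactly one label: entailment, neutral, or contradiction."
        )
    return "\n".join(lines)


premise = "Ernest Jones opened its first store in Oxford Street, London."
hypothesis = "The first Ernest Jones store was opened on the continent of Europe."
print(build_prompt(premise, hypothesis, with_explanation=False))
print()
print(build_prompt(premise, hypothesis, with_explanation=True))
```

Keeping everything except the final instruction identical means any accuracy difference between the two conditions can be attributed to the explanation request rather than to other wording changes.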

For the explanation to help, though, it must be "relevant": a sentence that is semantically related to the premise and the hypothesis.
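
One crude way to put a number on "relevance" is lexical overlap between the explanation and the (premise, hypothesis) pair. The sketch below is only a proxy: shared words are not the same as semantic relatedness, and a sentence-embedding similarity would likely be a closer match to the notion used above. The helper names `tokens` and `relevance_score` are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance_score(explanation: str, premise: str, hypothesis: str) -> float:
    """Fraction of explanation tokens that also occur in the premise or hypothesis.

    Assumption: lexical overlap is used as a rough stand-in for semantic relatedness.
    """
    expl = tokens(explanation)
    if not expl:
        return 0.0
    context = tokens(premise) | tokens(hypothesis)
    return len(expl & context) / len(expl)


premise = "Its first store was opened in Oxford Street, London."
hypothesis = "The first Ernest Jones store was opened on the continent of Europe."
related = "The first store was opened in London, which is in Europe."
unrelated = "Bananas are yellow."

print(relevance_score(related, premise, hypothesis))
print(relevance_score(unrelated, premise, hypothesis))
```

An explanation copied from a different example should score noticeably lower than the original one, which gives a simple way to construct "irrelevant" explanations as a control.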

Your mission in this question is to reproduce these empirical observations.