# NLI Metric Evaluation

- The new **focus** of the paper comes from a finding of the thesis:

  - "The proposed _contradiction rate metric_ turns out to be correlated with human
    judgement, so it is a good indicator of human prediction consistency and allows it to be assessed automatically without the need for annotations requiring significant human effort."
  - We want to understand how entailment is correlated with human evaluation

- Example of the assessment:
  - Consider a pair of strings (input and adversarial example)
    - e.g. ["it's a very <font color = green>valuable</font> film . . .", "it's a very <font color = red>inestimable</font> film . . ."]
  - We use the human annotations collected for the master's thesis as the _gold annotations_ (ignoring UNCLEAR annotations)
  - Then compute the automatic annotations using a pre-trained [NLI model](https://huggingface.co/cross-encoder/nli-deberta-v3-base), labelling each pair as:
    - INCONSISTENT if the model predicts CONTRADICTION in at least one direction (A->B or B->A)
    - CONSISTENT in all other cases
  - Use [Cohen’s kappa](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html) as the statistic to measure inter-annotator agreement, treating the NLI model as a second annotator (0.6+ is good)
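
To make the agreement statistic concrete, here is a minimal pure-Python sketch of Cohen's kappa on made-up labels. The helper name `cohen_kappa` and the toy label lists are illustrative assumptions, not data from the thesis; the notebook itself relies on `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences (hypothetical helper)."""
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    # Note: undefined (division by zero) when both annotators always use one identical label.
    return (observed - expected) / (1 - expected)

# Toy data (assumption): 0 = CONSISTENT, 1 = INCONSISTENT
gold = [0, 0, 1, 1, 0, 1, 0, 0]
auto = [0, 1, 1, 1, 0, 0, 0, 0]

kappa = cohen_kappa(gold, auto)
print(f"Cohen's kappa: {kappa:.3f}")  # prints "Cohen's kappa: 0.467"
```

Here the two label lists agree on 6 of 8 items (0.75), but 0.53 agreement is already expected by chance, so kappa is noticeably lower than the raw agreement.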


In [1]:
from sentence_transformers import CrossEncoder
from sklearn.metrics import cohen_kappa_score
from pathlib import Path
import pandas as pd
import re
from tqdm.auto import tqdm

In [38]:
class AnnotationEvaluator:
    def __init__(self, 
                 annotation_file_path: str,
                 annotation_to_idx: dict,
                 nli_model: str = "cross-encoder/nli-deberta-v3-base",
                 attack_col: str = "Attack",
                 annotation_col: str = "Annotation",
                 remove_unclear: bool = True
                 ):
        """
        Provides a simple interface to perform an automated evaluation of the attack and compare it with the gold annotations.

        Parameters
        ----------
        annotation_file_path: str
            Path to the gold annotation file. The file should be a csv file with two columns: "Attack" and "Annotation".
        annotation_to_idx: dict
            Mapping from annotation label (string) to integer index. The keys should match the labels in the gold annotation file.
        nli_model: str
            The name of the NLI model to use for the evaluation. The model should be a cross-encoder model.
        attack_col: str
            The name of the column in the gold annotation file that contains the attack.
        annotation_col: str
            The name of the column in the gold annotation file that contains the gold annotation.
        remove_unclear: bool
            If True, the unclear annotations are removed from the gold annotations, since they cannot be predicted by the NLI model.
        """
        self.annotation_to_idx = annotation_to_idx
        self.attack_col = attack_col
        self.annotation_col = annotation_col
        self.nli_model = CrossEncoder(nli_model)
        self.nli_label_to_idx = {"contradiction": 0, "entailment": 1, "neutral": 2}

        self.gold_annotations = pd.read_csv(annotation_file_path)
        self.gold_annotations = self.__process_gold_annotations()
        if remove_unclear:
            self.gold_annotations = self.gold_annotations[self.gold_annotations["gold_annotation"] != self.annotation_to_idx["UNCLEAR"]]


    def __separate_original_perturbed(self, text: str):
        """
        Splits the text into original and perturbed sentence.
        """
        text = re.sub(r"<.*?>", "", text)  # remove HTML tags
        
        # Split text into original and perturbed 
        original, perturbed = text.split("Perturbed:")
        original = original.split("Original:")[1].strip()
        perturbed = perturbed.strip()
        return original, perturbed

    def __process_gold_annotations(self):
        """
        Processes the gold annotations and creates a dataframe with the original and perturbed sentences separated and the gold annotation as an index.
        """
        df = pd.DataFrame(columns=["original", "perturbed"])

        df['original'], df['perturbed'] = zip(*self.gold_annotations[self.attack_col].map(self.__separate_original_perturbed))
        df['gold_annotation'] = self.gold_annotations[self.annotation_col].map(self.annotation_to_idx)
        return df
    
    def get_gold_annotations(self):
        """
        Returns the list of gold annotations.
        """
        return self.gold_annotations["gold_annotation"].tolist()
    
    def __preds_to_annotations(self, preds, preds_reverse):
        """
        Converts the NLI predictions to annotation labels.
        A prediction is considered inconsistent if the NLI model predicts "contradiction" for at least one of the two directions of entailment.
        In all other cases the prediction is considered consistent.

        Parameters
        ----------
        preds: list
            The predictions of the NLI model for (original sentence -> perturbed sentence)
        preds_reverse: list
            The predictions of the NLI model for (perturbed sentence -> original sentence)
        """
        nli_annotations = []
        for pred, pred_reverse in zip(preds, preds_reverse):
            if (pred == self.nli_label_to_idx["contradiction"]) or (pred_reverse == self.nli_label_to_idx["contradiction"]):
                nli_annotations.append(self.annotation_to_idx["INCONSISTENT"])
            else:
                nli_annotations.append(self.annotation_to_idx["CONSISTENT"])
        return nli_annotations

    def get_nli_annotations(self):
        """
        Returns the NLI automatic annotations
        """
        inputs = list(self.gold_annotations[['original', 'perturbed']].itertuples(index=False, name=None))
        nli_scores = self.nli_model.predict(inputs)
        nli_preds = nli_scores.argmax(axis=1)

        inputs_reverse = list(self.gold_annotations[['perturbed', 'original']].itertuples(index=False, name=None))
        nli_scores_reverse = self.nli_model.predict(inputs_reverse)
        nli_preds_reverse = nli_scores_reverse.argmax(axis=1)

        return self.__preds_to_annotations(nli_preds, nli_preds_reverse)
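
The two-direction rule can be checked without loading the model. The sketch below applies it to hand-written score rows; the function names and toy scores are hypothetical, and only the label order (`contradiction`, `entailment`, `neutral`) is taken from `nli_label_to_idx` above.

```python
# Label order copied from nli_label_to_idx in the class above.
NLI_LABELS = ["contradiction", "entailment", "neutral"]
ANNOTATION_TO_IDX = {"CONSISTENT": 0, "INCONSISTENT": 1}

def argmax(scores):
    """Index of the highest score in one row."""
    return max(range(len(scores)), key=scores.__getitem__)

def scores_to_annotations(scores_forward, scores_reverse):
    """Mark a pair INCONSISTENT if either direction is predicted as contradiction."""
    contradiction = NLI_LABELS.index("contradiction")
    annotations = []
    for fwd, rev in zip(scores_forward, scores_reverse):
        contradicts = argmax(fwd) == contradiction or argmax(rev) == contradiction
        label = "INCONSISTENT" if contradicts else "CONSISTENT"
        annotations.append(ANNOTATION_TO_IDX[label])
    return annotations

# Toy scores (assumption): one row per pair, one column per NLI label.
toy_forward = [[0.1, 2.0, 0.3], [3.0, 0.2, 0.1], [0.2, 0.1, 1.5]]
toy_reverse = [[0.0, 1.0, 0.5], [0.1, 0.2, 2.0], [2.5, 0.1, 0.3]]

annotations = scores_to_annotations(toy_forward, toy_reverse)
print(annotations)  # prints [0, 1, 1]
```

The third pair shows why both directions matter: the forward direction is neutral, yet the reverse direction is a contradiction, so the pair is still flagged.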


In [39]:
annotation_paths = list(Path('annotations/thesis_gold/').glob('*.csv'))

annotation_to_idx = {
    "CONSISTENT" : 0,
    "INCONSISTENT" : 1,
    "UNCLEAR" : 2,
}


In [40]:
gold_annotations = []
nli_annotations = []

for annotation_path in tqdm(annotation_paths):
    evaluator = AnnotationEvaluator(annotation_path, annotation_to_idx, remove_unclear=True)

    gold_annotations += evaluator.get_gold_annotations()
    nli_annotations += evaluator.get_nli_annotations()


In [41]:
print(f"Total annotations processed: {len(gold_annotations)}")

Total annotations processed: 264


In [42]:
print(f"Cohen's kappa: {cohen_kappa_score(gold_annotations, nli_annotations)}")

Cohen's kappa: 0.6050269299820467


In [43]:
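gold_annotations = pd.Series(gold_annotations)
nli_annotations = pd.Series(nli_annotations)
idx_diff = gold_annotations != nli_annotations  # True where NLI disagrees with gold

consistent = annotation_to_idx["CONSISTENT"]
inconsistent = annotation_to_idx["INCONSISTENT"]

wrong_predictions = sum(idx_diff)
# Gold says INCONSISTENT, NLI says CONSISTENT (missed inconsistencies)
inconsistent_consistent = sum((gold_annotations[idx_diff] == inconsistent) & (nli_annotations[idx_diff] == consistent))
# Gold says CONSISTENT, NLI says INCONSISTENT (false alarms)
consistent_inconsistent = sum((gold_annotations[idx_diff] == consistent) & (nli_annotations[idx_diff] == inconsistent))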

print(f"Wrong predictions: {wrong_predictions}")
print(f"Inconsistent predicted as Consistent: {inconsistent_consistent}")
print(f"Consistent predicted as Inconsistent: {consistent_inconsistent}")

Wrong predictions: 40
Inconsistent predicted as Consistent: 7
Consistent predicted as Inconsistent: 33
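
As a quick reading of these counts, the following sketch derives the raw agreement and the split of the errors from the numbers printed above. The variable names are illustrative; no new data is introduced.

```python
# Counts copied from the cell outputs above.
total = 264
wrong = 40
inconsistent_as_consistent = 7   # gold INCONSISTENT, NLI CONSISTENT
consistent_as_inconsistent = 33  # gold CONSISTENT, NLI INCONSISTENT

# The two error types should account for every disagreement.
assert inconsistent_as_consistent + consistent_as_inconsistent == wrong

observed_agreement = (total - wrong) / total
false_alarm_share = consistent_as_inconsistent / wrong

print(f"Observed agreement: {observed_agreement:.3f}")           # prints 0.848
print(f"Share of errors that are false alarms: {false_alarm_share:.3f}")  # prints 0.825
```

Raw agreement (about 0.85) is higher than kappa (about 0.61) because kappa discounts the agreement expected by chance. Most disagreements are cases where the NLI model flags a pair that human annotators judged consistent.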
