This repository contains the data for our Multilingual Benchmark for Probing Negation-Awareness with Minimal Pairs (CoNLL 2021). The goal of our work is to provide multilingual datasets that make it possible to investigate the extent to which pre-trained language models are aware of the semantics of negation markers. The datasets are manually derived from the multilingual XNLI datasets and consist of minimal pairs of NLI examples that differ only in the presence/absence of a negation marker. For more information, check out the associated video and poster.
The repository contains the following data:
- the lists of negation cues used to select NLI examples for the minimal pairs
- the datasets of minimal pairs resulting from negation removal
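For illustration, a minimal pair could be represented as below. This is a hypothetical sketch: the example sentences, labels, and field names are made up for readability and do not reflect the actual file format of the released datasets.

```python
# Hypothetical minimal pair (sentences, labels, and field names are illustrative,
# not taken from the released files). The "orig" element is an XNLI example that
# contains a negation cue; the "modified" element is the same example with the
# cue removed.
minimal_pair = {
    "orig": {
        "premise": "The man is not wearing a hat.",
        "hypothesis": "The man is bareheaded.",
        "label": "entailment",
    },
    "modified": {
        "premise": "The man is wearing a hat.",  # negation marker removed
        "hypothesis": "The man is bareheaded.",
        "label": "contradiction",                # assumed gold label after removal
    },
}
```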
We suggest evaluating negation awareness on minimal pairs by comparing the fraction of correct predictions on the original NLI examples (a correct prediction for the first element of each minimal pair) with the fraction of correct predictions on the original AND the corresponding modified NLI examples (correct predictions for both elements of each minimal pair); a sketch of this computation is given below the steps.

To replicate the results in our paper:
1. Install the required packages:

   - pytorch
   - transformers
   - scikit-learn
   - seaborn

   The code has been tested with `python==3.8`, `pytorch==1.7.1`, and `transformers==4.3.2`.
2. Download the BERT model fine-tuned on MNLI data from here. Put it in the `./trained_models` folder and unzip the file.
3. Get predictions for minimal pairs by running `./eval_scripts/run_predict_multilingual.sh`, which calls the Python code for model evaluation in `code/training/predict_nli.py`. The predictions are written to the `./results` folder.
4. Compute the difference in performance on `orig` and `orig AND modified` examples using `./eval_scripts/compute_performance_loss.py`. By uncommenting the last two lines in the script, you can generate the bar plots shown in Figure 4 of the paper.
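The comparison suggested above can be computed along the following lines. This is a simplified sketch of the idea, not the logic of `compute_performance_loss.py`; the input format (a list of per-pair correctness flags) is an assumption made for the example.

```python
from typing import List, Tuple

def negation_awareness_scores(pairs: List[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Each entry is (correct_on_orig, correct_on_modified) for one minimal pair.

    Returns the fraction of pairs whose original example is predicted correctly,
    the fraction where both the original AND the modified example are predicted
    correctly, and the difference between the two (the performance loss).
    """
    n = len(pairs)
    acc_orig = sum(orig for orig, _ in pairs) / n
    acc_both = sum(orig and mod for orig, mod in pairs) / n
    return acc_orig, acc_both, acc_orig - acc_both

# Hypothetical usage with three minimal pairs:
print(negation_awareness_scores([(True, True), (True, False), (False, False)]))
# -> (0.666..., 0.333..., 0.333...)
```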
If you have questions or comments, please contact the corresponding author at mrkhartmann4@gmail.com (Mareike Hartmann).