Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets (GD-NLI)

This repository hosts the data and code for the paper Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets, published at ACL 2022.

Download our Generated Debiased NLI (GD-NLI) Datasets
Code and Models
Citing

Download our Generated Debiased NLI (GD-NLI) Datasets

Our synthetic debiased NLI datasets (GD-NLI) can be downloaded using the following links. All data files are in JSON Lines format: each line contains a JSON object with the standard NLI fields premise, hypothesis, and label, plus a metadata field type that indicates whether the sample comes from the original dataset or was generated.

Dataset	#Samples	Link
All data	-	zip
SNLI Z-Aug	1,142,475	jsonl
SNLI Seq-Z	933,085	jsonl
SNLI Par-Z	927,906	jsonl
MNLI Z-Aug	744,326	jsonl
MNLI Seq-Z	740,811	jsonl
MNLI Par-Z	744,200	jsonl
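
As a minimal sketch of how these files can be read, the snippet below parses a JSON Lines file and counts samples by the type field. It assumes only the four fields described above; the helper names read_jsonl and count_by_type are illustrative and not part of this repository, and the concrete values stored in type are not specified here, so the code treats them as opaque strings.

```python
import json
from collections import Counter
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def count_by_type(path: str) -> Counter:
    """Count samples per value of the `type` metadata field.

    Samples without a `type` field are counted under "unknown"
    (a fallback chosen for this sketch, not a value used by the datasets).
    """
    return Counter(sample.get("type", "unknown") for sample in read_jsonl(path))
```

For example, count_by_type("data/gen-debiased-nli/mnli_z-aug.jsonl") would report how many samples of each type the file contains, assuming the file has been downloaded to that path.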

Code and Models

Installation and setup

conda create --name gdnli python=3.7 && conda activate gdnli
git clone https://github.com/jimmycode/gen-debiased-nli
cd gen-debiased-nli
. scripts/init.sh

Download data

You can download the GD-NLI datasets from the links provided above, or run the following script:

. scripts/get_data.sh

We evaluate our models on SNLI-hard, MNLI-hard, HANS, and an adversarial attack suite. The following script downloads these evaluation datasets:

. scripts/get_eval_data.sh

Training with our datasets

python scripts/train_nli_synthetic.py \
  --exp_name bert-base_mnli_z-aug \
  --train_data data/gen-debiased-nli/mnli_z-aug \
  --dev_data <path to MNLI dev> --data_dir data/ \
  --model_name_or_path bert-base-uncased \
  --max_length 128 --per_device_train_batch_size 32 --per_device_eval_batch_size 16 \
  --learning_rate 1e-5 --num_train_epochs 5 --lr_scheduler_type constant_with_warmup --num_warmup_steps 2000 \
  --output_dir <your output dir> \
  --seed 42 --do_predict --do_train

Training PoE models with our datasets

python scripts/train_nli_poe.py \
  --exp_name bert-base_mnli_z-aug_poe \
  --train_data data/gen-debiased-nli/mnli_z-aug.jsonl \
  --dev_data <path to MNLI dev> --data_dir data/ \
  --model_name_or_path bert-base-uncased \
  --max_length 128 --per_device_train_batch_size 32 --per_device_eval_batch_size 16 \
  --learning_rate 2e-5 --num_train_epochs 3 --lr_scheduler_type linear --num_warmup_steps 1000 \
  --output_dir <your output dir> \
  --poe_loss --poe_alpha 2.0 --lambda_h 1.0 --nonlinear_h_classifier deep \
  --seed 42 --do_predict --do_train
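
For readers unfamiliar with product-of-experts (PoE) training, the sketch below shows the generic form of a PoE loss: the log-probabilities of the main model and of a bias-only model (for NLI, typically a hypothesis-only classifier) are summed, renormalized, and scored with cross-entropy. It is an illustration in plain Python and is not taken from scripts/train_nli_poe.py. The function names and the bias_weight parameter are hypothetical, and bias_weight is not claimed to correspond to --poe_alpha or --lambda_h.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_loss(
    main_logits: List[float],
    bias_logits: List[float],
    label: int,
    bias_weight: float = 1.0,  # hypothetical knob, for illustration only
) -> float:
    """Cross-entropy of the product-of-experts distribution for one example.

    Multiplying the two experts' probabilities equals adding their
    log-probabilities; the sum is renormalized with another log-softmax.
    """
    combined = [
        m + bias_weight * b
        for m, b in zip(log_softmax(main_logits), log_softmax(bias_logits))
    ]
    return -log_softmax(combined)[label]
```

When the bias-only model is already confident in the correct label, the combined loss is small, so the main model receives little gradient from examples that the bias alone can solve.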

Citing

Please use the following BibTeX entry to cite our work:

@inproceedings{gen-debiased-nli-2022,
    title = "Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets",
    author = "Wu, Yuxiang  and
      Gardner, Matt  and
      Stenetorp, Pontus  and
      Dasigi, Pradeep",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    month = may,
    year = "2022",
    publisher = "Association for Computational Linguistics",
}
