Code for OOD Detection Research in NLP

This is the official code of our EMNLP 2022 (Findings) paper Holistic Sentence Embeddings for Better Out-of-Distribution Detection and EACL 2023 (Findings) paper Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features.

This repository implements the OOD detection algorithms developed by us (Avg-Avg published at EMNLP 2022 and GNOME published at EACL 2023) along with the following baselines:

| Algorithm | Paper |
| --- | --- |
| MC Dropout | Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (ICML 2016) |
| Maximum Softmax Probability | A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (ICLR 2017) |
| ODIN | Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (ICLR 2018) |
| Maha Distance | A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks (NIPS 2018) |
| LOF | Deep Unknown Intent Detection with Margin Loss (ACL 2019) |
| Energy Score | Energy-based Out-of-distribution Detection (NIPS 2020) |
| ContraOOD | Contrastive Out-of-Distribution Detection for Pretrained Transformers (EMNLP 2021) |
| KNN Distance | Out-of-Distribution Detection with Deep Nearest Neighbors (ICML 2022) |
| D2U | D2U: Distance-to-Uniform Learning for Out-of-Scope Detection (NAACL 2022) |

Requirements

Python: 3.7.9

To install the dependencies, run

pip install -r requirements.txt

For the datasets used in our papers, please download the nlp_ood_datasets.zip file from this Google Drive link and unzip it under the root directory (a 'dataset' directory will be created).

Training

Vanilla Training

Vanilla training with cross-entropy loss:

python train.py --model roberta-base --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
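For orientation, the command above amounts to standard sequence-classification fine-tuning. A rough Huggingface Transformers equivalent of one update step (illustrative only; train.py handles the datasets, seeding, and logging used in the papers):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(texts, labels):
    """One cross-entropy update on a batch of (text, label) pairs."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))  # loss is cross-entropy
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```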

Supervised Contrastive Training

Add --loss_type scl or --loss_type margin to use the supervised contrastive auxiliary objectives proposed in Contrastive Out-of-Distribution Detection for Pretrained Transformers:

python train.py --model roberta-base --loss scl --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
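For reference, here is a minimal PyTorch sketch of a SupCon-style auxiliary loss of the kind that paper combines with cross-entropy. This is an assumed form for illustration; the exact objective and the margin variant live in train.py:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.3):
    """SupCon-style loss: pull same-label sentence embeddings together and push
    different-label ones apart (illustrative; not the exact code in train.py)."""
    features = F.normalize(features, dim=-1)                  # (batch, hidden)
    sim = features @ features.T / temperature                  # pairwise similarities
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))            # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(self_mask, 0.0) * pos_mask.float()).sum(dim=1) / pos_count
    return loss.mean()

# Typically combined with the main objective: total_loss = ce_loss + lambda_scl * scl_loss
```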

Training with RecAdam Regularization

Add --optimizer recadam to use the RecAdam optimizer:

python train.py --model roberta-base --optimizer recadam --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
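RecAdam ("Recall and Learn") regularizes fine-tuning by pulling the updated weights back toward the pretrained ones with an annealed quadratic penalty. The sketch below shows only that core idea in plain PyTorch as an assumption-labeled illustration; the real RecAdam optimizer folds the penalty and its annealing schedule directly into the Adam update (see the RecAdam repository):

```python
import torch

def pretraining_penalty(model, pretrained_state, gamma=1e-3):
    """Quadratic distance between current and pretrained weights (illustrative only)."""
    penalty = sum(((p - pretrained_state[n]) ** 2).sum()
                  for n, p in model.named_parameters() if n in pretrained_state)
    return gamma * penalty

# Before fine-tuning, snapshot the pretrained weights:
#   pretrained_state = {n: p.detach().clone() for n, p in model.named_parameters()}
# Each step, with k a sigmoid-annealed coefficient rising toward 1 over training:
#   loss = k * task_loss + (1 - k) * pretraining_penalty(model, pretrained_state)
```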

Evaluation for OOD Detection

Avg-Avg, GNOME, Maha, and KNN

Feature Extraction

Extract features from a fine-tuned model first:

python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_FT_DIR> --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL>

GNOME additionally needs features extracted from the pre-trained model (without fine-tuning):

python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_PRE_DIR> --model roberta-base
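For reference, extracting per-layer sentence features with Huggingface Transformers looks roughly like the sketch below (assumed logic for illustration; extract_full_features.py handles batching, the datasets, and saving):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")   # or the fine-tuned checkpoint
model.eval()

@torch.no_grad()
def per_layer_features(text):
    """Return a (num_layers + 1, hidden_size) tensor with one token-averaged vector
    per layer (embedding layer plus every transformer layer). 'cls' token pooling
    would instead take each layer's first-token vector; layer pooling (last vs. avg)
    is applied later at test time."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    out = model(**enc, output_hidden_states=True)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # (1, seq_len, 1)
    layers = torch.stack(out.hidden_states)              # (L+1, 1, seq_len, hidden)
    return ((layers * mask).sum(dim=2) / mask.sum(dim=1)).squeeze(1)
```
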
Test

Maha with last-cls pooled features:

python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling cls --layer_pooling last

Avg-Avg (Ours, EMNLP 2022), i.e., Maha with avg-avg pooled features:

python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling avg --layer_pooling avg
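Conceptually, both commands score a test sentence by its class-conditional Mahalanobis distance on the pooled features; Avg-Avg simply averages over tokens and layers first. A minimal NumPy sketch of the scoring step (an assumed form; ood_test_embedding.py is the actual implementation):

```python
import numpy as np

def fit_mahalanobis(train_feats, train_labels):
    """Estimate class means and a shared (tied) covariance on in-distribution features."""
    classes = np.unique(train_labels)
    means = {c: train_feats[train_labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([train_feats[train_labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, precision

def maha_score(x, means, precision):
    """OOD score: squared Mahalanobis distance to the nearest class mean (higher = more OOD)."""
    return min(float((x - mu) @ precision @ (x - mu)) for mu in means.values())
```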

KNN (with the pooling scheme of your choice; cls-last by default):

python ood_test_embedding_knn.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FEATURE_DIR>  --token_pooling <cls/avg> --layer_pooling <last/avg/a comma-separated list of layer indices such as 1,2,11,12>
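The KNN detector scores a test embedding by its distance to the k-th nearest neighbor among the (typically L2-normalized) in-distribution training embeddings. A small sketch, assuming plain NumPy rather than whatever nearest-neighbor index the script uses internally:

```python
import numpy as np

def knn_ood_score(x, train_feats, k=10):
    """Distance to the k-th nearest in-distribution embedding (higher = more OOD).
    Features are L2-normalized first, following the deep nearest-neighbor paper."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    q = x / np.linalg.norm(x)
    dists = np.linalg.norm(train - q, axis=1)
    return float(np.sort(dists)[k - 1])
```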

GNOME (Ours, EACL 2023): pass --std to normalize the scores and --ensemble_way mean or --ensemble_way min to choose the aggregator:

python ood_test_embedding_gnome.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --ft_dir <YOUR_FT_DIR>  --pre_dir <YOUR_PRE_DIR> --std --ensemble_way mean
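At a high level, GNOME computes a distance-based OOD score on the pre-trained features and another on the fine-tuned features, standardizes each with in-distribution statistics (the --std flag), and aggregates them with mean or min (--ensemble_way). A hedged sketch of that combination step, with all names assumed for illustration (ood_test_embedding_gnome.py is authoritative):

```python
import numpy as np

def gnome_combine(pre_scores, ft_scores, pre_val_scores, ft_val_scores, way="mean"):
    """Standardize each score stream with in-distribution validation statistics,
    then aggregate (illustrative; mirrors the --std / --ensemble_way options)."""
    def standardize(scores, val_scores):
        return (scores - val_scores.mean()) / (val_scores.std() + 1e-12)

    pre = standardize(np.asarray(pre_scores), np.asarray(pre_val_scores))
    ft = standardize(np.asarray(ft_scores), np.asarray(ft_val_scores))
    stacked = np.stack([pre, ft])
    return stacked.mean(axis=0) if way == "mean" else stacked.min(axis=0)
```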

Note: Our algorithms Avg-Avg and GNOME are evaluated on features extracted from a model trained with the vanilla cross-entropy loss. To reproduce the results of Contrastive Out-of-Distribution Detection for Pretrained Transformers, extract features from a model trained with the contrastive objectives instead.

Other Algorithms

To test MSP (base), Energy (energy), D2U (d2u), ODIN (odin), LOF (lof), or MC Dropout (mc), specify the method via the --ood_method argument of the test.py script:

python test.py --dataset sst-2 --ood_datasets 20news,trec,wmt16  --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL> --ood_method base/energy/d2u/odin/lof/mc
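These detectors only need the classifier's outputs. For example, MSP scores by the maximum softmax probability and the energy score by the (temperature-scaled) negative logsumexp of the logits; a minimal sketch using the convention that a higher score means more likely OOD:

```python
import torch
import torch.nn.functional as F

def msp_score(logits):
    """Maximum softmax probability baseline, negated so higher = more OOD."""
    return -F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits, temperature=1.0):
    """Energy score: -T * logsumexp(logits / T); higher = more OOD."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)
```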

Citation

If you find this repository useful for your research, please consider citing our papers:

@inproceedings{chen-etal-2022-holistic,
    title = "Holistic Sentence Embeddings for Better Out-of-Distribution Detection",
    author = "Chen, Sishuo  and
      Bi, Xiaohan  and
      Gao, Rundong  and
      Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.497",
    pages = "6676--6686"
}
@inproceedings{chen-etal-2023-fine,
    title = "Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features",
    author = "Chen, Sishuo  and
      Yang, Wenkai  and
      Bi, Xiaohan  and
      Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.41",
    pages = "564--579"
}

Acknowledgement

This repository relies on resources from FSSD_OoD_Detection, Huggingface Transformers, and RecAdam. We thank the original authors for open-sourcing their work.
