Code for OOD Detection Research in NLP

This is the official code of our EMNLP 2022 (Findings) paper Holistic Sentence Embeddings for Better Out-of-Distribution Detection and EACL 2023 (Findings) paper Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features.

This repository implements the OOD detection algorithms developed by us (Avg-Avg published at EMNLP 2022 and GNOME published at EACL 2023) along with the following baselines:

| Algorithm | Paper |
| --- | --- |
| MC Dropout | Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (ICML 2016) |
| Maximum Softmax Probability | A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (ICLR 2017) |
| ODIN | Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (ICLR 2018) |
| Maha Distance | A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks (NIPS 2018) |
| LOF | Deep Unknown Intent Detection with Margin Loss (ACL 2019) |
| Energy Score | Energy-based Out-of-distribution Detection (NIPS 2020) |
| ContraOOD | Contrastive Out-of-Distribution Detection for Pretrained Transformers (EMNLP 2021) |
| KNN Distance | Out-of-Distribution Detection with Deep Nearest Neighbors (ICML 2022) |
| D2U | D2U: Distance-to-Uniform Learning for Out-of-Scope Detection (NAACL 2022) |

Requirements

Python: 3.7.9

To install the dependencies, run

pip install -r requirements.txt

For the datasets used in our papers, please download the nlp_ood_datasets.zip file from this Google Drive link and unzip it under the root directory (a 'dataset' directory will be created).

Training

Vanilla Training

Vanilla training with cross-entropy loss:

python train.py --model roberta-base --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
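For orientation, the command above amounts to standard sequence-classification fine-tuning. A rough Huggingface Transformers equivalent of one update step (illustrative only; train.py handles the datasets, seeding, and logging used in the papers):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(texts, labels):
    """One cross-entropy update on a batch of (text, label) pairs."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(labels))  # loss is cross-entropy
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```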

Supervised Contrastive Training

Add --loss_type scl or --loss_type margin to use the supervised contrastive auxiliary objectives proposed in Contrastive Out-of-Distribution Detection for Pretrained Transformers:

python train.py --model roberta-base --loss scl --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
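For reference, here is a minimal PyTorch sketch of a SupCon-style auxiliary loss of the kind that paper combines with cross-entropy. This is an assumed form for illustration; the exact objective and the margin variant live in train.py:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.3):
    """SupCon-style loss: pull same-label sentence embeddings together and push
    different-label ones apart (illustrative; not the exact code in train.py)."""
    features = F.normalize(features, dim=-1)                  # (batch, hidden)
    sim = features @ features.T / temperature                  # pairwise similarities
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))            # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(self_mask, 0.0) * pos_mask.float()).sum(dim=1) / pos_count
    return loss.mean()

# Typically combined with the main objective: total_loss = ce_loss + lambda_scl * scl_loss
```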

Training with RecAdam Regularization

Add --optimizer recadam to use the RecAdam optimizer:

python train.py --model roberta-base --optimizer recadam --output_dir <YOUR_DIR> --seed 13 --dataset sst-2 --log_file <YOUR_LOG_FILE> --lr 2e-5 --epochs 5 --batch_size 16 
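RecAdam ("Recall and Learn") regularizes fine-tuning by pulling the updated weights back toward the pretrained ones with an annealed quadratic penalty. The sketch below shows only that core idea in plain PyTorch as an assumption-labeled illustration; the real RecAdam optimizer folds the penalty and its annealing schedule directly into the Adam update (see the RecAdam repository):

```python
import torch

def pretraining_penalty(model, pretrained_state, gamma=1e-3):
    """Quadratic distance between current and pretrained weights (illustrative only)."""
    penalty = sum(((p - pretrained_state[n]) ** 2).sum()
                  for n, p in model.named_parameters() if n in pretrained_state)
    return gamma * penalty

# Before fine-tuning, snapshot the pretrained weights:
#   pretrained_state = {n: p.detach().clone() for n, p in model.named_parameters()}
# Each step, with k a sigmoid-annealed coefficient rising toward 1 over training:
#   loss = k * task_loss + (1 - k) * pretraining_penalty(model, pretrained_state)
```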

Evaluation for OOD Detection

Avg-Avg, GNOME, Maha, and KNN

Feature Extraction

Extract features from a fine-tuned model first:

python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_FT_DIR> --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL>

GNOME additionally needs features extracted from the pre-trained model (without fine-tuning):

python extract_full_features.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --output_dir <YOUR_PRE_DIR> --model roberta-base
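For reference, extracting per-layer sentence features with Huggingface Transformers looks roughly like the sketch below (assumed logic for illustration; extract_full_features.py handles batching, the datasets, and saving):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")   # or the fine-tuned checkpoint
model.eval()

@torch.no_grad()
def per_layer_features(text):
    """Return a (num_layers + 1, hidden_size) tensor with one token-averaged vector
    per layer (embedding layer plus every transformer layer). 'cls' token pooling
    would instead take each layer's first-token vector; layer pooling (last vs. avg)
    is applied later at test time."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    out = model(**enc, output_hidden_states=True)
    mask = enc["attention_mask"].unsqueeze(-1).float()   # (1, seq_len, 1)
    layers = torch.stack(out.hidden_states)              # (L+1, 1, seq_len, hidden)
    return ((layers * mask).sum(dim=2) / mask.sum(dim=1)).squeeze(1)
```
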
Test

Maha with last-cls pooled features:

python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling cls --layer_pooling last

Avg-Avg (Ours, EMNLP 2022), i.e., Maha with avg-avg pooled features:

python ood_test_embedding.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FT_DIR> --token_pooling avg --layer_pooling avg
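Conceptually, both commands score a test sentence by its class-conditional Mahalanobis distance on the pooled features; Avg-Avg simply averages over tokens and layers first. A minimal NumPy sketch of the scoring step (an assumed form; ood_test_embedding.py is the actual implementation):

```python
import numpy as np

def fit_mahalanobis(train_feats, train_labels):
    """Estimate class means and a shared (tied) covariance on in-distribution features."""
    classes = np.unique(train_labels)
    means = {c: train_feats[train_labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([train_feats[train_labels == c] - means[c] for c in classes])
    precision = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, precision

def maha_score(x, means, precision):
    """OOD score: squared Mahalanobis distance to the nearest class mean (higher = more OOD)."""
    return min(float((x - mu) @ precision @ (x - mu)) for mu in means.values())
```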

KNN (with the pooling scheme of your choice; cls-last by default):

python ood_test_embedding_knn.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --input_dir <YOUR_FEATURE_DIR>  --token_pooling <cls/avg> --layer_pooling <last/avg/a comma-separated list of layer indices such as 1,2,11,12>
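The KNN detector scores a test embedding by its distance to the k-th nearest neighbor among the (typically L2-normalized) in-distribution training embeddings. A small sketch, assuming plain NumPy rather than whatever nearest-neighbor index the script uses internally:

```python
import numpy as np

def knn_ood_score(x, train_feats, k=10):
    """Distance to the k-th nearest in-distribution embedding (higher = more OOD).
    Features are L2-normalized first, following the deep nearest-neighbor paper."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    q = x / np.linalg.norm(x)
    dists = np.linalg.norm(train - q, axis=1)
    return float(np.sort(dists)[k - 1])
```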

GNOME (Ours, EACL 2023): pass --std to normalize the scores and --ensemble_way mean or --ensemble_way min to choose the aggregator:

python ood_test_embedding_gnome.py --dataset sst-2 --ood_datasets 20news,trec,wmt16 --ft_dir <YOUR_FT_DIR>  --pre_dir <YOUR_PRE_DIR> --std --ensemble_way mean
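At a high level, GNOME computes a distance-based OOD score on the pre-trained features and another on the fine-tuned features, standardizes each with in-distribution statistics (the --std flag), and aggregates them with mean or min (--ensemble_way). A hedged sketch of that combination step, with all names assumed for illustration (ood_test_embedding_gnome.py is authoritative):

```python
import numpy as np

def gnome_combine(pre_scores, ft_scores, pre_val_scores, ft_val_scores, way="mean"):
    """Standardize each score stream with in-distribution validation statistics,
    then aggregate (illustrative; mirrors the --std / --ensemble_way options)."""
    def standardize(scores, val_scores):
        return (scores - val_scores.mean()) / (val_scores.std() + 1e-12)

    pre = standardize(np.asarray(pre_scores), np.asarray(pre_val_scores))
    ft = standardize(np.asarray(ft_scores), np.asarray(ft_val_scores))
    stacked = np.stack([pre, ft])
    return stacked.mean(axis=0) if way == "mean" else stacked.min(axis=0)
```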

Note: Our algorithms Avg-Avg and GNOME are evaluated on features extracted from a model trained with the vanilla cross-entropy loss. To reproduce the results of Contrastive Out-of-Distribution Detection for Pretrained Transformers, extract features from a model trained with the contrastive objectives instead.

Other Algorithms

To test MSP (base), Energy (energy), D2U (d2u), ODIN (odin), LOF (lof), or MC Dropout (mc), specify the method via the --ood_method argument of the test.py script:

python test.py --dataset sst-2 --ood_datasets 20news,trec,wmt16  --model roberta-base --pretrained_model <PATH_TO_FINETUNED_MODEL> --ood_method base/energy/d2u/odin/lof/mc
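These detectors only need the classifier's outputs. For example, MSP scores by the maximum softmax probability and the energy score by the (temperature-scaled) negative logsumexp of the logits; a minimal sketch using the convention that a higher score means more likely OOD:

```python
import torch
import torch.nn.functional as F

def msp_score(logits):
    """Maximum softmax probability baseline, negated so higher = more OOD."""
    return -F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits, temperature=1.0):
    """Energy score: -T * logsumexp(logits / T); higher = more OOD."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)
```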

Citation

If you find this repository useful for your research, please consider citing our papers:

@inproceedings{chen-etal-2022-holistic,
    title = "Holistic Sentence Embeddings for Better Out-of-Distribution Detection",
    author = "Chen, Sishuo  and
      Bi, Xiaohan  and
      Gao, Rundong  and
      Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.497",
    pages = "6676--6686"
}
@inproceedings{chen-etal-2023-fine,
    title = "Fine-Tuning Deteriorates General Textual Out-of-Distribution Detection by Distorting Task-Agnostic Features",
    author = "Chen, Sishuo  and
      Yang, Wenkai  and
      Bi, Xiaohan  and
      Sun, Xu",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.41",
    pages = "564--579"
}

Acknowledgement

This repository relies on resources from FSSD_OoD_Detection, Huggingface Transformers, and RecAdam. We thank the original authors for open-sourcing their work.
