This is the repository of the paper "BECEL: Benchmark for Consistency Evaluation of Language Models" (TBD).
- data: BECEL datasets for 7 downstream tasks. Please refer to the README.md file in each data directory for more information.
- ag_news: includes additive and semantic consistency datasets.
- boolq: includes semantic and negational consistency datasets.
- mrpc: includes semantic, negational, and symmetric consistency datasets.
- rte: includes semantic, negational, and symmetric consistency datasets.
- snli: includes semantic, negational, symmetric, and transitive consistency datasets.
- sst2: includes additive and semantic consistency datasets.
- wic: includes semantic, negational, symmetric, and transitive consistency datasets.
- src: Scripts for evaluation metrics and examples.
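
The task-to-consistency-type coverage listed above can be captured in a small lookup table, which may be handy when deciding which evaluations to run for a given task. The following is only a minimal sketch: the names `CONSISTENCY_TYPES` and `tasks_with` are hypothetical and are not part of the scripts in `src`; the table contents are transcribed from the list above.

```python
# Hypothetical helper, not part of the BECEL scripts.
# Mapping transcribed from the dataset list in this README.
CONSISTENCY_TYPES = {
    "ag_news": ["additive", "semantic"],
    "boolq": ["semantic", "negational"],
    "mrpc": ["semantic", "negational", "symmetric"],
    "rte": ["semantic", "negational", "symmetric"],
    "snli": ["semantic", "negational", "symmetric", "transitive"],
    "sst2": ["additive", "semantic"],
    "wic": ["semantic", "negational", "symmetric", "transitive"],
}


def tasks_with(consistency_type):
    """Return the sorted task names whose data covers the given consistency type."""
    return sorted(
        task
        for task, types in CONSISTENCY_TYPES.items()
        if consistency_type in types
    )


if __name__ == "__main__":
    # Print the tasks available for each consistency type.
    for ctype in ("semantic", "negational", "symmetric", "transitive", "additive"):
        print(ctype, tasks_with(ctype))
```

For example, `tasks_with("transitive")` returns only `snli` and `wic`, the two tasks listed above with transitive consistency data.
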
@inproceedings{jang-etal-2022-becel,
title = "{BECEL}: Benchmark for Consistency Evaluation of Language Models",
author = "Jang, Myeongjun and
Kwon, Deuk Sin and
Lukasiewicz, Thomas",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
publisher = "International Committee on Computational Linguistics",
url = "https://aclanthology.org/2022.coling-1.324",
pages = "3680--3696",
}