This repo contains the scripts to build the Temporal NLI dataset and to run different models on it, as described in the following paper:
Vashishtha, Siddharth, Adam Poliak, Yash Kumar Lal, Benjamin Van Durme, Aaron Steven White. Temporal Reasoning in Natural Language Inference. Findings of the Association for Computational Linguistics: EMNLP 2020, November, 2020.
@inproceedings{vashishtha-etal-2020-temporal,
title = "Temporal Reasoning in Natural Language Inference",
author = "Vashishtha, Siddharth and
Poliak, Adam and
Lal, Yash Kumar and
Van Durme, Benjamin and
White, Aaron Steven",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.363",
pages = "4070--4078",
abstract = "We introduce five new natural language inference (NLI) datasets focused on temporal reasoning. We recast four existing datasets annotated for event duration{---}how long an event lasts{---}and event ordering{---}how events are temporally arranged{---}into more than one million NLI examples. We use these datasets to investigate how well neural models trained on a popular NLI corpus capture these forms of temporal reasoning.",
}
We use pipenv to run our scripts in a Python virtualenv. You can replicate the environment by cloning this repo and running the following from the root directory of this repo:
pipenv install --ignore-pipfile
If you don't have pipenv, you can install it by running:
pip install pipenv
There are two steps to creating our recasted datasets:
To train our models from scratch or to use our best models, follow the instructions here. Our saved RoBERTa models can be downloaded by following the instructions here.
We made the following updates to our recasted data from the first published version:
- To get the verb inflections, we use English Unimorph. If an inflection is not found in Unimorph, we back off to LemmInflect.
- We added copular predicates from the TempEval3, TimeBank-Dense, and RED corpora to our recasted data. We parse each corpus with Stanza to get the dependency trees of its sentences and then generate the hypothesis for each NLI pair using the rules described in the paper.
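
The inflection back-off in the first item above can be pictured with a minimal sketch. This is not the code used in this repo: the sample entries, the feature strings, and the names `load_unimorph`, `naive_fallback`, and `inflect` are all assumptions made for illustration, and the fallback here is a toy rule standing in for LemmInflect so that the example runs without extra dependencies.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical excerpt, assuming a tab-separated layout of
# lemma, inflected form, and feature bundle (not the real data file).
UNIMORPH_SAMPLE = """\
run\tran\tV;PST
run\trunning\tV;V.PTCP;PRS
eat\tate\tV;PST
"""

Table = Dict[Tuple[str, str], str]


def load_unimorph(text: str) -> Table:
    """Index the entries by (lemma, features) for direct lookup."""
    table: Table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        lemma, form, feats = line.split("\t")
        # Keep the first form listed for a given (lemma, features) pair.
        table.setdefault((lemma, feats), form)
    return table


def naive_fallback(lemma: str, feats: str) -> Optional[str]:
    """Toy stand-in for the back-off step (an assumption, not LemmInflect).

    A real setup would call an inflection library here instead, for
    example LemmInflect's getInflection, after mapping the feature
    bundle to the tag set that library expects.
    """
    if feats == "V;PST":
        return lemma + "d" if lemma.endswith("e") else lemma + "ed"
    return None


def inflect(
    lemma: str,
    feats: str,
    table: Table,
    fallback: Callable[[str, str], Optional[str]] = naive_fallback,
) -> Optional[str]:
    """Look the form up in the table first; back off only on a miss."""
    form = table.get((lemma, feats))
    if form is not None:
        return form
    return fallback(lemma, feats)


TABLE = load_unimorph(UNIMORPH_SAMPLE)
print(inflect("run", "V;PST", TABLE))   # found in the table
print(inflect("walk", "V;PST", TABLE))  # not in the table, uses the fallback
```

Passing the fallback as a parameter keeps the lookup order explicit: irregular forms come from the table whenever it has them, and the second source is consulted only for missing entries.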