This is the data and code for the paper:
"SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference"
Martin Schmitt and Hinrich Schütze. ACL 2019. [paper](https://www.aclweb.org/anthology/P19-1086)
Additional material (e.g., slides of the talk) can be found here.
```bibtex
@inproceedings{schmitt2019sherliic,
    title = "{S}her{LI}i{C}: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference",
    author = {Schmitt, Martin and Sch{\"u}tze, Hinrich},
    booktitle = "Proceedings of the 57th Conference of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1086",
    pages = "902--914"
}
```
The SherLIiC resources can be downloaded from here. You should extract the archive to the folder `data`.

The embedding files in `embeddings/filtered` only contain embeddings for the relations in SherLIiC-dev and SherLIiC-test. Full embedding files (i.e., embeddings for all entities and relations in the whole SherLIiC event graph) can be downloaded here.
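The embedding files can be inspected, e.g., with gensim. The following is a minimal sketch under the assumption that the files use the standard word2vec text format (a header line with vector count and dimensionality, then one vector per line); the file name is purely illustrative:

```python
from gensim.models import KeyedVectors

# Load filtered relation embeddings (file name is an assumption, not fixed by this README).
rel_vecs = KeyedVectors.load_word2vec_format(
    "embeddings/filtered/typed_rel_emb.txt", binary=False
)

# Number of relation vectors and their dimensionality.
print(len(rel_vecs.index_to_key), rel_vecs.vector_size)
```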
To see a list of all available baselines, run:

```
python3 code/baselines.py list_baselines
```

For how to run a single baseline, see its help message:

```
python3 code/baselines.py single --help
Usage: baselines.py single [OPTIONS] BASELINE [DATASET]...

  Runs BASELINE on (a list of) DATASETs (See `list_baselines` for a list of
  available baselines).

Options:
  --examples / --no-examples  Whether or not to print example predictions
                              (default: false).
  --use-lemma / --no-lemma    Whether to make predictions on top of `Lemma`
                              baseline or not. (default: use it)
  --rounding / --no-rounding  Whether or not results should be rounded
                              (default: true).
  -t, --threshold FLOAT       Threshold for tunable baselines (default: 0.5);
                              ignored for non-tunable baselines.
  --help                      Show this message and exit.
```
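For example, the following call evaluates a single baseline on dev with a custom threshold and prints some example predictions. The baseline name `typed_rel_emb` and the threshold are purely illustrative; pick any name reported by `list_baselines`:

```
python3 code/baselines.py single --examples -t 0.6 typed_rel_emb data/dev.csv
```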
To evaluate all non-tunable baselines on dev and test, run:

```
python3 code/baselines.py non_tunables data/dev.csv data/test.csv
```

The results will be stored in `non-tunable-dev.txt` and `non-tunable-test.txt`. For more options, see `python3 code/baselines.py non_tunables --help`.
To tune all tunable baselines on dev and then evaluate them on dev and test, run:

```
python3 code/baselines.py tunables data/dev.csv data/test.csv
```

The results will be stored in `tunable-devtest.txt`. For more options, see `python3 code/baselines.py tunables --help`.
To find the F1-optimal threshold for a single baseline on a given dataset (which should, of course, be dev), run:

```
python3 code/baselines.py find_threshold DATASET BASELINE
```

Example:

```
python3 code/baselines.py find_threshold data/dev.csv typed_rel_emb
```
For the baseline `w2v+tsg_rel_emb`, the effectiveness of type-informed vs. unrestricted (untyped) relation embeddings has to be determined beforehand. To do this, run:

```
python3 code/baselines.py tsg_pref data/dev.csv tsg_typed_vs_untyped.txt
```

This will store the type signature preferences in the file `tsg_typed_vs_untyped.txt`. To use this file, enter its path in `file_paths.json`. A precomputed file is available at `data/tsg_typed_vs_untyped_thr0.0-only-dev.txt`, so `w2v+tsg_rel_emb` can be used right away without any preprocessing (other than downloading the pretrained word2vec embeddings).
If you want to qualitatively analyze the errors made by several tunable baselines on a specific dataset (which should be dev), you can run:

```
python3 code/baselines.py error_analysis DATASET OUT_FILE [METHODS]...
```

where the results will be written to `OUT_FILE`. You can specify as many `METHODS` as you want.
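For example, the following call (the output file name is arbitrary, and the two methods are baselines mentioned elsewhere in this README, chosen only for illustration) writes a comparative error analysis on dev to `error_analysis.txt`:

```
python3 code/baselines.py error_analysis data/dev.csv error_analysis.txt typed_rel_emb w2v+tsg_rel_emb
```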
The SherLIiC Event Graph can be converted into a training corpus suitable for word2vec with the command `create_training`:

```
python3 code/baselines.py create_training data/teg.tsv typed_rel_emb_train.txt
```

The command above creates the training corpus that was used for learning the embeddings in `embeddings/complete/typed_rel_emb.txt`, i.e., the embeddings for the baseline `typed_rel_emb`. See `python3 code/baselines.py create_training --help` for more options.
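If you want to train such embeddings yourself on the generated corpus, a minimal sketch with gensim could look as follows. It assumes gensim ≥ 4 (older versions use `size` instead of `vector_size`) and that the corpus is plain text with one whitespace-tokenized "sentence" per line, as word2vec expects; the hyperparameters are placeholders, not the settings used for the released embeddings:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Stream the corpus produced by `create_training` line by line.
corpus = LineSentence("typed_rel_emb_train.txt")

# Train word2vec embeddings (hyperparameters are illustrative only).
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, workers=4)

# Save in the word2vec text format, analogous to the released embedding files.
model.wv.save_word2vec_format("my_typed_rel_emb.txt")
```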
In order to use the word2vec baseline (and all baselines building on it), you have to

- Download the pretrained word embeddings from here.
- Enter the path to the gzipped file in `file_paths.json` under the key `word2vec`.
In order to reproduce the rule collection baselines, you have to download them, sometimes run a preprocessing script, and enter the path to the right file into `file_paths.json`. Find specific instructions below.
- Download `ACL2011Resource.zip` from here.
- Unzip it and enter the path to `ResourceEdges.txt` in `file_paths.json` under the key `berant`.
- Download `reverb_local_global.rar` from here.
- Extract the archive and enter the path to `reverb_local_clsf_all.txt` in `file_paths.json` under the key `berant_new`.
- Download PPDB 2.0 XXXL All from here.
- Run `python3 code/preprocess.py ppdb path/to/ppdb-2.0-xxxl-all.gz external/ppdb.csv`.
- Enter the path `external/ppdb.csv` in `file_paths.json` under the key `ppdb`.
- Download `patty-dataset-freebase.tar.gz` from here.
- Extract the archive and enter the paths to `wikipedia-patterns.txt` and `wikipedia-subsumptions.txt` as a two-element list in `file_paths.json` under the key `patty`.
- Download `sherlockrules.zip` from here.
- Extract the archive, enter the folder `sherlockrules`, and run `cat sherlockrules.* | grep -v '^#' | cut -f1,2,9 | grep -v '2\.0' | cut -f1,3 > sherlockrules.all` (this concatenates the rule files, strips comment lines, and reduces each rule to the two columns used by the baseline).
- Enter the path to `sherlockrules.all` in `file_paths.json` under the key `schoenmackers`.
- Download `resource.zip` from here.
- Unzip it and enter the path to `rules.tsv` in `file_paths.json` under the key `chirps`.
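After all of these steps, `file_paths.json` could look something like the sketch below. Only the keys are the ones named above; all paths are illustrative placeholders, and the file may contain further entries (e.g., for the type signature preferences mentioned earlier):

```json
{
    "word2vec": "external/word2vec-vectors.bin.gz",
    "berant": "external/ACL2011Resource/ResourceEdges.txt",
    "berant_new": "external/reverb_local_global/reverb_local_clsf_all.txt",
    "ppdb": "external/ppdb.csv",
    "patty": ["external/patty/wikipedia-patterns.txt", "external/patty/wikipedia-subsumptions.txt"],
    "schoenmackers": "external/sherlockrules/sherlockrules.all",
    "chirps": "external/resource/rules.tsv"
}
```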
Once you have downloaded all resources and put the right paths into `file_paths.json`, you should be able to run these baselines, too.