## Learning as abduction
The theorem prover learns from Natural Language Inference (NLI) data via abductive reasoning.
Abductive reasoning is obtained by reversing the theorem-proving procedure: instead of proving the sentence-level gold labels of inference problems with the help of lexical/phrasal relations, the relations themselves are proved while taking the gold labels into account. As a result, the prover learns the lexical/phrasal relations that serve as the best explanation for the gold label of an inference problem.
For example, given an NLI problem:
> No person is touching a domestic animal
> **ENTAILMENT** There is no girl who is petting a cow

abduction allows the prover to infer and learn that `cow` is a `domestic animal`.
## Replicating the results of the Learning as Abduction paper
The following instructions were tested on the commit tagged `starSEM-20`.
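To run the replication against that exact state of the code, check out the tag first (a minimal sketch; it assumes you have already cloned this repository):

```sh
# Switch the working tree to the commit these instructions were tested on
git checkout starSEM-20
```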
If you are only interested in the files with predictions for the SICK problems, they can be found in `results/starSEM_2020/TE/`.
The `.log` files contain the stdout of abduction and evaluation, while the `.ans.T` and `.ans.E` files contain predictions for SICK-train&trial and SICK-test, respectively.
To replicate the results, you will need:

- produce, a Make-like tool with Python support 👍 (see the setup sketch after this list), OR discover the commands in `results/starSEM_2020/produce.ini` (not recommended 👎)
- LangPro, obviously
- DepCCG, EasyCCG, and C&C (not the easiest route 👎), OR simply use the already parsed and Prolog-formatted SICK sentences found in `results/starSEM_2020/ccg_sen/` and `ccg_sen_d/` 👍
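A minimal setup sketch for produce; the PyPI package name and the repository URL are assumptions about the tool's public distribution, not taken from this README:

```sh
# Install the produce build tool (assumption: published on PyPI as "produce";
# alternatively, fetch the script from github.com/texttheater/produce)
pip install produce
produce --help   # verify that produce is on PATH
```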
## Data parts and parsers
- `T`, `D`, and `E` stand for SICK-train (as `T`raining), SICK-trial (as `D`evelopment), and SICK-test (as `E`valuation), respectively.
- `ccg` denotes CCG derivations obtained from C&C with the rebanked model.
- `eccg` denotes CCG derivations from EasyCCG with the standard model.
- `depccg.trihf.sep` denotes CCG derivations from DepCCG with the standard triheadfirst model, where each SICK sentence is parsed separately. The DepCCG derivations use the EasyCCG lemmatizer and the C&C named entity recognizer. This decision is not the best for performance, but it makes the setting comparable to Yanaka et al. (2018).
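If you use the pre-parsed SICK sentences shipped with the repo, a quick check that the per-parser result directories referenced below are in place (a sketch; the layout is inferred from the commands in this README):

```sh
ls results/starSEM_2020/TE/   # expect: ccg  eccg  depccg.trihf.sep
```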
## Theorem proving parameters
- `-k` excludes the hand-crafted lexical relations.
- `rN` sets the rule application limit to `N`.
- `cN` uses `N` threads for concurrent theorem proving; `c0` uses all available threads.
- `w3` uses WordNet relations: synonymy, hypernymy/hyponymy, similarity, derivation, and antonymy.
- `al` forces alignment of indefinite NPs (like `a man`).
- `ch` does consistency checking for sentences before proving relations.
## Abduction parameters
- `ab` builds a tableau tree for non-aligned LLFs only if the one for aligned LLFs fails.
- `p123` considers only terms of length 1, 2, or 3.
- `cKB` checks consistency of abduced relations with respect to the KB.
- `cT` checks comparability of terms in the abduced relations.
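In the commands below, these flags are concatenated into the name of the produce target: the part before the underscore configures theorem proving, the part after it configures abduction. An illustrative decoding of the target used in the next command:

```sh
# al,ch,w3,-k,r50,c0  -- theorem proving: align indefinite NPs, consistency-check
#                        sentences, WordNet relations, no hand-crafted relations,
#                        rule limit 50, all available threads
# ab,ch,cKB,cT,p123   -- abduction: the abduction parameters listed above
```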
The following command runs LangPro with abductive learning on the C&C derivations, with a rule application limit of 50 and all available threads.
This saves a significant amount of time (compared to `r800`) while sacrificing a little accuracy.
Note that the paper reports results for the unseen SICK-test using `r800` and all three parsers.

    produce -f results/starSEM_2020/produce.ini results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123.log
The command triggers the train-evaluate scenario for C&C derivations (`TE/ccg/`).
First, the prover is trained on `TD` (SICK-train & trial) via abduction, and then it uses the knowledge abduced from `TD` to prove the problems in `E` (SICK-test).
Training on `T+D` and evaluating on `E` is signified by `TD_E/`.
The command creates the following files in `results/starSEM_2020/TE/ccg/TD_E`: `*.log` contains stdout and stderr, `*.ans_KB.pl` lists the learned set of relations, and `*.ans.E` and `*.ans.T` contain `<problem ID, predicted label>` pairs.
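A quick way to sanity-check the outputs of the run (a sketch using standard shell tools):

```sh
# List the files created by the run and peek at the test predictions
ls results/starSEM_2020/TE/ccg/TD_E/
head results/starSEM_2020/TE/ccg/TD_E/*.ans.E   # <problem ID, predicted label> pairs
```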
To run the abductive learning on derivations produced by the other parsers, replace `ccg` with `eccg` or `depccg.trihf.sep` in the above command.
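For example, all three parser settings can be run in one go with a small shell loop (a sketch; it assumes produce is installed and the parsed sentences are in place):

```sh
# Run the r50 train-evaluate target for each of the three parsers
for parser in ccg eccg depccg.trihf.sep; do
  produce -f results/starSEM_2020/produce.ini \
    "results/starSEM_2020/TE/${parser}/TD_E/al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123.log"
done
```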
The scores of training and evaluation can be found in `*.log`, and they can also be obtained by running the evaluation script on the predicted labels:

    python3 python/evaluate.py --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E --gld SICK_dataset/SICK_test_annotated.txt
Aggregated predictions from the LangPro versions based on the different CCG parsers can be evaluated as follows:

    python3 python/evaluate.py \
      --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
            results/starSEM_2020/TE/eccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
            results/starSEM_2020/TE/depccg.trihf.sep/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
      --gld SICK_dataset/SICK_test_annotated.txt --hybrid