## Learning as abduction
The theorem prover learns from Natural Language Inference (NLI) data via abductive reasoning.
Abductive reasoning is obtained by reversing the theorem-proving procedure: instead of proving the sentence-level gold labels of inference problems with the help of lexical/phrasal relations, the relations themselves are proved while taking the gold labels into account. As a result, the prover learns the lexical/phrasal relations that serve as the best explanation for the gold label of an inference problem.
For example, given an NLI problem:
> No person is touching a domestic animal
> **ENTAILMENT** There is no girl who is petting a cow

abduction allows the prover to infer and learn that `cow` is a `domestic animal`.
## Replicating the results of the Learning as Abduction paper
The following instructions were tested on the commit tagged `starSEM-20`.
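To run the replication against that exact state of the code, check out the tag first (a minimal sketch; it assumes you have already cloned this repository):

```sh
# Switch the working tree to the commit these instructions were tested on
git checkout starSEM-20
```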
If you are only interested in the files with predictions for the SICK problems, they can be found in `results/starSEM_2020/TE/`.
The `.log` files contain the stdout of abduction and evaluation, while the `.ans.T` and `.ans.E` files contain predictions for SICK-train&trial and SICK-test, respectively.
To replicate the results, you will need:

- produce, a Make-like tool with Python support 👍 (see the setup sketch after this list), OR discover the commands in `results/starSEM_2020/produce.ini` (not recommended 👎)
- LangPro, obviously
- DepCCG, EasyCCG, and C&C (not the easiest route 👎), OR simply use the already parsed and Prolog-formatted SICK sentences found in `results/starSEM_2020/ccg_sen/` and `ccg_sen_d/` 👍
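A minimal setup sketch for produce; the PyPI package name and the repository URL are assumptions about the tool's public distribution, not taken from this README:

```sh
# Install the produce build tool (assumption: published on PyPI as "produce";
# alternatively, fetch the script from github.com/texttheater/produce)
pip install produce
produce --help   # verify that produce is on PATH
```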
## Data parts and parsers
- `T`, `D`, and `E` stand for SICK-train (as `T`raining), SICK-trial (as `D`evelopment), and SICK-test (as `E`valuation), respectively.
- `ccg` denotes CCG derivations obtained from C&C with the rebanked model.
- `eccg` denotes CCG derivations from EasyCCG with the standard model.
- `depccg.trihf.sep` denotes CCG derivations from DepCCG with the standard triheadfirst model, where each SICK sentence is parsed separately. The DepCCG derivations use the EasyCCG lemmatizer and the C&C named entity recognizer. This decision is not the best for performance, but it makes the setting comparable to Yanaka et al. (2018).
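If you use the pre-parsed SICK sentences shipped with the repo, a quick check that the per-parser result directories referenced below are in place (a sketch; the layout is inferred from the commands in this README):

```sh
ls results/starSEM_2020/TE/   # expect: ccg  eccg  depccg.trihf.sep
```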
## Theorem proving parameters
- `-k` excludes the hand-crafted lexical relations.
- `rN` sets the rule application limit to `N`.
- `cN` uses `N` threads for concurrent theorem proving; `c0` uses all available threads.
- `w3` uses WordNet relations: synonymy, hypernymy/hyponymy, similarity, derivation, and antonymy.
- `al` forces alignment of indefinite NPs (like `a man`).
- `ch` does consistency checking for sentences before proving relations.
## Abduction parameters
- `ab` builds a tableau tree for non-aligned LLFs only if the one for aligned LLFs fails.
- `p123` considers only terms of length 1, 2, or 3.
- `cKB` checks consistency of abduced relations with respect to the KB.
- `cT` checks comparability of terms in the abduced relations.
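In the commands below, these flags are concatenated into the name of the produce target: the part before the underscore configures theorem proving, the part after it configures abduction. An illustrative decoding of the target used in the next command:

```sh
# al,ch,w3,-k,r50,c0  -- theorem proving: align indefinite NPs, consistency-check
#                        sentences, WordNet relations, no hand-crafted relations,
#                        rule limit 50, all available threads
# ab,ch,cKB,cT,p123   -- abduction: the abduction parameters listed above
```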
The following command runs LangPro with abductive learning on the C&C derivations, with a rule application limit of 50 and all available threads.
This saves a significant amount of time (compared to `r800`) while sacrificing a little accuracy.
Note that the paper reports results for the unseen SICK-test using `r800` and all three parsers.

    produce -f results/starSEM_2020/produce.ini results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123.log
The command triggers the train-evaluate scenario for C&C derivations (`TE/ccg/`).
First, the prover is trained on `TD` (SICK-train & trial) via abduction, and then it uses the knowledge abduced from `TD` to prove the problems in `E` (SICK-test).
Training on `T+D` and evaluating on `E` is signified by `TD_E/`.
The command creates the following files in `results/starSEM_2020/TE/ccg/TD_E`: `*.log` contains stdout and stderr, `*.ans_KB.pl` lists the learned set of relations, and `*.ans.E` and `*.ans.T` contain `<problem ID, predicted label>` pairs.
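A quick way to sanity-check the outputs of the run (a sketch using standard shell tools):

```sh
# List the files created by the run and peek at the test predictions
ls results/starSEM_2020/TE/ccg/TD_E/
head results/starSEM_2020/TE/ccg/TD_E/*.ans.E   # <problem ID, predicted label> pairs
```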
To run the abductive learning on derivations produced by the other parsers, replace `ccg` with `eccg` or `depccg.trihf.sep` in the above command.
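For example, all three parser settings can be run in one go with a small shell loop (a sketch; it assumes produce is installed and the parsed sentences are in place):

```sh
# Run the r50 train-evaluate target for each of the three parsers
for parser in ccg eccg depccg.trihf.sep; do
  produce -f results/starSEM_2020/produce.ini \
    "results/starSEM_2020/TE/${parser}/TD_E/al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123.log"
done
```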
The scores of training and evaluation can be found in `*.log`, and they can also be obtained by running the evaluation script on the predicted labels:

    python3 python/evaluate.py --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E --gld SICK_dataset/SICK_test_annotated.txt
Aggregated predictions from the LangPro versions based on the different CCG parsers can be evaluated as follows:

    python3 python/evaluate.py \
      --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
            results/starSEM_2020/TE/eccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
            results/starSEM_2020/TE/depccg.trihf.sep/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E \
      --gld SICK_dataset/SICK_test_annotated.txt --hybrid