Learning as abduction

The theorem prover learns from Natural Language Inference (NLI) data via abductive reasoning.

Abductive reasoning is obtained by reversing a theorem-proving procedure. As a result, lexical/phrasal relations are learned that serve as the best explanation for the gold label of an inference problem. In other words, instead of proving sentence-level inference gold labels with the help of lexical/phrasal relations, the relations are proved taking into account the inference gold labels.

For example, given an NLI problem:
No person is touching a domestic animal ENTAILMENT There is no girl who is petting a cow
abduction allows the prover to infer and learn that a cow is a domestic animal.
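The idea can be illustrated with a toy sketch. It is not LangPro's tableau-based procedure: the term alignment and the small knowledge base below are hypothetical and hand-written for this one problem. Both sentences are in the scope of "no", so the entailment holds if each hypothesis term is subsumed by its premise counterpart. Any subsumption missing from the knowledge base is then a candidate relation to abduce.

```python
# Toy illustration only: LangPro derives such relations from tableau proofs,
# not from a hand-written alignment like the one below.

# Hypothetical alignment (hypothesis term, premise term) for the example problem.
ALIGNED_TERMS = [
    ("girl", "person"),
    ("pet", "touch"),
    ("cow", "domestic animal"),
]

# Hypothetical background knowledge: pairs (x, y) meaning "x is a kind of y".
KNOWN_ISA = {("girl", "person"), ("pet", "touch")}


def abduce_missing_relations(aligned_terms, known_isa):
    """Return the is-a relations needed for ENTAILMENT but absent from the KB."""
    return [pair for pair in aligned_terms if pair not in known_isa]


if __name__ == "__main__":
    for specific, general in abduce_missing_relations(ALIGNED_TERMS, KNOWN_ISA):
        print(f"abduced: {specific} is a {general}")
```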

Replicating the results of the Learning as Abduction paper

The following instructions were tested on the commit tagged starSEM-20.

Output files

If you are only interested in the files with predictions for the SICK problems, these files can be found in results/starSEM_2020/TE/. .log files contain stdout of abduction and evaluation, while .ans.T and .ans.E contain predictions for SICK-train&trial and SICK-test, respectively.


Prerequisites

  • produce, a Make-like tool with Python support 👍
    OR discover the commands in results/starSEM_2020/produce.ini (not recommended 👎)
  • LangPro, obviously
  • DepCCG, EasyCCG, and C&C (not the easiest route 👎)
    OR simply use the already parsed and Prolog-formatted SICK sentences found in results/starSEM_2020/ccg_sen/ and ccg_sen_d/ 👍

Naming conventions

Data parts and parsers

  • T, D, and E stand for SICK-train (as Training), SICK-trial (as Development), and SICK-test (as Evaluation), respectively.
  • ccg denotes CCG derivations obtained from C&C with the rebanked model.
  • eccg denotes CCG derivations from EasyCCG with the standard model.
  • depccg.trihf.sep denotes CCG derivations from DepCCG with the standard triheadfirst model, where each SICK sentence is parsed separately. The DepCCG derivations use the EasyCCG lemmatizer and the C&C named entity recognizer. This choice is not the best for performance, but it makes the settings comparable to Yanaka et al. (2018).

Theorem proving parameters

  • -k excludes the hand-crafted lexical relations.
  • rN sets the rule application limit to N.
  • cN makes use of N threads for concurrent theorem proving. c0 uses all available threads.
  • w3 uses WordNet relations: synonymy, hypernymy/hyponymy, similar, derivation, and antonymy.
  • al forces alignment of indefinite NPs (like a man).
  • ch does consistency checking for sentences before proving relations.

Abduction parameters

  • ab builds a tableau tree for non-aligned LLFs only if the one for aligned LLFs fails.
  • p123 considers only terms of length 1, 2, or 3.
  • cKB checks consistency of abduced relations with respect to the KB.
  • cT checks comparability of terms in the abduced relations.
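Result files are named after the flags they were produced with, for example al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123. The sketch below splits such a name back into its flags. It assumes that the part before the underscore holds the theorem-proving parameters and the part after it holds the abduction parameters. That reading comes from the flag lists above and is not a documented format. The helper names are hypothetical.

```python
# Sketch: split a result-file stem into prover and abduction parameters.
# Assumption: "<prover flags>_<abduction flags>", each comma-separated.

def parse_config_name(stem):
    """Split a stem at the first underscore and return two lists of flags."""
    prover_part, abduction_part = stem.split("_", 1)
    return prover_part.split(","), abduction_part.split(",")


def rule_limit(prover_params):
    """Return N from the rN flag, or None if no such flag is present."""
    for flag in prover_params:
        if flag.startswith("r") and flag[1:].isdigit():
            return int(flag[1:])
    return None


if __name__ == "__main__":
    prover, abduction = parse_config_name("al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123")
    print("prover flags:   ", prover)
    print("abduction flags:", abduction)
    print("rule limit:     ", rule_limit(prover))
```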

Running experiments

The following command runs LangPro with abductive learning on C&C derivations, with a limit of 50 rule applications and using all available threads. Compared to r800, this saves a significant amount of time and sacrifices a little accuracy. Note that the paper reports results on the unseen SICK-test when using r800 and all three parsers.

produce -f results/starSEM_2020/produce.ini  results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123.log 

The command triggers the train-and-evaluate scenario for C&C derivations (TE/ccg/). First, the prover is trained on TD (SICK-train & trial) via abduction, and then the prover uses the abduced/induced knowledge from TD to prove problems in E (SICK-test). Training on T+D and evaluation on E is signified by TD_E/.

The command creates the following files in results/starSEM_2020/TE/ccg/TD_E: *.log contains stdout and stderr, * lists the learned set of relations, and *.ans.E and *.ans.T contain <problem ID, predicted label> pairs.

To run the abductive learning for derivations produced by other parsers, replace ccg with eccg or depccg.trihf.sep in the above command.
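As a convenience, the three commands can be generated rather than edited by hand. The following is a minimal sketch that only builds and prints the produce commands. The paths and flags are taken from the command above, while the function and constant names are hypothetical.

```python
import shlex

# Paths and flags copied from the command above; names are illustrative.
PRODUCE_INI = "results/starSEM_2020/produce.ini"
PARSERS = ["ccg", "eccg", "depccg.trihf.sep"]
CONFIG = "al,ch,w3,-k,r50,c0_ab,ch,cKB,cT,p123"


def produce_command(parser, config=CONFIG):
    """Build the argument list of the produce call for one parser."""
    target = f"results/starSEM_2020/TE/{parser}/TD_E/{config}.log"
    return ["produce", "-f", PRODUCE_INI, target]


if __name__ == "__main__":
    # Print one shell-ready command per parser; nothing is executed here.
    for parser in PARSERS:
        print(shlex.join(produce_command(parser)))
```

To actually run a command, the argument list could be passed to subprocess.run from the repository root.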


The training and evaluation scores can be found in *.log and can also be obtained by running the evaluation script on the predicted labels:

python3 python/ --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E  --gld SICK_dataset/SICK_test_annotated.txt

To evaluate the aggregated predictions of LangPro versions based on different CCG parsers, pass all prediction files together with --hybrid:

python3 python/ --sys results/starSEM_2020/TE/ccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E results/starSEM_2020/TE/eccg/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E results/starSEM_2020/TE/depccg.trihf.sep/TD_E/al,ch,w3,-k,r800,c20_ab,ch,cKB,cT,p123.ans.E   --gld SICK_dataset/SICK_test_annotated.txt  --hybrid