# LSTK

Code and data for the paper "Learning from Both Structural and Textual Knowledge for Inductive Knowledge Graph Completion"

## Prerequisites

- Python 3.8
- pytorch==1.10.0
- tensorflow==1.15.0 (for LSTK-NeuralLP and LSTK-DRUM)
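If you want to confirm the pins before running anything, a quick optional check along these lines works (this snippet is an illustration, not part of the repository):

```python
# Optional sanity check for the pinned versions (illustration only, not part of the repo).
import sys

import torch  # pytorch==1.10.0

assert sys.version_info[:2] == (3, 8), "LSTK is tested with Python 3.8"
assert torch.__version__.startswith("1.10"), "expected pytorch==1.10.0"

# TensorFlow 1.15 is only needed for LSTK-NeuralLP and LSTK-DRUM:
# import tensorflow as tf
# assert tf.__version__.startswith("1.15"), "expected tensorflow==1.15.0"
```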

## Datasets

We use three datasets in our experiments.

| Dataset | Download Link (original) |
|---------|--------------------------|
| HacRED | https://github.com/qiaojiim/HacRED |
| DocRED | https://github.com/thunlp/DocRED |
| BioRel | https://bit.ly/biorel_dataset |

## Models

We use four models in our experiments.

| Model | Code Download Link (original) |
|-------|-------------------------------|
| NeuralLP | https://github.com/fanyangxyz/Neural-LP |
| DRUM | https://github.com/alisadeghian/DRUM |
| RNNLogic | https://github.com/DeepGraphLearning/RNNLogic |
| TELM | This work |

## Usage examples

### The first stage

LSTK is a two-stage framework. In the first stage, it generates a set of soft triples for reasoning.

Path for code: `src_nli`

You can generate the soft triples as follows:

1. Train a textual entailment model:

   ```
   python main_nli.py [dataset]
   ```

2. Search triples with their corresponding texts:

   ```
   python generate_triples_by_index.py [dataset]
   ```

   If the dataset is in Chinese, please use instead:

   ```
   python generate_triples_by_index_zh.py [dataset]
   ```

3. Apply the trained textual entailment model to generate soft triples:

   ```
   python apply_model_nli.py [dataset]
   ```

After the above process, you will have three files (train/valid/test_triple_scores.txt) storing the soft triples.
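A minimal Python sketch for loading these files is given below; it assumes one tab-separated `(head, relation, tail, score)` record per line, which may not match the actual layout, so check the generated files first.

```python
# Hypothetical reader for the *_triple_scores.txt files.
# Assumed line format (verify against your files): head \t relation \t tail \t score
def load_soft_triples(path):
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            head, relation, tail, score = line.rstrip("\n").split("\t")
            triples.append((head, relation, tail, float(score)))
    return triples

train_triples = load_soft_triples("train_triple_scores.txt")
print(len(train_triples), "soft triples loaded")
```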

You can also directly download our processed soft triples:

| Dataset | Download Link (processed) |
|---------|---------------------------|
| HacRED | Google Drive |
| DocRED | Google Drive |
| BioRel | Google Drive |

### The second stage

In the second stage, you can use the generated soft triples to train state-of-the-art neural approximate rule-based models.

#### LSTK-TELM

Path for code: `src/LSTK-TELM`

The script for both training and evaluation on the HacRED dataset is:

```
sh run_hacred.sh
```

The script for both training and evaluation on the DocRED dataset is:

```
sh run_docred.sh
```

The script for both training and evaluation on the BioRel dataset is:

```
sh run_biorel.sh
```

The script for rule extraction is:

```
sh run_rules.sh [dataset]
```

We also provide the running scripts for the baseline methods:

#### LSTK-NeuralLP and LSTK-DRUM

Path for code: `src/LSTK-NeuralLP` or `src/LSTK-DRUM`

The training script is:

```
python -u src/main.py --datadir=[dataset]/ --exp_name=[dataset] --num_step 4 --gpu 0 --exps_dir exps --max_epoch 100 --seed 1234
```

The evaluation script is:

```
sh eval/collect_all_facts.sh [dataset]
python eval/get_truths.py [dataset]
python eval/evaluate.py --preds=exps/[dataset]/test_predictions.txt --truths=[dataset]/truths.pckl
```
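`eval/evaluate.py` reports standard ranking metrics such as MRR and Hits@k. For reference, a minimal, self-contained illustration of how these metrics are computed from per-query ranks is shown below; it is an explanatory sketch, not the repository's evaluation code.

```python
# Illustration only: MRR and Hits@k over per-query ranks,
# where rank 1 means the correct entity was scored highest.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 50]  # hypothetical ranks for five test queries
print(f"MRR     = {mrr(ranks):.3f}")
print(f"Hits@10 = {hits_at(ranks, 10):.3f}")
```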

#### LSTK-RNNLogic

Path for code: `src/LSTK-RNNLogic`

The script for environment installation is:

```
cd LSTK-RNNLogic/codes/pyrnnlogiclib/
python setup.py install
```

The script for data preparation is:

```
python process_dicts.py
python get_scores.py
python process_soft.py
```

The script for both training and evaluation is:

```
python run.py --data_path [dataset] --num_generated_rules 2000 --num_rules_for_test 500 --num_important_rules 0 --prior_weight 0.01 --cuda --predictor_learning_rate 0.1 --generator_epochs 5000 --max_rule_length 2
```
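Here `[dataset]` is the path to the dataset prepared by the scripts above. The flags follow the original RNNLogic interface; for instance, `--max_rule_length 2` caps the length of generated rules at two and `--cuda` enables GPU training.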

## Citation

Please consider citing the following paper if you find our code helpful. Thank you!

```
@inproceedings{QiDW23,
  author       = {Kunxun Qi and
                  Jianfeng Du and
                  Hai Wan},
  title        = {Learning from Both Structural and Textual Knowledge for Inductive
                  Knowledge Graph Completion},
  booktitle    = {NeurIPS},
  year         = {2023},
  url          = {http://papers.nips.cc/paper\_files/paper/2023/hash/544242770e8333875325d013328b2079-Abstract-Conference.html},
}
```
