KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs
Incorporating factual knowledge into pre-trained language models (PLMs) such as BERT is an emerging trend in recent NLP studies. However, most existing methods combine an external knowledge integration module with a modified pre-training loss and re-run the pre-training process on a large-scale corpus. Re-pretraining these models is usually resource-consuming, and it is difficult to adapt them to another domain with a different knowledge graph (KG). Besides, those works either cannot embed knowledge context dynamically according to the textual context or struggle with the knowledge ambiguity issue. In this paper, we propose a novel knowledge-aware language model framework based on the fine-tuning process, which equips a PLM with a unified knowledge-enhanced text graph that contains both the text and multi-relational sub-graphs extracted from a KG. We design a hierarchical relational-graph-based message passing mechanism, which allows the representations of the injected KG and the text to mutually update each other and can dynamically select the mentioned entities that are ambiguous, i.e., that share the same surface text. Our empirical results show that our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT, and achieves significant improvement on the machine reading comprehension (MRC) task compared with other knowledge-enhanced models.
Framework of KELM (left) and an illustration of how knowledge-enriched token embeddings are generated (right)
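The mutual update between text tokens and injected KG entities described above can be viewed as message passing on a heterogeneous graph. Below is a minimal, illustrative sketch with DGL; the toy graph, node/edge type names, and hidden size are assumptions for illustration, not the repository's actual graph construction.

```python
# Illustrative only: relation-specific message passing over a toy text+KG graph.
# Node/edge type names ("token", "entity", "mentions", ...) are assumed names.
import torch
import dgl
import dgl.nn as dglnn

# Toy graph: 4 text tokens and 3 candidate KG entities; token 1 mentions two
# candidate entities (the ambiguous-mention case).
g = dgl.heterograph({
    ("token", "next", "token"): (torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])),
    ("token", "mentions", "entity"): (torch.tensor([1, 1]), torch.tensor([0, 1])),
    ("entity", "mentioned_by", "token"): (torch.tensor([0, 1]), torch.tensor([1, 1])),
    ("entity", "related_to", "entity"): (torch.tensor([0, 2]), torch.tensor([2, 1])),
})

feats = {"token": torch.randn(4, 128), "entity": torch.randn(3, 128)}

# One round of message passing: token and entity representations update each
# other through relation-specific graph convolutions.
conv = dglnn.HeteroGraphConv(
    {rel: dglnn.GraphConv(128, 128, allow_zero_in_degree=True) for rel in g.etypes},
    aggregate="sum",
)
feats = conv(g, feats)
```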
To install requirements:
pip install -r requirements.txt
The experiments in the paper were run on 4 V100 GPUs. For distributed training on multiple GPUs, use PyTorch DistributedDataParallel, e.g. as sketched below.
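A minimal, self-contained sketch of a DistributedDataParallel setup (the model and dataset here are stand-ins, not names from this repository):

```python
# Launch with, e.g., `torchrun --nproc_per_node=4 <your_training_script>.py`.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(16, 2).cuda(local_rank)   # stand-in for the actual model
model = DDP(model, device_ids=[local_rank])

dataset = TensorDataset(torch.randn(64, 16))      # stand-in for the MRC dataset
loader = DataLoader(dataset, batch_size=8, sampler=DistributedSampler(dataset))
```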
For the train and dev sets, download from ReCoRD
For the test set, download from SuperGLUE
Download from SuperGLUE
We use two knowledge graphs: WordNet and NELL
For NELL, named entity recognition is performed with Stanford CoreNLP. For WordNet, we use the text matching from this repository
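As a rough illustration of the WordNet side of this matching (the repository ships precomputed matchings, so this is not its actual code), candidate synsets for an ambiguous mention can be looked up with NLTK:

```python
# Illustration only: an ambiguous mention maps to several candidate synsets.
# Requires the WordNet data: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

for syn in wn.synsets("bank"):
    print(syn.name(), "-", syn.definition())
```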
For convenience, you can download all related files from the following Google Drive links:
link: unzip and replace ./data/
Download the bert-large-cased model from Hugging Face to ./cache/bert-large-cased
For the train set,
sh data_preprocess_$dataset$_train.sh
For the dev set,
sh data_preprocess_$dataset$_dev.sh
We follow the same "two-stage" training strategy as KT-NET
First, freeze the language model and run
sh run_first_$dataset$.sh
Then unfreeze the language model and run from the saved first-stage model (replace INIT_DIR with the local path of the saved model):
sh run_second_$dataset$.sh
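Conceptually, the two stages differ only in whether the PLM backbone is trainable. A minimal sketch, assuming the backbone is exposed as a `bert` attribute (the wrapper class and attribute names below are illustrative, not taken from this repository):

```python
import torch.nn as nn
from transformers import BertModel

class KELMSketch(nn.Module):                        # hypothetical wrapper, for illustration
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-large-cased")
        self.knowledge_head = nn.Linear(1024, 1024)  # stand-in for the KG-fusion layers

model = KELMSketch()

# Stage 1: freeze the language model; only the knowledge-integration layers train.
for param in model.bert.parameters():
    param.requires_grad = False

# Stage 2: unfreeze the language model and fine-tune everything, starting from
# the checkpoint saved at the end of stage 1.
for param in model.bert.parameters():
    param.requires_grad = True
```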
Our model (backbone: BERT) achieves the following performance on:
ReCoRD:
Model name | EM/F1 (Dev) | EM/F1 (Test) |
---|---|---|
BERT-large | 70.2/72.2 | 71.3/72.0 |
SKG + BERT-large | 70.9/71.6 | 72.2/72.8 |
KT-NET (WordNet) | 70.6/72.8 | - |
KT-NET (NELL) | 70.5/72.5 | - |
KT-NET (Both) | 71.6/73.6 | 73.0/74.8 |
KELM (WordNet) | 75.4/75.9 | 75.9/76.5 |
KELM (NELL) | 74.8/75.3 | 75.9/76.3 |
KELM (Both) | 75.1/75.6 | 76.2/76.7 |
MultiRC:
Model name | EM/F1 (Dev) | EM/F1 (Test) |
---|---|---|
BERT-large | - | 24.1/70.0 |
KT-NET (Both)* | 26.7/71.7 | 25.4/71.1 |
KELM (WordNet) | 29.2/70.6 | 25.9/69.2 |
KELM (NELL) | 27.3/70.4 | 26.5/70.6 |
KELM (Both) | 30.3/71.0 | 27.2/70.8 |
*from our implementation
COPA:
Model name | Accuracy (Dev) | Accuracy (Test) |
---|---|---|
BERT-large | - | 70.6 |
KELM (WordNet) | 76.1 | 78.0 |
We also implement KELM based on RoBERTa-large and evaluate it on ReCoRD:
Model name | EM/F1 (Dev) | EM/F1 (Test) |
---|---|---|
RoBERTa-large | 87.9/88.4 | 88.4/88.9 |
KELM (Both) | 88.2/88.7 | 89.1/89.6 |
Part of the code is implemented based on the open-source code of Hugging Face, DGL, KT-NET, and jiant.
We use Apache License 2.0.