A Framework for Adapting Pre-Trained Language Models to Knowledge Graph Completion

This repository contains the official implementation for our paper:

A Framework for Adapting Pre-Trained Language Models to Knowledge Graph Completion
Justin Lovelace and Carolyn Penstein Rosé
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

Dependencies

Our work was developed with Python 3.8. The dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt.

Data Preparation

We run our experiments on the FB15K-237, WN18RR, CN82K, and SNOMED-CT Core datasets. Because the SNOMED-CT Core dataset was derived from the UMLS, we cannot directly release the dataset files. See here for full instructions on how to recreate the dataset. The other datasets can be processed by running:

python preprocessing/process_datasets.py

We provide the BERT embeddings from prior work (e.g. data/CN82K/embeddings/bert-base-uncased_prior.pt) that we used in our initial experiments, as well as the embeddings extracted in our work (e.g. data/CN82K/embeddings/bert-base-uncased-ft_mean.pt).
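To check which embedding files are available for each dataset, a small helper along the following lines may be useful. It is only a sketch: the function name list_embedding_files is hypothetical, and it assumes that every dataset follows the data/<dataset>/embeddings/<name>.pt layout shown in the example paths above, which is confirmed here only for CN82K.

```python
from pathlib import Path


def list_embedding_files(data_root="data"):
    """Map each dataset directory name to the sorted .pt files in its embeddings/ folder.

    Assumes the data/<dataset>/embeddings/<name>.pt layout (an assumption
    based on the CN82K example paths, not a documented guarantee).
    """
    found = {}
    for emb_dir in sorted(Path(data_root).glob("*/embeddings")):
        files = sorted(p.name for p in emb_dir.glob("*.pt"))
        if files:  # skip datasets that have no embedding files yet
            found[emb_dir.parent.name] = files
    return found
```

Calling list_embedding_files() from the repository root returns a dictionary keyed by dataset name. The helper only inspects file names and does not load the embeddings.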

Candidate Embedding Processing

We provide example scripts in scripts/tail_emb_processing/ for training a KGC model on the CN82K dataset with our best unsupervised and supervised embedding processing techniques (the normalizing flow and the residual MLP, respectively). The scripts can be applied to other datasets by updating the --dataset flag.
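As an illustration of updating the --dataset flag, the sketch below rewrites the flag's value in a command string. The helper retarget_dataset is hypothetical and not part of this repository, and the command string and the dataset spelling WN18RR in the usage example are stand-ins; check the provided scripts for the exact command and accepted dataset names.

```python
import re


def retarget_dataset(command, dataset):
    """Return `command` with the value of its --dataset flag replaced by `dataset`.

    Handles both "--dataset NAME" and "--dataset=NAME".
    Raises ValueError if the command has no --dataset flag.
    """
    pattern = re.compile(r"(--dataset(?:\s+|=))\S+")
    updated, count = pattern.subn(lambda m: m.group(1) + dataset, command)
    if count == 0:
        raise ValueError("no --dataset flag found in command")
    return updated


# Hypothetical command string, used only to demonstrate the substitution.
example = retarget_dataset("python train.py --dataset CN82K", "WN18RR")
```

Editing the flag by hand in a copy of the script achieves the same result; the helper is only convenient when generating variants for several datasets at once.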

Embedding Extraction Experiments

We provide example scripts in scripts/head_ent_extraction/ for training a KGC model on the CN82K dataset with our best supervised embedding extraction techniques, prompt tuning and linear probing. The scripts can be applied to other datasets by updating the --dataset flag.

Normalizing Flow Training

We provide an example script for training a normalizing flow to process embeddings at flow/scripts/cn82k_flow.sh. It should be run from the flow/ directory.
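For readers unfamiliar with normalizing flows, the toy sketch below shows the basic idea on plain Python lists: an invertible map sends embeddings to a space where they follow a standard normal distribution, and the log-density of an input is computed with the change-of-variables formula. This is not the flow trained by the script above. It is a single elementwise affine layer, chosen because its maximum-likelihood fit has a closed form, and the names AffineFlow and fit_affine_flow are hypothetical.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of a scalar under N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class AffineFlow:
    """Elementwise affine flow: z = (x - shift) * exp(-log_scale)."""

    def __init__(self, shift, log_scale):
        self.shift = list(shift)
        self.log_scale = list(log_scale)

    def forward(self, x):
        """Map x to the latent space; also return log|det dz/dx|."""
        z = [(xi - b) * math.exp(-s)
             for xi, b, s in zip(x, self.shift, self.log_scale)]
        log_det = -sum(self.log_scale)
        return z, log_det

    def inverse(self, z):
        """Map a latent vector back to the embedding space."""
        return [zi * math.exp(s) + b
                for zi, b, s in zip(z, self.shift, self.log_scale)]

    def log_prob(self, x):
        """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
        z, log_det = self.forward(x)
        return sum(standard_normal_logpdf(zi) for zi in z) + log_det


def fit_affine_flow(embeddings, eps=1e-8):
    """Fit shift and scale per dimension (mean and standard deviation).

    This is the maximum-likelihood solution for this flow, up to the small
    eps added to the variance for numerical stability.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    shift, log_scale = [], []
    for d in range(dim):
        column = [e[d] for e in embeddings]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        shift.append(mean)
        log_scale.append(0.5 * math.log(var + eps))
    return AffineFlow(shift, log_scale)
```

After fitting, flow.forward(x)[0] gives a processed embedding with approximately zero mean and unit variance in each dimension. Deeper flows stack many learned invertible layers and are trained by gradient descent on the same log-likelihood objective.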

Acknowledgements

Our normalizing flow implementation was adapted from chrischute's open-source implementation.

