Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE

For the sake of reproducibility, resources for replicating the experiments presented in our paper are provided below.

Datasets

The datasets/ folder contains the following datasets: YAGO14K, FB15k187, and DBpedia77k [1].

Statistics for these datasets as well as their corresponding protographs are reported in the following two tables.

Dataset	#Classes	#HierarchyDepth	#Entities	#Relations	#Triples (train)	#Triples (valid)	#Triples (test)
YAGO14K	954	5	14,178	37	18,263	472	448
FB15k187	624	2	14,305	187	245,350	15,256	17,830
DBpedia77k	280	8	76,651	150	140,760	16,334	32,934

Dataset	Protograph	#Entities	#Relations	#Triples
YAGO14K	P1	22	37	37
	P2	590	37	4,959
FB15k187	P1	138	187	187
	P2	138	187	187
DBpedia77k	P1	55	150	150
	P2	186	150	3,210
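Counts like those in the two tables above can be recomputed from raw triple files. The following is a minimal sketch, assuming each split is stored as tab-separated `head relation tail` lines. The file layout and the helper names `read_tsv` and `triple_stats` are hypothetical and not taken from this repository.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def read_tsv(path: str) -> List[Triple]:
    """Read tab-separated (head, relation, tail) lines (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\r\n").split("\t")) for line in f if line.strip()]


def triple_stats(triples: Iterable[Triple]) -> Dict[str, int]:
    """Count distinct entities, distinct relations, and triples."""
    entities, relations, n_triples = set(), set(), 0
    for head, rel, tail in triples:
        entities.update((head, tail))  # heads and tails are both entities
        relations.add(rel)
        n_triples += 1
    return {"entities": len(entities), "relations": len(relations), "triples": n_triples}


if __name__ == "__main__":
    toy = [
        ("Paris", "capitalOf", "France"),
        ("Berlin", "capitalOf", "Germany"),
        ("France", "borders", "Germany"),
    ]
    print(triple_stats(toy))  # {'entities': 4, 'relations': 2, 'triples': 3}
```

For whole-dataset entity and relation counts, pass the concatenation of the train, valid, and test triples.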

Building Protographs

Our paper presents two heuristics for building protographs (P1 and P2). To build the required protographs for YAGO14K, FB15k187 (referred to as FB14K in the commands), and DBpedia77k (DB77K for short) in one go, run the following commands:

python get_prototype.py --dataset YAGO14K && python get_prototype.py --dataset FB14K && python get_prototype.py --dataset DB77K
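If you prefer to drive these calls from Python, a small wrapper around the standard `subprocess` module can replace the `&&` chain. This is a sketch: it only uses the `--dataset` flag shown in the command above, and the helper names `prototype_command` and `build_all` are hypothetical.

```python
import subprocess
import sys
from typing import List

# Dataset names exactly as accepted by the command above.
DATASETS = ["YAGO14K", "FB14K", "DB77K"]


def prototype_command(dataset: str) -> List[str]:
    """Build the get_prototype.py invocation for one dataset."""
    # sys.executable reuses the interpreter running this script,
    # so the same virtual environment is used for every call.
    return [sys.executable, "get_prototype.py", "--dataset", dataset]


def build_all(datasets: List[str]) -> None:
    """Run get_prototype.py once per dataset, stopping at the first failure (like `&&`)."""
    for name in datasets:
        subprocess.run(prototype_command(name), check=True)
```

Calling `build_all(DATASETS)` from the repository root is equivalent to the chained command; further dataset names can simply be appended to the list.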

You can also bring your own dataset (with all the required files) and run the following command:

python get_prototype.py --dataset mydataset
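The exact heuristics are described in the paper and implemented in get_prototype.py. Purely to illustrate the idea of a protograph, the sketch below builds a schema-level graph whose nodes are classes, using one plausible rule that is consistent with the P1 statistics reported above (as many triples as relations): one `(domain class, relation, range class)` triple per relation. It assumes each relation has a single declared domain and range; the names `build_protograph`, `domains`, and `ranges` are hypothetical, and this is not the repository's code.

```python
from typing import Dict, List, Tuple


def build_protograph(
    relations: List[str],
    domains: Dict[str, str],
    ranges: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return one (domain class, relation, range class) triple per relation.

    Relations without a declared domain or range are skipped (an assumption
    made for this sketch).
    """
    protograph = []
    for rel in relations:
        if rel in domains and rel in ranges:
            protograph.append((domains[rel], rel, ranges[rel]))
    return protograph


if __name__ == "__main__":
    rels = ["capitalOf", "bornIn"]
    doms = {"capitalOf": "City", "bornIn": "Person"}
    rngs = {"capitalOf": "Country", "bornIn": "City"}
    for triple in build_protograph(rels, doms, rngs):
        print(triple)  # ('City', 'capitalOf', 'Country') then ('Person', 'bornIn', 'City')
```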

Knowledge Graph Embeddings

Pre-trained embedding files are provided in the embeddings/ folder. They correspond to the embeddings obtained at the best epoch on the validation set, for each combination of model, setting, and dataset. In particular, for each dataset, the MASCHInE-P1/ (resp. MASCHInE-P2/) folder contains the embeddings of the best models after the fine-tuning step.
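Keeping the embeddings "at the best epoch on the validation set" amounts to selecting the checkpoint with the highest validation score. A minimal sketch, assuming one score is recorded per evaluated epoch and that higher is better (for example validation MRR; the README does not name the selection metric). `best_epoch` is a hypothetical helper.

```python
from typing import Dict, Tuple


def best_epoch(val_scores: Dict[int, float]) -> Tuple[int, float]:
    """Return (epoch, score) with the highest validation score.

    Epochs are scanned in increasing order and max() keeps the first
    maximum it sees, so ties go to the earliest epoch.
    """
    if not val_scores:
        raise ValueError("no validation scores recorded")
    epoch = max(sorted(val_scores), key=lambda e: val_scores[e])
    return epoch, val_scores[epoch]
```

For example, `best_epoch({10: 0.21, 20: 0.27, 30: 0.25})` returns `(20, 0.27)`, so the checkpoint saved at epoch 20 would be kept.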

We also provide our training and testing scripts, which will be refactored upon acceptance. The _vanilla/ folder contains all the files needed to train and test knowledge graph embedding models in the vanilla setting. The _transfer/ folder serves the same purpose for MASCHInE-P1 and MASCHInE-P2. Before using these scripts, place them at the root of this repository (i.e., in their parent folder).
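In the transfer setting, what is learned on a protograph is reused before fine-tuning on the knowledge graph. As an illustration only, the sketch below initializes each entity with the mean of its classes' protograph embeddings; this averaging rule, the random fallback for untyped entities, and the name `init_entity_embeddings` are assumptions of this sketch, so check the paper and the _transfer/ scripts for the actual procedure.

```python
import random
from typing import Dict, List

Vector = List[float]


def init_entity_embeddings(
    entity_classes: Dict[str, List[str]],
    class_emb: Dict[str, Vector],
    dim: int,
    seed: int = 0,
) -> Dict[str, Vector]:
    """Initialize each entity as the mean of its classes' protograph embeddings.

    Entities with no class that has a protograph embedding fall back to a
    small uniform random vector (assumption made for this sketch).
    """
    rng = random.Random(seed)
    entity_emb: Dict[str, Vector] = {}
    for entity, classes in entity_classes.items():
        known = [class_emb[c] for c in classes if c in class_emb]
        if known:
            # Component-wise mean over the entity's class vectors.
            entity_emb[entity] = [sum(vals) / len(known) for vals in zip(*known)]
        else:
            entity_emb[entity] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return entity_emb
```

The resulting vectors would then serve as starting points that the fine-tuning step updates on the full knowledge graph.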

Hyperparameters

The best hyperparameters found, which were used to train the models, are reported below:

YAGO14K	dimension	learning rate	batch size	regularizer	regularizer weight
TransE	100	0.001	2048	L2	0.001
DistMult	100	0.001	2048	L2	0.0001
ComplEx	100	0.01	2048	L2	0.1
ConvE	200	0.001	512	None	None
TuckER	200	0.001	128	None	None

FB15k187	dimension	learning rate	batch size	regularizer	regularizer weight
TransE	200	0.001	2048	L2	0.001
DistMult	200	0.001	2048	L2	0.01
ComplEx	200	0.001	2048	L2	0.1
ConvE	200	0.001	128	None	None
TuckER	200	0.0005	128	None	None

DBpedia77k	dimension	learning rate	batch size	regularizer	regularizer weight
TransE	200	0.001	2048	L2	0.001
DistMult	200	0.001	2048	L2	0.01
ComplEx	200	0.001	2048	L2	0.1
ConvE	200	0.001	512	None	None
TuckER	200	0.001	128	None	None
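When scripting experiments, it can help to keep these tables as a lookup structure. In the sketch below, the values are transcribed from the three tables above, while the `HParams` dataclass and the `BEST_HPARAMS` dictionary are illustrative names, not part of the repository's code; how the training scripts actually consume these values is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HParams:
    dim: int
    lr: float
    batch_size: int
    regularizer: Optional[str] = None  # "None" in the tables (ConvE, TuckER)
    reg_weight: Optional[float] = None


# Values transcribed from the hyperparameter tables above.
BEST_HPARAMS: Dict[str, Dict[str, HParams]] = {
    "YAGO14K": {
        "TransE": HParams(100, 0.001, 2048, "L2", 0.001),
        "DistMult": HParams(100, 0.001, 2048, "L2", 0.0001),
        "ComplEx": HParams(100, 0.01, 2048, "L2", 0.1),
        "ConvE": HParams(200, 0.001, 512),
        "TuckER": HParams(200, 0.001, 128),
    },
    "FB15k187": {
        "TransE": HParams(200, 0.001, 2048, "L2", 0.001),
        "DistMult": HParams(200, 0.001, 2048, "L2", 0.01),
        "ComplEx": HParams(200, 0.001, 2048, "L2", 0.1),
        "ConvE": HParams(200, 0.001, 128),
        "TuckER": HParams(200, 0.0005, 128),
    },
    "DBpedia77k": {
        "TransE": HParams(200, 0.001, 2048, "L2", 0.001),
        "DistMult": HParams(200, 0.001, 2048, "L2", 0.01),
        "ComplEx": HParams(200, 0.001, 2048, "L2", 0.1),
        "ConvE": HParams(200, 0.001, 512),
        "TuckER": HParams(200, 0.001, 128),
    },
}
```

A frozen dataclass makes accidental edits to a shared configuration raise an error instead of silently changing a later run.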

Link Prediction

Link prediction experiments can be replicated using the code provided in the _vanilla/ and _transfer/ folders.
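Link prediction is commonly evaluated with mean reciprocal rank (MRR) and Hits@K, computed from the rank of the correct entity among all candidates. A minimal sketch, assuming 1-based ranks and the filtered setting, in which other known true candidates are ignored when ranking; the README does not state which metrics or tie-breaking the scripts use, and `filtered_rank` and `rank_metrics` are hypothetical helpers.

```python
from typing import Dict, Iterable, Sequence, Set


def filtered_rank(scores: Sequence[float], target: int, known: Set[int]) -> int:
    """1-based rank of `target`, skipping other known true candidates.

    Only strictly higher scores push the target down, i.e. ties are
    resolved optimistically in favour of the target.
    """
    t = scores[target]
    return 1 + sum(
        1 for i, s in enumerate(scores) if s > t and i != target and i not in known
    )


def rank_metrics(ranks: Sequence[int], ks: Iterable[int] = (1, 3, 10)) -> Dict[str, float]:
    """Compute MRR and Hits@K from 1-based ranks."""
    if not ranks or min(ranks) < 1:
        raise ValueError("ranks must be non-empty and 1-based")
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics
```

For ranks `[1, 2, 4, 20]`, MRR is (1 + 0.5 + 0.25 + 0.05) / 4 = 0.45, Hits@1 is 0.25, Hits@3 is 0.5, and Hits@10 is 0.75.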

Entity Clustering

Clustering experiments are performed following the guidelines and code provided in https://github.com/mariaangelapellegrino/Evaluation-Framework [2].
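Entity clustering groups entity vectors without supervision and compares the groups with known classes. The framework linked above provides its own algorithms and metrics; the sketch below is only a self-contained illustration using plain k-means and a purity score, assuming embeddings are equal-length lists of floats. `kmeans` and `purity` are hypothetical helpers, not the framework's API, and `math.dist` requires Python 3.8 or later.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels: Sequence[int], classes: Sequence[str]) -> float:
    """Fraction of points belonging to the majority class of their cluster."""
    clusters: Dict[int, Counter] = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, Counter())[cls] += 1
    return sum(cnt.most_common(1)[0][1] for cnt in clusters.values()) / len(labels)
```

Embeddings that capture class semantics should yield clusters that align with entity classes, which shows up as a purity close to 1.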

Node Classification

Node classification experiments are performed following the guidelines and code provided in https://github.com/janothan/DL-TC-Generator [3].
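Node classification trains a classifier on the embeddings of labelled entities and evaluates it on held-out ones. The DLCC benchmark linked above defines its own test cases and classifiers; as a minimal illustration only, the sketch below uses a nearest-centroid classifier. The helper names `fit_centroids`, `predict`, and `accuracy` are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def fit_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, Vector]:
    """Average the training embeddings of each class."""
    groups: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in zip(X, y):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(centroids: Dict[str, Vector], X: Sequence[Vector]) -> List[str]:
    """Assign each embedding to the class whose centroid is nearest."""
    return [min(centroids, key=lambda c: math.dist(vec, centroids[c])) for vec in X]


def accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Share of predictions equal to the gold labels."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)
```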

References

[1] Hubert, N., Monnin, P., Brun, A., & Monticolo, D. (2023). Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction.

[2] Pellegrino, M. A., Cochez, M., Garofalo, M., & Ristoski, P. (2019). A configurable evaluation framework for node embedding techniques. In The Semantic Web: ESWC 2019 Satellite Events, Portorož, Slovenia, June 2–6, 2019, Revised Selected Papers (pp. 156-160). Cham: Springer International Publishing.

[3] Portisch, J., & Paulheim, H. (2022). The DLCC node classification benchmark for analyzing knowledge graph embeddings. In The Semantic Web – ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23–27, 2022, Proceedings (pp. 592-609). Cham: Springer International Publishing.
