We provide the FB15k and WN18 datasets used for visualization and evaluation in the folders data_FB15k/ and data_WN18/ respectively, in the input format required by our code.

Datasets are required in the following format, containing 16 files:

  • -train.txt: training file, format (head_entity, relation, tail_entity).
  • -valid.txt: validation file, same format as -train.txt
  • -test.txt: testing file, same format as -train.txt
  • -train-lhs.pkl, -train-rel.pkl, -train-rhs.pkl: training matrices for head, relation and tail respectively.
  • -valid-lhs.pkl, -valid-rel.pkl, -valid-rhs.pkl: validation matrices for head, relation and tail respectively.
  • -test-lhs.pkl, -test-rel.pkl, -test-rhs.pkl: testing matrices for head, relation and tail respectively.
  • entity2idx.pkl or synset2idx.pkl, idx2entity.pkl or idx2synset.pkl: key-value pairs between entities (or synsets) and ids.
  • entity2id.txt, relation2id.txt: key-value files, format (entity/relation, id)
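As an illustration of the text formats listed above, the following minimal sketch reads a triple file and an id file. It assumes tab-separated fields with one record per line, which the description above does not state explicitly. The function names `read_triples` and `read_id_map` and the sample contents are hypothetical and not part of the repository.

```python
import io


def read_triples(handle, sep="\t"):
    """Parse lines of the form head<sep>relation<sep>tail into tuples."""
    triples = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split(sep)
        triples.append((head, relation, tail))
    return triples


def read_id_map(handle, sep="\t"):
    """Parse lines of the form name<sep>id into a dict name -> int id."""
    mapping = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        name, idx = line.split(sep)
        mapping[name] = int(idx)
    return mapping


# In-memory stand-ins for -train.txt and entity2id.txt (made-up contents,
# assumed tab-separated).
train_file = io.StringIO("/m/01\t/people/person/nationality\t/m/02\n")
entity_file = io.StringIO("/m/01\t0\n/m/02\t1\n")

triples = read_triples(train_file)
entity2id = read_id_map(entity_file)
```

With real files, the `io.StringIO` objects would be replaced by `open(path)` handles; adjust `sep` if the files use a different delimiter.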

We also provide the dataset statistics on the existing frequency of relations and entities for FB15k and WN18.
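How the provided statistics were computed is not described here. One plausible way to count how often each entity and relation occurs in a list of triples is sketched below; the helper name `count_frequencies` and the toy triples are hypothetical.

```python
from collections import Counter


def count_frequencies(triples):
    """Count occurrences of entities (as head or tail) and of relations."""
    entity_counts = Counter()
    relation_counts = Counter()
    for head, relation, tail in triples:
        entity_counts[head] += 1
        entity_counts[tail] += 1
        relation_counts[relation] += 1
    return entity_counts, relation_counts


# Toy triples in (head_entity, relation, tail_entity) order.
toy_triples = [("a", "r1", "b"), ("b", "r1", "c"), ("a", "r2", "c")]
entity_counts, relation_counts = count_frequencies(toy_triples)
```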

Please note that for TransE and TransR, the datasets required by our code are in the folders data_FB15k/ and data_WN18/,
while for TransE(AdaGrad), the datasets required by our code are in the folder AWML_TransEmin/data/.


The code is in the folders AWML_TransE/, AWML_TransEmin/, and AWML_TransR/. The original models can be downloaded from:

  • TransE, in folder AWML_TransE/, was introduced in "Translating Embeddings for Modeling Multi-relational Data" (2013). Download
  • TransE(AdaGrad), in folder AWML_TransEmin/, was introduced in "Efficient energy-based embedding models for link prediction in knowledge graphs" (2016). Download
  • TransR, in folder AWML_TransR/, was introduced in "Learning Entity and Relation Embeddings for Knowledge Graph Completion" (2015). Download

Pre-training and Clustering

For pre-training, you need to follow the steps below:

  • TransE: call the program FB15k/
  • TransE(AdaGrad): call the program for FB15k and for WN18 to obtain the embeddings in folder fb15k_embeddings/ and in folder wn18_embeddings/ respectively.
  • TransR: call the program FB15k/

For clustering, you need to follow the steps below:

  1. call the program to obtain the .txt file for the embeddings.
  2. call the program and in folder cluster/ to cluster all the entity-pair offsets for each knowledge category and construct the clustered relation set.
    The AP (affinity propagation) clustering algorithm was published in "Clustering by Passing Messages Between Data Points." Download
    Note that we provide our clustering result in the k.pkl file.
  3. call the program and to obtain the clustered training matrices for head, relation and tail for the training of our proposed framework AWML:
    We provide the dictionary of relation to sub-relation in rel2subrel_apC.pkl and subrel2rel_apC.pkl.
  • TransE: FB15k-train-inpl/inpo/inpr_C.pkl for FB15k and WN-train-inpl/inpo/inpr_C.pkl for WN18.
  • TransE(AdaGrad): FB15k-train_C.pkl for FB15k and WN-train_C.pkl for WN18 in folder AWML_TransEmin/data/.
  • TransR: FB15k-train-inpl/inpo/inpr_RC.pkl for FB15k and WN-train-inpl/inpo/inpr_RC.pkl for WN18.
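To make the relation to sub-relation dictionaries mentioned in step 3 more concrete, here is a minimal sketch of how per-relation cluster labels could be turned into two such mappings. The actual layout of rel2subrel_apC.pkl and subrel2rel_apC.pkl is not documented here, so the nested-dictionary structure, the function name `build_subrelation_maps`, and the toy labels are all assumptions.

```python
def build_subrelation_maps(cluster_labels):
    """Assign a globally unique sub-relation id to every (relation, cluster).

    cluster_labels: dict relation_id -> list of cluster labels, one label
    per entity pair of that relation (hypothetical input layout).
    Returns (rel2subrel, subrel2rel).
    """
    rel2subrel = {}
    subrel2rel = {}
    next_id = 0
    for rel in sorted(cluster_labels):
        local = {}
        for label in sorted(set(cluster_labels[rel])):
            local[label] = next_id      # cluster label -> sub-relation id
            subrel2rel[next_id] = rel   # sub-relation id -> parent relation
            next_id += 1
        rel2subrel[rel] = local
    return rel2subrel, subrel2rel


# Toy clustering result: relation 0 splits into two clusters, relation 1
# stays as a single cluster.
labels = {0: [0, 1, 0], 1: [0, 0]}
rel2subrel, subrel2rel = build_subrelation_maps(labels)
```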

Training AWML framework

For calculating the category-specific density, you need to follow the steps below:

  1. call the program to obtain the entity-pair offsets for each knowledge category.
  2. call the program to calculate each category-specific density.
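Step 1 above can be sketched as follows, under two assumptions that are not stated in this section: an entity-pair offset is taken to be the tail embedding minus the head embedding (in the spirit of the TransE relation h + r ≈ t), and a knowledge category is taken to correspond to a relation. The function name `entity_pair_offsets` and the toy embeddings are hypothetical. The density computation of step 2 is not shown because its definition is not given here.

```python
from collections import defaultdict


def entity_pair_offsets(triples, entity_vecs):
    """Group offsets (tail - head) by relation.

    Assumes offset = tail embedding - head embedding, and one category
    per relation; both are illustrative assumptions.
    """
    offsets = defaultdict(list)
    for head, relation, tail in triples:
        h = entity_vecs[head]
        t = entity_vecs[tail]
        offsets[relation].append([tj - hj for hj, tj in zip(h, t)])
    return dict(offsets)


# Toy 2-dimensional embeddings keyed by entity id.
entity_vecs = {0: [0.0, 1.0], 1: [1.0, 3.0], 2: [2.0, 2.0]}
# Triples in (head, relation, tail) order, using integer ids.
triples = [(0, 0, 1), (1, 0, 2), (0, 1, 2)]
offsets = entity_pair_offsets(triples, entity_vecs)
```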

For training a KRL model with our proposed framework, you need to call the corresponding training program below:

  1. TransE: CTransE_aml/awl_random/
  2. TransE(AdaGrad): learnC_aml/awl_random/
  3. TransR: CTransR_aml/awl_random/

Testing the model

We provide the embeddings obtained by all the models used for visualization and evaluation in the folders fb15k_embeddings/ and wn18_embeddings/.
We also provide the parameters of the AWML algorithm for the above embedding results in the corresponding training file.
For testing in the tasks of link prediction and triplet classification, you need to call the program below:

  • Link prediction: for filtered setting and for raw setting.
  • Triplet classification: for filtered setting and for raw setting.
    Please note that, for TransE(AdaGrad) model, the testing process follows the training process in the training file.
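The difference between the raw and filtered settings in link prediction can be sketched as below. In the raw setting the true tail is ranked against every other entity; in the filtered setting, candidates that form a triple already known from the training, validation, or test set are skipped. This is a minimal illustration using a TransE-style L1 score, not the repository's evaluation code; the names `l1_distance` and `tail_rank` and the toy embeddings are hypothetical.

```python
def l1_distance(head_vec, rel_vec, tail_vec):
    """TransE-style dissimilarity ||h + r - t||_1 (lower is better)."""
    return sum(abs(h + r - t) for h, r, t in zip(head_vec, rel_vec, tail_vec))


def tail_rank(triple, entity_vecs, relation_vecs, known_triples=None):
    """Rank of the true tail among candidate tails (1 is best).

    known_triples=None gives the raw rank; passing the set of all known
    triples gives the filtered rank.
    """
    head, relation, tail = triple
    h, r = entity_vecs[head], relation_vecs[relation]
    true_score = l1_distance(h, r, entity_vecs[tail])
    rank = 1
    for candidate, vec in entity_vecs.items():
        if candidate == tail:
            continue
        if known_triples is not None and (head, relation, candidate) in known_triples:
            continue  # filtered setting: skip other correct answers
        if l1_distance(h, r, vec) < true_score:
            rank += 1
    return rank


# Toy 1-dimensional embeddings.
entity_vecs = {0: [0.0], 1: [1.0], 2: [1.2], 3: [5.0]}
relation_vecs = {0: [1.0]}
known = {(0, 0, 1), (0, 0, 2)}

raw_rank = tail_rank((0, 0, 2), entity_vecs, relation_vecs)
filtered_rank = tail_rank((0, 0, 2), entity_vecs, relation_vecs, known)
```

In this toy case entity 1 scores better than the true tail 2, so the raw rank is 2; because (0, 0, 1) is itself a known triple, the filtered rank is 1.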

We also provide the evaluation results as .out files for all the models in the folders AWML_TransE/, AWML_TransEmin/, and AWML_TransR/.


For visualizing the embeddings of entity-pair offsets, you need to follow the steps below:

  1. call the program to obtain 2-dim vectors of all the entities and relations.
    The t-SNE dimensionality reduction algorithm was published in "Visualizing Data using t-SNE." Download
  2. call the program to obtain all the golden entity-pair offsets.
  3. call the program to obtain all the synthetic entity-pair offsets.
  4. call the program to obtain the visualizing results.