# Installing dependencies

To install our extensions for PyKEEN v1.4.0, clone that specific version of PyKEEN on your machine and follow the instructions in the `README.md` file in [this GitHub directory](https://github.com/sntcristian/and-kge/tree/main/pykeen-extension). Then carry out the following steps: <br/>
1. Open a command line in the folder that contains your modified version of PyKEEN.
2. Install the library in development mode.
3. Install the `sentence-transformers` library (it is used by our preprocessing classes).

In [None]:
cd "/content/drive/MyDrive/thesis_project/pykeen-1.4.0"

/content/drive/MyDrive/thesis_project/pykeen-1.4.0


In [None]:
!pip install -e .
!pip install sentence-transformers

# Import data

The dataset used in this notebook is freely available on [Zenodo](https://doi.org/10.5281/zenodo.5569438).

**Note:** if importing PyKEEN fails inside the Jupyter environment (*this happens in Colab*), restart the runtime.

In [None]:
import pykeen

To create the triples instances processed by the KGE model, we used the original `TriplesFactory` class from PyKEEN; for the instance that includes textual and numeric literals for the entities, we used our modified `TriplesLiteralsFactory` class.
The `TriplesLiteralsFactory` class accepts either `tsv` files containing literal triples or `npy` files containing pre-encoded embeddings for the numeric and textual literals associated with entities in the KG.<br/>
**Note:** when using pre-encoded `npy` matrices, make sure that the `TriplesFactory` instances do not create new mappings from entities and relations to identifiers, since this would associate the literal embeddings with the wrong entities. We suggest reusing the `entity_to_id` and `relation_to_id` mappings produced when the literals were encoded.
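The risk described in the note can be illustrated with a toy example. This is a minimal sketch, not code from the extension: the entity labels, the two-dimensional rows, and the helper `literal_for` are all hypothetical, and it assumes that row `i` of a pre-encoded matrix belongs to the entity whose identifier is `i`.

```python
# Toy illustration only: labels and values are hypothetical, not taken from OC-782K.
entities = ["paper_a", "author_b", "venue_c"]

# Mapping that was in force when the literal embeddings were encoded.
entity_to_id = {"paper_a": 0, "author_b": 1, "venue_c": 2}

# Assumption: row i of the pre-encoded matrix belongs to the entity with id i.
literal_rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def literal_for(entity, mapping, rows):
    """Return the literal embedding row that a mapping assigns to an entity."""
    return rows[mapping[entity]]


# A freshly generated mapping (here: ids assigned in sorted label order)
# can give the same entity a different identifier.
fresh_mapping = {label: idx for idx, label in enumerate(sorted(entities))}

# With the original mapping, paper_a gets its own row.
print(literal_for("paper_a", entity_to_id, literal_rows))
# With the fresh mapping, paper_a silently receives the row encoded for author_b.
print(literal_for("paper_a", fresh_mapping, literal_rows))
```

No error is raised in the second lookup, which is why the mismatch is easy to miss.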




In [None]:
from pykeen.triples import TriplesLiteralsFactory
from pykeen.triples import TriplesFactory

In [None]:
import json

# Reuse the mappings produced when the literals were encoded.
with open('OC-782K/entity_to_id.json') as f1:
    entity_to_id = json.load(f1)
with open('OC-782K/relation_to_id.json') as f2:
    relation_to_id = json.load(f2)
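Before building the factories, it may be worth checking that a loaded mapping is well formed. The sketch below uses a hypothetical helper, `check_id_mapping`, which is not part of PyKEEN or of our extension. It assumes that identifiers are unique integers covering `0..n-1` without gaps and, optionally, that a literal matrix has exactly one row per entity.

```python
def check_id_mapping(label_to_id, expected_rows=None):
    """Validate a label-to-id mapping and return the number of labels.

    Hypothetical helper. Assumes ids should be exactly 0..n-1 and, if
    expected_rows is given, that it should equal the number of labels.
    """
    ids = sorted(label_to_id.values())
    if ids != list(range(len(ids))):
        raise ValueError("ids must be unique and cover 0..n-1 without gaps")
    if expected_rows is not None and expected_rows != len(ids):
        raise ValueError(
            f"expected {expected_rows} rows but the mapping has {len(ids)} labels"
        )
    return len(ids)


# Toy mappings standing in for the JSON files loaded above.
good_mapping = {"paper_a": 0, "author_b": 1, "venue_c": 2}
bad_mapping = {"paper_a": 0, "author_b": 2}  # id 1 is missing

print(check_id_mapping(good_mapping, expected_rows=3))

try:
    check_id_mapping(bad_mapping)
except ValueError as err:
    print("rejected:", err)
```

In the notebook, the same check could be applied to `entity_to_id`, passing the row count of the loaded `npy` matrix as `expected_rows`, provided the one-row-per-entity assumption holds for your files.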

In [None]:
testing = TriplesFactory.from_path("OC-782K/testing.txt", entity_to_id=entity_to_id, relation_to_id=relation_to_id)
validation = TriplesFactory.from_path("OC-782K/validation.txt", entity_to_id=entity_to_id, relation_to_id=relation_to_id)

In [None]:
training = TriplesLiteralsFactory(
    path="OC-782K/training.txt",
    path_to_numeric_embeddings="OC-782K/numeric_literals.npy",
    path_to_textual_embeddings="OC-782K/textual_literals.npy",
    entity_to_id=entity_to_id,
    relation_to_id=relation_to_id,
)

# Initialize Pipeline object

This is an example of how we train a multimodal KGE model on *OC-782K* with the configuration used in our research.
The configuration file is available [here](https://github.com/sntcristian/and-kge/blob/main/open-citations/entity-prediction/distmultgatetext/config.json). <br/> Running this script saves the final KGE model and the entity prediction results in the `trial_final/oc/distmultgatetext/` directory.

In [None]:
from pykeen.pipeline import pipeline

result = pipeline(
    training=training,
    testing=testing,
    model='DistMult_gate_text',
    model_kwargs=dict(embedding_dim=512),
    optimizer="adam",
    optimizer_kwargs=dict(lr=0.0003),
    training_loop='slcwa',
    training_kwargs=dict(num_epochs=120, 
                         batch_size=512, 
                         label_smoothing=0.0012),
    loss="BCEAfterSigmoidLoss",
    negative_sampler='basic',
    negative_sampler_kwargs=dict(num_negs_per_pos=12),
    regularizer="NoRegularizer",
    random_seed=3622058570,
    evaluation_kwargs=dict(batch_size=4)
)
result.save_to_directory("trial_final/oc/distmultgatetext/")

Training epochs on cuda:   0%|          | 0/120 [00:00<?, ?epoch/s]

Training batches on cuda:   0%|          | 0/776 [00:00<?, ?batch/s]

Evaluating on cuda:   0%|          | 0.00/124k [00:00<?, ?triple/s]

INFO:pykeen.evaluation.evaluator:Evaluation took 2506.54s seconds
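The figures in the progress bars can be related to the pipeline settings with some simple arithmetic. This is a rough sketch: the batch count of 776 is read off the output above, and the bounds on the number of training triples assume that the last, possibly smaller, batch of each epoch is kept rather than dropped, which we have not verified for this model.

```python
import math

batch_size = 512         # from training_kwargs
num_negs_per_pos = 12    # from negative_sampler_kwargs
num_epochs = 120         # from training_kwargs
batches_per_epoch = 776  # read off the "Training batches" progress bar

# Under sLCWA with the basic sampler, each positive triple is paired with
# num_negs_per_pos corrupted triples.
negatives_per_full_batch = batch_size * num_negs_per_pos

# Total number of batches processed over the whole run.
total_batches = num_epochs * batches_per_epoch

# Assumption: the last partial batch is kept, so
# ceil(n / batch_size) == batches_per_epoch.
min_triples = (batches_per_epoch - 1) * batch_size + 1
max_triples = batches_per_epoch * batch_size

print(negatives_per_full_batch)  # 6144
print(total_batches)             # 93120
print(min_triples, max_triples)  # 396801 397312
print(math.ceil(min_triples / batch_size) == batches_per_epoch)  # True
```

If the last batch were dropped instead, the same batch count would correspond to between 397312 and 397823 training triples.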
