
Getting Embeddings of the Entity and Relations #1365

Closed
anonimoustt opened this issue Feb 15, 2024 · 13 comments

Comments

@anonimoustt

Hi,

Is it possible to get the embeddings of the entities and relations of a knowledge graph? For instance, take the triple:

Candy tastes sweet. Here, Candy and sweet are the entities and tastes is the relation. My question is: how can I use pykeen to get the embeddings of the entities (Candy, sweet) and the relation (tastes)?

@cthoyt
Member

cthoyt commented Feb 15, 2024

@anonimoustt
Author

Hi,

Thanks. Do you have any example like the Ampligraph embeddings: https://docs.ampligraph.org/en/latest/examples.html

That would be more helpful. Also, is pykeen better than Ampligraph?

@cthoyt
Member

cthoyt commented Feb 15, 2024

I'd suggest reading the tutorial I linked in full, which has everything you'll need. It's hard to say which library is best; ours certainly has many more features and models, but it always depends on your use case.

@anonimoustt
Author

anonimoustt commented Feb 16, 2024

Hi,

I was trying to get the embeddings using pykeen. The data is as follows:

import numpy as np
Data1= np.array([['a', 'y', 'b'],
              ['b', 'y', 'a'],
              ['a', 'y', 'c'],
              ['c', 'y', 'a'],
              ['a', 'y', 'd'],
              ['c', 'y', 'd'],
              ['b', 'y', 'c'],
              ['f', 'y', 'e']])

Using the following code, I was able to get the embeddings of the 'UMLS' data. 'UMLS' is available as a built-in pykeen dataset. But if I want to get the embeddings of new data like Data1 (above), how can it be done?

import torch
from typing import List
import pykeen.nn
from pykeen.pipeline import pipeline
result = pipeline(model='TransE', dataset='UMLS')
model = result.model
entity_representation_modules: List['pykeen.nn.Representation'] = model.entity_representations
relation_representation_modules: List['pykeen.nn.Representation'] = model.relation_representations
entity_embeddings: pykeen.nn.Embedding = entity_representation_modules[0]
relation_embeddings: pykeen.nn.Embedding = relation_representation_modules[0]
# calling the embedding module with indices=None (or no indices) returns
# the full embedding matrix for all entities / relations
entity_embedding_tensor: torch.FloatTensor = entity_embeddings(indices=None)
relation_embedding_tensor: torch.FloatTensor = relation_embeddings(indices=None)

Thanks.

@anonimoustt
Author

Furthermore, how can I add new knowledge graph data like https://www.zjukg.org/project/ProteinKG25/ to get embeddings for it?
Thanks.

@anonimoustt
Author

anonimoustt commented Feb 16, 2024

Furthermore, does pykeen have knowledge graph data like https://www.zjukg.org/project/ProteinKG25/ in its datasets, where triples show protein-to-protein relations and each protein has a corresponding UniProt ID?

@mberr
Member

mberr commented Feb 16, 2024

Furthermore, does pykeen have knowledge graph data like https://www.zjukg.org/project/ProteinKG25/ in its datasets, where triples show protein-to-protein relations and each protein has a corresponding UniProt ID?

I am not aware of an existing binding for ProteinKG25. The following tutorial describes how you can load an external dataset: https://pykeen.readthedocs.io/en/stable/byo/data.html. If the data does not come in the form of TSV files, you can either do some pre-processing outside of PyKEEN to prepare that format, or follow the steps described in https://pykeen.readthedocs.io/en/stable/extending/datasets.html to write a custom loader. https://github.com/pykeen/pykeen/blob/master/src/pykeen/datasets/base.py contains some useful base classes you may be able to use (depending on the format of the dataset).
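
For instance, if the triples are already available as a tab-separated head/relation/tail file, something like the following might work (a minimal sketch; the file name, split ratios, and model choice are placeholders):

from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

# load labeled triples from a TSV file with one head<TAB>relation<TAB>tail per line
tf = TriplesFactory.from_path("proteinkg25_triples.tsv")

# split into training and testing factories that share the same label-to-ID mapping
training, testing = tf.split([0.8, 0.2], random_state=1234)

result = pipeline(
    training=training,
    testing=testing,
    model="TransE",
    training_kwargs=dict(num_epochs=100),
)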

@mberr
Member

mberr commented Feb 16, 2024

Using the following code, I was able to get the embeddings of the 'UMLS' data. 'UMLS' is available as a built-in pykeen dataset. But if I want to get the embeddings of new data like Data1 (above), how can it be done?

Quite similar 😉

The snippet you shared already shows how you can access the representations stored in a (trained) model:

entity_representation_modules: List['pykeen.nn.Representation'] = model.entity_representations
relation_representation_modules: List['pykeen.nn.Representation'] = model.relation_representations
entity_embeddings: pykeen.nn.Embedding = entity_representation_modules[0]
relation_embeddings: pykeen.nn.Embedding = relation_representation_modules[0]
entity_embedding_tensor: torch.FloatTensor = entity_embeddings()
relation_embedding_tensor: torch.FloatTensor = relation_embeddings()
entity_embedding_tensor: torch.FloatTensor = entity_embeddings(indices=None)
relation_embedding_tensor: torch.FloatTensor = relation_embeddings(indices=None)

These tensors usually take the form of matrices of shape (num_entities, embedding_dim) and (num_relations, embedding_dim), respectively; what might be missing for your interpretation is the conversion between string labels, i.e., an entity's or relation's name, and their IDs.

This process is described here: https://pykeen.readthedocs.io/en/stable/tutorial/first_steps.html#mapping-entity-and-relation-identifiers-to-their-names

It is important that you use a triples factory based on the same label-to-ID mapping that was used to convert your training data to integer tensors; if you train a model with PyKEEN, you will find this mapping saved alongside the model weights. https://pykeen.readthedocs.io/en/stable/tutorial/checkpoints.html#resuming-training describes how you can load these mappings.
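
As a rough sketch (assuming result comes from a pipeline(...) call and the model was trained on a labeled TriplesFactory, so the label-to-ID mappings are available; the labels 'a' and 'y' are just placeholders):

import torch

# the triples factory used for training holds the label-to-ID mappings
tf = result.training
entity_id = tf.entity_to_id["a"]
relation_id = tf.relation_to_id["y"]

# full embedding matrices of shape (num_entities, dim) and (num_relations, dim)
entity_embedding_tensor = result.model.entity_representations[0](indices=None)
relation_embedding_tensor = result.model.relation_representations[0](indices=None)

# embedding vectors for the entity 'a' and the relation 'y'
vector_for_a = entity_embedding_tensor[entity_id]
vector_for_y = relation_embedding_tensor[relation_id]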

@anonimoustt
Author

anonimoustt commented Feb 16, 2024

Hi,
I was able to get the embeddings of the protein knowledge graph data using pykeen. The only thing is that the values in the embeddings look very close to zero. I also want to know whether there is any option to change the dimension of the embedding: I used the ConvE method, which gives 200 dimensions. Is it possible to have embeddings of dimension 320? Further, I observe that if the number of triples is more than 10,000, I get a memory error. Are there any restrictions on the number of triples?

Here are the values of one embedding vector:
[0.0017415468555841886,
 0.0020289210267518606,
 -0.0024428751582522366,
 0.0003346314860755611,
 -0.003599902393032514,
 -0.0022604978230915773,
 -0.006414890196019836,
 -0.001007098755439338,
 -0.000891482955650069,
 -0.00015597889552993192,
 0.00011426882107837491,
 -0.002788625476141504,
 -0.001628724182265814,
 0.003601291998696612,
 -0.0011085774290389978,
 0.0010431754011548042,
 -0.002421554378949051,
 0.003166341586620547,
 -0.002322079398517571,
 -0.00043104596238639455,
 0.0010004756775094453,
 0.0023743226964250217,
 0.0005421599132112535,
 -0.0027540538843618353,
 0.0058076925550919565,
 0.004578667093319554,
 -0.0007533992700893948,
 -0.0025509362579593884,
 -0.000984566468661785,
 0.002385908902118905,
 -0.00014667611361950583,
 -0.002786656162630486,
 -0.0010160182574517843,
 0.0012733197100298216,
 0.0030917265932933185,
 -0.00025259495966396464,
 0.00026315779262363125,
 5.821580783610666e-05,
 0.00012087390115532948,
 -0.0011059885655862868,
 -0.001919582124070471,
 0.0047833407183904934,
 0.00044059828096377877,
 -0.002456275975556691,
 -0.0020891806568734844,
 -0.004828415017056727,
 -0.002847435103569189,
 -0.0032646221535124085,
 0.0003373189087387064,
 -0.0024683896682509655,
 0.00212630136306072,
 -0.004150695448225986,
 -0.0031644464182735924,
 -0.004870084784173977,
 0.002350185195401325,
 0.00039380344473146007,
 -0.00043047127483164,
 -0.0011573401777396975,
 -0.001063647366261497,
 -0.0016871983844095062,
 -6.92630808113382e-05,
 0.0007285035479456122,
 -0.00019289708603914096,
 -0.00148695642607989,
 0.0024982134471851873,
 0.003494842262213563,
 -0.0009611465403222314,
 -0.0021434869864208456,
 -0.0005295243401813137,
 -0.0019028997793062831,
 0.0018961127602697724,
 -0.0038856781017940554,
 -0.0005105181790196489,
 0.0009459691402669698,
 0.0018123203964692052,
 -0.0006865696136327433,
 -0.0007767874797505374,
 0.0006546143921266653,
 -0.0001268532991569327,
 -0.0030007645960209977,
 -0.0047005901943950575,
 0.0036949487075846597,
 -0.0021667171533655432,
 0.0009905830528471422,
 -0.002612256479598295,
 -0.002631295014778556,
 -0.0028584420254281293,
 -0.003713423127422363,
 0.0008985209884212294,
 0.0027125853990012994,
 0.0003658266298577055,
 -0.004063431961168201,
 -0.0020874761920833015,
 -0.005852176824989302,
 0.00011179315781294977,
 -0.0019012670393325766,
 -0.0019178221222381944,
 -0.0025808041889460592,
 -0.0013283826756665309,
 -0.004839842971843081,
 -0.001430589515398419,
 0.00020659923477172527,
 -0.0023674813926063375,
 0.0016635643554301226,
 -0.0015717340842998204,
 -0.0017650544084859502,
 -0.001478606215242975,
 -0.001367872646018523,
 -0.002278269464304006,
 -0.0064596052835640665,
 -0.0036846830119608635,
 -0.002594720807795197,
 0.0012052286105193359,
 -0.003129655175820953,
 4.143127985647512e-05,
 0.0018678921215818356,
 -0.001763518020363257,
 0.0001773898764729493,
 0.0031575815343577976,
 -0.0006162054795259475,
 -0.002557691997381076,
 -0.00047167417843391304,
 -0.0031079714559797468,
 -0.003119691394071049,
 0.0009132792603312776,
 0.001599474048039256,
 -0.0016751899062090927,
 0.0011626209528641418,
 -0.0025520166794077035,
 0.002102249196526001,
 0.005824588183774036,
 -0.0016379782856499507,
 -0.002600913478142304,
 -0.0010023775358058247,
 -0.00014035862596200845,
 -0.0034674254820952367,
 0.0025989930068780974,
 0.0022619017049515123,
 -0.0035690068120297707,
 -0.0011164863937950353,
 -0.0019005471451263877,
 0.00202030425361237,
 0.002178433448223818,
 -0.003292783149513635,
 0.0011586245788847288,
 0.001823332414304424,
 0.000889632974333278,
 -0.00048474452998923404,
 -0.003208847308989414,
 0.001922659366742008,
 -0.0001063910459502628,
 0.00015140983593992554,
 0.002833767376313308,
 -0.0037829131016978547,
 0.0009611971206105494,
 -0.0038337987588660262,
 -0.0012271884321187379,
 -0.003718855760320563,
 -7.07972097319779e-05,
 0.0020266058223716694,
 -0.0023961097517421225,
 -0.0006169329370824255,
 -0.0028725115548308288,
 0.0022937884376017445,
 -0.0018583620949586808,
 0.0012547990707401022,
 -0.0007561720517969183,
 -0.0028446188002277167,
 0.00029988052099459017,
 0.0005734644507024424,
 0.001406100059679629,
 0.004298944520476626,
 -0.003564840028711622,
 -0.0022906062167921187,
 -0.0011693019652823326,
 0.0022875185189921996,
 -0.0010151714682265969,
 -0.004729095643152542,
 0.004692932443249714,
 -0.0003392574182934579,
 -0.0031881339738565532,
 0.0006762623515734268,
 -9.629964790328775e-05,
 0.00018194508473407824,
 0.0038416632098642414,
 -0.0016768831282752599,
 -0.000887814918798605,
 -0.002346283590456152,
 -0.005639427137652433,
 0.0009939908917922567,
 -0.002446443565737425,
 -0.004722650054723248,
 -0.0027636691764786094,
 0.003293139201073996,
 -0.004304944809781927,
 -0.001950115304914674,
 -0.002276449515820605,
 -0.000809070920386473,
 -0.0011766172321871049,
 -0.0016360378171889776]

@mberr
Member

mberr commented Feb 17, 2024

I also want to know whether there is any option to change the dimension of the embedding: I used the ConvE method, which gives 200 dimensions. Is it possible to have embeddings of dimension 320?

Yes, you can take a look at the possible parameters here: https://pykeen.readthedocs.io/en/stable/api/pykeen.models.ConvE.html#pykeen.models.ConvE. Depending on how you train the model, you can either provide them in the model constructor, or as model_kwargs in the pipeline.
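
For example, with the pipeline it could look roughly like this (a sketch; the training and testing triples factories are assumed to exist already):

from pykeen.pipeline import pipeline

result = pipeline(
    training=training,
    testing=testing,
    model="ConvE",
    # ConvE internally reshapes each embedding into a 2D grid, so the chosen
    # dimension has to be compatible with embedding_height x embedding_width
    model_kwargs=dict(embedding_dim=320),
    training_kwargs=dict(num_epochs=200),
)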

Further, I observe that if the number of triples is more than 10,000, I get a memory error. Are there any restrictions on the number of triples?

Without any further information about your training setup and infrastructure, it is hard to tell why that is the case.

@anonimoustt
Author

anonimoustt commented Feb 17, 2024

Hi, here is a dataset of 7,050 triples that works, but anything larger than that gives a memory error: https://drive.google.com/file/d/1hW1qAKgJPBqKZ-HMO5MzeFfaV3Qq3wdW/view?usp=drive_link

Here is the code:

%%capture
!pip install pybel
!pip install pykeen

import numpy as np
import torch
from typing import List

import pykeen.nn
from pykeen.pipeline import pipeline
from pykeen.triples import TriplesFactory

ras_triples_path = "proteinkgtsv2.tsv"

# Memory error here if the number of triples is greater than ~10K
triples = np.loadtxt(ras_triples_path, dtype=str, delimiter="\t")

# collect the distinct heads, relations, and tails
h = []
r = []
t = []
for kk in triples:
    if kk[0] not in h:
        h.append(kk[0])
    if kk[1] not in r:
        r.append(kk[1])
    if kk[2] not in t:
        t.append(kk[2])
# print(len(h), len(r), len(t))

# Getting an error here while generating the TriplesFactory if the number of
# triples is greater than 10K or so. I have 1M triples.
tf = TriplesFactory.from_labeled_triples(triples)

# Training: getting an error here
results = pipeline(
    training=tf,
    testing=tf,
    model="TransH",
    training_kwargs=dict(num_epochs=200),
    model_kwargs=dict(embedding_dim=320),
    random_seed=1235,
    device="cpu",
)
model = results.model

entity_representation_modules: List['pykeen.nn.Representation'] = model.entity_representations
relation_representation_modules: List['pykeen.nn.Representation'] = model.relation_representations
entity_embeddings: pykeen.nn.Embedding = entity_representation_modules[0]
relation_embeddings: pykeen.nn.Embedding = relation_representation_modules[0]
entity_embedding_tensor: torch.FloatTensor = entity_embeddings(indices=None)
relation_embedding_tensor: torch.FloatTensor = relation_embeddings(indices=None)

# collect all entity and relation embedding vectors into one list of lists
enti = []
for jj in entity_embedding_tensor:
    enti.append(jj.tolist())
for jj in relation_embedding_tensor:
    enti.append(jj.tolist())
print(len(enti))

# element-wise mean over all collected vectors
vxccx = [sum(d) / len(enti) for d in zip(*enti)]

@anonimoustt
Author

The memory problem is solved. The main problem was the line
triples = np.loadtxt(ras_triples_path, dtype=str, delimiter="\t"), which was giving the memory error. Loading the triples from the TSV file in a different way resolved the issue.
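
For anyone running into the same issue, one way to avoid materialising the intermediate NumPy string array might be to let PyKEEN read the file directly (a sketch, assuming the TSV contains one head/relation/tail triple per line):

from pykeen.triples import TriplesFactory

tf = TriplesFactory.from_path("proteinkgtsv2.tsv")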

@mberr
Member

mberr commented Feb 27, 2024

Hi @anonimoustt ,

Great that you found a solution. I'll close this issue for now.

@mberr mberr closed this as completed Feb 27, 2024