🦜 🏴☠️ Implement NodePiece representation and model #621
Conversation
@migalkin would be great to have your feedback, as you may be familiar with it 😉
trigger ci
Can we have a demo of how you would use this representation with a model? E.g., can we easily implement TransE with NodePiece?
Sure.

from typing import Optional

from class_resolver.api import HintOrType

from pykeen.models.nbase import ERModel
from pykeen.nn.emb import EmbeddingSpecification, NodePieceRepresentation
from pykeen.nn.modules import Interaction, TransEInteraction
from pykeen.pipeline import pipeline
from pykeen.triples.triples_factory import CoreTriplesFactory


class NodePieceModel(ERModel):
    def __init__(
        self,
        *,
        triples_factory: CoreTriplesFactory,
        embedding_specification: Optional[EmbeddingSpecification] = None,
        interaction: HintOrType[Interaction] = TransEInteraction,
        **kwargs,
    ) -> None:
        if embedding_specification is None:
            embedding_specification = EmbeddingSpecification(shape=(64,))
        entity_representations = NodePieceRepresentation(
            triples_factory=triples_factory,
            token_representation=embedding_specification,
        )
        super().__init__(
            triples_factory=triples_factory,
            interaction=interaction,
            entity_representations=entity_representations,
            relation_representations=embedding_specification,
            **kwargs,
        )


result = pipeline(
    dataset="nations",
    model=NodePieceModel,
    model_kwargs=dict(
        interaction_kwargs=dict(p=2),
    ),
)
print(result.get_metric("hits_at_10"))

EDIT: added in 6b29c4e
trigger ci
also fix create_inverse_triples for NodePiece test

trigger ci
@@ -931,7 +931,7 @@ def test_score_t(self) -> None:
         try:
             scores = self.instance.score_t(batch)
         except NotImplementedError:
-            self.fail(msg="Score_o not yet implemented")
+            self.fail(msg="score_t not yet implemented")
this typo is not really part of the PR
@@ -950,7 +968,7 @@ def test_score_h(self) -> None:
         try:
             scores = self.instance.score_h(batch)
         except NotImplementedError:
-            self.fail(msg="Score_s not yet implemented")
+            self.fail(msg="score_h not yet implemented")
same here
trigger ci
Looks like the issues are now with ConvE's tests
I was running the debugger for the ConvE test, and for some reason after initialization of
The implementation works 🎉 Exposing the ratio param from the MLP encoder sounds like a good idea (with the default value 2); otherwise everything looks ready!
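As a sketch of what exposing such a ratio parameter could look like: the factory function below is hypothetical (the name, signature, and the choice of hidden size = embedding_dim * ratio are illustrative assumptions, not pykeen's actual _ConcatMLP), but it shows how a single ratio knob with default 2 would control the encoder's capacity.

```python
import torch
from torch import nn


def make_concat_mlp(num_tokens: int, embedding_dim: int, ratio: int = 2) -> nn.Sequential:
    """Hypothetical MLP over concatenated token embeddings.

    The hidden dimension is embedding_dim * ratio, so the exposed
    ratio parameter (default 2) controls the encoder capacity.
    """
    input_dim = num_tokens * embedding_dim  # tokens are concatenated first
    hidden_dim = embedding_dim * ratio
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, embedding_dim),
    )


# usage: 3 tokens of dim 8 per entity, batch of 4 entities
mlp = make_concat_mlp(num_tokens=3, embedding_dim=8)
out = mlp(torch.randn(4, 3 * 8))
```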
:func:`torch.max`, or even trainable aggregations e.g., ``MLP(mean(MLP(tokens)))``
(cf. DeepSets from [zaheer2017]_) if given value ``"mlp"``.
The current _ConcatMLP is not DeepSets :)
The idea of DeepSets is to project each set member independently through some encoder, then aggregate (e.g., with mean), and then pass the result through another feed-forward net. It would look like this:

enc1 = nn.Sequential(
    nn.Linear(embedding_dim, embedding_dim),
    nn.ReLU(),
    nn.Linear(embedding_dim, embedding_dim),
)
enc2 = nn.Sequential(
    nn.Linear(embedding_dim, embedding_dim),
    nn.ReLU(),
    nn.Linear(embedding_dim, output_dim),
)

and in the forward pass:

# x: shape (bs, num_elements, embedding_dim)
x = enc1(x)  # same shape (bs, num_elements, embedding_dim)
x = x.mean(dim=-2)  # aggregate over the set dimension to (bs, embedding_dim)
x = enc2(x)  # final projection to (bs, output_dim)

It can be added as an option along with mlp though
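For reference, the two encoders and the forward pass described above can be wrapped into one module. This is a minimal self-contained sketch in plain PyTorch; the class name and dimensions are illustrative, not part of pykeen's API. Note the result is permutation-invariant, since the mean over the set dimension ignores element order.

```python
import torch
from torch import nn


class DeepSetsAggregation(nn.Module):
    """DeepSets-style set encoder: per-element encoder, mean, then a second FF net."""

    def __init__(self, embedding_dim: int, output_dim: int) -> None:
        super().__init__()
        # enc1: applied to each set member independently
        self.enc1 = nn.Sequential(
            nn.Linear(embedding_dim, embedding_dim),
            nn.ReLU(),
            nn.Linear(embedding_dim, embedding_dim),
        )
        # enc2: applied after aggregation
        self.enc2 = nn.Sequential(
            nn.Linear(embedding_dim, embedding_dim),
            nn.ReLU(),
            nn.Linear(embedding_dim, output_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, num_elements, embedding_dim)
        x = self.enc1(x)    # (batch_size, num_elements, embedding_dim)
        x = x.mean(dim=-2)  # aggregate over the set dimension
        return self.enc2(x)  # (batch_size, output_dim)


module = DeepSetsAggregation(embedding_dim=8, output_dim=4)
tokens = torch.randn(2, 5, 8)
out = module(tokens)
```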
The correct docstring somehow got lost during the refactoring 😅
here it was still correct: #621 (comment)
I think the issue is that we did not yet think about what should be scored in We could either:
To keep this PR focused on one thing, I tend towards option 1.
trigger ci
Yes, let's bump this. So let's override the test in ConvE to be skipped and leave a TODO for later
it is not yet clear what would be the desired output shape
trigger ci
This is a first draft to add NodePiece representations to pykeen. For now, it uses a simple variant where each entity is represented by k randomly chosen incident relations.
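To illustrate that tokenization scheme (this is not the actual pykeen code — the function, triples, and relation names are made up for the example), picking k incident relations per entity from a list of triples might look like:

```python
import random

# toy triples: (head, relation, tail) with illustrative relation names
triples = [
    (0, "capital_of", 1),
    (0, "borders", 2),
    (2, "borders", 0),
    (1, "located_in", 0),
]


def tokenize_by_incident_relations(triples, num_entities, k, seed=0):
    """Represent each entity by k randomly chosen incident relations.

    A relation is incident to an entity if the entity appears as head
    or tail of a triple with that relation. Entities with fewer than k
    incident relations are sampled with replacement.
    """
    rng = random.Random(seed)
    incident = {e: [] for e in range(num_entities)}
    for h, r, t in triples:
        incident[h].append(r)
        incident[t].append(r)
    tokens = {}
    for e, rels in incident.items():
        if not rels:
            tokens[e] = []  # isolated entity: nothing to sample
        elif len(rels) >= k:
            tokens[e] = rng.sample(rels, k)
        else:
            tokens[e] = [rng.choice(rels) for _ in range(k)]
    return tokens


tokens = tokenize_by_incident_relations(triples, num_entities=3, k=2)
```

In the actual representation, each sampled relation token is then mapped to an embedding and the k token embeddings are aggregated into the entity representation.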