🌀🔗 Extend EA datasets to allow loading a unified graph #871

Merged
merged 34 commits into from
Apr 28, 2022

Conversation

mberr
Member

@mberr mberr commented Apr 13, 2022

This PR adds support for loading KGs from entity alignment datasets as a joint graph comprising both individual sides.

@mberr
Member Author

mberr commented Apr 14, 2022

@dobraczka - any thoughts on this? Maybe also regarding an extension to the OpenEA dataset family, or more generally adding a workflow to perform EA as an LP task?

@dobraczka
Contributor

I can take a detailed look at it next week, since I am on vacation right now. I think treating EA as a special LP task can only get you so far.
Generally, I have already done some work in that direction, using some techniques from the OpenEA package as guidance. The simplest extension is "swapping": for each known match in the training alignment, add triples to the connected graph in which the matched entities are exchanged with each other. For example, if you have the triples:
A rel1 B
B rel2 C
D rel3 E

and know that A == D, you add the triples
D rel1 B
A rel3 E
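
A minimal sketch of the swapping idea described above, in plain Python rather than against the PyKEEN API. The function name `swap_augment` and the choice to exchange one position (head or tail) at a time are assumptions made for illustration; neither the OpenEA nor the PyKEEN implementation is reproduced here.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def swap_augment(
    triples: Iterable[Triple],
    alignment: Iterable[Tuple[str, str]],
) -> List[Triple]:
    """Return the extra triples obtained by exchanging aligned entities.

    Hypothetical helper: assumes a one-to-one alignment given as (left, right) pairs.
    """
    # Map each aligned entity to its counterpart, in both directions.
    counterpart: Dict[str, str] = {}
    for left, right in alignment:
        counterpart[left] = right
        counterpart[right] = left

    triples = list(triples)
    seen: Set[Triple] = set(triples)
    extra: List[Triple] = []
    for head, relation, tail in triples:
        candidates = []
        if head in counterpart:
            candidates.append((counterpart[head], relation, tail))
        if tail in counterpart:
            candidates.append((head, relation, counterpart[tail]))
        # Keep only triples that are not already in the graph.
        for candidate in candidates:
            if candidate not in seen:
                seen.add(candidate)
                extra.append(candidate)
    return extra


triples = [("A", "rel1", "B"), ("B", "rel2", "C"), ("D", "rel3", "E")]
extra = swap_augment(triples, alignment=[("A", "D")])
print(extra)  # [('D', 'rel1', 'B'), ('A', 'rel3', 'E')]
```

With the three example triples and the match A == D, this yields exactly the two additional triples listed above.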

The other simple strategy is "sharing", where matched entities get the same ID, but the TriplesFactory does not like that.
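
For comparison, sharing could be sketched as follows, again in plain Python and independent of the TriplesFactory. The helper `shared_entity_ids` is hypothetical, and it assumes a one-to-one alignment in which the left entity of each pair serves as the representative.

```python
from typing import Dict, Iterable, Tuple


def shared_entity_ids(
    entities: Iterable[str],
    alignment: Iterable[Tuple[str, str]],
) -> Dict[str, int]:
    """Assign integer IDs such that aligned entities receive the same ID.

    Hypothetical helper: assumes each entity occurs in at most one alignment pair.
    """
    # The left entity of each pair acts as representative for the right one.
    representative = {right: left for left, right in alignment}
    entity_to_id: Dict[str, int] = {}
    next_id = 0
    for entity in entities:
        rep = representative.get(entity, entity)
        if rep not in entity_to_id:
            entity_to_id[rep] = next_id
            next_id += 1
        # Aligned entities point at the ID of their representative.
        entity_to_id[entity] = entity_to_id[rep]
    return entity_to_id


ids = shared_entity_ids(["A", "B", "C", "D", "E"], alignment=[("A", "D")])
print(ids)  # {'A': 0, 'B': 1, 'C': 2, 'D': 0, 'E': 3}
```

The resulting label-to-ID mapping is no longer injective, which is presumably the kind of mapping a factory expecting unique IDs per label would reject.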
For evaluation, you then need something like an align-rank-based evaluator, which is queried with entities from one side and scores the entities from the other side (without using a relation). I have already hacked something together that does this.
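
A rough sketch of what such a rank-based alignment evaluation could look like. This is not the evaluator mentioned above: the dot-product similarity, the optimistic rank definition, and the names `alignment_ranks`, `hits_at_k` and `mean_reciprocal_rank` are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product, used here as a stand-in similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def alignment_ranks(
    left_emb: Dict[str, Vector],
    right_emb: Dict[str, Vector],
    test_pairs: Sequence[Tuple[str, str]],
) -> List[int]:
    """Rank the true match of each left entity among all right entities."""
    ranks = []
    for left, right in test_pairs:
        query = left_emb[left]
        true_score = dot(query, right_emb[right])
        # Optimistic rank: 1 + number of candidates scoring strictly higher.
        better = sum(dot(query, vec) > true_score for vec in right_emb.values())
        ranks.append(1 + better)
    return ranks


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    return sum(rank <= k for rank in ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Toy embeddings, invented purely for illustration.
left_emb = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
right_emb = {"D": [0.9, 0.1], "E": [0.2, 0.8], "F": [0.5, 0.5]}
ranks = alignment_ranks(left_emb, right_emb, test_pairs=[("A", "D"), ("B", "F")])
print(ranks, hits_at_k(ranks, k=1), mean_reciprocal_rank(ranks))  # [1, 2] 0.5 0.75
```

No relation is involved in the scoring; only entity representations from the two sides are compared.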

I am currently working on inductive EA. I already created semi- and fully-inductive datasets from the OpenEA datasets and now I'm trying to get a sensible baseline using NodePiece. I got some results already, but it needs a bit of work still.

InductiveNodePieceEA

If you or @migalkin are interested in collaborating on the inductive EA stuff let me know.
As I said, I am on vacation (until the end of next week), so I might be slow to respond in that period, but I am generally very interested in helping to add EA to PyKEEN!

@mberr
Member Author

mberr commented Apr 14, 2022

Thanks @dobraczka for the quick response despite being on vacation 😅 If you send me your preferred email to max.berrendorf@gmail.com, I'll send you an invite to our PyKEEN Slack.

@dobraczka
Contributor

I like that you already included the possibility to extend this with different combination strategies, which is what I was thinking about, but that can be done in a separate PR.
Do you want to also do this for the OpenEA datasets or should I do that?

# store for repr
self.side = side
# split
training, testing, validation = tf.split(ratios=split_ratios, random_state=random_state)
Contributor

This train/test/val split would generally only make sense for an LP setting, right? In the EA setting, we would want to split the alignment tuples appropriately.
But creating EA-specific settings could also be done in a separate PR.
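
To illustrate the distinction, a split over alignment tuples (instead of over triples) might look like the sketch below. The helper `split_alignment`, its default ratios, and the seed handling are assumptions for illustration and are not part of this PR.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def split_alignment(
    pairs: Sequence[Pair],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle alignment pairs and split them into training, testing, validation.

    Hypothetical helper: the triples of both graphs stay untouched, only the
    known matches are partitioned.
    """
    shuffled = list(pairs)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(ratios[0] * len(shuffled))
    n_test = int(ratios[1] * len(shuffled))
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    # Whatever remains goes to validation.
    validation = shuffled[n_train + n_test:]
    return training, testing, validation


pairs = [(f"left_{i}", f"right_{i}") for i in range(10)]
training, testing, validation = split_alignment(pairs)
print(len(training), len(testing), len(validation))  # 8 1 1
```

In such a setting, the training pairs would feed strategies like swapping or sharing, while the testing and validation pairs would only be used for evaluation.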

@mberr
Member Author

mberr commented Apr 28, 2022

I removed some of the additional changes to reduce this PR to an atomic change. The old version is available here.

@mberr mberr changed the title Extend EA datasets to allow loading a unified graph 🌀🔗 Extend EA datasets to allow loading a unified graph Apr 28, 2022
@mberr
Member Author

mberr commented Apr 28, 2022

@PyKEEN-bot test

@mberr mberr enabled auto-merge (squash) April 28, 2022 16:35
@mberr mberr merged commit f1a77df into master Apr 28, 2022
@mberr mberr deleted the add-merge-option-to-ea-datasets branch April 28, 2022 16:48
4 participants