🌀🔗 Extend EA datasets to allow loading a unified graph #871

mberr · 2022-04-13T13:59:06Z

This PR adds support for loading KGs from entity alignment datasets as a joint graph comprising both individual sides.

mberr · 2022-04-14T15:12:16Z

@dobraczka - any thoughts on this? Maybe also regarding an extension to the OpenEA dataset family, or generally adding a workflow to perform EA as an LP task?

dobraczka · 2022-04-14T16:08:05Z

I can take a detailed look at it next week, since I am on vacation right now. I think treating EA as a special LP task can only get you so far.
Generally, I already did some work in that direction. I used some techniques from the OpenEA package as guidance. The simplest extension is "swapping": for each known match in the training alignment, add triples to the connected graph in which the matched entities are exchanged with each other. For example, if you have the triples:
A rel1 B
B rel2 C
D rel3 E

and know that A == D, you add the triples
D rel1 B
A rel3 E
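
A minimal sketch of the swapping step on plain label triples, to make the example above concrete. The function name `swap_augment` and the tuple-based representation are hypothetical and not part of PyKEEN or OpenEA; the sketch also assumes that only one position (head or tail) is exchanged per added triple.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def swap_augment(triples: Iterable[Triple], alignment: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Return the input triples plus copies in which aligned entities are exchanged."""
    # Build a symmetric lookup: each aligned entity points to its partner.
    partner: Dict[str, str] = {}
    for left, right in alignment:
        partner[left] = right
        partner[right] = left

    original = list(triples)
    seen: Set[Triple] = set(original)
    result = list(original)
    for head, relation, tail in original:
        candidates = []
        if head in partner:
            candidates.append((partner[head], relation, tail))
        if tail in partner:
            candidates.append((head, relation, partner[tail]))
        # Only add triples that are not already present.
        for triple in candidates:
            if triple not in seen:
                seen.add(triple)
                result.append(triple)
    return result


triples = [("A", "rel1", "B"), ("B", "rel2", "C"), ("D", "rel3", "E")]
augmented = swap_augment(triples, alignment=[("A", "D")])
new_triples = augmented[len(triples):]
```

With the three triples from the example and the single match A == D, `new_triples` contains exactly `D rel1 B` and `A rel3 E`.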

The other simple strategy is "sharing", where matched entities get the same ID, but the TriplesFactory does not like that.
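
A rough sketch of how "sharing" could assign IDs, independent of the TriplesFactory: matched entities are merged with a small union-find and each resulting group gets one integer ID. The helper name `shared_entity_ids` is hypothetical, and the sketch assumes every entity in the alignment also appears in the entity list.

```python
from typing import Dict, Iterable, Tuple


def shared_entity_ids(entities: Iterable[str], alignment: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map entity labels to integer IDs such that aligned entities share one ID."""
    parent = {entity: entity for entity in entities}

    def find(entity: str) -> str:
        # Follow parent pointers to the group representative, shortening the path on the way.
        while parent[entity] != entity:
            parent[entity] = parent[parent[entity]]
            entity = parent[entity]
        return entity

    # Merge the groups of each aligned pair.
    for left, right in alignment:
        parent[find(right)] = find(left)

    # Assign consecutive IDs in order of first appearance of each group.
    group_ids: Dict[str, int] = {}
    return {entity: group_ids.setdefault(find(entity), len(group_ids)) for entity in parent}


ids = shared_entity_ids(["A", "B", "C", "D", "E"], alignment=[("A", "D")])
```

Here `A` and `D` end up with the same ID, so the five labels map to only four distinct IDs.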
For evaluation, you then need something like an align-rank-based evaluator that is queried with entities from one side and scores the entities from the other side (without the use of a relation). I already hacked something together that does this.
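
To illustrate what such a rank-based alignment evaluation might look like, here is a small sketch. It is not the evaluator mentioned above; it assumes entity embeddings are given as plain lists of floats, scores candidates by squared Euclidean distance, and uses optimistic ranks. All names (`alignment_ranks`, `left_emb`, `right_emb`) are made up for the example.

```python
from typing import Dict, List, Sequence, Tuple


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def alignment_ranks(
    left_emb: Dict[str, Sequence[float]],
    right_emb: Dict[str, Sequence[float]],
    test_alignment: Sequence[Tuple[str, str]],
) -> List[int]:
    """Rank the true right-side match of each left-side query among all right-side entities."""
    ranks = []
    for left, right in test_alignment:
        query = left_emb[left]
        true_distance = squared_distance(query, right_emb[right])
        # Optimistic rank: 1 + number of candidates strictly closer than the true match.
        closer = sum(1 for vector in right_emb.values() if squared_distance(query, vector) < true_distance)
        ranks.append(closer + 1)
    return ranks


left_emb = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
right_emb = {"D": [0.1, 0.0], "E": [1.0, 0.9], "F": [0.0, 0.05]}
ranks = alignment_ranks(left_emb, right_emb, test_alignment=[("A", "D"), ("B", "E")])

mean_reciprocal_rank = sum(1.0 / rank for rank in ranks) / len(ranks)
hits_at_1 = sum(rank <= 1 for rank in ranks) / len(ranks)
```

No relation is involved: each query entity is compared directly against all candidates from the other side, and the usual rank-based metrics are computed from the resulting ranks.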

I am currently working on inductive EA. I already created semi- and fully-inductive datasets from the OpenEA datasets and now I'm trying to get a sensible baseline using NodePiece. I got some results already, but it needs a bit of work still.

If you or @migalkin are interested in collaborating on the inductive EA stuff let me know.
As I said, I am on vacation (until the end of next week), so I might be slow to respond in that period, but I am generally very interested in helping to add EA to PyKEEN!

mberr · 2022-04-14T16:15:01Z

Thanks @dobraczka for the quick response despite being on vacation 😅 If you send me your preferred email to max.berrendorf@gmail.com, I'll send you an invite to our PyKEEN Slack.

dobraczka · 2022-04-25T13:26:02Z

I like that you already included the possibility to extend this with different combination strategies, which is what I was thinking about, but that can be done in a separate PR.
Do you want to also do this for the OpenEA datasets or should I do that?

dobraczka · 2022-04-27T13:21:59Z

src/pykeen/datasets/ea/base.py

+        # store for repr
+        self.side = side
+        # split
+        training, testing, validation = tf.split(ratios=split_ratios, random_state=random_state)


This train/test/validation split would generally only make sense for an LP setting, right? In the EA setting, we would want to split the alignment tuples instead.
But creating EA-specific settings could also be done in a separate PR.
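
For illustration, splitting the alignment pairs rather than the triples could look roughly like the following. This is only a sketch with a hypothetical `split_alignment` helper, not code from this PR; it shuffles the pairs with a fixed seed and returns them in the same training/testing/validation order as the `tf.split` call above.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def split_alignment(
    alignment: Sequence[Pair],
    ratios: Tuple[float, float] = (0.8, 0.1),
    random_state: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle the alignment pairs and split them into training, testing, and validation parts."""
    pairs = list(alignment)
    random.Random(random_state).shuffle(pairs)
    n_training = int(ratios[0] * len(pairs))
    n_testing = int(ratios[1] * len(pairs))
    training = pairs[:n_training]
    testing = pairs[n_training:n_training + n_testing]
    # Everything that is left over goes to validation.
    validation = pairs[n_training + n_testing:]
    return training, testing, validation


alignment = [(f"left_{i}", f"right_{i}") for i in range(10)]
training, testing, validation = split_alignment(alignment)
```

The triples of both graphs would then stay fully available for training, and only the known matches are held out for evaluation.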

mberr · 2022-04-28T14:21:49Z

I removed some of the additional changes to reduce this PR to an atomic change. The old version is available here.

mberr · 2022-04-28T16:34:36Z

@PyKEEN-bot test

mberr added 6 commits April 13, 2022 15:57

allow to load both sides

f090397

add graph combination option

6d01318

use triple alignments, too

f30ed33

add missing paths

d943dfa

add todo

3dd4e1d

trigger ci

3d53270

mberr added 14 commits April 25, 2022 17:40

move EA datasets to own module

fbc8334

move combination into own submodule

4089400

heavy refactoring

5aac3d8

update OpenEA

cd93c1d

re-use utility

843a9aa

load OpenEA entity alignments

dfd15c9

trigger ci

6eaaaea

move sides to pykeen.typing

579fc60

move get_connected_components to utils

f5aae84

fix imports

9b20fc5

add collapse combinator

8be9ff7

update

a49465f

Merge remote-tracking branch 'origin/master' into add-merge-option-to-ea-datasets

3ca72db

add more todo

2d798ac

mberr mentioned this pull request Apr 26, 2022

Add param to pass kwargs to download_from_google cthoyt/pystow#41

Merged

mberr added 4 commits April 26, 2022 17:50

Merge branch 'master' into add-merge-option-to-ea-datasets

d2b3170

check hexdigest of GDrive file using pystow

e3e48aa

trigger ci

keep alignment info

376f0f8

add missing docstring

883d9fe

trigger ci

dobraczka reviewed Apr 27, 2022

mberr added 2 commits April 28, 2022 16:18

remove multiple options

f698468

revert changes to utility

5caaac0

trigger ci

b320aeb

mberr changed the title ~~Extend EA datasets to allow loading a unified graph~~ 🌀🔗 Extend EA datasets to allow loading a unified graph Apr 28, 2022

mberr added 5 commits April 28, 2022 16:45

Python 3.7 compatibility

f4f11b6

trigger ci

Python 3.7 compatibility

c9bd7b6

trigger ci

fix double import

86bc86e

hide abstract base class

e7ea81a

trigger ci

trigger ci

a0f4198

mberr mentioned this pull request Apr 28, 2022

🧛🇪🇺 Implement more graph pair unification approaches #893

Merged

1 task

cthoyt approved these changes Apr 28, 2022

Merge branch 'master' into add-merge-option-to-ea-datasets

e33d239

Trigger CI

f934239

mberr enabled auto-merge (squash) April 28, 2022 16:35

mberr merged commit f1a77df into master Apr 28, 2022

mberr deleted the add-merge-option-to-ea-datasets branch April 28, 2022 16:48
