Add ConceptNet #160
Closes #15
Not necessarily required for this PR, but it might be interesting to allow keeping the metadata for triples, e.g. to support qualifiers, cf. https://www.aclweb.org/anthology/2020.emnlp-main.596/
@mberr this PR is technically ready, but the implementation of dataset cleanup using
Is this for randomized cleanup? This one moves one triple at a time, right?
@cthoyt I think I have an idea for another cleanup implementation 🙂
I removed the blocked label since #187 has been merged to
@PyKEEN-bot test
Trigger CI
@@ -49,6 +51,8 @@
'WN18',
'WN18RR',
'YAGO310',
'DRKG',
This is an unrelated change, right?
Correct, but it was missing and this is a dataset-related PR, so I threw it in there (the docs weren't showing it because I forgot to include it in `__all__`).
This PR adds the ConceptNet dataset to PyKEEN. It is distributed as a single gzipped TSV file, so it gets automatically split into training/testing/validation sets. This PR required a small extension to the `SingleTabbedDataset` base class to allow for the specification of `usecols`, since there are 5 columns in the file (edge URL, relation, head, tail, metadata).

CC @isspek this could be useful for you given your interest in #2
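For readers unfamiliar with column selection, here is a minimal sketch of the idea behind `usecols`, written with plain pandas rather than PyKEEN's own loader. The helper name `load_triples`, the sample rows, and the default column indices are assumptions made for illustration; they follow the column order listed above (edge URL, relation, head, tail, metadata) and are not taken from the PR's code.

```python
import csv
import io

import pandas as pd

# Made-up rows in the five-column layout described above:
# edge URL, relation, head, tail, metadata (JSON).
SAMPLE_TSV = (
    "/a/1\t/r/IsA\t/c/en/cat\t/c/en/animal\t{\"weight\": 1.0}\n"
    "/a/2\t/r/PartOf\t/c/en/wheel\t/c/en/car\t{\"weight\": 2.0}\n"
)


def load_triples(buffer, usecols=(1, 2, 3)):
    """Hypothetical helper: read only the relation, head, and tail columns."""
    df = pd.read_csv(
        buffer,
        sep="\t",
        header=None,             # the file has no header row
        usecols=list(usecols),   # skip the edge URL and metadata columns
        dtype=str,
        quoting=csv.QUOTE_NONE,  # treat quotes inside the JSON column literally
    )
    # pandas keeps the selected columns in file order (relation, head, tail),
    # so reorder explicitly to (head, relation, tail).
    return df[[2, 1, 3]].to_numpy()


triples = load_triples(io.StringIO(SAMPLE_TSV))
print(triples)
```

The explicit reordering matters because pandas ignores the order of the indices passed to `usecols`; without it, the relation would end up in the first position.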
Results
Caveats
There's a small concern that the library has some relations ending in `_inverse`, which PyKEEN interprets in a special way and throws this warning (when running `python -m pykeen.datasets.conceptnet` to make a summary):
TODO
`_inverse` suffixes
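To get a feel for how many relations are affected by the `_inverse` caveat, something like the following sketch could be used. It is plain Python and does not call any PyKEEN API; the helper `find_suffixed_relations` and the relation labels in the sample are hypothetical and only illustrate the check.

```python
INVERSE_SUFFIX = "_inverse"  # the suffix named in the caveat above


def find_suffixed_relations(triples, suffix=INVERSE_SUFFIX):
    """Hypothetical helper: return the sorted relation labels ending in `suffix`."""
    return sorted({relation for _, relation, _ in triples if relation.endswith(suffix)})


# Made-up (head, relation, tail) triples; the labels are placeholders,
# not actual ConceptNet relations.
sample_triples = [
    ("a", "related_to", "b"),
    ("b", "example_inverse", "a"),
    ("c", "example_inverse", "d"),
    ("d", "part_of", "e"),
]

suspicious = find_suffixed_relations(sample_triples)
print(suspicious)
```

A list like this could then inform whether the affected relations should be renamed, dropped, or left alone with the warning.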