
Document PyKEEN's performance tweaks #56

Merged (17 commits) on Aug 12, 2020


cthoyt (Member) commented Jul 21, 2020

Closes #55

This documentation should give non-technical users a high-level explanation of what in PyKEEN makes it perform better than other libraries. A good place to start would be copying over the material @lvermue wrote for the software paper:

  • Alignment of entity/relation IDs with embedding vector positions
  • Automatic memory optimization
  • Sub-batching
  • Improvements enabled by separate implementations of score_ht, score_rt, score_hr, and score_hrt
  • Filtering with index-based masking
  • Fast OWA (not implemented yet)
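To illustrate how the first and fifth items interact, here is a minimal plain-Python sketch of filtered ranking with index-based masking. It assumes that the scores for all candidate tail entities of a single (head, relation) query are stored in a list whose positions are the entity IDs, that higher scores are better, and that the IDs of the other known true tails are available. The helper name `filtered_rank` is hypothetical; this is not PyKEEN's actual implementation.

```python
import math
from typing import Iterable, List


def filtered_rank(scores: List[float], true_id: int, known_true_ids: Iterable[int]) -> int:
    """Rank of ``true_id`` after masking other known true candidates (hypothetical helper).

    Assumes ``scores[i]`` is the score of the candidate entity with ID ``i``
    and that higher scores are better.
    """
    masked = list(scores)  # copy so the caller's scores stay untouched
    for idx in known_true_ids:
        # Index-based masking: remove other true answers by position,
        # but never mask the candidate that is being evaluated.
        if idx != true_id:
            masked[idx] = -math.inf
    target = masked[true_id]
    # Optimistic rank: 1 + number of candidates scoring strictly higher.
    return 1 + sum(1 for s in masked if s > target)


if __name__ == "__main__":
    # Scores for tail entities 0..4 of one (head, relation) query.
    scores = [0.9, 0.2, 0.8, 0.5, 0.7]
    # Entity 2 is the test tail; entity 0 is another true tail seen in training.
    print(filtered_rank(scores, true_id=2, known_true_ids=[0, 2]))  # 1
```

Without filtering, entity 2 would be ranked second because entity 0 scores higher. Masking by position only works because entity IDs coincide with positions in the score vector, which is why the alignment in the first item matters.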

Any other ideas?

lvermue (Member) commented Jul 24, 2020

@mberr I think this covers most of what we've done so far.
The formatting and notation definitely still need some fine-tuning.
Anyway, any comments are welcome :)

@cthoyt Your scrutiny is welcome as well :)

Update description of TriplesFactory
mberr (Member) commented Jul 25, 2020

I revised the first part about the TriplesFactory / ID-based triple representation. It would be nice to properly link to the properties of TriplesFactory, such as entity_label_to_id.

mberr (Member) commented Jul 25, 2020

I am not sure whether this is the correct place, but somewhere we should also highlight the risk of manually modifying label-to-ID mappings, and the necessity of keeping the mappings consistent between train/test/validation.
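To make the consistency risk concrete, here is a small plain-Python sketch. The helpers `build_entity_to_id` and `to_ids` are hypothetical and are not the TriplesFactory API. For brevity, relations are left as labels and only entities are mapped. It shows that building a mapping independently for each split can assign different IDs to the same label. Reusing the training mapping keeps the IDs, and therefore the embedding rows they index, aligned.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def build_entity_to_id(triples: List[Triple]) -> Dict[str, int]:
    """Hypothetical helper: assign consecutive IDs to entity labels in sorted order."""
    labels = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    return {label: i for i, label in enumerate(labels)}


def to_ids(triples: List[Triple], entity_to_id: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """Map head/tail labels to IDs; raises KeyError for labels missing from the mapping."""
    return [(entity_to_id[h], r, entity_to_id[t]) for h, r, t in triples]


train = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
test = [("bob", "knows", "carol")]

shared = build_entity_to_id(train)      # built once, reused for every split
independent = build_entity_to_id(test)  # built from the test split alone

print(shared["bob"], independent["bob"])  # 1 0: same label, different IDs
print(to_ids(test, shared))  # [(1, 'knows', 2)]
```

With the independent mapping, "bob" in the test split would point at the embedding row that the model learned for "alice" during training.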

cthoyt (Member, Author) commented Jul 25, 2020

@mberr I've written a bit about this in the "bring your own data" tutorial and also made some improvements to it in #54, which hasn't been merged yet.

cthoyt added the documentation label Jul 26, 2020
lvermue and others added 3 commits July 27, 2020 01:07
I made some improvements to the language and started to improve the notation (it was pretty confusing before with all of the asterisks, since RST interpreted them as italics)
cthoyt marked this pull request as ready for review August 6, 2020 12:44
cthoyt requested a review from mberr August 6, 2020 12:44

cthoyt (Member, Author) commented Aug 6, 2020

@lvermue I made edits for clarity and made the notation more consistent. Could you look at the algorithm at the bottom of "Filtering with Index-based Masking" and decide whether it needs more notation or is fine as plain text?

cthoyt requested review from lvermue and mali-git August 6, 2020 12:45

cthoyt (Member, Author) commented Aug 12, 2020

@mali-git @mberr thanks for looking at this, but let's wait for @lvermue to confirm he's happy with it.

cthoyt added this to the PyKEEN 1.0.3 milestone Aug 12, 2020
cthoyt merged commit 22f8815 into master Aug 12, 2020
cthoyt deleted the add-peformance-explanation branch August 12, 2020 23:51
Labels: documentation
Linked issue: High-level description of performance optimizations
4 participants