Improve vectorization of novelty computation #51

Merged
merged 64 commits into from Aug 12, 2020
843c845
Improve vectorization
mberr Jul 12, 2020
850abe2
Improve docstring
mberr Jul 12, 2020
54c720a
Extract function from method
mberr Jul 14, 2020
9b5c602
Add unittest for get_novelty_mask
mberr Jul 14, 2020
0237a1a
Fix line-too-long
mberr Jul 14, 2020
ae2a3f7
Add ability to include testing novelties
cthoyt Jul 15, 2020
ae9f78c
Add tests for prediction
cthoyt Jul 15, 2020
6ec3959
Merge branch 'master' into improve-novelty-computation
cthoyt Jul 15, 2020
b91425b
Add more tests
cthoyt Jul 16, 2020
0a34fc3
Add utility to predict high-score triples
mberr Jul 22, 2020
065b761
Adjust warning message
mberr Jul 22, 2020
e20d891
Use itertools to combine independent nested loops
cthoyt Jul 22, 2020
6bbc8d5
Add docs
cthoyt Jul 22, 2020
83de09f
Inline variable
mberr Jul 23, 2020
ed9b49a
Rename function and make k Optional
mberr Jul 23, 2020
7b122b7
Add warnings
mberr Jul 23, 2020
ad23942
Allow computing scores for all triples
mberr Jul 23, 2020
9be760f
Fix typo
mberr Jul 23, 2020
0782eca
Fix docstring
mberr Jul 23, 2020
8de7d79
Fix docstring
mberr Jul 23, 2020
004c0fe
Change variable name for consistency
mberr Jul 23, 2020
9c3b181
Fix accumulation device when scoring all triples
mberr Jul 24, 2020
c05b0e2
Actually fix device
mberr Jul 24, 2020
e75dc3e
Improve docstring
mberr Jul 24, 2020
86e2fb1
do not track gradients for score_all_triples
mberr Jul 24, 2020
8bb8742
Fix documentation generation
cthoyt Jul 26, 2020
320a12b
Fix name
cthoyt Jul 26, 2020
50e27e7
Add additional test
cthoyt Jul 26, 2020
a1ebd8c
Reorganize test pipeline
cthoyt Jul 26, 2020
0ce09c7
Update test_models.py
cthoyt Jul 26, 2020
83d6309
Add function for labeling tensor
cthoyt Jul 26, 2020
eafb67f
Pass flake8
cthoyt Jul 26, 2020
a2e285c
Return scores
mberr Jul 27, 2020
4ff0853
Add dictionary inversion utility
mberr Jul 27, 2020
39b51ed
Add invert_mapping to __all__
mberr Jul 10, 2020
42d339a
Create properties for ID-to-label conversion
mberr Jul 27, 2020
ea08a4b
Accelerate tensor_to_df
mberr Jul 27, 2020
18b6309
Add unittest
mberr Jul 27, 2020
2b4e79c
Fix unittest
mberr Jul 27, 2020
2fc3b28
Extend unittest
mberr Jul 27, 2020
988077c
Fix unittest
mberr Jul 27, 2020
5d6d2d4
Fix case where number of elements in batch is smaller than k
mberr Jul 27, 2020
d88b7a4
Improved input validation
mberr Jul 27, 2020
4e148a4
Fix line-too-long
mberr Jul 30, 2020
1068033
Merge branch 'master' into improve-novelty-computation
cthoyt Aug 11, 2020
4b98d1d
Fix tuple addition
cthoyt Aug 12, 2020
b478613
Make sure scores get sorted too!
cthoyt Aug 12, 2020
4e9426b
Return sweet sweet dataframes
cthoyt Aug 12, 2020
0cbee80
Update docs
cthoyt Aug 12, 2020
05db6d6
Factor out update logic
cthoyt Aug 12, 2020
fcf2956
Update docs
cthoyt Aug 12, 2020
3f16ab8
Update base.py
mberr Aug 12, 2020
793c59c
Fix dataframe processing
cthoyt Aug 12, 2020
9386a5d
Begin implementing df postprocessing for score_all_triples
cthoyt Aug 12, 2020
e66bace
Add predict all to notebook
cthoyt Aug 12, 2020
322d0a6
Add .values
mberr Aug 12, 2020
d9b5f66
Fix type annotation
mberr Aug 12, 2020
e932893
Add quick and dirty solution for get_novelty_all_mask
mberr Aug 12, 2020
da26ae4
Fix returns
cthoyt Aug 12, 2020
469064f
Fix tuple checking
cthoyt Aug 12, 2020
3e5f029
Update notebooks
cthoyt Aug 12, 2020
53d8fd4
Improve cosmetics of membership check
cthoyt Aug 12, 2020
9b625f3
Add more tests becuase I have no idea what's wrong
cthoyt Aug 12, 2020
4a7cf20
Fix negation
cthoyt Aug 12, 2020
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ PyKEEN
tutorial/understanding_evaluation
tutorial/running_hpo
tutorial/using_mlflow
+tutorial/making_predictions

.. toctree::
:caption: Reference
13 changes: 4 additions & 9 deletions docs/source/reference/models.rst
@@ -6,15 +6,10 @@ Models

Base Classes
------------
-.. currentmodule:: pykeen.models.base
-.. autosummary::
-    :toctree: generated/
-
-    Model
-    EntityEmbeddingModel
-    EntityRelationEmbeddingModel
-    MultimodalModel
-
+.. automodapi:: pykeen.models.base
+    :no-inheritance-diagram:
+    :no-heading:
+    :headings: ~~

Initialization
--------------
43 changes: 43 additions & 0 deletions docs/source/tutorial/making_predictions.rst
@@ -0,0 +1,43 @@
Novel Link Prediction
=====================
After training, an interaction model (e.g., TransE, ConvE, RotatE) can assign a score to an arbitrary triple,
regardless of whether it appeared in the training set, the testing set, or neither. In PyKEEN, each model is
implemented such that the higher (or less negative) the score, the more likely the triple is to be true.
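
To make the "higher score means more plausible" convention concrete, here is a minimal sketch of a TransE-style
score (the negative distance between ``head + relation`` and ``tail``) in plain Python. It is not PyKEEN's
implementation; the function name ``transe_score`` and the two-dimensional embeddings are invented for illustration.

```python
import math


def transe_score(head, relation, tail):
    """Return the negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical embeddings, chosen by hand for this example only.
head = [0.0, 1.0]
relation = [1.0, 0.0]
close_tail = [1.0, 1.5]  # near head + relation
far_tail = [4.0, 5.0]    # far from head + relation

good = transe_score(head, relation, close_tail)
bad = transe_score(head, relation, far_tail)

# The less negative score belongs to the more plausible triple.
print(good > bad)
```

Both scores are negative here, which is why the text speaks of "less negative" scores rather than of large
positive ones.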

However, for most models, these scores do not have obvious statistical interpretations. This has two main consequences:

1. The score for a triple from one model cannot be compared to the score for that triple from another model.
2. There is no *a priori* minimum score for a triple to be labeled as true, so predictions must be given as
   a prioritization obtained by sorting a set of triples by their respective scores.
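
The two consequences above can be sketched with plain Python. The scores below are made up and do not come from
any trained model, and ``rank_triples`` is a hypothetical helper rather than a PyKEEN function. The two score
scales differ completely, so only the resulting order of the triples is meaningful.

```python
def rank_triples(scores):
    """Return triples ordered from most to least plausible (highest score first)."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [triple for triple, _ in ordered]


# Hypothetical scores for the same candidate triples from two different models.
model_a_scores = {
    ("brazil", "intergovorgs", "uk"): -0.3,
    ("brazil", "intergovorgs", "usa"): -1.7,
    ("brazil", "intergovorgs", "cuba"): -4.2,
}
model_b_scores = {
    ("brazil", "intergovorgs", "uk"): 12.5,
    ("brazil", "intergovorgs", "usa"): 8.1,
    ("brazil", "intergovorgs", "cuba"): 0.4,
}

ranking_a = rank_triples(model_a_scores)
ranking_b = rank_triples(model_b_scores)

# The raw numbers are on unrelated scales, yet the prioritization agrees.
print(ranking_a == ranking_b)
```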

After training a model, there are three high-level interfaces for making predictions:

1. :func:`pykeen.models.Model.predict_tails` for a given head/relation pair
2. :func:`pykeen.models.Model.predict_heads` for a given relation/tail pair
3. :func:`pykeen.models.Model.score_all_triples` for prioritizing links

Scientifically, :func:`pykeen.models.Model.score_all_triples` is the most interesting of the three in a scenario
where predictions can be tested and validated experimentally.

.. code-block:: python

    from pykeen.pipeline import pipeline

    # Train a RotatE model on the Nations dataset
    results = pipeline(dataset='Nations', model='RotatE')
    model = results.model

    # Predict tails for a given head/relation pair
    predicted_tails_df = model.predict_tails('brazil', 'intergovorgs')

    # Predict heads for a given relation/tail pair
    predicted_heads_df = model.predict_heads('conferences', 'brazil')

    # Score all triples
    predictions_df = model.score_all_triples()
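
Since the goal is *novel* link prediction, scored triples are usually separated into those already known and those
that are new. The following is a minimal sketch of such a check, assuming "novel" means "absent from the training
triples". It uses plain Python sets rather than the vectorized routine in PyKEEN, and ``novelty_mask`` and the
example triples are hypothetical.

```python
def novelty_mask(candidates, known_triples):
    """Return one boolean per candidate: True if the triple is not already known."""
    known = set(known_triples)
    return [triple not in known for triple in candidates]


# Hypothetical training triples and scored candidates.
training = [
    ("brazil", "intergovorgs", "uk"),
    ("brazil", "conferences", "usa"),
]
candidates = [
    ("brazil", "intergovorgs", "uk"),     # already in training
    ("brazil", "intergovorgs", "india"),  # new
    ("cuba", "conferences", "brazil"),    # new
]

mask = novelty_mask(candidates, training)
novel = [triple for triple, is_novel in zip(candidates, mask) if is_novel]
print(mask)
```

The same check could be repeated against the testing triples when those should also count as known.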


Potential Caveats
-----------------
The model is trained on its ability to predict the appropriate tail for a given head/relation pair as well as its
ability to predict the appropriate head for a given relation/tail pair. This means that while the model can
technically predict relations between a given head/tail pair, such predictions come with the caveat that the
model was not trained for this task.