This PR contains functionality and analysis notebooks for a deeper dive into the LP performance of BioBLP-X vs RotatE models, as a function of the node degree of the entity being predicted. It inspects whether attribute encodings help the BioBLP models obtain better representations than RotatE models for entities in sparser regions of the graph (where those entities have fewer incoming/outgoing edges).
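As a rough illustration of the kind of analysis described above, the sketch below counts node degree from a list of triples and averages reciprocal ranks per degree bucket. It uses only the standard library. The function names, the toy triples, and the bucket edges are hypothetical and are not taken from `notebooks/nb_utils/eval_utils.py`; in practice the ranks would come from the model evaluation rather than being hard-coded.

```python
from collections import Counter, defaultdict
from statistics import mean


def node_degrees(triples):
    """Count in-degree plus out-degree for every entity in (head, relation, tail) triples."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1  # outgoing edge of the head entity
        degree[tail] += 1  # incoming edge of the tail entity
    return degree


def mrr_by_degree_bucket(ranks, degree, bucket_edges):
    """Average reciprocal rank per degree bucket of the predicted entity.

    ranks: iterable of (entity, rank) pairs with rank >= 1 (assumed input format)
    bucket_edges: ascending upper bounds, e.g. [1, 10, 100]
    """
    buckets = defaultdict(list)
    for entity, rank in ranks:
        d = degree.get(entity, 0)
        # First bucket whose upper bound covers the degree, else the overflow bucket.
        label = next(
            (f"<={edge}" for edge in bucket_edges if d <= edge),
            f">{bucket_edges[-1]}",
        )
        buckets[label].append(1.0 / rank)
    return {label: mean(values) for label, values in buckets.items()}


# Toy data, purely illustrative.
triples = [
    ("drugA", "targets", "prot1"),
    ("drugA", "targets", "prot2"),
    ("drugA", "treats", "dis1"),
    ("drugB", "targets", "prot1"),
]
degree = node_degrees(triples)
ranks = [("drugA", 1), ("drugB", 4), ("prot2", 2)]
result = mrr_by_degree_bucket(ranks, degree, bucket_edges=[1, 2])
```

Comparing the per-bucket values of two models side by side is one simple way to see whether one of them holds up better on low-degree entities.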
Introduces:
`notebooks/nb_utils/eval_utils.py`) which parameterises the model and the entity type being analysed.
`head`, `tail`, or `both` sides of a triple (using pykeen's evaluation modules)
Steps to conduct the Analysis of the effect of Node Degree... below on how to wire this up)

Note: The expectation is that `notebooks/nb_utils/eval_utils.py` will be subsumed within the bioblp package in a future commit, once the immediate priorities of paper deadlines are in the past. In the interim, the support functionality lives within `nb_utils`.
Steps to conduct the Analysis of the effect of Node Degree on LP performance for model of choice:
Future Work (Things that need to be changed):
`NodeDegreeAnalyser` (in `notebooks/nb_utils/eval_utils.py`) and the plotting functionality have confusing parameter names such as `node_endpoint_type` and `eval_on_node_endpoint`. The former refers to the position, within the triples, of the attributed entity that we are analysing. The latter refers to whether evaluation metrics are obtained while predicting `head`, `tail`, or `both`. Currently it is difficult to tell at a glance whether we are talking about predicting the `head`/`tail` of a triple or about the position in which a certain entity type such as drug/protein/disease occurs.
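To make the distinction between the two parameters concrete, here is a minimal sketch that keeps the two concerns in separate helpers. It is not the actual `NodeDegreeAnalyser` code: the helper names, the `entity_types` mapping, and the toy triples are hypothetical, and it assumes that `node_endpoint_type` takes the values `head` or `tail` while `eval_on_node_endpoint` takes `head`, `tail`, or `both`.

```python
# Hypothetical illustration of the two concepts; not the repository's implementation.

# Which side(s) of a triple are predicted when computing metrics.
PREDICTION_SIDES = {"head": ("head",), "tail": ("tail",), "both": ("head", "tail")}


def triples_with_entity_at(triples, entity_types, entity_type, node_endpoint_type):
    """Keep triples where an entity of `entity_type` occupies the given position.

    This mirrors the meaning of `node_endpoint_type`: WHERE the entity type occurs.
    """
    index = {"head": 0, "tail": 2}[node_endpoint_type]
    return [t for t in triples if entity_types.get(t[index]) == entity_type]


def prediction_sides(eval_on_node_endpoint):
    """Return the sides to predict, mirroring the meaning of `eval_on_node_endpoint`."""
    return PREDICTION_SIDES[eval_on_node_endpoint]


# Toy data, purely illustrative.
triples = [
    ("drugA", "targets", "prot1"),
    ("prot1", "associated_with", "dis1"),
    ("drugB", "treats", "dis1"),
]
entity_types = {
    "drugA": "drug",
    "drugB": "drug",
    "prot1": "protein",
    "dis1": "disease",
}

# Triples in which a drug appears in the head position ...
drug_head_triples = triples_with_entity_at(triples, entity_types, "drug", "head")
# ... evaluated while predicting both sides of each triple.
sides = prediction_sides("both")
```

Separating "where the entity type occurs" from "which side is predicted" in this way suggests one possible direction for the renaming, for example names that mention the entity position and the prediction side explicitly.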