Eval node degree #42

pmitra01 · 2023-04-20T13:02:44Z

This PR adds functionality and analysis notebooks for a deeper dive into the link prediction (LP) performance of BioBLP-X vs. RotatE models as a function of the node degree of the entity being predicted. It examines whether attribute encodings help the BioBLP models obtain better entity representations than RotatE in sparser regions of the graph (where those entities have a lower degree, i.e. fewer incoming/outgoing edges).
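
To make the idea concrete, here is a minimal sketch of a degree-bucketed evaluation. It is not the code in this PR: the function names, the bucket edges, and the toy triples and ranks are all assumptions made up for illustration. It counts each entity's degree from a list of triples, then averages the reciprocal rank of the predicted entities per degree bucket.

```python
from collections import Counter, defaultdict


def node_degrees(triples):
    """Count in- plus out-degree for every entity in (head, relation, tail) triples."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def mrr_by_degree_bucket(ranks, degree, bucket_edges):
    """Return the mean reciprocal rank per degree bucket.

    ranks: iterable of (entity, rank) pairs, with rank starting at 1.
    bucket_edges: sorted inclusive upper bounds; larger degrees fall in a final bucket.
    """
    reciprocal = defaultdict(list)
    for entity, rank in ranks:
        d = degree.get(entity, 0)
        label = next(
            (f"<={edge}" for edge in bucket_edges if d <= edge),
            f">{bucket_edges[-1]}",
        )
        reciprocal[label].append(1.0 / rank)
    return {label: sum(vals) / len(vals) for label, vals in reciprocal.items()}


# Toy data (hypothetical): drugA is well connected, drugB is sparse.
train_triples = [
    ("drugA", "targets", "prot1"),
    ("drugA", "targets", "prot2"),
    ("drugA", "treats", "dis1"),
    ("drugB", "targets", "prot1"),
]
degree = node_degrees(train_triples)
ranks = [("drugA", 1), ("drugB", 4), ("prot1", 2)]  # hypothetical model ranks
result = mrr_by_degree_bucket(ranks, degree, bucket_edges=[1, 2])
print(result)
```

Comparing such per-bucket scores between two models is one way to see whether the gap widens for low-degree entities.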

Introduces:

* helper code to conduct the node degree evaluation (notebooks/nb_utils/eval_utils.py), parameterised by the model and the entity type being analysed.
* retrieval and saving of fuller evaluation metrics for predicting the head, the tail, or both sides of a triple (using pykeen's evaluation modules).
* plotting functionality.
* an artifact registry to simplify the handling of paths and the chained set of artifacts associated with any model or entity type. (See "Steps to conduct the Analysis of the effect of Node Degree..." below on how to wire this up.)
* notebook /notebooks/07_04_01_eval_lp_node_degree_effect.ipynb to conduct node degree analysis of the LP performance of the models of one's choosing.
* notebook 07_04_03_eval_lp_node_degree_effect-plot.ipynb for regenerating the node degree evaluation analysis.

Prerequisites for regenerating plots (also mentioned within notebook):
* Get the JSON files with the evaluation metrics by node degree from Google Drive.
* Unpack the above archive and place it in the directory BioBLP/notebooks/metrics.

Note: notebooks/nb_utils/eval_utils.py is expected to be subsumed into the bioblp package in a future commit, once the immediate paper deadlines have passed. In the interim, the support functionality lives in nb_utils.

Steps to conduct the Analysis of the effect of Node Degree on LP performance for model of choice:

1. Fetch and unpack the data: biokg_eval_data.tar.gz (current drive link). This contains the metadata files that you will need to perform the evaluation. Most of these data files are likely already available in your workspace, but I'm shipping the entire bundle just in case.

2. Update artifact_registry.toml (current view) with the correct paths to the parent directories for the pretrained KGE models and the biokg graph data artifacts. Note: the artifact registry should, in theory, auto-resolve all paths to the relevant child artifacts relative to these parent directories.

3. Run the notebook BioBLP/notebooks/07_04_01_eval_lp_node_degree_effect.ipynb to conduct node degree analysis of the LP performance of the models of your choice.
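
The path resolution described in the registry step can be sketched roughly as follows. This is an assumption-laden illustration, not the registry shipped in this PR: the dictionary stands in for the parsed contents of artifact_registry.toml, and every key, directory, and file name in it is hypothetical.

```python
from pathlib import Path

# Hypothetical registry contents: two user-edited parent directories, plus
# child artifacts stored as (parent key, path relative to that parent).
REGISTRY = {
    "parents": {
        "models": "/data/models",
        "biokg": "/data/biokg",
    },
    "artifacts": {
        "rotate_model": ("models", "rotate/trained_model.pkl"),
        "test_triples": ("biokg", "splits/test.tsv"),
    },
}


def resolve(registry, name):
    """Resolve a child artifact to a full path under its parent directory."""
    parent_key, relative = registry["artifacts"][name]
    return Path(registry["parents"][parent_key]) / relative


model_path = resolve(REGISTRY, "rotate_model")
print(model_path.as_posix())
```

With a layout like this, only the two parent paths need editing on a new machine; every child path follows from them.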

Future Work (Things that need to be changed):

The node degree analysis, specifically the methods of the class NodeDegreeAnalyser (in notebooks/nb_utils/eval_utils.py) and the plotting functionality, has confusing parameter names such as node_endpoint_type and eval_on_node_endpoint. The former refers to the position in the triples of the entity with an attribute that we are analysing; the latter refers to whether evaluation metrics are obtained while predicting the head, the tail, or both. Currently it is difficult to tell at a glance whether we are talking about predicting the head/tail of a triple or about the position in which a certain entity type such as drug/protein/disease occurs.
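
One possible way to separate the two concepts, sketched here purely as a suggestion, is to give each its own enum type. None of these names exist in the current code: EntityPosition, PredictionSide, and NodeDegreeEvalConfig are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EntityPosition(Enum):
    """Where the entity type under analysis sits in a triple."""
    HEAD = "head"
    TAIL = "tail"


class PredictionSide(Enum):
    """Which side of the triple the model is asked to predict."""
    HEAD = "head"
    TAIL = "tail"
    BOTH = "both"


@dataclass(frozen=True)
class NodeDegreeEvalConfig:
    """Hypothetical bundle of the settings for one node degree evaluation run."""
    entity_type: str                  # e.g. "drug", "protein", "disease"
    entity_position: EntityPosition   # position of that entity type in the triple
    prediction_side: PredictionSide   # side(s) being predicted during evaluation


# Drugs occurring as heads, evaluated while predicting the tail.
config = NodeDegreeEvalConfig("drug", EntityPosition.HEAD, PredictionSide.TAIL)
print(config)
```

Because the two enums are distinct types, a call site that mixes them up stands out immediately, even though both use the words head and tail.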

pmitra01 · 2023-04-24T12:33:25Z

Updated the PR to point to the develop branch instead of the accidental pointer to main. The 70 commits ahead of main gave me a mini yikes; now we are back to 8 commits ahead of the source branch develop.

pmitra01 force-pushed the eval-node-degree branch from 2f5f939 to 7c210c2 Compare April 24, 2023 08:44

pmitra01 changed the base branch from main to develop April 24, 2023 12:30

pmitra01 requested a review from dfdazac April 24, 2023 12:34

pmitra01 and others added 9 commits April 24, 2023 17:40

add artifactory registry and utils

c6c8568

refactor eval analysis

19f874c

refactor node degree eval

3504d9d

add test and functions for plots

0c72b55

update model paths

6cf2d44

plotting improvements and nb

f0471c5

Use relative paths to retrieve files

c1b1d7d

Use relative paths to retrieve files

f3020d4

update node degree eval plot format

ea41d9a

pmitra01 force-pushed the eval-node-degree branch from 01765d6 to ea41d9a Compare April 25, 2023 16:10

update stats and plots

8640c5d
