Eval node degree #42

Open. pmitra01 (Collaborator) wants to merge 10 commits into base: develop

pmitra01 commented Apr 20, 2023

This PR adds functionality and analysis notebooks for a deeper dive into the link prediction (LP) performance of BioBLP-X vs. RotatE models as a function of the node degree of the entity being predicted. It inspects whether attribute encodings help the BioBLP models obtain better representations than RotatE models for entities in sparser regions of the graph (that is, entities with fewer incoming and outgoing edges).
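To make the idea concrete, here is a minimal sketch of grouping link prediction ranks by node degree. It is not the code in notebooks/nb_utils/eval_utils.py: the function names, the bucket edges, and the toy triples and ranks are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def node_degrees(triples):
    """Count in-degree plus out-degree for every entity in (head, relation, tail) triples."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree


def mrr_by_degree_bucket(ranks, degree, bucket_edges):
    """Return the mean reciprocal rank (MRR) per degree bucket.

    ranks: iterable of (entity, rank) pairs with rank >= 1.
    bucket_edges: ascending inclusive upper bounds; larger degrees
    fall into a final overflow bucket.
    """
    reciprocal = defaultdict(list)
    for entity, rank in ranks:
        d = degree.get(entity, 0)
        label = next(
            (f"<={edge}" for edge in bucket_edges if d <= edge),
            f">{bucket_edges[-1]}",
        )
        reciprocal[label].append(1.0 / rank)
    return {label: sum(values) / len(values) for label, values in reciprocal.items()}


# Toy data, invented for this example only.
triples = [
    ("drugA", "targets", "prot1"),
    ("drugA", "targets", "prot2"),
    ("drugA", "treats", "dis1"),
    ("drugB", "targets", "prot1"),
]
degree = node_degrees(triples)
ranks = [("drugA", 1), ("drugB", 4), ("prot2", 2)]  # hypothetical per-entity ranks
result = mrr_by_degree_bucket(ranks, degree, bucket_edges=[1, 2])
```

Comparing such per-bucket scores between two models is one way to see whether the gap widens in the low-degree buckets.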

Introduces:

  • helper code to conduct the node degree evaluation (notebooks/nb_utils/eval_utils.py), parameterised by the model and the entity type being analysed.
  • retrieving and saving fuller evaluation metrics on predicting the head, the tail, or both sides of a triple (using pykeen's evaluation modules)
  • plotting functionality
  • an artifact registry to simplify the handling of paths and the chained set of artifacts associated with any model or entity type. (See "Steps to conduct the Analysis of the effect of Node Degree..." below for how to wire this up.)
  • notebook /notebooks/07_04_01_eval_lp_node_degree_effect.ipynb to conduct node degree analysis of the LP performance of the models of one's choosing.
  • notebook 07_04_03_eval_lp_node_degree_effect-plot.ipynb for regenerating the node degree evaluation plots.

Prerequisites for regenerating plots (also mentioned within the notebook):

  • Get the archive of JSON files with evaluation metrics by node degree from Google Drive.
  • Unpack the archive into the directory BioBLP/notebooks/metrics.

Note: The expectation is that notebooks/nb_utils/eval_utils.py will be subsumed within the bioblp package in a future commit, once the immediate priorities of paper deadlines are in the past. In the interim, the support functionality lives within nb_utils.


Steps to conduct the Analysis of the effect of Node Degree on LP performance for a model of your choice:

  1. Fetch and unpack the data: biokg_eval_data.tar.gz (current drive link). This archive contains the metadata files that you will need to perform evaluation. Most of these data files are likely already available in your workspace, but I'm shipping the entire bundle just in case.
  2. Update artifact_registry.toml (current view) with the correct paths to the parent directories for the pretrained KGE models and the biokg graph data artifacts. Note: The artifact registry should in theory auto-resolve all the paths to relevant child artifacts relative to these parent directories.
  3. Run the notebook BioBLP/notebooks/07_04_01_eval_lp_node_degree_effect.ipynb to conduct node degree analysis of the LP performance of the models of your choice.
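The path resolution described in step 2 could look roughly like the sketch below. The registry layout (the parent_dir and children keys), the group names, and every file name here are hypothetical and not taken from the actual artifact_registry.toml. The registry is shown as an already-parsed dictionary, such as one a TOML parser would return.

```python
from pathlib import Path


def resolve_artifacts(registry):
    """Join each child artifact's relative path onto its group's parent directory."""
    resolved = {}
    for group, spec in registry.items():
        parent = Path(spec["parent_dir"])
        for name, relative_path in spec["children"].items():
            resolved[f"{group}.{name}"] = parent / relative_path
    return resolved


# Hypothetical registry contents: only the parent directories would need
# editing per workspace; child paths stay relative to them.
registry = {
    "models": {
        "parent_dir": "/data/models",
        "children": {"rotate": "rotate/trained_model.pkl"},
    },
    "graph": {
        "parent_dir": "/data/biokg",
        "children": {"test_triples": "biokg.links-test.csv"},
    },
}
paths = resolve_artifacts(registry)
```

With this layout, moving a workspace only requires changing the two parent_dir entries.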

Future Work (Things that need to be changed):

  • The node degree analysis, specifically the methods of the class NodeDegreeAnalyser (in notebooks/nb_utils/eval_utils.py), and the plotting functionality have confusing parameter names such as node_endpoint_type and eval_on_node_endpoint. The former refers to the position in the triple (head or tail) of the attributed entity being analysed; the latter refers to whether evaluation metrics are obtained while predicting the head, the tail, or both. Currently it is difficult to tell at a glance whether we are talking about predicting the head/tail of a triple or about the position in which a certain entity type like drug/protein/disease occurs.
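One possible way to separate the two concepts is to give each its own type, as in the sketch below. This is only a suggestion: EntityPosition, PredictionSide, and DegreeEvalConfig are hypothetical names and are not part of NodeDegreeAnalyser.

```python
from dataclasses import dataclass
from enum import Enum


class EntityPosition(Enum):
    """Where the analysed entity type occurs in the triple."""
    HEAD = "head"
    TAIL = "tail"


class PredictionSide(Enum):
    """Which side of the triple the model predicts when metrics are computed."""
    HEAD = "head"
    TAIL = "tail"
    BOTH = "both"


@dataclass(frozen=True)
class DegreeEvalConfig:
    entity_type: str                  # e.g. "drug", "protein", "disease"
    entity_position: EntityPosition   # position of the analysed entity
    prediction_side: PredictionSide   # side being predicted

    def describe(self):
        return (
            f"{self.entity_type} entities at the {self.entity_position.value} position, "
            f"scored on {self.prediction_side.value} prediction"
        )


config = DegreeEvalConfig("drug", EntityPosition.HEAD, PredictionSide.TAIL)
summary = config.describe()
```

Because the two enums are distinct types, EntityPosition.HEAD and PredictionSide.HEAD never compare equal, so swapping the two arguments by mistake is easier to spot.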

pmitra01 changed the base branch from main to develop April 24, 2023 12:30
pmitra01 (Collaborator, Author) commented:

Updated the PR to target the develop branch instead of main, which it pointed to by accident. The 70 commits ahead of main gave me a mini yikes; now we are back to 8 commits ahead of develop.

pmitra01 requested a review from dfdazac April 24, 2023 12:34