Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use remote HPO when running the notebooks #82

Merged
merged 16 commits into from
Apr 18, 2024
Merged

Use remote HPO when running the notebooks #82

merged 16 commits into from
Apr 18, 2024

Conversation

ielis
Copy link
Member

@ielis ielis commented Mar 5, 2024

Use remote HPO when running the FBN1 notebooks and suggest downloading HPO from a remote URL in the documentation.

We need this functionality to run CI in pyphetools.

The PR, however, needs pyphetools to be released first, because the current pyphetools version does not allow using a URL, and the FBN1 notebooks would explode when running HpoParser, since HpoParser currently needs hp.json as an existing local file, and URL is not a local file.

Therefore, if using remote HPO sounds OK, we first must do the following, before accepting this PR:

  • merge the PR with the CI workflow in pyphetools that runs FBN1 notebooks here
  • release pyphetools with a new patch version
  • merge this PR and possibly update the FBN1 phenopackets for consistency.
  • update the target commit in the pyphetools notebook CI workflow, basically to fix the line 6 of the workflow:
    image

@pnrobinson pls give me thumbs up if sounds good, or let me know if anything looks fishy. Thanks!

@ielis ielis marked this pull request as ready for review March 5, 2024 20:15
@pnrobinson
Copy link
Member

The current version of pyphetools does allow not indicating the URL and this will pull back the latest version of hp.json. The problem is that apparently on some systems there is an SSL error. Probably we need to document that the user will need to do this including https://cheapsslweb.com/blog/ssl-certificate-verify-failed-error-in-python.
This appears to be working on your machine, but I am guessing this is something that can go wrong on machines "out there" so we need to anticipate this problem?

@ielis
Copy link
Member Author

ielis commented Mar 13, 2024

@pnrobinson
After adding OntologyStore to hpo-toolkit==0.5.0, we can easily load the latest HPO (or a specific version) without unnecessary network traffic.

I updated few FBN1 notebooks to showcase the usage (e.g. cells 2..4 here).

I'd like to merge this PR to finalize setting up CI in pyphetools. The merge will update all the FBN1 notebooks AND the phenopackets. The changes in phenopackets include:

  • moving the time at last encounter to disease | onset time of phenopackets are mainly the
  • update of timestamps
  • update of variation descriptor id

The changes are not a result of my work but arise from the other pyphetools updates.

Pls give me a 👍 if it looks good!

@ielis ielis requested a review from pnrobinson March 13, 2024 19:57
@pnrobinson
Copy link
Member

@ielis
Sorry, when I first saw this it was work in progress.
pyphetools can download remote hp.json, but AFAICS the problem is that with certain settings, hpotk fails.
Here are possible solutions:
https://support.chainstack.com/hc/en-us/articles/9117198436249-Common-SSL-Issues-on-Python-and-How-to-Fix-it
Some could be added to the notebooks, e.g.,

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

possibly others could be added to hpotk, e.g.

import ssl 
import certifi from urllib.request 
import urlopen 

request = "https://.../hp.json" 
urlopen(request, context=ssl.create_default_context(cafile=certifi.where()))

(not tested)
but I would not think we need to change the code in phenopacket-store?

@ielis
Copy link
Member Author

ielis commented Apr 12, 2024

The SSL issue should be resolved in hpo-toolkit>0.5.0, however, I haven't had a chance to test it because never saw the SSL issue.

@pnrobinson
Copy link
Member

@ielis not sure where we are with this PR. There are some conflicts because I cleaned up the repo. Can we tidy up and merge? Also, it would be nice to have a default place to download the hp.json to

# Conflicts:
#	docs/contributing/python_notebook.md
#	notebooks/FBN1/Comeglio_isolated_ectopia_lentis.ipynb
#	notebooks/FBN1/FBN1_lipodystrophy_Lin_2020.ipynb
#	notebooks/FBN1/Katzke_2002_Marfan.ipynb
#	notebooks/FBN1/LeGoff_FBN1_acromicric_and_geleophysic_dysplasia.ipynb
#	notebooks/FBN1/Loeys2010_stiff_skin_syndrome.ipynb
#	notebooks/FBN1/Palz_Marfan_2000.ipynb
#	notebooks/FBN1/Summary_FBN1.ipynb
#	notebooks/FBN1/Tiecke_2001_Marfan.ipynb
#	notebooks/FBN1/phenopackets/PMID_10756346_B13.json
#	notebooks/FBN1/phenopackets/PMID_10756346_B32.json
#	notebooks/FBN1/phenopackets/PMID_10756346_B55.json
#	notebooks/FBN1/phenopackets/PMID_10756346_B73.json
#	notebooks/FBN1/phenopackets/PMID_10756346_D22.json
#	notebooks/FBN1/phenopackets/PMID_10756346_D46.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B15.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B22.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B27.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B35.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B47.json
#	notebooks/FBN1/phenopackets/PMID_11175294_B7.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D1.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D13.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D27.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D36.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D54.json
#	notebooks/FBN1/phenopackets/PMID_11175294_D60.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B1.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B18.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B19.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B3.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B37.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B43.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B45.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B51.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B52.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B60.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B62.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B65.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B79.json
#	notebooks/FBN1/phenopackets/PMID_12203992_B9.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D10.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D15.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D16.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D19.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D2.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D21.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D26.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D30.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D31.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D37.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D40.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D5.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D55.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D59.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D62.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D63.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D67.json
#	notebooks/FBN1/phenopackets/PMID_12203992_D7.json
#	notebooks/FBN1/phenopackets/PMID_12446365_BM.json
#	notebooks/FBN1/phenopackets/PMID_12446365_JL.json
#	notebooks/FBN1/phenopackets/PMID_12446365_OP.json
#	notebooks/FBN1/phenopackets/PMID_12446365_RWT.json
#	notebooks/FBN1/phenopackets/PMID_12446365_VW.json
#	notebooks/FBN1/phenopackets/PMID_20375004_1-II1.json
#	notebooks/FBN1/phenopackets/PMID_20375004_1-III2.json
#	notebooks/FBN1/phenopackets/PMID_20375004_2-III1.json
#	notebooks/FBN1/phenopackets/PMID_20375004_2-IV2.json
#	notebooks/FBN1/phenopackets/PMID_20375004_3-I1.json
#	notebooks/FBN1/phenopackets/PMID_20375004_3-II2.json
#	notebooks/FBN1/phenopackets/PMID_20375004_3-II3.json
#	notebooks/FBN1/phenopackets/PMID_20375004_4-II1.json
#	notebooks/FBN1/phenopackets/PMID_21683322_1.json
#	notebooks/FBN1/phenopackets/PMID_21683322_10.json
#	notebooks/FBN1/phenopackets/PMID_21683322_11.json
#	notebooks/FBN1/phenopackets/PMID_21683322_12.json
#	notebooks/FBN1/phenopackets/PMID_21683322_13.json
#	notebooks/FBN1/phenopackets/PMID_21683322_14.json
#	notebooks/FBN1/phenopackets/PMID_21683322_15.json
#	notebooks/FBN1/phenopackets/PMID_21683322_16.json
#	notebooks/FBN1/phenopackets/PMID_21683322_17.json
#	notebooks/FBN1/phenopackets/PMID_21683322_18.json
#	notebooks/FBN1/phenopackets/PMID_21683322_19.json
#	notebooks/FBN1/phenopackets/PMID_21683322_2.json
#	notebooks/FBN1/phenopackets/PMID_21683322_20.json
#	notebooks/FBN1/phenopackets/PMID_21683322_21.json
#	notebooks/FBN1/phenopackets/PMID_21683322_22-a.json
#	notebooks/FBN1/phenopackets/PMID_21683322_22-b.json
#	notebooks/FBN1/phenopackets/PMID_21683322_22-c.json
#	notebooks/FBN1/phenopackets/PMID_21683322_23.json
#	notebooks/FBN1/phenopackets/PMID_21683322_24.json
#	notebooks/FBN1/phenopackets/PMID_21683322_25.json
#	notebooks/FBN1/phenopackets/PMID_21683322_26.json
#	notebooks/FBN1/phenopackets/PMID_21683322_27-a.json
#	notebooks/FBN1/phenopackets/PMID_21683322_27-b.json
#	notebooks/FBN1/phenopackets/PMID_21683322_28.json
#	notebooks/FBN1/phenopackets/PMID_21683322_29.json
#	notebooks/FBN1/phenopackets/PMID_21683322_3.json
#	notebooks/FBN1/phenopackets/PMID_21683322_4.json
#	notebooks/FBN1/phenopackets/PMID_21683322_5.json
#	notebooks/FBN1/phenopackets/PMID_21683322_6.json
#	notebooks/FBN1/phenopackets/PMID_21683322_7.json
#	notebooks/FBN1/phenopackets/PMID_21683322_8.json
#	notebooks/FBN1/phenopackets/PMID_21683322_9.json
@ielis
Copy link
Member Author

ielis commented Apr 18, 2024

I re-ran the notebooks and updated the phenopackets, and the PR should be ready for merge now.

Also, it would be nice to have a default place to download the hp.json to.

Yes, we have OntologyStore API since hpotk==0.5.0.

This is how to use it:

import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2024-01-16')

Note, the release is optional. The store will get you the latest HPO from GitHub if you leave it out. Please use it everywhere from now on.

Pls give me a 👍 if this looks OK and we can merge.

Copy link
Member

@pnrobinson pnrobinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is running all of the notebooks with each release? How will we know if there was an error?

@ielis
Copy link
Member Author

ielis commented Apr 18, 2024

So this is running all of the notebooks with each release?

No, not in phenopacket-store but in pyphetools - the screenshot in the 1st PR post shows how pyphetools can use phenopacket-store in CI. This is ensure that the documentation is accurate - the notebooks are up to date. In other words, if we make a breaking change in pyphetools we should propagate the change across the notebooks to prevent users' frustration.

How will we know if there was an error?

The CI will fail if the notebook does not finish with exit status of 0.


The CI only runs FBN1 notebooks because running everything would take too long. However, we should consider adding a GitHub action that will run all the notebooks and package the phenopackets upon each phenopacket-store release. There are ways for doing that..

@ielis ielis merged commit 61014f1 into main Apr 18, 2024
@ielis ielis deleted the pyphe-ci-wip branch April 18, 2024 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants