Assess existence of experimental models #33

Closed
bschilder opened this issue Nov 15, 2023 · 15 comments

@bschilder
Contributor

Assess whether there is an existing experimental model for each candidate therapeutic target.

We can check this by seeing if there is an MPO or UPHENO annotation for the same phenotype.

@bschilder bschilder self-assigned this Nov 15, 2023
@bschilder bschilder added the enhancement New feature or request label Nov 15, 2023
@bschilder
Contributor Author

bschilder commented Nov 30, 2023

Found a treasure trove of data on experimental models for diseases (and perhaps specific phenotypes) on Monarch:
https://data.monarchinitiative.org/latest/tsv/model_associations/

These files don't include gene-level info (which we would want if we have a particular gene therapy target in mind), but I'm checking whether I can extract that from the larger Monarch knowledge graph:
https://data.monarchinitiative.org/monarch-kg/latest/

They also only provide MONDO IDs for each disease, so I need to find an effective way to map these back to the HPO/OMIM/DECIPHER/ORPH IDs provided by HPO.
I've reached out to the MONDO ontology creators as well:

@NathanSkene

What's the argument against just using Mammalian Phenotype Ontology overlap?

Also, here are some of the messages we sent relating to this previously:

Here's one of the genes that has a mouse model for respiratory failure: http://www.informatics.jax.org/reference/J:120296

Here's the list of Mammalian Phenotype Ontology genes (for respiratory failure): http://www.informatics.jax.org/mp/annotations/MP:0001953

Gene therapy for ABCA3 in respiratory failure is already being looked into: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8798122/

@bschilder
Contributor Author

> What's the argument against just using Mammalian Phenotype Ontology overlap?

Several reasons:

  • Monarch includes MPO, as well as other model organism databases beyond just mouse.
  • We need to map MPO to HPO terms. UPHENO provides this, which is also integrated in Monarch.

@NathanSkene

NathanSkene commented Nov 30, 2023 via email

@bschilder
Contributor Author

bschilder commented Dec 1, 2023

Preliminary summary plot showing the proportion of orthologous genes overlapping between HPO and non-human ontology databases (within a given phenotype), repeated across many phenotypes:

Image

Will include this in the final report as well as showing how we can use this to prioritise gene/phenotype-specific therapeutic targets.

@NathanSkene

NathanSkene commented Dec 1, 2023 via email

@bschilder
Contributor Author

bschilder commented Dec 1, 2023

> Great, didn’t think about looking at zebrafish models etc as well! Can you explain the x-axis?

Sure!

  • n_genes_intersect: for a given phenotype that has a match between a pair of species, count the number of orthologous genes shared between the gene-phenotype annotations of each species.
  • n_genes_hpo: the total number of unique human genes annotated for a given HPO phenotype.

Dividing one over the other thus gives you the proportion of HPO gene annotations recapitulated in the equivalent phenotype of another species.

This proportion will be influenced by both evolutionary distance and how well studied each species is (notice the difference between mice and rats, despite the fact that they're equally related to humans).
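
The two counts above can be sketched as follows. This is a minimal illustration, not the code used to produce the plot: the function name and gene symbols are hypothetical, and it assumes the model-organism genes have already been converted to their human orthologs.

```python
def proportion_recapitulated(hpo_genes, model_orthologs):
    """Fraction of a phenotype's human genes whose orthologs are also
    annotated to the matched phenotype in another species."""
    hpo = set(hpo_genes)
    if not hpo:
        return 0.0  # no human annotations, so nothing to recapitulate
    n_genes_intersect = len(hpo & set(model_orthologs))
    n_genes_hpo = len(hpo)
    return n_genes_intersect / n_genes_hpo


# Hypothetical gene sets for one matched phenotype pair
# (model genes already expressed as human orthologs).
hpo_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
mouse_orthologs = {"GENE_A", "GENE_C", "GENE_X"}

# Two of the four human genes are shared with the mouse annotations.
print(proportion_recapitulated(hpo_genes, mouse_orthologs))
```

Genes annotated only in the model organism (`GENE_X` here) do not affect the result, since the denominator is the human gene set.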

@bschilder
Contributor Author

Here are some gene therapy target phenotypes identified by our previous analyses. The exact phenotypes will likely change once we add ChatGPT annotations to our filtering strategy with the round of enrichment results. But for now these can serve as an example.

The heatmap is colored by the "equivalence score", which is essentially UPHENO's way of quantifying how well a phenotype matches up across species (on a scale from 0-1). Data comes from here.

Currently the fuzzy equivalence score is the Jaccard similarity:

Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.
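
For reference, the general definition of Jaccard similarity between two sets $A$ and $B$ is given below. Which sets UPHENO actually compares is the open question here, so $A$ and $B$ are left abstract.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1, \quad A \cup B \neq \emptyset
```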

upheno_top_targets_heatmap.pdf

Looks like UPHENO has been thinking about adding fly ontology mappings as well, though there hasn't been any activity on this since 2016 it seems. Just pinged them to get an update:

@bschilder
Contributor Author

> Currently the fuzzy equivalence score is the Jaccard similarity
> Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.

See this HPO publication, in which they did the mapping with Exomiser:

For example, Exomiser (15) leverages the semantic associations between HPO, MP and ZP to prioritize variants effectively by matching human phenotypic abnormalities with phenotypes observed in animal models with knockouts of genes orthologous to human disease-associated genes.

This figure, though, suggests there are already mappings for fly and frog as well. I'll reach out to the HPO team to confirm where I might find these, and to confirm the methodology they used to do the phenotype mapping:

Image

@matentzn

matentzn commented Dec 4, 2023

@bschilder would you be up for a quick call on the matter? I will sort you out with fuzzy and proper matches as well.

@bschilder
Contributor Author

> @bschilder would you be up for a quick call on the matter? I will sort you out with fuzzy and proper matches as well.

Absolutely! Thank you so much for reaching out! Setting up a time for us to meet.

@bschilder
Contributor Author

bschilder commented Dec 8, 2023

Met with @matentzn who was extremely helpful in explaining the cross-species phenotype matching procedure to me, and pointing me to some additional resources.

For mapping MONDO IDs in the Monarch models file, I'm switching to using this file as it avoids the issues observed here:

With these changes, HPOExplorer can now map >90% of the MONDO IDs listed in the model file to OMIM IDs:

> library(HPOExplorer)
> model <- get_monarch("disease_to_model")
 [100%] Downloaded 883280 bytes...
> model$db <- stringr::str_split(model$subject, ":", simplify = TRUE)[, 1]
> model <- map_mondo(dat = model,
+                    input_col = "object",
+                    output_col = "OMIM_ID",
+                    to = c("OMIM", "Orphanet"))
 [100%] Downloaded 1082741 bytes...
476 / 5,154 (9.24%) OMIM_ID missing.
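
The idea behind this kind of ID mapping and the missing-rate summary can be sketched as follows. This is only an illustration of the general approach, not how `map_mondo` is implemented: the function names, the cross-reference table, and all IDs are hypothetical placeholders.

```python
def map_ids(ids, xrefs):
    """Map each ID through the `xrefs` lookup; unmapped IDs become None."""
    return [xrefs.get(i) for i in ids]


def missing_report(mapped, label="OMIM_ID"):
    """Summarise how many entries could not be mapped."""
    n_missing = sum(m is None for m in mapped)
    total = len(mapped)
    fraction = n_missing / total if total else 0.0
    return f"{n_missing:,} / {total:,} ({fraction:.2%}) {label} missing."


# Hypothetical cross-references; real ones would come from MONDO's mapping files.
xrefs = {"MONDO:0000001": "OMIM:000001", "MONDO:0000002": "Orphanet:0002"}
mondo_ids = ["MONDO:0000001", "MONDO:0000002", "MONDO:0000003", "MONDO:0000001"]

mapped = map_ids(mondo_ids, xrefs)
print(missing_report(mapped))
```

Keeping unmapped rows as `None` rather than dropping them makes the missing rate easy to report and audit.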

The only issue is that, as far as I can tell, MONDO doesn't seem to contain any mappings between MONDO IDs and DECIPHER IDs. DECIPHER IDs only make up a small fraction of the HPO annotations, but it would be nice to have a complete mapping nonetheless:

> phenos <- make_phenos_dataframe(add_disease_data = TRUE)
> phenos$disease_db <- stringr::str_split(phenos$disease_id, ":", simplify = TRUE)[, 1]
> table(phenos$disease_db)

Screenshot 2023-12-08 at 23 12 22
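
The same tabulation, splitting each disease ID on its prefix and counting IDs per source database, can be sketched outside R as well. The IDs below are hypothetical placeholders and the counts say nothing about the real HPO annotations.

```python
from collections import Counter


def id_prefix(curie):
    """Return the database prefix of a CURIE such as 'OMIM:000001'."""
    return curie.split(":", 1)[0]


# Hypothetical disease IDs, one per annotation row.
disease_ids = ["OMIM:000001", "OMIM:000002", "ORPHA:0001", "DECIPHER:01", "OMIM:000003"]

counts = Counter(id_prefix(d) for d in disease_ids)
for db, n in counts.most_common():
    print(db, n)
```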

@bschilder
Contributor Author

To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.

@matentzn this is probably a poor attempt to explain this properly, but if there's a paper or docs page you could point me to that would be quite helpful! Thanks!

@matentzn

matentzn commented Dec 9, 2023

DECIPHER

We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv

Which will do the job for you!
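
A minimal sketch of reading such an SSSOM/TSV mapping file follows. It relies on the SSSOM convention that the metadata block consists of lines starting with `#`, followed by a tab-separated table with columns such as `subject_id`, `predicate_id`, and `object_id`. The sample rows are hypothetical placeholders, and treating `subject_id` as the MONDO side is an assumption to check against the actual file.

```python
import csv
import io


def read_sssom(text):
    """Parse SSSOM/TSV text: drop the '#'-prefixed metadata block, then read the table."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t"))


# Hypothetical excerpt; these IDs are placeholders, not real mappings.
sample = (
    "# mapping_set_id: example\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:0000001\tskos:exactMatch\tDECIPHER:1\n"
    "MONDO:0000002\tskos:exactMatch\tDECIPHER:2\n"
)

# Assumes subject_id holds the MONDO ID and object_id the DECIPHER ID.
mondo_to_decipher = {r["subject_id"]: r["object_id"] for r in read_sssom(sample)}
print(mondo_to_decipher)
```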

> To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.

It's simpler than that.

  1. We generate phenotypic profiles from ontologies, usually using Jaccard similarity over the hierarchical relations in the ontology, and information content for the reranking.
  2. Cool paper with background: https://www.osti.gov/biblio/1625303
  3. The current "bestmatches" include a mix of logical and simple lexical matches and are hugely out of date (I would not use them in production, but they are probably "not wrong").
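
One way to read point 1 is sketched below: Jaccard similarity over each term's ancestor closure, with the information content (IC) of the most informative common ancestor as a secondary sort key. This is an interpretation under stated assumptions, not Monarch's actual pipeline. The toy hierarchy, the annotation frequencies, and the way the two scores are combined in `rank_matches` are all hypothetical.

```python
import math


def ancestors(term, parents):
    """Return `term` plus everything reachable through its parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def jaccard(a, b, parents):
    """Jaccard similarity of the two terms' ancestor closures."""
    set_a, set_b = ancestors(a, parents), ancestors(b, parents)
    return len(set_a & set_b) / len(set_a | set_b)


def mica_ic(a, b, parents, freq):
    """IC (-log p) of the most informative common ancestor of a and b."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((-math.log(freq[t]) for t in common), default=0.0)


def rank_matches(query, candidates, parents, freq):
    """Sort candidates by Jaccard, breaking ties with the IC score (assumed scheme)."""
    def key(c):
        return (jaccard(query, c, parents), mica_ic(query, c, parents, freq))
    return sorted(candidates, key=key, reverse=True)


# Hypothetical toy hierarchy (child -> parents) and annotation frequencies.
parents = {"C": ["B"], "D": ["B"], "B": ["A"], "E": ["A"]}
freq = {"A": 1.0, "B": 0.5, "C": 0.1, "D": 0.2, "E": 0.5}

print(jaccard("C", "D", parents))  # siblings under B share two of four terms
print(rank_matches("C", ["E", "D"], parents, freq))
```

Terms that share only the root score low on both measures, because the root's IC is zero.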

I requested an FBcv profile for you here: monarch-initiative/monarch-semantic-similarity-profiles#16

So you can take a look at what it looks like.

@bschilder
Contributor Author

> DECIPHER
>
> We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv
>
> Which will do the job for you!

Ah, amazing! I had totally missed that because I was using this file, which I assumed included all the other ones:
https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo.sssom.tsv

I've implemented many of these functions within a new package for accessing/processing knowledge graphs in general (HPOExplorer was getting too bloated):
https://github.com/neurogenomics/KGExplorer/blob/29eccbbd33fd18d9ce85b0ae72b47d485d97faee/R/map_upheno_data_i.R

I was also just alerted to the monarchr package, which may extract much of the info I need more efficiently than I am now (which relies mostly on TSV downloads).

I've also begun exploring some of the graph query resources/tools you alerted me to on our call:

> To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.
>
> It's simpler than that.
>
>   1. We generate phenotypic profiles from ontologies, usually using Jaccard similarity over the hierarchical relations in the ontology, and information content for the reranking.
>   2. Cool paper with background: https://www.osti.gov/biblio/1625303
>   3. The current "bestmatches" include a mix of logical and simple lexical matches and are hugely out of date (I would not use them in production, but they are probably "not wrong").

Ahhh, this makes so much more sense now! Thanks for explaining that in more detail, and for the paper (super interesting work!). Along those lines, I've found the rphenoscape package useful for computing cross-ontology similarity matrices on the go.

> I requested an FBcv profile for you here: monarch-initiative/monarch-semantic-similarity-profiles#16
>
> So you can take a look at what it looks like.

Thank you so much! I really appreciate this, and all your other help.
