Assess existence of experimental models #33

bschilder · 2023-11-15T17:12:48Z

Assess whether there is an existing experimental model for each candidate therapeutics target.

We can check this by seeing if there is an MPO or UPHENO annotation for the same phenotype.

bschilder · 2023-11-30T13:11:53Z

Found a treasure trove of data on experimental models for diseases (and perhaps specific phenotypes) on Monarch:
https://data.monarchinitiative.org/latest/tsv/model_associations/

However, these files don't include gene-level info (which we would want if we have a particular gene therapy target in mind), but I'm checking to see if there's a way I can extract that from the larger Monarch knowledge graph:
https://data.monarchinitiative.org/monarch-kg/latest/

They also only provide MONDO IDs for each disease, so I need to find an effective way to map these back to the HPO/OMIM/DECIPHER/ORPH IDs provided by HPO.
I've reached out to the MONDO ontology creators as well:

mondo-base.obo: MONDO IDs do not match between $xref and $id monarch-initiative/mondo#6873

NathanSkene · 2023-11-30T13:21:42Z

What's the argument against just using Mammalian Phenotype Ontology overlap?

Also, here's some of the messages we sent relating to this previously:

Here's one of the genes that has a mouse model for respiratory failure: http://www.informatics.jax.org/reference/J:120296

Here's the list of Mammalian Phenotype Ontology genes (for respiratory failure): http://www.informatics.jax.org/mp/annotations/MP:0001953

Gene therapy for ABCA3 in respiratory failure is already being looked into: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8798122/

bschilder · 2023-11-30T13:24:46Z

> What's the argument against just using Mammalian Phenotype Ontology overlap?

Several reasons:

Monarch includes MPO, as well as other model organism databases beyond just mouse.
We need to map MPO to HPO terms. UPHENO provides this, which is also integrated in Monarch.

NathanSkene · 2023-11-30T13:25:56Z

Sounds good!

bschilder · 2023-12-01T11:40:39Z

Preliminary summary plot showing the proportion of orthologous genes overlapping between HPO and non-human ontology databases (within a given phenotype), repeated across many phenotypes:

Will include this in the final report as well as showing how we can use this to prioritise gene/phenotype-specific therapeutic targets.

NathanSkene · 2023-12-01T11:42:28Z

Great, didn’t think about looking at zebrafish models etc as well! Can you explain the x-axis?

bschilder · 2023-12-01T11:48:55Z

> Great, didn’t think about looking at zebrafish models etc as well! Can you explain the x-axis?

Sure!

n_genes_intersect: for a given phenotype that has a match between a pair of species, count the number of orthologous genes shared between the gene-phenotype annotations of each species.
n_genes_hpo: the total number of unique human genes annotated for a given HPO phenotype.

Dividing one over the other thus gives you the proportion of HPO gene annotations recapitulated in the equivalent phenotype of another species.

This proportion will be influenced by both evolutionary distance and how well studied each species is (notice the difference between mouse and rat, despite the fact that they're equally related to humans).
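A minimal sketch of the calculation described above, using made-up gene sets rather than real annotations. The names `n_genes_intersect` and `n_genes_hpo` follow the plot; `overlap_stats`, `prop_intersect`, and the placeholder gene symbols are hypothetical, and the non-human genes are assumed to have already been converted to human orthologs:

```r
# Toy gene-phenotype annotations for one matched phenotype (placeholder symbols).
# Assumption: model organism genes were already mapped to human orthologs.
hpo_genes   <- c("GENE1", "GENE2", "GENE3", "GENE4")
mouse_genes <- c("GENE1", "GENE2", "GENE5")

overlap_stats <- function(hpo_genes, model_genes) {
  hpo_genes <- unique(hpo_genes)
  # Orthologous genes annotated to the phenotype in both species
  n_genes_intersect <- length(intersect(hpo_genes, unique(model_genes)))
  # Unique human genes annotated to the HPO phenotype
  n_genes_hpo <- length(hpo_genes)
  data.frame(
    n_genes_intersect = n_genes_intersect,
    n_genes_hpo = n_genes_hpo,
    prop_intersect = n_genes_intersect / n_genes_hpo
  )
}

res <- overlap_stats(hpo_genes, mouse_genes)
print(res)
```

In practice this would be repeated for every phenotype and species pair, giving one proportion per point in the plot.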

bschilder · 2023-12-01T16:15:13Z

Here are some gene therapy target phenotypes identified by our previous analyses. The exact phenotypes will likely change once we add ChatGPT annotations to our filtering strategy with the round of enrichment results. But for now these can serve as an example.

The heatmap is colored by the "equivalence score", which is essentially UPHENO's way of quantifying how well a phenotype matches up across species (on a scale from 0-1). Data comes from here.

Currently the fuzzy equivalence score is the Jaccard similarity:

Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.

upheno_top_targets_heatmap.pdf

Looks like UPHENO has been thinking about adding fly ontology mappings as well, though there hasn't been any activity on this since 2016 it seems. Just pinged them to get an update:

Inter-ontology Closest Matches drosophila obophenotype/upheno#207

bschilder · 2023-12-01T16:33:17Z

> Currently the fuzzy equivalence score is the Jaccard similarity
>
> Not sure exactly on what basis they computed Jaccard similarity, but I'll look into this some more.

The answer seems to be in this HPO publication, in which they did the mapping with Exomiser:

> For example, Exomiser (15) leverages the semantic associations between HPO, MP and ZP to prioritize variants effectively by matching human phenotypic abnormalities with phenotypes observed in animal models with knockouts of genes orthologous to human disease-associated genes.

Though this figure suggests there are already mappings for fly and frog as well. I'll reach out to the HPO team to confirm where I might find these, and to confirm the methodology they used to do the phenotype mapping:

matentzn · 2023-12-04T13:07:47Z

@bschilder would you be up for a quick call on the matter? I will sort you out with fuzzy and proper matches as well.

bschilder · 2023-12-04T13:46:20Z

> @bschilder would you be up for a quick call on the matter? I will sort you out with fuzzy and proper matches as well.

Absolutely! Thank you so much for reaching out! Setting up a time for us to meet.

bschilder · 2023-12-08T23:13:10Z

Met with @matentzn who was extremely helpful in explaining the cross-species phenotype matching procedure to me, and pointing me to some additional resources.

For mapping MONDO IDs in the Monarch model file, I'm switching to using this file as it avoids the issues observed here:

mondo-base.obo: MONDO IDs do not match between $xref and $id monarch-initiative/mondo#6873

With these changes, HPOExplorer can now map >90% of the MONDO IDs listed in the model file to OMIM IDs:

> library(HPOExplorer)
> model <- get_monarch("disease_to_model")
 [100%] Downloaded 883280 bytes...
> model$db <- stringr::str_split(model$subject, ":", simplify = TRUE)[, 1]
> model <- map_mondo(dat = model,
+                    input_col = "object",
+                    output_col = "OMIM_ID",
+                    to = c("OMIM", "Orphanet"))
 [100%] Downloaded 1082741 bytes...
476 / 5,154 (9.24%) OMIM_ID missing.

The only issue is that, as far as I can tell, MONDO doesn't seem to contain any mappings between MONDO IDs and DECIPHER IDs. DECIPHER IDs only make up a small fraction of the HPO annotations, but it would be nice to have a complete mapping nonetheless:

> phenos <- make_phenos_dataframe(add_disease_data = TRUE)
> phenos$disease_db <- stringr::str_split(phenos$disease_id, ":", simplify = TRUE)[, 1]
> table(phenos$disease_db)

bschilder · 2023-12-08T23:27:22Z

To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.

@matentzn this is probably a poor attempt to explain this properly, but if there's a paper or docs page you could point me to that would be quite helpful! Thanks!

matentzn · 2023-12-09T15:55:24Z

DECIPHER

We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv

Which will do the job for you!

> To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.

It's simpler than that.

We generate phenotypic profiles from ontologies, usually using Jaccard similarity over the hierarchical relations in the ontology, and information content for the reranking.
Cool paper with background: https://www.osti.gov/biblio/1625303
The current "bestmatches" include a mix of logical and simple lexical matches and are hugely out of date (I would not use them in production, but they are probably "not wrong").

I requested an FBcv profile for you here: monarch-initiative/monarch-semantic-similarity-profiles#16

So you can take a look at what it looks like.
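To make the "Jaccard similarity over the hierarchical relations" idea concrete, here is a rough sketch on a toy ontology. It is an illustration only, not the Monarch implementation: the term IDs and the `parents`, `ancestors`, and `jaccard` names are hypothetical, and the information-content reranking step is left out entirely.

```r
# Toy is-a hierarchy: each term maps to its direct parents (hypothetical IDs).
parents <- list(
  root = character(0),
  A    = "root",
  B    = "root",
  A1   = "A",
  A2   = "A",
  B1   = "B"
)

# Collect a term plus all of its ancestors by walking up the hierarchy.
ancestors <- function(term, parents) {
  out <- term
  queue <- parents[[term]]
  while (length(queue) > 0) {
    out <- union(out, queue)
    queue <- unique(unlist(parents[queue], use.names = FALSE))
  }
  out
}

# Jaccard similarity: size of the intersection over size of the union.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Siblings share more of their ancestry than terms from different branches.
sim_a1_a2 <- jaccard(ancestors("A1", parents), ancestors("A2", parents))
sim_a1_b1 <- jaccard(ancestors("A1", parents), ancestors("B1", parents))
print(c(A1_vs_A2 = sim_a1_a2, A1_vs_B1 = sim_a1_b1))
```

Here the two sibling terms share `A` and `root`, while the terms from different branches share only `root`, so the first pair scores higher.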

bschilder · 2023-12-12T00:03:33Z

> DECIPHER
>
> We have this for DECIPHER: https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo_hasdbxref_decipher.sssom.tsv
>
> Which will do the job for you!

Ah, amazing! I had totally missed that because I was using this file, which I assumed included all the other ones:
https://github.com/monarch-initiative/mondo/blob/master/src/ontology/mappings/mondo.sssom.tsv
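For reference, a rough sketch of how an SSSOM mapping table like the DECIPHER one could be joined onto the model data. The toy table and all IDs below are made up; `subject_id`, `predicate_id`, and `object_id` are standard SSSOM column names, but I'm assuming (not confirming) that the MONDO ID sits in `subject_id` and the DECIPHER ID in `object_id` in that file. `DECIPHER_ID` and `n_missing` are hypothetical names.

```r
# Toy stand-in for an SSSOM mapping table (all IDs are made up).
# With a real downloaded file, something like this should work, since SSSOM
# TSVs carry their metadata in leading lines that start with "#":
#   sssom <- read.delim(path, comment.char = "#")
sssom <- data.frame(
  subject_id   = c("MONDO:0000001", "MONDO:0000002"),
  predicate_id = c("oboInOwl:hasDbXref", "oboInOwl:hasDbXref"),
  object_id    = c("DECIPHER:1", "DECIPHER:2"),
  stringsAsFactors = FALSE
)

# Toy model table; "object" holds the MONDO ID, as in the map_mondo() call above.
model <- data.frame(
  object = c("MONDO:0000001", "MONDO:0000002", "MONDO:0000003"),
  stringsAsFactors = FALSE
)

# Look up each MONDO ID in the mapping table; unmapped IDs become NA.
idx <- match(model$object, sssom$subject_id)
model$DECIPHER_ID <- sssom$object_id[idx]

# Report how many IDs could not be mapped.
n_missing <- sum(is.na(model$DECIPHER_ID))
message(sprintf("%d / %d (%.2f%%) DECIPHER_ID missing.",
                n_missing, nrow(model), 100 * n_missing / nrow(model)))
```

Note that `match()` keeps only the first hit per MONDO ID, so one-to-many mappings would need a proper `merge()` instead.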

I've implemented many of these functions within a new package for accessing/processing knowledge graphs in general (HPOExplorer was getting too bloated):
https://github.com/neurogenomics/KGExplorer/blob/29eccbbd33fd18d9ce85b0ae72b47d485d97faee/R/map_upheno_data_i.R

I was also just alerted to the monarchr package, which may extract much of the info I need more efficiently than I am now (which relies mostly on TSV downloads).

Coordinating R package projects monarch-initiative/monarchr#7

I've also begun exploring some of the graph query resources/tools you alerted me to on our call:

Explore other resources KGExplorer#1

> To summarise, the phenotype matching procedure is meant to capture semantic similarity using a semi-heuristic model (a combination of explicit rules and data-driven methods). Data inputs come from a variety of sources. Ultimately, they link together concepts (species, diseases, phenotypes, genes, pathways, etc.) in a knowledge graph derived from a mix of NLP queries to the published literature and other databases.
>
> It's simpler than that.
>
> We generate phenotypic profiles from ontologies, usually using Jaccard similarity over the hierarchical relations in the ontology, and information content for the reranking.
>
> Cool paper with background: https://www.osti.gov/biblio/1625303
>
> The current "bestmatches" include a mix of logical and simple lexical matches and are hugely out of date (I would not use them in production, but they are probably "not wrong").

Ahhh, this makes so much more sense now! Thanks for explaining that in more detail, and for the paper (super interesting work!). Along those lines, I've found the rphenoscape package useful for computing cross-ontology similarity matrices on the go.

> I requested an FBcv profile for you here: monarch-initiative/monarch-semantic-similarity-profiles#16
>
> So you can take a look at what it looks like.

Thank you so much! I really appreciate this, and all your other help.

bschilder self-assigned this Nov 15, 2023

bschilder added the enhancement New feature or request label Nov 15, 2023

bschilder added this to the Publish rare disease celltyping manuscript milestone Nov 22, 2023

bschilder mentioned this issue Dec 2, 2023

Add HPOExplorer to HPO tools documentation obophenotype/human-phenotype-ontology#10231

Closed

twhetzel mentioned this issue Dec 8, 2023

mondo-base.obo: MONDO IDs do not match between $xref and $id monarch-initiative/mondo#6873

Closed

bschilder closed this as completed Mar 5, 2024
