Create synchronization pipeline for disease-gene in Mondo #7229

Open
sabrinatoro opened this issue Feb 2, 2024 · 10 comments

@sabrinatoro
Collaborator

  • Gene-disease annotations should be added only for terms defined by the gene causing the disease (e.g. ~type1; ...).
  • These annotations should come from OMIM only.
  • An annotation should be brought in ONLY when there is a 1-1 relation between OMIM and gene.
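
One possible reading of the 1-1 requirement, sketched in Python: keep a disease-gene pair only when the disease has exactly one gene and the gene has exactly one disease. Whether the rule is meant to apply in both directions is an assumption here, and the function name and identifiers are placeholders, not real OMIM entries.

```python
from collections import defaultdict


def one_to_one_pairs(pairs):
    """Keep (disease, gene) pairs whose disease and gene each have exactly one partner."""
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for disease, gene in pairs:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)
    return sorted(
        (disease, gene)
        for disease, gene in set(pairs)
        if len(genes_by_disease[disease]) == 1 and len(diseases_by_gene[gene]) == 1
    )


# Placeholder identifiers for illustration only.
example_pairs = [
    ("OMIM:D1", "OMIM:G1"),  # unique on both sides: kept
    ("OMIM:D2", "OMIM:G2"),  # D2 has two genes: dropped
    ("OMIM:D2", "OMIM:G3"),
    ("OMIM:D3", "OMIM:G4"),  # G4 has two diseases: dropped
    ("OMIM:D4", "OMIM:G4"),
]
kept = one_to_one_pairs(example_pairs)
```

If the rule is only meant to apply from the disease side, the second condition in the filter can be dropped.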
@matentzn
Member

matentzn commented Feb 3, 2024

The PR monarch-initiative/omim#107 will add a new release artefact to the OMIM ingest, containing all the MONDO->HGNC gene associations derived via:

MONDO:Disease-exactMatch->OMIM:Disease--['has basis in germline mutation of']-->OMIM:Gene-->HGNC:Gene.
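
To make the chain above concrete, here is a minimal sketch of the three-step join using plain dictionaries. It is not the code from omim#107; the function name, the dictionary shapes, and all identifiers are hypothetical placeholders.

```python
def mondo_to_hgnc(mondo_to_omim_disease, omim_disease_to_genes, omim_gene_to_hgnc):
    """Follow MONDO -> OMIM disease -> OMIM gene -> HGNC and return (MONDO, HGNC) pairs."""
    associations = []
    for mondo_id, omim_disease in mondo_to_omim_disease.items():
        # A disease with no gene link contributes nothing.
        for omim_gene in omim_disease_to_genes.get(omim_disease, []):
            hgnc_id = omim_gene_to_hgnc.get(omim_gene)
            if hgnc_id is not None:  # skip OMIM genes without an HGNC mapping
                associations.append((mondo_id, hgnc_id))
    return sorted(associations)


# Placeholder identifiers for illustration only.
exact_matches = {"MONDO:X1": "OMIM:D1", "MONDO:X2": "OMIM:D2"}
germline_basis = {"OMIM:D1": ["OMIM:G1"], "OMIM:D2": ["OMIM:G2"]}
gene_xrefs = {"OMIM:G1": "HGNC:H1"}  # OMIM:G2 has no HGNC mapping

result = mondo_to_hgnc(exact_matches, germline_basis, gene_xrefs)
```

Pairs are dropped silently whenever any link in the chain is missing, which is one reason a curator review of the resulting table is useful.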

@twhetzel should maybe spend some time reviewing my choice of only including "evidence code 3" from morbidmap (I don't know exactly what that means; ask @joeflack4). The evidence string is:

"Evidence: (3) The molecular basis for the disorder is known; a mutation has been found in the gene."

To review cases like this, @twhetzel and I are deploying omim.owl from the Mondo ingest on Monarch OLS. This way we can see a bit better what is going on.

Next steps:

  • Review and merge "Refactor OMIM gene handling" (omim#107)
  • Make an OMIM release
  • Curators review the table mondo_genes.csv, which will be attached to the release once it is made
  • Curators decide whether it's good or not. It is important to ask the question: OMIM says gene-to-disease. Are we safe saying disease-to-gene? And also, is code 3 the correct and only evidence code from morbidmap we should be importing (ca. 3,500 genes)?
  • Tech team creates an automated pipeline to update OMIM disease-to-gene associations

@matentzn
Member

matentzn commented Feb 3, 2024

BTW, we deployed the Mondo version of OMIM now here: https://ols.monarchinitiative.org/ontologies/omim/terms?iri=https%3A%2F%2Fomim.org%2Fentry%2F100100

@joeflack4
Collaborator

@matentzn @twhetzel I don't know why Nico only included "evidence code 3", and I can't think of anything else I might know other than what comes from the comments section in morbidmap.txt provided by OMIM:

1 - The disorder is placed on the map based on its association with a gene, but the underlying defect is not known.
2 - The disorder has been placed on the map by linkage or other statistical method; no mutation has been found.
3 - The molecular basis for the disorder is known; a mutation has been found in the gene.
4 - A contiguous gene deletion or duplication syndrome, multiple genes are deleted or duplicated causing the phenotype.
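
As a rough illustration of restricting to code 3, the sketch below assumes the code appears as a trailing "(n)" at the end of the phenotype field, which is how I understand morbidmap.txt to be laid out; please check against the actual file. The row shape, function names, and example strings are hypothetical.

```python
import re

# Assumption: the evidence code is the final "(digit)" in the phenotype field.
TRAILING_CODE = re.compile(r"\((\d)\)\s*$")


def evidence_code(phenotype_field):
    """Return the trailing evidence code as an int, or None if absent."""
    match = TRAILING_CODE.search(phenotype_field)
    return int(match.group(1)) if match else None


def keep_code_3(rows):
    """Keep rows whose phenotype field (first element) carries evidence code 3."""
    return [row for row in rows if evidence_code(row[0]) == 3]


# Placeholder rows: (phenotype field, gene identifier).
rows = [
    ("Placeholder disorder A, 000001 (3)", "G1"),
    ("Placeholder disorder B (2)", "G2"),
    ("Placeholder disorder C, 000003 (4)", "G3"),
    ("Placeholder disorder with no code", "G4"),
]
code_3_rows = keep_code_3(rows)
```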

@matentzn
Member

matentzn commented Feb 3, 2024

It seemed to me that only case 3 fulfills @sabrinatoro's conditions above (the definition of this ticket). Maybe I am wrong.

@joeflack4
Collaborator

@matentzn Ah OK. I should've read the full ticket. Hmm, yes, I think only (3) meets all of @sabrinatoro's requirements.

@twhetzel
Collaborator

"3 - The molecular basis for the disorder is known; a mutation has been found in the gene."
This one seems fine, although I am not sure whether it guarantees that there is a 1-1 relation between OMIM and gene.

Evidence codes for 1 and 2 may be relevant, but would need expert input on that. Agree that 4 is not relevant.

@matentzn where is it saying "disease-to-gene"? I saw monarch-initiative/omim#107 and flipped mappings, but not sure if there is a file with that to look at. It does sound a bit odd, but I can see some arguments for doing it that way based on some of the existing RO relations.

@joeflack4
Collaborator

joeflack4 commented Feb 27, 2024

@twhetzel You may find it useful to glance at this. When you sign up for OMIM data downloads, this is one of the main files (mim2gene.txt). There is a "MIM Entry Type" column, and I think the ones we're interested in should be "phenotype" and maybe "predominantly phenotypes" (maybe there are more). "Phenotype" is sometimes used interchangeably with "disease", especially in the OMIM (and OMIA, I assume) context.

mim2gene.tsv.zip (FYI it's an old copy)
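
For anyone who wants to pull those entries out, a small sketch follows. It assumes the file is tab-separated, that comment lines start with "#", and that the first two columns are the MIM number and the MIM Entry Type; the function name and sample rows are placeholders rather than real OMIM data.

```python
import csv
import io

# Entry types of interest, taken from the comment above.
WANTED_TYPES = {"phenotype", "predominantly phenotypes"}


def phenotype_mim_numbers(handle):
    """Return MIM numbers whose entry type is in WANTED_TYPES."""
    # Skip blank lines and "#" comment/header lines before parsing.
    data_lines = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.reader(data_lines, delimiter="\t")
    return [row[0] for row in reader if len(row) > 1 and row[1].strip() in WANTED_TYPES]


# Placeholder content standing in for mim2gene.txt.
sample = io.StringIO(
    "# MIM Number\tMIM Entry Type\n"
    "000001\tgene\n"
    "000002\tphenotype\n"
    "000003\tpredominantly phenotypes\n"
    "000004\tgene/phenotype\n"
)
phenotype_ids = phenotype_mim_numbers(sample)
```

With a real download, the `io.StringIO` object would be replaced by an opened file handle.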

@twhetzel
Collaborator

@matentzn do you need more information from anyone for this ticket?

@matentzn
Member

Next step is: Curator review of

https://github.com/monarch-initiative/omim/releases/download/2024-03-24/mondo_genes.csv

I personally do not know exactly how to review this, but @sabrinatoro may be able to help. I would stick this in Google docs, then start looking at a few examples and taking notes.
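
One way to get "a few examples" reproducibly is to draw a fixed-seed random sample of rows from the table. The sketch below makes no assumptions about the columns of mondo_genes.csv; the column names in the example are hypothetical stand-ins.

```python
import csv
import io
import random


def sample_rows(handle, n, seed=0):
    """Return up to n rows from a CSV, chosen reproducibly for a given seed."""
    rows = list(csv.DictReader(handle))
    if len(rows) <= n:
        return rows
    return random.Random(seed).sample(rows, n)


# Placeholder table; the real column names may differ.
table_text = "col_a,col_b\nA1,B1\nA2,B2\nA3,B3\nA4,B4\n"
review_batch = sample_rows(io.StringIO(table_text), n=2)
```

Reusing the same seed gives every reviewer the same batch, so notes can be compared row by row.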

@joeflack4
Collaborator
