Upgrade of Gene to Disease ingest mappings #709

RichardBruskiewich · 2023-01-23T19:43:27Z

The Monarch graph has ingested HPOA (OMIM, Orphanet, MorbidMap, etc.) mappings, but these have some subtle issues of precision and completeness, and appear to be generated from secondary data sources with challenging semantics. More importantly, the Monarch Initiative (and other related projects) has spawned numerous additional code bases that overlap heavily yet differ in design, for example:

Monarch OMIM parser
Dipper OMIM parser
Exomiser: OrphanetDiseaseGeneFactory and OmimGeneMap2Reader plus some
Monarch Ingest OMIM parser
HPO Annotation QC
Phenol: Ontology Library for Phenomics and Genomics
(additional overlapping code bases as may be identified along the way...)

Closely related to the G2D mapping task are the underlying disease and phenotype ontology efforts:

The goal of this issue is a compare-and-contrast (tabular?) review of the relevant G2D input data parsing code bases, to identify a common, normalized (singular) approach for ingesting G2D mappings into the Monarch knowledge graph. The review would aim to characterize the following for each code library:

Enumeration and general review of the composition of G2D-related input (knowledge) data files which are parsed by the library
Parsing heuristics ('rules') and algorithms internally encoded by the library
Enumeration and description of library output formats
Review of possible output formats (e.g. TSV?) for the Monarch KG construction pipeline, which could be added to the given library to allow optimal and complete capture of gene-to-disease knowledge (from OMIM, Orphanet, etc.) within the Monarch knowledge graphs
Review and highlight the relationship of the library to MONDO and HPO.
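As a rough illustration of the tabular review, the five points above could be captured as one record per library and written out as TSV. This is only a sketch: the `LibraryReview` class, its field names and the `reviews_to_tsv` helper are hypothetical names invented here, and the `TBD` values are placeholders rather than review findings.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class LibraryReview:
    """One row of the comparison table (hypothetical field names)."""
    library: str
    input_files: str             # G2D-related input files parsed by the library
    parsing_rules: str           # heuristics and algorithms encoded internally
    output_formats: str          # formats the library currently emits
    proposed_kg_output: str      # candidate output for the Monarch KG pipeline
    mondo_hpo_relationship: str  # how the library relates to MONDO and HPO


def reviews_to_tsv(reviews):
    """Serialize the review records as a TSV table with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(LibraryReview)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for review in reviews:
        writer.writerow(asdict(review))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder row only; the actual answers come out of the review itself.
    print(reviews_to_tsv([LibraryReview("Phenol", "TBD", "TBD", "TBD", "TBD", "TBD")]))
```

Keeping one column per review question should make gaps in any single library's review visible at a glance.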

Reviews Archive

https://drive.google.com/drive/folders/1ob6BiPuVcVGyO7kkNfTHjfoxGXAPbc5m

RichardBruskiewich · 2023-01-23T19:47:04Z

@pnrobinson, @cmungall, @putmantime, @kevinschaper @matentzn ... I've 'assigned' you to this issue for the moment, simply to flag the issue for your kind feedback and augmentation.

I am otherwise initiating the review of the Phenol code (Peter, as I have questions about the code base, I'll coordinate with you and Daniel for guidance).

RichardBruskiewich · 2023-01-24T17:52:41Z

One ancient related issue (in the icebox): monarch-initiative/monarch-ingest#251

matentzn · 2023-02-21T15:25:05Z

Closely related to monarch-initiative/omim#80

putmantime · 2023-02-21T15:29:16Z

@RichardBruskiewich
@matentzn offered to give you an overview of the Exomiser/Koza/Mondo situation regarding g2d.
We'd like to have a data call after this review process is complete to come up with and schedule the work for a generalized solution.

RichardBruskiewich · 2023-03-17T16:30:16Z

@matentzn and @putmantime, thank you for the meeting on 16 March 2023 to discuss this task and formulate a plan for its resolution. Briefly:

Study and document all the ways that OMIM and Orphanet are being processed within the various code bases hosted by Monarch, to guide the creation of a more comprehensive Koza ingest of a more normalized set of Gene-to-Disease (G2D) and Phenotype-to-Disease (P2D) subject-predicate-object associations for the Monarch Graph.
Goal: The Monarch team is attempting to document all the processes that capture G2D and P2D data (specifically, from OMIM and Orphanet) across Monarch, to identify how this is currently being done, to clarify the provenance of knowledge for easier comparative analyses, and to create a comprehensive G2D and P2D ingest for Monarch.
To meet this goal, we will review an inventory of existing Monarch-hosted (or Monarch-used) project 'solutions' that include some component for parsing OMIM and Orphanet information into G2D and P2D subject-predicate-object associations. A tentative list of such 'solutions' is already compiled in the task plan (although more may be added if necessary), with identified "application experts" listed alongside. This list currently includes the following Monarch-affiliated applications: Exomiser, Phenol, HPOQC, MONDO OMIM ingest, Dipper and Koza itself.
We will conduct a basic self-study of each 'solution' code base, with the aim of composing a basic architecture and data flow diagram, with brief supporting notes, to serve as a conversation piece with the "application experts" guiding the capture of suitable descriptions of each application with respect to the objective of capturing G2D and P2D associations.
A common interview script of questions is formulated to be posed to each such "application expert" to drive the compilation of software and data characteristics of each application, and includes a request for (sample) 'dumps' of files containing data relating to G2D and P2D associations. An approximately 1 hour interview based on the script will be scheduled and convened with each identified application expert, to correct/refine the aforementioned application architecture and data flow diagram and document additional information relevant to the task goal.
The resolution of this issue will be the documented answers to the aforementioned questions, the data dumps requested, and a first-order comparison of these applications and their data dumps against one another, to guide future Monarch G2D and P2D association Koza ingest design and implementation. These deliverables will be hosted in a secure Monarch private storage bucket for further Monarch team assessment.
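A first-order comparison of two data dumps could be as simple as set operations over their subject-predicate-object triples. The sketch below assumes each dump has been normalized to a TSV with `subject`, `predicate` and `object` columns; that layout, the function names and the identifiers in the usage example are all hypothetical placeholders, not the format of any actual dump.

```python
import csv
import io


def load_associations(tsv_text):
    """Read a TSV dump (assumed columns: subject, predicate, object) into a set of triples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["subject"], row["predicate"], row["object"]) for row in reader}


def compare_dumps(left, right):
    """Split two sets of triples into shared and application-specific associations."""
    return {
        "shared": left & right,
        "left_only": left - right,
        "right_only": right - left,
    }


if __name__ == "__main__":
    # Placeholder identifiers only; these are not real genes, predicates or diseases.
    dump_a = "subject\tpredicate\tobject\nGENE:1\tex:causes\tDISEASE:1\nGENE:2\tex:causes\tDISEASE:2\n"
    dump_b = "subject\tpredicate\tobject\nGENE:1\tex:causes\tDISEASE:1\nGENE:3\tex:causes\tDISEASE:3\n"
    result = compare_dumps(load_associations(dump_a), load_associations(dump_b))
    for bucket, triples in result.items():
        print(bucket, len(triples))
```

An exact-match comparison like this only works once identifiers and predicates have been normalized to a shared namespace; differences in disease or gene identifier schemes between applications would otherwise show up as spurious mismatches.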

sagehrke · 2023-10-26T19:03:56Z

@madanucd this ticket may be of help to your G2D ingest assessment.

RichardBruskiewich · 2024-01-09T19:58:44Z

@sagehrke I'm not that sure what to make of this exercise now, after all the discussions so many months ago. We had a "70% solution", but I'm not sure what comes next.

sagehrke · 2024-01-10T18:39:36Z

Perhaps @madanucd and @kevinschaper can connect with you, @RichardBruskiewich, to see what next steps are regarding G2D review and any potential updates to ingest mappings.

RichardBruskiewich · 2024-01-10T19:14:22Z

Given that my Monarch subaward budget is depleted, I can no longer contribute to the resolution of this issue.

sagehrke · 2024-02-01T23:12:25Z

Related to #707

pnrobinson · 2024-05-22T17:52:44Z

phenol and hpoannotQC should be considered the source of truth. This pipeline outputs phenotype.hpoa, which does not have genetic data. Other parts of phenol combine the genetic data, and this is used for the HPO website and API. The latter has recently been reworked by Mike and could provide a more unified view on several ontologies and could be more easily adapted for Monarch (e.g., Uberon, Mondo, Maxo browsers).

RichardBruskiewich assigned cmungall, RichardBruskiewich, kevinschaper, pnrobinson and putmantime Jan 23, 2023

RichardBruskiewich assigned matentzn Jan 23, 2023

RichardBruskiewich unassigned cmungall, kevinschaper and pnrobinson Mar 13, 2023

RichardBruskiewich added enhancement New feature or request ingest labels Mar 17, 2023

RichardBruskiewich mentioned this issue Mar 20, 2023

Upgrade to Biolink 3.1.1 monarch-initiative/monarch-ingest#389

Closed

RichardBruskiewich removed their assignment Nov 14, 2023

sagehrke assigned RichardBruskiewich and unassigned putmantime Jan 8, 2024

RichardBruskiewich removed their assignment Jan 24, 2024

sagehrke assigned madanucd Feb 1, 2024

monicacecilia transferred this issue from monarch-initiative/monarch-ingest May 22, 2024

sagehrke mentioned this issue May 22, 2024

Implement a unified ingest pipeline (sup) #710

Open

3 tasks
