Generally improve evidence/provenance capture and reporting #787

Closed
mbrush opened this issue Jul 12, 2019 · 2 comments

@mbrush
Member

mbrush commented Jul 12, 2019

At the May 2019 F2F in Corvallis, and on recent Monarch Data calls, we discussed the need to provide better provenance and evidence for associations in the Monarch app. Internally, this helps us understand, QC, and document our own data. For external users, it is critical to being able to trust and apply the data we provide.

At present, unfolding the Support column for a Monarch association shows an unorganized list of sources, evidence codes, and publications (see figure below). One key requirement is to make clear which reported source(s) actually made the assertion captured in an association, as opposed to which sources provided supporting data used to infer it. This includes indicating when an association was inferred by Monarch by joining or reasoning over data from one or more sources. When an association is inferred, we should provide access to the inference/reasoning path. That path is currently captured in 'evidence graphs' served by the Biolink API, but it needs to be rendered in a more human-readable way (Graphviz was proposed).
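As a rough illustration of the Graphviz idea, here is a minimal Python sketch that converts an evidence graph into DOT text, which the `dot` tool (e.g. `dot -Tsvg`) can then draw. It assumes the graph arrives as BBOP-style JSON with `nodes` (each with an `id` and optional `lbl`) and `edges` (each with `sub`, `pred`, `obj`); the function names, the `has_variant` predicate, and the example IDs are hypothetical placeholders, not part of the Biolink API.

```python
from typing import Any, Dict


def quote(text: str) -> str:
    """Wrap text as a DOT string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def evidence_graph_to_dot(graph: Dict[str, Any]) -> str:
    """Render a BBOP-style {"nodes": [...], "edges": [...]} graph as DOT text.

    Assumes nodes carry "id" and an optional "lbl", and edges carry
    "sub", "pred" and "obj" (an assumption about the payload shape).
    """
    lines = ["digraph evidence {", "  rankdir=LR;"]
    for node in graph.get("nodes", []):
        # Fall back to the identifier when a node has no human-readable label.
        label = node.get("lbl") or node["id"]
        lines.append(f"  {quote(node['id'])} [label={quote(label)}];")
    for edge in graph.get("edges", []):
        # Label each arrow with its predicate so every reasoning step is visible.
        lines.append(
            f"  {quote(edge['sub'])} -> {quote(edge['obj'])} [label={quote(edge['pred'])}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical placeholder graph, not real Monarch data.
example = {
    "nodes": [
        {"id": "GENE:1", "lbl": "example gene"},
        {"id": "VARIANT:1"},
    ],
    "edges": [{"sub": "GENE:1", "pred": "has_variant", "obj": "VARIANT:1"}],
}
print(evidence_graph_to_dot(example))
```

Emitting plain DOT text keeps the sketch free of extra dependencies; rendering can happen server-side or in the browser with any Graphviz port.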

Another requirement is to organize the evidence and publications according to the reported sources that used them (as suggested at the May F2F), in particular when an association is asserted by more than one source. This lets us understand which sources used which publications, and what type of evidence those publications provided. We might also distinguish 'asserting sources' from 'supporting sources', rather than simply calling all of them 'sources'.
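A minimal sketch of what "organize evidence by the source that used it" could mean in code, assuming each piece of support is available as a flat record naming its source, that source's role, an ECO code, and an optional publication. The `EvidenceLine` record, the `"asserting"`/`"supporting"` role vocabulary, and the placeholder IDs are all hypothetical, not an existing Monarch data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class EvidenceLine:
    """One piece of support for an association (hypothetical record shape)."""
    source: str                        # reported source that used this evidence
    role: str                          # "asserting" or "supporting" (assumed vocabulary)
    evidence_code: str                 # ECO term for the type of evidence
    publication: Optional[str] = None  # publication the source cited, if any


def group_by_source(lines: List[EvidenceLine]) -> Dict[str, Dict[str, Set[str]]]:
    """Organize roles, evidence codes and publications under each source."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {"roles": set(), "evidence_codes": set(), "publications": set()}
    )
    for line in lines:
        entry = grouped[line.source]
        entry["roles"].add(line.role)
        entry["evidence_codes"].add(line.evidence_code)
        if line.publication:
            entry["publications"].add(line.publication)
    return dict(grouped)


def sources_with_role(grouped: Dict[str, Dict[str, Set[str]]], role: str) -> List[str]:
    """List the sources that played the given role, e.g. "asserting"."""
    return sorted(src for src, entry in grouped.items() if role in entry["roles"])


# Placeholder records, not real Monarch data.
support = [
    EvidenceLine("SOURCE_A", "asserting", "ECO:example1", "PMID:example1"),
    EvidenceLine("SOURCE_B", "supporting", "ECO:example2"),
    EvidenceLine("SOURCE_A", "asserting", "ECO:example2", "PMID:example2"),
]
by_source = group_by_source(support)
print(sources_with_role(by_source, "asserting"))   # ['SOURCE_A']
print(sources_with_role(by_source, "supporting"))  # ['SOURCE_B']
```

Keeping roles as a set per source allows for the case where one source both asserts an association and supplies supporting data for another hop.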

I hope others can add examples of evidence/provenance shortcomings they have come across. A simple example I can provide is the association reported here between the ALG9 gene and the Multicystic kidney dysplasia phenotype. The 'Support' drop-down lists one ECO code, one publication, and two sources (HPOA and CLINVAR).

[Figure: ALG9-MKD, the Support drop-down for the ALG9 / Multicystic kidney dysplasia association]

We wouldn't know it from the metadata shown here, but neither of these sources directly asserts the reported association. Rather, Monarch infers it by joining data along the path gene -> variant -> disease -> phenotype (as specified in the Cypher query here). Furthermore, it is not clear how the publication and evidence code relate to the indicated sources (who used what, and how).
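To make the "inferred by joining" point concrete, here is a minimal Python sketch of a gene -> variant -> disease -> phenotype join that keeps every path it used, so an inferred association can always point back to its chain of inference. It assumes each hop is already available as (subject, object) pairs; it does not reproduce the actual Cypher query, and the function names and placeholder IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]            # (subject, object) for one hop
Path = Tuple[str, str, str, str]  # (gene, variant, disease, phenotype)


def index_by_subject(edges: Iterable[Pair]) -> Dict[str, List[str]]:
    """Map each subject to the objects it links to, for the join below."""
    index: Dict[str, List[str]] = defaultdict(list)
    for subject, obj in edges:
        index[subject].append(obj)
    return index


def infer_gene_phenotype(
    gene_variant: Iterable[Pair],
    variant_disease: Iterable[Pair],
    disease_phenotype: Iterable[Pair],
) -> Dict[Pair, List[Path]]:
    """Join gene->variant->disease->phenotype and keep every path as provenance."""
    to_disease = index_by_subject(variant_disease)
    to_phenotype = index_by_subject(disease_phenotype)
    inferred: Dict[Pair, List[Path]] = defaultdict(list)
    for gene, variant in gene_variant:
        for disease in to_disease.get(variant, []):
            for phenotype in to_phenotype.get(disease, []):
                inferred[(gene, phenotype)].append((gene, variant, disease, phenotype))
    return dict(inferred)


# Placeholder data: two variants of one gene reach the same phenotype.
inferred = infer_gene_phenotype(
    [("GENE:1", "VARIANT:1"), ("GENE:1", "VARIANT:2")],
    [("VARIANT:1", "DISEASE:1"), ("VARIANT:2", "DISEASE:1")],
    [("DISEASE:1", "PHENOTYPE:1")],
)
print(len(inferred[("GENE:1", "PHENOTYPE:1")]))  # 2
```

Because each hop in a stored path can carry its own source, a UI built on this shape could show that no single source asserted the gene-phenotype link.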

@monicacecilia
Member

Chatting with @kshefchek & @cmungall

To organize provenance data, turn BBOP/OBO format into a table?

We want people to be able to see the chain of inference.
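One possible shape for the "graph as a table" idea: a minimal Python sketch that flattens a BBOP-style graph into subject / predicate / object rows and serializes them as TSV. It assumes nodes carry `id` and an optional `lbl`, and edges carry `sub`, `pred`, `obj`; the function names, the `has_variant` predicate, and the placeholder IDs are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, str, str]


def evidence_graph_to_rows(graph: Dict[str, Any]) -> List[Row]:
    """Flatten a BBOP-style graph into (subject, predicate, object) rows."""
    # Prefer human-readable labels, falling back to the raw identifier.
    labels = {n["id"]: n.get("lbl") or n["id"] for n in graph.get("nodes", [])}
    return [
        (labels.get(e["sub"], e["sub"]), e["pred"], labels.get(e["obj"], e["obj"]))
        for e in graph.get("edges", [])
    ]


def rows_to_tsv(rows: List[Row]) -> str:
    """Serialize rows as tab-separated text with a header line."""
    header = ("subject", "predicate", "object")
    return "\n".join("\t".join(row) for row in [header, *rows])


# Hypothetical placeholder graph, not real Monarch data.
graph = {
    "nodes": [{"id": "GENE:1", "lbl": "example gene"}, {"id": "VARIANT:1"}],
    "edges": [{"sub": "GENE:1", "pred": "has_variant", "obj": "VARIANT:1"}],
}
print(rows_to_tsv(evidence_graph_to_rows(graph)))
```

One row per edge reads top to bottom as the chain of inference, at the cost of losing the branching structure a drawn graph would show.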

@kshefchek
Contributor

Can we merge this with monarch-initiative/monarch-ui#28? I will have a prototype up by early next week.
