Generally improve evidence/provenance capture and reporting #787

mbrush · 2019-07-12T01:12:29Z

At the May 2019 F2F in Corvallis, and on recent Monarch Data calls, we discussed the need to provide better provenance and evidence for associations in the Monarch app. Internally, this helps us understand, QC, and document our own data. For external users, it is critical for trusting and applying the data we provide.

At present, expanding the Support column for a Monarch association shows an unorganized list of sources, evidence codes, and publications (see figure below). One key requirement is to make clear which reported source(s) actually made the assertion captured in an association, as opposed to which sources provided supporting data used to infer it. This includes indicating when an association is inferred by Monarch by joining or reasoning over data from one or more sources. When an association is inferred, we should provide access to the inference/reasoning path, which is currently captured in 'evidence graphs' served by the Biolink API but needs to be rendered in a more human-readable way (Graphviz was proposed).
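As a rough illustration of the Graphviz idea, the sketch below turns an evidence graph into DOT text. It assumes a BBOP-style JSON shape (nodes with `id`/`lbl`, edges with `sub`/`pred`/`obj`); the exact shape served by the Biolink API should be checked. All `ex:` identifiers and predicate names are hypothetical placeholders, not real Monarch data.

```python
from typing import Dict, List


def dot_quote(text: str) -> str:
    """Quote a string for use as a Graphviz DOT ID, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def evidence_graph_to_dot(graph: Dict[str, List[dict]]) -> str:
    """Render an (assumed) BBOP-style graph as DOT: nodes have id/lbl, edges sub/pred/obj."""
    lines = ["digraph evidence {", "  rankdir=LR;"]
    for node in graph.get("nodes", []):
        # Prefer the human-readable label; fall back to the raw identifier.
        label = node.get("lbl") or node["id"]
        lines.append(f"  {dot_quote(node['id'])} [label={dot_quote(label)}];")
    for edge in graph.get("edges", []):
        # Each hop of the inference path becomes a labeled, directed edge.
        lines.append(
            f"  {dot_quote(edge['sub'])} -> {dot_quote(edge['obj'])} "
            f"[label={dot_quote(edge['pred'])}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example: placeholder "ex:" IDs and predicate names, not real Monarch/Biolink data.
example_graph = {
    "nodes": [
        {"id": "ex:ALG9", "lbl": "ALG9"},
        {"id": "ex:variant1", "lbl": "example variant"},
        {"id": "ex:disease1", "lbl": "example disease"},
        {"id": "ex:MKD", "lbl": "Multicystic kidney dysplasia"},
    ],
    "edges": [
        {"sub": "ex:ALG9", "pred": "has_variant", "obj": "ex:variant1"},
        {"sub": "ex:variant1", "pred": "causes", "obj": "ex:disease1"},
        {"sub": "ex:disease1", "pred": "has_phenotype", "obj": "ex:MKD"},
    ],
}

if __name__ == "__main__":
    print(evidence_graph_to_dot(example_graph))
```

Saved to a file such as `evidence.dot`, the output can be rendered with the Graphviz CLI, e.g. `dot -Tsvg evidence.dot -o evidence.svg`.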

Another requirement is to organize evidence and publications according to the reported sources that used them, in particular when an association is asserted by more than one source. This lets us understand which sources used which publications, and what type of evidence each publication provided. At the May F2F it was also suggested that we distinguish 'asserting sources' from 'supporting sources', rather than simply calling all of them 'sources'.
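A minimal sketch of the proposed organization, assuming each piece of support can be recorded as a (source, role, evidence code, publication) record. The `SupportRecord` type, the `"asserting"`/`"supporting"` role values, and the placeholder source, evidence, and publication names are all hypothetical, chosen only to show the grouping.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set

# Proposed roles (hypothetical vocabulary): who made the assertion vs. who supplied data for it.
ROLES = ("asserting", "supporting")


@dataclass(frozen=True)
class SupportRecord:
    source: str                        # reporting source, e.g. "HPOA" or "CLINVAR"
    role: str                          # "asserting" or "supporting"
    evidence: str                      # evidence code (an ECO term in practice)
    publication: Optional[str] = None  # publication used by this source, if any


def group_support(records: Iterable[SupportRecord]) -> Dict[str, Dict[str, Dict[str, Set[str]]]]:
    """Group evidence codes and publications under the source that used them, split by role."""
    grouped: Dict[str, Dict[str, Dict[str, Set[str]]]] = {role: {} for role in ROLES}
    for rec in records:
        if rec.role not in grouped:
            raise ValueError(f"unknown role: {rec.role!r}")
        bucket = grouped[rec.role].setdefault(
            rec.source, {"evidence": set(), "publications": set()}
        )
        bucket["evidence"].add(rec.evidence)
        if rec.publication is not None:
            bucket["publications"].add(rec.publication)
    return grouped


# Hypothetical records: "Source A/B/C", "EV1/EV2" and "PUB1/PUB2" are placeholders.
records = [
    SupportRecord("Source A", "asserting", "EV1", "PUB1"),
    SupportRecord("Source B", "asserting", "EV2", "PUB1"),
    SupportRecord("Source B", "asserting", "EV2", "PUB2"),
    SupportRecord("Source C", "supporting", "EV1"),
]

if __name__ == "__main__":
    for role, by_source in group_support(records).items():
        for source, bucket in sorted(by_source.items()):
            print(role, source, sorted(bucket["evidence"]), sorted(bucket["publications"]))
```

Grouped this way, it is visible that PUB1 was used by two asserting sources, while Source C only contributed supporting data.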

Hoping others can add examples of evidence/provenance shortcomings they have come across. A simple example I can provide is the association reported here between the ALG9 gene and the Multicystic kidney dysplasia phenotype. The 'Support' drop-down lists one ECO code, one publication, and two sources (HPOA and CLINVAR).

We wouldn't know it from looking at the metadata here, but neither of these sources directly asserts the reported association. Rather, it is inferred by Monarch by joining data along the path gene -> variant -> disease -> phenotype (as specified in the Cypher query here). Furthermore, it is not clear how the publication and evidence code relate to the indicated sources (who used what, and how).
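To show how an inferred association could carry its own provenance, here is a small sketch of the gene -> variant -> disease -> phenotype join that keeps every hop it used. It is not the actual Cypher query. The `Edge`/`InferredAssociation` types, the `"Monarch (inferred)"` label, the `ex:` identifiers, and the assignment of hops to CLINVAR and HPOA are hypothetical, for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    subject: str
    obj: str
    source: str  # source that reported this single hop


@dataclass(frozen=True)
class InferredAssociation:
    subject: str
    obj: str
    path: Tuple[Edge, ...]              # the hops joined to produce the association
    asserted_by: str = "Monarch (inferred)"

    @property
    def supporting_sources(self) -> List[str]:
        # Sources only supplied hops; none of them asserted the end-to-end association.
        return sorted({edge.source for edge in self.path})


def index_by_subject(edges: List[Edge]) -> Dict[str, List[Edge]]:
    index: Dict[str, List[Edge]] = defaultdict(list)
    for edge in edges:
        index[edge.subject].append(edge)
    return index


def infer_gene_to_phenotype(g2v: List[Edge], v2d: List[Edge], d2p: List[Edge]) -> List[InferredAssociation]:
    """Join gene->variant, variant->disease and disease->phenotype hops, keeping the path."""
    v2d_idx, d2p_idx = index_by_subject(v2d), index_by_subject(d2p)
    results = []
    for e1 in g2v:
        for e2 in v2d_idx.get(e1.obj, []):
            for e3 in d2p_idx.get(e2.obj, []):
                results.append(InferredAssociation(e1.subject, e3.obj, (e1, e2, e3)))
    return results


# Hypothetical hops; which source supplied which hop is an assumption.
g2v = [Edge("ex:ALG9", "ex:variant1", "CLINVAR")]
v2d = [Edge("ex:variant1", "ex:disease1", "CLINVAR")]
d2p = [Edge("ex:disease1", "ex:MKD", "HPOA")]

if __name__ == "__main__":
    for assoc in infer_gene_to_phenotype(g2v, v2d, d2p):
        print(assoc.subject, "->", assoc.obj, assoc.asserted_by, assoc.supporting_sources)
```

Keeping the path on the result makes it possible to say "asserted by Monarch, supported by CLINVAR and HPOA" instead of listing both as plain sources.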

monicacecilia · 2019-07-12T16:23:04Z

Chatting with @kshefchek & @cmungall

To organize provenance data, turn BBOP/OBO format into a table?

We want people to be able to see the chain of inference.
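One possible reading of "turn BBOP/OBO format into a table" is sketched below: walk the evidence graph from the starting entity and emit one tab-separated row per hop, so the chain of inference reads top to bottom. The graph shape (nodes with `id`/`lbl`, edges with `sub`/`pred`/`obj`), the `chain_table` name, and the example IDs and predicates are assumptions, not an existing Monarch API.

```python
import csv
import io
from collections import deque
from typing import Dict, List


def chain_table(graph: Dict[str, List[dict]], start: str) -> str:
    """Breadth-first walk from `start`, writing one TSV row per edge in visit order."""
    labels = {n["id"]: n.get("lbl") or n["id"] for n in graph.get("nodes", [])}
    out_edges: Dict[str, List[dict]] = {}
    for edge in graph.get("edges", []):
        out_edges.setdefault(edge["sub"], []).append(edge)

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["step", "subject", "predicate", "object"])
    seen = {start}
    queue = deque([start])
    step = 0
    while queue:
        node = queue.popleft()
        for edge in out_edges.get(node, []):
            step += 1
            writer.writerow([step, labels.get(edge["sub"], edge["sub"]),
                             edge["pred"], labels.get(edge["obj"], edge["obj"])])
            # Visit each downstream node once, even if several edges lead to it.
            if edge["obj"] not in seen:
                seen.add(edge["obj"])
                queue.append(edge["obj"])
    return buf.getvalue()


# Hypothetical example graph with placeholder IDs and predicates.
example_graph = {
    "nodes": [
        {"id": "ex:ALG9", "lbl": "ALG9"},
        {"id": "ex:variant1", "lbl": "example variant"},
        {"id": "ex:disease1", "lbl": "example disease"},
        {"id": "ex:MKD", "lbl": "Multicystic kidney dysplasia"},
    ],
    "edges": [
        {"sub": "ex:ALG9", "pred": "has_variant", "obj": "ex:variant1"},
        {"sub": "ex:variant1", "pred": "causes", "obj": "ex:disease1"},
        {"sub": "ex:disease1", "pred": "has_phenotype", "obj": "ex:MKD"},
    ],
}

if __name__ == "__main__":
    print(chain_table(example_graph, "ex:ALG9"), end="")
```

A table keeps the hops in reading order, which may be easier to scan in the UI than a rendered graph when the path is a simple chain.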

kshefchek · 2019-10-24T23:31:43Z

Can we merge this with monarch-initiative/monarch-ui#28? I will have a prototype up by early next week.

monicacecilia assigned mbrush and kshefchek Jul 12, 2019

kshefchek closed this as completed Oct 24, 2019

monicacecilia mentioned this issue Oct 24, 2019 in "Implement simple table view of evidence" monarch-initiative/monarch-ui#28 (Closed)
