Add QC to ensure that if we provide evidence for a subset, the mapping must be exact #7689

Merged (13 commits) on Jul 31, 2024

Member

@matentzn matentzn commented May 7, 2024

This QC check was created as a follow-up to #7681.

It ensures that, if a subset is declared for a term in ORDO, the evidence for it (an ORDO code) must correspond to an exact mapping as well. So:

If

MONDO:123 subset: ordo_disease {source="Orphanet:123"} 

then there must also be an exact mapping to Orphanet:123.
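
To make the rule concrete, here is a minimal Python sketch of the same idea applied to a single OBO stanza. It is an illustration only: the real check is a SPARQL query run by the QC pipeline, and the helper names (`parse_tag`, `missing_exact_mappings`) as well as the assumption that an exact mapping is recorded as `source="MONDO:equivalentTo"` on the xref are assumptions made for this example, not taken from the query itself.

```python
import re

QUALIFIER = re.compile(r'(\w+)="([^"]*)"')

def parse_tag(line):
    """Split an OBO tag line into (tag, value, [(qualifier_key, qualifier_value), ...])."""
    tag, _, rest = line.partition(": ")
    value, _, trailing = rest.partition(" {")
    return tag.strip(), value.strip(), QUALIFIER.findall(trailing)

def missing_exact_mappings(stanza, subset="ordo_disease", exact="MONDO:equivalentTo"):
    """Return the subset sources that have no xref annotated as an exact mapping.

    Assumption: "exact" means the xref carries source="MONDO:equivalentTo".
    """
    sources, exact_xrefs = [], set()
    for raw in stanza.strip().splitlines():
        tag, value, quals = parse_tag(raw.strip())
        if tag == "subset" and value == subset:
            sources += [v for k, v in quals if k == "source"]
        elif tag == "xref" and ("source", exact) in quals:
            exact_xrefs.add(value)
    return [s for s in sources if s not in exact_xrefs]

# Hypothetical stanza mirroring the MONDO:123 example in the description.
stanza = '''
id: MONDO:123
subset: ordo_disease {source="Orphanet:123"}
'''
print(missing_exact_mappings(stanza))  # ['Orphanet:123']
```

Adding the line `xref: Orphanet:123 {source="MONDO:equivalentTo"}` to the stanza makes the function return an empty list, which corresponds to the check passing.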

@twhetzel
Collaborator

@matentzn I have not reviewed this since the QC failed.

@matentzn
Member Author

I assigned this to you because the QC needs to be fixed by a curator! It fails because of the test.

@twhetzel
Collaborator

twhetzel commented Jun 3, 2024

@matentzn I am not sure I understand the query correctly. I think it checks that, for the ordo_disease subset, the Orphanet CURIE listed in the source annotation is also used in an xref annotation.

If that is what is happening, then why is this line an error:
Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
When mondo-edit.obo contains:
id: MONDO:0957397
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
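
One detail a reader may notice in this snippet: the qualifier on the xref uses the key `xref=`, whereas other stanzas quoted in this thread use `source=` (and the commit list at the end of the PR mentions "change annotation to source"). Whether that is what the query tripped over is not confirmed anywhere in the conversation; the sketch below only shows, under that assumption, how a check that reads `source` qualifiers would miss this line. The function names are hypothetical.

```python
import re

def qualifier_pairs(line):
    """Return the (key, value) pairs from the trailing {...} block of an OBO line."""
    return re.findall(r'(\w+)="([^"]*)"', line)

def marked_exact(line, key="source", exact="MONDO:equivalentTo"):
    """True if the line carries key="MONDO:equivalentTo" among its qualifiers."""
    return (key, exact) in qualifier_pairs(line)

# The xref line from the snippet above.
flagged = 'xref: Orphanet:652487 {xref="MONDO:equivalentTo"}'

print(marked_exact(flagged))              # False: no source= qualifier on this line
print(marked_exact(flagged, key="xref"))  # True: the value sits under the xref= key
```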

@matentzn
Member Author

matentzn commented Jun 4, 2024

> @matentzn I am not sure I am understanding the query correctly. What I think it is checking is to make sure that for the ordo_disease subset that the Orphanet CURIE that is listed in the source annotation is also used in an xref annotation.

My best guess:

Has this already been fixed by some other PR? Otherwise, I don't understand it either.

@twhetzel
Collaborator

twhetzel commented Jun 4, 2024

The OBO snippet I posted was from mondo-edit.obo in the branch for this PR, qc-ordo-subset-exact-mapping.

@twhetzel
Collaborator

twhetzel commented Jun 4, 2024

Here is another error that I think does make sense to report as an error:
Error: http://purl.obolibrary.org/obo/MONDO_0009349,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:268936
id: MONDO:0009349
subset: ordo_disease {source="Orphanet:268936"}
xref: Orphanet:2162 {source="OMIM:236100"}
--> Is the fix to add an xref to Orphanet:268936 based on some source TBD and add source="MONDO:equivalentTo"???

@twhetzel
Collaborator

twhetzel commented Jun 4, 2024

After the update of this branch with the latest mondo-edit.obo there are 15 errors from this SPARQL query that need to be re-examined.

@twhetzel
Collaborator

twhetzel commented Jun 4, 2024

Here are the remaining 15 errors, each with the relevant mondo-edit.obo snippet, after merging master into this branch earlier today. The general categories are:

Error: http://purl.obolibrary.org/obo/MONDO_0013626,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:247353
id: MONDO:0013626
name: psoriasis 14, pustular
subset: ordo_disease {source="Orphanet:404546", source="Orphanet:163931", source="Orphanet:247353"}
xref: Orphanet:163931 {source="MONDO:equivalentTo"}
xref: Orphanet:404546 {source="OMIM:614204", source="MONDO:equivalentTo"}
--> Is the fix to add an xref or remove source="Orphanet:247353" from the subset?


Error: http://purl.obolibrary.org/obo/MONDO_0014017,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:642675
id: MONDO:0014017
name: intellectual developmental disorder with autism and macrocephaly
subset: orphanet_rare {source="Orphanet:642675"}
xref: Orphanet:106 {source="OMIM:615032"}
xref: Orphanet:642675 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0014498,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:576349
id: MONDO:0014498
xref: Orphanet:47045 {source="DOID:0090065"}
xref: Orphanet:576349 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0016520,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:2345
id: MONDO:0016520
name: obsolete isolated Klippel-Feil syndrome
subset: ordo_disease {source="Orphanet:2345"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0018347,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:397933
id: MONDO:0018347
name: obsolete severe intellectual disability-progressive postnatal microcephaly- midline stereotypic hand movements syndrome
subset: ordo_disease {source="Orphanet:397933"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0018888,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:53691
id: MONDO:0018888
name: obsolete congenital cornea plana
subset: ordo_disease {source="Orphanet:53691"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0019482,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86903
id: MONDO:0019482
name: obsolete dendritic cell sarcoma not otherwise specified
subset: ordo_disease {source="Orphanet:86903"}
--> No xrefs to Orphanet


Error: http://purl.obolibrary.org/obo/MONDO_0019486,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:86909
id: MONDO:0019486
name: obsolete myoclonic epilepsy of infancy
subset: ordo_disease {source="Orphanet:86909"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0020548,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:99922
id: MONDO:0020548
name: obsolete ocular pemphigoid
subset: ordo_disease {source="Orphanet:99922"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0031219,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:252202
id: MONDO:0031219
name: mismatch repair cancer syndrome
subset: ordo_disease {source="Orphanet:252202"}
xref: Orphanet:252202 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0033479,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:631095
id: MONDO:0033479
name: spinocerebellar ataxia 44
subset: ordo_disease {source="Orphanet:631095"}
xref: Orphanet:631095 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0033947,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528647
id: MONDO:0033947
name: obsolete hereditary angioedema with normal C1Inh
subset: ordo_disease {source="Orphanet:528647"}
--> This only has an xref to GARD


Error: http://purl.obolibrary.org/obo/MONDO_0044067,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:636945
id: MONDO:0044067
name: candidiasis, invasive
subset: ordo_disease {source="Orphanet:636945"}
xref: Orphanet:636945 {xref="MONDO:equivalentTo"}


Error: http://purl.obolibrary.org/obo/MONDO_0060596,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:528084
id: MONDO:0060596
name: neurodevelopmental disorder with dysmorphic facies and distal limb anomalies
subset: ordo_disease {source="Orphanet:528084"}
xref: Orphanet:528084 {source="MONDO:relatedTo"}
--> Change to equivalentTo


Error: http://purl.obolibrary.org/obo/MONDO_0957397,http://purl.obolibrary.org/obo/mondo#ordo_disease,Orphanet:652487
id: MONDO:0957397
name: intellectual developmental disorder, autosomal dominant 72
subset: ordo_disease {source="Orphanet:652487"}
xref: Orphanet:652487 {xref="MONDO:equivalentTo"}
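
The 15 reports above appear to fall into a few recurring shapes. As a rough aid for triage, the following Python sketch sorts a stanza into one of those shapes given the Orphanet ID from the error line. The category labels and the function name `triage` are invented for this illustration, and the sketch assumes an exact mapping is expressed as `source="MONDO:equivalentTo"`; it is not the logic of the SPARQL query.

```python
import re

QUALIFIER = re.compile(r'(\w+)="([^"]*)"')

def triage(stanza, flagged, exact="MONDO:equivalentTo"):
    """Classify why `flagged` (an Orphanet CURIE) might be reported for this stanza."""
    for raw in stanza.strip().splitlines():
        parts = raw.split()
        # Only look at xref lines whose target is exactly the flagged ID.
        if len(parts) < 2 or parts[0] != "xref:" or parts[1] != flagged:
            continue
        quals = QUALIFIER.findall(raw)
        if ("source", exact) in quals:
            return "exact mapping present"
        if any(value == exact for _, value in quals):
            return "equivalentTo under a key other than source"
        return "xref present but not marked exact"
    return "no xref to flagged ID"

# Lines copied from the reports above.
print(triage('subset: ordo_disease {source="Orphanet:2345"}', "Orphanet:2345"))
# no xref to flagged ID
print(triage('xref: Orphanet:528084 {source="MONDO:relatedTo"}', "Orphanet:528084"))
# xref present but not marked exact
print(triage('xref: Orphanet:252202 {xref="MONDO:equivalentTo"}', "Orphanet:252202"))
# equivalentTo under a key other than source
```

Under these assumptions, the obsolete terms land in the first category, MONDO:0060596 in the second, and the terms that look correct at first glance (such as MONDO:0031219 and MONDO:0957397) in the third.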

@matentzn
Member Author

matentzn commented Jun 5, 2024

Thanks!

I would suggest we continue this after:

  • dev is merged into main in mondo-ingest
  • another data release is done in mondo-ingest
  • I update the ORDO subsets according to our recent changes

Some of the examples you found sound like real bugs in the query, but I can't pinpoint them right now.

@twhetzel
Collaborator

twhetzel commented Jun 5, 2024

That plan sounds good to me!

Collaborator

@twhetzel twhetzel left a comment

There was another run of the mondo-ingest pipeline on Tuesday (11-Jun) and some subset updates to Mondo (although apparently none for ORDO), so I updated this branch with master again and now the QC checks pass.

However, there are these remaining questions we did not have time to discuss at the Tech call on 5-Jun:

  • Why is this SPARQL query only for “ordo_disease” and not the other two subsets related to Orphanet?
  • Should obsolete terms be in an “ordo_disease” subset and also have an xref to Orphanet? See Sabrina’s comments:
    "If a term is obsolete in Mondo, it doesn't make sense (to me) that it is in a rare disease subset (it would be like saying "this term does not exist anymore, but it is in a subset")." Updating the Orphanet rare disease subset #7681 (comment)
  • Are there any situations where a MONDO term would have an xref to Orphanet, but then that Orphanet ID not be a source for an Orphanet subset? Is this an issue with the SPARQL query?
    See MONDO:0014017 that has an xref to "Orphanet:106 {source="OMIM:615032"}", but Orphanet:106 is not a source for the Orphanet subsets?

See OBO snippet below (from latest master on 12-Jun, 9:40am PT):

id: MONDO:0014017
subset: ordo_disorder {source="Orphanet:642675"}
xref: Orphanet:106 {source="OMIM:615032"}
xref: Orphanet:642675 {xref="MONDO:equivalentTo"}

@matentzn
Member Author

matentzn commented Jun 12, 2024

> Why is this SPARQL query only for “ordo_disease” and not the other two subsets related to Orphanet?

No particular reason, other than that this was an important use case. Ideally we would add all other subsets to this QC check as well. Maybe just remove the VALUES .. clause? This would test all the subsets and their annotations!
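
The effect of dropping the subset filter can be pictured with a small Python analogue, where `subsets=None` plays the role of a query without the restricting clause. This is a sketch under assumptions: the stanza is made up, the function name `unmatched_subset_sources` is hypothetical, and exactness is again taken to mean `source="MONDO:equivalentTo"` on the xref.

```python
import re

QUALIFIER = re.compile(r'(\w+)="([^"]*)"')

def unmatched_subset_sources(lines, subsets=None, exact="MONDO:equivalentTo"):
    """Return (subset, source) pairs whose source has no exact xref.

    subsets=None checks every subset; a set of names restricts the check.
    """
    declared, exact_xrefs = [], set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        quals = QUALIFIER.findall(line)
        if parts[0] == "subset:" and (subsets is None or parts[1] in subsets):
            declared += [(parts[1], v) for k, v in quals if k == "source"]
        elif parts[0] == "xref:" and ("source", exact) in quals:
            exact_xrefs.add(parts[1])
    return [(name, src) for name, src in declared if src not in exact_xrefs]

# Made-up stanza with two subsets, only one of which has a matching exact xref.
stanza = [
    'id: MONDO:123',
    'subset: ordo_disease {source="Orphanet:123"}',
    'subset: orphanet_rare {source="Orphanet:456"}',
    'xref: Orphanet:123 {source="MONDO:equivalentTo"}',
]
print(unmatched_subset_sources(stanza, subsets={"ordo_disease"}))  # []
print(unmatched_subset_sources(stanza))  # [('orphanet_rare', 'Orphanet:456')]
```

One thing to keep in mind with the unrestricted form: it would apply the same rule to every subset that carries a source annotation, including any whose sources are not meant to be mapped identifiers, so the result set may need review before the broader check is enforced.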

> Should obsolete terms be in an “ordo_disease” subset and also have an xref to Orphanet? See Sabrina’s comments:
> "If a term is obsolete in Mondo, it doesn't make sense (to me) that it is in a rare disease subset (it would be like saying "this term does not exist anymore, but it is in a subset")." #7681 (comment)

IMO: we should have a really, really good reason for any ORDO class in the ordo_disease subset. Ideally this case should not exist. But in case there is a good one, then yes, it should be xrefed as well. Sabrina's problem should be solved in the way the subsets are constructed (not adding rare subset to obsolete classes).

> Are there any situations where a MONDO term would have an xref to Orphanet, but then that Orphanet ID not be a source for an Orphanet subset? Is this an issue with the SPARQL query?

Hmmmmm. Yeah, I guess that is possible, for example when there are two Orphanet mappings (proxy merge) and only one of them is in the ordo_disorder subset. Good question!

?entity oboInOwl:hasDbXref ?xref .
VALUES ?mondo_source {
"MONDO:obsoleteEquivalent"
"MONDO:equivalentTo"
Collaborator

For now, per @sabrinatoro, we should leave this as MONDO:equivalentTo until the obsolete Mondo terms that are in an ordo subset (issue 7693) are reviewed.

After those are reviewed, this MONDO:obsoleteEquivalent should be added back into the query (TODO: Add ticket for this work and link as step to #7693).

Collaborator

@twhetzel twhetzel Jul 29, 2024

Actually I think the second line should have been "After those are reviewed, this MONDO:obsoleteEquivalent could be removed from the query" since it's only needed for now while those obsolete terms are being reviewed.

@matentzn matentzn assigned twhetzel and unassigned matentzn Jul 29, 2024
@twhetzel
Collaborator

Chatted with Sabrina and both "MONDO:obsoleteEquivalent" and "MONDO:equivalentTo" should be in the query. If there are still failures then we need to look at the failures and see what the issues are.
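
The decision to accept both source values can be pictured as a membership test against a small set. The sketch below is a Python illustration of that idea rather than the query itself; `ACCEPTED_SOURCES` and `is_accepted_xref` are hypothetical names, and the first two example lines use the placeholder ID from the PR description.

```python
import re

# Both values named in the comment above.
ACCEPTED_SOURCES = {"MONDO:equivalentTo", "MONDO:obsoleteEquivalent"}

def is_accepted_xref(line, accepted=ACCEPTED_SOURCES):
    """True if an OBO xref line carries a source qualifier from the accepted set."""
    sources = re.findall(r'source="([^"]*)"', line)
    return line.startswith("xref:") and any(s in accepted for s in sources)

obsolete = 'xref: Orphanet:123 {source="MONDO:obsoleteEquivalent"}'
print(is_accepted_xref(obsolete))                                   # True
print(is_accepted_xref(obsolete, accepted={"MONDO:equivalentTo"}))  # False
print(is_accepted_xref('xref: Orphanet:528084 {source="MONDO:relatedTo"}'))  # False
```

With only `MONDO:equivalentTo` accepted, every obsolete term that is still in an ORDO subset would be reported, which matches the trade-off discussed in the review comments above.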

@twhetzel
Collaborator

This now fails due to 1 proxy merge:

FAIL Rule ../sparql/qc/mondo/qc-proxy-merges.sparql: 2 violation(s)
entity,property,value
http://purl.obolibrary.org/obo/MONDO_0014269,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0018337
http://purl.obolibrary.org/obo/MONDO_0018337,Orphanet:397593,http://purl.obolibrary.org/obo/MONDO_0014269
  • MONDO_0018337 is obsolete and in an "ordo_disorder" subset and has an xref to Orphanet:397593 with source MONDO:obsoleteEquivalent.

  • MONDO_0014269 is not in an "ordo_disorder" subset (but is a Disorder in Orphanet) and has an xref to Orphanet:397593 with source MONDO:equivalentTo. This equivalentTo statement is correct.

What's the best way to handle this?
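
For readers unfamiliar with the proxy-merge check, the two violation rows can be reproduced with a short Python sketch. The contents of qc-proxy-merges.sparql are not shown in this thread, so the logic here is an assumption: it treats any Orphanet ID claimed by more than one Mondo term through `MONDO:equivalentTo` or `MONDO:obsoleteEquivalent` as a violation and reports each pair in both directions. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

EXACT_LIKE = {"MONDO:equivalentTo", "MONDO:obsoleteEquivalent"}

def proxy_merge_violations(mappings):
    """Return (entity, xref, other_entity) rows for xrefs shared by several terms.

    mappings: iterable of (mondo_id, orphanet_id, source) triples.
    """
    by_xref = defaultdict(set)
    for mondo_id, xref, source in mappings:
        if source in EXACT_LIKE:
            by_xref[xref].add(mondo_id)
    rows = []
    for xref, terms in sorted(by_xref.items()):
        # Each pair is reported twice, once from each side, like the QC output.
        rows.extend((a, xref, b) for a, b in permutations(sorted(terms), 2))
    return rows

# The two mappings described in the bullets above.
mappings = [
    ("MONDO:0018337", "Orphanet:397593", "MONDO:obsoleteEquivalent"),
    ("MONDO:0014269", "Orphanet:397593", "MONDO:equivalentTo"),
]
for row in proxy_merge_violations(mappings):
    print(",".join(row))
# MONDO:0014269,Orphanet:397593,MONDO:0018337
# MONDO:0018337,Orphanet:397593,MONDO:0014269
```

Under this reading, removing the xref from one of the two terms leaves the Orphanet ID with a single claimant and the rows disappear.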

MONDO_0018337: [screenshot, 2024-07-30, 12:39 PM]

MONDO_0014269: [screenshot, 2024-07-30, 12:55 PM]

@twhetzel
Collaborator

Removed xref and subset on obsolete term and added subset to correct/active Mondo term.

@twhetzel twhetzel assigned sabrinatoro and unassigned twhetzel Jul 31, 2024
Collaborator

@sabrinatoro sabrinatoro left a comment

It looks good.
I am approving and merging.

@sabrinatoro sabrinatoro merged commit f49262c into master Jul 31, 2024
1 check passed
@sabrinatoro sabrinatoro deleted the qc-ordo-subset-exact-mapping branch July 31, 2024 14:49
twhetzel added a commit that referenced this pull request Jul 31, 2024
…g must be exact (#7689)

* Create qc-ordo-subset-exact-mapping.sparql

* Update src/sparql/qc/mondo/qc-ordo-subset-exact-mapping.sparql

* Update src/sparql/qc/mondo/qc-ordo-subset-exact-mapping.sparql

Co-authored-by: Nico Matentzoglu <nicolas.matentzoglu@gmail.com>

* add back MONDO:obsoleteEquivalent

* change annotation to source

* fix proxy merge/qc issue

* add ordo subset to correct term

---------

Co-authored-by: Trish Whetzel <trish@tislab.org>
Co-authored-by: Trish Whetzel <plwhetzel@gmail.com>

3 participants