create new axiom annotation to indicate an OMIM term is an included entry #5507

Open
nicolevasilevsky opened this issue Oct 14, 2022 · 26 comments
Labels: effort-XL (Ticket may take more than 2 hours complete); help wanted (Need more personnel); OMIM

Comments

@nicolevasilevsky
Member

nicolevasilevsky commented Oct 14, 2022

OMIM entries sometimes include "included" entries, such as https://omim.org/entry/233910 (related to #5433).

On the tech call, we decided to create a new axiom annotation that indicates a new Mondo term is an included entry in OMIM.

I propose:
MONDO:includedEntryInOMIM

Description:
Used for cases where the term is an 'included' entry in an OMIM record but is not equivalent to the OMIM record. For example, dystonia, dopa-responsive, with or without hyperphenylalaninemia, autosomal recessive (https://omim.org/entry/233910).

@nicolevasilevsky
Member Author

nicolevasilevsky commented Oct 14, 2022

Once we agree upon this, we should:

@sabrinatoro
Collaborator

@nicolevasilevsky : I have several questions. Maybe @cmungall has some thoughts.

Use cases: As a user:

  • use-case-1 = if I look for an OMIM term, I want to find both Mondo terms.
  • use-case-2 = if I look for Mondo:2 (= the included term in OMIM), I want to find the original OMIM ID.
    Note that if we don't care about use-case-2 (i.e. users don't need it), then the following is irrelevant.

Issue 1:
What is the plan? Do we need to have 1 Mondo term "equivalent", and the other Mondo term "included"? Or should both terms be at the same "level" of equivalence? For instance:

  • option#1
    Mondo:1 - OMIM:233910 = MONDO:equivalentTo
    Mondo:2 - OMIM:233910 = MONDO:includedEntryInOMIM
  • option#2
    Mondo:1 - OMIM:233910 = MONDO:equivalentSharedOMIM
    Mondo:2 - OMIM:233910 = MONDO:equivalentSharedOMIM
    MONDO:equivalentSharedOMIM expresses that 1 OMIM = 2 Mondo
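To make the two use cases concrete, here is a minimal sketch of option#1 as a list of annotated cross-references that can be searched in both directions. The `Xref` record and both lookup helpers are hypothetical names invented for illustration, and `Mondo:1` / `Mondo:2` are the placeholder IDs used above, not real terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Xref:
    """One Mondo-to-OMIM cross-reference with its source annotation (hypothetical model)."""
    mondo_id: str
    omim_id: str
    source: str


# Option #1 from the discussion, using the placeholder IDs.
XREFS = [
    Xref("Mondo:1", "OMIM:233910", "MONDO:equivalentTo"),
    Xref("Mondo:2", "OMIM:233910", "MONDO:includedEntryInOMIM"),
]


def mondo_terms_for_omim(omim_id, xrefs=XREFS):
    """Use case 1: every Mondo term linked to an OMIM ID, with the kind of link."""
    return [(x.mondo_id, x.source) for x in xrefs if x.omim_id == omim_id]


def omim_ids_for_mondo(mondo_id, xrefs=XREFS):
    """Use case 2: the OMIM ID(s) behind a Mondo term, with the kind of link."""
    return [(x.omim_id, x.source) for x in xrefs if x.mondo_id == mondo_id]
```

Because each link carries its own source, a lookup from the OMIM side can still tell which Mondo term is the primary entry and which is the included one.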

Issue 2:
How do we get this in the mondo release? At this point (if I understand correctly), we keep only the "MONDO:equivalentTo".
If we want to fulfill use-case-2, we would need to see both mondo terms if I search 1 omim.

Issue 3:
Should we update the omim import to treat all the OMIM "included" terms in this way according to this issue?

Issue 4:
Should this be specific to OMIM? There are probably more examples of 1 term in a source representing 2 terms in Mondo. Right now we are making the term from the source "mondo is narrower than source". For consistency, maybe we should do the same with OMIM?

@maglott

maglott commented Oct 14, 2022

I vote for option #1. That is more explicit as to which term is primary and which term is included. I want to note that I consider this a case where a source (in this case OMIM) assigns the same identifier to non-synonymous concepts, as a method to inform the reader that the 'included' concept is discussed in the record. I do not consider this a case of 1 term in a source representing 2 terms in Mondo, but one identifier in a source representing more than 1 distinct concept in Mondo.

@nicolevasilevsky
Member Author

nicolevasilevsky commented Oct 14, 2022

@maglott thanks for your input! 😸

@sabrinatoro comments below

issue 1

I was thinking we'd do option 1
Mondo:1 - OMIM:233910 = MONDO:equivalentTo
Mondo:2 - OMIM:233910 = MONDO:includedEntryInOMIM

issue 2

For the release, I think we'd want to display MONDO:includedEntryInOMIM. I don't know how that is set up, but it is something Nico could do, I assume

issue 3

Good question about the OMIM import. I don't know if there is an easy way for @hrshdhgd @joeflack4 @matentzn to be able to parse out included terms from OMIM records. This has been brought up before.

issue 4

I think this is the lumping and splitting question, yes? Some resources may lump when we split or vice versa. I think for this particular issue though, it is different as @maglott said above. OMIM has included entries that are not intended to be lumped together, but they are similar, therefore they are included in the same entry. (At least that is my understanding).

Related issues:

pending PR that is awaiting the decision on this ticket

@nicolevasilevsky
Member Author

Ha, I am looking at this old ticket and we did bring up this same idea before:
#2808 (comment)

It seems like this ticket duplicates the old ticket.

@kanems
Collaborator

kanems commented Oct 17, 2022

I don't know how Mondo plans to import and manage the OMIM data long-term, but for all the MIM numbers in MedGen where we have the OMIM included terms split out, they are reported as 'included' in one of MedGen's FTP files: https://ftp.ncbi.nlm.nih.gov/pub/medgen/
MedGenIDMappings.txt.gz
example row:
#CUI|pref_name|source_id|source|
C0001519|Holmes-Adie syndrome|103100|OMIM included|

The MedGen reporting may still have some errors or gaps, and we know we have some MIM Numbers not in scope for Mondo, but if you want a secondary file to check how you parse out the OMIM data, this report may be useful to you.
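As a rough sketch of how that report could be used as a cross-check, the snippet below pulls out the MIM numbers flagged as `OMIM included`. It assumes the pipe-delimited layout shown in the example row above (with a trailing `|` and a `#`-prefixed header); the function name and the inline sample are made up for illustration.

```python
import csv
import io

# Sample taken from the example row above; the real file is gzipped.
SAMPLE = """#CUI|pref_name|source_id|source|
C0001519|Holmes-Adie syndrome|103100|OMIM included|
"""


def omim_included_ids(lines):
    """Return the set of source IDs whose source column is 'OMIM included'."""
    included = set()
    # QUOTE_NONE: treat quote characters inside disease names as plain text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, the '#'-prefixed header, and malformed rows.
        if len(row) < 4 or row[0].startswith("#"):
            continue
        source_id, source = row[2], row[3]
        if source == "OMIM included":
            included.add(source_id)
    return included


print(omim_included_ids(io.StringIO(SAMPLE)))
```

For the downloaded file, the same function should work on a handle opened with `gzip.open(path, "rt")`, where `path` is wherever the file was saved locally.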

@nicolevasilevsky
Member Author

Thanks, @kanems, this sounds super helpful! I'll discuss it with the team and get back to you.

@nicolevasilevsky
Member Author

nicolevasilevsky commented Nov 9, 2022

this is a related issue:

@nicolevasilevsky
Member Author

nicolevasilevsky commented Nov 9, 2022

Talking to Sabrina: per her conversation with Joe, the only way to know if a term is included in an OMIM entry is to check manually.

to do:

  • review existing synonyms from OMIM; if they are included terms, they should be split into new classes with xref [OMIM:XXXXXX] MONDO:includedEntryInOMIM
  • @matentzn and Nicole: query for duplicate OMIM xrefs. If there are two OMIM xrefs and one is equiv and one is related, check the OMIM record to see if it is an included entry
  • look at past tickets and try to clean up included entries

It seems nearly impossible to clean this up perfectly without a huge amount of time/effort. We'll aim to clean these up as we come across them or upon user request.
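The duplicate-xref query in the second to-do item could look something like the sketch below. It is only an illustration: it takes already-extracted `(mondo_id, omim_id, source)` tuples, the function name is hypothetical, and `MONDO:relatedTo` is assumed to be the source value meant by "related".

```python
from collections import defaultdict


def omim_ids_needing_review(xrefs):
    """Flag OMIM IDs that are xref'd from more than one Mondo term, where one
    link is MONDO:equivalentTo and at least one other link is not.

    xrefs: iterable of (mondo_id, omim_id, source) tuples.
    Returns {omim_id: sorted list of (mondo_id, source)} for manual checking.
    """
    by_omim = defaultdict(set)
    for mondo_id, omim_id, source in xrefs:
        by_omim[omim_id].add((mondo_id, source))

    flagged = {}
    for omim_id, links in by_omim.items():
        terms = {mondo_id for mondo_id, _ in links}
        sources = {source for _, source in links}
        has_equivalent = "MONDO:equivalentTo" in sources
        has_other = len(sources) > 1
        if len(terms) > 1 and has_equivalent and has_other:
            flagged[omim_id] = sorted(links)
    return flagged
```

The output is just a worklist: each flagged OMIM record still has to be opened to see whether the second term really is an included entry.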

@joeflack4
Collaborator

joeflack4 commented Nov 9, 2022

@nicolevasilevsky @sabrinatoro I'm not entirely sure that's correct. I remember discussing this with Sabrina, but I didn't remember our determination so I wanted to double check.

Take this example: https://omim.org/entry/235000

Primary label is HEMIHYPERPLASIA, ISOLATED; IH. It also says:

Other entities represented in this entry: HEMI-3 SYNDROME, INCLUDED

Here's how the row looks in mimTitles.txt:

Prefix	MIM Number	Preferred Title; symbol	Alternative Title(s); symbol(s)	Included Title(s); symbols
Percent	235000	HEMIHYPERPLASIA, ISOLATED; IH	HEMIHYPERPLASIA; HHP;; HEMIHYPERTROPHY, ISOLATED	HEMI-3 SYNDROME, INCLUDED

In the OMIM ingest, we've been stripping away that , INCLUDED text. Dazhi wrote this, so I don't know if this was part of a prior group decision, or something he decided on:

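import re

def get_alt_labels(titles):
    """Split an OMIM title field on ';;' and return the cleaned labels."""
    labels = []
    for title in titles.split(';;'):
        # Remove ', included', if present. flags must be passed by keyword:
        # the fourth positional argument of re.sub is `count`, not `flags`.
        label = re.sub(r',\s*INCLUDED', '', title.strip(), flags=re.IGNORECASE)
        # cleanup_label is a helper from the OMIM ingest (not shown here).
        label = cleanup_label(label)
        labels.append(label)
    return labels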

If I'm understanding this correctly, I should simply change this to not remove the text , INCLUDED, and that would be all I need to do, correct? I could also add more structure to such entries marked 'included' in the OMIM ingest RDF output, but I'm not sure that would be my highest priority.
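One possible middle ground, sketched here as an assumption rather than as what the ingest does, is to keep stripping the marker from the label but record that it was there. The function name and the returned `(label, was_included)` shape are hypothetical.

```python
import re

# Matches the ', INCLUDED' marker that OMIM appends to included titles.
INCLUDED_MARKER = re.compile(r",\s*INCLUDED\b", flags=re.IGNORECASE)


def split_included_titles(titles):
    """Split a ';;'-delimited title field into (label, was_included) pairs.

    The marker is removed from the label, but the boolean remembers whether
    the title carried it, so the information is not lost.
    """
    results = []
    for title in titles.split(";;"):
        title = title.strip()
        if not title:
            continue
        # subn returns the new string and the number of substitutions made.
        label, n_removed = INCLUDED_MARKER.subn("", title)
        results.append((label.strip(), n_removed > 0))
    return results
```

With this shape, the label stays clean (no 'included' in disease names) while downstream code can still decide how to mark the entry.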

@nicolevasilevsky
Member Author

Interesting - let's talk about this on the QC call and make sure we're all on the same page. Thanks @joeflack4!

@maglott

maglott commented Nov 9, 2022

I would like to weigh in against having the label, i.e. the name of a disorder, use the word 'included'. It should be considered a note from OMIM: their experts consider that value a distinct entity, and they add ', INCLUDED' to make that point. As Megan noted earlier, MedGen treats them as distinct entities and strips ', INCLUDED', but reports that we know we got the name from OMIM as the name of a disorder that is described under the same MIM number as a different entity.

@nicolevasilevsky
Member Author

Hi @maglott we won't have the label include the word included. We'll do just as @kanems does: we'll split out any included entries as new Mondo terms and we'll add a database cross reference that has the source MONDO:includedEntryInOMIM.

Joe's comments above will help us identify which terms are included entries, but we won't label terms in Mondo as 'disease, included.'

nicolevasilevsky added a commit that referenced this issue Nov 9, 2022
@maglott

maglott commented Nov 10, 2022

Thanks, @nicolevasilevsky

nicolevasilevsky added a commit that referenced this issue Nov 10, 2022
* split OMIM included entry

address #5507

* update editors guide

* add page about OMIM included entries

* split out 'desmoid tumor caused by somatic mutation'

* add gh

* split out polyneuropathy, inflammatory demyelinating, chronic

* revise OMIM xrefs

* revise xref for OMIM:600669

* add gh

* revise syn

* fix source

* revise syn

* remove duplicate class

created on a different PR
@joeflack4
Collaborator

@sabrinatoro Here is a report of everything from mimTitles.txt that has a label with the text , INCLUDED at the end of it:
mimTitlesWithIncluded.tsv.zip

About 1,300 of roughly 28,000 entries (~5%) have a label with , INCLUDED at the end of it.

@sabrinatoro @matentzn has asked me to add an rdfs:comment on all entries where this appears. My preference would be embedded JSON, e.g.:
a. rdfs:comment '{"hasIncludedInLabel": true}'
But I'm guessing plain text would be preferred, so I'm going to go with this instead:
b. rdfs:comment "This term has one or more labels that end with ', INCLUDED'."
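For illustration only, here is a sketch of how entries could be picked out for comment (b). It assumes the tab-separated column layout of the mimTitles.txt row quoted earlier in this thread and skips any row whose second column is not a MIM number; the function name is hypothetical, and the real ingest may do this differently.

```python
import csv
import io

COMMENT = "This term has one or more labels that end with ', INCLUDED'."

# Header and row as quoted earlier in the thread.
SAMPLE = (
    "Prefix\tMIM Number\tPreferred Title; symbol\t"
    "Alternative Title(s); symbol(s)\tIncluded Title(s); symbols\n"
    "Percent\t235000\tHEMIHYPERPLASIA, ISOLATED; IH\t"
    "HEMIHYPERPLASIA; HHP;; HEMIHYPERTROPHY, ISOLATED\t"
    "HEMI-3 SYNDROME, INCLUDED\n"
)


def comments_for_included(lines):
    """Map 'OMIM:<number>' to COMMENT for rows with a label ending in ', INCLUDED'."""
    out = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip headers, comment lines, and short rows.
        if len(row) < 5 or not row[1].isdigit():
            continue
        # Title columns may hold several ';;'-separated labels each.
        labels = [t.strip() for col in row[2:5] for t in col.split(";;")]
        if any(label.upper().endswith(", INCLUDED") for label in labels):
            out[f"OMIM:{row[1]}"] = COMMENT
    return out
```

The returned mapping could then be written out as `rdfs:comment` values by whatever RDF tooling the ingest already uses.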

@matentzn
Member

Yes, b. for now please!

@nicolevasilevsky
Member Author

Joe created a spreadsheet here with included entries. Related to monarch-initiative/omim#82

@matentzn
Member

matentzn commented Mar 6, 2023

I have actually never read the full issue. I think someone should raise it during a 1:1 call with me, including a plan with open action items and some ideas of how we can use automation magic to update everything at once.

Also, we need to decide how these "included" mappings are shared with the user in terms of skos. Are they "skos:closeMatch" mappings?

@nicolevasilevsky
Member Author

@sabrinatoro will add this to a future 1:1 agenda, @matentzn.

@sabrinatoro
Collaborator

Here are my issues/questions. Using OMIM:183090 as an example:

  • OMIM:183090 is MONDO:equivalentTo 'spinocerebellar ataxia type 2' (MONDO:0008458)
  • OMIM:183090 is MONDO:includedEntryInOMIM 'amyotrophic lateral sclerosis, susceptibility to, 13' (MONDO:0800224)

What this means is that we are "splitting" the omim entry into "main entry" and "included entry" because we think these are different diseases. BUT

  • users do not see the xref for the MONDO:includedEntryInOMIM term (maybe it is a good thing as it might be confusing?)
  • should this be in sssom? if so, how?
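To see which terms would be affected, a sketch like the following could list the terms whose OMIM xref carries a given source. It assumes the xrefs are serialized in mondo-edit.obo as `xref: OMIM:183090 {source="..."}`, which this thread does not confirm; the helper names and the `LINES` sample are hypothetical.

```python
import re

# Assumed OBO serialization: xref: <ID> {source="...", source="..."}
XREF = re.compile(r'^xref:\s+(\S+)\s+\{(.*)\}\s*$')
SOURCE = re.compile(r'source="([^"]+)"')


def parse_xref(line):
    """Return (xref_id, [sources]) for an annotated xref line, else None."""
    match = XREF.match(line.strip())
    if not match:
        return None
    return match.group(1), SOURCE.findall(match.group(2))


# The OMIM:183090 example from above, keyed by Mondo term.
LINES = {
    "MONDO:0008458": 'xref: OMIM:183090 {source="MONDO:equivalentTo"}',
    "MONDO:0800224": 'xref: OMIM:183090 {source="MONDO:includedEntryInOMIM"}',
}


def terms_by_source(lines, wanted):
    """List the Mondo terms whose xref line carries the wanted source."""
    terms = []
    for term, line in lines.items():
        parsed = parse_xref(line)
        if parsed and wanted in parsed[1]:
            terms.append(term)
    return sorted(terms)
```

A listing of the `MONDO:includedEntryInOMIM` terms would be one input for deciding whether and how these links should appear in SSSOM.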

@cmungall
Member

cmungall commented Mar 10, 2023 via email

@joeflack4
Collaborator

This issue has come up again in relation to the synonym synchronization.

I'm wondering if this issue is complete. It seems like the main goal was to create the MONDO:includedEntryInOMIM axiom, and I observe that this axiom is currently in mondo-edit.obo.

If there are any sub-tasks remaining in this issue, we could identify what those are and keep the issue open, or close it and make a fresh issue(s) for them.

@twhetzel
Collaborator

This item is on the agenda for the Curation call.


8 participants