Some MeSH xrefs point to tree locations rather than unique IDs #698
Comments
Strongly agree that xrefs to MESH terms should use unique ID rather than tree node ID. GO has switched to use these. Uberon should too. For CL, please could you file a ticket on the CL tracker. |
@dosumis, I filed a ticket using the CL tracker. |
+1 thanks @dhimmel, we'll implement this ASAP. |
It's worth noting that many tree numbers don't match current MeSH unique IDs. Presumably this is due to modifications of the ontology structure since the initial xref was added. |
279 still to do. I can do these fairly easily... if someone has a sane obo or owl version of mesh it would help (the original with the tree numbers came via an odd route. the RH mesh on bioportal is a bit unusual). |
@cmungall, I don't have an obo, but have a Also, I am willing to do some mappings if you tell me where/how to make the edits. I would like to proceed with my research using the updated UBERON as soon as possible. As for the analogous issue in CL, there have been no responses to my post. Is CL still maintained? |
I may be interested in looking into networkx for another project. Do you know if there are bindings to Neo4J (cc @nlwashington)? There is nothing stopping me from using any of the formats you provide... but if you wanted to make an obo and/or rdf export that would speed it up and might prove generically useful for a bunch of people. I responded on the CL list just to prove it's alive, cc @nicolevasilevsky to bring it up next call |
I added this to our agenda: https://docs.google.com/document/d/18sjGxfODgaK0MqeDBb4j3jqtX2g4yzy98pq5mpsZqPk/edit |
I'm currently working with the EFO ontology v3.22.0, and it looks like many of these xrefs to MeSH tree locations rather than identifiers are still present. They only exist for 1 CL term and 121 UBERON terms, so I am guessing these are still this way in UBERON and the tree numbers are getting imported from UBERON into EFO.
Would be awesome to get these all converted to IDs. Will update if I find an automated way to do this, but the challenge is that they are MeSH-release specific AFAICT, and have therefore grown more and more stale. |
Wow, I just bumped into this issue today and found that @dhimmel noticed it 7 years ago. The issue of MeSH tree numbers - many of which do not point to currently existing tree numbers - being provided as xrefs still exists. These xrefs are not formally differentiated from xrefs that point to MeSH term IDs, i.e., UBERON contains a mixture of xrefs of the form If I were to implement a semi-automated way of fixing these xrefs to point to current, existing MeSH term IDs, which version controlled file should I ideally contribute them to? |
@cmungall @shawntanzk @matentzn |
Hi @paolaroncaglia - thanks for the background. @bgyori's PR looks good. I have approved it but asked Ray to eyeball it too. Presumably Chris' ancient PR will be much harder to merge. |
wow 2015 lol, I'll add it to the tech board, but @bgyori has fixed a huge part of it already (thanks heaps!). I see that there are still unmapped terms that gilda couldn't do, I guess that @rays22 will look into those too? Anyway, just a heads up: merging that PR will automatically close this ticket, so either do all the fixes there or make sure to reopen after merging :) |
btws @paolaroncaglia we are using this now: https://github.com/orgs/obophenotype/projects/11/views/1 |
Exactly, requires a decision on priority level etc that I'd rather leave to you :-) Thanks for confirming though! |
@paolaroncaglia oh right, will discuss if we need a triage section :) but I think you can also just put it wherever you think is right, we will move it when we come across it if we want to prioritize/deprioritize it anyway :) |
I am looking into those. |
There are still 75 unmapped terms that gilda could not match.
|
I did some manual mapping of the remaining 75 MeSH to UBERON terms in the table below. I am going to add the 66
|
Awesome!
Should we just delete these 9 remaining tree number xrefs since at this point they're no better than a missing MESH xref and break tooling that assumes MESH xrefs will be actual MESH IDs? |
Yes, I agree that deleting them would be the best way forward. |
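For reference, the deletion agreed on above could be sketched roughly as follows. This is only an illustration, not the script used for the actual commit: the tree-number regex is an assumption about the usual shape of MeSH tree numbers, and `drop_tree_number_xrefs` and the `EXAMPLE:0000001` term ID are hypothetical names.

```python
import re

PREFIX = "xref: MESH:"
# Assumed tree-number shape: one letter, two digits, then dot-separated 3-digit groups.
TREE_NUMBER = re.compile(r"^[A-Z]\d{2}(?:\.\d{3})*$")


def drop_tree_number_xrefs(obo_lines):
    """Return obo_lines without `xref: MESH:<tree number>` lines."""
    kept = []
    for line in obo_lines:
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            # The identifier is the first whitespace-delimited token after the prefix.
            tokens = stripped[len(PREFIX):].split()
            if tokens and TREE_NUMBER.match(tokens[0]):
                continue  # skip the tree-number xref
        kept.append(line)
    return kept


stanza = [
    "id: EXAMPLE:0000001",  # hypothetical term ID, for illustration only
    "xref: MESH:D009666",
    "xref: MESH:A01.456.505.733",
]
print(drop_tree_number_xrefs(stanza))
```

Only the MESH xref lines whose identifier looks like a tree number are dropped; unique-ID xrefs and all other lines pass through unchanged.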
Replace MeSH tree numbers with manually mapped unique IDs for the batch of 66 remaining terms. If applied, this commit will fix #698.
Delete MeSH tree number xrefs that could not be mapped to unique IDs. If applied, this commit will fix #698.
* Replace MeSH tree numbers with unique IDs
  Replace MeSH tree numbers with manually mapped unique IDs for the batch of 66 remaining terms. If applied, this commit will fix #698.
* Delete MeSH tree number xrefs
  Delete MeSH tree number xrefs that could not be mapped to unique IDs. If applied, this commit will fix #698.
* Delete secondary MeSH tree number xrefs
Thanks everyone who helped on this issue over the years! Great to see all the tree numbers gone and @rays22's manual mapping efforts. This is a different issue and not sure what types of automated checks UBERON has, but it might be good to ensure future MESH xrefs are properly formatted according to a regex (and are not tree numbers). This could be done for all xref sources using bioregistry actually. |
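A minimal sketch of the kind of regex check suggested above, under stated assumptions: the MESH pattern below is my assumption of what a unique-ID pattern looks like (a `C` or `D` followed by digits) and should be confirmed against the registry before use. `XREF_PATTERNS` and `invalid_xrefs` are hypothetical names, and the sketch does not call the bioregistry package itself.

```python
import re

# Assumed pattern for MeSH unique IDs; verify against the registry before relying on it.
XREF_PATTERNS = {
    "MESH": re.compile(r"^(C|D)\d{6,9}$"),
}


def invalid_xrefs(xrefs):
    """Return the xrefs whose local ID fails the pattern registered for its prefix."""
    bad = []
    for curie in xrefs:
        prefix, _, local_id = curie.partition(":")
        pattern = XREF_PATTERNS.get(prefix)
        # Prefixes without a registered pattern are not checked.
        if pattern is not None and not pattern.match(local_id):
            bad.append(curie)
    return bad


print(invalid_xrefs(["MESH:D009666", "MESH:A01.456.505.733", "OTHER:123"]))
```

Keeping one pattern per prefix in a dictionary means the same check could cover other xref sources by adding entries, which is the generalization the comment points at.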
@dhimmel - will create a ticket to see if we can implement some automated checks for that :) thanks! |
I have two comments regarding MeSH cross-references:
xref: MESH:A01.456.505.733 rather than xref: MESH:D009666. A tree number represents a path up the MeSH hierarchy from term to top-level category. Therefore, MeSH terms can have multiple tree numbers, and tree numbers are subject to change whenever the hierarchy is reorganized, even if the term remains. Is there a reason tree numbers are used instead of unique IDs? If you would like to switch, you may find my mapping helpful (notebook, tsv file).
def: "A male germ cell that develops from the haploid secondary spermatocytes. Without further division, spermatids undergo structural changes and give rise to spermatozoa." [MESH:A05.360.490.890.860]), it appears some of the hard work has already been done. Why are definition sources not included as xrefs, and is this the right location to submit CL feature requests?
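To illustrate why a term can carry several tree numbers while having a single unique ID, here is a rough sketch of inverting an ID-to-tree-numbers table and using it to rewrite an xref. The function names are hypothetical, the table is a toy stand-in for a real mapping file, and the second tree number (`Z99.000.001`) is made up for illustration; only the `D009666` / `A01.456.505.733` pair comes from the example above.

```python
def build_tree_to_id(id_to_trees):
    """Invert {unique_id: [tree_number, ...]} into {tree_number: unique_id}."""
    tree_to_id = {}
    for unique_id, tree_numbers in id_to_trees.items():
        for tree_number in tree_numbers:
            tree_to_id[tree_number] = unique_id
    return tree_to_id


def resolve_xref(xref, tree_to_id):
    """Rewrite a MESH tree-number xref to its unique ID; leave anything else unchanged."""
    prefix, _, local_id = xref.partition(":")
    if prefix == "MESH" and local_id in tree_to_id:
        return f"MESH:{tree_to_id[local_id]}"
    return xref


# Toy mapping: one term reachable via two tree numbers (the second one is invented).
id_to_trees = {"D009666": ["A01.456.505.733", "Z99.000.001"]}
tree_to_id = build_tree_to_id(id_to_trees)

print(resolve_xref("MESH:A01.456.505.733", tree_to_id))
print(resolve_xref("MESH:Z99.000.001", tree_to_id))
```

Both tree numbers resolve to the same unique ID. A tree number that is absent from the table (for instance, one from an older MeSH release) is returned unchanged, so it would still need manual review.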