include prokaryotic names (e.g., from LPSN) as a source in the verifier #109

Closed
cpauvert opened this issue Nov 28, 2023 · 7 comments

@cpauvert

Hello,
I just discovered gnverifier and the Global Names Architecture. Thanks for these initiatives. The fact that it can work with OpenRefine is also a very nice development.

At my research institute, we are interested in microorganisms, especially bacteria (:microbe:), and I had also been thinking about having a verification system like gnverifier for them.

Are additional data sources being considered for the Global Names Architecture?

If so, I would suggest including the List of Prokaryotic names with Standing in Nomenclature (LPSN), for which there is an API and daily dumps (registration required, though).

Best,

@dimus
Member

dimus commented Nov 30, 2023

Hello @cpauvert, thank you for the kind words about gnverifier. Yes, sure, I can add the List of Prokaryotic Names to the data sources. I am already registered, so downloading it should not be a problem. I will add the list this week or next.

@cpauvert
Author

Awesome, thanks @dimus for your quick reply. Do not hesitate to ping me once you do, so that I can try it out and point my microbiologist colleagues to your resource as well!

@dimus
Member

dimus commented Dec 7, 2023

@cpauvert I added LPSN, if you find any problems, please reopen the ticket and let me know

@dimus closed this as completed Dec 7, 2023
@cpauvert
Author

cpauvert commented Dec 7, 2023

Hi @dimus
Thanks for the quick implementation!

I tried it out with Bacteroides vulgatus, expecting it to be corrected to Phocaeicola vulgatus as indicated by LPSN, but it was not...

https://verifier.globalnames.org/?capitalize=on&format=tsv&names=Bacteroides+vulgatus

That is, until I realized that the LPSN match did appear once I ticked "Show all matches":

https://verifier.globalnames.org/?all_matches=on&capitalize=on&format=tsv&names=Bacteroides+vulgatus

It's just that the LPSN score was slightly lower (9.41496 vs 9.41391). Is there any reason for a difference in score when the matches were essentially the same?

Best,

PS: can't wait to try this out with openrefine!!

@dimus
Member

dimus commented Dec 7, 2023

@cpauvert sure, you're welcome!

You can also try to limit searches:

image

This adds the filter ds=208 to the query, so that only results from this data source are shown:

https://verifier.globalnames.org/?capitalize=on&ds=208&format=html&names=Bacteroides+vulgatus+

The same can be done for OpenRefine: https://github.com/gnames/gnverifier/wiki/OpenRefine-readme#filters-to-remove-false-positive-matches
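For anyone scripting these links, here is a minimal sketch that rebuilds the verifier URLs shown in this thread. It only uses the query parameters that appear in the links above (`all_matches`, `capitalize`, `ds`, `format`, `names`); the helper name `verifier_url` and its arguments are hypothetical, and the sketch only builds the link for the web page, it does not call any API.

```python
from urllib.parse import urlencode

BASE_URL = "https://verifier.globalnames.org/"


def verifier_url(name, ds=None, all_matches=False, fmt="tsv"):
    """Build a verifier web URL from the query parameters seen in this thread.

    `verifier_url` is a hypothetical helper, not part of gnverifier itself.
    """
    params = []
    if all_matches:
        # Corresponds to ticking "Show all matches" in the web form.
        params.append(("all_matches", "on"))
    params.append(("capitalize", "on"))
    if ds is not None:
        # Restrict results to one data source, e.g. ds=208 as in the comment above.
        params.append(("ds", str(ds)))
    params.append(("format", fmt))
    params.append(("names", name))
    # urlencode escapes the space in the name as "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(verifier_url("Bacteroides vulgatus", ds=208, fmt="html"))
    print(verifier_url("Bacteroides vulgatus", all_matches=True))
```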

@craynaud007

Hello @dimus,

I work with @cpauvert and I just tested gnverifier with OpenRefine. Thanks for this initiative.
While using it I encountered some issues. First, I tried to select the LPSN data source as explained in “Reconciling taxonomic names in OpenRefine via Global Names”, and it worked, but only partially. Some species that are not in LPSN still got a match from another data source, even though I had specified that only LPSN should be used. So my first question is: how can I restrict the reconciliation to a single data source in OpenRefine?

The second problem concerns the current name. As an example, I reconciled Bacteroides vulgatus and it gave me Bacteroides vulgatus Eggerth and Gagnon 1933. If I look at the details of the match, the current name is given as Phocaeicola vulgatus (cf. screenshot). So my second question is: how can I ask for the current name only, and not the older one?

image

Best Regards
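Until there is an answer on the OpenRefine side, both questions can be handled as a post-processing step on exported results. The sketch below is only an illustration under stated assumptions: the record layout (`data_source_id`, `matched_name`, `current_name`, `score`) is hypothetical and is not gnverifier's actual output schema, and the ID `1` stands in for any non-LPSN source. Only `208` (the LPSN filter value), the names, and the two scores come from this thread.

```python
LPSN_ID = 208  # data-source ID used in the ds=208 filter earlier in this thread


def preferred_name(matches, data_source_id=LPSN_ID):
    """Return the current name from the best match of one data source.

    Matches from other data sources are ignored even if they score higher.
    Returns None when the chosen data source has no match at all.
    The dictionary keys used here are hypothetical placeholders.
    """
    from_source = [m for m in matches if m["data_source_id"] == data_source_id]
    if not from_source:
        return None
    best = max(from_source, key=lambda m: m["score"])
    # Fall back to the matched name when no separate current name is recorded.
    return best.get("current_name") or best["matched_name"]


if __name__ == "__main__":
    # Hypothetical records shaped after the Bacteroides vulgatus example.
    matches = [
        {"data_source_id": 1, "matched_name": "Bacteroides vulgatus",
         "current_name": None, "score": 9.41496},
        {"data_source_id": LPSN_ID,
         "matched_name": "Bacteroides vulgatus Eggerth and Gagnon 1933",
         "current_name": "Phocaeicola vulgatus", "score": 9.41391},
    ]
    print(preferred_name(matches))
```

Filtering by data source before comparing scores means a marginally higher score from another source cannot displace the LPSN match.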

@cpauvert
Author

Hi,

> @cpauvert I added LPSN, if you find any problems, please reopen the ticket and let me know

I cannot reopen the issue because I'm not a collaborator. Let us know if you'd rather we open a separate issue to discuss this.

I can expand on @craynaud007's comments here (tagging @magelm as well). It seems the LPSN API does not directly give the current valid name. For instance, in the API output for Bacteroides vulgatus (https://api.lpsn.dsmz.de/fetch/773979), only Bacteroides vulgatus is mentioned, but the output does indicate the ID of the record that holds the current name:

lpsn_correct_name_id:	7841

This record then actually points (as expected) to Phocaeicola vulgatus (https://api.lpsn.dsmz.de/fetch/7841)

@dimus are you using the API or the data dumps?
Best,
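To make the lookup described above concrete, here is a minimal sketch of following `lpsn_correct_name_id` from one record to the next until the current name is reached. It runs against an in-memory stand-in rather than the real API (which requires registration). Only the two record IDs, the two names, and the field `lpsn_correct_name_id` come from this thread; the `full_name` key, the `fetch` callable, and the assumption that a current record points to itself or to nothing are all hypothetical.

```python
def resolve_current_record(record_id, fetch, max_hops=10):
    """Follow lpsn_correct_name_id links until the current record is reached.

    `fetch` is any callable that returns a record (a dict) for a given ID.
    A record counts as current when its lpsn_correct_name_id is missing or
    equal to its own ID (an assumption, not confirmed LPSN behavior).
    """
    seen = set()
    current_id = record_id
    for _ in range(max_hops):
        if current_id in seen:
            raise ValueError(f"cycle detected at record {current_id}")
        seen.add(current_id)
        record = fetch(current_id)
        next_id = record.get("lpsn_correct_name_id")
        if next_id is None or next_id == current_id:
            return record
        current_id = next_id
    raise ValueError(f"no current name found within {max_hops} hops")


# In-memory stand-in for the API, mirroring the example in this thread.
# The "full_name" key is a hypothetical placeholder.
RECORDS = {
    773979: {"full_name": "Bacteroides vulgatus", "lpsn_correct_name_id": 7841},
    7841: {"full_name": "Phocaeicola vulgatus", "lpsn_correct_name_id": 7841},
}

if __name__ == "__main__":
    print(resolve_current_record(773979, RECORDS.__getitem__)["full_name"])
```

Passing `fetch` as a parameter keeps the link-following logic independent of whether the records come from the API or from the data dumps.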

3 participants