provide a method to indicate the status a taxon name #132
Comments
@dimus can you suggest a names reconciliation service for this? I expect it's out there, and probably either in or touching Global Names... |
@ahhurlbert I was thinking to allow annotation of the state of the name in the data source. For instance, you'd be able to have fields like "taxon status" next to the fields you already provide to describe the subject/object source/target taxon occurrence. This way, you can use GloBI to exclude or indicate names that have known issues to avoid having to recheck names over and over. What do you think about this? |
It sounds like an interesting use case, can we chat about it on Monday?
|
@jhpoelen Sounds possibly feasible but I'm a bit unclear on what the workflow would look like.
Just trying to picture how this would work and whether it would be as efficient as simply having a separate names database (or relying on a names reconciliation service). |
@dimus - thanks for your message; please contact me by email to set up a time to talk: jhpoelen at xs4all dot nl. @ahhurlbert - thanks for sharing the use case. At this point, I can imagine three use cases: transcription mistake, outdated name, not-sure-what-this-is name. How about something like:
transcription mistake
outdated names
not-sure-what-this-is name
In short - fix the transcription errors (e.g. typos) in the source and introduce a way to annotate and correct outdated taxon names using a dedicated taxon correction list. Ideally the avian diet database (or any other data source) should be publishable (e.g. data paper in ESA pubs) by itself without having to rely on GloBI. GloBI is just a way to integrate, link and access this rich source of information within a larger body of interaction datasets: software comes and goes, but data is forever. I'd be willing to discuss more over phone / Skype if necessary (or organize a workshop?). In my mind, data peer review and access methods (which is what I believe this is) can be super useful but might take some back and forth to figure out the most efficient way to implement them. Curious to hear your thoughts. |
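For illustration only, here is a minimal sketch of how a dedicated taxon correction list might be applied without touching the source data. The `TAXON_CORRECTIONS` table, its single entry, and the `resolve_name` function are hypothetical names made up for this example; they are not part of GloBI or of any data source discussed here.

```python
# Hypothetical correction list: outdated name -> currently accepted name.
# The entry below is illustrative; a real list would be curated by the data provider.
TAXON_CORRECTIONS = {
    "Bombidae": "Apidae",
}


def resolve_name(name, corrections=TAXON_CORRECTIONS):
    """Return (resolved_name, was_corrected) for a single taxon name.

    The original record is left untouched; only the looked-up value changes.
    """
    cleaned = name.strip()
    if cleaned in corrections:
        return corrections[cleaned], True
    return cleaned, False
```

Keeping corrections in a separate list, rather than editing names in place, would leave the source dataset publishable on its own while still letting an integrator map outdated names to current ones.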
Hi Jorrit, I am available today in the second half of the day, and about Cheers Dima |
Oups sorry, I forgot I do have a meeting at the second half of today -- and Dima |
@dimus - I'll try and contact you tomorrow Wed 29 April at 11:00a eastern. Please let me know if you'd like to chat at another time. @ahhurlbert - please let me know if you'd like to join. |
@jhpoelen Sorry can't make it. Maybe we can skype in a week or so? |
11:00 is good with me On Tue, Apr 28, 2015 at 10:09 PM, ahhurlbert notifications@github.com
|
Created GlobalNamesArchitecture/gni#38 after today's discussion with @dimus . Hopefully, GloBI data providers can use globalnames to help detect (and potentially correct) names in a way that others can also benefit from. |
Here's a list of taxon name descriptions I stumbled across: I imagine allowing the data sources to annotate names with their known status with terms from this list. @ahhurlbert I am able to Skype this week . . . let me know a good time for you. |
How about 12 pm EST? |
Sounds good. Talk to you tomorrow (Tue) at 12p EST. |
I've added a Name_Status field to reflect the current taxonomic status of the prey name. I'm using 'verified' to indicate a presumably valid name that did not flag in GloBI, and 'unknown' for names that were flagged as invalid. I've gone through and fixed ~10 typos. Also, I'm surprised that 'Bombidae' did not match any outdated taxonomies. It is an old family name for bumblebees, which have since been incorporated into 'Apidae' within the tribe 'Bombini'. The only extant genus of this tribe is 'Bombus', so I went ahead and changed all diet database entries with Prey_Family == 'Bombidae' to Prey_Family 'Apidae' and Prey_Genus 'Bombus'. |
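As a rough sketch of the two steps described above (recoding 'Bombidae' and assigning a status), assuming the diet database is loaded as a list of dictionaries keyed by column name. The function names and the `flagged_names` set are hypothetical; only `Prey_Family`, `Prey_Genus`, and the 'verified' / 'unknown' values come from the comment above.

```python
def recode_bombidae(rows):
    """Return copies of rows with Prey_Family 'Bombidae' recoded to Apidae / Bombus."""
    updated = []
    for row in rows:
        row = dict(row)  # copy, so the caller's data is not modified
        if row.get("Prey_Family") == "Bombidae":
            row["Prey_Family"] = "Apidae"
            row["Prey_Genus"] = "Bombus"
        updated.append(row)
    return updated


def name_status(prey_name, flagged_names):
    """'unknown' if the name was flagged as invalid in a name report, else 'verified'."""
    return "unknown" if prey_name in flagged_names else "verified"
```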
@ahhurlbert Nice! Question - is the |
After our discussion, I figured that adding two columns like: |
The predator (bird) names are being checked as part of our workflow (and the taxonomic authority they are based on is listed in the Taxonomy field), so there should be no invalid names. That is, if I come across an old paper that uses an outdated bird name, the first thing I do is figure out what the currently accepted name is (using Avibase.org) and that's what is put in the table. Certainly there is the possibility for typos, but as those will be corrected as soon as they are identified I don't think there's a need to add a separate field for this status. I've changed Name_Status to Prey_Name_Status to clarify which entity this describes. |
To keep you up to date: Wencan, our GSoC student, started working on an algorithm for GN to figure out name status from existing data/metadata. |
@dimus thanks for sharing! |
@ahhurlbert Hey Allen - I've prepared a new version of the taxon name report for you using the Prey_Name_Status data that you provide: you can find the current list of unmatched or suspicious names, ordered by state, by following http://tinyurl.com/hurlbertTaxonNameReportV4 and clicking on the looking glass (i.e. execute) button. I've attached the result that came out. Note that the name status is treated as a controlled vocabulary term. In this case it would be the "Hurlbert Name Status" vocabulary. I suspect that we'll figure out a mapping to other name status vocabs at some point. Let me know if this name report will help you manage your names better. If so, let me know how you'd like to receive / manage reports like these (ad hoc CSV download? dedicated GitHub repo with name reports automatically updated per GloBI data source?). |
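To give a feel for what a report "ordered by state" could look like on the data provider's side, here is a small sketch that groups names by their status value. It is not the query behind the link above; the `name_report` function and the `Prey_Name` column are assumptions made for this example, and only `Prey_Name_Status` comes from the thread.

```python
from collections import defaultdict


def name_report(rows, status_field="Prey_Name_Status", name_field="Prey_Name"):
    """Group distinct names by status and return them sorted by status, then name.

    `Prey_Name` is a hypothetical column name; rows without a status value
    are collected under the placeholder "(no status)".
    """
    by_status = defaultdict(set)
    for row in rows:
        status = (row.get(status_field) or "").strip() or "(no status)"
        by_status[status].add(row[name_field])
    return [(status, sorted(names)) for status, names in sorted(by_status.items())]
```

A list like this could be written out as CSV or committed to a repository, depending on which delivery option turns out to be most convenient.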
GloBI now has a way to capture a taxonomic name status field. @ahhurlbert please reopen issue if you feel the feature needs some more work. |
In personal communication, @ahhurlbert suggested providing a method to assign a state to a specific name that is possibly incorrect or outdated. This state would indicate that someone looked at the name (likely as a result of a no-match against external taxonomies) and would report on the status of that name.
For instance, when an invalid or outdated name is used in a data source, we'd like to have a way to indicate that the name is indeed invalid or outdated without necessarily having to submit a name correction (see https://github.com/jhpoelen/eol-globi-data/wiki/Taxonomy-Matching#submitting-name-corrections).
Possible states might include: invalid, recently published, or misspelled.