provide a method to indicate the status of a taxon name #132

Closed
jhpoelen opened this issue Apr 20, 2015 · 22 comments

@jhpoelen
Member

In personal communication, @ahhurlbert suggested providing a way to assign a status to a possibly incorrect or outdated taxon name. This status would indicate that someone has looked at the name (likely after a no-match against external taxonomies) and would report what they found.

For instance, when an invalid or outdated name is used in a data source, we'd like to have a way to indicate that the name is indeed invalid or outdated without necessarily having to submit a name correction (see https://github.com/jhpoelen/eol-globi-data/wiki/Taxonomy-Matching#submitting-name-corrections).

Possible states might include: invalid, recently published or misspelled.

@jhammock
Collaborator

@dimus can you suggest a names reconciliation service for this? I expect it's out there, and probably either in or touching Global Names...

@jhpoelen
Member Author

@ahhurlbert I was thinking of allowing annotation of the state of the name in the data source. For instance, you'd be able to have a field like "taxon status" next to the fields you already provide to describe the subject/object source/target taxon occurrence. This way, you can use GloBI to exclude or flag names that have known issues, so that you don't have to recheck the same names over and over. What do you think about this?

@dimus

dimus commented Apr 25, 2015

It sounds like an interesting use case, can we chat about it on Monday?

@ahhurlbert

@jhpoelen Sounds possibly feasible but I'm a bit unclear on what the workflow would look like.

  1. Students enter data from an old research paper. Many of the names are obsolete, but they have no idea.
  2. Once in GloBI, names get checked and a list of invalid names is returned.
  3. Then a student goes through that list, finds each occurrence of an offending name in the database, and then assigns it a status. (which sounds potentially cumbersome, especially if certain names occur multiple times)
  4. Depending on the status, some names will cease to be on the flagged list in future checks. (but what happens the next time data are added with one of those invalid names--will we have to flag it again?)

Just trying to picture how this would work and whether it would be as efficient as simply having a separate names database (or relying on a names reconciliation service).
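One way to picture step 4, as a minimal sketch rather than anything GloBI actually implements: if a status is recorded once per name (not once per occurrence), every current and future occurrence of that name drops off the flagged list. The function name, the `known_status` lookup and the sample names are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def flagged_names(
    names: Iterable[str],
    matched: Set[str],
    known_status: Dict[str, str],
) -> List[str]:
    """Return unmatched names that nobody has reviewed yet, sorted."""
    unmatched = {n for n in names if n not in matched}
    return sorted(n for n in unmatched if n not in known_status)


# Hypothetical data: names seen in a data source, one invalid name repeated.
names = ["Aves", "Bombidae", "Bombidae", "Avez"]
matched = {"Aves"}                       # names an external taxonomy recognizes
known_status = {"Bombidae": "outdated"}  # reviewed once, keyed by name

print(flagged_names(names, matched, known_status))
```

Because the lookup is keyed by name, adding more records that use "Bombidae" later would not put it back on the list.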

@jhpoelen
Member Author

@dimus - thanks for your message. Please contact me by email to set up a time to talk: jhpoelen at xs4all dot nl.

@ahhurlbert - thanks for sharing the use case. At this point, I can imagine three use cases: transcription mistake, outdated name, not-sure-what-this-is name. How about something like:

transcription mistake

  1. student makes mistake in transcribing a name "Avez"
  2. GloBI cannot find match for "Avez"
  3. avian diet database curator requests / receives a name report from GloBI
  4. student / curator reviews the name "Avez" and checks it against the data source
  5. the data source mentions "Aves" instead of "Avez"
  6. the occurrences including "Avez" are corrected in the data source (e.g. AvianDietDatabase.txt)

outdated names
Data source contains a name that is no longer used and this outdated name is not available through (meta-) taxonomic services such as ITIS or EOL.

  1. name is transcribed correctly by student
  2. GloBI can't find a match
  3. after review of name list, the student double checks that the name is same as in source
  4. the current name for the taxon is determined and added to a specific taxon correction list (perhaps something like, or actually re-using, the GloBI general taxon correction list). The correction is described as "outdated name" or similar using a controlled vocabulary of naming terms.
  5. student submits the outdated name to a naming authority (e.g. ITIS) and suggests adding the name to the list of previously valid names.

not-sure-what-this-is name
Similar to the outdated name, except that for this unknown taxon name the correction code is something like "undetermined" or "unknown" and no suggestion is provided. Alternatively, a higher-order taxon can be given to provide some information about the taxon (if available). When GloBI provides a name report, the reason for the correction (or non-correction) is included so that the student / curator can easily exclude "undetermined" or "unknown" names.

In short - fix the transcription errors (e.g. typos) in the source and introduce a way to annotate and correct outdated taxon names using a dedicated taxon correction list.
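A dedicated correction list of this kind could be sketched roughly as follows. This is only an illustration under assumptions: the `Correction` type, the two vocabulary terms, the "unreviewed" marker and the sample entries are hypothetical and do not describe GloBI's actual correction list format.

```python
from typing import Dict, List, NamedTuple, Optional


class Correction(NamedTuple):
    corrected_name: Optional[str]  # None when no suggestion is available
    reason: str                    # term from a controlled vocabulary


# Hypothetical vocabulary and entries, for illustration only.
REASONS = {"outdated name", "undetermined"}
CORRECTIONS: Dict[str, Correction] = {
    "Bombidae": Correction("Apidae", "outdated name"),
    "Genus sp. A": Correction(None, "undetermined"),
}


def name_report(unmatched: List[str]) -> List[Dict[str, Optional[str]]]:
    """Pair each unmatched name with its correction and reason, if any."""
    report: List[Dict[str, Optional[str]]] = []
    for name in unmatched:
        correction = CORRECTIONS.get(name)
        if correction is None:
            # Nobody has reviewed this name yet.
            report.append({"name": name, "corrected": None, "reason": "unreviewed"})
            continue
        if correction.reason not in REASONS:
            raise ValueError(f"reason not in vocabulary: {correction.reason!r}")
        report.append(
            {
                "name": name,
                "corrected": correction.corrected_name,
                "reason": correction.reason,
            }
        )
    return report


def still_to_review(report: List[Dict[str, Optional[str]]]) -> List[str]:
    """Names without any recorded reason, i.e. the ones worth a look."""
    return [str(row["name"]) for row in report if row["reason"] == "unreviewed"]


report = name_report(["Bombidae", "Genus sp. A", "Avez"])
print(still_to_review(report))
```

Carrying the reason alongside each name is what lets a curator skip "undetermined" entries without rechecking them.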

Ideally the avian diet database (or any other data source) should be publishable (e.g. data paper in esa pubs) by itself without having to rely on GloBI. GloBI is just a way to integrate, link and access this rich source of information into a larger body of interaction datasets: software comes and goes, but data is forever.

I'd be willing to discuss more over phone / skype if necessary (or organize a workshop?). In my mind, data peer review and access methods (which is what I believe this is) can be super useful but might take some back and forth to figure out the most efficient way to implement them. Curious to hear your thoughts.

@dimus

dimus commented Apr 28, 2015

Hi Jorrit, I am available today in the second half of the day, and about
any time tomorrow. Google Hangout or Skype are good for me -- my skype is
dimus62

Cheers

Dima

@dimus

dimus commented Apr 28, 2015

Oops, sorry, I forgot I do have a meeting in the second half of today -- and
tomorrow is still free for me.

Dima

@jhpoelen
Member Author

@dimus - I'll try and contact you tomorrow Wed 29 April at 11:00a eastern. Please let me know if you'd like to chat at another time.

@ahhurlbert - please let me know if you'd like to join.

@ahhurlbert

@jhpoelen Sorry can't make it. Maybe we can skype in a week or so?

@dimus

dimus commented Apr 29, 2015

11:00 is good with me


@jhpoelen
Member Author

Created GlobalNamesArchitecture/gni#38 after today's discussion with @dimus. Hopefully, GloBI data providers can use globalnames to help detect (and potentially correct) names in a way that others can also benefit from.

@jhpoelen
Member Author

jhpoelen commented May 4, 2015

Here's a list of taxon name descriptions I stumbled across:
https://en.wikipedia.org/wiki/Glossary_of_scientific_naming#Latin_descriptions_of_names_or_taxa

I imagine allowing the data sources to annotate names with their known status with terms from this list.

@ahhurlbert I am able to skype this week . . . let me know a good time for you.

@ahhurlbert

How about 12 pm EST?

@jhpoelen
Member Author

jhpoelen commented May 5, 2015

Sounds good. Talk to you tomorrow (Tue) at 12p EST.

@ahhurlbert

I've added a Name_Status field to reflect the current taxonomic status of the prey name. I'm using 'verified' to indicate a presumably valid name that did not flag in GloBI, and 'unknown' for names that were flagged as invalid. I've gone through and fixed ~10 typos.

Also, I'm surprised that 'Bombidae' did not match any outdated taxonomies. It is an old family name for bumblebees, which have since been incorporated into 'Apidae' within the tribe 'Bombini'. The only extant genus of this tribe is 'Bombus', so I went ahead and changed all diet database entries with Prey_Family == 'Bombidae' to Prey_Family 'Apidae' and Prey_Genus 'Bombus'.
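For readers who want to reproduce that kind of bulk change, here is a minimal sketch in Python. It is not the script that was actually used; the sample rows are hypothetical and show only the three columns named in this comment, whereas the real diet database has more.

```python
import csv
import io
from typing import Dict, List

# Hypothetical excerpt; only the column names come from the discussion above.
SAMPLE = """Prey_Family,Prey_Genus,Name_Status
Bombidae,,unknown
Apidae,Apis,verified
"""


def remap_bombidae(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the rows with 'Bombidae' entries moved to Apidae/Bombus."""
    remapped = []
    for row in rows:
        row = dict(row)  # copy so the input rows stay untouched
        if row["Prey_Family"] == "Bombidae":
            row["Prey_Family"] = "Apidae"
            row["Prey_Genus"] = "Bombus"
        remapped.append(row)
    return remapped


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
fixed = remap_bombidae(rows)
for row in fixed:
    print(row["Prey_Family"], row["Prey_Genus"], row["Name_Status"])
```

The sketch deliberately leaves Name_Status alone; whether a remapped row should then be marked 'verified' is a curation decision, not something the code can decide.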

@jhpoelen
Member Author

jhpoelen commented May 5, 2015

@ahhurlbert Nice! Question - is the Name_Status associated with the predator or the prey name?

@jhpoelen
Member Author

jhpoelen commented May 5, 2015

After our discussion, I figured that adding two columns like Name_Status and Prey_Name_Status would probably make it clear which name the status relates to.

@ahhurlbert

The predator (bird) names are being checked as part of our workflow (and the taxonomic authority they are based on is listed in the Taxonomy field), so there should be no invalid names. That is, if I come across an old paper that uses an outdated bird name, the first thing I do is figure out what the currently accepted name is (using Avibase.org) and that's what is put in the table. Certainly there is the possibility for typos, but as those will be corrected as soon as they are identified I don't think there's a need to add a separate field for this status.

I've changed Name_Status to Prey_Name_Status to clarify which entity this describes.

@dimus

dimus commented May 6, 2015

To keep you up to date -- Wencan, our GSoC student -- started working on an algorithm for GN to figure out status from existing data/metadata.

@jhpoelen
Member Author

jhpoelen commented May 6, 2015

@dimus thanks for sharing!

jhpoelen pushed a commit that referenced this issue May 8, 2015
@jhpoelen
Member Author

@ahhurlbert Hey Allen - I've prepared a new version of the taxon name report for you using the Prey_Name_Status data that you provided: you can find the current list of unmatched or suspicious names, ordered by state, by following http://tinyurl.com/hurlbertTaxonNameReportV4 and clicking on the looking glass (i.e. execute) button.

I've attached the result that came out. Note that the name status is treated as a controlled vocabulary term. In this case it would be the "Hurlbert Name Status" vocabulary. I suspect that we'll figure out a mapping to other name status vocabs at some point.
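To illustrate what ordering by state and mapping between vocabularies could look like, here is a rough sketch. It does not reflect the query behind the report; the sample rows, the target terms in `HURLBERT_TO_OTHER` and the "unmapped" fallback are hypothetical. Only 'verified' and 'unknown' come from this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the "Hurlbert Name Status" terms to another vocabulary.
HURLBERT_TO_OTHER: Dict[str, str] = {
    "verified": "accepted",
    "unknown": "doubtful",
}


def order_by_status(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (name, status) pairs by status first, then by name."""
    return sorted(rows, key=lambda row: (row[1], row[0]))


def map_status(status: str, mapping: Dict[str, str]) -> str:
    """Translate a status term, marking terms the mapping does not cover."""
    return mapping.get(status, "unmapped")


# Hypothetical report rows.
rows = [("Bombidae", "unknown"), ("Aves", "verified"), ("Avez", "unknown")]
for name, status in order_by_status(rows):
    print(name, status, map_status(status, HURLBERT_TO_OTHER))
```

Keeping the mapping as plain data means a new vocabulary can be supported without touching the report logic.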

Let me know if this name report will help you manage your names better. If so, let me know how you'd like to receive / manage reports like these (download adhoc csv?, dedicated github repo with automatically updated name reports by GloBI data source).

[Attached screenshot of the name report: "screen shot 2015-05-27 at 12 58 38 pm"]

@jhpoelen
Member Author

jhpoelen commented Jul 6, 2015

GloBI now has a way to capture a taxonomic name status field.

@ahhurlbert please reopen issue if you feel the feature needs some more work.

@jhpoelen jhpoelen closed this as completed Jul 6, 2015