occ_data/occ_search: maybe not use species field for main name field #329

Closed
sckott opened this issue Oct 23, 2018 · 6 comments

sckott commented Oct 23, 2018

right now the first column in results from occ_data and occ_search is the species name, i.e., the value of the species field in the occurrence route. For all taxa at species rank and above this is fine.

  • for those at species rank, species name of course makes sense
  • for those above species rank, these are occurrences, so there should normally be a species name, though there are probably some cases in which there's no species identification

Here is where we select species column https://github.com/ropensci/rgbif/blob/master/R/zzz.r#L106-L107 and move it to the front

For ranks below species, e.g., http://api.gbif.org/v1/occurrence/search?taxonKey=6163845 for Ursus arctos horribilis, the species value of Ursus arctos is accurate as its species rank level name, but the user probably expects to see the subspecific name instead of the species rank name.

I don't think we want to go with scientificName or acceptedScientificName as they are complete names with authorities and years, or do we? Does anyone want this name format? Do we stick with what we have now except add in:

  • for each occurrence that we detect the rank is below the species level, give back the appropriate name matching that rank by constructing from the parts, e.g., the entry for species + the entry for infraspecificEpithet
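To make the "construct from parts" idea concrete, here is a rough sketch in Python (not rgbif's R code). It assumes each occurrence record is a plain dict carrying the species, infraspecificEpithet and taxonRank fields; the function name and the set of ranks treated as "below species" are hypothetical choices for illustration only.

```python
# Ranks treated as "below species". This set is an assumption for
# illustration, not a complete list of GBIF rank values.
INFRASPECIFIC_RANKS = {"SUBSPECIES", "VARIETY", "FORM"}


def name_from_parts(record):
    """Return the species name, or species + infraspecific epithet
    when the record's rank is below species (hypothetical helper)."""
    species = record.get("species")
    epithet = record.get("infraspecificEpithet")
    rank = (record.get("taxonRank") or "").upper()
    if species and epithet and rank in INFRASPECIFIC_RANKS:
        return f"{species} {epithet}"
    # Species rank and above, or missing parts: fall back to species.
    return species


# Illustrative record, not real API output.
bear = {
    "species": "Ursus arctos",
    "infraspecificEpithet": "horribilis",
    "taxonRank": "SUBSPECIES",
}
print(name_from_parts(bear))  # Ursus arctos horribilis
```

Note that a plain join like this leaves out rank markers, so it would not give a well-formed name for, e.g., a plant variety.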

any feedback appreciated @kgturner @damianooldoni @MattBlissett @jwhalennds @poldham

sckott added this to the v1.2 milestone Oct 23, 2018
@peterdesmet
Member

In result sets, I prefer to have as first column some kind of ID. For occurrences I would expect “occurrenceID” to be the first column.

Constructing names from parts might lead to names that are not well-formed (e.g. plant names have “var.” in their name)... that is, if we care. I would expect scientificName.

My 2 cents

kgturner commented Oct 24, 2018 via email

sckott commented Oct 24, 2018

thanks, good idea to make the first column the occurrence id, as it should be unique for each row.

constructing a name from parts does open up the possibility for occasional errors and badly formed names, as you said Peter.

the 2nd column as scientificName makes sense to me

damianooldoni commented Oct 25, 2018

I have to admit I work most of the time with keys, and having a unique key as the first column makes sense to me. I like scientific names way more than canonical or composed names (names from parts, as you call them) because they are as unique as possible. Names from parts are less informative and can lead to misinterpretation of the returned data.

sckott commented Oct 25, 2018

thanks @damianooldoni for your feedback.

sckott added a commit that referenced this issue Feb 22, 2019
…olumns in the dataframe

fixed test accordingly for output changes
cleaned up tests for each and put assertions outside of vcr calls
bump dev version

sckott commented Feb 22, 2019

okay, pushed changes. considering this done. made the following changes:

occ_data and occ_search both now have:

  • first column key: the occurrence key
  • second column scientificName: the scientific name associated with the occurrence
  • the name column, which previously was the first column, is now a duplicate of scientificName, kept so that a column of that name still exists and downstream code doesn't break; the previous name field was the species field renamed, so the content of name has changed - BUT i think it's perhaps a good idea to drop name altogether in subsequent versions
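
For anyone who wants to picture the new layout, here is a rough sketch in Python (not the actual rgbif R code). It assumes one occurrence is a plain dict; the function name is hypothetical, and putting name in third position is an assumption, since only the first two columns are fixed above.

```python
def reorder_columns(record):
    """Return a copy of one occurrence record with key first,
    scientificName second, and name as a duplicate of scientificName
    (hypothetical helper; name in third position is an assumption)."""
    out = {
        "key": record.get("key"),
        "scientificName": record.get("scientificName"),
    }
    # name is kept only for backwards compatibility; it now duplicates
    # scientificName rather than holding the old species value.
    out["name"] = record.get("scientificName")
    # Every remaining field keeps its original relative order.
    for field, value in record.items():
        if field not in out:
            out[field] = value
    return out


# Illustrative values, not real API output.
row = {
    "species": "Ursus arctos",
    "scientificName": "Ursus arctos horribilis Ord, 1815",
    "key": 123,
}
print(list(reorder_columns(row)))
# ['key', 'scientificName', 'name', 'species']
```

The species value is still there as its own column, so code that wants the old content can read species directly.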

sckott closed this as completed Feb 22, 2019