`occ_data`/`occ_search`: maybe not use `species` field for main name field #329

sckott · 2018-10-23T22:30:00Z

right now the first column in results from `occ_data` and `occ_search` is the species name, i.e., the value of the `species` field in the occurrence route. For all taxa at species rank and above this is fine:

- for those at species rank, the species name of course makes sense
- for those above species rank, these are occurrences, so there should be a species name, though there are probably some cases with no species-level identification

Here is where we select the `species` column and move it to the front: https://github.com/ropensci/rgbif/blob/master/R/zzz.r#L106-L107

For ranks below species, e.g., http://api.gbif.org/v1/occurrence/search?taxonKey=6163845 for Ursus arctos horribilis, the `species` value of Ursus arctos is accurate as the species-rank name, but the user probably expects to see the subspecific name instead.

I don't think we want to go with `scientificName` or `acceptedScientificName`, as they are complete names with authorities and years. Or do we? Does anyone want this name format? Or do we stick with what we have now, except add in:

for each occurrence where we detect that the rank is below the species level, give back the appropriate name for that rank by constructing it from the parts, e.g., the entry for `species` + the entry for `infraspecificEpithet`
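To make the proposal concrete, here is a minimal sketch of the "construct from parts" idea. It is written in Python purely for illustration (rgbif itself is R) and is not rgbif code. The function name `display_name`, the use of a `taxonRank` field to detect the rank, and the set of rank values are all assumptions; only `species` and `infraspecificEpithet` come from the discussion above.

```python
# Ranks treated as "below species". These values are an assumption for this sketch.
INFRASPECIFIC_RANKS = {"SUBSPECIES", "VARIETY", "FORM"}


def display_name(occ):
    """Return the species name, extended with the infraspecific epithet
    when the occurrence is identified below species rank.

    `occ` is a dict standing in for one occurrence record. `taxonRank`
    is an assumed field name used here to detect the rank.
    """
    species = occ.get("species")
    epithet = occ.get("infraspecificEpithet")
    rank = (occ.get("taxonRank") or "").upper()
    if species and epithet and rank in INFRASPECIFIC_RANKS:
        # Plain concatenation: no rank marker such as "var." is inserted.
        return f"{species} {epithet}"
    # Species rank and above (or missing parts): fall back to the species value.
    return species


# Illustrative record, not real API output.
bear = {
    "species": "Ursus arctos",
    "infraspecificEpithet": "horribilis",
    "taxonRank": "SUBSPECIES",
}
print(display_name(bear))
```

Note that plain concatenation like this does not add rank markers, so names that need one would come out malformed.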

any feedback appreciated @kgturner @damianooldoni @MattBlissett @jwhalennds @poldham

peterdesmet · 2018-10-24T06:58:26Z

In result sets, I prefer to have as first column some kind of ID. For occurrences I would expect “occurrenceID” to be the first column.

Constructing names from parts might lead to names that are not well-formed (e.g. plant names have “var.” in their name)... that is, if we care. I would expect `scientificName`.

My 2 cents

kgturner · 2018-10-24T15:25:40Z

Not a fan of `scientificName` or `acceptedScientificName` because of the authors, as Scott mentioned. It makes sense for the first column in a row to be a unique identifier (so maybe `key`?), but perhaps the next columns could be `genericName`, `specificEpithet`, `infraspecificEpithet`. Constructing a column from the parts seems fine, but you could also leave it to the user to do it. My two cents as well.

sckott · 2018-10-24T16:56:06Z

thanks, good idea to make the first column the occurrence id, as it should be unique for each row.

constructing a name from parts does open up the possibility of occasional errors and badly formed names, as you said Peter.

the 2nd column as `scientificName` makes sense to me

damianooldoni · 2018-10-25T09:16:19Z

I have to admit I work most of the time with keys, and having a unique key as first column makes sense to me. I like scientific names way more than canonical or composed names (names from parts, as you call them) because they are as unique as possible. Names from parts are less informative and can lead to misinterpretation of the returned data.

sckott · 2018-10-25T15:13:13Z

thanks @damianooldoni for your feedback.

…olumns in the dataframe; fixed test accordingly for output changes; cleaned up tests for each and put assertions outside of vcr calls; bump dev version

sckott · 2019-02-22T00:17:10Z

okay, pushed changes. considering this done. made the following changes:

`occ_data` and `occ_search` both now have:

- first column `key`: the occurrence key
- second column `scientificName`: the scientific name associated with the occurrence
- the `name` column, which previously was the first column, is now a duplicate of `scientificName`. It is retained so that downstream code relying on a column of that name doesn't break. The previous `name` field was the `species` field renamed, so the content of `name` has changed. BUT I think in subsequent versions it is perhaps a good idea to drop `name` altogether
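The resulting column layout can be sketched as below. Again this is Python for illustration only, not the actual rgbif implementation (which lives in the R package). The helper `reorder_columns`, the sample record, and the placement of `name` directly after `scientificName` are assumptions; the comment above only says that `key` comes first, `scientificName` second, and `name` duplicates `scientificName`.

```python
def reorder_columns(record):
    """Return a new dict with `key` first, `scientificName` second, and
    `name` as a copy of `scientificName`, followed by the remaining fields
    in their original order.

    Placing `name` third is an assumption of this sketch.
    """
    out = {
        "key": record.get("key"),
        "scientificName": record.get("scientificName"),
    }
    # `name` is kept for backwards compatibility but now mirrors scientificName.
    out["name"] = record.get("scientificName")
    for field, value in record.items():
        if field not in out:
            out[field] = value
    return out


# Illustrative record with a made-up key, not real API output.
record = {
    "species": "Ursus arctos",
    "key": 1,
    "scientificName": "Ursus arctos horribilis Ord, 1815",
    "country": "US",
}
result = reorder_columns(record)
print(list(result))
```

Because Python dicts keep insertion order, the printed field list reflects the column order a user would see; `species` is still present, just no longer in front.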

sckott added this to the v1.2 milestone Oct 23, 2018

sckott added a commit that referenced this issue Feb 22, 2019

occ_get test fixes - uses same gbifparser() internal fxn that occ_search uses #329 (4402409)

sckott closed this as completed Feb 22, 2019
