occurrencelist() not returning all the gbif records for a species #25
Hey @dmcglinn, answer in a second.
So, the problem is that the search asks for exactly "Aristolochia serpentaria", when it seems that what you want is that name plus its variants, right? Try this:

```r
library(rgbif)
out <- gbifdata(occurrencelist(scientificname = 'Aristolochia serpentaria*', coordinatestatus = TRUE, maxresults = 1000))
unique(out$taxonName)
#> [1] Aristolochia serpentaria l. Aristolochia serpentaria
#> Levels: Aristolochia serpentaria Aristolochia serpentaria l.
nrow(out)
#> [1] 96
```

Notice the asterisk after the taxon name, and that two names come back, one with "l.", presumably for Linnaeus. This gives 96 georeferenced records, though, whereas GBIF gives 116 (GBIF does give 431 records, as you said, but not all of them have lat/long data).
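To make the difference between an exact name match and a trailing-asterisk match concrete, here is a small local sketch in base R. It assumes the asterisk behaves like a prefix wildcard on the name string; that is an assumption about GBIF's matching, not something taken from its docs, and the name vector is made up for illustration.

```r
# Hypothetical name strings as they might appear in returned records
taxa <- c("Aristolochia serpentaria", "Aristolochia serpentaria l.",
          "Aristolochia hastata")

query <- "Aristolochia serpentaria"

# Exact match: only the identical string is kept
exact <- taxa[taxa == query]

# Assumed wildcard semantics: 'query*' keeps any name starting with query
wildcard <- taxa[startsWith(taxa, query)]

exact
wildcard
```

Neither form picks up a synonym such as "Aristolochia hastata", because synonyms do not share the query string as a prefix.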
Hi again @dmcglinn, here's the responsible line in the code: https://github.com/ropensci/rgbif/blob/master/R/methods.r#L44

It removes rows that have NAs for both lat and long. And 20 of the 116 records have zeros for both lat and long, even on the GBIF site; see http://data.gbif.org/ws/rest/occurrence/list?scientificname=Aristolochia%20serpentaria*&coordinatestatus=TRUE

So those zeros get converted to NAs and then removed. What do you think?
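A minimal base-R sketch of the effect described above, using a made-up data frame. The column names `lat` and `long` are placeholders, and the sketch assumes a record is dropped only when both coordinates are zero or NA; it is not the code at the linked line in rgbif, just an illustration of why (0, 0) records disappear.

```r
# Hypothetical toy records; column names are placeholders
occ <- data.frame(
  id   = 1:5,
  lat  = c(38.9, 0, 35.2, 0, NA),
  long = c(-77.0, 0, -80.1, -84.3, NA)
)

# Step 1: treat an exact (0, 0) coordinate pair as missing
both_zero <- !is.na(occ$lat) & !is.na(occ$long) & occ$lat == 0 & occ$long == 0
occ$lat[both_zero]  <- NA
occ$long[both_zero] <- NA

# Step 2: drop rows where both latitude and longitude are missing
georef <- occ[!(is.na(occ$lat) & is.na(occ$long)), ]
nrow(georef)
```

Record 2 (a (0, 0) pair) and record 5 (already NA) are dropped, while record 4 survives because only its latitude is zero.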
Hey @schamberlain, thanks for the help and the speedy replies on these issues. It does look like adding the '*' to the species name, so that variants are returned, was the primary issue, but changing coordinatestatus to FALSE also increased the number of records returned. The query:
…returns 306 records, which is identical to …, but these queries do not return the full 431 records that a normal GBIF species search for Aristolochia serpentaria returns. This appears to be because the GBIF web query returns a broader range of names. The queries above return only these names: …
whereas the GBIF query at http://data.gbif.org/ returns these names as well as synonyms such as Aristolochia hastata.
I checked that those additional names are indeed synonyms here: http://www.itis.gov/
Interesting. So GBIF.org is giving back synonyms as well as exact matches of the query string, whereas their API does not. Let me see if there is a parameter we could fiddle with to get exactly what they give back.
The API docs say …
and …
So I'm guessing GBIF.org gets a taxonconceptkey based on your search and then looks up synonyms, but the API doesn't do this, which is weird.
Yeah, that's unfortunate, but I suppose one solution is the following?
…
However, this now returns many more records than the original gbif.org query.
Hmm, I was trying to get synonyms from ITIS and feed those into GBIF, but GBIF has different synonyms! Anyway, it would be nice if GBIF had a synonyms API.
The problem with the approach I proposed is that it does not guarantee that no duplicate records are returned. Does …
Going out for a bit...
Just posted this pull request to include unique IDs with the query results: #29
Once #29 is merged, the following query will return the same number of results as the GBIF web portal:
…
That gives 431 results, which matches the number returned by a simple web search for this species.
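The deduplication step that unique IDs make possible can be sketched in base R as below. This assumes each name query (accepted name, synonyms) returns a data frame with a unique record identifier; the column name `id` and the toy values are hypothetical, and the real column added in #29 may be named differently.

```r
# Hypothetical results of two name queries; `id` stands in for the
# unique record identifier added to the query results in #29
res_accepted <- data.frame(id = c(101, 102, 103),
                           taxonName = "Aristolochia serpentaria")
res_synonym  <- data.frame(id = c(103, 104),
                           taxonName = c("Aristolochia serpentaria",
                                         "Aristolochia hastata"))

# Stack all query results (works for any number of name queries)
combined <- do.call(rbind, list(res_accepted, res_synonym))

# Keep only the first row for each record id
deduped <- combined[!duplicated(combined$id), ]

nrow(combined)
nrow(deduped)
```

Record 103 comes back from both queries, so five stacked rows collapse to four unique records. Without a unique ID column there is no reliable key to deduplicate on, which is the gap the pull request closes.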
Merged your pull request, thanks for that! What do you think, @dmcglinn? Should functions try to match exactly what happens in the GBIF web interface, or not?
I think you should provide the option for this with a new function; see my suggested solution in #30. The primary benefit, in my mind, is that if someone doesn't want to sort out synonymy on their own and then query each name individually, you can offer the option of using GBIF's internal synonym mapping to complete the query. There is also the added benefit of consistency between the web interface and the R query, but that seems relatively minor (you'll probably just get fewer users complaining that something went wrong). However, more functions in the package mean more maintenance effort, so you may ultimately decide it's not worth it.
Thanks for the new function! Right, we should definitely strive to make things easier for users, which your function does. I would like to have just one function that does everything with the occurrencelist endpoint, but I imagine that is too difficult because there is a lot going on there. Another thing not included is the ability to specify many values for the same parameter, discussed in #28. I'm hoping they will change that, since it's wasteful to use named params over and over again.
Closing this for now.
In version 0.3.0, I noticed that the occurrencelist() function does not return all of the records that an identical query at http://data.gbif.org/ returns.
For example, the query:

```r
occurrencelist(scientificname = 'Aristolochia serpentaria', coordinatestatus = TRUE, maxresults = 1e6)
```

returns 179 records, but the identical query at http://data.gbif.org/ returns 431 records.
I have not tried to track down the potential source of this discrepancy in the code yet. I also have not investigated if other versions of rgbif have similar issues.