-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Submission: nbaR #257
Comments
Thanks very much for your submission @hettling - we're discussing now and will get back to you soon |
Editor checks:
Editor commentsThanks for your submission @hettling ! Here's the output from goodpractice. If you haven't used ── GP nbaR ─────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 92% of code lines are covered by test cases.
R/ApiClient.r:49:NA
R/ApiClient.r:53:NA
R/ApiClient.r:64:NA
R/ApiClient.r:65:NA
R/ApiClient.r:66:NA
... and 679 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
shorter than 80 characters
tests/testthat/test-GatheringEvent.r:92:1
tests/testthat/test-MultiMediaGatheringEvent.r:92:1
tests/testthat/test-MultiMediaObject.r:106:1
tests/testthat/test-MultiMediaObject.r:108:1
tests/testthat/test-Specimen.r:103:1 Seeking reviewers now 🕐 Reviewers:
|
@hettling are you going to be okay with transferring to ropensci github org, or do you need to retain in naturalis? |
@sckott I'm ok to transfer it to ropensci! |
great, thanks @hettling |
Reviewers assigned:
|
@DomBennett do you think you can get your review in soon? @mbjoseph your review is due a week from today |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 8 Review CommentsSorry for the delay of my review, the package is bigger than I initially thought! I think the package is very effective at what it sets out to do: it provides all the necessary tools for interacting with the NBA API, it is richly documented with details of all the classes and functions, and, in virtually all of my interactions with the package it functioned without any problems or errors. I therefore would very much suggest it be admitted to ROpenSci. I did, however, struggle to review this package due to large parts of it being auto generated with swagger. Although I think it is really important to use a tool like swagger -- indeed it is recommended by ROpenSci -- to capture an updating service like an API, it does make the reviewer’s life a little difficult because I don’t know whether problem code is originating from the developer or the codegen engine. To give an example, I found the process of querying the API a little annoying: generate a client object, generate query condition objects followed by a query specification object and, then, the results object. I understand that this R6-based process is created by the codegen, not the authors themselves. I can also see that the process is very powerful as by following so closely the design of the NBA API, it allows users to create any kind of query they want. (Quick sidenote, as far as my Googling takes me, I think this may be the first swagger R package to be submitted to ROpenSci -- with the possible exception of fishbaseapi?! I wasn’t able to find any similar R packages on which to emulate my own review.) I fear though that your average bio/eco R user will find the R6 objects a little alien. I had to follow the articles and documentation very closely before I was able to make my own queries that were distinct from those provided as examples. My main requests, as a reviewer, would therefore be: Improve the documentation: more examples and better explanations There is a lot of documentation already, and it was really critical for me to be able to start making more of my own queries. I think though I would have liked a little more organisation in the “getting started” article and more real-world examples. I think the shark vignette is great because it provides a real world use for the package, additional articles need not be so long though. In particular I think it would be great if the “Getting started” article were split up (e.g. into real basics and then independent articles for each data domain). I think a visual schematic showing the basic class structures would really help as well. Finally, I was interested in the operators (equals, not equals, contains….) but found these were not sufficiently explained. More examples would help. Function wrappers? I have a thought that for the most basic of queries you could simply have wrapper functions, e.g. E.g. this below wrapper let's a user run of the examples from "getting started". specimen_search <- function(value, field) {
qcs <- vector(mode = 'list', length = length(value))
for (i in seq_along(value)) {
qcs[[i]] <- QueryCondition$new(field = field[[i]], operator = 'EQUALS',
value = value[[i]])
}
qs <- QuerySpec$new(conditions = qcs)
sc <- SpecimenClient$new()
res <- sc$query(querySpec = qs, size = 100)
lapply(X = res$content$resultSet, FUN = function(x) {
x$item$toList()
})
}
res <- specimen_search(value = c('female', 'species', "Equidae"),
field = c('sex', "identifications.taxonRank",
"identifications.defaultClassification.family")) Below I highlight some additional specific issues relating to automated checks and my general play-arounds. Test resultsWhen I first downloaded the repo, I got no errors when testing the package. But since 8 Nov 2018, I started getting the error and warnings below. This may just be due to the NBA API itself, but is it possible for the package be able to note the difference between API failure and internal failure?
Check results
If the authors wish to submit to CRAN they will likely need to ensure that the package size is less than 5MB. The excessive size of the package is likely due to the images in the vignettes. They could prevent this by halting the building of vignettes in the R package itself or removing the images. I prefer the former option as I feel they could, in fact do with more use-cases (as explained above and below), and they can always refer users to the online documentation in the locally available R help files. If they choose the former, they should refer to this guide: https://pkgdown.r-lib.org/articles/pkgdown.html#articles Goodpractice resultsBy removing the vignettes from the R package itself, you may also fix these issues goodpractice raised:
Spell checkI highlight a few spelling errors (en.US):
Pkgdown issues
These above errors can be fixed by updating the pkgdown yaml. (Unless you are using tic to auto-generate the yaml?) Additionally, it might be nice to organise the functions by utility to a user (e.g. Also, Changelog leads to a 404. Help filesIn one of the vignettes, it was suggested I “Have a look at ?Response to see what this object contains.” -- yet the help file had little to no information. I think the level of information found in this help file is not too different from that of others. I imagine that much of the documentation will have been generated with the code generator but i think it may be necessary to provide more information particularly for the more commonly used functions/classes. In particular, there are very few examples provided for the majority of the help files. I think it is required that all public functions/classes come with examples for any ROpenSci package. With roxygen, examples can be stored in an examples/ folder and called in using roxygen syntax. Vignettes/articlesToo big Additionally, the size of all the articles could be massively reduced by preventing much of the unnecessary printing to console of warnings, download status, long method lists by either clever behind-the-scenes coding or using suppressMessages/Warnings. sc? allowed_operators <- sc$get_field_info()$content
names(allowed_operators) Or, perhaps a wrapper function? Schematic? It might be nice in the getting started vignette to provide a schematic of the query steps to highlight the commonalities (e.g. $new, $resultset … etc.)
Map of tomatoes There is code, some of which was commented out, placed at the bottom of the En Tibi vignette without any accompanying text. Running this code I was able to generate a map of collected S. lycopersicum specimens. Is there a reason for it to be excluded? I am a little confused with the query for ‘gatheringEvent.siteCoordinates’ -- they are set to ‘NOT_EQUALS’ yet return only specimens which contain values in these slots. More information on operators It would be great if the different operators could be demonstrated. For example, I wanted to download the taxonomic records for a given list of genera. Initially I was trying to compile multiple query conditions using ‘OR’. But I since discovered that it could be easily done using the ‘IN’ operator with value as a vector of genera. I noticed there are lots of operators available… would it be too much to ask for more examples that involve other operators other than just EQUALS? Errors/issuesNamed list Named lists in query conditions do not work? library(nbaR)
sc <- SpecimenClient$new()
field <- 'identifications.defaultClassification.genus'
qc <- QueryCondition$new(field = field, operator = 'EQUALS', value = 'Solanum')
# success
qs <- QuerySpec$new(conditions = list(qc))
res <- sc$query(querySpec = qs)
res$content$totalSize
# fail
qc_list <- list('solanum' = qc)
qs <- QuerySpec$new(conditions = qc_list)
res <- sc$query(querySpec = qs)
res$content$totalSize Size I may have missed something here, but for me the library(nbaR)
sc <- SpecimenClient$new()
qc <- QueryCondition$new(field = 'identifications.defaultClassification.genus',
operator = 'EQUALS', value = 'Solanum')
qc2 <- QueryCondition$new(field = 'identifications.defaultClassification.specificEpithet',
operator = 'EQUALS', value = 'lycopersicum')
qs <- QuerySpec$new(conditions = list(qc, qc2))
res <- sc$query(querySpec = qs, size = 1000)
res$content$totalSize == length(res$content$resultSet) Final notes
My session info
In sum, I have no doubt that this package will do everything I’d expect as a portal to NBA. I would just like a bit more documentation, clearer explanation and more examples. |
thanks @DomBennett |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 10 Review CommentsOverall the nbaR package grants users access to one of the world's richest collection of data, specimens, and multimedia, which is incredible! As somebody who works occasionally on species distribution modeling and spatiotemporal statistics, the NBA data are a treasure trove, and having access via R is great. It does seem like nbaR is comprehensive, in that it includes a lot of (all?) functionality from the NBA API. The user interface for the package seems tightly coupled to the API that serves the data, which I suspect is a consequence of using swagger (which I had never seen before). This is great for handling future updates to the NBA API, but seems to have some downsides. In particular, nbaR may be at a lower-level of abstraction than most R users might expect for a data retrieval package, which could make it somewhat unintuitive to use. The current user interface also could limit the ease with which output from the package can be integrated with existing tools and common R workflows (e.g., data frame-based workflows). One potential solution that retains robustness to changes in the API would be to include wrapper functions that ingest the R6 objects that are currently created, and from those objects, create more commonly used objects (e.g., data frames, lists, and/or vectors). In particular, functionality to make data frames from query results might be nice, as data frames will generally be a preferred format for people wanting to analyze data. One expectation for new users might be that each record is a row in a data frame. The nested nature of the data does not immediately admit simply generating a table of records, but list columns might help to make this data structure (one row per record) more manageable for users to understand. Adding some functionality like this could make the package more accessible, and would probably be my biggest suggestion for improvement. Musings on reviewing autogenerated codeThe code has a lot of repetition, and is not particularly human readable presumably because it is generated automatically from the API. I'm not sure how to evaluate this part of the package: on the one hand, it is awesome that the code is programmatically generated to deal with future changes in the API, but on the other hand, there are repetitive blocks of code like the following, which itself is repeated twice in the definition of the self[["sourceSystemId"]] <-
SpecimenList[["sourceSystemId"]]
self[["recordURI"]] <-
SpecimenList[["recordURI"]]
self[["id"]] <-
SpecimenList[["id"]]
self[["unitID"]] <-
SpecimenList[["unitID"]]
self[["unitGUID"]] <-
SpecimenList[["unitGUID"]]
self[["collectorsFieldNumber"]] <-
SpecimenList[["collectorsFieldNumber"]]
self[["assemblageID"]] <-
SpecimenList[["assemblageID"]]
self[["sourceInstitutionID"]] <-
SpecimenList[["sourceInstitutionID"]]
self[["sourceID"]] <-
SpecimenList[["sourceID"]]
self[["previousSourceID"]] <-
SpecimenList[["previousSourceID"]]
self[["owner"]] <-
SpecimenList[["owner"]]
self[["licenseType"]] <-
SpecimenList[["licenseType"]]
self[["license"]] <-
SpecimenList[["license"]]
self[["recordBasis"]] <-
SpecimenList[["recordBasis"]]
self[["kindOfUnit"]] <-
SpecimenList[["kindOfUnit"]]
self[["collectionType"]] <-
SpecimenList[["collectionType"]]
self[["sex"]] <-
SpecimenList[["sex"]]
self[["phaseOrStage"]] <-
SpecimenList[["phaseOrStage"]]
self[["title"]] <-
SpecimenList[["title"]]
self[["notes"]] <-
SpecimenList[["notes"]]
self[["preparationType"]] <-
SpecimenList[["preparationType"]]
self[["previousUnitsText"]] <-
SpecimenList[["previousUnitsText"]]
self[["numberOfSpecimen"]] <-
SpecimenList[["numberOfSpecimen"]]
self[["fromCaptivity"]] <-
SpecimenList[["fromCaptivity"]]
self[["objectPublic"]] <-
SpecimenList[["objectPublic"]]
self[["multiMediaPublic"]] <-
SpecimenList[["multiMediaPublic"]] One reason to prefer DRY code is to avoid making the same change over and over, but if a machine is tasked with making those changes, maybe DRY matters less? (As an aside: I was somewhat unsure in reviewing whether my suggestions relate more to swagger or nbaR!) The structure of the code might make it difficult for users to glean much information by reading the source code, relative to some other packages. From my perspective this could be fine, as long as there is good documentation (for humans) that gives users the information they need. DocumentationI found the default R help files (e.g., On to the vignettes! As the package is currently written, these vignettes are a key resource for users. I really appreciated that the vignettes seemed complete in the sense of covering major functionality, and both of the case study vignettes were fun to work through and will be useful for users who are trying to solve those types of problems. That said, there may be some better ways to structure the vignettes. The "Get started" vignette currently is long, which might deter users who are looking for something like a "Quickstart" guide. Would it make sense to break the current vignette into several smaller vignettes on the basis of which Minor notes
> library(testthat)
> library(nbaR)
>
> test_check("nbaR")
Downloading: 5.4 kB ── 1. Failure: getDistinctValuesPerGroup works (@test-specimenClient-misc.R#90) ────────────────────────────────────────────
res$content has length 3, not length 2.
OK: 1428 SKIPPED: 2 FAILED: 1
1. Failure: getDistinctValuesPerGroup works (@test-specimenClient-misc.R#90)
Character length in example code Some of these code chunks have very long lines (over 80 char). It would be great to have shorter lines so that users don't need to scroll right. Empty query example* https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L103-L104 Starting with an empty query might confuse new users (I was confused when I submitted an empty query and got back some results). Would it make more sense to begin with a common query operation? Pretty printing? I wonder whether it would be better to have some more human readable representation when a client query result is printed? The current output is: > # specify two query conditions
> l <- list(identifications.typeStatus="holotype", sex="female")
> # run query
> res <- client$query(queryParams=l)
> res
<Response>
Public:
clone: function (deep = FALSE)
content: QueryResult, R6
initialize: function (content, response)
response: response Users might expect something that would tell how many results were returned by the query, and maybe some information about the first couple of results. A public print method that provides some of this information would be nice. Note that this might not be necessary to add to these classes if there are wrapper functions that users are calling instead. Advanced queries It would be great if users could use more idiomatic R code for multiple queries, e.g., something like Query services example https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L232-L246 This example is great - and may be a good substitute for the current first example of an empty query. find_by_ids https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L333-L339 This method could be more idiomatic if it took a vector of ids rather than one string of ids separated by a comma. get_distinct_values https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L353 There is a typo here: "thy data" get_distinct_values_per_group and count_distinct_values_per_group Both of these return complex nested lists, that a typical user might not immediately know how to use. Would it be possible to return a data frame here instead? DwCA download services It would be worth suppressing the output of the download to improve readability. Differentiating among clients https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L474 It would be great to have some text here to explain how the taxonomic data services and
Typos
Explaining the size argument for new users https://github.com/naturalis/nbaR/blob/master/vignettes/sharks.Rmd#L89-L92 The comment here suggests that size is important - it could be nice to add some text explaining why this is necessary in this case, and how users can determine whether it is necessary in their own applications. Explaining warnings https://github.com/naturalis/nbaR/blob/master/vignettes/sharks.Rmd#L190 This command raised two warnings: > times <- geo_age(unique(c(as.character(data$youngChronoName), as.character(data$oldChronoName))))
Warning messages:
1: In FUN(X[[i]], ...) :
Could not retreive values for geo unit "Mioceen" from earthlifeconsortium.org
2: In FUN(X[[i]], ...) :
Could not retreive values for geo unit "Unspecified age" from earthlifeconsortium.org It would be nice to have some text here for users to explain what these warnings are, and whether they matter.
It looks like there are some undocumented code blocks at the end of this vignette. I was going to suggest adding some content to show how to map specimens. Bringing this code into the vignette with some additional text would be awesome! Wrap upOverall, there is a lot of great functionality in nbaR. My main suggestion would be to add wrapper functions that return data frames or other more common R objects, to facilitate the integration of nbaR with existing tools and make the user interface more intuitive. The vignettes are awesome, and with some minor tweaks, might be more accessible for users who are looking for ways to solve their particular use case. My session info:
Thanks @sckott for inviting me to review! |
thanks for your review @mbjoseph ! |
@hettling all reviews are now in, continue discussion here and let me know if you have any questions about the process |
Thanks @DomBennett and @mbjoseph for thorough reviews of the package! |
@sckott I'm starting to tackle the reviewer's comments now, is there a deadline? I couldn't find this information on https://ropensci.github.io/dev_guide/policies.html#review-process ... |
I don't think we have a deadline. If it will be a long time, we usually assign a |
@hettling Applying the holding tag now until this is active again |
@sckott Ok, reasonable. I would like to mention that there is progress in applying the reviewer's comments, documented in this milestone. Remaining issues are mostly documentation related. |
thanks for the update |
Hi @sckott @DomBennett @mbjoseph, Wrapper functions This is documented in the following issues: #18, #32, #34 Documentation
Tests/Checks/Coverage
Miscellaneuos
What I did not do
I hope you agree with me that these changes significantly improved the usage of this package. Thanks again for taking the time to review. Best, Hannes |
thanks very much for your response @hettling @DomBennett @mbjoseph are you happy with the changes? any further questions/suggestions? |
@DomBennett @mbjoseph any thoughts? if no response by the end of the week i'll assume you're happy with the changes |
@hettling this looks great! I like the reorganization of the documentation, and the wrapper functions provide a nice solution for common use cases, without sacrificing access to the functionality of the API. I approve. 👍 |
thanks @mbjoseph |
Hi @hettling Looks good Hannes! That's an amazingly thorough response to every point raised in the reviews. I re-ran |
Also... I just noticed you have this kind of code at the beginning of a couple of your tests: wd <- getwd()
if (grepl("testthat", wd)) {
data_dir <- file.path("data")
} else {
## for running test at package level
data_dir <- file.path("tests", "testthat", "data")
}
tc <- TaxonClient$new(basePath = "http://api.biodiversitydata.nl/v2")
if (!tc$ping()) {
skip("NBA not available, skipping test")
} I recently learned about |
thanks for your feedback @DomBennett also for package size, if you have any datasets in thepkg, you can resave datasets with different compression options to see what is the smallest. |
@mbjoseph Thanks! |
Hi @DomBennett , Strangely I do not get the note from goodpractice about package size. I run it as follows:
and then if I look for the checks that contain 'size':
If I check the size of the package in my library location, I get |
any comment @DomBennett ? |
thanks @DomBennett |
Approved! Thanks again for your submission @hettling ! And thanks for your reviews @DomBennett and @mbjoseph 👌 A few comments:
To-dos:
We've started putting together a bookdown with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved. The repo is at https://github.com/ropensci/dev_guide Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that |
Thanks @sckott, that's great news! |
you should have admin now. do you walso want Rutger and Maarten to have admin access? or just write? |
Thanks! Admin rights would be good for Rutger and Maarten. |
they've been added |
Summary
The package is a full R client to the Netherlands Biodiversity API (NBA, see http://docs.biodiversitydata.nl/en/latest/) giving researchers access to several large Natural History and Botany collections in the Netherlands. Additionally, some convenience functions feature integration with other packages.
https://github.com/naturalis/nbaR
The package fits into the data retrieval category as it provides API access to the databases
at the Naturalis Biodiversity Center, Leiden, the Netherlands. Since R is a very important language in the ecology/evolutionary biology research community, this package gives low-threshold access to our data.
Target audience are ecologists, evolutionary biologists, taxonomists, and researchers interested in museum collections.
yours differ or meet our criteria for best-in-category?
No.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Note: There is not yet a full manuscript since this requires collaboration with other departments in our institution. The time-allocation for writing the paper is not controlled by our group, we hence decided not to wait until a full paper is prepared with our collaborators.
Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:R CMD check
succeeds without warnings or notes.Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Dom Bennett @DomBennett Alexander Zizka @azizka
The text was updated successfully, but these errors were encountered: