Submission: nbaR #257

Closed
14 of 19 tasks
hettling opened this issue Oct 4, 2018 · 36 comments

Comments

@hettling

hettling commented Oct 4, 2018

Summary

  • What does this package do? (explain in 50 words or less):

The package is a full R client to the Netherlands Biodiversity API (NBA, see http://docs.biodiversitydata.nl/en/latest/) giving researchers access to several large Natural History and Botany collections in the Netherlands. Additionally, some convenience functions feature integration with other packages.

  • Paste the full DESCRIPTION file inside a code block below:
Package: nbaR
Title: R Package Client for the Netherlands Biodiversity API
Version: 0.0.0
Authors@R: c(person("Hannes", "Hettling", email = "hannes.hettling@naturalis.nl", role = c("aut", "cre")),
             person("Rutger", "Vos", role="aut"))
Maintainer: Hannes Hettling <hannes.hettling@naturalis.nl>
Description: Access to the digitised Natural History collection at the Naturalis Biodiversity Center.
    This is the official client to the Netherlands Biodiversity API
    (NBA, <http://api.biodiversitydata.nl>) for the R programming language.
    More information on the NBA can be found at <http://docs.biodiversitydata.nl>.
URL: https://github.com/naturalis/nbaR
BugReports: https://github.com/naturalis/nbaR/issues/new
Depends: R (>= 3.3.0)
Encoding: UTF-8
License: MIT + file LICENSE
LazyData: true
Suggests:
    testthat,
    knitr,
    rmarkdown,
    tic,
    remotes,
    pkgdown
Imports:
    jsonlite,
    httr,
    R6,
    ape
RoxygenNote: 6.1.0
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/naturalis/nbaR

  • Please indicate which category or categories from our package fit policies this package falls under and why? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

The package fits into the data retrieval category as it provides API access to the databases at the Naturalis Biodiversity Center, Leiden, the Netherlands. Since R is a very important language in the ecology/evolutionary biology research community, this package gives low-threshold access to our data.

  •   Who is the target audience and what are scientific applications of this package?  

The target audience is ecologists, evolutionary biologists, taxonomists, and researchers interested in museum collections.

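  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

No.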

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Note: There is not yet a full manuscript, since this requires collaboration with other departments in our institution. The time allocated to writing the paper is not controlled by our group, so we decided not to wait until a full paper has been prepared with our collaborators.

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

R CMD check succeeds without warnings or notes.

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Dom Bennett (@DomBennett), Alexander Zizka (@azizka)

@sckott sckott added the package label Oct 4, 2018
@sckott
Contributor

sckott commented Oct 4, 2018

Thanks very much for your submission @hettling - we're discussing now and will get back to you soon

@sckott sckott added the pub:mee label Oct 4, 2018
@sckott sckott self-assigned this Oct 4, 2018
@sckott
Contributor

sckott commented Oct 12, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @hettling !

Here's the output from goodpractice. If you haven't used goodpractice, it's an R package that runs a number of checks on another package, most of which we agree with and want authors to follow. You don't need to address these now; it's info for reviewers to use to get started. There's very little in the report below, which is a good thing!

── GP nbaR ─────────
It is good practice to

  ✖ write unit tests for all functions, and all package code in general.
    92% of code lines are covered by test cases.

    R/ApiClient.r:49:NA
    R/ApiClient.r:53:NA
    R/ApiClient.r:64:NA
    R/ApiClient.r:65:NA
    R/ApiClient.r:66:NA
    ... and 679 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people
    prefer editor windows that are about 80 characters wide. Try make your
    lines shorter than 80 characters

    tests/testthat/test-GatheringEvent.r:92:1
    tests/testthat/test-MultiMediaGatheringEvent.r:92:1
    tests/testthat/test-MultiMediaObject.r:106:1
    tests/testthat/test-MultiMediaObject.r:108:1
    tests/testthat/test-Specimen.r:103:1

Seeking reviewers now 🕐


Reviewers:

@sckott
Contributor

sckott commented Oct 12, 2018

@hettling are you going to be okay with transferring to ropensci github org, or do you need to retain in naturalis?

@hettling
Author

@sckott I'm ok to transfer it to ropensci!

@sckott
Contributor

sckott commented Oct 16, 2018

great, thanks @hettling

@sckott
Contributor

sckott commented Oct 23, 2018

Reviewers assigned:

@sckott
Contributor

sckott commented Nov 8, 2018

@DomBennett do you think you can get your review in soon?

@mbjoseph your review is due a week from today

@DomBennett

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8


Review Comments

Sorry for the delay in my review; the package is bigger than I initially thought!

I think the package is very effective at what it sets out to do: it provides all the necessary tools for interacting with the NBA API, it is richly documented with details of all the classes and functions, and, in virtually all of my interactions with the package, it functioned without any problems or errors. I would therefore very much suggest it be admitted to rOpenSci.

I did, however, struggle to review this package because large parts of it are auto-generated with swagger. Although I think it is really important to use a tool like swagger (indeed it is recommended by rOpenSci) to capture an updating service like an API, it does make the reviewer’s life a little difficult, because I don’t know whether problem code originates from the developer or from the codegen engine.

To give an example, I found the process of querying the API a little annoying: create a client object, then query condition objects, then a query specification object and, finally, the results object. I understand that this R6-based process is created by the codegen, not by the authors themselves. I can also see that the process is very powerful: by following the design of the NBA API so closely, it allows users to create any kind of query they want.

(Quick sidenote: as far as my Googling takes me, I think this may be the first swagger-generated R package to be submitted to rOpenSci, with the possible exception of fishbaseapi?! I wasn’t able to find any similar R packages on which to model my own review.)

I fear though that your average bio/eco R user will find the R6 objects a little alien. I had to follow the articles and documentation very closely before I was able to make my own queries that were distinct from those provided as examples. My main requests, as a reviewer, would therefore be:

Improve the documentation: more examples and better explanations

There is a lot of documentation already, and it was really critical for me to be able to start making more of my own queries. I think, though, that I would have liked a little more organisation in the “Getting started” article and more real-world examples. I think the shark vignette is great because it provides a real-world use for the package; additional articles need not be so long, though. In particular, I think it would be great if the “Getting started” article were split up (e.g. into real basics and then independent articles for each data domain). I think a visual schematic showing the basic class structures would really help as well. Finally, I was interested in the operators (equals, not equals, contains, …) but found that these were not sufficiently explained. More examples would help.

Function wrappers?

I have a thought that for the most basic of queries you could simply have wrapper functions, e.g. specimen_search or taxon_search, which would take just names/ids and return R data structures that the majority of R users would understand. These wrappers would likely lead to much less coding on the part of the end-user and, although less flexible, they should make the package much more accessible.

For example, the wrapper below lets a user run one of the examples from “Getting started”:

library(nbaR)

# Build one EQUALS condition per (field, value) pair, run the query and
# return each result item as a plain R list.
specimen_search <- function(value, field) {
  qcs <- vector(mode = 'list', length = length(value))
  for (i in seq_along(value)) {
    qcs[[i]] <- QueryCondition$new(field = field[[i]], operator = 'EQUALS',
                                   value = value[[i]])
  }
  qs <- QuerySpec$new(conditions = qcs)
  sc <- SpecimenClient$new()
  res <- sc$query(querySpec = qs, size = 100)
  lapply(X = res$content$resultSet, FUN = function(x) {
    x$item$toList()
  })
}

res <- specimen_search(value = c('female', 'species', "Equidae"),
                       field = c('sex', "identifications.taxonRank",
                                 "identifications.defaultClassification.family"))

Below I highlight some additional specific issues relating to automated checks and my general play-arounds.

Test results

When I first downloaded the repo, I got no errors when testing the package. But since 8 Nov 2018, I have been getting the failure and warnings below. This may just be due to the NBA API itself, but would it be possible for the package to distinguish between API failures and internal failures?

test-multimediaClient-metadata.R:36: warning: Settings work
Status code:500
Internal Server Error
Exception: Invalid NBA setting: 50000
Exception type: java.lang.IllegalArgumentException
Full stack trace stored in response object
test-specimenClient-find.R:57: warning: Test error handling in find functions
404 (NOT FOUND)
No Specimen exists with ID XXX
test-specimenClient-misc.R:90: failure: getDistinctValuesPerGroup works
res$content has length 3, not length 2.
test-specimenClient-query.R:149: warning: Errors and warnings work
Status code:500
Internal Server Error
Exception: Field associatedMultiMediaUris.accessUri cannot be queried: field is not indexed
Exception type: nl.naturalis.nba.api.InvalidConditionException
Full stack trace stored in response object
test-taxonClient-find.R:35: warning: Test error handling in find functions
404 (NOT FOUND)
No Specimen exists with ID XXX
test-taxonClient-query.R:80: warning: Errors and warnings work
Status code:500
Internal Server Error
Exception: Invalid element "somefield" in path "somefield"
Exception type: nl.naturalis.nba.api.InvalidConditionException
Full stack trace stored in response object
══ Results ════════════════════════════════════════
Duration: 85.4 s

OK:       1420
Failed:   1
Warnings: 8
Skipped:  2
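One way to separate the two kinds of failure, sketched here under assumptions, is to signal errors reported by the NBA server (such as the 404 and 500 responses above) with their own condition class, so that callers and tests can treat them differently from bugs in the client. The names `nba_api_error`, `check_status` and `classify_failure` are hypothetical and not part of nbaR; in the package, the status code would come from the httr response.

```r
# Hypothetical condition constructor: an error of class "nba_api_error"
# that also carries the HTTP status code returned by the server.
nba_api_error <- function(status, message) {
  structure(
    list(message = sprintf("NBA API returned status %d: %s", status, message),
         call = NULL,
         status = status),
    class = c("nba_api_error", "error", "condition")
  )
}

# Hypothetical status check: any 4xx/5xx status becomes an nba_api_error.
check_status <- function(status, message = "") {
  if (status >= 400) stop(nba_api_error(status, message))
  invisible(TRUE)
}

# Evaluate `expr` and report "ok", "api" (server-side failure) or
# "internal" (any other error). Handlers are tried in the order given,
# so the more specific class must come first.
classify_failure <- function(expr) {
  tryCatch(
    { expr; "ok" },
    nba_api_error = function(e) "api",
    error = function(e) "internal"
  )
}

classify_failure(check_status(500L, "Internal Server Error"))
classify_failure(stop("bug in response parser"))
```

A test that depends on the live API could then catch `nba_api_error` and call `testthat::skip()` instead of failing when the server itself is the problem.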

Check results

Warning: @noRd [/Users/djb208/Coding/nbaR/R/Utils.r#27]: has no parameters
Warning: @noRd [/Users/djb208/Coding/nbaR/R/Utils.r#205]: has no parameters

* checking installed package size ... NOTE
  installed size is  5.9Mb
  sub-directories of 1Mb or more:
    R     1.4Mb
    doc   4.3Mb

* checking top-level files ... NOTE
Non-standard file/directory found at top level:
  ‘tic.R’

If the authors wish to submit to CRAN, they will likely need to ensure that the package size is less than 5MB. The excessive size of the package is likely due to the images in the vignettes. They could prevent this either by not building the vignettes as part of the R package itself or by removing the images. I prefer the former option, as I feel the vignettes could, in fact, do with more use cases (as explained above and below), and the locally available R help files can always point users to the online documentation. If they choose the former, they should refer to this guide: https://pkgdown.r-lib.org/articles/pkgdown.html#articles

Goodpractice results

By removing the vignettes from the R package itself, you may also fix these issues goodpractice raised:

  ✖ write unit tests for all functions, and all package code in general.
    93% of code lines are covered by test cases.

    R/Feature.r:86:NA
    R/Feature.r:111:NA
    R/Feature.r:112:NA
    R/GeoArea.r:122:NA
    R/GeoArea.r:173:NA
    ... and 628 more lines

  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version.
    This typically indicates Rd problems.
  ✖ fix this R CMD check ERROR: Re-running with no redirection of
    stdout/stderr. Hmm ... looks like a package You may want to clean up by 'rm -Rf
    /var/folders/ps/g89999v12490dmp0jnsfmykm0043m3/T//Rtmpn15kYK/Rd2pdf11fbd7d461aa9'
  ✖ fix this R CMD check NOTE: installed size is 6.3Mb sub-directories of
    1Mb or more: R 1.8Mb doc 4.3Mb

Spell check

I highlight a few spelling errors (en.US):

Baeysian                          sharks.Rmd:22
conventience                      sharks.Rmd:160
digitised                         nbaR.Rd:12
                                         description:1
donw                              tomato.Rmd:156
errenous                          sharks.Rmd:247
exaclty                           tomato.Rmd:62
expleore                          sharks.Rmd:147
objetcs                           sharks.Rmd:171
respose                           tomato.Rmd:47
retreive                          geo_age.Rd:19
                                  tomato.Rmd:28
whch                              tomato.Rmd:62
worning                           sharks.Rmd:236

Pkgdown issues

Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`Element\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`GeoAreaClient\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`Element\``
Warning: In '_pkgdown.yml', topic must be a valid R expression.
Problem topic: `\`GeoAreaClient\``
Warning: Topics missing from index: chronos_calib, geo_age

The errors above can be fixed by updating the pkgdown YAML (unless you are using tic to auto-generate the YAML?).

Additionally, it might be nice to organise the functions by their utility to a user (e.g. SpecimenClient and TaxonClient should take precedence over Agent). In packages I have written in the past, I have organised the functions/classes into public and private categories.

Also, the Changelog link leads to a 404.

Help files

In one of the vignettes, it was suggested I “Have a look at ?Response to see what this object contains.”, yet the help file had little to no information. I think this help file is fairly representative of the level of information in the others.

I imagine that much of the documentation will have been generated with the code generator, but I think it may be necessary to provide more information, particularly for the more commonly used functions/classes. In particular, the majority of the help files provide very few examples. As far as I know, rOpenSci requires examples for all exported functions/classes.

With roxygen, examples can be stored in an examples/ folder and pulled into the help files with the @example tag.
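As a sketch, the roxygen header of a class could pull in an example script as follows; the file name `examples/SpecimenClient-query.R` is hypothetical, and roxygen resolves the path relative to the package root.

```r
# In the roxygen block above the class definition (e.g. in R/SpecimenClient.r)
#' @example examples/SpecimenClient-query.R
```

Because roxygen only reads the file when the documentation is generated, the examples/ folder can be listed in .Rbuildignore (e.g. as `^examples$`) so that R CMD check does not report it as a non-standard top-level directory.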

Vignettes/articles

Too big
I think it may be a good idea to split the main “get started” vignette into multiple separate vignettes. The initial size of the document can be off-putting and all of the available resources may not be useful for everyone who may wish to use nbaR. I would suggest breaking down by data type: specimen, taxon … etc. and having an article for each domain.

Additionally, the size of all the articles could be massively reduced by preventing much of the unnecessary printing of warnings, download status and long method lists to the console, either by clever behind-the-scenes coding or by using suppressMessages()/suppressWarnings().
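A small helper along these lines would keep the console quiet without touching the code that produces the output. `noisy_query()` and `quietly()` are hypothetical: the former stands in for a call that prints progress messages and warnings. In an R Markdown vignette, the knitr chunk options `message = FALSE` and `warning = FALSE` achieve a similar effect per chunk.

```r
# Hypothetical stand-in for a call that writes progress messages and
# warnings to the console.
noisy_query <- function() {
  message("Downloading records ...")
  warning("Status code: 500")
  "result"
}

# Evaluate `expr` with all messages and warnings muffled; the value of
# `expr` is returned unchanged.
quietly <- function(expr) {
  suppressWarnings(suppressMessages(expr))
}

out <- quietly(noisy_query())
```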

sc?
At this point in the text ….
Note: The function getFieldInfo on a certain field lists which operators are allowed for that field (e.g. sc$get_field_info()$content$unitID$allowedOperators).
I got confused as to what sc was (also, is it getFieldInfo or get_field_info?). I think this section could be improved, perhaps with a better example showing how a user might look up all the available query options for the specimen service:

allowed_operators <- sc$get_field_info()$content
names(allowed_operators)

Or, perhaps a wrapper function?
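A minimal sketch of such a wrapper, assuming (from the `sc$get_field_info()$content$unitID$allowedOperators` call above) that `content` is a named list with one element per field and that each element holds an `allowedOperators` vector. The function name `allowed_operators_table` and the mock data are hypothetical.

```r
# Hypothetical wrapper: turn per-field metadata into a two-column data
# frame with one row per (field, allowed operator) pair.
# ASSUMPTION: `content` is a named list; each element has `allowedOperators`.
allowed_operators_table <- function(content) {
  ops <- lapply(content, function(f) as.character(unlist(f$allowedOperators)))
  data.frame(
    field = rep(names(ops), lengths(ops)),
    operator = unlist(ops, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Mock of the assumed structure, for illustration only.
content <- list(
  unitID = list(allowedOperators = c("EQUALS", "IN")),
  sex = list(allowedOperators = c("EQUALS", "NOT_EQUALS"))
)
tab <- allowed_operators_table(content)
```

With a live client, this would be called as `allowed_operators_table(sc$get_field_info()$content)`, and `subset(tab, field == "unitID")` would then list the operators for a single field.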

Schematic?

It might be nice in the getting started vignette to provide a schematic of the query steps to highlight the commonalities (e.g. $new, $resultset … etc.)

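library(nbaR)
# specify the data service (e.g. SpecimenClient, TaxonClient)
client <- SpecimenClient$new()
# create query conditions
condition1 <- QueryCondition$new(
  field = 'identifications.defaultClassification.genus',
  operator = 'EQUALS', value = 'Solanum')
condition2 <- QueryCondition$new(
  field = 'identifications.defaultClassification.specificEpithet',
  operator = 'EQUALS', value = 'lycopersicum')
# combine the conditions into a query specification
query <- QuerySpec$new(conditions = list(condition1, condition2))
# run the search; the records are in res$content$resultSet
res <- client$query(querySpec = query)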

Map of tomatoes

There is some code, part of which is commented out, at the bottom of the En Tibi vignette without any accompanying text. Running this code, I was able to generate a map of collected S. lycopersicum specimens. Is there a reason for it to be excluded? I am also a little confused by the query for ‘gatheringEvent.siteCoordinates’: the conditions are set to ‘NOT_EQUALS’, yet they return only specimens that contain values in these slots.

More information on operators

It would be great if the different operators could be demonstrated. For example, I wanted to download the taxonomic records for a given list of genera. Initially I tried to combine multiple query conditions using ‘OR’, but I have since discovered that this can easily be done using the ‘IN’ operator, with value set to a vector of genera.

I noticed there are lots of operators available… would it be too much to ask for more examples involving operators other than EQUALS?

Errors/issues

Named list

Named lists in query conditions do not work?

library(nbaR)
sc <- SpecimenClient$new()
field <- 'identifications.defaultClassification.genus'
qc <- QueryCondition$new(field = field, operator = 'EQUALS', value = 'Solanum')
# success
qs <- QuerySpec$new(conditions = list(qc))
res <- sc$query(querySpec = qs)
res$content$totalSize
# fail
qc_list <- list('solanum' = qc)
qs <- QuerySpec$new(conditions = qc_list)
res <- sc$query(querySpec = qs)
res$content$totalSize
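One possible explanation, which is an unverified assumption about the nbaR internals: if the conditions are serialised with jsonlite, a named list becomes a JSON object rather than an array (`jsonlite::toJSON(list(a = 1))` gives `{"a":[1]}`, whereas `jsonlite::toJSON(list(1))` gives `[[1]]`), which the API would not accept as a list of conditions. If so, a defensive helper could drop the names, with a warning, before the query specification is built. `as_condition_list` is hypothetical; environments stand in for the R6 condition objects.

```r
# Hypothetical helper: normalise the `conditions` argument to an unnamed
# list. A single condition object (an R6 object, i.e. an environment, not
# a list) is wrapped in a list; names are dropped with a warning.
as_condition_list <- function(conditions) {
  if (!is.list(conditions)) {
    conditions <- list(conditions)
  }
  if (!is.null(names(conditions))) {
    warning("Names of query conditions are ignored.")
    conditions <- unname(conditions)
  }
  conditions
}

qc <- new.env()   # stand-in for a QueryCondition object
fixed <- suppressWarnings(as_condition_list(list(solanum = qc)))
```

In the reviewer's example, this would be used as `QuerySpec$new(conditions = as_condition_list(qc_list))`.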

Size

I may have missed something here, but for me the size argument didn’t work -- the results were always stuck at 10.

library(nbaR)
sc <- SpecimenClient$new()
qc <- QueryCondition$new(field = 'identifications.defaultClassification.genus',
                         operator = 'EQUALS', value = 'Solanum')
qc2 <- QueryCondition$new(field = 'identifications.defaultClassification.specificEpithet',
                          operator = 'EQUALS', value = 'lycopersicum')
qs <- QuerySpec$new(conditions = list(qc, qc2))
res <- sc$query(querySpec = qs, size = 1000)
res$content$totalSize == length(res$content$resultSet)
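Whatever the cause of the cap, results stuck at 10 suggest that retrieving all `res$content$totalSize` records may require paging. The minimal sketch below computes the starting offset of each page. It assumes that a query can be given a starting record and a page size; the NBA query specification appears to use `from` and `size` for this, but that should be checked against the QuerySpec help page. `page_offsets` is hypothetical.

```r
# Hypothetical helper: zero-based starting offsets needed to fetch
# `total` records in pages of at most `page_size` records.
page_offsets <- function(total, page_size) {
  stopifnot(page_size > 0)
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = page_size)
}

offsets <- page_offsets(25, 10)
```

Each offset would then be used as the starting record of one request, and the result sets concatenated.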

Final notes

  • In the en tibi vignette, the first paragraph is repeated.
  • Shark vignette: get_age is not a function. Should be geo_age?
  • Shark vignette: broken images?
  • Might be better to save the shark phylogeny as an R object (see ?save, and use xz compression for CRAN). That way the file is compressed and loading is easier for a user: data(shark_phylogeny). See: http://r-pkgs.had.co.nz/data.html
  • It is good practice to spell out logicals: T and F should be TRUE and FALSE. (T and F are not protected)
  • Keep 80 column width in vignette examples please.
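Two of the notes above can be checked in a plain R session. The sketch below saves a stand-in object with xz compression and loads it back (in the package, the ape `phylo` object would be saved as data/shark_phylogeny.rda). It then shows that `T` can be rebound, which is why spelling out `TRUE` and `FALSE` is safer. The list used for `shark_phylogeny` is toy data, not the real tree.

```r
# 1. Save a data object with xz compression and load it back.
#    Toy stand-in for the ape "phylo" object.
shark_phylogeny <- list(tip.label = c("Carcharodon", "Sphyrna"), Nnode = 1L)
path <- file.path(tempdir(), "shark_phylogeny.rda")
save(shark_phylogeny, file = path, compress = "xz")

loaded <- new.env()
load(path, envir = loaded)

# 2. T is an ordinary variable, so it can be overwritten.
check_T <- local({
  T <- FALSE     # shadows base::T inside this local scope
  isTRUE(T)      # FALSE: T no longer means TRUE here
})
```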

My session info

Session info ---------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.442)           
 language (EN)                        
 collate  en_GB.UTF-8                 
 tz       Europe/Stockholm            
 date     2018-11-08                  

Packages -------------------------------------------------------------------------------------
 package   * version date       source        
 ape         5.1     2018-04-04 CRAN (R 3.5.0)
 base      * 3.5.1   2018-07-05 local         
 compiler    3.5.1   2018-07-05 local         
 curl        3.2     2018-03-28 CRAN (R 3.5.0)
 datasets  * 3.5.1   2018-07-05 local         
 devtools    1.13.6  2018-06-27 CRAN (R 3.5.0)
 digest      0.6.18  2018-10-10 cran (@0.6.18)
 graphics  * 3.5.1   2018-07-05 local         
 grDevices * 3.5.1   2018-07-05 local         
 grid        3.5.1   2018-07-05 local         
 httr        1.3.1   2017-08-20 CRAN (R 3.5.0)
 jsonlite    1.5     2017-06-01 CRAN (R 3.5.0)
 lattice     0.20-35 2017-03-25 CRAN (R 3.5.1)
 memoise     1.1.0   2017-04-21 CRAN (R 3.5.0)
 methods   * 3.5.1   2018-07-05 local         
 nbaR      * 0.0.0   2018-11-03 local         
 nlme        3.1-137 2018-04-07 CRAN (R 3.5.1)
 parallel    3.5.1   2018-07-05 local         
 R6          2.3.0   2018-10-04 cran (@2.3.0) 
 Rcpp        0.12.18 2018-07-23 CRAN (R 3.5.0)
 stats     * 3.5.1   2018-07-05 local         
 tools       3.5.1   2018-07-05 local         
 utils     * 3.5.1   2018-07-05 local         
 withr       2.1.2   2018-03-15 CRAN (R 3.5.0)
 yaml        2.2.0   2018-07-25 CRAN (R 3.5.0)

In sum, I have no doubt that this package will do everything I’d expect as a portal to NBA. I would just like a bit more documentation, clearer explanation and more examples.
Thanks for inviting me to review!

@sckott
Contributor

sckott commented Nov 8, 2018

thanks @DomBennett

@mbjoseph
Member

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10


Review Comments

Overall, the nbaR package grants users access to one of the world's richest collections of data, specimens, and multimedia, which is incredible! As somebody who works occasionally on species distribution modeling and spatiotemporal statistics, I find the NBA data a treasure trove, and having access via R is great. It does seem like nbaR is comprehensive, in that it includes a lot of (all?) functionality from the NBA API. The user interface for the package seems tightly coupled to the API that serves the data, which I suspect is a consequence of using swagger (which I had never seen before). This is great for handling future updates to the NBA API, but seems to have some downsides. In particular, nbaR may be at a lower level of abstraction than most R users might expect for a data retrieval package, which could make it somewhat unintuitive to use. The current user interface could also limit the ease with which output from the package can be integrated with existing tools and common R workflows (e.g., data frame-based workflows).

One potential solution that retains robustness to changes in the API would be to include wrapper functions that ingest the R6 objects that are currently created, and from those objects, create more commonly used objects (e.g., data frames, lists, and/or vectors). In particular, functionality to make data frames from query results might be nice, as data frames will generally be a preferred format for people wanting to analyze data. One expectation for new users might be that each record is a row in a data frame. The nested nature of the data does not immediately admit simply generating a table of records, but list columns might help to make this data structure (one row per record) more manageable for users to understand. Adding some functionality like this could make the package more accessible, and would probably be my biggest suggestion for improvement.
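A minimal sketch of such a wrapper, assuming the records are plain named lists like those produced by `x$item$toList()` in the `specimen_search()` example from the first review. `records_to_df` is hypothetical: fields with exactly one atomic value in every record become ordinary columns, while nested, multi-valued or missing fields become list columns. The records below are toy data.

```r
# Hypothetical wrapper: one row per record, list columns for nested data.
records_to_df <- function(records) {
  fields <- unique(unlist(lapply(records, names)))
  df <- data.frame(row.names = seq_along(records))
  for (f in fields) {
    # r[[f]] is NULL when a record lacks the field
    values <- lapply(records, function(r) r[[f]])
    scalar <- vapply(values,
                     function(v) is.atomic(v) && length(v) == 1L,
                     logical(1))
    if (all(scalar)) {
      df[[f]] <- unlist(values)   # ordinary column
    } else {
      df[[f]] <- values           # list column
    }
  }
  df
}

# Toy records shaped like nested specimen data.
recs <- list(
  list(unitID = "A1", sex = "female",
       identifications = list(list(genus = "Equus"))),
  list(unitID = "A2", sex = "male",
       identifications = list(list(genus = "Equus"),
                              list(genus = "Hippotigris")))
)
df <- records_to_df(recs)
```

With a live query, this would be called as `records_to_df(lapply(res$content$resultSet, function(x) x$item$toList()))`.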

Musings on reviewing autogenerated code

The code has a lot of repetition and is not particularly human-readable, presumably because it is generated automatically from the API. I'm not sure how to evaluate this part of the package: on the one hand, it is awesome that the code is programmatically generated to deal with future changes in the API, but on the other hand, there are repetitive blocks of code like the following, which itself is repeated twice in the definition of the Specimen class:

self[["sourceSystemId"]] <-
  SpecimenList[["sourceSystemId"]]
self[["recordURI"]] <-
  SpecimenList[["recordURI"]]
self[["id"]] <-
  SpecimenList[["id"]]
self[["unitID"]] <-
  SpecimenList[["unitID"]]
self[["unitGUID"]] <-
  SpecimenList[["unitGUID"]]
self[["collectorsFieldNumber"]] <-
  SpecimenList[["collectorsFieldNumber"]]
self[["assemblageID"]] <-
  SpecimenList[["assemblageID"]]
self[["sourceInstitutionID"]] <-
  SpecimenList[["sourceInstitutionID"]]
self[["sourceID"]] <-
  SpecimenList[["sourceID"]]
self[["previousSourceID"]] <-
  SpecimenList[["previousSourceID"]]
self[["owner"]] <-
  SpecimenList[["owner"]]
self[["licenseType"]] <-
  SpecimenList[["licenseType"]]
self[["license"]] <-
  SpecimenList[["license"]]
self[["recordBasis"]] <-
  SpecimenList[["recordBasis"]]
self[["kindOfUnit"]] <-
  SpecimenList[["kindOfUnit"]]
self[["collectionType"]] <-
  SpecimenList[["collectionType"]]
self[["sex"]] <-
  SpecimenList[["sex"]]
self[["phaseOrStage"]] <-
  SpecimenList[["phaseOrStage"]]
self[["title"]] <-
  SpecimenList[["title"]]
self[["notes"]] <-
  SpecimenList[["notes"]]
self[["preparationType"]] <-
  SpecimenList[["preparationType"]]
self[["previousUnitsText"]] <-
  SpecimenList[["previousUnitsText"]]
self[["numberOfSpecimen"]] <-
  SpecimenList[["numberOfSpecimen"]]
self[["fromCaptivity"]] <-
  SpecimenList[["fromCaptivity"]]
self[["objectPublic"]] <-
  SpecimenList[["objectPublic"]]
self[["multiMediaPublic"]] <-
  SpecimenList[["multiMediaPublic"]]

One reason to prefer DRY ("don't repeat yourself") code is to avoid making the same change over and over, but if a machine is tasked with making those changes, maybe DRY matters less? (As an aside: while reviewing, I was somewhat unsure whether my suggestions relate more to swagger or to nbaR!) The structure of the code might make it harder for users to glean information by reading the source code than with some other packages. From my perspective this could be fine, as long as there is good documentation (for humans) that gives users the information they need.
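For what it's worth, a template could still emit DRY code. A minimal sketch, assuming the generator knows the list of field names: a single loop replaces the block of assignments above. A plain environment stands in for the R6 self here (R6 objects are environments, so the same [[<- assignment applies); copy_fields and the shortened field list are hypothetical.

```r
# Field names copied from the generated code above (shortened list).
specimen_fields <- c("sourceSystemId", "recordURI", "id", "unitID",
                     "unitGUID", "sex", "phaseOrStage", "title")

# Copy each known field from a parsed list onto the object, replacing one
# hand-written (or generated) assignment per field.
copy_fields <- function(target, source, fields) {
  for (field in fields) {
    target[[field]] <- source[[field]]
  }
  invisible(target)
}

# An environment stands in for R6's 'self'; environments have reference
# semantics, so the assignments made inside copy_fields() are visible here.
self <- new.env()
SpecimenList <- list(id = "RMNH.1", sex = "female", title = "Example")
copy_fields(self, SpecimenList, specimen_fields)
```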

Documentation

I found the default R help files (e.g., ?ChronoStratigraphy) to be not super helpful in trying to understand how to use the package. In particular, the "Usage" section usually contains the name of the class, so that users might need to read through the fields and methods to understand how to use the class. This could be hard for typical R users, particularly if they are not familiar with OOP in R6, and it also means that the help pages do not illustrate how the classes fit together. So, one option would be to hand-edit these autogenerated "Usage" sections, but that doesn't seem ideal if the API is going to change frequently. The current solution of leaning more heavily on the vignettes to demonstrate how all of these classes fit together makes sense from the perspective of automation, and it might even be worth linking users to relevant vignettes from the help files. If (human-authored) wrapper functions are added to the package, then those help files could have more detailed "Usage" sections.

On to the vignettes! As the package is currently written, these vignettes are a key resource for users. I really appreciated that the vignettes seemed complete in the sense of covering major functionality, and both of the case study vignettes were fun to work through and will be useful for users who are trying to solve those types of problems. That said, there may be some better ways to structure the vignettes. The "Get started" vignette currently is long, which might deter users who are looking for something like a "Quickstart" guide. Would it make sense to break the current vignette into several smaller vignettes on the basis of which *Client class they use? For example, vignettes for SpecimenClient, TaxonClient, MultimediaClient, GeoClient, and MetadataClient, with names that reflect common use cases that users will recognize, e.g., "Querying specimen records", "Searching by taxon", "Finding and accessing multimedia", etc.?

Minor notes

  1. Tests were also failing for me (the same test that @DomBennett mentioned):
> library(testthat)
> library(nbaR)
>
> test_check("nbaR")
Downloading: 5.4 kB
── 1. Failure: getDistinctValuesPerGroup works (@test-specimenClient-misc.R#90)  ────────────────────────────────────────────
res$content has length 3, not length 2.

OK: 1428 SKIPPED: 2 FAILED: 1
1. Failure: getDistinctValuesPerGroup works (@test-specimenClient-misc.R#90)
  2. Typos in the swagger README (https://github.com/naturalis/nbaR/blob/master/other/swagger/README.md)
  • "NAB" should be "NBA"
  • "overwiev" should be "overview"
  • "contatining" should be "containing"
  3. Specimen class docs

Further fields include (among others) information about Finding place, identification, and multimedia content.

  • It seems like "Finding" should not be capitalized (but maybe this is a swagger issue?)
  4. Get started vignette (https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd)

Line length in example code

Some of these code chunks have very long lines (over 80 characters). It would be great to have shorter lines so that users don't need to scroll horizontally.
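As a quick way to locate the offending lines, here is a minimal base-R sketch; long_lines is a hypothetical helper, not part of any package, and it simply lists lines longer than a limit in a file such as a vignette.

```r
# List lines longer than 'limit' characters in a text file,
# e.g. long_lines("vignettes/nbaR.Rmd").
long_lines <- function(path, limit = 80) {
  lines <- readLines(path, warn = FALSE)
  too_long <- which(nchar(lines) > limit)
  data.frame(line = too_long, chars = nchar(lines[too_long]))
}

# Small self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".Rmd")
writeLines(c("short line", strrep("x", 95), "another short line"), tmp)
long_lines(tmp)
```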

Empty query example

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L103-L104

Starting with an empty query might confuse new users (I was confused when I submitted an empty query and got back some results). Would it make more sense to begin with a common query operation?

Pretty printing?

I wonder whether it would be better to have a more human-readable representation when a client query result is printed. The current output is:

> # specify two query conditions
> l <- list(identifications.typeStatus="holotype", sex="female")
> # run query
> res <- client$query(queryParams=l)
> res
<Response>
  Public:
    clone: function (deep = FALSE)
    content: QueryResult, R6
    initialize: function (content, response)
    response: response

Users might expect output that tells them how many results the query returned, and perhaps some information about the first few results. A public print method that provides this information would be nice. Note that this might not be necessary to add to these classes if users call wrapper functions instead.
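A minimal sketch of such a print method, written as an S3 method on a hypothetical stand-in object so that it runs on its own. The field names totalSize and resultSet are assumptions about the shape of a query result, and the records are toy data. For the existing R6 classes, the same body could go into a public print method, which R6 uses when the object is printed.

```r
# Hypothetical stand-in for a query response; the field names totalSize and
# resultSet are assumptions made for this sketch.
new_query_response <- function(total_size, records) {
  structure(list(totalSize = total_size, resultSet = records),
            class = "query_response")
}

# S3 print method: report the hit count and show the ids of the first n
# records instead of listing the object's fields and methods.
print.query_response <- function(x, n = 2, ...) {
  shown <- min(n, length(x$resultSet))
  cat("<query_response>", x$totalSize, "total hits,",
      length(x$resultSet), "returned\n")
  for (i in seq_len(shown)) {
    cat(sprintf("  [%d] id: %s\n", i, x$resultSet[[i]]$id))
  }
  if (length(x$resultSet) > shown) {
    cat("  ...\n")
  }
  invisible(x)
}

res <- new_query_response(
  total_size = 57,
  records = list(list(id = "A.1"), list(id = "A.2"), list(id = "A.3"))
)
print(res)
```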

Advanced queries

It would be great if users could write more idiomatic R code for queries with multiple conditions, e.g., something like 'sex == "female" & identifications.taxonRank == "species"', rather than needing to create multiple QueryCondition objects and combine them in a QuerySpec object. I'm not sure how easy this would be to implement, but it could increase usability.
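One way this could work, sketched in base R under the assumption that the existing list-based queryParams format is the target: capture the unevaluated expression with substitute(), split it on &, and turn each == into a named list element. query_where and conditions_from_expr are hypothetical names; only & and == are handled, and anything else raises an error.

```r
# Turn an expression such as  a == "x" & b == "y"  into list(a = "x", b = "y").
# Only '&' and '==' are supported; anything else is an error.
conditions_from_expr <- function(expr, env) {
  if (is.call(expr) && identical(expr[[1]], as.name("&"))) {
    return(c(conditions_from_expr(expr[[2]], env),
             conditions_from_expr(expr[[3]], env)))
  }
  if (is.call(expr) && identical(expr[[1]], as.name("=="))) {
    field <- deparse(expr[[2]])    # left side: the field path, kept as text
    value <- eval(expr[[3]], env)  # right side: evaluated in the caller
    return(stats::setNames(list(value), field))
  }
  stop("Unsupported query expression: ", deparse(expr))
}

# Capture the expression unevaluated, so field names need not exist as objects.
query_where <- function(cond, env = parent.frame()) {
  conditions_from_expr(substitute(cond), env)
}

l <- query_where(sex == "female" & identifications.taxonRank == "species")
str(l)
```

The result has the same shape as the list passed to client$query(queryParams = l) in the example above, so a helper like this could sit in front of the existing interface rather than replace it.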

Query services example

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L232-L246

This example is great - and may be a good substitute for the current first example of an empty query.

find_by_ids

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L333-L339

This method could be more idiomatic if it took a vector of ids rather than a single string of comma-separated ids.
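A minimal sketch of the vector-friendly variant, using a hypothetical helper collapse_ids and placeholder ids: the vector is collapsed into the comma-separated string internally, so callers never build it themselves.

```r
# Hypothetical helper: accept ids as a character vector and build the
# comma-separated string internally.
collapse_ids <- function(ids) {
  stopifnot(is.character(ids), length(ids) > 0)
  paste(ids, collapse = ",")
}

# A method such as find_by_ids(ids) could call collapse_ids(ids)
# before sending the request.
collapse_ids(c("id1", "id2", "id3"))
```

A single comma-separated string still works with this approach, since collapsing a length-one vector leaves it unchanged.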

get_distinct_values

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L353

There is a typo here: "thy data"

get_distinct_values_per_group and count_distinct_values_per_group

Both of these return complex nested lists, which a typical user might not immediately know how to use. Would it be possible to return a data frame here instead?
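A minimal sketch of such a conversion in base R. The nested input below is an assumption made for illustration (one element per group, each holding a group key and a list of value/count pairs with toy counts); it is not the documented NBA response format, and flatten_groups is a hypothetical name.

```r
# Assumed (hypothetical) shape: one element per group, each holding the
# group key and a list of value/count pairs.
nested <- list(
  list(collectionType = "Mammalia",
       values = list(list(value = "female", count = 10),
                     list(value = "male", count = 7))),
  list(collectionType = "Aves",
       values = list(list(value = "female", count = 2)))
)

# One row per (group, value) pair.
flatten_groups <- function(x, group_field) {
  rows <- lapply(x, function(g) {
    if (length(g$values) == 0) return(NULL)  # skip empty groups
    data.frame(
      group = g[[group_field]],
      value = vapply(g$values, function(v) v$value, character(1)),
      count = vapply(g$values, function(v) v$count, numeric(1)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

flatten_groups(nested, "collectionType")
```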

DwCA download services

It would be worth suppressing the output of the download to improve readability.
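A minimal sketch of one option, assuming the download function prints progress with cat() to standard output and/or emits messages (progress written to stderr would not be caught this way): capture.output() discards printed output and suppressMessages() drops messages, while the return value is kept. noisy_download and quietly are hypothetical stand-ins. In an R Markdown vignette, the knitr chunk options results = "hide" and message = FALSE are an alternative that hides the output without changing the code shown to readers.

```r
# Hypothetical stand-in for a download that prints progress and a message.
noisy_download <- function() {
  for (pct in c(25, 50, 75, 100)) cat("Downloading:", pct, "%\n")
  message("Download complete")
  "archive.zip"
}

# Evaluate 'expr', discarding printed output (via capture.output) and
# messages (via suppressMessages), but keep its return value.
quietly <- function(expr) {
  result <- NULL
  invisible(utils::capture.output(result <- suppressMessages(expr)))
  result
}

path <- quietly(noisy_download())
path
```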

Differentiating among clients

https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L474

It would be great to have some text here to explain how the taxonomic data services and TaxonClient use cases differ from the SpecimenClient use cases described earlier in the vignette. (This would be helpful regardless of whether the "Get started" vignette stays as one big document, or is split up.) Same comment for the geographic data services: https://github.com/naturalis/nbaR/blob/master/vignettes/nbaR.Rmd#L616

  5. Vignette: Calibrating a molecular phylogeny of sharks

Typos

Explaining the size argument for new users

https://github.com/naturalis/nbaR/blob/master/vignettes/sharks.Rmd#L89-L92

The comment here suggests that size is important - it could be nice to add some text explaining why this is necessary in this case, and how users can determine whether it is necessary in their own applications.

Explaining warnings

https://github.com/naturalis/nbaR/blob/master/vignettes/sharks.Rmd#L190

This command raised two warnings:

> times <- geo_age(unique(c(as.character(data$youngChronoName), as.character(data$oldChronoName))))
Warning messages:
1: In FUN(X[[i]], ...) :
  Could not retreive values for geo unit "Mioceen" from earthlifeconsortium.org
2: In FUN(X[[i]], ...) :
  Could not retreive values for geo unit "Unspecified age" from earthlifeconsortium.org

It would be nice to have some text here for users to explain what these warnings are, and whether they matter.

  6. Vignette: The oldest tomato specimen in the world

It looks like there are some undocumented code blocks at the end of this vignette. I was going to suggest adding some content to show how to map specimens. Bringing this code into the vignette with some additional text would be awesome!

Wrap up

Overall, there is a lot of great functionality in nbaR. My main suggestion would be to add wrapper functions that return data frames or other more common R objects, to facilitate the integration of nbaR with existing tools and make the user interface more intuitive. The vignettes are awesome, and with some minor tweaks, might be more accessible for users who are looking for ways to solve their particular use case.

My session info:

─ Session info ────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       Ubuntu 18.04.1 LTS          
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language en_US                       
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Denver              
 date     2018-11-10                  

─ Packages ────────────────────────────────────────────────────────────────────────────────────
 package     * version date       lib source                         
 ape           5.2     2018-09-24 [1] CRAN (R 3.5.1)                 
 assertthat    0.2.0   2017-04-11 [1] CRAN (R 3.5.1)                 
 backports     1.1.2   2017-12-13 [1] CRAN (R 3.5.1)                 
 base64enc     0.1-3   2015-07-28 [1] CRAN (R 3.5.1)                 
 callr         3.0.0   2018-08-24 [1] CRAN (R 3.5.1)                 
 cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)                 
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)                 
 curl          3.2     2018-03-28 [1] CRAN (R 3.5.1)                 
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)                 
 devtools      2.0.1   2018-10-26 [1] CRAN (R 3.5.1)                 
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)                 
 fs            1.2.6   2018-08-23 [1] CRAN (R 3.5.1)                 
 glue          1.3.0   2018-07-17 [1] CRAN (R 3.5.1)                 
 httr          1.3.1   2017-08-20 [1] CRAN (R 3.5.1)                 
 jsonlite      1.5     2017-06-01 [1] CRAN (R 3.5.1)                 
 lattice       0.20-35 2017-03-25 [4] CRAN (R 3.5.0)                 
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)                 
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)                 
 nbaR        * 0.0.0   2018-11-09 [1] Github (naturalis/nbaR@2572a3e)
 nlme          3.1-137 2018-04-07 [4] CRAN (R 3.5.0)                 
 packrat       0.4.9-3 2018-06-01 [1] CRAN (R 3.5.1)                 
 pkgbuild      1.0.2   2018-10-16 [1] CRAN (R 3.5.1)                 
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.1)                 
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)                 
 processx      3.2.0   2018-08-16 [1] CRAN (R 3.5.1)                 
 ps            1.2.1   2018-11-06 [1] CRAN (R 3.5.1)                 
 R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.1)                 
 Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.1)                 
 remotes       2.0.2   2018-10-30 [1] CRAN (R 3.5.1)                 
 rlang         0.3.0.1 2018-10-25 [1] CRAN (R 3.5.1)                 
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)                 
 rstudioapi    0.8     2018-10-02 [1] CRAN (R 3.5.1)                 
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)                 
 testthat      2.0.1   2018-10-13 [1] CRAN (R 3.5.1)                 
 usethis       1.4.0   2018-08-14 [1] CRAN (R 3.5.1)                 
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)                 

[1] /home/max/R/x86_64-pc-linux-gnu-library/3.5
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

Thanks @sckott for inviting me to review!

@sckott
Contributor

sckott commented Nov 12, 2018

thanks for your review @mbjoseph !

@sckott
Contributor

sckott commented Nov 12, 2018

@hettling all reviews are now in, continue discussion here and let me know if you have any questions about the process

@hettling
Author

Thanks @DomBennett and @mbjoseph for thorough reviews of the package!

@hettling
Author

@sckott I'm starting to tackle the reviewers' comments now. Is there a deadline? I couldn't find this information on https://ropensci.github.io/dev_guide/policies.html#review-process ...

@sckott
Contributor

sckott commented Nov 13, 2018

I don't think we have a deadline. If it will be a long time, we usually assign a holding tag so we know we don't have to worry about a submission until the person has time again. I'd guess that if we don't hear anything for a few weeks, we ping whoever we're waiting on.

@sckott sckott added the holding label Feb 27, 2019
@sckott
Contributor

sckott commented Feb 27, 2019

@hettling Applying the holding tag now until this is active again

@hettling
Author

@sckott Ok, reasonable. I would like to mention that there is progress in applying the reviewers' comments, documented in this milestone. The remaining issues are mostly documentation-related.

@sckott
Contributor

sckott commented Feb 28, 2019

thanks for the update

@hettling
Author

hettling commented May 6, 2019

Hi @sckott @DomBennett @mbjoseph,
thank you for the thorough evaluation of the package.
The new version of the nbaR package
aims to address all of these issues. For a better overview, I copied your comments into github issues summarised in the milestone Ropenscience-revision.
This resulted in the following changes:

Wrapper functions
The package now features function wrappers around the low-level API access, which are easier to use. For each endpoint, there is one wrapper that can be called directly without any QuerySpec, QueryCondition, etc. objects. For complex structures, the wrappers return native R datatypes, such as data.frame (with list columns) or list (types are configurable by the user), instead of R6 objects such as Specimen or Taxon. Documentation on the wrappers can be found in the reference, and further description is provided in the Getting started vignette.

This is documented in the following issues: #18, #32, #34

Documentation

  • The main vignette has been split into smaller parts. Documentation is now spread over 5 pages: Getting started, Objects and advanced queries, services summary, sharks workflow, tomato workflow (issue #17, issue #36, issue #34).
  • Help files have been improved. Extended descriptions have been added for the main model datatypes (Specimen, Taxon, Multimedia, Geo) and also for QuerySpec and QueryCondition. These changes have to be made in the code annotations of the API itself, so not all of the classes are documented in detail yet. Since I am not the developer of the API, this has to be communicated to and done by someone else; we are working on completing more descriptions. However, many classes will never be used by the nbaR user: for example, almost no records have a LithoStratigraphy field, so in most cases the user will not come across this object (issue #24).
  • More information on operators is now given in the vignette (issue #28).
  • Typos: issue #46, issue #35.
  • Different clients are now better explained: (issue #45).
  • pkgdown issues have been resolved. In addition, client classes and model classes are now better organised on the reference page (issue #23).
  • Lines of code in the vignettes have been shortened to at most 80 characters (issue #37).
  • Empty query example at start of vignette has been removed (issue #38).
  • Size argument now better explained in shark vignette (issue #47).

Tests/Checks/Coverage

  • Unit tests failed if the NBA was down, so the unit tests now ping the NBA beforehand to check whether it is alive (issue #19).
  • Check results: R CMD check now passes without errors, warnings or notes (issue #20).
  • Spell-check was done (issue #22).
  • Goodpractice results were improved (issue #21). However, it was not possible to cut all lines of code to 80 characters, due to autogenerated code.

Miscellaneous

  • dwca download functions now suppress excessive output (issue #44).
  • Much of the duplicated code is now avoided by having rewritten the fromJSONString and fromList methods in the templates (issue #33).
  • Classes now have a print method, which lists all members (and their datatypes) and available functions in a nice way (issue #39).
  • A QueryCondition object can now also be created with a named list (issue #29).
  • Warnings for function geo_age are now more clear (issue #48).
  • Function find_by_ids can now also take a vector of ids as argument (issue #41).
  • Map of tomatoes now added to tomato vignette (issue #27).
  • Reduced the package size, among other things (issue #31).

What I did not do

  • Schematic code example for clients: since we now have a Getting started vignette using the wrapper functions, and instantiating clients is covered in the advanced parts, I believe this is no longer necessary (issue #26).
  • Returning data.frame for get_distinct_values_per_group and count_distinct_values_per_group. This is now done in the wrappers.
  • More idiomatic code for specifying query parameters: I think passing query parameters as a list is sufficient; see also issue #40.

I hope you agree with me that these changes significantly improved the usage of this package. Thanks again for taking the time to review.

Best,

Hannes

@sckott
Contributor

sckott commented May 6, 2019

thanks very much for your response @hettling

@DomBennett @mbjoseph are you happy with the changes? any further questions/suggestions?

@sckott
Contributor

sckott commented May 14, 2019

@DomBennett @mbjoseph any thoughts? If there's no response by the end of the week, I'll assume you're happy with the changes.

@mbjoseph
Member

@hettling this looks great! I like the reorganization of the documentation, and the wrapper functions provide a nice solution for common use cases, without sacrificing access to the functionality of the API. I approve. 👍

@sckott
Contributor

sckott commented May 14, 2019

thanks @mbjoseph

@DomBennett

Hi @hettling

Looks good Hannes! That's an amazingly thorough response to every point raised in the reviews.

I re-ran goodpractice and got "NOTE: installed size is 5.9Mb; sub-directories of 1Mb or more: R 2.0Mb, doc 3.3Mb". This confuses me a little, as the total repo size is less than 3.9 MB. Are the vignettes particularly large upon installation? This will only be a real problem for you if you try to put the package on CRAN, but it does also slow down installation. One way I've cut package sizes down in the past is to exclude vignettes from the build.

@DomBennett

Also... I just noticed you have this kind of code at the beginning of a couple of your tests:

wd <- getwd()
if (grepl("testthat", wd)) {
  data_dir <- file.path("data")
} else {
  ## for running test at package level
  data_dir <- file.path("tests", "testthat", "data")
}

tc <- TaxonClient$new(basePath = "http://api.biodiversitydata.nl/v2")
if (!tc$ping()) {
  skip("NBA not available, skipping test")
}

I recently learned about testthat's setup and teardown functionality. For shared variables like data_dir, you could add a setup-vars.R script to your tests/testthat/ directory containing the above code. This script would then be run before the test scripts, and the variables it defines would be available to all of them, avoiding duplicate code.
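A minimal sketch of what such a shared file could look like, based on the code above. The file name helper-nba.R and the function skip_if_nba_down are hypothetical; testthat sources helper-*.R files before running the tests, and a setup-*.R file as suggested above is also run beforehand. Wrapping the ping in a function lets each test call it instead of repeating the block at the top of every file.

```r
# tests/testthat/helper-nba.R (hypothetical file name; a setup-vars.R file
# as suggested above would hold the same code).

# Location of the test data, whether the tests are run from the package root
# or from within tests/testthat.
data_dir <- if (grepl("testthat", getwd())) {
  file.path("data")
} else {
  file.path("tests", "testthat", "data")
}

# Each test calls this instead of repeating the ping block in every file.
# TaxonClient and its ping() method come from nbaR, as in the code above.
skip_if_nba_down <- function(base_path = "http://api.biodiversitydata.nl/v2") {
  tc <- TaxonClient$new(basePath = base_path)
  if (!isTRUE(tc$ping())) {
    testthat::skip("NBA not available, skipping test")
  }
}

# Usage inside a test file:
# test_that("taxon queries work", {
#   skip_if_nba_down()
#   # ... assertions using data_dir ...
# })
```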

@sckott
Contributor

sckott commented May 15, 2019

thanks for your feedback @DomBennett

Also, for package size: if you have any datasets in the package, you can resave them with different compression options to see which gives the smallest files.
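A minimal sketch of such a comparison in base R: save the same object with each compression method that save() supports and compare the resulting file sizes. compare_compression is a hypothetical helper and mtcars is only a placeholder dataset. For .rda files already in data/, tools::checkRdaFiles() reports their current compression and tools::resaveRdaFiles() can rewrite them.

```r
# Save one object with each compression method supported by save() and
# return the resulting file sizes in bytes, smallest first.
compare_compression <- function(object) {
  methods <- c("gzip", "bzip2", "xz")
  sizes <- vapply(methods, function(m) {
    path <- tempfile(fileext = ".rda")
    on.exit(unlink(path))
    save(object, file = path, compress = m)
    file.size(path)
  }, numeric(1))
  sort(sizes)
}

compare_compression(mtcars)  # mtcars is only a placeholder dataset
```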

hettling added a commit to ropensci-archive/nbaR that referenced this issue May 16, 2019
@hettling
Author

@mbjoseph Thanks!

@hettling
Author

Hi @DomBennett ,

Strangely I do not get the note from goodpractice about package size. I run it as follows:

gp = goodpractice::gp()

and then if I look for the checks that contain 'size':

goodpractice::results(gp)[c(164, 165, 211),]

  check result
164            rcmdcheck_pdf_file_sizes   TRUE
165         rcmdcheck_pdf_file_sizes_gs   TRUE
211 rcmdcheck_reasonable_installed_size   TRUE

If I check the size of the package in my library location, I get 4.0K. During the revisions I excluded the shark and tomato examples from the vignettes, so now there are only three relatively small vignettes, without any figures. Am I missing something in calling goodpractice (maybe vignettes are not built etc.)?
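One way to cross-check the installed size from within R, as a sketch: sum the sizes of all files under the installed package directory, overall and per top-level subdirectory (similar to how the R CMD check NOTE lists large sub-directories). installed_size and installed_size_by_dir are hypothetical helpers; the stats package is used only because it is always installed. Note that 4.0K is the typical size of a bare directory entry on many Linux filesystems, so a tool that reports the directory itself rather than its contents may show that figure.

```r
# Total size in bytes of all files in an installed package's directory.
installed_size <- function(pkg) {
  dir <- system.file(package = pkg)
  if (!nzchar(dir)) stop("Package not installed: ", pkg)
  files <- list.files(dir, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files), na.rm = TRUE)
}

# Size in bytes per top-level subdirectory (R, doc, help, ...), largest first.
installed_size_by_dir <- function(pkg) {
  dir <- system.file(package = pkg)
  if (!nzchar(dir)) stop("Package not installed: ", pkg)
  subdirs <- list.dirs(dir, recursive = FALSE)
  sizes <- vapply(subdirs, function(d) {
    files <- list.files(d, recursive = TRUE, full.names = TRUE, all.files = TRUE)
    sum(file.size(files), na.rm = TRUE)
  }, numeric(1))
  names(sizes) <- basename(subdirs)
  sort(sizes, decreasing = TRUE)
}

# 'stats' is used only because it is always installed; try "nbaR" instead.
installed_size("stats") / 1024^2  # megabytes
installed_size_by_dir("stats")
```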

@sckott
Contributor

sckott commented May 20, 2019

any comment @DomBennett ?

@DomBennett

@hettling @sckott

Sorry slow reply -- all looks good!

@sckott
Contributor

sckott commented May 21, 2019

thanks @DomBennett

@sckott
Contributor

sckott commented May 21, 2019

Approved! Thanks again for your submission @hettling ! And thanks for your reviews @DomBennett and @mbjoseph 👌

A few comments:

  • I think you want to add paper.md and tic.R to your .Rbuildignore file
  • Also consider adding vignettes/sharks.Rmd and vignettes/tomato.Rmd to your .Rbuildignore file, as R CMD check complains about those files missing a vignette engine, which I think CRAN checks would also catch
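For reference, a sketch of the corresponding .Rbuildignore entries, generated in base R. Each line in .Rbuildignore is a regular expression matched against paths relative to the package root, so literal dots are escaped and each pattern is anchored. as_buildignore is a hypothetical helper; usethis::use_build_ignore() does the same escaping and appends the entries to the file.

```r
# Paths (relative to the package root) to leave out of the built package.
ignore_files <- c("paper.md", "tic.R",
                  "vignettes/sharks.Rmd", "vignettes/tomato.Rmd")

# Each .Rbuildignore line is a regular expression, so escape literal dots
# and anchor the pattern to match the whole path.
as_buildignore <- function(files) {
  paste0("^", gsub("\\.", "\\\\.", files), "$")
}

writeLines(as_buildignore(ignore_files))
# usethis::use_build_ignore(ignore_files) adds equivalent lines to
# .Rbuildignore directly.
```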

To-dos:

  • Please transfer the package to ropensci. I've invited you to a team (you need to accept that invitation before transferring). You'll be made admin once you transfer
  • add rOpenSci footer to README
    [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)
  • add ropensci review badge to your readme [![](https://badges.ropensci.org/257_status.svg)](https://github.com/ropensci/onboarding/issues/257)
  • Change any needed links, such as those for CI badges
  • Travis CI should update to the new location automatically - you may have to update other CI systems manually
  • If you use Appveyor, manage that under your own account
  • it's a good idea to add issue and PR template files to your .github folder, egs here https://github.com/ropensci/dotgithubfiles/ - and can be added with https://ropenscilabs.github.io/rodev/reference/use_ro_github.html

We've started putting together a bookdown with our best practices and tips; this chapter starts the 3rd section, which is about guidance for after onboarding. Please tell us what could be improved. The repo is at https://github.com/ropensci/dev_guide

Are you interested in doing a blog post for our blog https://ropensci.org/blog/ ? either a short-form intro to it (https://ropensci.org/technotes/) or long-form post with more narrative about its development (https://ropensci.org/blog/). If so, we'll have our community manager @stefaniebutland get in touch with you on that

@hettling
Author

Thanks @sckott, that's great news!
I'm working on the Todo's now, and transferred the ownership, waiting to get access.

@sckott
Contributor

sckott commented May 22, 2019

you should have admin now. Do you also want Rutger and Maarten to have admin access, or just write?

@hettling
Author

Thanks! Admin rights would be good for Rutger and Maarten.

@sckott
Contributor

sckott commented May 26, 2019

they've been added

@sckott sckott closed this as completed Jun 24, 2019

4 participants