New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdpla package #71

Closed
sckott opened this Issue Aug 16, 2016 · 23 comments

Comments

Projects
None yet
4 participants
@sckott
Copy link
Member

sckott commented Aug 16, 2016

Summary

  • What does this package do? (explain in 50 words or less):

Client for the Digital Public Library of America (DPLA), which holds metadata from lots of museums, focusing on the US.

  • Paste the full DESCRIPTION file inside a code block below:
Package: rdpla
Type: Package
Title: Client for the Digital Public Library of America ('DPLA')
Description: Interact with the Digital Public Library of America
    ('DPLA') 'REST' 'API' from R, including search and more.
Version: 0.0.7.9300
Authors@R: c(person("Scott", "Chamberlain", role = c("aut", "cre"),
    email = "myrmecocystus@gmail.com"))
License: MIT + file LICENSE
URL: https://github.com/ropensci/rdpla
BugReports: https://github.com/ropensci/rdpla/issues
LazyData: yes
VignetteBuilder: knitr
Imports:
    httr (>= 1.2.0),
    jsonlite (>= 1.0),
    tibble,
    data.table
Suggests:
    roxygen2 (>= 5.0.1),
    knitr,
    rmarkdown,
    testthat,
    ggplot2,
    scales
RoxygenNote: 5.0.1
  • URL for the package (the development repository, not a stylized html page):

https://github.com/ropensci/rdpla

  • Who is the target audience?

Academics that can use the data DPLA has for their research on a variety of topics (e..g, a certain type of photograph through time, or works by a certain artist, or a spatial analysis of works), or collection managers that submit metadata to DPLA so they can analyze data on their own collections.

  • Are there other R packages that accomplish the same thing? If so, what is different about yours?

No

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration with Travis CI and/or another service.

Publication options

  • Do you intend for this package to go on CRAN? YES
  • Do you wish to automatically submit to the Journal of Open Source Software? If so: no
    • The package contains a paper.md with a high-level description.
    • The package is deposited in a long-term repository with the DOI:

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

Note that Travis build is failing right now, but b/c of some key server error - not fault of the pkg pretty sure, so should be working again soon. Passes check fine locally.

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
  • If this is a resubmission following rejection, please explain the change in circumstances:
  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Aug 26, 2016

Editor checks:

  • Fit: The package meets or criteria fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Will be seeking reviewers, but in the meantime:

There was a typo in the README in the appropriate API key variable name that precluded local testing. I sent a PR: ropensci/rdpla#19

Automated review tools gave these results (I recommend both authors and reviewers examine these more closely):

goodpractice::gp():

  ✖ write unit tests for all functions, and all package code in general. At
    least some lines are not covered in this package.

    R/collections.r:34:NA
    R/collections.r:35:NA
    R/collections.r:36:NA
    R/collections.r:37:NA
    R/collections.r:38:NA
    ... and 195 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer
    editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    inst/ignore/dpla_items_old.r:76:1
    inst/ignore/dpla_items_old.r:76:1
    inst/ignore/dpla_items_old.r:76:1
    inst/ignore/dpla_items_old.r:76:1
    inst/ignore/dpla_items_old.r:91:1
    ... and 15 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list,
    depending on the input data. Consider using vapply() instead.

    inst/ignore/dpla_items_old.r:87:21
    inst/ignore/dpla_plot.r:45:29
    R/collections.r:46:12
    R/items.r:220:11
    R/items.r:326:11

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...)
    expressions. They are error prone and result 1:0 if the expression on the
    right hand side is zero. Use seq_len() or seq_along() instead.

    inst/ignore/dpla_plot.r:27:21

While test are running locally, covr::package_coverage() yields 0% test coverage. I suggest you look into why.


Reviewers: @jsonbecker @fmichonneau
Due date: 2016-09-19

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Sep 14, 2016

Hello @jsonbecker and @fmichonneau, just a friendly reminder that reviews are due next Monday, the 2016-09-19.

@jsonbecker

This comment has been minimized.

Copy link

jsonbecker commented Sep 15, 2016

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally

I think it would be helpful here to describe a bit more about the API. For example, the main way the data are accessed appears to be through dpla_items that are items in dpla_collections. I think discussing this up front, their contents, and how they interact would provide useful information to new users in addition to the solid examples provided.

  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

No Maintainer in DESCRIPTION.

Installation:

  • Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
    • Using the example for dpla_items, the following warning is issued:
Warning message:
In if (class(output2) %in% "factor") { :
  the condition has length > 1 and only the first element will be used

I think this is due to the code here. This check is being done to output2 which has more than one class due to the use of dplyr and tibble, e.g. class(output2) results in "tbl_df" "tbl" "data.frame". To reproduce, use dpla_items(q="fruit", fields=c("id","publisher","format"), ...).

In order to address this, you can do if (any(class(output2) %in% 'factor').

  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.

I do think there can be some improvement here. Not all cases are covered for an API with a lot of options. One thing that could be tested without a skip_on_cran is that the API requests are properly formed without actually submitting them. This way you can try one call with all parameters and see that the GET request is properly formatted as expected. I don't have particularly great advise on how to run this test authentically from the dpla_items or dpla_collections calls, but I am relatively certain that I've seen this style of test in other packages. Maybe there's a return condition that can be applied only if running a test.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Sep 16, 2016

Thanks @jsonbecker!

@fmichonneau

This comment has been minimized.

Copy link
Contributor

fmichonneau commented Sep 18, 2016

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General comments

This is a very nice and concise package that does what it says it does. Not
knowing the existence of this resource before reviewing this package (and
probably also because it's not my field of expertise), I wasn't sure how this
package could be used until I saw the use cases described in #3. Maybe
implementing the church visualization example would have helped me earlier to
understand how the package could be used.

Tests and checks pass on my system. goodpractice detected the same things as
Noam noted.

> session_info("rdpla")
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.3.1 (2016-06-21)
 system   x86_64, linux-gnu
 ui       X11
 language en_US:en
 collate  en_US.UTF-8
 tz       posixrules
 date     2016-09-18

Packages ----------------------------------------------------------------------
 package    * version    date       source
 assertthat   0.1        2013-12-06 CRAN (R 3.3.0)
 chron        2.3-47     2015-06-24 CRAN (R 3.3.0)
 curl         1.2        2016-08-13 CRAN (R 3.3.1)
 data.table   1.9.6      2015-09-19 CRAN (R 3.3.0)
 httr         1.2.1      2016-07-03 CRAN (R 3.3.1)
 jsonlite     1.1        2016-09-14 CRAN (R 3.3.1)
 lazyeval     0.2.0      2016-06-12 CRAN (R 3.3.0)
 mime         0.5        2016-07-07 CRAN (R 3.3.1)
 openssl      0.9.4      2016-05-25 CRAN (R 3.3.0)
 R6           2.1.3      2016-08-19 CRAN (R 3.3.1)
 Rcpp         0.12.7     2016-09-05 CRAN (R 3.3.1)
 rdpla      * 0.0.7.9500 2016-09-18 local (fmichonneau/rdpla@NA)
 tibble       1.2        2016-08-26 CRAN (R 3.3.1)

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed
    to solve and its target audience in README

    Not knowing about this resource, I found that the README (and vignette) could
    include a little more information about the kind of information and documents
    the library holds, and information about what constitute a "collection" or an
    "item".

  • Installation instructions: for the development version of package and
    any non-standard dependencies in README

    No problem with the installation of the package, and getting the API
    key. If the vignette includes the installation instructions, it might be worth
    also adding how to obtain an API key there.

  • Vignette(s) demonstrating major functionality that runs successfully
    locally

    The vignette illustrates many of the functionalities of the package. Several
    examples were producing warnings. I think that the pull request ropensci/rdpla#20 should fix
    them. Unless I missed something, it might also be worth noting that the
    examples and the vignette can only run locally if the API key is included in
    the .Rprofile. Checks on Travis-CI seem to currently be failing because of
    that (at least for the PR I submitted).

  • Function Documentation: for all exported functions in R help

    There are help files for the 3 exported functions.

    • For dpla_items(), under the Value header, only the contents of meta
      and data are listed, and not facets.
    • For dpla_collections(), in the list of arguments, for the parameter
      sort_by, a sentence is incomplete ("is all fields."), and indicates to see
      the "Details" section that is not present.
    • For get_key(), you may want to remove your email address (if it's a real
      one) to avoid getting a notification every time someone run
      `example(get_key)

    There are also help pages for country_codes and language_codes but neither
    exist as functions (i.e., defined as NULL in rdpla-package.R file).

  • Examples for all exported functions in R Help that run successfully locally

  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and URL, Maintainer and BugReports fields in DESCRIPTION

Functionality

  • Installation: Installation succeeds as documented.

  • Functionality: Any functional claims of the software been confirmed.

  • Performance: Any performance claims of the software been confirmed.

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

  • Packaging guidelines: The package conforms to the rOpenSci packaging
    guidelines

    You may want to consider adding a NEWS file and integration for code coverage with covr.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2.5 hours (+ 1.5 post-review)

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Sep 18, 2016

Thanks for the review, @fmichonneau!

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Sep 20, 2016

thanks for the reviews @jsonbecker and @fmichonneau will get to this asap

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Sep 29, 2016

response to @jsonbecker review

I think it would be helpful here to describe a bit more about the API. For example, the main way the data are accessed appears to be through dpla_items that are items in dpla_collections. I think discussing this up front, their contents, and how they interact would provide useful information to new users in addition to the solid examples provided.

Added more discussion of items vs. collections in the readme and vignette.

No Maintainer in DESCRIPTION.

The Authors@R syntax is in place of that.

Warning message in if (class(output2) %in% "factor") { : ...

this was fixed by a PR ropensci/rdpla#20 - thanks @fmichonneau !

Tests: I do think there can be some improvement here. Not all cases are covered for an API with a lot of options. One thing that could be tested without a skip_on_cran is that the API requests are properly formed without actually submitting them. This way you can try one call with all parameters and see that the GET request is properly formatted as expected. I don't have particularly great advise on how to run this test authentically from the dpla_items or dpla_collections calls, but I am relatively certain that I've seen this style of test in other packages. Maybe there's a return condition that can be applied only if running a test.

I've re-worked the two functions to separate them into 2 parts for each of the two main exported functions:

  • an unexported internal fxn that does query preparation and http request
  • the exported fxn does further parsing and cleanup

And added tests for the internal fxns. Which means that HTTP requests are being tested twice, but only the exported fxn tests additionally test the parsing to data.frame, etc.


response to @fmichonneau review:

I wasn't sure how this package could be used until I saw the use cases described in #3

good point, i'll try to implement that as a vignette

Not knowing about this resource, I found that the README (and vignette) could
include a little more information about the kind of information and documents
the library holds, and information about what constitute a "collection" or an
"item".

Thanks. I added more info on what DPLA has, and diff. btw collections and items

If the vignette includes the installation instructions, it might be worth
also adding how to obtain an API key there.

done, thanks!

it might also be worth noting that the examples and the vignette can only run locally if the API key is included in the .Rprofile.

added note to readme, thanks

For dpla_items(), under the Value header, only the contents of meta and data are listed, and not facets.

fixed, thanks!

For dpla_collections(), in the list of arguments, for the parameter sort_by, a sentence is incomplete ("is all fields."), and indicates to see the "Details" section that is not present.

fixed, thanks!

For get_key(), you may want to remove your email address (if it's a real one) to avoid getting a notification every time someone run `example(get_key)

done, thanks!

There are also help pages for country_codes and language_codes but neither
exist as functions (i.e., defined as NULL in rdpla-package.R file).

country_codes and language_codes are datasets (another is dpla_fields), so the documentation has NULL at bottom. I thought this was standard for documenting datasets. Is something else better?

You may want to consider adding a NEWS file and integration for code coverage with covr.

thanks, adding NEWS.md file. covr added

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Sep 29, 2016

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Sep 30, 2016

Thanks @sckott! @jsonbecker and @fmichonneau, please review changes and let us know if they address your comments. If they do, let us know and check off any remaining boxes in your review. Let us know of any outstanding issues.

@jsonbecker

This comment has been minimized.

Copy link

jsonbecker commented Oct 2, 2016

These improvements look good to me. 👍

@fmichonneau

This comment has been minimized.

Copy link
Contributor

fmichonneau commented Oct 7, 2016

Thanks @sckott! It all looks good to me. I'd just suggest you add a little bit of documentation for the datasets. I think it's best when there is at least some info about them. You can check the documentation for iris or diamonds for some ideas. There are some details in Hadley's book: http://r-pkgs.had.co.nz/data.html

You may also want to update the README to reflect that the vignette demonstrating the use case it available.

Nice work!

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 7, 2016

thanks @jsonbecker and @fmichonneau for your reviews! noted about documenting the datasets and the vignette, will do

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 7, 2016

okay, updated docs for datasets and mentioned the vignettes., will link to them once it's on cran

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 10, 2016

anything else I should do @noamross ?

@fmichonneau

This comment has been minimized.

Copy link
Contributor

fmichonneau commented Oct 10, 2016

everything looks good to me

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Oct 10, 2016

Thanks, all. @sckott, two small things from a last check:

  • From goodpractice::gp() I still see several in-function sapply() calls, and a 1:nrow()
  • Nice new vignette format! Question - will these be suitable for CRAN, or will we also need html_vignette()-formatted versions?
@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 11, 2016

  • will fix
  • I'm pretty sure that vignette format is fine, just that it's a bit larger file size than the vignette specific format

sckott added a commit to ropensci/rdpla that referenced this issue Oct 11, 2016

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 11, 2016

@noamross the sapply usage has been changed. the 1:nrow is in an old file in inst/ignore, so is not used in the package.

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Oct 13, 2016

Looks good, approved!

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 13, 2016

sweet, thanks @jsonbecker and @fmichonneau for reviews, and @noamross for handling.

@sckott

This comment has been minimized.

Copy link
Member Author

sckott commented Oct 24, 2016

@noamross should we close now?

@noamross

This comment has been minimized.

Copy link
Collaborator

noamross commented Oct 25, 2016

👍

@noamross noamross closed this Oct 25, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment