/ software-review Public

DataPackageR submission#230

Closed
opened this issue Jun 18, 2018 · 31 comments
Closed

DataPackageR submission #230

opened this issue Jun 18, 2018 · 31 comments
Assignees
Labels

Summary

• What does this package do? (explain in 50 words or less):

A package to reproducibly process raw data into packaged, analysis-ready data sets.

• Paste the full DESCRIPTION file inside a code block below:
Package: DataPackageR
Type: Package
Title: Construct Reproducible Analytic Data Sets as R Packages
Authors@R:
c(person(given = "Greg Finak", role=c("aut","cre","cph"), email="gfinak@fredhutch.org"),
person(given = "Paul Obrecht", role=c("ctb")))
Version: 0.13.2
Date: 2017-10-13
Description: Construct reproducible analytic data sets as R packages.
Depends: R (>= 3.5.0)
Imports:
optparse,
digest,
knitr,
utils,
rmarkdown,
desc,
yaml,
purrr,
here,
roxygen2 (>= 6.0.1),
devtools (>= 1.12.0),
assertthat,
stringr,
futile.logger,
rprojroot,
data.tree,
DT
VignetteBuilder: knitr
RoxygenNote: 6.0.1
Collate: autodoc.R
build.R
dataversion.r
digests.R
processData.R
skeleton.R
devtool_functions.R
rmarkdown_functions.R
roxygen2_functions.R
mergeDocumentation.R
parseDocumentation.R
yamlR.R
Suggests:
testthat,
covr

• URL for the package (the development repository, not a stylized html page):

https://github.com/RGLab/DataPackageR

• Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
[e.g., "data extraction, because the package parses a scientific data file format"]

reproducibility, because the package provides a framework for reproducibly processing raw data into analysis-ready data sets in R data packages.

•   Who is the target audience and what are scientific applications of this package?

The target audience are data analysts, data scientists and any users working with diverse, large, raw data sets that need significant preprocessing to transform them into analysis-ready data sets. This processing may be time consuming and the raw data too large to include in a package. DataPackageR simplifies the process of ensuring that this data processing is done reproducibly by ensuring vignettes are constructed that track how data is processed, ensure data set objects are documented, verifies checksums of individual objects and bumps data sets versions automatically, and decouples the data transformation from the usual build and installation process. The latter is particularly useful when raw data cannot be shared with the package or if processing such data is too time consuming to be re-run each time the package is build and installed using the usual R CMD BUILD process. The tool is useful for preparing analysis-ready data for publication with manuscripts, or sharing it for collaborative data analysis.

The drake and workflowr packages are similar, in that they allow one to build reproducible workflows. DataPackageR is different in that its aim is to provide tool to help users implement the ideas found in ropensci/rrrpkg and cboettig/template and elsewhere, using their existing code with minimal effort. That code may leverage tools like workflowr and drake, but does not have to. DataPackageR provides the infrastructure to automate building, and documentation, and tracking data provenance via automated construction of vignettes documenting the transformation of raw data sets to R data objects ready for analysis, and packaging those into R data packages that can be shared.

•   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

• has a CRAN and OSI accepted license.
• contains a README with instructions for installing the development version.
• includes documentation with examples for all functions.
• contains a vignette with examples of its essential functions and uses.
• has a test suite.
• has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
• I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

• Do you intend for this package to go on CRAN?
• Do you wish to automatically submit to the Journal of Open Source Software? If so:
• The package has an obvious research application according to JOSS's definition.
• The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
• The package is deposited in a long-term repository with the DOI:
• (Do not submit your package separately to JOSS)
• Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
• The package is novel and will be of interest to the broad readership of the journal.
• The manuscript describing the package is no longer than 3000 words.
• You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
• (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
• (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
• (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

• Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

• Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

The package name uses camel case as it has been around for several years, used internally by our research group.

• If this is a resubmission following rejection, please explain the change in circumstances:

• If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Suggested reviewers

Jenny Bryan (jennybc)
Carl Boettiger (cboettig)

maelle commented Jun 25, 2018 • edited

Thanks for your submission @gfinak! 😺 Below are the results of my editorial checks and comments. Please tackle the changes, I'll then look for and assign reviewers. 🙏

Editor checks:

• Fit: The package meets criteria for fit and overlap
• Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
• Repository: The repository link resolves correctly
• Archive (JOSS only, may be post-review): The repository DOI resolves correctly
• Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

• Please put codecov and Travis config files in .gitignore. Use e.g. usethis::use_build_ignore(c(".travis.yml", "codecov.yml")) (in the future when developing a package you can actually use usethis::use_travis() and usethis::use_coverage() to help with such setup steps).

• Edit the top of the README.md that's currently weirdly formatted (remains of a Rmd header I think).

• Given that your package interacts with the file system, please add CI for Windows too via Appveyor, cf this chapter in our packaging guide. There's an usethis::use_appveyor() function.

• Please delete the Maintainer field in DESCRIPTION, it'll get populated automatically from Authors@R when building the package.

• Why having the collate field in DESCRIPTION?

• Please add a code of conduct (e.g. via usethis::usethis::use_code_of_conduct()) and contributing guidelines to your repo. We're working on a template for that, cf this chapter.

• goodpractice output

It is good practice to

✖ omit "Date" in DESCRIPTION. It is not
required and it gets invalid quite often. A build
date will be added to the package when you perform
R CMD build on it.
  ✖ add a "URL" field to DESCRIPTION. It
online. If your package does not have a homepage,
add an URL to GitHub, or the CRAN package package
page.
✖ add a "BugReports" field to DESCRIPTION,
and point it to a bug tracker. Many online code
https://github.com, https://gitlab.com, etc.

Just run usethis::use_github_links() 😸

  ✖ use '<-' for assignment instead of '='.
'<-' is the standard, and R users and developers
are used it and it is easier to read your code for
them if you use '<-'.

R\processData.R:241:31
R\processData.R:242:48
tests\testthat\test-skeleton.R:274:10
tests\testthat\test-skeleton.R:275:10
tests\testthat\test-skeleton.R:288:10

For that I recommend using styler.

  ✖ avoid long code lines, it is bad for
readability. Also, many people prefer editor
windows that are about 80 characters wide. Try
make your lines shorter than 80 characters

R\processData.R:64:1
R\processData.R:142:1
R\processData.R:145:1
R\processData.R:236:1
R\processData.R:242:1
... and 10 more lines


I think styler will help with that too.

  ✖ avoid calling setwd(), it changes the
global environment. If you need it, consider using
on.exit() to restore the working directory.

R\processData.R:53:20
R\processData.R:62:9
R\processData.R:77:26
R\processData.R:88:26
R\processData.R:97:26
... and 7 more lines

Well unless it's needed for the specific context of your package's use case!

  ✖ avoid sapply(), it is not type safe. It
might return a vector, or a list, depending on the
input data. Consider using vapply() instead.

R\digests.R:10:13
R\digests.R:11:13
R\processData.R:317:17
R\processData.R:330:13

✖ avoid 1:length(...), 1:nrow(...),
1:ncol(...), 1:NROW(...) and 1:NCOL(...)
expressions. They are error prone and result 1:0
if the expression on the right hand side is zero.

R\parseDocumentation.R:16:20
  ✖ not import packages as a whole, as this
can cause name clashes between the imported
packages. Instead, import only the specific
functions you need.

Ask me any question you might have!

Reviewers: @karawoo @wlandau
Due date: 2018-07-23

gfinak commented Jun 26, 2018

 @maelle , thanks for the package review. I believe I've addressed all the issues with the latest commits. Best, Greg

maelle commented Jun 27, 2018

 Thanks @gfinak! I see much progress indeed! Here are a few more points from goodpractice and me Please replace the use of devtools unexported functions by the use of desc. In general, there's a devtools diaspora going on, so check the different small packages of the r-lib organization to see if some of the ones that are on CRAN contain functions you're using now. Maybe usethis has some of the functions you need for other things? Regarding imports it'd be better to do as stated in Hadley Wickham's book " In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). ". It's a bit a pain when not doing it right away so you'll have a bit of work but it makes things clearer for reviewers (who'd see right away where a function comes from), and even you later. One case could be your wanting to replace a dependency, it'd be much easier to just search for all "dependency::" occurrences. It is good practice to ✖ write unit tests for all functions, and all package code in general. 78% of code lines are covered by test cases. R/build.R:27:NA R/build.R:29:NA R/build.R:30:NA R/build.R:32:NA R/build.R:33:NA ... and 250 more lines Please increase the coverage a bit by testing Your error messages (and their logging?) e.g. I see such potential in https://codecov.io/gh/RGLab/DataPackageR/src/5acdc5650b4a5813b9fbc7e338556f4a7f86c18c/R/build.R The different options, e.g. the behaviour when an argument is null.  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than 80 characters R\processData.R:56:1 R\processData.R:85:1 R\processData.R:147:1 R\processData.R:428:1 R\yamlR.R:180:1 ... and 10 more lines  Please shorten the lines unless they contain URLs or so. In which case add lintr exceptions using # nolint cf https://github.com/jimhester/lintr#project-configuration  ✖ avoid calling setwd(), it changes the global environment. If you need it, consider using on.exit() to restore the working directory. R\processData.R:44:11 R\processData.R:82:13 R\processData.R:83:5 tests\testthat\test-skeleton.R:38:11 tests\testthat\test-skeleton.R:39:3  Can you confirm they're needed?  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead. R\load_save.R:27:3  Please replace the use of sapply.  ✖ fix this R CMD check NOTE: Namespace in Imports field not imported from: 'optparse' All declared Imports should be used.  Unnecessary dependency?  ✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. LaTeX errors found: ! LaTeX Error: File inconsolata.sty' not found. Type X to quit or to proceed, or enter new name. (Default extension: sty) ! Emergency stop. l.276 ^^M ! ==> Fatal error occurred, no output PDF file produced! Not sure what that is, can you please investigate?

gfinak commented Jun 29, 2018

 @maelle I've gone through the remaining issues. goodpractice passes: there are a few setwd() used in my tests to cover some test cases, but they are no longer in the package code. I've addressed the NAMESPACE imports, and use desc and usethis over devtools unexported functions. Test coverage is now at 100%. I cannot reproduce the R CMD check WARNING about the inconsolata.sty LaTeX file when I build on my system or any of the CI builds. Could it be a local LaTeX installation issue?

maelle commented Jun 29, 2018

 yes it might be since I use tinytex? Will start looking for reviewers now. Thanks for all your efforts!

maelle commented Jun 30, 2018

 Reviewers now assigned! Thanks @karawoo and @wlandau for accepting to review this package. 😺 Your reviews are due on 2018-17-23. Our brand-new and WIP gitbook includes the reviewer's guide as a chapter that you'll find here. 📖

mentioned this issue Jul 1, 2018

Package Review

• As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

• A statement of need clearly stating problems the software is designed to solve and its target audience in README
• Installation instructions: for the development version of package and any non-standard dependencies in README
• Vignette(s) demonstrating major functionality that runs successfully locally
• Function Documentation: for all exported functions in R help
• Examples for all exported functions in R Help that run successfully locally
• Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

• A short summary describing the high-level functionality of the software
• Authors: A list of authors with their affiliations
• A statement of need clearly stating problems the software is designed to solve and its target audience.
• References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

• Installation: Installation succeeds as documented.
• Functionality: Any functional claims of the software been confirmed.
• Performance: Any performance claims of the software been confirmed.
• Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
• Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

• The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4

DataPackageR addresses the unresolved need for tools that properly package and version datasets. I believe it is a worthy contribution to reproducibility and collaboration. This is how I understand the appeal:

1. Data packages have a mixed reputation because the practice does not tend to suit very large datasets. But now, we have a method to
1. Only include the condensed/processed data in the built package, and
2. Maintain a reproducible record of the data processing in the package source.
2. DataPackageR has an automated mechanism to increment the version of the package when the fingerprint of any of the datasets changes.
3. Setting up a data package is just smoother and easier. I think the role of DataPackageR is a bit like that of devtools in this space.

Relative to the potential impact, I believe the documentation is a bit modest and detail-oriented. It would be great to take a step back, explain the general concept of a data package, and sell DataPackageR on the above points. My other feedback is minor by comparison. Issues 21 through 32 detail the changes I would like to see.

• : Discuss datastorr, fst, and similar work in data packaging.
• : Consider a new assert_data_version() function.
• : Use a separate NEWS.md file.
• : Consider focusing the README on a high-level introduction, with the details left to the vignette.
• : Consider using the vignette to dive deeper into usage.
• : Include the here package in the NAMESPACE file (via roxygen2 of course).
• : Fix directory conflicts so that devtools::run_examples("DataPackageR", run = FALSE) runs smoothly.
• : In your example Rmd file that produces cars_over_20, consider explaining how DataPackageR detects and saves cars_over_20.
• : Synchronize the package help file with the README and vignette (e.g. datasets.R).
• : Make sure all exported functions have examples in the help files.
• : Consider standardizing the style of function names.
• : Consider a separate vignette on the details of working with YAML configuration files.

mentioned this issue Jul 2, 2018

gfinak commented Jul 5, 2018

 Dear @wlandau I believe the latest changes to DataPackageR (ending at commit ropensci/DataPackageR@ee6ee6c4e54) address all the issues you have raised. I have left them open, will let you verify that you are satisfied and can close them. Thanks again for your feedback, it's been extremely productive.

wlandau commented Jul 10, 2018

 @gfinak I am glad this process helped. That has certainly been my experience with rOpenSci reviews. You have indeed addressed all the issues referenced in my review, including my later follow-up requests in the various threads. Pending approval from @karawoo and @maelle, I recommend the acceptance of DataPackageR into rOpenSci.

maelle commented Jul 16, 2018

 @karawoo 👋 friendly reminder that your review is due on 2018-07-23 😺

Package Review

• As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

• A statement of need clearly stating problems the software is designed to solve and its target audience in README
• Installation instructions: for the development version of package and any non-standard dependencies in README
• Vignette(s) demonstrating major functionality that runs successfully locally
• Function Documentation: for all exported functions in R help
• Examples for all exported functions in R Help that run successfully locally
• Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

##### For packages co-submitting to JOSS
~~
- [ ] The package has an obvious research application according to JOSS's definition
~~
The package contains a paper.md matching JOSS's requirements with:
~~
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

• Installation: Installation succeeds as documented.
• Functionality: Any functional claims of the software been confirmed.
• Performance: Any performance claims of the software been confirmed.
• Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
• Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

• The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5.5

DataPackageR generates R package templates for processed datasets, and includes data processing code as vignettes to support reproducibility.

Packaging raw data, processing code, and tidied results into an R package is one way to provide a reproducible record of data processing. DataPackageR is also designed to provide a convenient method of distributing smaller processed data when the original files are too large. Distributing processed data without the original runs somewhat counter to the goal of reproducibility; from that perspective I'd be hesitant to use DataPackageR for larger datasets unless they were publicly accessible outside of the package.

I have unfortunately not been able to test many of the finer details of this package, because I've been unable to get some of the fundamental elements working locally (see more details in the Package functionality section below). I have not yet been able to generate a complete data package with vignettes the way the DataPackageR documentation describes, so I feel some significant work is needed here to make the package more robust.

In addition to resolving some of the issues described below, one of the biggest things I would like to see DataPackageR do is give guidance to the user on what they need to do next after using DataPackageR to set up their package. DataPackageR creates a lot of placeholder text in many different locations within the package, and for R users who have never created a package before (part of DataPackageR's target audience, according to the README) there is a lack of information about what steps come next.

Vignettes

I can't build vignettes for DataPackageR locally:

devtools::build_vignettes("~/projects/forks/DataPackageR/")
#> Building DataPackageR vignettes
#> * checking for file ‘/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2zWYal/mtcars20/DESCRIPTION’ ... OK
#> * preparing ‘mtcars20’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * looking to see if a ‘data/datalist’ file should be added
#> * building ‘mtcars20_1.0.tar.gz’
#>
#> * installing *source* package ‘mtcars20’ ...
#> ** R
#> ** data
#> ** inst
#> ** help
#> *** installing help indices
#> ** building package indices
#> ** installing vignettes
#> ‘subsetCars.Rmd’ using ‘UTF-8’
#> ** testing if installed package can be loaded
#> * DONE (mtcars20)
#> Quitting from lines 205-215 (usingDataPackageR.Rmd)
#> Error: processing vignette 'usingDataPackageR.Rmd' failed with diagnostics:
#> namespace is already attached

I get the same error with install_github("RGLab/DataPackageR", build_vignettes = TRUE)

Examples

There are no examples for several of the exported functions in yamlR.R.

Package functionality

I've not been able to generate vignettes very successfully in the packages generated by DataPackageR. Below is a reprex (behind <details> because of the long output) that goes through essentially the same steps described in the DataPackageR vignette, but the generated package produces no vignette that I can find. Am I missing something here, or should this work?


library("DataPackageR")
processing_code <- system.file(
"extdata", "tests", "subsetCars.Rmd", package = "DataPackageR"
)
datapackage_skeleton(
"mtcars20",
force = TRUE,
code_files = processing_code,
r_object_names = "cars_over_20",
path = "."
)
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> ✔ Creating 'R/'
#> ✔ Creating 'man/'
#> ✔ Writing 'DESCRIPTION'
#> ✔ Writing 'NAMESPACE'
#> Adding DataVersion string to DESCRIPTION
#> Creating data and data-raw directories
#> ✔ Creating 'data-raw/'
#> ✔ Creating 'data/'
#> ✔ Creating 'inst/extdata/'
#> configuring yaml file
package_build(packageName = "mtcars20", vignettes = TRUE)
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> INFO [2018-07-16 22:43:10] Logging to /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/inst/extdata/Logfiles/processing.log
#> INFO [2018-07-16 22:43:10] Processing data
#> INFO [2018-07-16 22:43:10] Reading yaml configuration
#> INFO [2018-07-16 22:43:10] Found /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/data-raw/subsetCars.Rmd
#> INFO [2018-07-16 22:43:10] Processing 1 of 1: /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/data-raw/subsetCars.Rmd
#> processing file: subsetCars.Rmd
#> output file: subsetCars.knit.md
#> /usr/local/bin/pandoc +RTS -K512m -RTS subsetCars.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/inst/extdata/Logfiles/subsetCars.html --email-obfuscation none --self-contained --standalone --section-divs --template /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:bootstrap' --include-in-header /var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T//RtmpF2W385/rmarkdown-str6d227dd960ab.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
#>
#> Output created: mtcars20/inst/extdata/Logfiles/subsetCars.html
#> INFO [2018-07-16 22:43:10] 1 required data objects created by subsetCars.Rmd
#> INFO [2018-07-16 22:43:10] Saving to data
#> INFO [2018-07-16 22:43:10] Copied documentation to /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/R/mtcars20.R
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> ✔ Creating 'vignettes/'
#> ✔ Creating 'inst/doc/'
#> INFO [2018-07-16 22:43:10] Done
#> INFO [2018-07-16 22:43:10] DataPackageR succeeded
#> INFO [2018-07-16 22:43:10] Building documentation
#> First time using roxygen2. Upgrading automatically...
#> Updating roxygen version in /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/DESCRIPTION
#> Writing NAMESPACE
#> Writing mtcars20.Rd
#> Writing cars_over_20.Rd
#> INFO [2018-07-16 22:43:10] Building package
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file
#>   --no-environ --no-save --no-restore --quiet CMD build
#>   '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#>   --no-resave-data --no-manual
#>
#> [1] "/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20_1.0.tar.gz"
install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL)
library("mtcars20")
data("cars_over_20")
vignette(package = "mtcars20")
#> no vignettes found
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value
#>  version  R version 3.5.0 (2018-04-23)
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2018-07-16
#> Packages -----------------------------------------------------------------
#>  package        * version    date
#>  assertthat       0.2.0      2017-04-11
#>  backports        1.1.2      2017-12-13
#>  base           * 3.5.0      2018-04-24
#>  callr            2.0.4      2018-05-15
#>  clisymbols       1.2.0      2017-05-21
#>  commonmark       1.5        2018-04-28
#>  compiler         3.5.0      2018-04-24
#>  crayon           1.3.4      2017-09-16
#>  DataPackageR   * 0.14.2     2018-07-17
#>  datasets       * 3.5.0      2018-04-24
#>  desc             1.2.0      2018-05-01
#>  devtools         1.13.6     2018-06-27
#>  digest           0.6.15     2018-01-28
#>  evaluate         0.10.1     2017-06-24
#>  formatR          1.5        2017-04-25
#>  fs               1.2.3      2018-06-08
#>  futile.logger    1.4.3      2016-07-10
#>  futile.options   1.0.1      2018-04-20
#>  glue             1.3.0      2018-07-17
#>  graphics       * 3.5.0      2018-04-24
#>  grDevices      * 3.5.0      2018-04-24
#>  htmltools        0.3.6      2017-04-28
#>  knitr            1.20       2018-02-20
#>  lambda.r         1.2.3      2018-05-17
#>  magrittr         1.5        2014-11-22
#>  memoise          1.1.0      2017-04-21
#>  methods        * 3.5.0      2018-04-24
#>  mtcars20       * 1.0
#>  pkgbuild         1.0.0      2018-07-17
#>  processx         3.1.0      2018-05-15
#>  purrr            0.2.5      2018-05-29
#>  R6               2.2.2      2017-06-17
#>  Rcpp             0.12.17    2018-05-18
#>  rlang            0.2.1.9000 2018-07-17
#>  rmarkdown        1.10       2018-06-11
#>  roxygen2         6.0.1.9000 2018-07-17
#>  rprojroot        1.3-2      2018-01-03
#>  stats          * 3.5.0      2018-04-24
#>  stringi          1.2.3      2018-06-12
#>  stringr          1.3.1      2018-05-10
#>  testthat         2.0.0      2017-12-13
#>  tools            3.5.0      2018-04-24
#>  usethis          1.3.0.9000 2018-07-17
#>  utils          * 3.5.0      2018-04-24
#>  whisker          0.3-2      2013-04-28
#>  withr            2.1.2      2018-03-15
#>  xml2             1.2.0      2018-01-24
#>  yaml             2.1.19     2018-05-01
#>  source
#>  cran (@0.2.0)
#>  cran (@1.1.2)
#>  local
#>  CRAN (R 3.5.0)
#>  cran (@1.2.0)
#>  CRAN (R 3.5.0)
#>  local
#>  CRAN (R 3.5.0)
#>  Github (RGLab/DataPackageR@806cb49)
#>  local
#>  cran (@1.2.0)
#>  cran (@1.13.6)
#>  CRAN (R 3.5.0)
#>  cran (@0.10.1)
#>  CRAN (R 3.5.0)
#>  cran (@1.2.3)
#>  cran (@1.4.3)
#>  cran (@1.0.1)
#>  Github (tidyverse/glue@66de125)
#>  local
#>  local
#>  cran (@0.3.6)
#>  CRAN (R 3.5.0)
#>  cran (@1.2.3)
#>  CRAN (R 3.5.0)
#>  CRAN (R 3.5.0)
#>  local
#>  local
#>  Github (r-lib/pkgbuild@94d2f0d)
#>  cran (@3.1.0)
#>  cran (@0.2.5)
#>  CRAN (R 3.5.0)
#>  CRAN (R 3.5.0)
#>  Github (r-lib/rlang@d97e73d)
#>  cran (@1.10)
#>  Github (klutometis/roxygen@a8ed64b)
#>  cran (@1.3-2)
#>  local
#>  cran (@1.2.3)
#>  CRAN (R 3.5.0)
#>  CRAN (R 3.5.0)
#>  local
#>  Github (r-lib/usethis@b58920a)
#>  local
#>  CRAN (R 3.5.0)
#>  CRAN (R 3.5.0)
#>  CRAN (R 3.5.0)
#>  cran (@2.1.19)


Created on 2018-07-16 by the reprex package (v0.2.0).

I also find that when using an R script rather than an Rmd in code_files, the resulting vignette doesn't build due to an invalid YAML header. DataPackageR creates Rmd files in at least 3 locations: inst/doc/, vignettes/, and data-raw/. The versions in the first two directories have a header that look like so:

---
{}
vignette: >
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---


The version in data-raw/ has its different YAML header at the bottom of the file (?):

---
title: "processing_script.R"
author: "Kara"
date: "Mon Jul 16 23:25:14 2018"
---


I have tried adding a title to the Rmd files in both inst/doc/ and vignettes/, as well as moving the data-raw/ version's YAML header to the top of the file, but when I call package_build() again my changes get reverted and I always get the same error:

* creating vignettes ... ERROR
Error: processing vignette 'processing_script.spin.Rmd' failed with diagnostics:
Parser error: did not find expected <document start> at line 2, column 1
Execution halted
Error: Command failed (1)


Automated tests

• No test failures on my machine, but 16 warnings about roxygen2 requires Encoding: UTF-8.
• I couldn't find any tests for the auto-generated vignettes.
• I recommend having Travis check the package on the current and recent versions of R, not just devel, as these are the ones most users of the package will be on.
• I recommend breaking up tests into multiple files, rather than one long one, for readability. I'd also recommend some more informative names -- test_that("edge cases work as expected") makes it difficult to tell what the test is actually testing.

• Docs for datapackage_skeleton() say: Creates a package skeleton for use with preprpocessData. What is preprocessData? I don't see any reference to it anywhere else in the package except the .gitignore.
• Can you clarify why you're using requireNamespace() here and here?
• This section of the README could use some explanation of when one might need to access each of these locations. I found it confusing as written.
• Would it be possible to support generating the roxygen2 comments with markdown?
• There's a lot of DataPackageR:: prefixing in the README, it's a minor thing but it does make it harder to read the example code, and it's not necessary since the code examples include library(DataPackageR).
• DataPackageR can automatically increment the data version, but doesn't currently provide a mechanism to document the changes. Perhaps including a NEWS.md file and prompting users to describe changes would be helpful. Just an idea though, not something I'd need to see before recommending the package be accepted.

maelle commented Jul 17, 2018

 Thanks a lot for your thorough review @karawoo! 😸 @gfinak now both reviews are in! 🙂

gfinak commented Jul 17, 2018

 Thanks! I'm out of town until the end of the month. I'll address the second review when I return. … On Tue, Jul 17, 2018, 01:07 Maëlle Salmon ***@***.***> wrote: Thanks a lot for your thorough review @karawoo ! 😸 @gfinak now both reviews are in! 🙂 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#230 (comment)>, or mute the thread .

maelle commented Jul 18, 2018

 Thanks for the updated @gfinak!

gfinak commented Jul 30, 2018

 @karawoo Thanks for the review, I'm starting to go through everything to address the issues you found. So far I've addressed the error: #> Quitting from lines 205-215 (usingDataPackageR.Rmd) #> Error: processing vignette 'usingDataPackageR.Rmd' failed with diagnostics: #> namespace is already attached  with a conditional attach of the mtcars20 package namespace . The usual library() or require() call there was producing NOTEs about a dependency on mtcars20 that was not in the DESCRIPTION file. I've added missing examples for the yaml functions. Regarding the failure to build vignettes for packages: install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL) library("mtcars20") data("cars_over_20") vignette(package = "mtcars20") #> no vignettes found  I can reproduce this, but find it's an issue that is resolved by restarting the R session after installing the package. The vignettes are indeed produced, but can only be accessed after loading the installed package in a fresh R session. I'm addressing this by reloading the built package in the current session after a successful call to DataPackageR::package_build() I'm working on the remaining issues and will post as I make progress.

gfinak commented Jul 31, 2018

 A few more updates here for the sake of documentation. .R scripts are now processed correctly into vignettes. Tests added. When a package is built with package_build it is also installed and loaded automatically, so that vignettes and data sets are immediately accessible. I may add an install argument to package_build to turn this behavior on/off. UTF-8 encoding warnings have been addressed. devel and release versions of R are now checked by travis. oldrel is not supported. tests have been broken up into multiple files and given some more informative names. docs for datapackage_skeleton are updated. preprocessData reference removed, not relevant anymore. requireNamespace have been removed, they were no longer necessary. I'm opening issues at http://github.com/RGLab/DataPackageR/issues for some of the remaining changes.

This was referenced Aug 1, 2018
This was referenced Aug 1, 2018

gfinak commented Aug 1, 2018

 @karawoo I believe I've addressed all the issues you've raised, with the exception of: Would it be possible to support generating the roxygen2 comments with markdown? As I understand there is a package maxygen that supports this, but it registers some callbacks with roxygen so that it can only support markdown roxygen2 markup or the regular roxygen2 markup. Switching between the two is not possible without an R session restart. I don't think it's something I'm going to support at the moment, but I'll keep the issue open since it would be simpler for the user to be able to document their data objects in their R scripts and Rmd files. I'll aim to support this in the future.

wlandau commented Aug 1, 2018

 I think you can enable Markdown in roxygen2 comments just by adding Roxygen: list(markdown = TRUE) to the DESCRIPTION file.

gfinak commented Aug 1, 2018

 Oh, cool, thanks!

maelle commented Aug 2, 2018

 @gfinak when you've solved this, please update this thread with an answer to @karawoo (summary of your changes) and I'll ask reviewers for their final ok. Thank you!

gfinak commented Aug 2, 2018

 @karawoo Here is a summary of all the changes: I've added markdown support for roxygen comments and added tests to that effect. References to preprocessData removed, this was old text no longer relevant. requireNamespace() removed, not needed. I've tried to clarify the README about when one needs to access inst/extdata or data in the source package from their processing scripts. Removed DataPackageR:: prefixing from the README. I've added a prompt for the user to enter text describing the changes to the data sets when the DataVersion string is incremented. This occurs only if interactive() mode is detected. I've addressed the roxygen2 requires Encoding: UTF-8 warnings. Added Encoding: UTF-8 to the DESCRIPTION I have some tests for autogenerated vignettes, mainly for the processing of R scripts to vignettes. The failure of vignettes to appear in the R session was a different issue. I've got 100% code coverage so they are generated. Travis now checks devel and release. oldrel is not supported. Tests are broken up into multiple files and named more informatively. R scripts are now processed properly into vignettes with a correct yaml header (when one is missin). In order to include custom yaml info, the vingettes now describe that it should be added as #'  commented code to the beginning of the R script. Vignettes now appear in the R session when a package is built. This had to do with the package not being installed and reloaded properly. This is now done automatically by package_build() and can be turned off with install=FALSE. An error in the usingDataPackageR vignette was causing a build failure. This has been corrected. I've added an option for the user to point package_skeleton() to a directory of raw data as well as additional dependency files (e.g., R scripts that should not be processed by the yaml config) that will be copied into inst/extdata and data-raw respectively, to ease package setup further. I've added examples for the missing function in yaml.R I've added tests for all new features. Should have 100% coverage. Thanks for taking the time to review, and let me know if there are any further changes or improvements. Greg

karawoo commented Aug 2, 2018

 Thanks @gfinak! I'm on vacation at the moment but will try to look at this in more detail in a few days.

gfinak commented Aug 2, 2018

 No worries, no hurry!

karawoo commented Aug 8, 2018

 This is looking really great! Everything is working for me now. My only last comment is that it is difficult to run the test suite in an interactive session because of how many times it prompts me for description for the NEWS file. This might become irksome during development but as it doesn't really affect the package functionality and isn't user-facing, I've gone ahead and checked all the remaining checkboxes in my review. I recommend DataPackageR gets accepted into rOpenSci 👍

gfinak commented Aug 8, 2018

 @karawoo that's great news. It's a good idea re: test suite, I'll see about having a package option to turn off interactive testing.

added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Turn on / off interactive mode for tests. 
 617ad22 
add .onLoad to set DataPackageR_interact option to interactive().
Option set to FALSE in tests and back to default after tests complete.
Addresses the final comments in ropensci/software-review#230

maelle commented Aug 8, 2018

 Approved! Thanks @gfinak for all your work and @karawoo @wlandau for your reviews! 😺 To-dos: Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do. Add the rOpenSci footer to the bottom of your README " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)" Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package. Should you want to awknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here. Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing. We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

gfinak commented Aug 8, 2018

 @karawoo @wlandau if it's okay with you, I would like to acknowledge you as "rev"iewers to the DataPackageR DESCRIPTION.

karawoo commented Aug 8, 2018 • edited

 Fine by me! Thanks!

added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Add Kara Woo as reviewer. 
 a227a34 
ropensci/software-review#230
added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Update README. 
 19d26ad 
Add ropensci footer.
Update references.
ropensci/software-review#230
added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Update badges to point to ropensci accounts on travis and codecov and… 
 5297ba7 
… appveyor according to instructions.

ropensci/software-review#230
added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Add repostatus badge. 
 3fdfc58 
Run codemetar
ropensci/software-review#230
added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Fix appveyor badge 
 ab1fbe2 
ropensci/software-review#230

wlandau commented Aug 8, 2018

 That would be great, Greg. Thanks for all your hard work on the package.

added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Acknowledge @wlandau as reviewer ropensci/software-review#230 
 adcc118 

gfinak commented Aug 8, 2018

 @maelle I've addressed the following: Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do. Add the rOpenSci footer to the bottom of your README "[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)" Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be . We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

maelle commented Aug 8, 2018 • edited

 Perfect! I had forgotten to recommend you also add this badge at the top of the README [![](https://badges.ropensci.org/230_status.svg)](https://github.com/ropensci/onboarding/issues/230)  Thanks! closing the issue now, but this doesn't prevent the discussion of a blog post.

closed this as completed Aug 8, 2018
added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018
 Add peer review badge to README. 
 d2551ae 
ropensci/software-review#230`