Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataPackageR submission #230

Closed
gfinak opened this issue Jun 18, 2018 · 31 comments

Comments

Projects
None yet
5 participants
@gfinak
Copy link
Member

commented Jun 18, 2018

Summary

  • What does this package do? (explain in 50 words or less):

A package to reproducibly process raw data into packaged, analysis-ready data sets.

  • Paste the full DESCRIPTION file inside a code block below:
Package: DataPackageR
Type: Package
Title: Construct Reproducible Analytic Data Sets as R Packages
Authors@R:
    c(person(given = "Greg Finak", role=c("aut","cre","cph"), email="gfinak@fredhutch.org"),
      person(given = "Paul Obrecht", role=c("ctb")))
Version: 0.13.2
Date: 2017-10-13
Description: Construct reproducible analytic data sets as R packages. 
License: MIT + file LICENSE
Depends: R (>= 3.5.0)
Imports:
    optparse,
    digest,
    knitr,
    utils,
    rmarkdown,
    desc,
    yaml,
    purrr,
    here,
    roxygen2 (>= 6.0.1),
    devtools (>= 1.12.0),
    assertthat,
    stringr,
    futile.logger,
    rprojroot,
    data.tree,
    DT
VignetteBuilder: knitr
RoxygenNote: 6.0.1
Collate: autodoc.R
    build.R
    dataversion.r
    digests.R
    load_save.R
    processData.R
    skeleton.R
    devtool_functions.R
    rmarkdown_functions.R
    roxygen2_functions.R
    mergeDocumentation.R
    parseDocumentation.R
    yamlR.R
Suggests:
    testthat,
    covr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/RGLab/DataPackageR

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
    [e.g., "data extraction, because the package parses a scientific data file format"]

reproducibility, because the package provides a framework for reproducibly processing raw data into analysis-ready data sets in R data packages.

  •   Who is the target audience and what are scientific applications of this package?  

The target audience are data analysts, data scientists and any users working with diverse, large, raw data sets that need significant preprocessing to transform them into analysis-ready data sets. This processing may be time consuming and the raw data too large to include in a package. DataPackageR simplifies the process of ensuring that this data processing is done reproducibly by ensuring vignettes are constructed that track how data is processed, ensure data set objects are documented, verifies checksums of individual objects and bumps data sets versions automatically, and decouples the data transformation from the usual build and installation process. The latter is particularly useful when raw data cannot be shared with the package or if processing such data is too time consuming to be re-run each time the package is build and installed using the usual R CMD BUILD process. The tool is useful for preparing analysis-ready data for publication with manuscripts, or sharing it for collaborative data analysis.

The drake and workflowr packages are similar, in that they allow one to build reproducible workflows. DataPackageR is different in that its aim is to provide tool to help users implement the ideas found in ropensci/rrrpkg and cboettig/template and elsewhere, using their existing code with minimal effort. That code may leverage tools like workflowr and drake, but does not have to. DataPackageR provides the infrastructure to automate building, and documentation, and tracking data provenance via automated construction of vignettes documenting the transformation of raw data sets to R data objects ready for analysis, and packaging those into R data packages that can be shared.

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

The package name uses camel case as it has been around for several years, used internally by our research group.

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Suggested reviewers

Jenny Bryan (jennybc)
Carl Boettiger (cboettig)
Ted Laderas (laderast)

@maelle

This comment has been minimized.

Copy link
Member

commented Jun 25, 2018

Thanks for your submission @gfinak! 😺 Below are the results of my editorial checks and comments. Please tackle the changes, I'll then look for and assign reviewers. 🙏

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

  • Please put codecov and Travis config files in .gitignore. Use e.g. usethis::use_build_ignore(c(".travis.yml", "codecov.yml")) (in the future when developing a package you can actually use usethis::use_travis() and usethis::use_coverage() to help with such setup steps).

  • Edit the top of the README.md that's currently weirdly formatted (remains of a Rmd header I think).

  • Given that your package interacts with the file system, please add CI for Windows too via Appveyor, cf this chapter in our packaging guide. There's an usethis::use_appveyor() function.

  • Please delete the Maintainer field in DESCRIPTION, it'll get populated automatically from Authors@R when building the package.

  • Why having the collate field in DESCRIPTION?

  • Please add a code of conduct (e.g. via usethis::usethis::use_code_of_conduct()) and contributing guidelines to your repo. We're working on a template for that, cf this chapter.

  • goodpractice output

It is good practice toomit "Date" in DESCRIPTION. It is not
    required and it gets invalid quite often. A build
    date will be added to the package when you perform
    `R CMD build` on it.
add a "URL" field to DESCRIPTION. It
    helps users find information about your package
    online. If your package does not have a homepage,
    add an URL to GitHub, or the CRAN package package
    page.
  ✖ add a "BugReports" field to DESCRIPTION,
    and point it to a bug tracker. Many online code
    hosting services provide bug trackers for free,
    https://github.com, https://gitlab.com, etc.

Just run usethis::use_github_links() 😸

use '<-' for assignment instead of '='.
    '<-' is the standard, and R users and developers
    are used it and it is easier to read your code for
    them if you use '<-'.

    R\processData.R:241:31
    R\processData.R:242:48
    tests\testthat\test-skeleton.R:274:10
    tests\testthat\test-skeleton.R:275:10
    tests\testthat\test-skeleton.R:288:10

For that I recommend using styler.

avoid long code lines, it is bad for
    readability. Also, many people prefer editor
    windows that are about 80 characters wide. Try
    make your lines shorter than 80 characters

    R\processData.R:64:1
    R\processData.R:142:1
    R\processData.R:145:1
    R\processData.R:236:1
    R\processData.R:242:1
    ... and 10 more lines

I think styler will help with that too.

avoid calling setwd(), it changes the
    global environment. If you need it, consider using
    on.exit() to restore the working directory.

    R\processData.R:53:20
    R\processData.R:62:9
    R\processData.R:77:26
    R\processData.R:88:26
    R\processData.R:97:26
    ... and 7 more lines

Well unless it's needed for the specific context of your package's use case!

avoid sapply(), it is not type safe. It
    might return a vector, or a list, depending on the
    input data. Consider using vapply() instead.

    R\digests.R:10:13
    R\digests.R:11:13
    R\load_save.R:25:5
    R\processData.R:317:17
    R\processData.R:330:13avoid 1:length(...), 1:nrow(...),
    1:ncol(...), 1:NROW(...) and 1:NCOL(...)
    expressions. They are error prone and result 1:0
    if the expression on the right hand side is zero.
    Use seq_len() or seq_along() instead.

    R\parseDocumentation.R:16:20
not import packages as a whole, as this
    can cause name clashes between the imported
    packages. Instead, import only the specific
    functions you need.

See http://r-pkgs.had.co.nz/namespace.html#imports

Ask me any question you might have!


Reviewers: @karawoo @wlandau
Due date: 2018-07-23

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jun 26, 2018

@maelle , thanks for the package review. I believe I've addressed all the issues with the latest commits.
Best,

Greg

@maelle

This comment has been minimized.

Copy link
Member

commented Jun 27, 2018

Thanks @gfinak! I see much progress indeed! Here are a few more points from goodpractice and me

  • Please replace the use of devtools unexported functions by the use of desc. In general, there's a devtools diaspora going on, so check the different small packages of the r-lib organization to see if some of the ones that are on CRAN contain functions you're using now. Maybe usethis has some of the functions you need for other things?

  • Regarding imports it'd be better to do as stated in Hadley Wickham's book " In fact, this is what I recommend: list the package in DESCRIPTION so that it’s installed, then always refer to it explicitly with pkg::fun(). ". It's a bit a pain when not doing it right away so you'll have a bit of work but it makes things clearer for reviewers (who'd see right away where a function comes from), and even you later. One case could be your wanting to replace a dependency, it'd be much easier to just search for all "dependency::" occurrences.

It is good practice towrite unit tests for all functions, and all
    package code in general. 78% of code lines are
    covered by test cases.

    R/build.R:27:NA
    R/build.R:29:NA
    R/build.R:30:NA
    R/build.R:32:NA
    R/build.R:33:NA
    ... and 250 more lines

Please increase the coverage a bit by testing


  ✖ avoid long code lines, it is bad for
    readability. Also, many people prefer editor windows
    that are about 80 characters wide. Try make your
    lines shorter than 80 characters

    R\processData.R:56:1
    R\processData.R:85:1
    R\processData.R:147:1
    R\processData.R:428:1
    R\yamlR.R:180:1
    ... and 10 more lines

Please shorten the lines unless they contain URLs or so. In which case add lintr exceptions using # nolint cf https://github.com/jimhester/lintr#project-configuration

  ✖ avoid calling setwd(), it changes the
    global environment. If you need it, consider using
    on.exit() to restore the working directory.

    R\processData.R:44:11
    R\processData.R:82:13
    R\processData.R:83:5
    tests\testthat\test-skeleton.R:38:11
    tests\testthat\test-skeleton.R:39:3

Can you confirm they're needed?

  ✖ avoid sapply(), it is not type safe. It
    might return a vector, or a list, depending on the
    input data. Consider using vapply() instead.

    R\load_save.R:27:3

Please replace the use of sapply.

  ✖ fix this R CMD check NOTE: Namespace in
    Imports field not imported from: 'optparse' All
    declared Imports should be used.

Unnecessary dependency?

fix this R CMD check WARNING: LaTeX errors
    when creating PDF version. This typically indicates
    Rd problems. LaTeX errors found: ! LaTeX Error: File
    `inconsolata.sty' not found. Type X to quit or
    <RETURN> to proceed, or enter new name. (Default
    extension: sty) ! Emergency stop. <read *> l.276 ^^M
    !  ==> Fatal error occurred, no output PDF file
    produced!

Not sure what that is, can you please investigate?

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jun 29, 2018

@maelle
I've gone through the remaining issues.
goodpractice passes: there are a few setwd() used in my tests to cover some test cases, but they are no longer in the package code.
I've addressed the NAMESPACE imports, and use desc and usethis over devtools unexported functions.
Test coverage is now at 100%.
I cannot reproduce the R CMD check WARNING about the inconsolata.sty LaTeX file when I build on my system or any of the CI builds. Could it be a local LaTeX installation issue?

@maelle

This comment has been minimized.

Copy link
Member

commented Jun 29, 2018

yes it might be since I use tinytex? Will start looking for reviewers now. Thanks for all your efforts!

@maelle

This comment has been minimized.

Copy link
Member

commented Jun 30, 2018

Reviewers now assigned! Thanks @karawoo and @wlandau for accepting to review this package. 😺 Your reviews are due on 2018-17-23.

Our brand-new and WIP gitbook includes the reviewer's guide as a chapter that you'll find here. 📖

@wlandau

This comment has been minimized.

Copy link

commented Jul 2, 2018

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 4


Review Comments

DataPackageR addresses the unresolved need for tools that properly package and version datasets. I believe it is a worthy contribution to reproducibility and collaboration. This is how I understand the appeal:

  1. Data packages have a mixed reputation because the practice does not tend to suit very large datasets. But now, we have a method to
    1. Only include the condensed/processed data in the built package, and
    2. Maintain a reproducible record of the data processing in the package source.
  2. DataPackageR has an automated mechanism to increment the version of the package when the fingerprint of any of the datasets changes.
  3. Setting up a data package is just smoother and easier. I think the role of DataPackageR is a bit like that of devtools in this space.

Relative to the potential impact, I believe the documentation is a bit modest and detail-oriented. It would be great to take a step back, explain the general concept of a data package, and sell DataPackageR on the above points. My other feedback is minor by comparison. Issues 21 through 32 detail the changes I would like to see.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jul 5, 2018

Dear @wlandau
I believe the latest changes to DataPackageR (ending at commit ropensci/DataPackageR@ee6ee6c) address all the issues you have raised. I have left them open, will let you verify that you are satisfied and can close them.
Thanks again for your feedback, it's been extremely productive.

@wlandau

This comment has been minimized.

Copy link

commented Jul 10, 2018

@gfinak I am glad this process helped. That has certainly been my experience with rOpenSci reviews. You have indeed addressed all the issues referenced in my review, including my later follow-up requests in the various threads. Pending approval from @karawoo and @maelle, I recommend the acceptance of DataPackageR into rOpenSci.

@maelle

This comment has been minimized.

Copy link
Member

commented Jul 16, 2018

@karawoo 👋 friendly reminder that your review is due on 2018-07-23 😺

@karawoo

This comment has been minimized.

Copy link

commented Jul 17, 2018

Package Review

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

##### For packages co-submitting to JOSS
~~
- [ ] The package has an obvious research application according to JOSS's definition
~~
The package contains a paper.md matching JOSS's requirements with:
~~
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 5.5


Review Comments

DataPackageR generates R package templates for processed datasets, and includes data processing code as vignettes to support reproducibility.

Packaging raw data, processing code, and tidied results into an R package is one way to provide a reproducible record of data processing. DataPackageR is also designed to provide a convenient method of distributing smaller processed data when the original files are too large. Distributing processed data without the original runs somewhat counter to the goal of reproducibility; from that perspective I'd be hesitant to use DataPackageR for larger datasets unless they were publicly accessible outside of the package.

I have unfortunately not been able to test many of the finer details of this package, because I've been unable to get some of the fundamental elements working locally (see more details in the Package functionality section below). I have not yet been able to generate a complete data package with vignettes the way the DataPackageR documentation describes, so I feel some significant work is needed here to make the package more robust.

In addition to resolving some of the issues described below, one of the biggest things I would like to see DataPackageR do is give guidance to the user on what they need to do next after using DataPackageR to set up their package. DataPackageR creates a lot of placeholder text in many different locations within the package, and for R users who have never created a package before (part of DataPackageR's target audience, according to the README) there is a lack of information about what steps come next.

Vignettes

I can't build vignettes for DataPackageR locally:

devtools::build_vignettes("~/projects/forks/DataPackageR/")
#> Building DataPackageR vignettes
#> * checking for file ‘/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2zWYal/mtcars20/DESCRIPTION’ ... OK
#> * preparing ‘mtcars20’:
#> * checking DESCRIPTION meta-information ... OK
#> * checking for LF line-endings in source and make files and shell scripts
#> * checking for empty or unneeded directories
#> * looking to see if a ‘data/datalist’ file should be added
#> * building ‘mtcars20_1.0.tar.gz’
#> 
#> * installing *source* package ‘mtcars20’ ...
#> ** R
#> ** data
#> ** inst
#> ** byte-compile and prepare package for lazy loading
#> ** help
#> *** installing help indices
#> ** building package indices
#> ** installing vignettes
#> ‘subsetCars.Rmd’ using ‘UTF-8’ 
#> ** testing if installed package can be loaded
#> * DONE (mtcars20)
#> Quitting from lines 205-215 (usingDataPackageR.Rmd) 
#> Error: processing vignette 'usingDataPackageR.Rmd' failed with diagnostics:
#> namespace is already attached

I get the same error with install_github("RGLab/DataPackageR", build_vignettes = TRUE)

Examples

There are no examples for several of the exported functions in yamlR.R.

Package functionality

I've not been able to generate vignettes very successfully in the packages generated by DataPackageR. Below is a reprex (behind <details> because of the long output) that goes through essentially the same steps described in the DataPackageR vignette, but the generated package produces no vignette that I can find. Am I missing something here, or should this work?


library("DataPackageR")

processing_code <- system.file(
"extdata", "tests", "subsetCars.Rmd", package = "DataPackageR"
)

datapackage_skeleton(
"mtcars20",
force = TRUE,
code_files = processing_code,
r_object_names = "cars_over_20",
path = "."
)
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> ✔ Creating 'R/'
#> ✔ Creating 'man/'
#> ✔ Writing 'DESCRIPTION'
#> ✔ Writing 'NAMESPACE'
#> Adding DataVersion string to DESCRIPTION
#> Creating data and data-raw directories
#> ✔ Creating 'data-raw/'
#> ✔ Creating 'data/'
#> ✔ Creating 'inst/extdata/'
#> configuring yaml file

package_build(packageName = "mtcars20", vignettes = TRUE)
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> INFO [2018-07-16 22:43:10] Logging to /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/inst/extdata/Logfiles/processing.log
#> INFO [2018-07-16 22:43:10] Processing data
#> INFO [2018-07-16 22:43:10] Reading yaml configuration
#> INFO [2018-07-16 22:43:10] Found /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/data-raw/subsetCars.Rmd
#> INFO [2018-07-16 22:43:10] Processing 1 of 1: /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/data-raw/subsetCars.Rmd
#> processing file: subsetCars.Rmd
#> output file: subsetCars.knit.md
#> /usr/local/bin/pandoc +RTS -K512m -RTS subsetCars.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/inst/extdata/Logfiles/subsetCars.html --email-obfuscation none --self-contained --standalone --section-divs --template /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable 'theme:bootstrap' --include-in-header /var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T//RtmpF2W385/rmarkdown-str6d227dd960ab.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
#>
#> Output created: mtcars20/inst/extdata/Logfiles/subsetCars.html
#> INFO [2018-07-16 22:43:10] 1 required data objects created by subsetCars.Rmd
#> INFO [2018-07-16 22:43:10] Saving to data
#> INFO [2018-07-16 22:43:10] Copied documentation to /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/R/mtcars20.R
#> ✔ Changing active project to '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> ✔ Creating 'vignettes/'
#> ✔ Creating 'inst/doc/'
#> INFO [2018-07-16 22:43:10] Done
#> INFO [2018-07-16 22:43:10] DataPackageR succeeded
#> INFO [2018-07-16 22:43:10] Building documentation
#> First time using roxygen2. Upgrading automatically...
#> Updating roxygen version in /private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20/DESCRIPTION
#> Writing NAMESPACE
#> Loading mtcars20
#> Writing mtcars20.Rd
#> Writing cars_over_20.Rd
#> INFO [2018-07-16 22:43:10] Building package
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file
#> --no-environ --no-save --no-restore --quiet CMD build
#> '/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20'
#> --no-resave-data --no-manual
#>
#> [1] "/private/var/folders/48/xj0m1tkj5kd_smyyb1t9n2lc0000gq/T/Rtmp2Arisg/mtcars20_1.0.tar.gz"
install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL)
library("mtcars20")
data("cars_over_20")
vignette(package = "mtcars20")
#> no vignettes found

devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz America/Los_Angeles
#> date 2018-07-16
#> Packages -----------------------------------------------------------------
#> package * version date
#> assertthat 0.2.0 2017-04-11
#> backports 1.1.2 2017-12-13
#> base * 3.5.0 2018-04-24
#> callr 2.0.4 2018-05-15
#> clisymbols 1.2.0 2017-05-21
#> commonmark 1.5 2018-04-28
#> compiler 3.5.0 2018-04-24
#> crayon 1.3.4 2017-09-16
#> DataPackageR * 0.14.2 2018-07-17
#> datasets * 3.5.0 2018-04-24
#> desc 1.2.0 2018-05-01
#> devtools 1.13.6 2018-06-27
#> digest 0.6.15 2018-01-28
#> evaluate 0.10.1 2017-06-24
#> formatR 1.5 2017-04-25
#> fs 1.2.3 2018-06-08
#> futile.logger 1.4.3 2016-07-10
#> futile.options 1.0.1 2018-04-20
#> glue 1.3.0 2018-07-17
#> graphics * 3.5.0 2018-04-24
#> grDevices * 3.5.0 2018-04-24
#> htmltools 0.3.6 2017-04-28
#> knitr 1.20 2018-02-20
#> lambda.r 1.2.3 2018-05-17
#> magrittr 1.5 2014-11-22
#> memoise 1.1.0 2017-04-21
#> methods * 3.5.0 2018-04-24
#> mtcars20 * 1.0
#> pkgbuild 1.0.0 2018-07-17
#> pkgload 1.0.0 2018-07-17
#> processx 3.1.0 2018-05-15
#> purrr 0.2.5 2018-05-29
#> R6 2.2.2 2017-06-17
#> Rcpp 0.12.17 2018-05-18
#> rlang 0.2.1.9000 2018-07-17
#> rmarkdown 1.10 2018-06-11
#> roxygen2 6.0.1.9000 2018-07-17
#> rprojroot 1.3-2 2018-01-03
#> stats * 3.5.0 2018-04-24
#> stringi 1.2.3 2018-06-12
#> stringr 1.3.1 2018-05-10
#> testthat 2.0.0 2017-12-13
#> tools 3.5.0 2018-04-24
#> usethis 1.3.0.9000 2018-07-17
#> utils * 3.5.0 2018-04-24
#> whisker 0.3-2 2013-04-28
#> withr 2.1.2 2018-03-15
#> xml2 1.2.0 2018-01-24
#> yaml 2.1.19 2018-05-01
#> source
#> cran (@0.2.0)
#> cran (@1.1.2)
#> local
#> CRAN (R 3.5.0)
#> cran (@1.2.0)
#> CRAN (R 3.5.0)
#> local
#> CRAN (R 3.5.0)
#> Github (RGLab/DataPackageR@806cb49)
#> local
#> cran (@1.2.0)
#> cran (@1.13.6)
#> CRAN (R 3.5.0)
#> cran (@0.10.1)
#> CRAN (R 3.5.0)
#> cran (@1.2.3)
#> cran (@1.4.3)
#> cran (@1.0.1)
#> Github (tidyverse/glue@66de125)
#> local
#> local
#> cran (@0.3.6)
#> CRAN (R 3.5.0)
#> cran (@1.2.3)
#> CRAN (R 3.5.0)
#> CRAN (R 3.5.0)
#> local
#> local
#> Github (r-lib/pkgbuild@94d2f0d)
#> Github (r-lib/pkgload@c699550)
#> cran (@3.1.0)
#> cran (@0.2.5)
#> CRAN (R 3.5.0)
#> CRAN (R 3.5.0)
#> Github (r-lib/rlang@d97e73d)
#> cran (@1.10)
#> Github (klutometis/roxygen@a8ed64b)
#> cran (@1.3-2)
#> local
#> cran (@1.2.3)
#> CRAN (R 3.5.0)
#> CRAN (R 3.5.0)
#> local
#> Github (r-lib/usethis@b58920a)
#> local
#> CRAN (R 3.5.0)
#> CRAN (R 3.5.0)
#> CRAN (R 3.5.0)
#> cran (@2.1.19)

Created on 2018-07-16 by the reprex package (v0.2.0).

I also find that when using an R script rather than an Rmd in code_files, the resulting vignette doesn't build due to an invalid YAML header. DataPackageR creates Rmd files in at least 3 locations: inst/doc/, vignettes/, and data-raw/. The versions in the first two directories have a header that look like so:

---
{}
vignette: >
  %\VignetteIndexEntry{Default Vignette Title. Add yaml title: to your document}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

The version in data-raw/ has its different YAML header at the bottom of the file (?):

---
title: "processing_script.R"
author: "Kara"
date: "Mon Jul 16 23:25:14 2018"
---

I have tried adding a title to the Rmd files in both inst/doc/ and vignettes/, as well as moving the data-raw/ version's YAML header to the top of the file, but when I call package_build() again my changes get reverted and I always get the same error:

* creating vignettes ... ERROR
Error: processing vignette 'processing_script.spin.Rmd' failed with diagnostics:
Parser error: did not find expected <document start> at line 2, column 1
Execution halted
Error: Command failed (1)

Automated tests

  • No test failures on my machine, but 16 warnings about roxygen2 requires Encoding: UTF-8.
  • I couldn't find any tests for the auto-generated vignettes.
  • I recommend having Travis check the package on the current and recent versions of R, not just devel, as these are the ones most users of the package will be on.
  • I recommend breaking up tests into multiple files, rather than one long one, for readability. I'd also recommend some more informative names -- test_that("edge cases work as expected") makes it difficult to tell what the test is actually testing.

Other comments

  • Docs for datapackage_skeleton() say: Creates a package skeleton for use with preprpocessData. What is preprocessData? I don't see any reference to it anywhere else in the package except the .gitignore.
  • Can you clarify why you're using requireNamespace() here and here?
  • This section of the README could use some explanation of when one might need to access each of these locations. I found it confusing as written.
  • Would it be possible to support generating the roxygen2 comments with markdown?
  • There's a lot of DataPackageR:: prefixing in the README, it's a minor thing but it does make it harder to read the example code, and it's not necessary since the code examples include library(DataPackageR).
  • DataPackageR can automatically increment the data version, but doesn't currently provide a mechanism to document the changes. Perhaps including a NEWS.md file and prompting users to describe changes would be helpful. Just an idea though, not something I'd need to see before recommending the package be accepted.
@maelle

This comment has been minimized.

Copy link
Member

commented Jul 17, 2018

Thanks a lot for your thorough review @karawoo! 😸

@gfinak now both reviews are in! 🙂

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jul 17, 2018

@maelle

This comment has been minimized.

Copy link
Member

commented Jul 18, 2018

Thanks for the updated @gfinak!

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2018

@karawoo
Thanks for the review, I'm starting to go through everything to address the issues you found.
So far I've addressed the error:

#> Quitting from lines 205-215 (usingDataPackageR.Rmd) 
#> Error: processing vignette 'usingDataPackageR.Rmd' failed with diagnostics:
#> namespace is already attached

with a conditional attach of the mtcars20 package namespace . The usual library() or require() call there was producing NOTEs about a dependency on mtcars20 that was not in the DESCRIPTION file.

I've added missing examples for the yaml functions.

Regarding the failure to build vignettes for packages:

install.packages("mtcars20_1.0.tar.gz", type = "source", repos = NULL)
library("mtcars20")
data("cars_over_20")
vignette(package = "mtcars20")
#> no vignettes found

I can reproduce this, but find it's an issue that is resolved by restarting the R session after installing the package.
The vignettes are indeed produced, but can only be accessed after loading the installed package in a fresh R session.

I'm addressing this by reloading the built package in the current session after a successful call to DataPackageR::package_build()

I'm working on the remaining issues and will post as I make progress.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Jul 31, 2018

A few more updates here for the sake of documentation.

  • .R scripts are now processed correctly into vignettes. Tests added.
  • When a package is built with package_build it is also installed and loaded automatically, so that vignettes and data sets are immediately accessible.
    • I may add an install argument to package_build to turn this behavior on/off.
  • UTF-8 encoding warnings have been addressed.
  • devel and release versions of R are now checked by travis. oldrel is not supported.
  • tests have been broken up into multiple files and given some more informative names.
  • docs for datapackage_skeleton are updated. preprocessData reference removed, not relevant anymore.
  • requireNamespace have been removed, they were no longer necessary.

I'm opening issues at http://github.com/RGLab/DataPackageR/issues for some of the remaining changes.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 1, 2018

@karawoo
I believe I've addressed all the issues you've raised, with the exception of:

Would it be possible to support generating the roxygen2 comments with markdown?

As I understand there is a package maxygen that supports this, but it registers some callbacks with roxygen so that it can only support markdown roxygen2 markup or the regular roxygen2 markup. Switching between the two is not possible without an R session restart.

I don't think it's something I'm going to support at the moment, but I'll keep the issue open since it would be simpler for the user to be able to document their data objects in their R scripts and Rmd files.
I'll aim to support this in the future.

@wlandau

This comment has been minimized.

Copy link

commented Aug 1, 2018

I think you can enable Markdown in roxygen2 comments just by adding Roxygen: list(markdown = TRUE) to the DESCRIPTION file.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 1, 2018

Oh, cool, thanks!

@maelle

This comment has been minimized.

Copy link
Member

commented Aug 2, 2018

@gfinak when you've solved this, please update this thread with an answer to @karawoo (summary of your changes) and I'll ask reviewers for their final ok. Thank you!

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 2, 2018

@karawoo

Here is a summary of all the changes:

  1. I've added markdown support for roxygen comments and added tests to that effect.
  2. References to preprocessData removed, this was old text no longer relevant.
  3. requireNamespace() removed, not needed.
  4. I've tried to clarify the README about when one needs to access inst/extdata or data in the source package from their processing scripts.
  5. Removed DataPackageR:: prefixing from the README.
  6. I've added a prompt for the user to enter text describing the changes to the data sets when the DataVersion string is incremented. This occurs only if interactive() mode is detected.
  7. I've addressed the roxygen2 requires Encoding: UTF-8 warnings. Added Encoding: UTF-8 to the DESCRIPTION
  8. I have some tests for autogenerated vignettes, mainly for the processing of R scripts to vignettes. The failure of vignettes to appear in the R session was a different issue. I've got 100% code coverage so they are generated.
  9. Travis now checks devel and release. oldrel is not supported.
  10. Tests are broken up into multiple files and named more informatively.
  11. R scripts are now processed properly into vignettes with a correct yaml header (when one is missin). In order to include custom yaml info, the vingettes now describe that it should be added as #' commented code to the beginning of the R script.
  12. Vignettes now appear in the R session when a package is built. This had to do with the package not being installed and reloaded properly. This is now done automatically by package_build() and can be turned off with install=FALSE.
  13. An error in the usingDataPackageR vignette was causing a build failure. This has been corrected.
  14. I've added an option for the user to point package_skeleton() to a directory of raw data as well as additional dependency files (e.g., R scripts that should not be processed by the yaml config) that will be copied into inst/extdata and data-raw respectively, to ease package setup further.
  15. I've added examples for the missing function in yaml.R
  16. I've added tests for all new features. Should have 100% coverage.

Thanks for taking the time to review, and let me know if there are any further changes or improvements.
Greg

@karawoo

This comment has been minimized.

Copy link

commented Aug 2, 2018

Thanks @gfinak! I'm on vacation at the moment but will try to look at this in more detail in a few days.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 2, 2018

No worries, no hurry!

@karawoo

This comment has been minimized.

Copy link

commented Aug 8, 2018

This is looking really great! Everything is working for me now. My only last comment is that it is difficult to run the test suite in an interactive session because of how many times it prompts me for description for the NEWS file. This might become irksome during development but as it doesn't really affect the package functionality and isn't user-facing, I've gone ahead and checked all the remaining checkboxes in my review. I recommend DataPackageR gets accepted into rOpenSci 👍

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2018

@karawoo that's great news. It's a good idea re: test suite, I'll see about having a package option to turn off interactive testing.

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

Turn on / off interactive mode for tests.
add .onLoad to set `DataPackageR_interact` option to `interactive()`.
Option set to FALSE in tests and back to default after tests complete.
Addresses the final comments in ropensci/software-review#230
@maelle

This comment has been minimized.

Copy link
Member

commented Aug 8, 2018

Approved! Thanks @gfinak for all your work and @karawoo @wlandau for your reviews! 😺

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to awknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2018

@karawoo @wlandau if it's okay with you, I would like to acknowledge you as "rev"iewers to the DataPackageR DESCRIPTION.

@karawoo

This comment has been minimized.

Copy link

commented Aug 8, 2018

Fine by me! Thanks!

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

Update README.
Add ropensci footer.
Update references.
ropensci/software-review#230

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

@wlandau

This comment has been minimized.

Copy link

commented Aug 8, 2018

That would be great, Greg. Thanks for all your hard work on the package.

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

@gfinak

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2018

@maelle I've addressed the following:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    "[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be AppVeyor Build Status.
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
@maelle

This comment has been minimized.

Copy link
Member

commented Aug 8, 2018

Perfect! I had forgotten to recommend you also add this badge at the top of the README

[![](https://badges.ropensci.org/230_status.svg)](https://github.com/ropensci/onboarding/issues/230)

Thanks!
closing the issue now, but this doesn't prevent the discussion of a blog post.

@maelle maelle closed this Aug 8, 2018

gfinak added a commit to ropensci/DataPackageR that referenced this issue Aug 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.