Summary
- What does this package do? (explain in 50 words or less):
A package to reproducibly process raw data into packaged, analysis-ready data sets.
- Paste the full DESCRIPTION file inside a code block below:
Package: DataPackageR
Type: Package
Title: Construct Reproducible Analytic Data Sets as R Packages
Authors@R:
    c(person(given = "Greg Finak", role=c("aut","cre","cph"), email="gfinak@fredhutch.org"),
    person(given = "Paul Obrecht", role=c("ctb")))
Version: 0.13.2
Date: 2017-10-13
Description: Construct reproducible analytic data sets as R packages.
License: MIT + file LICENSE
Depends: R (>= 3.5.0)
Imports:
    optparse,
    digest,
    knitr,
    utils,
    rmarkdown,
    desc,
    yaml,
    purrr,
    here,
    roxygen2 (>= 6.0.1),
    devtools (>= 1.12.0),
    assertthat,
    stringr,
    futile.logger,
    rprojroot,
    data.tree,
    DT
VignetteBuilder: knitr
RoxygenNote: 6.0.1
Collate: autodoc.R
    build.R
    dataversion.r
    digests.R
    load_save.R
    processData.R
    skeleton.R
    devtool_functions.R
    rmarkdown_functions.R
    roxygen2_functions.R
    mergeDocumentation.R
    parseDocumentation.R
    yamlR.R
Suggests:
    testthat,
    covr
- URL for the package (the development repository, not a stylized html page):
https://github.com/RGLab/DataPackageR
- Please indicate which category or categories from our package fit policies this package falls under *and why*? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
reproducibility, because the package provides a framework for reproducibly processing raw data into analysis-ready data sets in R data packages.
- Who is the target audience and what are scientific applications of this package?
The target audience is data analysts, data scientists, and any users working with diverse, large, raw data sets that need significant preprocessing to transform them into analysis-ready data sets. This processing may be time consuming and the raw data too large to include in a package. DataPackageR simplifies reproducible data processing: it ensures that vignettes are constructed that track how data is processed, ensures that data set objects are documented, verifies checksums of individual objects, bumps data set versions automatically, and decouples the data transformation from the usual build and installation process. The decoupling is particularly useful when raw data cannot be shared with the package or when processing is too time consuming to be re-run each time the package is built and installed using the usual R CMD build process. The tool is useful for preparing analysis-ready data for publication with manuscripts, or for sharing it for collaborative data analysis.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
The drake and workflowr packages are similar in that they allow one to build reproducible workflows. DataPackageR differs in that it aims to provide a tool that helps users implement the ideas found in ropensci/rrrpkg, cboettig/template, and elsewhere, using their existing code with minimal effort. That code may leverage tools like workflowr and drake, but does not have to. DataPackageR provides the infrastructure to automate building, documentation, and data provenance tracking: it automatically constructs vignettes documenting the transformation of raw data sets into analysis-ready R data objects, and packages those objects into R data packages that can be shared.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
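Detail
- Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
- Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
The package name uses camel case because it has been around for several years and is used internally by our research group.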
- If this is a resubmission following rejection, please explain the change in circumstances:
- If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Suggested reviewers
Jenny Bryan (jennybc)
Carl Boettiger (cboettig)
Ted Laderas (laderast)