Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drake (R package) #156

Closed
wlandau-lilly opened this issue Nov 9, 2017 · 104 comments

Comments

@wlandau-lilly
Copy link

commented Nov 9, 2017

Summary

The drake package is an R-focused pipeline toolkit. It reproducibly brings results up to date and automatically arranges computations into successive parallelizable stages. It has a Tidyverse-friendly front-end, powerful interactive visuals, and a vast arsenal of multicore and distributed computing backends.

Package: drake
Title: Data Frames in R for Make
Version: 4.4.1.9000
Authors@R: c(
  person(
    family = "Landau",
    given = c("William", "Michael"),
    email = "will.landau@lilly.com",
    role = c("aut", "cre")),
  person(
    family = "Axthelm",
    given = "Alex",
    email = "aaxthelm@che.IN.gov",
    role = "ctb"),
  person(
    family = "Clarkberg",
    given = "Jasper",
    email = "jasper@clarkberg.org",
    role = "ctb"),
  person(
    family = "Eli Lilly and Company",
    role = "cph"))
Description: A solution for reproducible code and 
  high-performance computing.
License: GPL-3
Depends:
  R (>= 3.2.0)
Imports:
  codetools,
  crayon,
  eply,
  evaluate,
  digest,
  formatR,
  future,
  grDevices,
  igraph,
  knitr,
  lubridate,
  magrittr,
  parallel,
  plyr,
  R.utils,
  rprojroot,
  stats,
  storr (>= 1.1.0),
  stringi,
  stringr,
  testthat,
  utils,
  visNetwork,
  withr
Suggests: 
  abind,
  DBI,
  future.batchtools,
  MASS,
  methods,
  RSQLite,
  rmarkdown,
  tibble
VignetteBuilder: knitr
URL: https://github.com/wlandau-lilly/drake
BugReports: https://github.com/wlandau-lilly/drake/issues
RoxygenNote: 6.0.1
  • URL: https://github.com/wlandau-lilly/drake

  • Fit: drake falls easily within reproducibility and high-performance computing.

  • Target audience: anyone who uses R for medium-to-long computations for which the results need to stay up to date with the dependencies.

Similar work

Remake

Drake overlaps with its direct predecessor, remake. In fact, drake owes its core ideas to remake and @richfitz, and explicit acknowledgements are in the documentation. However, drake surpasses remake in several important ways, including but not limited to the following.

  1. High-performance computing. Remake has no native parallel computing support. Drake, on the other hand, has a vast arsenal of parallel computing options, from local multicore computing to serious distributed computing. Thanks to future, future.batchtools, and batchtools, it is straightforward to configure a drake project for most popular job schedulers, such as SLURM, TORQUE, and the Sun/Univa Grid Engine, as well as systems contained in Docker images.
  2. A friendly interface. In remake, the user must manually write a YAML configuration file to arrange the steps of a workflow. In drake, this configuration is based on data frames that built-in wildcard templating functionality easily generates at scale.
  3. Thorough documentation. Drake contains eight vignettes, a comprehensive README, examples in the help files of user-side functions, and accessible example code that users can write with drake::example_drake().
  4. Active maintenance. Drake is actively developed and maintained, and issues are usually solved promptly.

Factual's drake

Factual's drake is similar in concept, but the development effort is completely unrelated to the R package of the same name.

Other pipeline toolkits

There are many other successful pipeline toolkits, and the drake package distinguishes itself with its R-focused approach, Tidyverse-friendly interface, and parallel computing flexibility.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coeveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)

I plan to submit to JOSS in the future, but the manuscript is not currently ready.

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
@sckott sckott added the package label Nov 9, 2017
@sckott

This comment has been minimized.

Copy link
Member

commented Nov 9, 2017

Thanks for your submission @wlandau-lilly ! Editors are discussing now

@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Nov 9, 2017

Thanks, @sckott.

Edit: I think @richfitz would be an excellent reviewer due to the similarity of remake, but I understand if a potential conflict of interest precludes his participation.

@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Nov 10, 2017

I just ran goodpractice::gp() on ropensci/drake@ee475f1:

It is good practice to

  x avoid the attach() and detach() functions, they are
    fragile and code that uses them will probably break sooner than
    later.

    tests/testthat/test-Makefile.R:165:3
    tests/testthat/test-Makefile.R:167:3
    tests/testthat/test-Makefile.R:192:3
    tests/testthat/test-Makefile.R:194:3

  x avoid calling setwd(), it changes the global environment.
    If you need it, consider using on.exit() to restore the working
    directory.

    tests/testthat/test-cache.R:43:3
    tests/testthat/test-cache.R:153:3
    tests/testthat/test-cache.R:216:3
    tests/testthat/test-cache.R:225:3
    tests/testthat/test-cache.R:241:3

I can explain these idiosyncrasies.

Calls to detach() in tests/testthat/test-Makefile.R

Drake promises to load the user's packages, which is especially important for distributed computing across multiple nodes on a cluster. To test, I occasionally need to call detach() to remove packages from search(). Unfortunately, unloadNamespace() does not have the desired effect.

Calls to setwd() in tests/testthat/test-cache.R

By default, drake searches through parent directories to find the current drake project's storr cache. To test, I need to change directories. But rest assured: every test is wrapped in a call to test_with_dir(), which uses withr::with_dir() to ensure that the original working directory is restored. Nested calls to withr::with_dir() give me errors.

@maelle

This comment has been minimized.

Copy link
Member

commented Nov 11, 2017

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @wlandau-lilly. Running goodpractice::gp() is actually my role but now we have your comments (and I get the same flags) so all is good. 😉

devtool::spell_check identified:

  • chioces make.Rd:193

I'm now looking for reviewers.


Reviewers: @jules32 @benmarwick @gothub
Due date: 2017-01-04

@maelle

This comment has been minimized.

Copy link
Member

commented Nov 11, 2017

@wlandau-lilly I forgot to mention you can now add this review badge to the README

[![](https://badges.ropensci.org/156_status.svg)](https://github.com/ropensci/onboarding/issues/156)

@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Nov 11, 2017

Thanks, @maelle! I appreciate your forgiveness regarding goodpractice::gp(), and I just fixed the spelling mistake in ropensci/drake@5c9388a. I will add the badge soon, and I am excited for the review process!

@maelle

This comment has been minimized.

Copy link
Member

commented Nov 13, 2017

@wlandau-lilly, good news, the reviewers are now assigned!

@jules32 and @benmarwick thanks a lot for accepting to review this package! 😸 Your reviews are due on the 2017-12-04.

@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Nov 13, 2017

Yes, thank you @jules32 and @benmarwick! I look forward to your feedback.

@maelle

This comment has been minimized.

Copy link
Member

commented Nov 28, 2017

@jules32 and @benmarwick friendly reminder that your reviews are due on the 2017-12-04 😉

@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Dec 5, 2017

@jules32 and @benmarwick, could we touch base about timing? Drake is large and developing fast, so I do understand that reviews may be more difficult than is typical.

@maelle

This comment has been minimized.

Copy link
Member

commented Dec 5, 2017

I forgot to update the thread when @jules32 contacted me to say she'd get the review in before Dec the 11th, sorry.

@benmarwick any update?

Thanks to both reviewers and thanks @wlandau-lilly for your patience. :-)

@jules32

This comment has been minimized.

Copy link

commented Dec 6, 2017

@benmarwick

This comment has been minimized.

Copy link
Contributor

commented Dec 9, 2017

Thanks for the reminders, here's my review:

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

p = partial
x = complete

  • [p] A statement of need clearly stating problems the software is designed to solve and its target audience in README
    BM: It could be more clear what drake does that no other package already does, Or why a user should use drake rather than make/remake/code chunks in Rmd/etc. The four points in 'similar work' section of the ropensci submission should also be in the readme. That said, some of these are a matter of style, e.g. YAML vs data frames for config, rather that outright novel function. Are there any projects using drake that a user can inspect to see real-world applications? Publications that can be cited? My sense is that many packages aimed at workflow management or improving reproducibility are quite idiomatic. This makes it hard for the common-or-garden variety R-user to see how they fit into their own ways of working. If you can help to bridge that gap between your idioms and the average user, that would make this pkg much more useful and valuable to the community.
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
    BM: I had no problem running these. However, I found the quickstart and examples difficult to relate to. For example, who writes their Rmd report in the console and passes it to an object as a character vector? That seems unnatural, to me at least, where the Rmd file is my main notebook and workbench. It would be easier to follow if the substance of the analysis was narrated in a little more detail. Perhaps a tiny actual research question with actual data would make this example more accessible? Perhaps also a comparison with a simple makefile to show makefile users (the main audience for this pkg) how to accomplish the same with drake (and why drake would be preferable). This would help a reader see how their existing workflow could be translated to the drake system. The drake system is a very comprehensive universe of functions, and new users will need a bit more guidance to see analogues between what drake does and what they're already using.
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • [p] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
    BM: Could CONTRIBUTING.md be located at the top level of the repo for better visibility?
For packages co-submitting to JOSS
  • [p] The package has an obvious research application according to JOSS's definition
    BM: I see that you do not plan to submit to JOSS at the moment, so this is just an incidental comment: It is easy to imagine research applications for drake, it is a very solid contribution to an active area of workflow and provenance tracking tools. However, the research application would be more obvious if the readme referred to some actual real-world uses of the package. For example, a list of domain-specific research project repos where drake is used (i.e. by biologists, economists, whatever), or a list of publications reporting results that were generated or enabled using drake. Currently it looks like drake has great potential, but hasn't actually been used in any real-world applications. Perhaps it has, but it's not clear from the pkg docs. Examples of use would help potential users better understand how drake can help them.

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • [p] Functionality: Any functional claims of the software been confirmed.
    BM: I have not tested the high-performance computing claims for this pkg
  • [p] Performance: Any performance claims of the software been confirmed.
    BM: yes, except for the parallel and high-performance claims.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 2


@wlandau-lilly

This comment has been minimized.

Copy link
Author

commented Jan 25, 2018

Wow, @maelle! Thank you so much! This has been an extremely rewarding and gratifying review process, and I am super grateful for your hard work!

One note before I begin the transition: now that drake is moving to rOpenSci, I want to take this opportunity to shift my open source work to my personal account, @wlandau. My colleagues and I agree this is for the best. Would you invite @wlandau to rOpenSci?

@maelle

This comment has been minimized.

Copy link
Member

commented Jan 25, 2018

Done!

@wlandau

This comment has been minimized.

Copy link

commented Jan 25, 2018

Wonderful! I transferred the repo, which was super exciting. One small thing: I do not have access to the "Settings" tab, and I cannot change the link to the documentation website at the top. Am I just over-eager, or is there another step?

capture

@wlandau

This comment has been minimized.

Copy link

commented Jan 25, 2018

I think I found the problem, and it is probably a common issue. @maelle, would you add me as a maintainer at https://github.com/orgs/ropensci/teams/drake/members?

maintainer

@maelle

This comment has been minimized.

Copy link
Member

commented Jan 25, 2018

Done! 😸

@wlandau

This comment has been minimized.

Copy link

commented Jan 25, 2018

Fantastic! Now, all the links reference rOpenSci rather than @wlandau-lilly. Before I activate Zenodo and submit to JOSS, I will make my final preparations for the release of 5.0.0 (soon to go to CRAN).

I have prepared a narrative-form blog post, and I am excited to share it. Is there somewhere I can submit a PR?

Also, could I append an rOpenSci footer to the README?

@maelle

This comment has been minimized.

Copy link
Member

commented Jan 25, 2018

@wlandau

This comment has been minimized.

Copy link

commented Jan 25, 2018

Also, before I create a release for Zenodo, I think I should confirm with @jules32 and @benmarwick about being listed as reviewers in the package itself.

@jules32

This comment has been minimized.

Copy link

commented Jan 25, 2018

@wlandau

This comment has been minimized.

Copy link

commented Jan 25, 2018

Thanks for letting me know, Julie. Your suggestions absolutely helped, and I am glad you were part of this. Please let me know if you change your mind.

@maelle

This comment has been minimized.

Copy link
Member

commented Jan 25, 2018

@jules32 as an editor I found your review very good and will most certainly try and recruit you again!

@stefaniebutland

This comment has been minimized.

Copy link

commented Jan 26, 2018

Hello @wlandau-lilly. I'm rOpenSci's Community Manager.

I have prepared a narrative-form blog post, and I am excited to share it. Is there somewhere I can submit a PR?
We'd love to have a post about this. Please submit a PR according to: https://github.com/ropensci/roweb2#contributing-a-blog-post. I will review it, provide some feedback and a suggested date to publish it.

Thank you!

@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

Thanks, @stefaniebutland! I just submitted ropensci/roweb2#140. Really looking forward to the next steps.

wlandau added a commit to ropensci/drake that referenced this issue Jan 26, 2018
Permission granted via email
wlandau added a commit to ropensci/drake that referenced this issue Jan 26, 2018
@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

@maelle, I have added @benmarwick as a reviewer in the DESCRIPTION (permission granted via email), and I have completed the 5 to-dos from here. I also submitted drake 5.0.0 to CRAN, added an rOpenSci footer to the README, and volunteered as a reviewer. This is all super exciting!

Besides the separate processes for the blog entry and the JOSS paper, there are a couple minor loose ends that will probably resolve themselves.

  1. AppVeyor CI is still deactivated.
  2. The rOpenSci badge still says "Under Review" in the GitHub README, but the same badge says "Peer Reviewed" on the documentation website.
@maelle

This comment has been minimized.

Copy link
Member

commented Jan 26, 2018

  1. Now activated, here is the badge code [![Build status](https://ci.appveyor.com/api/projects/status/4ypc9xnmqt70j94e?svg=true)](https://ci.appveyor.com/project/ropensci/drake)
  2. Yes they take a while to propagate, but tell us if it's still the case on Monday for instance.
@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

Thanks, @maelle. The AppVeyor builds appear to be working. I will check back on the badge in a few days.

@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

Another minor thing: I am getting emails like this one on every commit:

Build ropensci/drake@810b97d successful: https://ropensci.ocpu.io/drake/

NEW COMMITS:
[810b97d5ba]
Time: 2018-01-26 08:39:16
...

Is there a way to attenuate them? Are the other watchers receiving these emails, or just the maintainers? I do not see an OpenCPU webhook for drake, so I am guessing it is at the org level.

@jeroen

This comment has been minimized.

Copy link
Member

commented Jan 26, 2018

Those emails are from the CI system, they are sent to the pusher only. You can ignore/filter them.

@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

Good enough for me, thanks @jeroen.

@wlandau

This comment has been minimized.

Copy link

commented Jan 26, 2018

And JOSS just accepted drake! Blindingly fast!

@wlandau

This comment has been minimized.

Copy link

commented Jan 27, 2018

The README badge now says "Peer Reviewed", and the blog entry at ropensci/roweb2#140 is going well. I think we can safely close this issue.

@maelle maelle closed this Jan 27, 2018
@wlandau

This comment has been minimized.

Copy link

commented Jan 27, 2018

I sincerely thank you all for helping me take my favorite project to the next level. Your advice and coaching were eye-opening. I feel much more able to help people do their work.

@maelle

This comment has been minimized.

Copy link
Member

commented Jan 27, 2018

Thank you for your enthusiasm! Looking forward to reading your blog post!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.