targets and tarchetypes #401

wlandau · 2020-10-02T13:29:38Z

Submitting Author: Name (@wlandau)
Repository:

targets: https://github.com/wlandau/targets
tarchetypes: https://github.com/wlandau/tarchetypes. (tarchetypes is a small companion package that only serves to extend targets, so I thought it appropriate to submit it for review along with targets.)
The manual: https://github.com/wlandau/targets-manual. (For https://github.com/ropensci-books.)

Version submitted:

0.0.0.9002 (targets)
0.0.0.9000 (tarchetypes and the manual)

Editor: @maurolepore
Reviewer 1: @limnoliver
Reviewer 2: @tjmahr
Archive: TBD
Version accepted: TBD

DESCRIPTION files

targets

Package: targets
Title: Dynamic Function-Oriented Make-Like Declarative Pipelines for R
Description: The targets package is a pipeline toolkit that brings together
  function-oriented programming and Make-like declarative workflows for
  Statistics and data science in R. It implements a workflow as collection of
  interconnected tasks, analyzes the dependency relationships among these
  tasks, skips steps that are already up to date, runs the necessary
  computations with optional parallel workers, abstracts files as
  R objects, and provides tangible evidence that the results match
  the underlying code and data. The methodology in this package
  borrows from GNU Make by Richard Stallman (2015, ISBN:978-9881443519)
  and drake by Will Landau (2018) <doi:10.21105/joss.00550>.
Version: 0.0.0.9001
License: MIT + file LICENSE
URL: https://wlandau.github.io/targets/, https://github.com/wlandau/targets
BugReports: https://github.com/wlandau/targets/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ),
  person(
    given = c("Matthew", "T."),
    family = "Warkentin",
    role = "ctb"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  callr (>= 3.4.3),
  cli (>= 2.0.2),
  codetools (>= 0.2.16),
  data.table (>= 1.12.8),
  digest (>= 0.6.25),
  igraph (>= 1.2.5),
  R6 (>= 2.4.1),
  rlang (>= 0.4.5),
  tibble (>= 3.0.1),
  tidyselect (>= 1.1.0),
  utils,
  vctrs (>= 0.2.4),
  withr (>= 2.1.2)
Suggests:
  aws.s3 (>= 0.3.21),
  clustermq (>= 0.8.9),
  curl (>= 4.3),
  dplyr (>= 1.0.0),
  fst (>= 0.9.2),
  future (>= 1.17.0),
  keras (>= 2.2.5.0),
  knitr (>= 1.30),
  rmarkdown (>= 2.4),
  qs (>= 0.23.2),
  rstudioapi (>= 0.11),
  testthat (>= 2.3.2),
  torch (>= 0.1.0),
  usethis (>= 1.6.3),
  visNetwork (>= 2.0.9)
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1.9000
VignetteBuilder: knitr

tarchetypes

Package: tarchetypes
Title: Archetypes for Targets
Description: The targets package is a pipeline toolkit that brings together
  function-oriented programming and Make-like declarative workflows for
  Statistics and data science in R. The tarchetypes package provides
  convenient user-side functions to create specialized targets,
  making pipelines easier to create and read. The methods in this package
  were influenced by the drake R package by Will Landau (2018)
  <doi:10.21105/joss.00550>.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/tarchetypes/, https://github.com/wlandau/tarchetypes
BugReports: https://github.com/wlandau/tarchetypes/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  fs (>= 1.4.2),
  rlang (>= 0.4.7),
  targets,
  tidyselect (>= 1.1.0),
  utils,
  vctrs (>= 0.3.4),
  withr (>= 2.1.2)
Suggests:
  digest (>= 0.6.25),
  knitr (>= 1.28),
  rmarkdown (>= 2.1),
  testthat (>= 2.3.2)
Remotes:
  wlandau/targets
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1.9000

Manual

Package: targets.manual
Title: Targets R Package User Manual
Description: This repository contains the source files of the targets R
  package user manual.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/targets-manual,
  https://github.com/wlandau/targets-manual
BugReports: https://github.com/wlandau/targets-manual/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  biglm (>= 0.9.2),
  bookdown (>= 0.19),
  fs (>= 1.4.1),
  purrr (>= 0.3.4),
  tarchetypes,
  targets,
  tidyverse (>= 1.3.0),
  visNetwork (>= 2.0.9),
  withr (>= 2.2.0)
Remotes:
  wlandau/tarchetypes,
  wlandau/targets
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0
VignetteBuilder: knitr

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

targets is an R-focused pipeline toolkit for Make-like declarative workflows. It resolves the dependency relationships among steps of a data analysis workflow and skips steps that are already up to date.
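
As a rough illustration of the answer above, the sketch below declares two interdependent steps. It assumes a later release of targets in which the target script (`_targets.R`) ends with a plain list of `tar_target()` objects; `read_values()` and `summarize_values()` are hypothetical helpers, not part of the package.

```r
library(targets)

# Hypothetical helper functions, defined here only for illustration.
read_values <- function() c(2, 4, 6)
summarize_values <- function(x) mean(x)

# In a real project, this list would be the final expression of _targets.R.
pipeline <- list(
  # First step: produce the data.
  tar_target(values, read_values()),
  # Second step: its command mentions `values`, so it depends on the first step.
  tar_target(average, summarize_values(values))
)
```

Because the command of `average` mentions `values`, targets would run `values` first. A second call to `tar_make()` with unchanged code and data should then skip both steps.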

Who is the target audience and what are scientific applications of this package?

targets is for R users who maintain computationally intense function-oriented data analysis projects (with large codebases and/or long runtimes). Such projects may include but are not limited to Bayesian statistics, simulation, machine learning, PK/PD, and spatial statistics.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

targets is the long-term successor to drake. After four years of development, drake has improved so much that its insurmountable problems have become its most pressing ones. A new package is necessary to advance the capability further. So while I still believe drake is thriving, and even though I will continue to maintain drake indefinitely, I created targets to try to break new ground. At https://wlandau.github.io/targets/articles/need.html#drake, I take a detailed dive into the ways that targets surpasses drake's permanent limitations.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

N/A

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Technical checks

Confirm each of the following by checking the box.

I have read the guide for authors and rOpenSci packaging guide.

This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions, created with roxygen2.
contains a vignette with examples of its essential functions and uses. To increase modularity and reduce package check time, all the user-side vignettes actually live at https://github.com/wlandau/targets-manual (deployed to https://wlandau.github.io/targets-manual). Repos at https://github.com/wlandau/targets-minimal, https://github.com/wlandau/targets-stan, and https://github.com/wlandau/targets-keras have the complete source code for example use cases. To avoid encumbering core targets and to avoid maintaining duplicated documentation, the vignettes of the actual package only include the statement of need and design documents. The README is deliberately short and links to all this existing documentation.
has a test suite. Not all of the tests in targets can be automated, especially when it comes to visualization and profiling, so many of the tests live in non-testthat folders in https://github.com/wlandau/targets/tree/master/tests. Whenever I use a #nocov block, I always include a comment with a justification and/or a reference to one of these semi-automated tests.
has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

Do you intend for this package to go on CRAN?
Do you intend for this package to go on Bioconductor?
Do you wish to automatically submit to the Journal of Open Source Software? If so:

JOSS Options

The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/. ~~I have written a paper.md, but I need to run it through my company's scientific disclosure process before I share it. That could take a few weeks.~~ paper.md and paper.bib now disclosed and included inside inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)

Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

The package is novel and will be of interest to the broad readership of the journal.
The manuscript describing the package is no longer than 3000 words.
You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

melvidoni · 2020-10-06T00:57:57Z

Hello @wlandau, thanks for the submission. Before I address it further, could you elaborate on the need for submitting both packages in a single submission? Could this be split into two different submissions? (I am thinking about potential reviewer workload.) Please be thorough.

wlandau · 2020-10-06T01:41:02Z

tarchetypes is a collection of user-level utilities for targets, and it does not stand on its own. Its purpose is to make targets easier to use, and it requires prior familiarity with the latter. I originally considered implementing both in a single package, and I still view their physical separation as an implementation detail. So I thought that separate submissions would potentially duplicate the overall reviewer-side overhead of learning targets, especially compared to the relatively small size of tarchetypes. In addition, tarchetypes helpers like tar_render() also appear in the manual.

wlandau · 2020-10-06T11:59:59Z

But I am still happy to split up the submission if you think it best. Please let me know what you decide.

melvidoni · 2020-10-07T19:31:34Z

Hello @wlandau, thanks for the clarifications. @maurolepore will be your handling editor.

maurolepore · 2020-10-17T00:20:38Z

@wlandau, thanks for the high quality of this submission. I'm pleased to be the
editor.

Here I first give an overview of what I see, then ask for more information to
better understand if this or another structure is the best for this submission,
and finally comment on the editor checks. Please note the comments preceded by
a bullet point and respond to those preceded by a check-box; you may refer to their
labels.

In this submission I see three packages: targets, tarchetypes, and
targets.manual. I understand that (a) targets supersedes the popular package
drake, also maintained by you; (b) tarchetypes extends targets; and (c)
targets.manual would be published as an rOpenSci book containing the manual
for both targets and tarchetypes. Please correct me if necessary.

With three packages, this submission is more complex than the typical 1-package
submission. This may add extra pressure on resources such as rOpenSci reviewers,
and rOpenSci and CRAN servers. Please help me understand why this is the best
alternative (if possible, point me to the source of information that guided your
decision, e.g. books, articles, conversations with other developers):

(ml01) How did you decide to supersede drake instead of extending it? Did
you consider any other option, such as the "edition" approach of testthat 3?
(ml02) How did you decide that targets and tarchetypes should be two
separate packages?
(ml03) Who would you suggest as a reviewer for this submission? Please
consider people with skills that complement yours, especially if they have
experience restructuring popular R packages.

Your answers will clarify the fit -- which I leave unchecked.

`targets`

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

README

(ml04) Under "What about drake?" I see "It [is] has become ...". You may remove "is".
(ml05) Under "Help", where it says "Post to the [GitHub issue tracker]", the link seems incorrect: https://github.com/wlandau/issues.

(ml06) In general, I see an opportunity to remove duplication and thus make
the package easier to maintain. For example, the file "README.Rmd" links to
multiple pages under the same website; a single link to the website might be
enough and easier to maintain. Similarly, under
Help I see links to repositories
other than targets. This may be hard to maintain. In isolation, this may seem
like a minor issue, but minor issues can compound quickly.

CI

(ml07) In the file "DESCRIPTION" I see Depends: R (>= 3.5.0), but the
file ".github/workflows/check.yaml" covers R-release only. Please extend the
continuous-integration workflow to run R CMD check from R-devel to the R
version stated in the file "DESCRIPTION"
(?usethis::use_github_action_check_full()).

`tarchetypes`

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

(ml08) Same as ml06.

(ml09) Same as ml07.

`targets.manual`

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

(ml10) The name of the package and GitHub repository don't match: "targets.manual" (with ".") and "targets-manual" (with "-"). Because the symbol "-" is invalid in the name of an R package, consider renaming the repository to "targets.manual". The inconsistency might confuse some users.
(ml11) devtools::install_github() failed with upgrade = "default" and passed with upgrade = "never". The error seems unrelated to the package but please check.

devtools::install_github("wlandau/targets-manual", dependencies = TRUE, 
    build_vignettes = TRUE)
#> Using github PAT from envvar GITHUB_PAT
#> Downloading GitHub repo wlandau/targets-manual@HEAD
#> readr (1.3.1 -> 1.4.0) [CRAN]
#> Installing 1 packages: readr
#> Installing package into '/home/mauro/R/x86_64-pc-linux-gnu-library/4.0'
#> (as 'lib' is unspecified)
#> Error: Failed to install 'targets.manual' from GitHub:
#>   (converted from warning) installation of package 'readr' had non-zero exit status

(ml12) R CMD check found 1 ERROR. Please fix the error or explain if R CMD check is not applicable to this type of package.

#> > checking package dependencies ... ERROR
#>   VignetteBuilder package not declared: ‘knitr’

(ml13) Tests seem not applicable to the book manual. I see no test in other
books, e.g. https://github.com/ropensci-books/http-testing and
https://github.com/ropensci-books/taxize.

(ml14) I see a mismatch between the copyright holder in DESCRIPTION and
LICENSE (but the copyright holder in DESCRIPTION and LICENSE.md do match). Please ensure they match.
(ml15) Please update the words list.

devtools::spell_check()
#>   WORD        FOUND IN
#> ’s        index.Rmd:55
#> indivdual   index.Rmd:91
#> learnings   index.Rmd:55
#> repo        README.md:5
#> toolkits    index.Rmd:45,47,51
#> zenodo      README.md:3

Reviewers:

Reviewer: @limnoliver
Reviewer: @tjmahr
Due date: 2020-12-19*

* This is four weeks from the day a second reviewer confirmed. I discussed with the chief editor @melvidoni to allow one week more than usual, because of COVID and because this submission includes 2 packages.

wlandau · 2020-10-17T04:59:18Z

Thank you for your insightful feedback, @maurolepore. Below, I address ml04-ml15. (I will respond to ml01-ml03 in a separate post.)

ml04: odd, I thought I removed that grammatical typo earlier. Fixed in ropensci/targets@454ddf5.
ml05: link fixed in ropensci/targets@a3491b1.
ml06: I agree, and avoiding duplication is one of my major documentation goals for targets. Addressed in ropensci/targets@a3491b1.
ml07: I added new CI checks under R 3.5 and 3.6 in ropensci/targets@b1a3261.
ml08: I deduplicated some links in ropensci/tarchetypes@d2915e9. I prefer to keep the existing links to individual package functions.
ml09: addressed in ropensci/tarchetypes@c2e94ea.
ml10-ml13: I intend targets-manual to just be a bookdown manual, not an installable package. The only purpose of the DESCRIPTION was to make CI easier. To reduce confusion, in ropensci-books/targets@a2092f9 and ropensci-books/targets@5a69d0c, I removed the DESCRIPTION and moved the relevant content into README.md and packages.R.
ml14: also fixed in ropensci-books/targets@a2092f9 and ropensci-books/targets@5a69d0c. DESCRIPTION and LICENSE are gone, and README.md is consistent with LICENSE.md.
ml15: without a DESCRIPTION, devtools::spell_check() no longer works. However, I did fix a spelling issue in ropensci-books/targets@7ed0388 ("indivdual" => "individual").

maurolepore · 2020-10-17T14:32:31Z

Thanks @wlandau for addressing my comments.

ml10-13 and ml15:

The only purpose of the DESCRIPTION was to make CI easier. To reduce confusion, in ropensci-books/targets@a2092f9 and ropensci-books/targets@5a69d0c, I removed the DESCRIPTION and moved the relevant content into README.md and packages.R.

(ml16) I support this (more on ml18 below). Now that I understand it, I prefer your original take -- it aligns with the idea of a "research compendium". Please revert to it, then explain the goal, structure, and usage of the repository in the field "Description" of the file DESCRIPTION.

(ml17) Even if targets.manual is not meant to be built and installed as a package, consider adding the minimum infrastructure to the repository for devtools::check() to pass cleanly -- that seems like the path of least resistance. If this is a bad idea, maybe write a broken test with the error message you want someone like me to read.

--

(ml18) I support anything that makes your work less error-prone and easier to maintain. My comments ml01-ml03 focused on external resources, but I forgot to stress that my main concern is maintainability. Your work is outstanding; the R community will benefit from you investing time in the important things instead of mundane tasks. If those mundane tasks are avoidable (e.g. with continuous integration, an "edition" approach, or rethinking the architecture of your system), then my goal is to ensure you consider those options. Collectively, the rOpenSci community has a lot of experience that should help you make informed, good choices.

wlandau · 2020-10-17T16:08:44Z

ml18: Thanks, @maurolepore. This helps me answer your questions. The way I am structuring things now, including the separation into multiple repositories, is largely to support maintainability. I will elaborate in my subsequent posts.

wlandau · 2020-10-17T16:10:43Z

Below, I answer ml02 and ml03, and I address the physical separation of targets-manual and other documentation. ml01 requires a much deeper answer, so I will address it in its own post.

ml02

I have had more time to reflect since @melvidoni first brought this up, and I will try to elaborate.

I could have easily implemented targets and tarchetypes in the same package. Both are essentially the same system, and tarchetypes is currently very small, supporting only a limited handful of superficial extensions to targets' deeper capabilities. So for the present, the physical separation is a trivial implementation detail, and I do not believe it will exacerbate the burden of software review.

The separation of tarchetypes is about planning for the future. From what I learned maintaining drake and observing use cases, tarchetypes will enhance maintainability, the quality of the infrastructure and user-side freedom for years to come.

Sustainable infrastructure

Interface development incurs additional challenges, code volume, bugs, tests, and documentation. I learned this the hard way while developing static branching in drake. Because drake's design did not allow me to implement static branching in a separate package, drake itself became more difficult to maintain, more prone to feature creep, and more prone to errors (see ropensci/drake#1199, ropensci/drake#1262, ropensci/drake#1010, ropensci/drake#1009, ropensci/drake#1008, and many more issues like these). And although I fixed all the known and reported bugs, the deeper underlying causes only worsened as the years went by, and the fundamental design of the internals and the interface made it impossible to eliminate these problems in drake itself.

As its own package, tarchetypes has plenty of room to explore alternative interfaces and shorthand for targets and pipelines. That way, targets itself will stay light, elegant, and sharply delineated in scope, and the architecture will remain clean, resilient, and sustainable in the long run.

A precedent for extensibility

In addition, tarchetypes deliberately sets a precedent for enhanced automation through third-party interface development. tarchetypes functions like tar_render() and tar_map() are easy to implement using targets::tar_target_raw(), and even now, developers are already borrowing the pattern to develop their own specialized interfaces. Internally at work, for example, one of my colleagues maintains a package for domain-specific simulation studies in the life sciences, and it has custom target archetypes tailored to the specific needs of these internal pipelines. An earlier version of this same package used drake, and the metaprogramming on top of drake_plan() was messy and nearly impossible.
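
To make the pattern described above concrete, here is a minimal sketch of a custom target archetype built on `targets::tar_target_raw()`. The function `tar_csv()` and the file name `data.csv` are hypothetical and exist only for illustration; they are not part of targets or tarchetypes.

```r
library(targets)

# Hypothetical archetype (not part of targets or tarchetypes):
# a target whose command reads a CSV file with utils::read.csv().
tar_csv <- function(name, path) {
  # tar_target_raw() takes the target name as a character string
  # and the command as an unevaluated expression.
  tar_target_raw(
    name = name,
    # substitute() splices the value of `path` into the command
    # without evaluating the command itself.
    command = substitute(utils::read.csv(path), env = list(path = path))
  )
}

# The file does not need to exist yet: the command only runs in a pipeline.
target <- tar_csv("raw_data", "data.csv")
```

Because the name and command are ordinary R values here, a package author can generate them programmatically, which is what makes this style of interface easier to extend than non-standard evaluation alone.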

`targets-manual`

In the early days of drake, all the documentation lived in the README and vignettes. Over time, the vignettes steadily accrued volume, length, and R package dependencies, and drake became difficult to install. So I moved the vignettes to their own bookdown project, and the problem was solved.

With targets-manual, I am trying to do the same thing right from the start, even though it is much shorter than drake's manual at the moment. So targets-manual is just a collection of user-side vignettes that happens to be in its own GitHub repository. As with tarchetypes, I do not believe the physical separation of targets and targets-manual increases the burden of software review relative to unifying them in a single package. Conceptually, they still belong together.

ml03

My top reviewer recommendations are power users who have already provided extensive feedback and expressed strong enthusiasm about targets. Their prior familiarity could reduce the workload.

Otherwise, it may help to reach out to folks who have spread drake throughout the community. Here are some names that come to mind.

@krlmlr
@thebioengineer
@aedobbyn
@matt-dray
@matthewstrasiotto
@tjmahr
@sinarueeger
@cstawitz
@bpbond
@gadenbuie
@billdenney
@tiernanmartin
@b-rodrigues
@DominikRafacz
@pat-s
@maelle (editor of drake's software review: drake (R package) #156)
@karthik (though I am not sure if co-founders also review packages)

tjmahr · 2020-10-17T16:43:41Z

I have been using targets/tarchetypes for one of my projects for the past month. I would be able to provide feedback as a desktop/local machine workflow user.

mattwarkentin · 2020-10-17T16:52:18Z

Happy to be considered as a possible reviewer of targets and tarchetypes. I use both packages on a near-daily basis for many on-going projects, including projects that are developed and built as purely local, purely remote (live and run on HPC) and a mixture of local and remote (live local and selectively run locally or on HPC via SSH).

wlandau · 2020-10-17T17:11:37Z

Thank you both for your eagerness to help!

@maurolepore, I addressed ml16 and ml17 in commits ropensci-books/targets@f792c14 through ropensci-books/targets@2ee3725.

Do you still think it is necessary to move https://github.com/wlandau/targets-manual to https://github.com/wlandau/targets.manual? If the manual is accepted into rOpenSci, I expect the hyphen to go away on its own when the URLs move to https://github.com/ropensci-books/targets and https://books.ropensci.org/targets.

Other than that, I believe only ml01 remains for now.

strazto · 2020-10-17T17:37:07Z

I'm deeply appreciative to be considered to help out with targets.

I've been following its development with interest as a daily drake user on remote systems/HPC environments with an interest in usability and discoverability in data pipelines.

I've been hesitant to migrate my workflow over. From your description, it'll be valuable to do so and I look forward to trying it out.

I'll also be making an effort to update my extension package for drake, mandrake to support targets.

billdenney · 2020-10-17T19:45:49Z

While I'm very interested in targets, I unfortunately can't help as a reviewer in the next month or two.

wlandau · 2020-10-17T22:08:48Z

Almost forgot: @mattwarkentin is actually a formal contributor because of ropensci/targets#170, and @noamross has a PR at ropensci/tarchetypes#9 that I meant to keep open. @melvidoni and @maurolepore, does this mean they might have to recuse themselves?

wlandau · 2020-10-18T01:30:37Z

Now, I will address ml01. I think it is the most pressing question from this thread so far, and it is definitely the most difficult to explain in complete depth. So please let me know if you are skeptical or if anything remains unclear or unresolved.

What all this means for drake

drake is in a position of strength, and this is exactly what allowed me to create targets in the first place. After four years of steady development, I feel that we have solved most of the solvable problems worth mentioning, which finally forced me to reckon with the unsolvable ones. targets is not about remediation; it is about breaking new ground.

Maintenance

I will maintain drake indefinitely. I will continue to provide one-on-one help with use cases, fix known bugs, address known inefficiencies, and consider new community-requested features. However, since I am not going to propose new feature ideas of my own or strike out on Odyssean refactoring adventures, maintenance will be far easier and less time-consuming. In fact, because targets has a much cleaner design, the combined maintenance of drake, targets, tarchetypes, and the docs could prove far less demanding than that of drake alone up to this point.

Why not an edition?

I have been hearing quite a lot about testthat editions lately. I read the vignette, and I just updated targets and tarchetypes so their test suites are compatible with both editions 2 and 3. So although I did not actually inspect the changes to the testthat code base, I think I get the idea. From the last section of the vignette:

You might wonder why we came up with the idea of an “edition”, rather than creating a new package like testthat3. We decided against making a new package because the 2nd and 3rd edition share a very large amount of code, so making a new package would have substantially increased the maintenance burden: the majority of bugs would’ve needed to be fixed in two places.

I think Hadley says it well. The breaking changes in testthat would ordinarily be disruptive to users because of the thousands of reverse dependencies. However, the impact on the testthat package itself is minor, and a totally new package would be overkill.

With drake, similar changes have occurred a small handful of times in the past. In version 6, I restructured the cache in order to improve efficiency. The change was not back-compatible, and I configured drake to error out at the right time, preserve existing data, and walk users through their available options. Similarly, version 7 removed all the obsolete parallel backends and threw a deprecation warning. These small but disruptive changes were manageable, and they did not require a new package.

This time, however, the change is tectonic. drake's goals have transformed, and the package is on the brink of outgrowing its own architecture and design principles. More than that, my entire philosophy of programming has shifted, and I think completely differently about what a pipeline tool should do and how it should be designed. A new package seemed like the only way to make meaningful progress, and I am glad I went that direction.

Motivation

drake is the largest and most fruitful software development undertaking I have ever attempted. Just working through the technical issues more than doubled my proficiency in R, and the success of the tool granted me access to a fountain of community wisdom. To say I learned a lot would be an understatement. I am not the same developer I was four years ago, and I think totally differently about software.

For the purposes of targets, if I had to identify a single tipping point, it would probably be a book recommendation from Jim Hester: Design Patterns: Elements of Reusable Object-Oriented Software, the classic text by Gamma, Helm, Johnson, and Vlissides. At the time, I already had some experience with traditional message-passing OOP, but I was still confused about the problems it solved, and I was unsure how to apply it properly. The first few chapters finally answered questions about when to use object composition instead of inheritance, the importance of mutable objects for certain use cases, and how to choose a mental model that fits the task at hand.

I began to see the limitations of drake's design. drake uses simple immutable data structures such as ad hoc lists, some of them god objects. As a result, it has serious trouble expressing features such as static branching, dynamic branching, and dynamic files. Seamless targets-style cloud integration was beyond reach, and compatibility between dynamic files and data recovery was nearly impossible and super messy to enforce.

Finally, I realized I would no longer be able to make nontrivial improvements to drake as long as it relied on the old infrastructure. The path forward needed not only a complete rewrite and a complete reevaluation of every feature, but also an entirely different programming paradigm and philosophy.

A new design

Internally, targets is a collection of interdependent, carefully constructed, sharply defined, lightweight R6 and S3 classes with mutable instances. I built these 74 classes from the ground up in order to express the complicated reasoning that happens both within and among targets, and the mental model easily captures the behavior of a dynamic pipeline tool. This internal harmonization paved the way for several improvements that were impossible in drake, some of which appear in the statement of need, notably cloud storage integration, interface extensibility, and parallel-efficient flexible dynamic branching.

Let's take dynamic branching as an example. drake struggles with this because it does not have a formal structure to express what it means to be a target. targets, on the other hand, supports a formal inheritance hierarchy of classes to express the unique role of each target in the dynamic branching process. For example, "stems" are unbranched targets capable of producing "buds", and "patterns" are special templates that generate the actual branches. (This design document overviews the different kinds of targets and their role in dynamic branching.) All instances are mutable, and the mental model completely aligns with the realities of implementation. There is rigidness where we need rigidness, and there is flexibility where we need flexibility. This is what allows targets to overcome the permanent map-reduce efficiency limitation of drake.
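
To make the idea above a little more concrete, here is a minimal base-R sketch of mutable target objects arranged in a small class hierarchy. It is not the actual internals of targets: the constructors `new_target()`, `new_stem()`, and `new_pattern()`, the generic `produce_children()`, and the `sketch_*` class names are all hypothetical, and environments with S3 class vectors merely stand in for the package's R6 and S3 classes. The branching rule (one branch per bud) is a deliberate simplification.

```r
# Hypothetical sketch, not the real targets internals.
# Environments have reference semantics, so every instance is mutable.
new_target <- function(name, subclass) {
  self <- new.env(parent = emptyenv())
  self$name <- name
  self$children <- character(0)
  class(self) <- c(subclass, "sketch_target")
  self
}

# A stem is an unbranched target that holds a value.
new_stem <- function(name, value) {
  self <- new_target(name, "sketch_stem")
  self$value <- value
  self
}

# A pattern is a template that refers to an upstream stem.
new_pattern <- function(name, stem) {
  self <- new_target(name, "sketch_pattern")
  self$stem <- stem
  self
}

# Each kind of target decides for itself what children it produces.
produce_children <- function(target, ...) UseMethod("produce_children")

# A stem produces one "bud" per element of its value.
produce_children.sketch_stem <- function(target, ...) {
  target$children <- paste0(target$name, "_bud_", seq_along(target$value))
  invisible(target)
}

# A pattern generates one branch per bud of its upstream stem.
produce_children.sketch_pattern <- function(target, ...) {
  buds <- target$stem$children
  target$children <- paste0(target$name, "_branch_", seq_along(buds))
  invisible(target)
}

data_stem <- new_stem("data", value = c(10, 20, 30))
model_pattern <- new_pattern("model", stem = data_stem)

# The objects are modified in place; no reassignment is needed.
produce_children(data_stem)
produce_children(model_pattern)
print(model_pattern$children)
```

Because the pattern holds a reference to the stem rather than a copy, it sees the buds as soon as the stem has produced them. That is the kind of reasoning "within and among targets" that is awkward to express with immutable lists.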

wlandau · 2020-10-18T01:38:40Z

And with that, I think I am all caught up on editor requests for the moment. When the time comes, please let me know what else I can do.

wlandau · 2020-10-18T19:23:31Z

An aside: you can read a lot into the package names here. "drake" is an acronym: "data frames in R for Make". In the first few versions, drake::make(parallelism = "Makefile") would actually take a data frame of commands (a drake plan) and turn it into a real Makefile. drake has long since moved beyond this approach, and the "Makefile" backend is no longer supported. In other words, drake outgrew its own name.

The name "targets" represents a different way of thinking about the design. Whereas drake's data structures operate at the level of the entire pipeline, nearly all the structure and all the reasoning of targets happens on an individual target-by-target basis. In targets, target objects are first-class citizens, and they express almost all of the behavior through method dispatch and through the nested OOP objects that compose them. This reductionism really fits the situation, and it ensures the mental model aligns with the realities of implementation.

DominikRafacz · 2020-10-19T07:56:41Z

Thanks for thinking of me as a reviewer! I don't have much time to offer, but if I'm useful, I'll be happy to get involved, especially if there's a specific task to do.

bpbond · 2020-10-19T10:56:34Z

Odyssean refactoring adventures

🚣

That's a thoughtful and interesting comment @wlandau . Happy to help review if useful.

maurolepore · 2020-10-19T13:47:02Z

Thanks @wlandau for addressing my comments. I'll likely get back to you next week. I would like to read your comments in greater detail, and discuss with other editors who might be the best two reviewers for this submission. But right now I'm on a short vacation with limited access to the internet. I'm excited about how interesting this discussion already is. Thanks! 👍

maurolepore · 2020-10-27T05:15:48Z

Thanks @wlandau for your patience. I can now confirm this submission satisfies editor checks. I'll start seeking reviewers (thanks for your suggestions).

Please ensure the README file of each package in this submission has an rOpenSci review badge, for example via rodev::use_review_badge(<issue_number>). The badge URL is https://badges.ropensci.org/<issue_id>_status.svg. Full link should be:

maurolepore · 2020-10-27T05:29:38Z

RE #401 (comment)
Thanks for the heads up. If a potential conflict of interest is unclearly defined in the guidelines, I'll discuss with other editors. For now you should not worry.

wlandau · 2021-01-12T12:36:32Z

Before I forget: @limnoliver, @tjmahr, and @maurolepore, would you like to be listed as "reviewers" in the DESCRIPTION files of the repos under review? If so, would you like me to include your ORCIDs?

It is amazing how much better the documentation has become as a result of your input. I am more optimistic now that new users will find targets accessible. Even though I feel like I already went down this road with drake, there is always much more to learn.

limnoliver · 2021-01-12T18:03:10Z

I approve as well. I like the additions of the "Overview" vignette and the "Getting Started" section of the README. Overall, I think the new users of targets will have a much easier time navigating these resources now and getting jump started.

wlandau · 2021-01-12T18:19:33Z

Fantastic, thank you so much @limnoliver and @tjmahr!

maurolepore · 2021-01-12T18:25:49Z

Approved! Thanks @wlandau for submitting and @limnoliver and @tjmahr for your reviews! 🥇

To-dos:

Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
Fix all links to the GitHub repo to point to the repo under the ropensci organization.
Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). If Appveyor does not pick up new commits after transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with Travis CI and GitHub Actions)
We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

maurolepore · 2021-01-12T18:42:46Z

DONE

Create a two-person team in rOpenSci’s “ropensci” GitHub organization, named for the package, with yourself and the package author as members.

Go to the repository settings in rOpenSci’s “ropensci” GitHub organization and give the author “Admin” access to the repository.

If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.

NOMINATION

Nominate a package to be featured in an rOpenSci blog post or tech note if you think it might be of high interest. Please note in the software review issue one or two things the author could highlight, and tag @ropensci/blog-editors

@ropensci/blog-editors, I think these packages might be of high interest. I suspect many readers will be familiar with the drake package and may wonder how it compares to targets and how to migrate; they may also be interested in the relationship with tarchetypes.

ropensci/software-review#401 (comment)

stefaniebutland · 2021-01-12T20:40:24Z

Agreed. @wlandau we must have a post about this 😉. We're lucky to have this in the rOpenSci organization.

Guidelines: https://blogguide.ropensci.org/. When you're ready Will, please suggest a date to submit a draft post and my colleague @steffilazerte will review it.

wlandau · 2021-01-12T20:48:14Z

Thanks, I would love to submit a post! I did write a draft, but much has happened since then, and it needs a lot of work. I will submit a PR when it is ready.

wlandau · 2021-01-13T20:55:06Z

My apologies @tjmahr and @limnoliver, somehow I managed to miss this from your reviews:

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

I will add you both.

wlandau · 2021-01-13T21:12:38Z

"rev" roles now added.
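
For readers unfamiliar with the convention, a reviewer entry of this kind can be sketched with base R's `person()`. The name below is a placeholder, not one of the actual reviewers, and the real DESCRIPTION entries may carry additional fields, such as an ORCID in `comment`, following the same `comment = c(ORCID = ...)` pattern used for the author.

```r
# Hypothetical reviewer entry; "Jane Doe" is a placeholder name.
reviewer <- person(
  given = "Jane",
  family = "Doe",
  role = "rev"  # "rev" marks a package reviewer
)

# Show how the entry renders, including the role.
print(format(reviewer, include = c("given", "family", "role")))
```

In a DESCRIPTION file, an entry like this would sit inside the `Authors@R: c(...)` field alongside the existing `person()` calls.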

maelle · 2021-01-14T11:41:41Z

@wlandau I've sent you an invite to the ropensci-books organization so you might transfer your book repo there. Thank you!

wlandau · 2021-01-14T12:43:50Z

Thanks so much, @maelle! As I discussed with @maurolepore, I transferred wlandau/targets-manual and wlandau/targets-design to ropensci-books.

Just one last thing: would you grant me access to the settings of both ropensci-books/targets-manual and ropensci-books/targets-design? I would like to rename ropensci-books/targets-manual to the more concise ropensci-books/targets.

maelle · 2021-01-14T12:48:45Z

@wlandau I've now made you an admin of both repos. 🙂

wlandau · 2021-01-14T12:51:46Z

Amazing, thank you! And the URLs are working!

wlandau · 2021-01-14T13:11:50Z

@maurolepore, I believe I addressed the items in #401 (comment) (except the URL redirects, which are currently not working for the ropensci-books repos). Please let me know if there is anything I missed.

wlandau · 2021-01-14T13:59:07Z

I just submitted targets to CRAN, and I will submit tarchetypes as soon as targets is accepted and the binaries are available. My colleagues and I look forward to the upcoming cascade of internal production releases that will soon follow.

In the near future, I would like to post an rOpenSci submission for jagstargets, an R Targetopia package for Bayesian data analysis with JAGS. jagstargets should be a nice warmup for stantargets, a similar but larger package built on targets and Stan.

This review process was incredibly rewarding, and targets is much more accessible as a result. I cannot thank you enough!

maurolepore · 2021-01-14T19:11:48Z

@maurolepore, I believe I addressed the items in #401 (comment) (except the URL redirects, which are currently not working for the ropensci-books repos).

Thanks! I'm sure you'll keep on top of it. Let me know on Slack if you need help.

Following https://devguide.ropensci.org/editorguide.html I added a "peer-reviewed" topic to targets and tarchetypes, and I'm now closing this software-review issue.

Thank you to our amazing reviewers @limnoliver and @tjmahr and amazing author @wlandau. I look forward to seeing the R community thrive with these new packages. Best of luck with the submission to CRAN.

wlandau · 2021-01-14T21:20:01Z

Fantastic, @maurolepore! Thank you so much for your conscientiousness and diligence during this whole process.

An update: the redirects and 404 pages are now working. The remaining issues are purely cosmetic and probably have to do with how R Markdown sites get rendered: https://community.rstudio.com/t/trouble-customizing-404-page-of-r-markdown-website/93142.

melvidoni assigned maurolepore Oct 7, 2020

melvidoni added 1/editor-checks topic:workflow-automation labels Oct 7, 2020

This was referenced Oct 12, 2020

Mention targets package in README? ropensci/drake#1332

Closed

Static branching exercises error wlandau/learndrake#35

Closed

maurolepore added 2/seeking-reviewer(s) and removed 1/editor-checks labels Oct 27, 2020

maurolepore added 6/approved and removed 5/awaiting-reviewer(s)-response labels Jan 12, 2021

wlandau-lilly pushed a commit to ropensci/targets that referenced this issue Jan 12, 2021

As requested (ropensci/software-review#401 (comment)) delete code of …

f5de894

…conduct file

wlandau-lilly pushed a commit to ropensci/tarchetypes that referenced this issue Jan 12, 2021

As requested (ropensci/software-review#401 (comment)) delete code of …

33ece28

…conduct file

wlandau-lilly pushed a commit to ropensci/tarchetypes that referenced this issue Jan 12, 2021

Link to rOpenSci code of conduct

0d2f65b

ropensci/software-review#401 (comment)

wlandau-lilly pushed a commit to ropensci/targets that referenced this issue Jan 12, 2021

Link to rOpenSci code of conduct

3ecbeaa

ropensci/software-review#401 (comment)

wlandau mentioned this issue Jan 12, 2021

Is second element in map treated as a list? ropensci/targets#266

Closed

maurolepore closed this as completed Jan 14, 2021

wlandau mentioned this issue Jan 23, 2021

targets technote ropensci/roweb3#122

Merged

13 tasks

wlandau mentioned this issue Mar 3, 2021

stantargets: reproducible Stan pipelines at scale #430

Closed

27 tasks

joelnitta mentioned this issue Oct 7, 2022

Add documentation about how to handle closely related packages ropensci/dev_guide#562

Open
