targets and tarchetypes #401

Closed
15 of 31 tasks
wlandau opened this issue Oct 2, 2020 · 71 comments

Comments

@wlandau

wlandau commented Oct 2, 2020

Submitting Author: Name (@wlandau)
Repository:

Version submitted:

  • 0.0.0.9002 (targets)
  • 0.0.0.9000 (tarchetypes and the manual)

Editor: @maurolepore
Reviewer 1: @limnoliver
Reviewer 2: @tjmahr
Archive: TBD
Version accepted: TBD


DESCRIPTION files

  • targets
Package: targets
Title: Dynamic Function-Oriented Make-Like Declarative Pipelines for R
Description: The targets package is a pipeline toolkit that brings together
  function-oriented programming and Make-like declarative workflows for
  statistics and data science in R. It implements a workflow as a collection of
  interconnected tasks, analyzes the dependency relationships among these
  tasks, skips steps that are already up to date, runs the necessary
  computations with optional parallel workers, abstracts files as
  R objects, and provides tangible evidence that the results match
  the underlying code and data. The methodology in this package
  borrows from GNU Make by Richard Stallman (2015, ISBN:978-9881443519)
  and drake by Will Landau (2018) <doi:10.21105/joss.00550>.
Version: 0.0.0.9001
License: MIT + file LICENSE
URL: https://wlandau.github.io/targets/, https://github.com/wlandau/targets
BugReports: https://github.com/wlandau/targets/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ),
  person(
    given = c("Matthew", "T."),
    family = "Warkentin",
    role = "ctb"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  callr (>= 3.4.3),
  cli (>= 2.0.2),
  codetools (>= 0.2.16),
  data.table (>= 1.12.8),
  digest (>= 0.6.25),
  igraph (>= 1.2.5),
  R6 (>= 2.4.1),
  rlang (>= 0.4.5),
  tibble (>= 3.0.1),
  tidyselect (>= 1.1.0),
  utils,
  vctrs (>= 0.2.4),
  withr (>= 2.1.2)
Suggests:
  aws.s3 (>= 0.3.21),
  clustermq (>= 0.8.9),
  curl (>= 4.3),
  dplyr (>= 1.0.0),
  fst (>= 0.9.2),
  future (>= 1.17.0),
  keras (>= 2.2.5.0),
  knitr (>= 1.30),
  rmarkdown (>= 2.4),
  qs (>= 0.23.2),
  rstudioapi (>= 0.11),
  testthat (>= 2.3.2),
  torch (>= 0.1.0),
  usethis (>= 1.6.3),
  visNetwork (>= 2.0.9)
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1.9000
VignetteBuilder: knitr
  • tarchetypes
Package: tarchetypes
Title: Archetypes for Targets
Description: The targets package is a pipeline toolkit that brings together
  function-oriented programming and Make-like declarative workflows for
  statistics and data science in R. The tarchetypes package provides
  convenient user-side functions to create specialized targets,
  making pipelines easier to create and read. The methods in this package
  were influenced by the drake R package by Will Landau (2018)
  <doi:10.21105/joss.00550>.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/tarchetypes/, https://github.com/wlandau/tarchetypes
BugReports: https://github.com/wlandau/tarchetypes/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  fs (>= 1.4.2),
  rlang (>= 0.4.7),
  targets,
  tidyselect (>= 1.1.0),
  utils,
  vctrs (>= 0.3.4),
  withr (>= 2.1.2)
Suggests:
  digest (>= 0.6.25),
  knitr (>= 1.28),
  rmarkdown (>= 2.1),
  testthat (>= 2.3.2)
Remotes:
  wlandau/targets
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1.9000
  • Manual
Package: targets.manual
Title: Targets R Package User Manual
Description: This repository contains the source files of the targets R
  package user manual.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/targets-manual,
  https://github.com/wlandau/targets-manual
BugReports: https://github.com/wlandau/targets-manual/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  biglm (>= 0.9.2),
  bookdown (>= 0.19),
  fs (>= 1.4.1),
  purrr (>= 0.3.4),
  tarchetypes,
  targets,
  tidyverse (>= 1.3.0),
  visNetwork (>= 2.0.9),
  withr (>= 2.2.0)
Remotes:
  wlandau/tarchetypes,
  wlandau/targets
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

targets is an R-focused pipeline toolkit for Make-like declarative workflows. It resolves the dependency relationships among steps of a data analysis workflow and skips steps that are already up to date.
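To make the idea of skipping up-to-date steps concrete, here is a toy sketch in base R. It is not how targets works internally: `steps`, `topo_order()`, `run_pipeline()`, and the fingerprint rule (a step reruns only if its deparsed command or its upstream values changed) are all hypothetical, chosen only to illustrate the Make-like behavior described above.

```r
# Toy Make-like pipeline in base R. Hypothetical illustration only;
# this is not the targets implementation.

# Each step names its upstream dependencies and has a command (a function
# whose arguments are named after those dependencies).
steps <- list(
  raw     = list(deps = character(0), fun = function() c(3, 1, 2)),
  sorted  = list(deps = "raw",        fun = function(raw) sort(raw)),
  summary = list(deps = "sorted",     fun = function(sorted) sum(sorted))
)

# Topological sort by repeated passes: a step becomes ready once all of its
# dependencies are already in the ordering.
topo_order <- function(steps) {
  done <- character(0)
  while (length(done) < length(steps)) {
    todo <- setdiff(names(steps), done)
    ready <- todo[vapply(todo, function(s) all(steps[[s]]$deps %in% done), logical(1))]
    if (length(ready) == 0) stop("dependency cycle or missing dependency")
    done <- c(done, ready)
  }
  done
}

# Run steps in dependency order. `cache` is an environment, so updates persist
# across calls. A step is skipped when its command and its inputs are identical
# to the ones recorded the last time it ran.
run_pipeline <- function(steps, cache) {
  ran <- character(0)
  for (name in topo_order(steps)) {
    step <- steps[[name]]
    inputs <- lapply(step$deps, function(d) cache[[d]]$value)
    names(inputs) <- step$deps
    fingerprint <- list(command = deparse(step$fun), inputs = inputs)
    old <- cache[[name]]
    if (!is.null(old) && identical(old$fingerprint, fingerprint)) next
    cache[[name]] <- list(value = do.call(step$fun, inputs), fingerprint = fingerprint)
    ran <- c(ran, name)
  }
  ran  # names of the steps that actually ran
}

cache <- new.env()
run_pipeline(steps, cache)  # first run: all three steps run
run_pipeline(steps, cache)  # nothing changed: no steps run
steps$sorted$fun <- function(raw) rev(sort(raw))
run_pipeline(steps, cache)  # "sorted" reruns, and so does "summary" because its input changed
```

Comparing upstream values rather than timestamps means a downstream step is skipped whenever an upstream rerun happens to reproduce the same value.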

  • Who is the target audience and what are scientific applications of this package?

targets is for R users who maintain computationally intensive, function-oriented data analysis projects (with large codebases and/or long runtimes). Such projects may include but are not limited to Bayesian statistics, simulation, machine learning, PK/PD, and spatial statistics.

targets is the long-term successor to drake. After four years of development, drake has improved so much that its insurmountable problems have become its most pressing ones. A new package is necessary to advance the capability further. So while I still believe drake is thriving, and even though I will continue to maintain drake indefinitely, I created targets to try to break new ground. At https://wlandau.github.io/targets/articles/need.html#drake, I take a detailed dive into the ways that targets surpasses drake's permanent limitations.

N/A

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you intend for this package to go on Bioconductor?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/. I have written a paper.md, but I need to run it through my company's scientific disclosure process before I share it. That could take a few weeks. Update: paper.md and paper.bib have now been disclosed and are included in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@melvidoni
Contributor

Hello @wlandau, thank you for the submission. Before I address it further, could you elaborate on the need to submit both packages in a single submission? Could this be split into two different submissions? (I am thinking about potential reviewer workload.) Please be thorough.

@wlandau
Author

wlandau commented Oct 6, 2020

tarchetypes is a collection of user-level utilities for targets, and it does not stand on its own. Its purpose is to make targets easier to use, and it requires prior familiarity with the latter. I originally considered implementing both in a single package, and I still view their physical separation as an implementation detail. So I thought that separate submissions would potentially duplicate the overall reviewer-side overhead of learning targets, especially compared to the relatively small size of tarchetypes. In addition, tarchetypes helpers like tar_render() also appear in the manual.

@wlandau
Author

wlandau commented Oct 6, 2020

But I am still happy to split up the submission if you think it best. Please let me know what you decide.

@melvidoni
Contributor

Hello @wlandau, thanks for the clarifications. @maurolepore will be your handling editor.

@maurolepore
Member

maurolepore commented Oct 17, 2020

@wlandau, thanks for the high quality of this submission. I'm pleased to be the
editor.

Here I first give an overview of what I see, then ask for more information to
better understand if this or another structure is the best for this submission,
and finally comment on the editor checks. Please note the comments preceded by
a bullet point and respond to those preceded by a check-box; you may refer to their
labels.

In this submission I see three packages: targets, tarchetypes, and
targets.manual. I understand that (a) targets supersedes the popular package
drake, also maintained by you; (b) tarchetypes extends targets; and (c)
targets.manual would be published as an rOpenSci book containing the manual
for both targets and tarchetypes. Please correct me if necessary.

With three packages, this submission is more complex than the typical 1-package
submission. This may add extra pressure on resources such as rOpenSci reviewers,
and rOpenSci and CRAN servers. Please help me understand why this is the best
alternative (if possible, point me to the source of information that guided your
decision, e.g. books, articles, conversations with other developers):

  • (ml01) How did you decide to supersede drake instead of extending it? Did
    you consider any other option, such as the "edition" approach of testthat 3?

  • (ml02) How did you decide that targets and tarchetypes should be two
    separate packages?

  • (ml03) Who would you suggest as a reviewer for this submission? Please
    consider people with skills that complement yours, especially if they have
    experience restructuring popular R packages.

Your answers will clarify the fit -- which I leave unchecked.

targets

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

README

  • (ml06) In general, I see an opportunity to remove duplication and thus make
    the package easier to maintain. For example, the file "README.Rmd" links to
    multiple pages under the same website; a single link to the website might be
    enough and easier to maintain. Similarly, under
    Help I see links to repositories
    other than targets. This may be hard to maintain. In isolation, this may seem
    like a minor issue, but minor issues can compound quickly.

CI

  • (ml07) In the file "DESCRIPTION" I see Depends: R (>= 3.5.0), but the
    file ".github/workflows/check.yaml" covers R-release only. Please extend the
    continuous-integration workflow to run R CMD check from R-devel to the R
    version stated in the file "DESCRIPTION"
    (?usethis::use_github_action_check_full()).
tarchetypes

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

  • (ml08) Same as ml06.
  • (ml09) Same as ml07.
targets.manual

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

  • (ml10) The names of the package and the GitHub repository don't match: "targets.manual" (with ".") and "targets-manual" (with "-"). Because "-" is invalid in the name of an R package, consider renaming the repository to "targets.manual". The inconsistency might confuse some users.

  • (ml11) devtools::install_github() failed with upgrade = "default" and passed with upgrade = "never". The error seems unrelated to the package, but please check.

devtools::install_github("wlandau/targets-manual", dependencies = TRUE, 
    build_vignettes = TRUE)
#> Using github PAT from envvar GITHUB_PAT
#> Downloading GitHub repo wlandau/targets-manual@HEAD
#> readr (1.3.1 -> 1.4.0) [CRAN]
#> Installing 1 packages: readr
#> Installing package into '/home/mauro/R/x86_64-pc-linux-gnu-library/4.0'
#> (as 'lib' is unspecified)
#> Error: Failed to install 'targets.manual' from GitHub:
#>   (converted from warning) installation of package 'readr' had non-zero exit status
  • (ml12) R CMD check found 1 ERROR. Please fix the error or explain if R CMD check is not applicable to this type of package.
#> > checking package dependencies ... ERROR
#>   VignetteBuilder package not declared: ‘knitr’
  • (ml14) I see a mismatch between the copyright holder in DESCRIPTION and
    LICENSE (although the copyright holders in DESCRIPTION and LICENSE.md do match). Please ensure they match.

  • (ml15) Please update the word list.

devtools::spell_check()
#>   WORD        FOUND IN
#> ’s        index.Rmd:55
#> indivdual   index.Rmd:91
#> learnings   index.Rmd:55
#> repo        README.md:5
#> toolkits    index.Rmd:45,47,51
#> zenodo      README.md:3

Reviewers:

Reviewer: @limnoliver
Reviewer: @tjmahr
Due date: 2020-12-19*

* This is four weeks from the day the second reviewer confirmed. After discussing it with the chief editor, @melvidoni, I allowed one week more than usual, because of COVID and because this submission includes 2 packages.

@wlandau
Author

wlandau commented Oct 17, 2020

Thank you for your insightful feedback, @maurolepore. Below, I address ml04-ml15. (I will respond to ml01-ml03 in a separate post.)

@maurolepore
Member

maurolepore commented Oct 17, 2020

Thanks @wlandau for addressing my comments.

  • ml10-13 and ml15:

The only purpose of the DESCRIPTION was to make CI easier. To reduce confusion, in ropensci-books/targets@a2092f9 and ropensci-books/targets@5a69d0c, I removed the DESCRIPTION and moved the relevant content into README.md and packages.R.

  • (ml16) I support this (more on ml18 below). Now that I understand it, I prefer your original take -- it aligns with the idea of "research compendium". Please revert to it, then explain the goal, structure, and usage of the repository in the field "Description" of the file DESCRIPTION.
  • (ml17) Even if targets.manual is not meant to be built and installed as a package, consider adding the minimum infrastructure to the repository for devtools::check() to pass cleanly -- that seems like the path of least resistance. If this is a bad idea, maybe write a broken test with the error message you want someone like me to read.

--

  • (ml18) I support anything that makes your work less error-prone and easier to maintain. My comments ml01-ml03 focused on external resources, but I forgot to stress that my main concern is maintainability. Your work is outstanding; the R community will benefit from you investing time in the important things instead of mundane tasks. If those mundane tasks are avoidable (e.g. with continuous integration, an "edition" approach, or rethinking the architecture of your system), then my goal is to ensure you consider those options. Collectively, the rOpenSci community has a lot of experience that should help you make informed, good choices.

@wlandau
Author

wlandau commented Oct 17, 2020

  • ml18: Thanks, @maurolepore. This helps me answer your questions. The way I am structuring things now, including the separation into multiple repositories, is largely to support maintainability. I will elaborate in my subsequent posts.

@wlandau
Author

wlandau commented Oct 17, 2020

Below, I answer ml02 and ml03, and I address the physical separation of targets-manual and other documentation. ml01 requires a much deeper answer, so I will address it in its own post.

ml02

I have had more time to reflect since @melvidoni first brought this up, and I will try to elaborate.

I could have easily implemented targets and tarchetypes in the same package. Both are essentially the same system, and tarchetypes is currently very small, supporting only a limited handful of superficial extensions to targets' deeper capabilities. So for the present, the physical separation is a trivial implementation detail, and I do not believe it will exacerbate the burden of software review.

The separation of tarchetypes is about planning for the future. From what I learned maintaining drake and observing use cases, tarchetypes will enhance maintainability, the quality of the infrastructure and user-side freedom for years to come.

Sustainable infrastructure

Interface development incurs additional challenges, code volume, bugs, tests, and documentation. I learned this the hard way while developing static branching in drake. Because drake's design did not allow me to implement static branching in a separate package, drake itself became more difficult to maintain, more prone to feature creep, and more prone to errors (see ropensci/drake#1199, ropensci/drake#1262, ropensci/drake#1010, ropensci/drake#1009, ropensci/drake#1008, and many more issues like these). And although I fixed all the known and reported bugs, the deeper underlying causes only worsened as the years went by, and the fundamental design of the internals and the interface made it impossible to eliminate these problems in drake itself.

As its own package, tarchetypes has plenty of room to explore alternative interfaces and shorthand for targets and pipelines. That way, targets itself will stay light, elegant, and sharply delineated in scope, and the architecture will remain clean, resilient, and sustainable in the long run.

A precedent for extensibility

In addition, tarchetypes deliberately sets a precedent for enhanced automation through third-party interface development. tarchetypes functions like tar_render() and tar_map() are easy to implement using targets::tar_target_raw(), and even now, developers are already borrowing the pattern to develop their own specialized interfaces. Internally at work, for example, one of my colleagues maintains a package for domain-specific simulation studies in the life sciences, and it has custom target archetypes tailored to the specific needs of these internal pipelines. An earlier version of this same package used drake, and the metaprogramming on top of drake_plan() was messy and nearly impossible.
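The pattern described here, a user-side function that generates target definitions programmatically, can be sketched without targets at all. The base-R helper below is hypothetical (`expand_template()` is not a tarchetypes function, and the real `tar_map()` has its own interface): it captures a template command unevaluated and substitutes each row of a data frame into it, yielding a name plus a quoted command per target. Those two pieces correspond to the `name` and `command` arguments of `targets::tar_target_raw()`.

```r
# Hypothetical archetype-style helper (not part of targets or tarchetypes).
# Expands one template command into several target definitions, one per row
# of `values`. Each definition is a name plus an unevaluated command.
expand_template <- function(name, command, values) {
  command <- substitute(command)  # capture the template without evaluating it
  lapply(seq_len(nrow(values)), function(i) {
    row <- as.list(values[i, , drop = FALSE])
    list(
      # e.g. "fit" + "lm" -> "fit_lm"
      name = paste(c(name, vapply(row, as.character, character(1))), collapse = "_"),
      # replace each column symbol (here `method`) with that row's value
      command = do.call(substitute, list(command, row))
    )
  })
}

defs <- expand_template(
  "fit",
  fit_model(data, method = method),
  data.frame(method = c("lm", "glm"), stringsAsFactors = FALSE)
)
defs[[2]]$command  # fit_model(data, method = "glm")

# In a real pipeline, each definition could then be passed along as, e.g.,
# targets::tar_target_raw(def$name, as.expression(def$command))
```

Only the argument value `method` is a symbol that gets replaced; the argument name `method =` is a tag, so it survives substitution unchanged.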

targets-manual

In the early days of drake, all the documentation lived in the README and vignettes. Over time, the vignettes steadily accrued volume, length, and R package dependencies, and drake became difficult to install. So I moved the vignettes to their own bookdown project, and the problem was solved.

With targets-manual, I am trying to do the same thing right from the start, even though it is much shorter than drake's manual at the moment. So targets-manual is just a collection of user-side vignettes that happens to be in its own GitHub repository. As with tarchetypes, I do not believe the physical separation of targets and targets-manual increases the burden of software review relative to unifying them in a single package. Conceptually, they still belong together.

Other documentation

targets has additional repositories with extra documentation. They help people learn the package, but I am not submitting them to rOpenSci.

  1. https://github.com/wlandau/targets-minimal
  2. https://github.com/wlandau/targets-stan
  3. https://github.com/wlandau/targets-keras
  4. https://github.com/wlandau/targets-tutorial

(1)-(3) are example projects that use targets, and (4) is a half-day short course. Again, I find the physical separation to be important. For drake, I made the mistake of curating multiple example projects in https://github.com/wlandau/drake-examples, and the repo grew without bound. With different projects in different repos now, everything is easier to maintain. The code is easier to fix, and it is easier to set up RStudio Cloud workspaces to help people learn. In addition, the pattern demonstrates to users that each targets-powered data analysis workflow should live in its own repo, which is exactly how I have observed most successful projects to be implemented in practice.

ml03

My top reviewer recommendations are power users who have already provided extensive feedback and expressed strong enthusiasm about targets. Their prior familiarity could reduce the workload.

Otherwise, it may help to reach out to folks who have spread drake throughout the community. Here are some names that come to mind.

@tjmahr
Copy link

tjmahr commented Oct 17, 2020 via email

@mattwarkentin

Happy to be considered as a possible reviewer of targets and tarchetypes. I use both packages on a near-daily basis for many on-going projects, including projects that are developed and built as purely local, purely remote (live and run on HPC) and a mixture of local and remote (live local and selectively run locally or on HPC via SSH).

@wlandau
Author

wlandau commented Oct 17, 2020

Thank you both for your eagerness to help!

@maurolepore, I addressed ml16 and ml17 in commits ropensci-books/targets@f792c14 through ropensci-books/targets@2ee3725.

Do you still think it is necessary to move https://github.com/wlandau/targets-manual to https://github.com/wlandau/targets.manual? If the manual is accepted into rOpenSci, I expect the hyphen to go away on its own when the URLs move to https://github.com/ropensci-books/targets and https://books.ropensci.org/targets.

Other than that, I believe only ml01 remains for now.

@strazto

strazto commented Oct 17, 2020

I'm deeply appreciative to be considered to help out with targets.

I've been following its development with interest as a daily drake user on remote systems and HPC environments, and I care about usability and discoverability in data pipelines.

I've been hesitant to migrate my workflow over. From your description, it'll be valuable to do so and I look forward to trying it out.

I'll also make an effort to update mandrake, my extension package for drake, to support targets.

@billdenney

While I'm very interested in targets, I unfortunately can't help as a reviewer in the next month or two.

@wlandau
Author

wlandau commented Oct 17, 2020

Almost forgot: @mattwarkentin is actually a formal contributor because of ropensci/targets#170, and @noamross has a PR at ropensci/tarchetypes#9 that I meant to keep open. @melvidoni and @maurolepore, does this mean they might have to recuse themselves?

@wlandau
Author

wlandau commented Oct 18, 2020

Now, I will address ml01. I think it is the most pressing question from this thread so far, and it is definitely the most difficult to explain in complete depth. So please let me know if you are skeptical or if anything remains unclear or unresolved.

What all this means for drake

drake is in a position of strength, and this is exactly what allowed me to create targets in the first place. After four years of steady development, I feel that we have solved most of the solvable problems worth mentioning, which finally forced me to reckon with the unsolvable ones. targets is not about remediation; it is about breaking new ground.

Maintenance

I will maintain drake indefinitely. I will continue to provide one-on-one help with use cases, fix known bugs, address known inefficiencies, and consider new community-requested features. However, since I am not going to propose new feature ideas of my own or strike out on Odyssean refactoring adventures, maintenance will be far easier and less time-consuming. In fact, because targets has a much cleaner design, the combined maintenance of drake, targets, tarchetypes, and the docs could prove far less demanding than that of drake alone up to this point.

Why not an edition?

I have been hearing quite a lot about testthat editions lately. I read the vignette, and I just updated targets and tarchetypes so their test suites are compatible with both editions 2 and 3. So although I did not actually inspect the changes to the testthat code base, I think I get the idea. From the last section of the vignette:

You might wonder why we came up with the idea of an “edition”, rather than creating a new package like testthat3. We decided against making a new package because the 2nd and 3rd edition share a very large amount of code, so making a new package would have substantially increased the maintenance burden: the majority of bugs would’ve needed to be fixed in two places.

I think Hadley says it well. The breaking changes in testthat would ordinarily be disruptive to users because of the thousands of reverse dependencies. However, the impact on the testthat package itself is minor, and a totally new package would be overkill.

With drake, similar changes have occurred a small handful of times in the past. In version 6, I restructured the cache in order to improve efficiency. The change was not back-compatible, and I configured drake to error out at the right time, preserve existing data, and walk users through their available options. Similarly, version 7 removed all the obsolete parallel backends and threw a deprecation warning. These small but disruptive changes were manageable, and they did not require a new package.

This time, however, the change is tectonic. drake's goals have transformed, and the package is on the brink of outgrowing its own architecture and design principles. More than that, my entire philosophy of programming has shifted, and I think completely differently about what a pipeline tool should do and how it should be designed. A new package seemed like the only way to make meaningful progress, and I am glad I went that direction.

Motivation

drake is the largest and most fruitful software development undertaking I have ever attempted. Just working through the technical issues more than doubled my proficiency in R, and the success of the tool granted me access to a fountain of community wisdom. To say I learned a lot would be an understatement. I am not the same developer I was four years ago, and I think totally differently about software.

For the purposes of targets, if I had to identify a single tipping point, it would probably be a book recommendation from Jim Hester: Design Patterns: Elements of Reusable Object-Oriented Software, the classic text by Gamma, Helm, Johnson, and Vlissides. At the time, I already had some experience with traditional message-passing OOP, but I was still confused about the problems it solved, and I was unsure how to apply it properly. The first few chapters finally answered questions about when to use object composition instead of inheritance, the importance of mutable objects for certain use cases, and how to choose a mental model that fits the task at hand.

I began to see the limitations of drake's design. drake uses simple immutable data structures such as ad hoc lists, some of them god objects. As a result, it has serious trouble expressing features such as static branching, dynamic branching, and dynamic files. Seamless targets-style cloud integration was beyond reach, and compatibility between dynamic files and data recovery was nearly impossible and super messy to enforce.

Finally, I realized I would no longer be able to make nontrivial improvements to drake as long as it relied on the old infrastructure. The path forward needed not only a complete rewrite and a complete reevaluation of every feature, but also an entirely different programming paradigm and philosophy.

A new design

Internally, targets is a collection of interdependent, carefully constructed, sharply defined, lightweight R6 and S3 classes with mutable instances. I built these 74 classes from the ground up in order to express the complicated reasoning that happens both within and among targets, and the mental model easily captures the behavior of a dynamic pipeline tool. This internal harmonization paved the way for several improvements that were impossible in drake, some of which appear in the statement of need, notably cloud storage integration, interface extensibility, and parallel-efficient flexible dynamic branching.

Let's take dynamic branching as an example. drake struggles with this because it does not have a formal structure to express what it means to be a target. targets, on the other hand, supports a formal inheritance hierarchy of classes to express the unique role of each target in the dynamic branching process. For example, "stems" are unbranched targets capable of producing "buds", and "patterns" are special templates that generate the actual branches. (This design document overviews the different kinds of targets and their role in dynamic branching.) All instances are mutable, and the mental model completely aligns with the realities of implementation. There is rigidness where we need rigidness, and there is flexibility where we need flexibility. This is what allows targets to overcome the permanent map-reduce efficiency limitation of drake.
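The payoff of treating each branch as its own unit can be shown with a toy sketch. Everything below is hypothetical and far simpler than the stem and pattern classes in targets: `map_branches()` creates one branch per element of an upstream value, stores each branch separately, and reruns only the branches whose input element changed; `reduce_branches()` then aggregates the branch values. For simplicity, branches are named by position, and changes to `fun` are not tracked.

```r
# Toy map-reduce over branches (hypothetical; not the targets classes).
# `store` is an environment holding one record per branch.
map_branches <- function(stem, fun, store) {
  ran <- character(0)
  for (i in seq_along(stem)) {
    branch <- sprintf("branch_%d", i)  # positional naming, for simplicity only
    prev <- store[[branch]]
    # Skip this branch if its input element is unchanged since the last run.
    if (!is.null(prev) && identical(prev$input, stem[[i]])) next
    store[[branch]] <- list(input = stem[[i]], value = fun(stem[[i]]))
    ran <- c(ran, branch)
  }
  ran  # names of the branches that actually ran
}

# Reduce step: collect the branch values, in branch order.
reduce_branches <- function(stem, store) {
  vapply(sprintf("branch_%d", seq_along(stem)),
         function(b) store[[b]]$value, numeric(1))
}

store <- new.env()
x <- c(1, 2, 3)
map_branches(x, function(v) v^2, store)  # all three branches run
x[2] <- 20
map_branches(x, function(v) v^2, store)  # only "branch_2" reruns
sum(reduce_branches(x, store))           # 1 + 400 + 9 = 410
```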

@wlandau
Author

wlandau commented Oct 18, 2020

And with that, I think I am all caught up on editor requests for the moment. When the time comes, please let me know what else I can do.

@wlandau
Author

wlandau commented Oct 18, 2020

An aside: you can read a lot into the package names here. "drake" is an acronym: "data frames in R for Make". In the first few versions, drake::make(parallelism = "Makefile") would actually take a data frame of commands (a drake plan) and turn it into a real Makefile. drake has long since moved beyond this approach, and the "Makefile" backend is no longer supported. In other words, drake outgrew its own name.

The name "targets" represents a different way of thinking about the design. Whereas drake's data structures operate at the level of the entire pipeline, nearly all the structure and all the reasoning of targets happens on an individual target-by-target basis. In targets, target objects are first-class citizens, and they express almost all of the behavior through method dispatch and through the nested OOP objects that compose them. This reductionism really fits the situation, and it ensures the mental model aligns with the realities of implementation.

@DominikRafacz

Thanks for thinking of me as a reviewer! I don't have much time to offer, but if I'm useful, I'll be happy to get involved, especially if there's a specific task to do.

@bpbond

bpbond commented Oct 19, 2020

Odyssean refactoring adventures

🚣

That's a thoughtful and interesting comment @wlandau . Happy to help review if useful.

@maurolepore
Member

Thanks @wlandau for addressing my comments. I'll come back to you likely next week. I would like to read your comments in greater detail, and discuss with other editors what might be the best two reviewers for this submission. But right now I'm on a short vacation with limited access to internet. I'm excited about how interesting this discussion already is. Thanks! 👍

@maurolepore
Member

Thanks @wlandau for your patience. I can now confirm this submission satisfies the editor checks. I'll start seeking reviewers (thanks for your suggestions).

Please ensure the README file of each package in this submission has an rOpenSci review badge, e.g. via rodev::use_review_badge(<issue_number>). The badge URL is https://badges.ropensci.org/<issue_id>_status.svg. The full link should be:

@maurolepore
Member

RE #401 (comment)
Thanks for the heads up. If the guidelines do not clearly define what counts as a potential conflict of interest, I'll discuss it with the other editors. For now, you should not worry.

@wlandau
Author

wlandau commented Jan 12, 2021

Before I forget: @limnoliver, @tjmahr, and @maurolepore, would you like to be listed as "reviewers" in the DESCRIPTION files of the repos under review? If so, would you like me to include your ORCIDs?

It is amazing how much better the documentation has become as a result of your input. I am more optimistic now that new users will find targets accessible. Even though I feel like I already went down this road with drake, there is always much more to learn.

@limnoliver

I approve as well. I like the additions of the "Overview" vignette and the "Getting Started" section of the README. Overall, I think new users of targets will now have a much easier time navigating these resources and getting started.

@wlandau
Author

wlandau commented Jan 12, 2021

Fantastic, thank you so much @limnoliver and @tjmahr!

@maurolepore
Member

maurolepore commented Jan 12, 2021

Approved! Thanks @wlandau for submitting and @limnoliver and @tjmahr for your reviews! 🥇

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one, since rOpenSci's default one will apply; see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g. URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to the ropensci Appveyor account, so after the transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). If Appveyor does not pick up new commits after the transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with Travis CI and GitHub Actions.)
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative; see https://github.com/ropensci/codemetar/#codemetar for how to include one in your package. After installing codemetar, it should be as easy as running codemetar::write_codemeta() in the root of your package.

wlandau-lilly pushed a commit to ropensci/targets that referenced this issue Jan 12, 2021
wlandau-lilly pushed a commit to ropensci/tarchetypes that referenced this issue Jan 12, 2021
@maurolepore
Member

maurolepore commented Jan 12, 2021

DONE

  • Create a two-person team in rOpenSci’s “ropensci” GitHub organization, named for the package, with yourself and the package author as members.
  • Go to the repository settings in rOpenSci’s “ropensci” GitHub organization and give the author “Admin” access to the repository.
  • If authors maintain a gitbook that is at least partly about their package, contact an rOpenSci staff member so they might contact the authors about transfer to the ropensci-books GitHub organisation.

NOMINATION

  • Nominate a package to be featured in an rOpenSci blog post or tech note if you think it might be of high interest. Please note in the software review issue one or two things the author could highlight, and tag @ropensci/blog-editors

@ropensci/blog-editors, I think these packages might be of high interest. I suspect many readers will be familiar with the drake package and may wonder how it compares to targets and how to migrate; they may also be interested in the relationship with tarchetypes.

@stefaniebutland
Member

Agreed. @wlandau we must have a post about this 😉. We're lucky to have this in the rOpenSci organization.

Guidelines: https://blogguide.ropensci.org/. When you're ready, Will, please suggest a date to submit a draft post, and my colleague @steffilazerte will review it.

@wlandau
Author

wlandau commented Jan 12, 2021

Thanks, I would love to submit a post! I did write a draft, but much has happened since then, and it needs a lot of work. I will submit a PR when it is ready.

@wlandau
Author

wlandau commented Jan 13, 2021

My apologies @tjmahr and @limnoliver, somehow I managed to miss this from your reviews:

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

I will add you both.

@wlandau
Author

wlandau commented Jan 13, 2021

"rev" roles now added.

@maelle
Member

maelle commented Jan 14, 2021

@wlandau I've sent you an invite to the ropensci-books organization so you might transfer your book repo there. Thank you!

@wlandau
Author

wlandau commented Jan 14, 2021

Thanks so much, @maelle! As I discussed with @maurolepore, I transferred wlandau/targets-manual and wlandau/targets-design to ropensci-books.

Just one last thing: would you grant me access to the settings of both ropensci-books/targets-manual and ropensci-books/targets-design? I would like to rename ropensci-books/targets-manual to the more concise ropensci-books/targets.

@maelle
Member

maelle commented Jan 14, 2021

@wlandau I've now made you an admin of both repos. 🙂

@wlandau
Author

wlandau commented Jan 14, 2021

Amazing, thank you! And the URLs are working!

@wlandau
Author

wlandau commented Jan 14, 2021

@maurolepore, I believe I addressed the items in #401 (comment) (except the URL redirects, which are currently not working for the ropensci-books repos). Please let me know if there is anything I missed.

@wlandau
Author

wlandau commented Jan 14, 2021

I just submitted targets to CRAN, and I will submit tarchetypes as soon as targets is accepted and the binaries are available. My colleagues and I look forward to the cascade of internal production releases that will soon follow.

In the near future, I would like to post an rOpenSci submission for jagstargets, an R Targetopia package for Bayesian data analysis with JAGS. jagstargets should be a nice warmup for stantargets, a similar but larger package built on targets and Stan.

This review process was incredibly rewarding, and targets is much more accessible as a result. I cannot thank you enough!

@maurolepore
Member

@maurolepore, I believe I addressed the items in #401 (comment) (except the URL redirects, which are currently not working for the ropensci-books repos).

Thanks! I'm sure you'll keep on top of it. Let me know on Slack if you need help.

Following https://devguide.ropensci.org/editorguide.html I added a "peer-reviewed" topic to targets and tarchetypes, and I'm now closing this software-review issue.

Thank you to our amazing reviewers @limnoliver and @tjmahr and our amazing author @wlandau. I look forward to seeing the R community thrive with these new packages. Best of luck with the submission to CRAN.

@wlandau
Author

wlandau commented Jan 14, 2021

Fantastic, @maurolepore! Thank you so much for your conscientiousness and diligence during this whole process.

An update: the redirects and 404 pages are now working. The remaining issues are purely cosmetic and probably have to do with how R Markdown sites get rendered: https://community.rstudio.com/t/trouble-customizing-404-page-of-r-markdown-website/93142.
