A new interface for triggers #473

wlandau · 2018-07-20T02:40:09Z

The problem

The current trigger interface is limiting. The choices "any", "always", "depends", "command", and "file" miss important use cases. For example, what if we want to check some remote data before deciding whether to build a target? For most projects, we would need to create a special upstream target with the fingerprint of the data and then have the real target depend on that fingerprint.

Proposal

It would be better to let users write arbitrary R code for each trigger.

drake_plan(
  target1 = {
    get_data(file_in("data.csv"))
    trigger(
      command = FALSE,
      logical = today() == "Tuesday"
    )
  }
  target2 = trigger(custom = remote_hash()),
  target3 = {
    dplyr::bind_rows(target1, tail(target2))
    trigger(
      file = FALSE,
      condition = is_small_enough(target1)
    )
  },
  target4 = {
    summary(target3)
    trigger(
      depends = TRUE,
      change = head(target3, 1)
    )
  }
)

Proposed trigger() function:

#' @title Set the trigger of a target.
#' @description Use this function inside a target's command
#'   in your [drake_plan()]. The target will rebuild if and only if:
#'   - Any of `command`, `depends`, or `file` is `TRUE`, or
#'   - `condition` evaluates to `TRUE`, or
#'   - `change` evaluates to a value different from last time.
#' @export
#' @seealso [drake_plan()], [make()]
#' @return a list of trigger specification details that
#'   `drake` processes internally when it comes time to decide
#'   whether to build the target.
#' @param command logical, whether to rebuild the target if the
#'   [drake_plan()] command changes.
#' @param depends logical, whether to rebuild if a
#'   non-file dependency changes.
#' @param file logical, whether to rebuild the target
#'   if a [file_in()]/[file_out()]/[knitr_in()] file changes.
#' @param condition R code (expression or language object)
#'   that returns a logical. The target will rebuild
#'   if the code evaluates to `TRUE`.
#' @param change R code (expression or language object)
#'  that returns any value. The target will rebuild
#'   if that value is different from last time
#'   or not already cached.
trigger <- function(
  command = TRUE,
  depends = TRUE,
  file = TRUE,
  condition = NULL,
  change = NULL
){
  stopifnot(is.logical(command))
  stopifnot(is.logical(depends))
  stopifnot(is.logical(file))
  list(
    command = command,
    depends = depends,
    file = file,
    condition = rlang::enexpr(condition),
    change = rlang::enexpr(change)
  )
}

People so far

@noamross brought up a motivating scenario in #252. @AlexAxthelm also has a good idea of what is needed because of his drake-powered work with databases. And of course, @krlmlr had the vision behind some of the best parts of the interface, including #232.

Please let me know if I am forgetting anyone. If possible, it would be nice to have input from affected power users.

The text was updated successfully, but these errors were encountered:

wlandau · 2018-07-21T12:40:35Z

Also cc @kendonB.

krlmlr · 2018-07-21T14:45:42Z

I'm not too sure about the syntax. I remember a discussion about a drake_command() function that returns a one-row tibble.

drake_plan(
  target1 = drake_command(
    get_data(file_in("data.csv")),
    trigger(
      command = FALSE,
      logical = today() == "Tuesday"
    )
  ),
  ...
)

kendonB · 2018-07-21T15:59:34Z

I like the idea and think @krlmlr's syntax is the tidiest. My only concern would be that introducing a change like this might make large plans very slow to start making.

wlandau · 2018-07-21T18:57:29Z

I like Kirill's approach too, mainly because it will be SO much easier to implement. I won't need to complicate the code analysis or sensitive internal bookkeeping. Also, there is a clear separation between the command and the trigger, which is great for users.

The target() function is the one that returns the 1-row tibble, and it works inside and outside of drake_plan().

library(drake)
pkgconfig::set_config("drake::strings_in_dots" = "literals")
drake_plan(
  x = target(
    command = 1 + 1,
    trigger = "always"
  )
)
#> # A tibble: 1 x 3
#>   target command trigger
#> * <chr>  <chr>   <chr>  
#> 1 x      1 + 1   always
target(
  command = 1 + 1,
  trigger = "always"
)
#> # A tibble: 1 x 2
#>   command trigger
#>   <chr>   <chr>  
#> 1 1 + 1   always

Question: should we detect dependencies from the trigger column of the drake_plan()? On the one hand, like @AlexAxthelm said in our conversation on Thursday, we may need some targets to be up to date before a custom trigger can be evaluated properly. On the other hand, if triggers can change the dependency structure of the workflow, targets could fall out of date from changes to the condition or change argument to trigger(). At this point, I would prefer to

Not add any more dependency detection, and
In the manual, caution users about using targets in trigger() and encourage to also mention them in the command so they register as dependencies.

wlandau · 2018-07-21T18:58:42Z

wlandau · 2018-07-21T19:02:20Z

And of course, we'll want to elaborate on triggers, custom columns, and target() in the drake_plan() chapter of the manual.

wlandau · 2018-07-21T19:06:21Z

@kendonB, Kirill's approach will definitely be at least as fast to execute as what I had in mind, probably faster because there is less static code analysis. If we check the condition trigger first and the value trigger last, I do not think there will be a loss in performance during make().

krlmlr · 2018-07-21T20:35:07Z

You could change the interface of target(...) to target(.command, ...) ; then the usual R argument matching kicks in.

Just thinking out loud. I'm slightly worried about the added complexity, especially in the light of brittle dependency detection for trigger conditions. I'd be very much interested in a "bare bones" version of drake that is restricted to evaluating commands and scheduling jobs, given the dependency graph and perhaps a run time estimate. Maybe this can be made available as a lower-level interface inside drake, or as a separate package? Complex triggers could then be implemented by automatically adding intermediate helper targets as necessary. (Error and progress messages would use artificial target names, though. Pretty sure there are other complications I'm not aware of.)

wlandau · 2018-07-21T23:30:19Z

A bare bones job scheduler is what I was going for with workers, but I have still not figured out a way to cleverly disentangle the job scheduling internals from everything else. I am continuing thinking about it, but I consider it very long term.

As for triggers, my original proposal would have added clutter. But with your idea, I feel there is a lot of gain and not much added complexity.

wlandau · 2018-07-22T15:48:04Z

Changing my mind about detecting dependencies from triggers. I think we should analyze the code of plan$trigger just like plan$command. Reasons:

The behavior more self-consistent.
It avoids surprises. Users expect drake to magically detect dependencies, and it should do so.
Yes, this could potentially make workflows more brittle, but only in a select few edge cases. There are so few times when users will really need the condition or change argument of trigger().

AlexAxthelm · 2018-07-23T14:52:22Z

I'll chime in that having plan$trigger be analyzed the same way as the command would also be preferable in my mind, since I will probably be writing custom trigger functions (sampling from remote databases, to check if contents have updated), and I would want drake to know if I've changed my function.

wlandau · 2018-07-23T18:39:11Z

I think I've got you covered, Alex. The way development is going, if you have a target x with trigger(condition = f()), drake will make sure f() is checked and processed before building x. By default, it will also rebuild x whenever f() changes because drake does not distinguish between the dependencies of the command and those of the trigger. It is a useful simplifying assumption that greatly reduces the complexity of the implementation, but it may not be exactly what you want. To decrease sensitivity, there are times when you might want trigger(condition = f(), depends = FALSE).

wlandau · 2018-07-23T22:39:54Z

Implemented via #478. The tests are very thorough, and it's on the master branch now. Let's keep iterating. I do not plan to submit to CRAN again until around August 10.

wlandau · 2018-07-24T20:37:10Z

@AlexAxthelm, please take a look at #480. I think the dependencies of triggers should be checked and processed beforehand, but I do not think changes to trigger-only dependencies should necessarily cause commands to rerun. Does the interface still do what you want?

wlandau added type: new feature difficulty: advanced topic: api status: priority labels Jul 20, 2018

wlandau added the branch: i473 🛠️ label Jul 21, 2018

This was referenced Jul 22, 2018

Name arguments of target() #474

Merged

Allow user-defined columns for wildcard expansion #475

Merged

wlandau mentioned this issue Jul 23, 2018

An improved interface for triggers #478

Merged

7 tasks

wlandau closed this as completed Jul 23, 2018

wlandau mentioned this issue Jul 28, 2018

Embrace igraph vertex attributes for storing code analysis information #483

Closed

idavydov mentioned this issue Oct 4, 2018

Wildcards in trigger condition are not expanded #528

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A new interface for triggers #473

A new interface for triggers #473

wlandau commented Jul 20, 2018 •

edited

Loading

wlandau commented Jul 21, 2018

krlmlr commented Jul 21, 2018

kendonB commented Jul 21, 2018

wlandau commented Jul 21, 2018 •

edited

Loading

wlandau commented Jul 21, 2018 •

edited

Loading

wlandau commented Jul 21, 2018

wlandau commented Jul 21, 2018

krlmlr commented Jul 21, 2018

wlandau commented Jul 21, 2018

wlandau commented Jul 22, 2018 •

edited

Loading

AlexAxthelm commented Jul 23, 2018

wlandau commented Jul 23, 2018

wlandau commented Jul 23, 2018

wlandau commented Jul 24, 2018 •

edited

Loading

A new interface for triggers #473

A new interface for triggers #473

Comments

wlandau commented Jul 20, 2018 • edited Loading

The problem

Proposal

People so far

wlandau commented Jul 21, 2018

krlmlr commented Jul 21, 2018

kendonB commented Jul 21, 2018

wlandau commented Jul 21, 2018 • edited Loading

wlandau commented Jul 21, 2018 • edited Loading

wlandau commented Jul 21, 2018

wlandau commented Jul 21, 2018

krlmlr commented Jul 21, 2018

wlandau commented Jul 21, 2018

wlandau commented Jul 22, 2018 • edited Loading

AlexAxthelm commented Jul 23, 2018

wlandau commented Jul 23, 2018

wlandau commented Jul 23, 2018

wlandau commented Jul 24, 2018 • edited Loading

wlandau commented Jul 20, 2018 •

edited

Loading

wlandau commented Jul 21, 2018 •

edited

Loading

wlandau commented Jul 21, 2018 •

edited

Loading

wlandau commented Jul 22, 2018 •

edited

Loading

wlandau commented Jul 24, 2018 •

edited

Loading