tar_cue() change argument similar to change in drake::trigger() #130

jaredlander · 2020-07-29T05:32:10Z

Prework

I understand and agree to targets' code of conduct.
I understand and agree to targets' contributing guidelines.
New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

While dynamic files make it easy to trigger targets when a file on disk changes, there does not seem to be a way to do the same when a database table changes, at least as far as I can tell.

Per Chapter 4 of the {drake} book, I would do something like this:

library(DBI)

# Connection objects are brittle, so they should not be targets.
# We define them up front, and we use ignore() to prevent
# drake from rerunning targets when the connection object changes.
con <-  dbConnect(...)

plan <- drake_plan(
  data = target(
     dbReadTable(ignore(con), "my_table"), # Use ignore() for db connection objects.
     trigger = trigger(change = somehow_get_db_timestamp()) # Define yourself.
  ),
  preprocess = my_preprocessing(data) # runs when the data change
)

I would imagine something similar to the change argument could be added to tar_cue(). From my preliminary reading of the help file, tar_cue() only has logical arguments and nowhere to evaluate code such as somehow_get_db_timestamp(). I suppose we could change the value passed to mode based on an earlier target, but I haven't tested that.
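The drake plan above leaves somehow_get_db_timestamp() for the user to define. One possible sketch assumes the watched table has an `updated_at` column that the application sets on every write. That is an assumption: many schemas have no such column, and the right query depends on the database. The helper name `get_table_timestamp()` is hypothetical, and the usage lines rely on the RSQLite package for a throwaway in-memory database.

```r
library(DBI)

# Hypothetical stand-in for somehow_get_db_timestamp().
# Assumes the table has an `updated_at` column that is set on every write.
get_table_timestamp <- function(con, table = "mytable") {
  sql <- paste0(
    "SELECT MAX(updated_at) AS last_update FROM ",
    dbQuoteIdentifier(con, table) # Quote the table name safely.
  )
  dbGetQuery(con, sql)$last_update
}

# Usage with an in-memory SQLite database (requires RSQLite).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(
  con, "mytable",
  data.frame(x = 1:3, updated_at = c("2020-07-28", "2020-07-29", "2020-07-27"))
)
get_table_timestamp(con) # returns "2020-07-29"
dbDisconnect(con)
```

ISO 8601 date strings sort correctly as text, so MAX() works on them. The value only needs to change whenever the table changes. It does not have to be a real timestamp.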


wlandau · 2020-07-29T12:30:51Z

The change trigger from drake is one of those features I am deliberately excluding from targets. It turns out to be super complicated to implement and maintain, and we do not actually gain much because it is almost always possible to find workarounds that do not use it. You can achieve the same effect with a global variable in _targets.R or with a dependency on an upstream target in the pipeline that always runs (with tar_cue(mode = "always")).

jaredlander · 2020-07-29T15:21:14Z

You can achieve the same effect with a global variable in _targets.R or with a dependency on an upstream target in the pipeline that always runs (with tar_cue(mode = "always")).

I thought about that last night and almost didn't file the feature request, but I couldn't quite figure it out. I thought to make a target that always checks the database time, but I wasn't sure how to get a downstream target to depend on it. I built the following pipeline:

library(targets)
source("R/functions.R")
con <- DBI::dbConnect()
tar_options(packages = c("DBI"))
tar_pipeline(
    tar_target(
        database_update_time,
        somehow_get_db_timestamp(),
        cue=tar_cue(mode='always')
    ),
    tar_target(
        read_data,
        dbReadTable(con, 'mytable')
    )
)

I'm not sure how read_data can check on the status of database_update_time. Would it be something crude like

tar_target(
    read_data,
    if (database_update_time) { dbReadTable(con, 'mytable') }
)

or am I missing something very obvious?

wlandau · 2020-07-29T16:11:20Z

The key is to force read_data to depend on database_update_time. You could do this either with a global object or a target. Either way, please be sure to check tar_glimpse() or tar_visnetwork() so you know the dependency relationships are correct. Also, transient objects like con tend to invalidate easily and not apply to external R processes, so I would generate them within targets as opposed to up front. The following two options are basically equivalent.

# _targets.R
library(targets)
tar_options(packages = c("DBI"))

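read_db <- function(database_update_time) { # The argument only creates the dependency.
  con <- DBI::dbConnect() # Supply your driver and connection details.
  on.exit(DBI::dbDisconnect(con)) # Disconnect even if the read fails.
  dbReadTable(con, 'mytable')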
}

tar_pipeline(
    tar_target(
        database_update_time,
        somehow_get_db_timestamp(),
        cue=tar_cue(mode='always')
    ),
    tar_target(
        read_data,
        read_db(database_update_time)
    )
)

# _targets.R
library(targets)
tar_options(packages = c("DBI"))

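read_db <- function(database_update_time) { # The argument only creates the dependency.
  con <- DBI::dbConnect() # Supply your driver and connection details.
  on.exit(DBI::dbDisconnect(con)) # Disconnect even if the read fails.
  dbReadTable(con, 'mytable')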
}

database_update_time <- somehow_get_db_timestamp()

tar_pipeline(
    tar_target(
        read_data,
        read_db(database_update_time)
    )
)

I put the code in a custom read_db() function just for convenience.

jaredlander · 2020-07-29T16:18:38Z

Thanks for the info. I had a feeling I would need to write a function that wrapped dbReadTable() in order to pass in a target. But you allude to that being just for convenience. Does that mean it's possible not to have to create a custom function?

Also, transient objects like con tend to invalidate easily and not apply to external R processes, so I would generate them within targets as opposed to up front.

This is different from how I was using {drake}, so it's good to know. Thank you. And from your example, I assume you mean as part of the target using it, not as its own target.

wlandau · 2020-07-29T20:23:52Z

Does that mean it's possible not to have to create a custom function?

Yes, a target's command can be an arbitrary code chunk. I just find functions cleaner and clearer in most situations.

# _targets.R
library(targets)
tar_options(packages = c("DBI"))

database_update_time <- somehow_get_db_timestamp()

tar_pipeline(
    tar_target(
        read_data, {
            database_update_time # Mention the symbol to enforce the dependency relationship.
            con <- DBI::dbConnect() # Supply your driver and connection details.
            on.exit(DBI::dbDisconnect(con)) # Disconnect even if the read fails.
            dbReadTable(con, 'mytable')
        }
    )
)
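To confirm that a bare symbol inside a command is enough to create the dependency, targets provides tar_deps(), which lists the dependencies it detects in a function or expression through static code analysis. A minimal check, assuming a recent targets release. Nothing in the expression is evaluated, so the placeholder dbConnect() call is harmless here.

```r
library(targets)

# tar_deps() quotes its argument and analyzes it statically;
# the code inside the braces is never run.
deps <- tar_deps({
  database_update_time # Mention the symbol to enforce the dependency relationship.
  con <- DBI::dbConnect()
  on.exit(DBI::dbDisconnect(con))
  dbReadTable(con, 'mytable')
})

print(deps)
"database_update_time" %in% deps
```

tar_glimpse() and tar_visnetwork() show the same relationship at the level of the whole pipeline.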

And from your example I assume you mean as part of the target using it, not its own target.

Yes, exactly. A non-exportable object such as a connection object should be defined as part of the command of the target that needs it. targets has parallel computing capabilities, so it's not always clear that a target will run on the same process or even the same computer as your local R session.
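This pattern can be checked outside targets. A helper that opens its own connection, reads, and disconnects on exit never needs to ship a connection object to another process, because every call creates a fresh connection on whichever worker runs it. A sketch using RSQLite and a temporary file as a stand-in for the real database. `read_table_locally()` and its arguments are hypothetical.

```r
# Hypothetical helper: the connection exists only inside this call,
# so nothing unexportable has to travel to a parallel worker.
read_table_locally <- function(db_path, table, database_update_time = NULL) {
  # database_update_time is accepted but unused: it only exists so that
  # a target's command can mention the upstream target's symbol.
  con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
  on.exit(DBI::dbDisconnect(con), add = TRUE) # Runs even if the read fails.
  DBI::dbReadTable(con, table)
}

# Usage: build a throwaway SQLite file, then read it back through the helper.
path <- tempfile(fileext = ".sqlite")
setup <- DBI::dbConnect(RSQLite::SQLite(), path)
DBI::dbWriteTable(setup, "mytable", data.frame(x = 1:3))
DBI::dbDisconnect(setup)

result <- read_table_locally(path, "mytable")
nrow(result) # 3
```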

jaredlander · 2020-07-29T20:52:56Z

Yes, a target's command can be an arbitrary code chunk. I just find functions cleaner and clearer in most situations.

Ah, you meant putting several commands in a curly-brace block. Got it.

Yes, exactly. A non-exportable object such as a connection object should be defined as part of the command of the target that needs it. targets has parallel computing capabilities, so it's not always clear that a target will run on the same process or even the same computer as your local R session.

Cool. Thanks for the info. This solves the problem for me.

wlandau · 2020-08-03T20:45:06Z

As I alluded to in #131, I think we can address your original request externally with ropensci/tarchetypes#2.

multimeric · 2022-09-26T01:12:43Z

Okay, this is cool. So you can use tarchetypes::tar_change() to resolve this. It acts like tar_target() but has a change argument containing code that forces the command to rerun whenever that code evaluates to something different. The implementation seems to split the target into two targets: one that does the invalidation, and a second, dependent on the first, that runs the actual command.
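A sketch of the tar_change() approach as a complete _targets.R, written for the current targets API as I understand it (tar_option_set() in place of tar_options(), and a plain list of targets in place of tar_pipeline()). The SQLite file my.sqlite, the updated_at column, and both helper functions are hypothetical placeholders for your own database code.

```r
# _targets.R
library(targets)
library(tarchetypes)
tar_option_set(packages = "DBI")

db_path <- "my.sqlite" # Hypothetical database file.

# Hypothetical helper: assumes mytable has an updated_at column set on every write.
get_db_timestamp <- function(path) {
  con <- dbConnect(RSQLite::SQLite(), path)
  on.exit(dbDisconnect(con))
  dbGetQuery(con, "SELECT MAX(updated_at) AS ts FROM mytable")$ts
}

# The connection is opened and closed inside the function, on whichever worker runs it.
read_db <- function(path) {
  con <- dbConnect(RSQLite::SQLite(), path)
  on.exit(dbDisconnect(con))
  dbReadTable(con, "mytable")
}

list(
  # tar_change() adds an auxiliary upstream target that always reruns
  # get_db_timestamp(); read_data rebuilds only when that value changes.
  tar_change(read_data, read_db(db_path), change = get_db_timestamp(db_path))
)
```

Running tar_make() twice without touching the table should skip read_data on the second run, while a write that bumps updated_at should cause it to rebuild.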

jaredlander added the type: new feature label Jul 29, 2020

jaredlander assigned wlandau Jul 29, 2020

jaredlander changed the title ~~command argument for tar_cue()~~ change argument for tar_cue() Jul 29, 2020

jaredlander changed the title ~~change argument for tar_cue()~~ tar_cue() change argument similar to command in drake::trigger() Jul 29, 2020

wlandau changed the title ~~tar_cue() change argument similar to command in drake::trigger()~~ tar_cue() change argument similar to change in drake::trigger() Jul 29, 2020

wlandau closed this as completed Jul 29, 2020

noamross mentioned this issue Jul 29, 2020

Return timestamps in tar_meta() as POSIXct #131 (closed)

This was referenced Aug 3, 2020

Replicate the change trigger from drake ropensci/tarchetypes#2 (closed)

Replicate the condition trigger from drake ropensci/tarchetypes#3 (closed)
