
tar_cue() change argument similar to change in drake::trigger() #130

Closed

jaredlander opened this issue Jul 29, 2020 · 8 comments

@jaredlander

Prework

  • I understand and agree to targets' code of conduct.
  • I understand and agree to targets' contributing guidelines.
  • New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.

Proposal

While dynamic files make it easy to trigger targets when a file on disk changes, there does not seem to be a way to do the same when a database table changes, at least as far as I can tell.

Per Chapter 4 of the {drake} book, I would do something essentially like this.

library(DBI)

# Connection objects are brittle, so they should not be targets.
# We define them up front, and we use ignore() to prevent
# drake from rerunning targets when the connection object changes.
con <- dbConnect(...)

plan <- drake_plan(
  data = target(
     dbReadTable(ignore(con), "my_table"), # Use ignore() for db connection objects.
     trigger = trigger(change = somehow_get_db_timestamp()) # Define yourself.
  ),
  preprocess = my_preprocessing(data) # runs when the data change
)

I would imagine something similar to the change argument could be added to tar_cue(). From my preliminary reading of the help file, tar_cue() only has logical arguments and nowhere to evaluate code such as somehow_get_db_timestamp(). I suppose we could change the value passed to mode based on an earlier target, but I haven't tested that.
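For context, somehow_get_db_timestamp() in the snippet above is a placeholder the {drake} manual asks you to define yourself. It could be any query whose result changes whenever the table changes. A hypothetical sketch, assuming the table has an updated_at column (both the column name and the helper are illustrative, not part of any API):

```r
# Hypothetical helper: return the newest modification time in the table.
# Any query whose result changes when the table changes would work here.
somehow_get_db_timestamp <- function(con) {
  DBI::dbGetQuery(con, "SELECT MAX(updated_at) AS ts FROM my_table")$ts
}
```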

@jaredlander jaredlander changed the title command argument for tar_cue() change argument for tar_cue() Jul 29, 2020
@jaredlander jaredlander changed the title change argument for tar_cue() tar_cue() change argument similar to command in drake::trigger() Jul 29, 2020
@wlandau wlandau changed the title tar_cue() change argument similar to command in drake::trigger() tar_cue() change argument similar to change in drake::trigger() Jul 29, 2020
@wlandau
Member

wlandau commented Jul 29, 2020

The change trigger from drake is one of those features I am deliberately excluding from targets. It turns out to be super complicated to implement and maintain, and we do not actually gain much because it is almost always possible to find workarounds that do not use it. You can achieve the same effect with a global variable in _targets.R, or with a dependency on an upstream target in the pipeline that always runs (with tar_cue(mode = "always")).

@wlandau wlandau closed this as completed Jul 29, 2020
@jaredlander
Author

You can achieve the same effect with a global variable in _targets.R, or with a dependency on an upstream target in the pipeline that always runs (with tar_cue(mode = "always")).

I thought about that last night and almost didn't file the feature request, but I couldn't quite figure it out. I thought to make a target that always checks the database time, but I wasn't sure how to get a downstream target to depend on it. I built the following pipeline.

library(targets)
source("R/functions.R")
con <- DBI::dbConnect()
tar_options(packages = c("DBI"))
tar_pipeline(
    tar_target(
        database_update_time,
        somehow_get_db_timestamp(),
        cue=tar_cue(mode='always')
    ),
    tar_target(
        read_data,
        dbReadTable(con, 'mytable')
    )
)

I'm not sure how read_data can check on the status of database_update_time. Would it be something crude like

tar_target(
    read_data,
    if (database_update_time) { dbReadTable(con, 'mytable') }
)

or am I missing something very obvious?

@wlandau
Member

wlandau commented Jul 29, 2020

The key is to force read_data to depend on database_update_time. You could do this either with a global object or a target. Either way, please be sure to check tar_glimpse() or tar_visnetwork() so you know the dependency relationships are correct. Also, transient objects like con tend to invalidate easily and not apply to external R processes, so I would generate them within targets as opposed to up front. The following two options are basically equivalent.

# _targets.R (option 1: upstream target)
library(targets)
tar_options(packages = c("DBI"))

read_db <- function(database_update_time) { # argument exists only to force the dependency
  con <- DBI::dbConnect()
  on.exit(DBI::dbDisconnect(con))
  dbReadTable(con, 'mytable')
}

tar_pipeline(
    tar_target(
        database_update_time,
        somehow_get_db_timestamp(),
        cue=tar_cue(mode='always')
    ),
    tar_target(
        read_data,
        read_db(database_update_time)
    )
)
# _targets.R (option 2: global variable)
library(targets)
tar_options(packages = c("DBI"))

read_db <- function(database_update_time) { # argument exists only to force the dependency
  con <- DBI::dbConnect()
  on.exit(DBI::dbDisconnect(con))
  dbReadTable(con, 'mytable')
}

database_update_time <- somehow_get_db_timestamp()

tar_pipeline(
    tar_target(
        read_data,
        read_db(database_update_time)
    )
)

I put the code in a custom read_db() function just for convenience.

@jaredlander
Author

Thanks for the info. I had a feeling I would need to write a function that wrapped dbReadTable() in order to pass in a target. But you allude to that being just for convenience. Does that mean it's possible to avoid creating a custom function?

Also, transient objects like con tend to invalidate easily and not apply to external R processes, so I would generate them within targets as opposed to up front.

This is different from how I was using {drake}, so good to note, thank you. And from your example I assume you mean as part of the target using it, not as its own target.

@wlandau
Member

wlandau commented Jul 29, 2020

Does that mean it's possible not to have to create a custom function?

Yes, a target's command can be an arbitrary code chunk. I just find functions cleaner and clearer in most situations.

# _targets.R
library(targets)
tar_options(packages = c("DBI"))

database_update_time <- somehow_get_db_timestamp()

tar_pipeline(
    tar_target(
        read_data, {
            database_update_time # Mention the symbol to enforce the dependency relationship.
            con <- DBI::dbConnect()
            on.exit(DBI::dbDisconnect(con))
            dbReadTable(con, 'mytable')
        }
    )
)

And from your example I assume you mean as part of the target using it, not its own target.

Yes, exactly. A non-exportable object such as a connection object should be defined as part of the command of the target that needs it. targets has parallel computing capabilities, so it's not always clear that a target will run on the same process or even the same computer as your local R session.

@jaredlander
Author

Yes, a target's command can be an arbitrary code chunk. I just find functions cleaner and clearer in most situations.

Ah, you meant putting a few commands in a curly-brace block, got it.

Yes, exactly. A non-exportable object such as a connection object should be defined as part of the command of the target that needs it. targets has parallel computing capabilities, so it's not always clear that a target will run on the same process or even the same computer as your local R session.

Cool. Thanks for the info. This tackles this problem for me.

@wlandau
Member

wlandau commented Aug 3, 2020

As I alluded to in #131, I think we can address your original request externally with ropensci/tarchetypes#2.

@multimeric

Okay, this is cool. So you can use tarchetypes::tar_change() to resolve this. It acts like tar_target(), but has a change argument containing code that forces a rerun of the command whenever it evaluates to something different. The implementation splits the target into two targets: one that does the invalidation, and a second, dependent on the first, that runs the actual command.
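Applied to the original request, a minimal sketch might look like the following. It assumes the hypothetical somehow_get_db_timestamp() helper from earlier in the thread and uses the tar_options()/tar_pipeline() style of the other examples here:

```r
# _targets.R (sketch, not a definitive implementation)
library(targets)
library(tarchetypes)
tar_options(packages = c("DBI"))

tar_pipeline(
  # tar_change() expands to two targets: an always-running upstream target
  # that records the value of `change`, and the command itself, which
  # reruns only when that recorded value differs from the previous run.
  tar_change(
    read_data,
    {
      con <- DBI::dbConnect() # supply your driver and credentials here
      on.exit(DBI::dbDisconnect(con))
      dbReadTable(con, "mytable")
    },
    change = somehow_get_db_timestamp() # hypothetical helper, define yourself
  )
)
```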
