Using drake with RStudio job launcher #807
Thanks for investigating. I do not have access to the RStudio Job Launcher, but I will try to follow along. Could we try something even simpler? What if we used the following?

    library(drake)
    library(readr)

    import_data <- function(infile) {
      suppressMessages(read_csv(infile))
    }

    stopifnot(file.exists("data/mtcars.csv"))
    plan <- drake_plan(data = import_data(file_in("data/mtcars.csv")))
    prefix <- paste(Sys.info()["nodename"], proc.time()["elapsed"], stringi::stri_rand_strings(1, 10), sep = "-")
    make(
      plan,
      cache_log_file = paste0(prefix, "-cache.log"),
      console_log_file = paste0(prefix, "-console.log") # much more useful post-#808
    )
    config <- drake_config(plan)
    vis_drake_graph(config, file = paste0(prefix, ".png")) # requires webshot::install_phantomjs() first

What do the results look like if you run the following with a fresh session and cache?

    callr::rscript("make.R", show = TRUE)
    rstudioapi::jobRunScript("make.R")

And what if we replace
I noticed that the dependency profile of So it looks like |
|
Thanks for the advice! I tried your suggestions and added them here, but maybe it's too minimal? I didn't see the same behavior I'm seeing in the larger example.
I actually don't have access to this either, but until now I've considered the local job launcher in RStudio to be a great way to spawn background processes in new, clean sessions.
I agree, and this is what led me to be suspicious of the environment. Just a shot in the dark, but I think the job launcher uses

    sourceWithProgress <- function(script, # path to R script
                                   ...
    ){
      # create a new environment to host any values created; make its parent the global env so any
      # variables inside this function's environment aren't visible to the script
      sourceEnv <- new.env(parent = globalenv())
      ...
      # evaluate the statement
      eval(statements[[idx]], envir = sourceEnv)
      ...
    }

Edit (sent too soon): So maybe this is why the
Could be. One thing you could try now is replacing the minimal plan with the larger one you started with and keep everything else the same.
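For anyone following along, here is a minimal base-R sketch of the evaluation pattern quoted above. It is only an illustration of that snippet, not the job launcher's actual code: the temporary script, `double_it`, and `source_env` are hypothetical names made up for the demo.

```r
# Hypothetical stand-in for a script that a background job would run.
script <- tempfile(fileext = ".R")
writeLines(c(
  "double_it <- function(x) 2 * x",
  "answer <- double_it(21)"
), script)

# Mimic the quoted pattern: evaluate each top-level statement in a fresh
# environment whose parent is the global environment.
source_env <- new.env(parent = globalenv())
statements <- parse(file = script)
for (idx in seq_along(statements)) {
  eval(statements[[idx]], envir = source_env)
}

# Objects created by the script live in source_env, not in the global env.
in_source_env <- exists("double_it", envir = source_env, inherits = FALSE)
in_global_env <- exists("double_it", envir = globalenv(), inherits = FALSE)

# The function's enclosing environment is source_env as well.
fn_env_is_source_env <- identical(environment(source_env$double_it), source_env)
```

If the job launcher works roughly like this, anything that looks for functions in the global environment by default would see a different set of objects than it does in an ordinary session.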
Could easily be. It is difficult to micromanage the environment in which you In your original example, you had a top-level |
|
I think you're right @wlandau! Moving the more complete plan into a single I then moved one function into a separate file and |
|
Thanks for the thorough and pedantic detective work, @gadenbuie. I think we know what is going on now. Also, your original solution of
Yeah, |
Prework
drake's code of conduct.
Description
I have a drake workflow with a long-running step and in my pre-drake life I would use the RStudio Job Launcher to run these kinds of tasks in the background so I can keep coding in my console.
I've discovered, though, that without additional work, running drake::make() as an RStudio background job invalidates targets that depend on functions sourced into the global environment. Running drake::make() or drake::r_make() from a standard environment after running make() inside the RStudio job launcher will again invalidate old but up-to-date targets, including those just built by the job launcher.
It took me quite a bit of poking around to pare down the problem to the reproducible example in this repo, but in doing so I think I've uncovered that the invalidation results from some minor changes to the global environment that RStudio uses to monitor script progress and update the job viewer.
A side-effect of this is that all of the hashes in the cache log are the same between steps, so it's not immediately obvious why the targets are invalidating. Some digging with deps_profile() eventually provides clues that something is amiss. In the end, using an environment for the function dependencies clears up the problem.
I'm sharing here because I thought you might like to know and to find out if I'm doing anything very wrong in my setup. I'm not sure if there's anything you can or would want to do about this, other than to document the issue, and with RStudio 1.2 releasing soon (or eventually) I would imagine that other drake users will try to do what I did and be equally confused. Also, if the workflow and solution in my reprex repo seem reasonable, I'll wrap it up into a blog post.
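As a rough sketch of the environment-based approach mentioned above (not necessarily the exact code in the reprex repo): load the functions into a dedicated environment and hand it to make() through its envir argument. The temporary functions file, `double_it`, `fn_env`, and the throwaway cache are assumptions made up for this illustration.

```r
# Hypothetical functions file; in a real project this would be your own script.
functions_file <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", functions_file)

# Load the functions into a dedicated environment instead of relying on
# whatever the global environment happens to contain in the current session.
fn_env <- new.env(parent = globalenv())
source(functions_file, local = fn_env)

loaded_in_fn_env <- exists("double_it", envir = fn_env, inherits = FALSE)

# Pass that environment to drake explicitly (runs only if drake is installed).
if (requireNamespace("drake", quietly = TRUE)) {
  plan <- drake::drake_plan(result = double_it(21))
  cache <- drake::new_cache(tempfile()) # throwaway cache, for the demo only
  drake::make(plan, envir = fn_env, cache = cache)
}
```

The idea is that the same script then resolves its function dependencies the same way whether it runs in the console, through callr, or as a background job.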