Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Play nicer with devtools::load_all() #30

Closed
wlandau-lilly opened this issue Apr 14, 2017 · 6 comments
Closed

Play nicer with devtools::load_all() #30

wlandau-lilly opened this issue Apr 14, 2017 · 6 comments

Comments

@wlandau-lilly
Copy link
Collaborator

Packages loaded by the user before make() need to be loaded again for each process in some of the more advanced forms of parallel computing. That's the whole point of the packages and prework arguments.

Maybe drake should try require() first and then devtools::load_all(pkg = path_to_package) if that fails. But then I need to know path_to_package, so it may not be such a trivial fix. But since packages are loaded before any prework is done or any targets are made, this is not likely to exacerbate #13.

@wlandau-lilly wlandau-lilly changed the title Automatically detect and load packages loaded with devtools::load_all() Automatically detect and correctly load packages loaded with devtools::load_all() Apr 14, 2017
@wlandau-lilly wlandau-lilly changed the title Automatically detect and correctly load packages loaded with devtools::load_all() Detect if a package should be loaded with devtools::load_all() instead of library() May 4, 2017
@wlandau-lilly wlandau-lilly changed the title Detect if a package should be loaded with devtools::load_all() instead of library() Automatically detect which packages should be loaded with devtools::load_all() instead of library() May 4, 2017
@wlandau-lilly
Copy link
Collaborator Author

wlandau-lilly commented Jun 9, 2017

There are apparently more problems with devtools::load_all() and drake:

system("git clone git@github.com:wlandau/JeffLeekMeme")
## ...
library(drake)
prework = "devtools::load_all('JeffLeekMeme')" # git clone git@github.com:wlandau/JeffLeekMeme
myplan = plan(x = JeffLeek(), y = JeffLeek())
myplan
##   target    command
## 1      x JeffLeek()
## 2      y JeffLeek()
## make(myplan, prework = prework, parallelism = "parLapply", jobs = 2)
## starting worker pid=26093 on localhost:11020 at 12:37:11.555
## starting worker pid=26124 on localhost:11020 at 12:37:11.868
## Loading JeffLeekMeme
## Loading JeffLeekMeme
## import JeffLeek
## Warning: namespace 'JeffLeekMeme' is not available and has been replaced
## by .GlobalEnv when processing object ''
## build y
## build x
## Jeff Leek can fit a regression line with one point. And get a variance. - Brian Caffo
## Jeff Leek can fit a regression line with one point. And get a variance. - Brian Caffo

And if devtools::load_all("JeffLeekMeme") is called before make() in the code, there is an actual error if parallelism = "parLapply" and jobs = 2.

@wlandau-lilly wlandau-lilly changed the title Automatically detect which packages should be loaded with devtools::load_all() instead of library() Play nicer with devtools::load_all() Jun 9, 2017
@wlandau-lilly wlandau-lilly modified the milestone: v3.0.1 Jul 12, 2017
@wlandau-lilly
Copy link
Collaborator Author

After reflecting on duncantl/CodeDepends#16, devtools::load_all() no longer seems like a big deal. It is reasonable to have to include it in the prework rather than packages and expect possible warnings with parLapply() parallelism. Will reopen on request.

@violetcereza
Copy link
Contributor

violetcereza commented Sep 25, 2017

I don't totally understand how environments work, but I've been struggling to use drake within a package format with devtools, and it turns out that this works for me:

env = devtools::load_all()$env
drake::make(my_plan, envir = env)

I'm posting this for posterity. Please LMK if this has unseen implications or something!

EDIT: this doesn't work for Makefile parallelism however... the best solution still seems to be to source the scripts containing the functions you want, so they are attached to the global environment

@wlandau-lilly
Copy link
Collaborator Author

Interesting, I did not know you could get an environment from load_all(). What exactly went wrong for Makefile parallelism? My guess is that R treats package environments differently somehow. You may have to deep copy the environment into a different one first. Slow and inefficient, but maybe worth a shot. If it works, it should definitely go in a vignette or example.

env = devtools::load_all()$env
e <- list2env(as.list(env), parent = globalenv())
drake::make(my_plan, envir = env, parallelism = "Makefile", jobs = 2)

Also, feel free to weigh in on #6. The eventual solution could affect how this sort of situation is handled. I too like to put as much into packages as I can, including data analysis workflows.

@violetcereza
Copy link
Contributor

Ok, after a bunch of trial and error, here's a version that runs with Makefile parallelism without errors or warnings:

env <- devtools::load_all()$env
env <- list2env(as.list(env), parent = globalenv())

# Set the environment of every function in
# our package to our copied environment env.
for (n in ls(env))
  assign(n,
         envir = env,
         value =
           `environment<-`( get(n, envir = env), env )
         )

my_plan <- get_plan()

drake::make(my_plan,
            envir = env,
            jobs = 4, parallelism = "Makefile",
            packages = setdiff(.packages(), "YOURDEVTOOLSPACKAGENAME"))

Here's what the issue was: first of all, package namespace environments are treated differently than normal environments, so I had to use your line to copy the environment with list2env. I don't know if there's a more elegant way to do this, or if something within drake needs to be changed. Here is the line in drake that caused the problem: (Makefile.R)

if (identical(globalenv(), config$envir)){

Then, the issue became that functions that referenced other functions in the package couldn't see each-other. It turns out that the environment for functions is set when they are first defined, so eval(envir = ) didn't effect the execution environment within the function! To fix this, I had to manually move over the environment binding for each function within the package using environment<-. This is a hacky fix, and might break if you have non-functions defined in the environment?

Anyway: this seems like a common-ish use case for drake, and should definitely be written into a vignette. If it makes sense to you, we could also think about ways to integrate this fix into drake.

@wlandau-lilly
Copy link
Collaborator Author

wlandau-lilly commented Sep 28, 2017

I am impressed. I spent two months trying to figure out how to handle environments in the early days of drake, and I could not find a way to change the environment bindings the way you did.

However, I am hesitant to change the core internals of drake to accommodate this use case. I like leaving the evaluation environment alone (except for adding and removing built targets) so the users know exactly what they are getting. Whenever I tried reconfiguring the environment, whether the user's environment or a new isolated one, I always ran into lexical scoping problems eventually. Plus, if f <-Vectorize(...), then the original function is stored in environment(f)$FUN, so we should be careful about reassigning function environments.

If you want to write a vignette and/or an example, I would gladly accept it. I would probably even use it for my own drake workflows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants