Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different graphs when using pipes or traditional #335

Closed
rkrug opened this issue Mar 21, 2018 · 16 comments
Closed

Different graphs when using pipes or traditional #335

rkrug opened this issue Mar 21, 2018 · 16 comments

Comments

@rkrug
Copy link
Contributor

rkrug commented Mar 21, 2018

Please see last three graphs in attached html - am I missing something, or are pipes and traditional handled differently?

DrakeMake.html.zip

@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

It looks like you have at least two different versions of rcm_make_parameter_H_W_1999_Fig_41(), one containing rcm_make_parameter() and the other containing no other imports. Otherwise, I am not sure. It looks like this has more to do with these functions rather than magrittr pipes.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 21, 2018

Nope - the resulting configs are different:

> config_pipe <- plan %>% 
+   drake_config(verbose = TRUE)
> config <- drake_config(plan, verbose = TRUE)
> sapply(names(config), function(n){identical(config[[n]], config_piupe[[n]])})
              plan            targets              envir              cache 
              TRUE               TRUE              FALSE              FALSE 
        cache_path        fetch_cache        parallelism               jobs 
              TRUE               TRUE               TRUE               TRUE 
           verbose               hook            prepend            prework 
              TRUE               TRUE               TRUE               TRUE 
           command               args     recipe_command              graph 
              TRUE               TRUE               TRUE              FALSE 
   short_hash_algo     long_hash_algo               seed            trigger 
              TRUE               TRUE               TRUE               TRUE 
           timeout                cpu            elapsed            retries 
              TRUE               TRUE               TRUE               TRUE 
      imports_only       skip_imports skip_safety_checks       log_progress 
              TRUE               TRUE               TRUE               TRUE 
         lazy_load       session_info     cache_log_file            caching 
              TRUE               TRUE               TRUE               TRUE 
         evaluator         keep_going 
              TRUE               TRUE 
> 

@rkrug
Copy link
Contributor Author

rkrug commented Mar 21, 2018

And a reproducible example:

> plan <- drake_plan(x = 1)
> config <- drake_config(plan, verbose = TRUE)
> config_pipe <- plan %>% 
+   drake_config(verbose = TRUE)
> sapply(names(config), function(n){identical(config[[n]], config_piupe[[n]])})
              plan            targets              envir              cache         cache_path        fetch_cache 
             FALSE              FALSE              FALSE              FALSE               TRUE               TRUE 
       parallelism               jobs            verbose               hook            prepend            prework 
              TRUE               TRUE               TRUE               TRUE               TRUE               TRUE 
           command               args     recipe_command              graph    short_hash_algo     long_hash_algo 
              TRUE               TRUE               TRUE              FALSE               TRUE               TRUE 
              seed            trigger            timeout                cpu            elapsed            retries 
              TRUE               TRUE               TRUE               TRUE               TRUE               TRUE 
      imports_only       skip_imports skip_safety_checks       log_progress          lazy_load       session_info 
              TRUE               TRUE               TRUE               TRUE               TRUE               TRUE 
    cache_log_file            caching          evaluator         keep_going 
              TRUE               TRUE               TRUE               TRUE 

@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

I think that is because drake_config()'s default environment is parent.frame(), which is different if you use the pipe. You could explicitly use the envir argument.

plan <- drake_plan(x = 1)
envir <- environment()
config <- drake_config(plan, envir = envir)
config_pipe <- plan %>% 
  drake_config(envir = envir)

You can expect the graph and cache elements to still not be identical because they create environments of their own, but the nodes and edges of the graphs should agree.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 21, 2018

It can get even worse:

> library(drake)
> library(magrittr)
> expose_imports(graphics)
<environment: R_GlobalEnv>
> plan <- drake_plan(x = plot(1))
> config_pipe <- plan %>% 
+   drake_config(verbose = TRUE)
> config <- drake_config(plan, verbose = TRUE)
Error in validObject(.Object) : 
  invalid classScriptNodeInfoobject: invalid object for slot "removes" in class "ScriptNodeInfo": got class "list", should be or extend class "character"

Is there any way of excluding pipes, or raising an error if drake_config() is called within a pipe, until the environment question is solved?
I really like pipes...

@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

Sounds like a problem with the code analysis. But either way, I strongly discourage expose_imports() for external packages, especially base packages like graphics. Package reproducibility is much better handled with packrat and/or containerization (Docker/Singularity). Both are compatible with drake.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 21, 2018

Don't worry - I am not using it. It was just an example. One could actually check if the package is a base package and than throw a warning in the expose_imports() function. I completely agree that packrat / docker / etc are much better tools for this.

I am using expose_imports() on a companion package which I also develop - so no problem there.

But am I correct in assuming, that expose_imports() does not work when using pipes?

@wlandau-lilly
Copy link
Collaborator

wlandau-lilly commented Mar 21, 2018

It may work if you also pass an explicit envir argument to expose_imports().

envir <- environment()
"your_pkg" %>% expose_imports(envir = envir)

@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

Re: #335 (comment): this is a problem in CodeDepends, not drake. See duncantl/CodeDepends#27.

CodeDepends::getInputs(body(graphics::close.screen))
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
#> Error in validObject(.Object): invalid class "ScriptNodeInfo" object: invalid object for slot "removes" in class "ScriptNodeInfo": got class "list", should be or extend class "character"

@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

Confirmed: explicitly passing envir makes the behavior the same with or without the pipe. I am almost positive that it can solve your original example.

library(drake)
#> Warning: replacing previous import 'graph::addNode' by 'XML::addNode' when
#> loading 'CodeDepends'
#> Warning: replacing previous import 'graph::plot' by 'graphics::plot' when
#> loading 'CodeDepends'
library(magrittr)
expose_imports(graphics)
#> <environment: namespace:graphics>
envir <- environment()
plan <- drake_plan(x = plot(1))
config <- drake_config(plan = plan, envir = envir)
#> Error in validObject(.Object): invalid class "ScriptNodeInfo" object: invalid object for slot "removes" in class "ScriptNodeInfo": got class "list", should be or extend class "character"
config <- plan %>% drake_config(envir = envir)
#> Error in validObject(.Object): invalid class "ScriptNodeInfo" object: invalid object for slot "removes" in class "ScriptNodeInfo": got class "list", should be or extend class "character"

@wlandau wlandau closed this as completed Mar 21, 2018
@wlandau
Copy link
Collaborator

wlandau commented Mar 21, 2018

Update: with duncantl/CodeDepends@dfe31e7, the error from #335 (comment) should go away.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 22, 2018

OK - so that one should be fixed.

But I think it is a bit worrying, that in a tool like drake, which aims at reproducability, the use of pipe versus classical programming makes a difference - even if the difference is only in the graphs created, and the targets remain the same.

I think that should be addressed. One solution would be to internally use a pipe to make the traditional behave as the pipe? Rename drake_config() to .drake_config(plan, ARGUMENTS) and than

drake_config <- function(plan, ARGUMENTS) {
   plan %>% 
        .drake_config(ARGUMENTS) %>%
        return
}

@wlandau
Copy link
Collaborator

wlandau commented Mar 22, 2018

But I think it is a bit worrying, that in a tool like drake, which aims at reproducibility, the use of pipe versus classical programming makes a difference - even if the difference is only in the graphs created, and the targets remain the same.

I think reproducibility is a different set of issues:

  • Do repeated runs of the same code produce the same results?
  • Could someone else run your code on a different machine and get the same results?
  • Could someone else understand and independently replicate your work to get the same results?

If the code changes, we should expect that the results may change, even in a reproducible workflow.

I think that should be addressed. One solution would be to internally use a pipe to make the traditional behave as the pipe? Rename drake_config() to .drake_config(plan, ARGUMENTS) and than

The issue is not with the pipe, but the default value of the envir argument. By default, envir is parent.frame(). Ordinarily, parent.frame() is the environment in which the function was called. In the case of a pipe, it parent.frame() is the environment inside the function call of the previous step of the pipe. Even if we define drake_config() and .drake_config() separately, drake_config(plan) and plan %>% drake_config() could still be different if you do not pass an envir argument explicitly.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 22, 2018

Do repeated runs of the same code produce the same results?

This is given, if I assume the code is the code which

  • generates the plan
  • creates the config for drake (call to drake_config()
  • compiles the targets by calling make()

is exactly the same.

Could someone else run your code on a different machine and get the same results?

OK.

Could someone else understand and independently replicate your work to get the same results?

No - as pipes are assumed to be equivalent to traditional R - and this is here not the case (I also see the dependency graph as a result which should be reproducible - not only the targets).

if you do not pass an envir argument explicitly

This is a critical issue here. As far as I am aware, pipes and traditional R programming are considered to be equivalent - which they are not here in drake. Setting the envir argument is, in my opinion, a workaround which should not be necessary as it is not intuitive.

@wlandau
Copy link
Collaborator

wlandau commented Mar 22, 2018

From the magrittr documentation:

Basic piping
  • x %>% f is equivalent to f(x)
  • x %>% f(y) is equivalent to f(x, y)
  • x %>% f %>% g %>% h is equivalent to h(g(f(x)))

Here, "equivalent" is not technically exact: evaluation is non-standard, and the left-hand side is evaluated before passed on to the right-hand side expression. However, in most cases this has no practical implication.

I will see if the tidyverse folks think edge cases with environments are worth fixing.

@rkrug
Copy link
Contributor Author

rkrug commented Mar 22, 2018

Thanks. I surely think they are, as subtle errors can creep in which will be quite difficult to identify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants