Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate rlang tidy evaluation in drake #200

Closed
kendonB opened this issue Jan 30, 2018 · 20 comments
Closed

Incorporate rlang tidy evaluation in drake #200

kendonB opened this issue Jan 30, 2018 · 20 comments

Comments

@kendonB
Copy link
Contributor

kendonB commented Jan 30, 2018

Ref: https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html

The !! operator would be especially useful when building up complicated commands.

This, for example, works in tibble, which is my current workaround:

> library(tibble)
> library(drake)
>   
> little_b <- "b"

> tibble(target = "letter", command = !!little_b)
# A tibble: 1 x 2
  target command
  <chr>  <chr>  
1 letter b    


> drake_plan(letter = !!little_b)
  target      command
1 letter !(!little_b)

Using tibble in the background might be a good way to achieve this without much pain.

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

This could even help with @AlexAxthelm's #77 and #79, which are dplyr-focused. I am new to tidy evaluation, and @lionel-'s webinar last month was a great first exposure. I am super excited for @hadley's Extending the Tidyverse workshop this week, and I may be better versed in it afterwards.

I expect reimplementing drake_plan() will be a good first exercise. make() is a whole other story. To support tidy evaluation of drake workflow commands, there is some tricky sensitive work to do in run.R, particularly in with_timeout() and wrap_in_try_statement(). Looking back, this is a rather inelegant part of the code base. I expect tidy evaluation to have a better way of evaluating commands. Requirements:

  1. On success, return the target's value and make sure it is assigned to the target itself in config$envir.
  2. On error, return an informative object of class "error" just like evaluate::try_capture_stack(). This is essential to drake::diagnose(), which gives users access to the complete error logs of failed targets.
  3. Play nicely with R.utils::withTimeout().

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

On the other hand, changing how commands are evaluated in make() would be a big move. I need to step back and think about the big picture. Drake is tidyverse-friendly in the sense that the data frame is the first-class citizen: workflow plans are data frames, not remake YAML files or Makefiles. But elevating drake to be a proper extension of the tidyverse is something else. If done correctly, this will major contribution to the user experience. If done poorly, it will cause more unintuitive behavior that requires extra documentation. Drake is an unfamiliar idea, and most people already find it difficult to learn.

A major strength of the current drake is that a command is just an ordinary code chunk. Evaluating it in make() is the same as evaluating it in your console (provided that your dependencies are loaded in your environment: loadd(..., deps = TRUE)). Building on top of the tidy evaluation would make those two modes of execution different. I am not sure I am comfortable with that. On the other hand, if we do make the change, then updates to the documentation will be easy because the section on diagnosing failed targets is already almost there.

@kendonB
Copy link
Contributor Author

kendonB commented Jan 30, 2018

This is certainly a big move and you would probably want it on it's own branch to start.

I am not so concerned about code running differently from the console vs in make as the use of this feature is really just for power users who would be able to navigate fine. The nse interface in rlang is so particular that no one would put that in inadvertently.

One thing I've found a couple of times is that I want to create commands that depend on targets. For example, a regression command depends on the names of an intermediate dataset that only exists as a target. I workaround this by calling make twice. Once after the intermediate dataset is built. tidy evaluation might allow users to overcome these awkward workarounds with slightly less awkward workarounds.

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

Yeah, I guess tidy evaluation does not totally overturn evaluation itself. And if the code in actual commands could depend on the values of previous targets, #77 would have a lot more potential.

wlandau pushed a commit that referenced this issue Jan 30, 2018
Begins to address #200. Tries to support the !! operator
and quosures.
@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

From this blog, it should be enough to replace my use of quote() with rlang::expr(), as far as make() is concerned. In fact, c89f028#diff-74379f67ceff780b514be78e35c7e21cL101 should have done the trick. Unfortunately, I rely on formatR::tidy_source() to tidy up commands here, and there is an odd bug:

formatR::tidy_source(text = "!!little_b", output = FALSE)$text.tidy
## [1] "!(!little_b)"

Is there a pre-packaged equivalent that is more tidyverse-compliant?

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

Tidying commands is important because that's how drake ignores changes to comments and whitespace. Unfortunately, tidying text is likely to involve some form of deparse(parse(...)). (See this StackOverflow post.)

For maximum compatibility and increased safety all around, here is what I am thinking: the command that is reformatted with tidy_command() need not be the command that is actually evaluated. There should really be two versions of each command:

  1. An evaluation command, which is as close to the command in the original workflow plan as possible. This is the command that will run and actually evaluate the target. There is no need to cache/hash it because the workflow plan already has it.
  2. A standardized command. Standardization is a kind of reformatting that strips away non-standard whitespace and comments. This is what allows drake to avoid triggering excessive rebuilds from changes to whitespace and comments.

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

Another setback: non-lexical scoping in tidy eval:

my_plan <- drake_plan(list = c(
  little_b = "\"b\"",
  letter = "!!little_b"
))
make(my_plan, envir = new.env(parent = globalenv()))

## fail letter

diagnose(letter)

## [1] "object 'little_b' not found"

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

On the plus side, the following works in d8431e0 since all the environments agree. We're close.

my_plan <- drake_plan(list = c(
  little_b = "\"b\"",
  letter = "!!little_b"
))
make(my_plan)
readd(letter)

## [1] "b"

Now, is there a way to force a quasiquotation to use old-fashioned lexical scoping?

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

What we really need is a way to decide the environment where quasiquotation symbols are resolved. See this post.

wlandau pushed a commit that referenced this issue Jan 30, 2018
Quasiquotation with !! is enabled in commands, etc.
Supports #200.
@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

ccd967c seems to work for all parallel computing options currently implemented. Confirmation:

install.packages("drake", type = "source", repos = NULL)
devtools::load_all("drake")
for(s in testing_scenario_names()){ # parallel computing and global/custom environment scenarios
  set_testing_scenario(s)
  print(get_testing_scenario())

test_with_dir("make() does tidy eval in commands", {
  con <- dbug()
  con$plan <- drake_plan(list = c(
    little_b = "\"b\"",
    letter = "!!little_b"
  ))
  con$targets <- con$plan$target
  testrun(con)
  expect_equal(readd(letter), "b")
})
}

Now, drake_plan() needs just work, and we're all set.

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

@kendonB if !! is evaluated indrake_plan(), then there would no longer be a !! to evaluate in make(). Users would need the list argument of drake_plan() to suppress the evaluation of !! and save it for make(). Are you okay with that?

@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

From this discussion, the only option may be to evaluate !! right away.

@AlexAxthelm
Copy link
Collaborator

I'm curious about the use case here. It seems that the goal here is to use a evaluate an object while creating a plan, so that plans can be altered dynamically, but isn't that the domain of evaluate_plan()? I agree that it can be a little kludgy for the toy example here, and I know that I've been running into issues with multiple rules not expanding the way I want (will file separately), but I believe that this is the equivalent in current drake paradigm:

library("drake")
little_b <- "b"
letter_template <- drake_plan(
	letter = letter__
	)
letter_plan <- evaluate_plan(
	letter_template,
	rules = list(letter__ = little_b)
	)
letter_plan
#    target command
#1 letter_b       b

I agree that being tidy-friendly is a good thing, but I worry about what to do if I want to do if I want to delay evaluation of a variable until make. Personally, I often make plans along the lines of

drake_plan(foo = bar %>% summarize(baz = mean(rlang::UQ(as.name("baz")))

in order for drake to not identify baz as a dependency, when it is actually just a column.

wlandau pushed a commit that referenced this issue Jan 30, 2018
@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

Since you brought that up, I am planning to go ahead and use tidy evaluation in ... by default, but implement a tidy_evaluation argument in drake_plan() to disable it. From this post, I think recovering !!x unevaluated would be really messy if it is possible at all, which makes me think it would be safer to just embrace tidy evaluation. For most users, the current lack of tidy evaluation is a waste: the other parsing techniques I know just turn !!x into !(!x), so we're in a "use it or lose it" situation there.

wlandau pushed a commit that referenced this issue Jan 30, 2018
Support tidy evaluation (solve #200)
@wlandau
Copy link
Collaborator

wlandau commented Jan 30, 2018

Closed via #202. Thanks @kendonB and @AlexAxthelm!

@kendonB
Copy link
Contributor Author

kendonB commented Mar 6, 2018

Coming back to this now. As far as I can see, there is still no way to delay tidy evaluation until the making stage? The use case of constructing commands that depend on the values of targets is then still unavailable in drake?

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 6, 2018

Seems so. I have proposed some ideas in #233.

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 6, 2018

It seems worthwhile to incorporate concepts from tidy evaluation there.

@wlandau
Copy link
Collaborator

wlandau commented Mar 7, 2018

I agree that like so many other drake issues, the right approach is #233. In the short term, you can delay evaluation by writing your commands as text (clearly not ideal in the long run).

library(drake)
plan <- drake_plan(list = c(index = "sample.int(26, 1)", letter = "LETTERS[!!index]"))
make(plan)
#> cache /tmp/Rtmp4BtHQk/.drake
#> connect 1 import: plan
#> connect 2 targets: index, letter
#> check 2 items: LETTERS, sample.int
#> check 1 item: index
#> target index
#> check 1 item: letter
#> target letter
readd(letter)
#> cache /tmp/Rtmp4BtHQk/.drake
#> [1] "L"

@AlexAxthelm
Copy link
Collaborator

nb: I have a stopgap workaround that sort of works for me, but it involves calling make on 2 plans, and only works on simplistic expansions.

I use one plan to generate the data and create the levels for my splits, and then another to evaluate those levels. It loses the explicit dependency between the data source and the final analysis, but for something where there are changing levels, or the levels aren't known prior to runtime, I haven't found anything better in the current paradigm.

An example:

library("drake")
library("dplyr")
write.csv(iris, "iris.csv")

data_and_levels <- drake_plan(strings_in_dots = "literals",
  data = read.csv(file_in("iris.csv")),
  species_list = data %>% pull("Species") %>% unique()
)

make(data_and_levels)
loadd(species_list)

process_data_template <- drake_plan(strings_in_dots = "literals",
  species_data = dplyr::filter(data, Species == "XXSPECIES"),
  species_analysis = lm(
    formula = Sepal.Length ~ Sepal.Width + Petal.Length,
    data =species_data_XXSPECIES
  )
)

process_data <- evaluate_plan(
  plan = process_data_template,
  rules = list(XXSPECIES = species_list)
)

second_plan <- process_data %>%
  bind_rows(gather_plan(., target = "end"))

vis_drake_graph(drake_config(second_plan))

make(second_plan)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants