New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ceci n'est pas un pipe. #748
Conversation
Codecov Report
@@ Coverage Diff @@
## master #748 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 73 74 +1
Lines 6217 6273 +56
=====================================
+ Hits 6217 6273 +56
Continue to review full report at Codecov.
|
library(drake)
drake_plan(
result = target(
task1(data, analysis) %dp%
task2(),
transform = map(analysis = c("bayes", "freq"))
),
trace = TRUE
)
#> # A tibble: 4 x 4
#> target command analysis result
#> <chr> <expr> <chr> <chr>
#> 1 result_.bayes..1 task1(data, "bayes") "\"bayes\"" result_.bayes.
#> 2 result_.bayes. task2(result_.bayes..1) "\"bayes\"" result_.bayes.
#> 3 result_.freq..1 task1(data, "freq") "\"freq\"" result_.freq.
#> 4 result_.freq. task2(result_.freq..1) "\"freq\"" result_.freq. Created on 2019-02-20 by the reprex package (v0.2.1) |
Never mind, we should be okay because all transforms happen before any |
Confirmation of #748 (comment): library(drake)
plan <- drake_plan(
result = target(
task1(data, analysis) %dp%
task2() %dp%
task3(),
transform = map(analysis = c("bayes", "freq"))
),
end = target(
list(result),
transform = combine(result)
)
)
config <- drake_config(plan)
vis_drake_graph(config) Created on 2019-02-20 by the reprex package (v0.2.1) |
More intuitive to go in *increasing* numerical order
As I thought, scanning for library(drake)
x_vals = as.numeric(1:1000)
microbenchmark::microbenchmark(
plan = drake_plan(
x2 = target(x1, transform = map(x1 = !!x_vals)),
x3 = target(x2, transform = map(x2)),
x4 = target(x3, transform = map(x3)),
x5 = target(x4, transform = map(x4)),
x6 = target(x5, transform = map(x5)),
x7 = target(x6, transform = map(x6)),
x8 = target(x7, transform = map(x7)),
x9 = target(x8, transform = map(x8)),
x10 = target(x9, transform = map(x9)),
x11 = target(x10, transform = combine(x10))
)
) With 78de526: Unit: milliseconds
expr min lq mean median uq max neval
plan 528.6575 552.171 567.9617 562.8718 578.5794 638.1616 100 With decc787: Unit: milliseconds
expr min lq mean median uq max neval
plan 543.5965 565.4996 580.2999 573.7455 592.0488 654.019 100 We should probably pause and gather more information. How many people will use this? If enough of the community gets behind it, I will merge. |
Hmmm... oddly enough, this is another case where the pipe could help condense things. The following plan is equivalent to the one from #748 (comment). drake_plan(
x10 = target(
x1 %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity,
transform = map(x1 = !!x_vals)
),
x11 = target(x10, transform = combine(x10))
) Of course, we would replace |
...but the benchmarks are disappointing. Further work could probably speed it up though. library(drake)
x_vals = as.numeric(1:1000)
microbenchmark::microbenchmark(
plan = drake_plan(
x10 = target(
x1 %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity %dp%
identity,
transform = map(x1 = !!x_vals)
),
x11 = target(x10, transform = combine(x10))
)
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> plan 2.270687 2.443403 2.600197 2.565977 2.71475 3.250199 100 |
|
Nope, we can't do that. If grouping variables appear in the middle of a pipe chain, we need to make sure the transformation applies to the entire chain. The best way to implement this is to just evaluate the pipe afterwards. Otherwise, we would have to manually track provenance, and things would get ugly fast. drake::drake_plan(
result = target(
data %dp%
task1() %dp%
task2(x, .) %dp%
task3(),
transform = map(x = c(1, 2))
)
)
#> # A tibble: 8 x 2
#> target command
#> <chr> <S3: expr_list>
#> 1 result_1_2 "data "
#> 2 result_1_3 "task1(result_1_2) "
#> 3 result_1_4 task2(1, result_1_3)
#> 4 result_1 "task3(result_1_4) "
#> 5 result_2_2 "data "
#> 6 result_2_3 "task1(result_2_2) "
#> 7 result_2_4 task2(2, result_2_3)
#> 8 result_2 "task3(result_2_4) " Created on 2019-03-19 by the reprex package (v0.2.1) |
Closing re #746 (comment). If we ever want to come back (which I doubt we will) the code is in the diff. |
Summary
This PR implements an experimental
%dp%
pipe operator for generating targets. Taken mostly fromsplit_chain()
inmagrittr
. Still needs profiling. cc @billdenney.I am still not sure this is a good idea. We should give it some time and think it over.
Created on 2019-02-20 by the reprex package (v0.2.1)
Related GitHub issues and pull requests
Checklist
drake
's code of conduct, and I agree to follow its rules.testthat
unit tests totests/testthat
to confirm that any new features or functionality work correctly.devtools::check()