Ceci n'est pas un pipe. #748

wlandau · 2019-02-20T19:26:29Z

Summary

This PR implements an experimental %dp% pipe operator for generating targets. The implementation is adapted mostly from split_chain() in magrittr and still needs profiling. cc @billdenney.

I am still not sure this is a good idea. We should give it some time and think it over.

library(drake)

drake_plan(
  result = data %dp%
    task1() %dp%
    task2(data = my_data, .) %dp%
    task3()
)
#> # A tibble: 4 x 2
#>   target   command                        
#>   <chr>    <expr>                         
#> 1 result.3 data                           
#> 2 result.2 task1(result.3)                
#> 3 result.1 task2(data = my_data, result.2)
#> 4 result   task3(result.1)

drake_plan(
  result = target(
    task1(data, analysis) %dp%
      task2(),
    transform = map(analysis = c("bayes", "freq"))
  )
)
#> # A tibble: 4 x 2
#>   target           command                
#>   <chr>            <expr>                 
#> 1 result_.bayes..1 task1(data, "bayes")   
#> 2 result_.bayes.   task2(result_.bayes..1)
#> 3 result_.freq..1  task1(data, "freq")    
#> 4 result_.freq.    task2(result_.freq..1)

Created on 2019-02-20 by the reprex package (v0.2.1)
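For readers who want to see the idea in isolation, here is a minimal base R sketch of how a `%dp%` chain could be split into one target per step. It is not the code in this PR: the helper names `split_dp_chain()`, `insert_upstream()`, and `dp_to_plan()` are hypothetical, the `_1`, `_2` suffix scheme is illustrative only, and the sketch assumes the upstream target goes wherever `.` appears or else in the first argument.

```r
# Hypothetical helper: flatten a left-associative chain
# `a %dp% f() %dp% g()` into list(a, f(), g()).
split_dp_chain <- function(expr) {
  steps <- list()
  while (is.call(expr) && identical(expr[[1]], as.name("%dp%"))) {
    steps <- c(list(expr[[3]]), steps) # right-hand side is the later step
    expr <- expr[[2]]                  # keep descending into the left-hand side
  }
  c(list(expr), steps)
}

# Hypothetical helper: put the upstream target where `.` appears,
# or in the first argument position if there is no `.`.
insert_upstream <- function(step, upstream) {
  if (is.name(step)) {
    return(as.call(list(step, upstream))) # bare function name, e.g. identity
  }
  args <- as.list(step)[-1]
  is_dot <- vapply(args, identical, logical(1), y = quote(.))
  if (any(is_dot)) {
    for (i in which(is_dot)) {
      step[[i + 1]] <- upstream
    }
    return(step)
  }
  as.call(c(list(step[[1]]), list(upstream), args))
}

# Hypothetical helper: one row per step. The suffix scheme is an assumption.
dp_to_plan <- function(target, expr) {
  steps <- split_dp_chain(expr)
  n <- length(steps)
  target_names <- c(sprintf("%s_%d", target, seq_len(n - 1)), target)
  commands <- vector("list", n)
  commands[[1]] <- steps[[1]]
  for (i in seq_len(n - 1)) {
    commands[[i + 1]] <- insert_upstream(steps[[i + 1]], as.name(target_names[i]))
  }
  data.frame(
    target = target_names,
    command = vapply(
      commands,
      function(x) paste(deparse(x), collapse = " "),
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

plan <- dp_to_plan(
  "result",
  quote(data %dp% task1() %dp% task2(data = my_data, .) %dp% task3())
)
```

In this sketch the final step keeps the user-facing name `result`, and each intermediate step becomes its own target that the next command refers to by name.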

Related GitHub issues and pull requests

Ref: Feature Request: Pipe Plan? #746

Checklist

I have read drake's code of conduct, and I agree to follow its rules.
I have listed any substantial changes in the development news.
I have added testthat unit tests to tests/testthat to confirm that any new features or functionality work correctly.
I have tested this pull request locally with devtools::check()
This pull request is ready for review.
I think this pull request is ready to merge.

codecov-io · 2019-02-20T19:38:21Z

Codecov Report

Merging #748 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #748   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          73     74    +1     
  Lines        6217   6273   +56     
=====================================
+ Hits         6217   6273   +56

Impacted Files    Coverage Δ
R/api-pipe.R      100% <100%> (ø)
R/api-plan.R      100% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d1fceb8...3ceaa1d.

wlandau · 2019-02-20T20:32:42Z

~~Still needs work. Grouping variables from transforms should not apply to upstream targets in a %dp% call.~~

library(drake)
drake_plan(
  result = target(
    task1(data, analysis) %dp%
      task2(),
    transform = map(analysis = c("bayes", "freq"))
  ),
  trace = TRUE
)
#> # A tibble: 4 x 4
#>   target           command                 analysis    result        
#>   <chr>            <expr>                  <chr>       <chr>         
#> 1 result_.bayes..1 task1(data, "bayes")    "\"bayes\"" result_.bayes.
#> 2 result_.bayes.   task2(result_.bayes..1) "\"bayes\"" result_.bayes.
#> 3 result_.freq..1  task1(data, "freq")     "\"freq\""  result_.freq. 
#> 4 result_.freq.    task2(result_.freq..1)  "\"freq\""  result_.freq.

Created on 2019-02-20 by the reprex package (v0.2.1)

wlandau · 2019-02-20T20:34:15Z

Never mind, we should be okay because all transforms happen before any %dp% calls are resolved.

wlandau · 2019-02-20T20:49:01Z

Confirmation of #748 (comment):

library(drake)
plan <- drake_plan(
  result = target(
    task1(data, analysis) %dp%
      task2() %dp%
      task3(),
    transform = map(analysis = c("bayes", "freq"))
  ),
  end = target(
    list(result),
    transform = combine(result)
  )
)
config <- drake_config(plan)
vis_drake_graph(config)

Created on 2019-02-20 by the reprex package (v0.2.1)

wlandau · 2019-02-22T02:14:22Z

As I thought, scanning for %dp% incurs a speed penalty when we create large plans.

library(drake)
x_vals = as.numeric(1:1000)
microbenchmark::microbenchmark(
  plan = drake_plan(
    x2 = target(x1, transform = map(x1 = !!x_vals)),
    x3 = target(x2, transform = map(x2)),
    x4 = target(x3, transform = map(x3)),
    x5 = target(x4, transform = map(x4)),
    x6 = target(x5, transform = map(x5)),
    x7 = target(x6, transform = map(x6)),
    x8 = target(x7, transform = map(x7)),
    x9 = target(x8, transform = map(x8)),
    x10 = target(x9, transform = map(x9)),
    x11 = target(x10, transform = combine(x10))
  )
)

With 78de526:

Unit: milliseconds
 expr      min      lq     mean   median       uq      max neval
 plan 528.6575 552.171 567.9617 562.8718 578.5794 638.1616   100

With decc787:

Unit: milliseconds
 expr      min       lq     mean   median       uq     max neval
 plan 543.5965 565.4996 580.2999 573.7455 592.0488 654.019   100

We should probably pause and gather more information. How many people will use this? If enough of the community gets behind it, I will merge.
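To make "scanning for %dp%" concrete, here is a small base R sketch of what such a check could look like. The helper `has_dp()` is hypothetical and is not taken from this PR. It relies only on `all.names()`, which walks the entire expression, so running it on every command of a large plan is itself a traversal cost.

```r
# Hypothetical check: does a command mention the %dp% operator anywhere?
has_dp <- function(command) {
  "%dp%" %in% all.names(command)
}

commands <- list(
  quote(task1(data) %dp% task2()), # contains the pipe
  quote(task3(data))               # ordinary command, nothing to expand
)

# One logical per command: only flagged commands would need pipe processing.
needs_pipe <- vapply(commands, has_dp, logical(1))
```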

wlandau · 2019-02-22T04:07:25Z

Hmmm... oddly enough, this is another case where the pipe could help condense things. The following plan is equivalent to the one from #748 (comment).

drake_plan(
  x10 = target(
    x1 %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity %dp%
      identity,
    transform = map(x1 = !!x_vals)
  ),
  x11 = target(x10, transform = combine(x10))
)

Of course, we would replace identity() with other functions in a real application.

wlandau · 2019-02-22T04:16:12Z

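...but the benchmarks are disappointing: the piped plan takes about 2.6 seconds at the median, versus under 0.6 seconds for the explicit plan above. Further work could probably speed it up, though.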

library(drake)
x_vals = as.numeric(1:1000)
microbenchmark::microbenchmark(
  plan = drake_plan(
    x10 = target(
      x1 %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity %dp%
        identity,
      transform = map(x1 = !!x_vals)
    ),
    x11 = target(x10, transform = combine(x10))
  )
)
#> Unit: seconds
#>  expr      min       lq     mean   median      uq      max neval
#>  plan 2.270687 2.443403 2.600197 2.565977 2.71475 3.250199   100

wlandau · 2019-03-19T13:04:47Z

~~If we can evaluate the pipe before we apply the transforms, then we might not see a performance penalty. I will see if it is possible.~~

wlandau · 2019-03-19T13:41:44Z

Nope, we can't do that. If grouping variables appear in the middle of a pipe chain, we need to make sure the transformation applies to the entire chain. The best way to implement this is to just evaluate the pipe afterwards. Otherwise, we would have to manually track provenance, and things would get ugly fast.

drake::drake_plan(
  result = target(
    data %dp%
      task1() %dp%
      task2(x, .) %dp%
      task3(),
    transform = map(x = c(1, 2))
  )
)
#> # A tibble: 8 x 2
#>   target     command               
#>   <chr>      <S3: expr_list>       
#> 1 result_1_2 "data                "
#> 2 result_1_3 "task1(result_1_2)   "
#> 3 result_1_4 task2(1, result_1_3)  
#> 4 result_1   "task3(result_1_4)   "
#> 5 result_2_2 "data                "
#> 6 result_2_3 "task1(result_2_2)   "
#> 7 result_2_4 task2(2, result_2_3)  
#> 8 result_2   "task3(result_2_4)   "

Created on 2019-03-19 by the reprex package (v0.2.1)
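The ordering argument above can be sketched in base R. This is only an illustration under stated assumptions: `expand_map()` and `split_dp_chain()` are hypothetical helpers, not drake functions, and the transform is reduced to plain substitution of one grouping variable. Because the substitution runs on the whole, still-unsplit chain, every step of each copy belongs to the right group without any provenance tracking.

```r
chain <- quote(data %dp% task1() %dp% task2(x, .) %dp% task3())

# Step 1 (transform): substitute each value of the grouping variable
# into the entire chain, before any splitting happens.
expand_map <- function(expr, var, values) {
  lapply(values, function(value) {
    do.call(substitute, list(expr, stats::setNames(list(value), var)))
  })
}

# Step 2 (pipe): only now split each expanded chain into its steps.
split_dp_chain <- function(expr) {
  steps <- list()
  while (is.call(expr) && identical(expr[[1]], as.name("%dp%"))) {
    steps <- c(list(expr[[3]]), steps)
    expr <- expr[[2]]
  }
  c(list(expr), steps)
}

expanded <- expand_map(chain, "x", c(1, 2))
steps_by_group <- lapply(expanded, split_dp_chain)
```

Each element of `steps_by_group` holds the four steps for one value of `x`, which matches the eight rows in the plan above.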

wlandau · 2019-03-19T13:54:22Z

Closing, per #746 (comment). If we ever want to come back to this (which I doubt we will), the code is in the diff.

wlandau-lilly added 9 commits February 20, 2019 12:58

Sketch %dp% pipe

44b1e52

Handle null plans when processing pipe

da31c51

Add a test

f8c95f0

Add a test

28b0239

Add a test

1e7fae8

Add anonymous function support to %dp%

9ea813d

Add to a test

79ba0e9

Clean up pipe code

41d3c21

Fix a pipe

96b5aaa

wlandau added type: new feature topic: api labels Feb 20, 2019

wlandau self-assigned this Feb 20, 2019

wlandau mentioned this pull request Feb 20, 2019

Feature Request: Pipe Plan? #746

Closed

Attribute magrittr

5518028

wlandau-lilly added 2 commits February 20, 2019 15:52

Test that %dp% does not mess with the trace

8c1ee1a

Change naming order of %dp%

1295d06

More intuitive to go in *increasing* numerical order

wlandau mentioned this pull request Feb 20, 2019

Document the pipe ropensci-books/drake#63

Closed

Speed up the pipe

decc787

Exempt a line

01350ff

wlandau-lilly force-pushed the 746 branch from a820ae0 to 01350ff Compare February 22, 2019 15:48

wlandau added DO NOT MERGE ⚠️ status: uncertain labels Mar 1, 2019

wlandau removed the do not merge label Mar 11, 2019

wlandau-lilly added 2 commits March 19, 2019 09:09

Merge branch 'master' into 746

fa1e38b

Update indexing and news for the pipe

3ceaa1d

wlandau closed this Mar 19, 2019

wlandau added this to Done in A flexible, agile, domain-specific API via automation Mar 19, 2019

wlandau deleted the 746 branch March 19, 2019 13:54

wlandau mentioned this pull request Mar 22, 2019

inconsistency in naming using map transform with .id = FALSE #790

Closed
