Pass parameters to evaluate_plan through a grid, rather than a series of vectors #235

Closed
AlexAxthelm opened this issue Feb 5, 2018 · 19 comments

Comments

@AlexAxthelm
Collaborator

An issue that I keep running into with evaluate_plan() is that setting up incomplete multiples is kind of a pain. As an example, if I have three schools that I want to run an analysis on, I might have something along these lines:

hard_plan <- drake_plan(
  credits = check_credit_hours(school__),
  students = check_students(school__),
  grads = check_graduations(school__),
  public_funds = check_public_funding(school__)
)

evaluate_plan(
  hard_plan, 
  rules = list(school__ = c("schoolA", "schoolB", "schoolC"))
) 
                 target                       command
1       credits_schoolA   check_credit_hours(schoolA)
2       credits_schoolB   check_credit_hours(schoolB)
3       credits_schoolC   check_credit_hours(schoolC)
4      students_schoolA       check_students(schoolA)
5      students_schoolB       check_students(schoolB)
6      students_schoolC       check_students(schoolC)
7         grads_schoolA    check_graduations(schoolA)
8         grads_schoolB    check_graduations(schoolB)
9         grads_schoolC    check_graduations(schoolC)
10 public_funds_schoolA check_public_funding(schoolA)
11 public_funds_schoolB check_public_funding(schoolB)
12 public_funds_schoolC check_public_funding(schoolC) ## This will throw an error

Except schoolC will throw an error in check_public_funding() because it doesn't receive any public funds. So at this point, I have a few options:

  • make 2 drake_plans, one for schoolC, and then another for everybody else. 🙅‍♂️
  • use evaluate_plan, as above, and then use something like dplyr::filter to prune away everything I don't want. Works okay for small numbers of exceptions, but doesn't scale well.
  • Pass a second school_type__ argument to each of my functions, which returns a NULL when appropriate. Not a perfect solution, but it (ideally) makes the drake plan easy to maintain, and if I put the return(NULL) early in the function, it's not a huge time sink overall.

But, there isn't a great way to pass those arguments such that they match:

very_wrong <- evaluate_plan(
  better_plan,
  rules = list(
    school__ = c("schoolA", "schoolB", "schoolC"),
    school_type__ = c("public", "public", "nonpublic")
    ),
  expand = TRUE
)
print(very_wrong) # this makes each school both a public and a nonpublic, and tries to evaluate both. I could use filter, but again, won't scale well. Further, makes duplicate targets

also_wrong <- evaluate_plan(
  better_plan,
  rules = list(
    school__ = c("schoolA", "schoolB", "schoolC"),
    school_type__ = c("public", "public", "nonpublic")
  ),
  expand = FALSE
)
print(also_wrong) #This correctly matches schools and school_types, but doesn't work to actually *expand* the plan. 
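
The difference between the two calls above comes down to crossing the two vectors versus pairing them element by element. A minimal base R sketch of that distinction, independent of drake (the variable names here are illustrative only):

```r
schools <- c("schoolA", "schoolB", "schoolC")
school_types <- c("public", "public", "nonpublic")

# Crossing: every school is combined with every listed type.
crossed <- expand.grid(
  school = schools,
  school_type = school_types,
  stringsAsFactors = FALSE
)

# Pairing: the i-th school goes with the i-th type.
paired <- data.frame(
  school = schools,
  school_type = school_types,
  stringsAsFactors = FALSE
)

nrow(crossed)          # 9 rows, because "public" is listed twice
nrow(unique(crossed))  # 6 distinct school/type combinations
nrow(paired)           # 3 rows, one per school
```

The repeated "public" entry is what would produce duplicate rows under crossing, while the paired data frame keeps exactly one type per school.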

Ideally I would have something like this:

matched_rules = tibble::tibble( #could also define a tribble
  school__ = c("schoolA", "schoolB", "schoolC"),
  school_type__ = c("public", "public", "nonpublic")
)

working_master_plan <- evaluate_plan(
  better_plan,
  rules = matched_rules,
  expand = TRUE
)

Currently, this evaluates to the same as very_wrong above. I'm not sure if the best option here would be to change default behaviors for rectangular objects passed to rules, or maybe add a matched_arguments flag in evaluate_plan, so that it can understand that not all expansions go with each other. Also, maybe I'm on a weird edge case, and a clarification on best practices around evaluate_plan would be helpful?

I think this is relevant for #228 and #233.

@wlandau
Member

wlandau commented Feb 5, 2018

I see the general picture of what you're saying, and I'm trying to wrap my head around how we would solve it. It sounds like you want one wildcard for the expansion and the others to go along for the ride. How close is this to what you're after:

library(magrittr)
drake_plan(
  credits = check_credit_hours("school_", "funding_"),
  students = check_students("school_", "funding_"),
  grads = check_graduations("school_", "funding_"),
  public_funds = check_public_funding("school_", "funding_"),
  strings_in_dots = "literals"
) %>% evaluate_plan(
    wildcard = "school_",
    values = c("schoolA", "schoolB", "schoolC"),
    expand = TRUE
  ) %>%
  evaluate_plan(
    wildcard = "funding_",
    values = c("public", "public", "private"),
    expand = FALSE
  )

#>                  target                                    command
#> 1       credits_schoolA    check_credit_hours("schoolA", "public")
#> 2       credits_schoolB    check_credit_hours("schoolB", "public")
#> 3       credits_schoolC   check_credit_hours("schoolC", "private")
#> 4      students_schoolA        check_students("schoolA", "public")
#> 5      students_schoolB        check_students("schoolB", "public")
#> 6      students_schoolC       check_students("schoolC", "private")
#> 7         grads_schoolA     check_graduations("schoolA", "public")
#> 8         grads_schoolB     check_graduations("schoolB", "public")
#> 9         grads_schoolC    check_graduations("schoolC", "private")
#> 10 public_funds_schoolA  check_public_funding("schoolA", "public")
#> 11 public_funds_schoolB  check_public_funding("schoolB", "public")
#> 12 public_funds_schoolC check_public_funding("schoolC", "private")
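
Judging by the output above, the second call seems to cycle through the funding values in row order, so the pairing only holds because the expanded rows repeat schoolA, schoolB, schoolC in the same order as the values. A rough base R imitation of that recycling, offered as an assumption about the behavior and not as drake's actual implementation:

```r
schools <- c("schoolA", "schoolB", "schoolC")
funding <- c("public", "public", "private")

# Commands as they might look after the first (expanding) step.
commands <- sprintf('check_students("%s", "funding_")', rep(schools, times = 2))

# Recycle the funding values down the rows and substitute them one by one.
recycled <- rep_len(funding, length(commands))
filled <- mapply(
  function(cmd, value) sub("funding_", value, cmd, fixed = TRUE),
  commands, recycled,
  USE.NAMES = FALSE
)
print(filled)
```

If the rows were in any other order (sorted by school, for example), the same recycling would silently attach the wrong funding type to some schools.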

@AlexAxthelm
Collaborator Author

This is perfect. It works well for a simple one-to-one matchup between targets, like above, and more complicated many-to-one matchups can be resolved with the same pair of evaluate_plan() calls, replacing values with rules:

library(drake)
library(tidyverse) # tibble(), crossing(), filter(), %>%

rules_grid <- tibble(
  school_ = c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private")
) %>%
  crossing(cohort_ = c("2012", "2013", "2014", "2015")) %>%
  filter(!(school_ == "schoolB" & cohort_ %in% c("2012", "2013"))) %>%
  print()


drake_plan(
  credits = check_credit_hours("school_", "funding_", "cohort_"),
  students = check_students("school_", "funding_", "cohort_"),
  grads = check_graduations("school_", "funding_", "cohort_"),
  public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
) %>% evaluate_plan(
    wildcard = "school_",
    values = rules_grid$school_,
    expand = TRUE
  ) %>%
  evaluate_plan(
    wildcard = "funding_",
    rules = rules_grid,
    expand = FALSE
  )

In the example above, I have schoolB reporting data for only a subset of the years, but filtering the missing years out, or constructing the rules_grid in some other way, lets me build this however I need.

Thanks! 👍

@krlmlr
Collaborator

krlmlr commented Feb 5, 2018

Do we have a "usage patterns" vignette or section where we could document this?

@wlandau
Member

wlandau commented Feb 5, 2018

I think the best practices vignette is the right place. Reopening because it's now a documentation issue.

@wlandau
Member

wlandau commented Feb 11, 2018

Thanks again @AlexAxthelm! Your example is great, and I have appended a section in the best practices vignette.

@jw5

jw5 commented May 27, 2018

Unfortunately, the plan generated here and documented in best practices is not a valid Drake plan as it contains duplicate target names. I took a stab at a version with unique names (appended year), but I'm not happy with the solution:

# Possible solution: #235 Modified to generate unique targets as required.

rules_grid <- tibble(
  # The schools and their funding types.
  # Note that this solution does not handle the case of a school switching type!
  school_ =  c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private"),
) %>%
  # Generate the full cross product of (school,funding)x(years)
  crossing(cohort_ = c("2012", "2013", "2014", "2015")) %>%
  # Remove the two years school B didn't exist.
  filter(!(school_ == "schoolB" & cohort_ %in% c("2012", "2013"))) %>%
  # Confirm the correct plan template.
  print()

plan <- drake_plan(
  # Start with the four types of checks to perform
  credits = check_credit_hours("school_", "funding_", "cohort_"),
  students = check_students("school_", "funding_", "cohort_"),
  grads = check_graduations("school_", "funding_", "cohort_"),
  public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
) %>% expand_plan(
  # Use a forced expansion with a target suffix defined by school_year.
  # I don't really like this solution but I couldn't think of a better one :-(
  # Note that this duplicates each target 10 times for a total of 40.
  # However, no parameter substitution is done, that is fixed in the next step.
  values = paste(rules_grid$school_, rules_grid$cohort_, sep = "_")
) %>% evaluate_plan(
  # Finally, substitute the correct parameter values into the commands.
  # Note that since each target is duplicated 10 times, they each get a full
  # complement of parameter values which are used repeatedly a total of 4 times.
  rules = rules_grid,
  expand = FALSE
)
print(plan, n = 40)

# Confirm dependencies and parameter mappings.
config <- drake_config(plan)
vis_drake_graph(config)
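
The naming idea behind this approach can be checked without drake. The sketch below assumes target names are formed as check_school_cohort, as in the expected output shown later in the thread, and rebuilds the 10-row grid by hand; it only verifies that such names are unique and is not a substitute for the plan itself.

```r
# The 10 school/cohort pairs left after dropping schoolB's 2012 and 2013 cohorts.
grid <- data.frame(
  school_ = c(rep("schoolA", 4), rep("schoolB", 2), rep("schoolC", 4)),
  cohort_ = c("2012", "2013", "2014", "2015",
              "2014", "2015",
              "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# One suffix per grid row, e.g. "schoolA_2012".
suffixes <- paste(grid$school_, grid$cohort_, sep = "_")

# Combine every check with every suffix.
checks <- c("credits", "students", "grads", "public_funds")
targets <- as.vector(outer(checks, suffixes, paste, sep = "_"))

length(targets)         # 40 = 4 checks x 10 grid rows
anyDuplicated(targets)  # 0, so every target name is unique
```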

@jw5

jw5 commented May 27, 2018

Here is an updated solution that deals with avoiding applying public only functions on private schools and allows for schools to switch from public to private at any time.

# Possible solution: #235 Modified to generate unique targets as required.
# Version two: deal with avoiding public checks on private schools.
# Note that this solution can now handle the case of a school switching type.

rules_grid <- tibble(
  # The schools and their funding types.
  school_ =  c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private"),
) %>%
  # Generate the full cross product of (school,funding)x(years)
  crossing(cohort_ = c("2012", "2013", "2014", "2015")) %>%
  # Remove the two years school B didn't exist.
  filter(!(school_ == "schoolB" & cohort_ %in% c("2012", "2013")))
# Make schoolC switch funding each year
rules_grid$funding_[rules_grid$school_ == "schoolC"] <-
  c("public", "private", "public", "private")
# Confirm the correct plan template.
print(rules_grid)

plan_both <- drake_plan(
  # Start with the three universal types of checks to perform (public or private)
  credits = check_credit_hours("school_", "funding_", "cohort_"),
  students = check_students("school_", "funding_", "cohort_"),
  grads = check_graduations("school_", "funding_", "cohort_"),
  # Leave this for later.
  #public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
) %>% expand_plan(
  # Use a forced expansion with a target suffix defined by school_year.
  # I don't really like this solution but I couldn't think of a better one :-(
  # Note that this duplicates each target 10 times for a total of 30.
  # However, no parameter substitution is done, that is fixed in the next step.
  values = paste(rules_grid$school_, rules_grid$cohort_, sep = "_")
) %>% evaluate_plan(
  # Finally, substitute the correct parameter values into the commands.
  # Note that since each target is duplicated 10 times, they each get a full
  # complement of parameter values which are used repeatedly a total of 3 times.
  rules = rules_grid,
  expand = FALSE
)
print(plan_both, n = 30)

# Next get the rules for just the public schools. Note that a school could change
# from public to private or vice versa in any year and this still works.
public_rules_grid <- rules_grid %>% filter(funding_ == "public")
print(public_rules_grid)

# Build the public only plans
plan_public <- drake_plan(
  # Include the public only checks that shouldn't be run on private schools.
  public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
) %>% expand_plan(
  # Use a forced expansion with a target suffix defined by school_year.
  # I don't really like this solution but I couldn't think of a better one :-(
  # Note that this duplicates each target 8 times for a total of 8
  # (one per row of public_rules_grid).
  # However, no parameter substitution is done, that is fixed in the next step.
  values = paste(public_rules_grid$school_, public_rules_grid$cohort_, sep = "_")
) %>% evaluate_plan(
  # Finally, substitute the correct parameter values into the commands.
  # Note that since each target is duplicated 8 times, they each get a full
  # complement of parameter values, which is used exactly once.
  rules = public_rules_grid,
  expand = FALSE
)
print(plan_public, n = 8)

# Combine the both and public only plans together
plan <- bind_plans(plan_both, plan_public)
# Note that no check_public_funding is ever performed on schoolC in odd years.

# Confirm dependencies and parameter mappings.
config <- drake_config(plan)
vis_drake_graph(config)
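
As a sanity check on the target counts in this version, the grid can be rebuilt in base R and split the same way. This is a sketch with the grid typed out by hand (schoolC alternating public and private, as in the code above); it counts rows only and does not build a drake plan.

```r
grid <- data.frame(
  school_ = c(rep("schoolA", 4), rep("schoolB", 2), rep("schoolC", 4)),
  funding_ = c(rep("public", 6), "public", "private", "public", "private"),
  cohort_ = c("2012", "2013", "2014", "2015",
              "2014", "2015",
              "2012", "2013", "2014", "2015"),
  stringsAsFactors = FALSE
)

# Rows eligible for the public-only check.
public_grid <- grid[grid$funding_ == "public", ]

n_both <- 3 * nrow(grid)       # credits, students, grads on every row
n_public <- nrow(public_grid)  # public_funds on public rows only

n_both             # 30
n_public           # 8
n_both + n_public  # 38 targets in the combined plan
```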

@wlandau
Member

wlandau commented May 27, 2018

@jw5, glad you're helping us with slick ways to generate plans.

"Unfortunately, the plan generated here and documented in best practices is not a valid Drake plan as it contains duplicate target names."

Are you talking about the plan at the end of this section? Because there, I think we're fine. Here's a reprex.

library(drake)
library(tidyverse)
#> ── Attaching packages ───────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.5
#> ✔ tidyr   0.8.1     ✔ stringr 1.3.1
#> ✔ readr   1.1.1     ✔ forcats 0.3.0
#> ── Conflicts ──────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ tidyr::expand() masks drake::expand()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tidyr::gather() masks drake::gather()
#> ✖ dplyr::lag()    masks stats::lag()

# Generate the plan from the end of
# https://ropensci.github.io/drake/articles/best-practices.html#generating-workflow-plan-data-frames

rules_grid <- tibble::tibble(
  school_ = c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private")
) %>%
  tidyr::crossing(cohort_ = c("2012", "2013", "2014", "2015")) %>%
  dplyr::filter(!(school_ == "schoolB" & cohort_ %in% c("2012", "2013"))) %>%
  print()
#> # A tibble: 10 x 3
#>    school_ funding_ cohort_
#>    <chr>   <chr>    <chr>  
#>  1 schoolA public   2012   
#>  2 schoolA public   2013   
#>  3 schoolA public   2014   
#>  4 schoolA public   2015   
#>  5 schoolB public   2014   
#>  6 schoolB public   2015   
#>  7 schoolC private  2012   
#>  8 schoolC private  2013   
#>  9 schoolC private  2014   
#> 10 schoolC private  2015

plan <- drake_plan(
  credits = check_credit_hours("school_", "funding_", "cohort_"),
  students = check_students("school_", "funding_", "cohort_"),
  grads = check_graduations("school_", "funding_", "cohort_"),
  public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
) %>%
  evaluate_plan(wildcard = "school_", values = rules_grid$school_, expand = TRUE) %>%
  evaluate_plan(wildcard = "funding_", rules = rules_grid, expand = FALSE)

# Do we have duplicate targets?
any(duplicated(plan$target))
#> [1] FALSE

@jw5

jw5 commented May 28, 2018 via email

@jw5

jw5 commented May 28, 2018 via email

@jw5

jw5 commented May 28, 2018

Sorry, responded by email and now no way to fix up the formatting.

Bottom line, your best practices solution does pass the no-dups test, but yields:

# A tibble: 12 x 2
   target               command                                                   
   <chr>                <chr>                                                     
 1 credits_schoolA      "check_credit_hours(\"schoolA\", \"public\", \"2012\")"   
 2 credits_schoolB      "check_credit_hours(\"schoolB\", \"public\", \"2013\")"   
 3 credits_schoolC      "check_credit_hours(\"schoolC\", \"public\", \"2014\")"   
 4 students_schoolA     "check_students(\"schoolA\", \"public\", \"2015\")"       
 5 students_schoolB     "check_students(\"schoolB\", \"public\", \"2014\")"       
 6 students_schoolC     "check_students(\"schoolC\", \"public\", \"2015\")"       
 7 grads_schoolA        "check_graduations(\"schoolA\", \"private\", \"2012\")"   
 8 grads_schoolB        "check_graduations(\"schoolB\", \"private\", \"2013\")"   
 9 grads_schoolC        "check_graduations(\"schoolC\", \"private\", \"2014\")"   
10 public_funds_schoolA "check_public_funding(\"schoolA\", \"private\", \"2015\")"
11 public_funds_schoolB "check_public_funding(\"schoolB\", \"public\", \"2012\")" 
12 public_funds_schoolC "check_public_funding(\"schoolC\", \"public\", \"2013\")" 

While I believe it should yield (from my original solution proposal):

# A tibble: 40 x 2
   target                    command                                                  
   <chr>                     <chr>                                                    
 1 credits_schoolA_2012      "check_credit_hours(\"schoolA\", \"public\", \"2012\")"  
 2 credits_schoolA_2013      "check_credit_hours(\"schoolA\", \"public\", \"2013\")"  
 3 credits_schoolA_2014      "check_credit_hours(\"schoolA\", \"public\", \"2014\")"  
 4 credits_schoolA_2015      "check_credit_hours(\"schoolA\", \"public\", \"2015\")"  
 5 credits_schoolB_2014      "check_credit_hours(\"schoolB\", \"public\", \"2014\")"  
 6 credits_schoolB_2015      "check_credit_hours(\"schoolB\", \"public\", \"2015\")"  
 7 credits_schoolC_2012      "check_credit_hours(\"schoolC\", \"private\", \"2012\")" 
 8 credits_schoolC_2013      "check_credit_hours(\"schoolC\", \"private\", \"2013\")" 
 9 credits_schoolC_2014      "check_credit_hours(\"schoolC\", \"private\", \"2014\")" 
10 credits_schoolC_2015      "check_credit_hours(\"schoolC\", \"private\", \"2015\")" 
11 students_schoolA_2012     "check_students(\"schoolA\", \"public\", \"2012\")"      
12 students_schoolA_2013     "check_students(\"schoolA\", \"public\", \"2013\")"      
13 students_schoolA_2014     "check_students(\"schoolA\", \"public\", \"2014\")"      
14 students_schoolA_2015     "check_students(\"schoolA\", \"public\", \"2015\")"      
15 students_schoolB_2014     "check_students(\"schoolB\", \"public\", \"2014\")"      
16 students_schoolB_2015     "check_students(\"schoolB\", \"public\", \"2015\")"      
17 students_schoolC_2012     "check_students(\"schoolC\", \"private\", \"2012\")"     
18 students_schoolC_2013     "check_students(\"schoolC\", \"private\", \"2013\")"     
19 students_schoolC_2014     "check_students(\"schoolC\", \"private\", \"2014\")"     
20 students_schoolC_2015     "check_students(\"schoolC\", \"private\", \"2015\")"     
21 grads_schoolA_2012        "check_graduations(\"schoolA\", \"public\", \"2012\")"   
22 grads_schoolA_2013        "check_graduations(\"schoolA\", \"public\", \"2013\")"   
23 grads_schoolA_2014        "check_graduations(\"schoolA\", \"public\", \"2014\")"   
24 grads_schoolA_2015        "check_graduations(\"schoolA\", \"public\", \"2015\")"   
25 grads_schoolB_2014        "check_graduations(\"schoolB\", \"public\", \"2014\")"   
26 grads_schoolB_2015        "check_graduations(\"schoolB\", \"public\", \"2015\")"   
27 grads_schoolC_2012        "check_graduations(\"schoolC\", \"private\", \"2012\")"  
28 grads_schoolC_2013        "check_graduations(\"schoolC\", \"private\", \"2013\")"  
29 grads_schoolC_2014        "check_graduations(\"schoolC\", \"private\", \"2014\")"  
30 grads_schoolC_2015        "check_graduations(\"schoolC\", \"private\", \"2015\")"  
31 public_funds_schoolA_2012 "check_public_funding(\"schoolA\", \"public\", \"2012\")"
32 public_funds_schoolA_2013 "check_public_funding(\"schoolA\", \"public\", \"2013\")"
33 public_funds_schoolA_2014 "check_public_funding(\"schoolA\", \"public\", \"2014\")"
34 public_funds_schoolA_2015 "check_public_funding(\"schoolA\", \"public\", \"2015\")"
35 public_funds_schoolB_2014 "check_public_funding(\"schoolB\", \"public\", \"2014\")"
36 public_funds_schoolB_2015 "check_public_funding(\"schoolB\", \"public\", \"2015\")"
37 public_funds_schoolC_2012 "check_public_funding(\"schoolC\", \"private\", \"2012\"
38 public_funds_schoolC_2013 "check_public_funding(\"schoolC\", \"private\", \"2013\"
39 public_funds_schoolC_2014 "check_public_funding(\"schoolC\", \"private\", \"2014\"
40 public_funds_schoolC_2015 "check_public_funding(\"schoolC\", \"private\", \"2015\"

@wlandau
Member

wlandau commented May 28, 2018

In that particular example, since school C does not receive public funding, we should not actually be calling check_public_funding("schoolC"). But I do see your point about expanding each row with matching wildcards over a manual grid.

By the way, I'm wrong for a different reason: the resulting data frame should be 10 rows, not 12. In 4e4cb98, which I will push soon, I patched the issue and updated the best practices vignette. The documentation website should update next time I rebuild it.

> plan <- drake_plan(
+   credits = check_credit_hours("school_", "funding_", "cohort_"),
+   students = check_students("school_", "funding_", "cohort_"),
+   grads = check_graduations("school_", "funding_", "cohort_"),
+   public_funds = check_public_funding("school_", "funding_", "cohort_"),
+   strings_in_dots = "literals"
+ )[c(rep(1, 4), rep(2, 2), rep(3, 4)), ] %>%
+   evaluate_plan(
+     rules = rules_grid,
+     expand = FALSE,
+     always_rename = TRUE
+   )
> plan
# A tibble: 10 x 2
   target   command                                                
   <chr>    <chr>                                                  
 1 credits  "check_credit_hours(\"schoolA\", \"public\", \"2012\")"
 2 credits  "check_credit_hours(\"schoolA\", \"public\", \"2013\")"
 3 credits  "check_credit_hours(\"schoolA\", \"public\", \"2014\")"
 4 credits  "check_credit_hours(\"schoolA\", \"public\", \"2015\")"
 5 students "check_students(\"schoolB\", \"public\", \"2014\")"    
 6 students "check_students(\"schoolB\", \"public\", \"2015\")"    
 7 grads    "check_graduations(\"schoolC\", \"private\", \"2012\")"
 8 grads    "check_graduations(\"schoolC\", \"private\", \"2013\")"
 9 grads    "check_graduations(\"schoolC\", \"private\", \"2014\")"
10 grads    "check_graduations(\"schoolC\", \"private\", \"2015\")"

I do want to think about better handling of custom grids and whether we should expand every matching command over the whole grid. My mind has not been on wildcards lately, though.

wlandau pushed a commit that referenced this issue May 29, 2018
@jw5

jw5 commented May 29, 2018

I've been trying to come up with a better paradigm for the substitution rules in evaluate_plan(). I note that you have added a new flag, always_rename, which looks promising.

It seems like the problem is made more difficult by trying to get consistent behavior for expand=T/F. So for the moment, I'll ignore it. I'm also ignoring the wildcard/value args as they are really a subset of a single rule list and could be deprecated.

When you have multiple parameters being substituted at the same time via rules = list(), the primary distinction (in my mind) is whether you are generating all combinations of those parameters (as currently coded with expand = T), or taking them verbatim as "rowwise" tuples of parameter values and always treating each row as a unit. The latter might be the more natural interpretation of rules = data.frame, as rows are often seen as unique observations, while the former makes sense when the list contains vectors of different lengths.

This leads to the suggestion of enhancing the expand option beyond just T/F. Currently, FALSE indicates no replication of targets and just round-robin substitution of parameters. However, the actual substitution appears to depend on both the original targets (counts, ordering, and parameter usage) and the rules parameter counts. I'm not a fan, but this may need to be kept for backward compatibility?

With expand = T and a set of rules, the current combinatorial expansion would take place.

Finally, with expand = "rowwise", each target would get expanded with each of the parameter tuples defined in a row (no combinatorics unless you did the expansion when generating the rules, using for example expand.grid). Thus, if you had N targets and M rows in the rules, you would always end up with exactly N*M evaluated targets.

Note that in some sense the rowwise expansion is more fundamental than the current combinatorics as the latter can easily be replicated using the former, but not vice-versa.
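
To make the rowwise idea concrete, here is a minimal base R sketch. The function expand_rowwise() is hypothetical and not part of drake; it treats a plan as a plain data frame with target and command columns, and assumes wildcards are replaced by literal string substitution.

```r
# Hypothetical helper, not part of drake: expand every target once per rule row.
expand_rowwise <- function(plan, rules) {
  pieces <- lapply(seq_len(nrow(rules)), function(i) {
    out <- plan
    # Substitute every wildcard with the value from row i.
    for (wildcard in names(rules)) {
      out$command <- gsub(wildcard, rules[[wildcard]][i], out$command, fixed = TRUE)
    }
    # Append the row's values to the target name to keep names unique.
    suffix <- paste(unlist(rules[i, ]), collapse = "_")
    out$target <- paste(out$target, suffix, sep = "_")
    out
  })
  do.call(rbind, pieces)
}

plan <- data.frame(
  target = c("credits", "students"),
  command = c(
    'check_credit_hours("school_", "funding_")',
    'check_students("school_", "funding_")'
  ),
  stringsAsFactors = FALSE
)
rules <- data.frame(
  school_ = c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private"),
  stringsAsFactors = FALSE
)

expanded <- expand_rowwise(plan, rules)
nrow(expanded)  # 6 = 2 targets x 3 rule rows
print(expanded)
```

Because each row is applied as a unit, a school is never paired with another school's funding type, and a combinatorial expansion can still be had by passing a grid built with expand.grid() as the rules.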

================================
On a separate issue, I'm still a little concerned about the duplicate target names in your results above. I'm guessing that always_rename isn't completely implemented?

In any event, it is only evaluating credits on schoolA, students on schoolB and grads on schoolC rather than each test on each school.

I would have expected this to converge to a solution similar to my "Version two" proposal above (but without the schoolC varying public/private as I added to the example code).

This would generate the following 36 target plan:

# A tibble: 36 x 2
   target                    command                                                  
   <chr>                     <chr>                                                    
 1 credits_schoolA_2012      "check_credit_hours(\"schoolA\", \"public\", \"2012\")"  
 2 credits_schoolA_2013      "check_credit_hours(\"schoolA\", \"public\", \"2013\")"  
 3 credits_schoolA_2014      "check_credit_hours(\"schoolA\", \"public\", \"2014\")"  
 4 credits_schoolA_2015      "check_credit_hours(\"schoolA\", \"public\", \"2015\")"  
 5 credits_schoolB_2014      "check_credit_hours(\"schoolB\", \"public\", \"2014\")"  
 6 credits_schoolB_2015      "check_credit_hours(\"schoolB\", \"public\", \"2015\")"  
 7 credits_schoolC_2012      "check_credit_hours(\"schoolC\", \"private\", \"2012\")" 
 8 credits_schoolC_2013      "check_credit_hours(\"schoolC\", \"private\", \"2013\")" 
 9 credits_schoolC_2014      "check_credit_hours(\"schoolC\", \"private\", \"2014\")" 
10 credits_schoolC_2015      "check_credit_hours(\"schoolC\", \"private\", \"2015\")" 
11 students_schoolA_2012     "check_students(\"schoolA\", \"public\", \"2012\")"      
12 students_schoolA_2013     "check_students(\"schoolA\", \"public\", \"2013\")"      
13 students_schoolA_2014     "check_students(\"schoolA\", \"public\", \"2014\")"      
14 students_schoolA_2015     "check_students(\"schoolA\", \"public\", \"2015\")"      
15 students_schoolB_2014     "check_students(\"schoolB\", \"public\", \"2014\")"      
16 students_schoolB_2015     "check_students(\"schoolB\", \"public\", \"2015\")"      
17 students_schoolC_2012     "check_students(\"schoolC\", \"private\", \"2012\")"     
18 students_schoolC_2013     "check_students(\"schoolC\", \"private\", \"2013\")"     
19 students_schoolC_2014     "check_students(\"schoolC\", \"private\", \"2014\")"     
20 students_schoolC_2015     "check_students(\"schoolC\", \"private\", \"2015\")"     
21 grads_schoolA_2012        "check_graduations(\"schoolA\", \"public\", \"2012\")"   
22 grads_schoolA_2013        "check_graduations(\"schoolA\", \"public\", \"2013\")"   
23 grads_schoolA_2014        "check_graduations(\"schoolA\", \"public\", \"2014\")"   
24 grads_schoolA_2015        "check_graduations(\"schoolA\", \"public\", \"2015\")"   
25 grads_schoolB_2014        "check_graduations(\"schoolB\", \"public\", \"2014\")"   
26 grads_schoolB_2015        "check_graduations(\"schoolB\", \"public\", \"2015\")"   
27 grads_schoolC_2012        "check_graduations(\"schoolC\", \"private\", \"2012\")"  
28 grads_schoolC_2013        "check_graduations(\"schoolC\", \"private\", \"2013\")"  
29 grads_schoolC_2014        "check_graduations(\"schoolC\", \"private\", \"2014\")"  
30 grads_schoolC_2015        "check_graduations(\"schoolC\", \"private\", \"2015\")"  
31 public_funds_schoolA_2012 "check_public_funding(\"schoolA\", \"public\", \"2012\")"
32 public_funds_schoolA_2013 "check_public_funding(\"schoolA\", \"public\", \"2013\")"
33 public_funds_schoolA_2014 "check_public_funding(\"schoolA\", \"public\", \"2014\")"
34 public_funds_schoolA_2015 "check_public_funding(\"schoolA\", \"public\", \"2015\")"
35 public_funds_schoolB_2014 "check_public_funding(\"schoolB\", \"public\", \"2014\")"
36 public_funds_schoolB_2015 "check_public_funding(\"schoolB\", \"public\", \"2015\")"

@wlandau
Member

wlandau commented May 31, 2018

I did some work since the last post, and those targets are no longer duplicated. Reprex:

library(drake)
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.5
#> ✔ tibble  1.4.2     ✔ dplyr   0.7.5
#> ✔ tidyr   0.8.1     ✔ stringr 1.3.1
#> ✔ readr   1.1.1     ✔ forcats 0.3.0
#> ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ tidyr::expand() masks drake::expand()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tidyr::gather() masks drake::gather()
#> ✖ dplyr::lag()    masks stats::lag()
rules_grid <- tibble::tibble(
  school_ = c("schoolA", "schoolB", "schoolC"),
  funding_ = c("public", "public", "private")
) %>%
  tidyr::crossing(cohort_ = c("2012", "2013", "2014", "2015")) %>%
  dplyr::filter(!(school_ == "schoolB" & cohort_ %in% c("2012", "2013")))
plan <- drake_plan(
  credits = check_credit_hours("school_", "funding_", "cohort_"),
  students = check_students("school_", "funding_", "cohort_"),
  grads = check_graduations("school_", "funding_", "cohort_"),
  public_funds = check_public_funding("school_", "funding_", "cohort_"),
  strings_in_dots = "literals"
)[c(rep(1, 4), rep(2, 2), rep(3, 4)), ] %>%
  evaluate_plan(rules = rules_grid, expand = FALSE, always_rename = TRUE) %>%
  print()
#> # A tibble: 10 x 2
#>    target                       command                                   
#>    <chr>                        <chr>                                     
#>  1 credits_schoolA_public_2012  "check_credit_hours(\"schoolA\", \"public…
#>  2 credits_schoolA_public_2013  "check_credit_hours(\"schoolA\", \"public…
#>  3 credits_schoolA_public_2014  "check_credit_hours(\"schoolA\", \"public…
#>  4 credits_schoolA_public_2015  "check_credit_hours(\"schoolA\", \"public…
#>  5 students_schoolB_public_2014 "check_students(\"schoolB\", \"public\", …
#>  6 students_schoolB_public_2015 "check_students(\"schoolB\", \"public\", …
#>  7 grads_schoolC_private_2012   "check_graduations(\"schoolC\", \"private…
#>  8 grads_schoolC_private_2013   "check_graduations(\"schoolC\", \"private…
#>  9 grads_schoolC_private_2014   "check_graduations(\"schoolC\", \"private…
#> 10 grads_schoolC_private_2015   "check_graduations(\"schoolC\", \"private…

I will need some time to think about the rest of your comments about different modes of wildcard substitution and expansion. I am planning to put this functionality in the wildcard package, which I have not updated in months.

At this point, I see wildcards as a medium-term solution. Long-term, I still prefer to move to @krlmlr's proposed DSL interface (ref: #233, #304).

@jw5

jw5 commented Jun 7, 2018

Unfortunately, the proposed solution doesn't generate the correct answer. It pseudo-randomly combines the checks with the schools and generates only 10 results.

What it should do is combine all 4 independent checks with all specified schools and years (cohorts) and generate 40 results (if you allow check_public_funding to be invoked on schoolC, or 36 if you don't).

# A tibble: 40 x 2
   target                    command                                                  
   <chr>                     <chr>                                                    
 1 credits_schoolA_2012      "check_credit_hours(\"schoolA\", \"public\", \"2012\")"  
 2 credits_schoolA_2013      "check_credit_hours(\"schoolA\", \"public\", \"2013\")"  
 3 credits_schoolA_2014      "check_credit_hours(\"schoolA\", \"public\", \"2014\")"  
 4 credits_schoolA_2015      "check_credit_hours(\"schoolA\", \"public\", \"2015\")"  
 5 credits_schoolB_2014      "check_credit_hours(\"schoolB\", \"public\", \"2014\")"  
 6 credits_schoolB_2015      "check_credit_hours(\"schoolB\", \"public\", \"2015\")"  
 7 credits_schoolC_2012      "check_credit_hours(\"schoolC\", \"private\", \"2012\")" 
 8 credits_schoolC_2013      "check_credit_hours(\"schoolC\", \"private\", \"2013\")" 
 9 credits_schoolC_2014      "check_credit_hours(\"schoolC\", \"private\", \"2014\")" 
10 credits_schoolC_2015      "check_credit_hours(\"schoolC\", \"private\", \"2015\")" 
11 students_schoolA_2012     "check_students(\"schoolA\", \"public\", \"2012\")"      
12 students_schoolA_2013     "check_students(\"schoolA\", \"public\", \"2013\")"      
13 students_schoolA_2014     "check_students(\"schoolA\", \"public\", \"2014\")"      
14 students_schoolA_2015     "check_students(\"schoolA\", \"public\", \"2015\")"      
15 students_schoolB_2014     "check_students(\"schoolB\", \"public\", \"2014\")"      
16 students_schoolB_2015     "check_students(\"schoolB\", \"public\", \"2015\")"      
17 students_schoolC_2012     "check_students(\"schoolC\", \"private\", \"2012\")"     
18 students_schoolC_2013     "check_students(\"schoolC\", \"private\", \"2013\")"     
19 students_schoolC_2014     "check_students(\"schoolC\", \"private\", \"2014\")"     
20 students_schoolC_2015     "check_students(\"schoolC\", \"private\", \"2015\")"     
21 grads_schoolA_2012        "check_graduations(\"schoolA\", \"public\", \"2012\")"   
22 grads_schoolA_2013        "check_graduations(\"schoolA\", \"public\", \"2013\")"   
23 grads_schoolA_2014        "check_graduations(\"schoolA\", \"public\", \"2014\")"   
24 grads_schoolA_2015        "check_graduations(\"schoolA\", \"public\", \"2015\")"   
25 grads_schoolB_2014        "check_graduations(\"schoolB\", \"public\", \"2014\")"   
26 grads_schoolB_2015        "check_graduations(\"schoolB\", \"public\", \"2015\")"   
27 grads_schoolC_2012        "check_graduations(\"schoolC\", \"private\", \"2012\")"  
28 grads_schoolC_2013        "check_graduations(\"schoolC\", \"private\", \"2013\")"  
29 grads_schoolC_2014        "check_graduations(\"schoolC\", \"private\", \"2014\")"  
30 grads_schoolC_2015        "check_graduations(\"schoolC\", \"private\", \"2015\")"  
31 public_funds_schoolA_2012 "check_public_funding(\"schoolA\", \"public\", \"2012\")"
32 public_funds_schoolA_2013 "check_public_funding(\"schoolA\", \"public\", \"2013\")"
33 public_funds_schoolA_2014 "check_public_funding(\"schoolA\", \"public\", \"2014\")"
34 public_funds_schoolA_2015 "check_public_funding(\"schoolA\", \"public\", \"2015\")"
35 public_funds_schoolB_2014 "check_public_funding(\"schoolB\", \"public\", \"2014\")"
36 public_funds_schoolB_2015 "check_public_funding(\"schoolB\", \"public\", \"2015\")"
37 public_funds_schoolC_2012 "check_public_funding(\"schoolC\", \"private\", \"2012\")"
38 public_funds_schoolC_2013 "check_public_funding(\"schoolC\", \"private\", \"2013\")"
39 public_funds_schoolC_2014 "check_public_funding(\"schoolC\", \"private\", \"2014\")"
40 public_funds_schoolC_2015 "check_public_funding(\"schoolC\", \"private\", \"2015\")"
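
For reference, here is a rough base-R sketch of the crossing described above. It is not drake's implementation, only an illustration of the expected row counts. The `cohorts` and `checks` data frames and their column names are assumptions reconstructed from the table, not objects that exist anywhere in drake.

```r
# Hypothetical inputs, reconstructed from the table above.
# One row per (school, funding, cohort) combination that actually exists.
cohorts <- data.frame(
  school  = c(rep("schoolA", 4), rep("schoolB", 2), rep("schoolC", 4)),
  funding = c(rep("public", 6), rep("private", 4)),
  cohort  = as.character(c(2012:2015, 2014:2015, 2012:2015)),
  stringsAsFactors = FALSE
)

# One row per check: the target prefix and the function to call.
checks <- data.frame(
  prefix = c("credits", "students", "grads", "public_funds"),
  fun    = c("check_credit_hours", "check_students",
             "check_graduations", "check_public_funding"),
  stringsAsFactors = FALSE
)

# merge() with by = NULL returns the Cartesian product of the two data frames.
# Rows of the first argument vary fastest, so checks form the outer loop.
full <- merge(cohorts, checks, by = NULL)

# Build target names and command strings from the crossed grid.
plan <- data.frame(
  target  = paste(full$prefix, full$school, full$cohort, sep = "_"),
  command = sprintf('%s("%s", "%s", "%s")',
                    full$fun, full$school, full$funding, full$cohort),
  stringsAsFactors = FALSE
)

# Optionally drop the public-funding check for private schools.
keep <- !(full$fun == "check_public_funding" & full$funding == "private")
plan_public <- plan[keep, ]

nrow(plan)         # all checks on all cohorts
nrow(plan_public)  # without check_public_funding on schoolC
```

The first count covers every check on every cohort, and the second leaves out the four `check_public_funding` rows for schoolC.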

@wlandau
Member

wlandau commented Jun 12, 2018

I think the 10-row data frame is really what we are going for here. (@AlexAxthelm, do you agree?) Setting expand = FALSE in evaluate_plan() means it will not expand out to 40 (or 36) rows. If you need more expansion, consider expand_plan(), more wildcards, tidyr::crossing(), etc.

@wlandau
Member

wlandau commented Jun 19, 2018

https://github.com/tidyverse/glue may be a better solution to all this. Ref: #424.

@wlandau
Member

wlandau commented Jun 19, 2018

Coming back to #235 (comment), I thought of a much better solution to the original problem: just define a special wildcard for public schools.

library(drake)
library(magrittr)
drake_plan(
  credits = check_credit_hours(all_schools__),
  students = check_students(all_schools__),
  grads = check_graduations(all_schools__),
  public_funds = check_public_funding(public_schools__)
) %>%
  evaluate_plan(
    rules = list(
      all_schools__ =  c("schoolA", "schoolB", "schoolC"),
      public_schools__ = c("schoolA", "schoolB")
    )
  )
#> # A tibble: 11 x 2
#>    target               command                      
#>    <chr>                <chr>                        
#>  1 credits_schoolA      check_credit_hours(schoolA)  
#>  2 credits_schoolB      check_credit_hours(schoolB)  
#>  3 credits_schoolC      check_credit_hours(schoolC)  
#>  4 students_schoolA     check_students(schoolA)      
#>  5 students_schoolB     check_students(schoolB)      
#>  6 students_schoolC     check_students(schoolC)      
#>  7 grads_schoolA        check_graduations(schoolA)   
#>  8 grads_schoolB        check_graduations(schoolB)   
#>  9 grads_schoolC        check_graduations(schoolC)   
#> 10 public_funds_schoolA check_public_funding(schoolA)
#> 11 public_funds_schoolB check_public_funding(schoolB)

Because the 12th row (public_funds_schoolC) is never generated, this is the correct answer to the question posed at the top of the thread. And it only requires one call to evaluate_plan().

@wlandau
Member

wlandau commented Oct 30, 2018

Edit: map_plan() is probably a better fit for this general situation where you want to select only certain combinations of input settings.
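
To illustrate the general idea of one target per row of a grid of allowed combinations, here is a minimal base-R sketch. It does not call map_plan() or reproduce its behavior. The `grid` and `prefix` objects are hypothetical names chosen for this example.

```r
# Start from the full grid of schools and checks (school varies fastest).
grid <- expand.grid(
  school = c("schoolA", "schoolB", "schoolC"),
  check  = c("check_credit_hours", "check_students",
             "check_graduations", "check_public_funding"),
  stringsAsFactors = FALSE
)

# Keep only the combinations we actually want:
# schoolC receives no public funds, so that row is dropped.
grid <- grid[!(grid$check == "check_public_funding" &
               grid$school == "schoolC"), ]

# Hypothetical lookup from function name to target prefix.
prefix <- c(
  check_credit_hours   = "credits",
  check_students       = "students",
  check_graduations    = "grads",
  check_public_funding = "public_funds"
)

# One target and one command per remaining row of the grid.
plan <- data.frame(
  target  = paste(unname(prefix[grid$check]), grid$school, sep = "_"),
  command = sprintf("%s(%s)", grid$check, grid$school),
  stringsAsFactors = FALSE
)

nrow(plan)
```

Any combination that should not run is simply removed from the grid before targets are built, so no extra wildcard is needed per exception.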

This was referenced Jan 19, 2019