Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pretty printing of plans #489

Closed
wlandau opened this issue Aug 2, 2018 · 30 comments
Closed

pretty printing of plans #489

wlandau opened this issue Aug 2, 2018 · 30 comments

Comments

@wlandau
Copy link
Member

wlandau commented Aug 2, 2018

It is difficult to visually inspect long commands in workflow plan data frames. Example: #257 (comment).

library(drake)
pkgconfig::set_config("drake::strings_in_dots" = "literals")
drake_plan(
  u_auckland = make_place(
    Name = "University of Auckland",
    Latitude = -36.8521369,
    Longitude = 174.7688785
  ),
  shapefile = {
    file_out("u-auckland.prj", "u-auckland.shx", "u-auckland.dbf")
    st_write(
      obj = u_auckland,
      dsn = file_out("u-auckland.shp"),
      driver = "ESRI Shapefile",
      delete_dsn = TRUE
    )
  }
)
#> # A tibble: 2 x 2
#>   target     command                                                      
#> * <chr>      <chr>                                                        
#> 1 u_auckland "make_place(Name = \"University of Auckland\", Latitude = -3…
#> 2 shapefile  "{\n    file_out(\"u-auckland.prj\", \"u-auckland.shx\", \"u…

Desired output:

# A tibble: 2 x 2
  target     command                                             
* <chr>      <drake_cmd>
1 u_auckland make_place(
               Name = "University of Auckland",
               Latitude = -36.8521369,
               Longitude = 174.7688785
             )
2 shapefile  {
               file_out(
                 "u-auckland.prj",
                 "u-auckland.shx",
                 "u-auckland.dbf"
               )
               st_write(
                 obj = u_auckland,
                 dsn = file_out("u-auckland.shp"),
                 driver = "ESRI Shapefile",
                 delete_dsn = TRUE
               )
             }

I am not sure if this is possible, but I think it may be worth trying some combination of pillar::pillar_shaft(), styler, highlight, and crayon.

@connoralbright
Copy link

This would be a very, very nice feature.

@krlmlr
Copy link
Collaborator

krlmlr commented Aug 2, 2018

How about a pretty-print method that prints/returns a drake_plan() call, with indentation?

@wlandau
Copy link
Member Author

wlandau commented Aug 3, 2018

Brilliant! Much easier and just as good.

@wlandau
Copy link
Member Author

wlandau commented Aug 3, 2018

We will just have to make it really clear that the plan is just a data frame and that people do not always need to use drake_plan(). Ref: #451 (comment). Maybe a comment would be enough.

> print(plan)
data frame generated by drake_plan(
  u_auckland = make_place(
                 Name = "University of Auckland",
                 Latitude = -36.8521369,
                 Longitude = 174.7688785
               )
  shapefile = {
                file_out(
                  "u-auckland.prj",
                  "u-auckland.shx",
                  "u-auckland.dbf"
                )
                st_write(
                  obj = u_auckland,
                  dsn = file_out("u-auckland.shp"),
                  driver = "ESRI Shapefile",
                  delete_dsn = TRUE
                )
              }
)

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

We should define a special S3 class and print method for printing, and I am having some initial trouble. The guide on extending tibble helped me get started, but I still do not know how to make sure an S3 "drake_plan" class gets preserved during dplyr operations.

as_drake_plan <- function(x, ...){
  class(x) <- c("drake_plan", "tbl_df", "tbl", "data.frame")
  x
}

drake_plan.as.data.frame <- as_drake_plan
drake_plan.as_data_frame <- as_drake_plan
drake_plan.as_tibble <- as_drake_plan
drake_plan.as.tibble <- as_drake_plan

`[.drake_plan` <- function(...){
  as_drake_plan(NextMethod())
}

library(dplyr)
library(drake)
plan <- drake_plan(
  x = get_data(),
  y = analyze_data(x)
) %>%
  as_drake_plan() %>%
  print
#> # A tibble: 2 x 2
#>   target command        
#> * <chr>  <chr>          
#> 1 x      get_data()     
#> 2 y      analyze_data(x)
class(plan)
#> [1] "drake_plan" "tbl_df"     "tbl"        "data.frame"
filter(plan, target == "x") %>%
  class()
#> [1] "tbl_df"     "tbl"        "data.frame"

Related Stack Overflow post here.

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

Just started to construct the call in the drake_plan_call branch.

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

We now have an internal drake_plan_call() function to generate calls to drake_plan() from actual plans. Ref: #493.

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

Next steps:

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

Update: from #494, we now have a special "drake_plan" S3 class for plans. dplyr functions still seem to drop the class, but other than that, we have the ground work now. A new print.drake_plan() method will come eventually. I am not exactly sure on the best style, but I am convinced that it should be some cleverly indented and colored version of a call to drake_plan().

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

Continued work on this will be in the files R/drake_plan_class.R and R/drake_plan_print.R.

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

So apparently, dropping tibble subclasses is already a known dplyr issue: tidyverse/dplyr#2532. Makes me think we should just roll with it and expose the as_drake_plan() method.

@wlandau
Copy link
Member Author

wlandau commented Aug 5, 2018

@lorenzwalthert, I would like to include you in this conversation too. This feature is likely to require a custom styler style guide for calls to drake_plan().

@lorenzwalthert
Copy link
Contributor

lorenzwalthert commented Aug 5, 2018

Thanks. I think the fact that attributes and sub classes are generally not preserved in R are a pitty and limit the usefulness of this mission slightly. Anyways, as_drake_plan() seems a way out and I am happy to provide guidance regarding styling. Just let me know what the requirements are.

@wlandau
Copy link
Member Author

wlandau commented Aug 6, 2018

Yeah, not the best circumstances, but I still think this feature is worth exploring.

I laid the groundwork for R/drake_plan_class.R and R/drake_plan_print.R in #493 and #494. Beyond that, I think we're in a brainstorming phase (which is the most fun part). So I think it is a bit early to have all the requirements. But I do think our raw drake_plan() call should be styled.

drake::load_mtcars_example()
my_plan$trigger <- NA
my_plan$trigger[4] <- "trigger(condition = is_tuesday(), file = FALSE)"
my_plan$non_standard_column <- 1234
# Long commands are often truncated this way:
print(my_plan)
#> # A tibble: 15 x 4
#>    target      command                  trigger           non_standard_co…
#>    <chr>       <chr>                    <chr>                        <dbl>
#>  1 report      "knit(knitr_in(\"report… <NA>                          1234
#>  2 small       simulate(48)             <NA>                          1234
#>  3 large       simulate(64)             <NA>                          1234
#>  4 regression… reg1(small)              trigger(conditio…             1234
#>  5 regression… reg1(large)              <NA>                          1234
#>  6 regression… reg2(small)              <NA>                          1234
#>  7 regression… reg2(large)              <NA>                          1234
#>  8 summ_regre… suppressWarnings(summar… <NA>                          1234
#>  9 summ_regre… suppressWarnings(summar… <NA>                          1234
#> 10 summ_regre… suppressWarnings(summar… <NA>                          1234
#> 11 summ_regre… suppressWarnings(summar… <NA>                          1234
#> 12 coef_regre… suppressWarnings(summar… <NA>                          1234
#> 13 coef_regre… suppressWarnings(summar… <NA>                          1234
#> 14 coef_regre… suppressWarnings(summar… <NA>                          1234
#> 15 coef_regre… suppressWarnings(summar… <NA>                          1234
# So we might style a drake_plan() call that could generate the plan.
call <- drake:::drake_plan_call(my_plan)
str(call)
#>  language drake_plan(report = knit(knitr_in("report.Rmd"), file_out("report.md"),      quiet = TRUE), small = simulate(48),| __truncated__ ...
# But the default print method does not guide intuition.
print(call)
#> drake_plan(report = knit(knitr_in("report.Rmd"), file_out("report.md"), 
#>     quiet = TRUE), small = simulate(48), large = simulate(64), 
#>     regression1_small = target(command = reg1(small), trigger = trigger(condition = is_tuesday(), 
#>         file = FALSE)), regression1_large = reg1(large), regression2_small = reg2(small), 
#>     regression2_large = reg2(large), summ_regression1_small = suppressWarnings(summary(regression1_small$residuals)), 
#>     summ_regression1_large = suppressWarnings(summary(regression1_large$residuals)), 
#>     summ_regression2_small = suppressWarnings(summary(regression2_small$residuals)), 
#>     summ_regression2_large = suppressWarnings(summary(regression2_large$residuals)), 
#>     coef_regression1_small = suppressWarnings(summary(regression1_small))$coefficients, 
#>     coef_regression1_large = suppressWarnings(summary(regression1_large))$coefficients, 
#>     coef_regression2_small = suppressWarnings(summary(regression2_small))$coefficients, 
#>     coef_regression2_large = suppressWarnings(summary(regression2_large))$coefficients)

My main preference is to arrange this output in a 2-column grid with each target on its own set of rows. drake plans are just data frames, and I think it is important to preserve the overall look and feel of a data frame/tibble. Beyond that, we can experiment with per-column widths, the overall width, max lines per target, max lines overall, and syntax highlighting.

Thank you for offering to provide guidance. It will really help to draw on your aesthetic opinions and styler expertise.

@wlandau
Copy link
Member Author

wlandau commented Aug 18, 2018

After letting this issue sit for a few days, I no longer think the drake plan itself should have special printing. It is so crucial to emphasize that the plan is just a data frame, and printing out a function call by default could make the plan too mysterious. But what if we expose drake_plan_call() and add a nice print method for the output?

Most of the remaining work for that is to create a custom style guide. Relevant: https://lorenzwalthert.netlify.com/posts/customizing-styler-the-quick-way/

@wlandau
Copy link
Member Author

wlandau commented Aug 18, 2018

I think the syntax highlighting piece may be better suited to styler itself. Ref: r-lib/styler#417.

@lorenzwalthert
Copy link
Contributor

To be honest, I think you don't even need styler for the highlighting only. Can we just use cat(prettycode::highlight(...), sep = "\n")?

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

Sure, seems to work fine in my own console.

@lorenzwalthert
Copy link
Contributor

Ok, cool. I mean you can still use styler to prettify your code beforehand according to a (custom) style guide.

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

Absolutely. I am in the beginnings of constructing a custom style guide to manage line breaks and indentation. The desired output is a drake_plan() call as formatted below.

pkgconfig::set_config("drake::strings_in_dots" = "literals")
plan <- drake::drake_plan(
  small_data = download_data("https://some_website.com") %>%
    select_my_columns() %>%
    munge(),
  large_data_raw = target(
    command = download_data("https://lots_of_data.com") %>%
      select_top_columns,
    trigger = trigger(
      change = time_last_modified("https://lots_of_data.com"),
      command = FALSE,
      depend = FALSE
    ),
    timeout = 1e3
  )
)

Text to begin with:

call <- drake:::drake_plan_call(plan)
text <- rlang::expr_text(call, width = 70)
text
#> [1] "drake_plan(small_data = download_data(\"https://some_website.com\") %>% select_my_columns() %>% \n    munge(), large_data_raw = target(command = download_data(\"https://lots_of_data.com\") %>% \n    select_top_columns, trigger = trigger(change = time_last_modified(\"https://lots_of_data.com\"), \n    command = FALSE, depend = FALSE), timeout = 1000))"

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

I'm a little unclear about the role of pd_flat$newlines vs pd_flat$lag_newlines. Would you elaborate? Here is what I am seeing. Using pd_flat$newlines, no new line breaks are added.

add_line_breaks <- function(pd_flat){
  pd_flat$newlines <- pd_flat$newlines + (pd_flat$token == "SYMBOL_SUB")
  pd_flat
}
guide <- function(){
  styler::create_style_guide(line_break = tibble::lst(add_line_breaks))
}
styled <- styler::style_text(text, style = guide)
cat(prettycode::highlight(styled), sep = "\n")
#> drake_plan(small_data = download_data("https://some_website.com") %>% select_my_columns() %>%
#> munge(), large_data_raw = target(command = download_data("https://lots_of_data.com") %>%
#> select_top_columns, trigger = trigger(change = time_last_modified("https://lots_of_data.com"),
#> command = FALSE, depend = FALSE), timeout = 1000))

And with pd_flat$lag_newlines:

add_line_breaks <- function(pd_flat){
  pd_flat$lag_newlines <- pd_flat$lag_newlines | (pd_flat$token == "SYMBOL_SUB")
  pd_flat
}
guide <- function(){
  styler::create_style_guide(line_break = tibble::lst(add_line_breaks))
}
styled <- styler::style_text(text, style = guide)
cat(prettycode::highlight(styled), sep = "\n")
#> drake_plan(
#> small_data = download_data("https://some_website.com") %>% select_my_columns() %>%
#> munge(),
#> large_data_raw = target(
#> command = download_data("https://lots_of_data.com") %>%
#> select_top_columns,
#> trigger = trigger(
#> change = time_last_modified("https://lots_of_data.com"),
#> command = FALSE,
#> depend = FALSE),
#> timeout = 1000))

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

I think it is appropriate to have:

  • A new line and a 2-space indent for each target name.
  • A new line and a 4-space indent for each argument to target() (only appears at the target name level).
  • A new line and a 6-space indent for each argument to trigger() (only appears as the trigger argument to target()).
  • Line breaks and appropriate indentation for other code long enough to wrap (for commands and custom triggers).

Any suggestions on wrangling pd_flat to accomplish this?

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

I figured out a solution: at the top level, recurse the AST of a drake_plan() call to make sure we have the proper line breaks and indentation. Then, apply styler::style_text() to the leaf nodes. The result is quite nice, and we have syntax highlighting with prettycode::highlight() in the console.

library(drake)
plan <- drake_plan(
  small_data = download_data("https://some_website.com") %>%
    select_my_columns() %>%
    munge(),
  large_data_raw = target(
    command = download_data("https://lots_of_data.com") %>%
      select_top_columns,
    trigger = trigger(
      change = time_last_modified("https://lots_of_data.com"),
      command = FALSE,
      depend = FALSE
    ),
    timeout = 1e3
  ),
  strings_in_dots = "literals"
)
print(plan)
#> # A tibble: 2 x 4
#>   target    command                     trigger                    timeout
#> * <chr>     <chr>                       <chr>                        <dbl>
#> 1 small_da… "download_data(\"https://s… <NA>                            NA
#> 2 large_da… "download_data(\"https://l… "trigger(change = time_la…    1000
drake_plan_source(plan)
#> drake_plan(
#>   small_data = download_data("https://some_website.com") %>%
#>     select_my_columns() %>%
#>     munge(),
#>   large_data_raw =target(
#>     command = download_data("https://lots_of_data.com") %>% select_top_columns(),
#>     trigger =trigger(
#>       change = time_last_modified("https://lots_of_data.com"),
#>       command = FALSE,
#>       depend = FALSE
#>     ),
#>     timeout = 1000
#>   ),
#>   strings_in_dots = "literals"
#> )

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

I do like the simplicity of deferring to styler's defaults, but the code for the recursion is longer and more complicated than I would like. A custom style guide would make it more concise.

@lorenzwalthert
Copy link
Contributor

This indeed seems complicated. What are the requirements for the styling? E.g. line break after drake_plan( and after every target code?

@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

That's part of it. I would prefer each of the following to start on its own line:

  • Every target (argument to drake_plan())
  • Every argument to target()
  • Every argument to trigger()
  • Every piece of code following %>%
  • Every closing parenthesis.

This, plus the indentation above, are what I am after. I take it this is difficult to do in styler? Do you think a custom style guide would require more code than the existing solution?

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 7, 2018

FYI: The "vertical" class in pillar is being renamed to "pillar_vertical". Seems safer to use the "glue" class for this purpose, or actually any other class that just prints the contents of the object via cat() .

wlandau pushed a commit that referenced this issue Nov 9, 2018
@wlandau
Copy link
Member Author

wlandau commented Nov 9, 2018

Good point. I forgot I already defined a "drake_plan_source" class and a special print method, so we actually do not need to borrow an S3 class from another package.

@lorenzwalthert
Copy link
Contributor

This, plus the indentation above, are what I am after. I take it this is difficult to do in styler? Do you think a custom style guide would require more code than the existing solution?

The problem with a style guide is that all rules are not exported from styler, so you needed :::, which gives R CMD Check notes, or implement the style guide in styler for which you can't control release schedules and other things and it adds overhead, so it's a bad dependency I guess. Plus we have not even decided on how to handle third-party style guides. So until we expose rules for styling in stylerinfra or similar, I think you are better served with the current solution.

@wlandau
Copy link
Member Author

wlandau commented May 6, 2019

Update: the current behavior is to create a special expr_list class for the command, trigger, and transform columns of the plan. Using S3 magic, these columns get deparsed to characters when the plan gets printed, and an <expr> label appears at the top of the tibble.

library(drake)
drake_plan(
  data = get_data(),
  y = analyze_data(data)
)
#> # A tibble: 2 x 2
#>   target command           
#>   <chr>  <expr>            
#> 1 data   get_data()        
#> 2 y      analyze_data(data)

Created on 2019-05-06 by the reprex package (v0.2.1)

Drawbacks:

  1. This is a lot of bookkeeping for an feature as minor as pretty printing.
  2. The entire column is deparsed when the tibble gets printed, which does not scale well. Printing should be instantaneous, we have a low tolerance for delays.

So I would rather rely on r-lib/pillar#34 and r-lib/pillar#153. But because pillar's enhanced printing relies on vctrs, there is a snag: r-lib/pillar#34 (comment). Not sure yet if it totally solves (2), but I think has potential to solve (1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants