Serialisation with indention #21

lorenzwalthert · 2017-06-12T14:39:12Z

This PR implements the functions indent_round and serialize_parse_data_nested, so that

* indention information for round brackets can be added to the nested parse table (indent_round), and
* this information can be used when serialising the parse table (serialize_parse_data_nested).

Further, a new utility function newlines_and_spaces is introduced to facilitate a common operation.
These functions are accompanied by tests for function calls, a vignette and further documentation.

…improves on #20

codecov · 2017-06-12T14:42:57Z

Codecov Report

Merging #21 into master will increase coverage by 19.09%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           master      #21       +/-   ##
===========================================
+ Coverage   67.87%   86.97%   +19.09%     
===========================================
  Files           7        8        +1     
  Lines         193      238       +45     
===========================================
+ Hits          131      207       +76     
+ Misses         62       31       -31

Impacted Files	Coverage Δ
R/nested.R	`100% <100%> (+100%)`	⬆️
R/parsed.R	`98.03% <100%> (-0.04%)`	⬇️
R/utils.R	`100% <100%> (ø)`	⬆️
R/modify_pd.R	`100% <100%> (ø)`
R/nested_to_tree.R	`0% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1c41c50...8eee6ae.

krlmlr

Thanks. The results look great, judging by the tests. We should work a bit on the code.

krlmlr · 2017-06-12T14:42:35Z

R/nested.R

+  raw <- serialize_parse_data_nested_helper(pd_nested, pass_indent = 0) %>%
+    unlist()
+  newline <- which(raw == "\n")
+  token <- setdiff(1:length(raw), union(which(raw == ""),


seq_along() is slightly safer. Why do we need this post-processing at all?
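
To illustrate the seq_along() remark, here is a small base R sketch. The variable names are made up for the example and do not come from the PR. It shows that 1:length(x) yields two indices for an empty vector, whereas seq_along(x) yields none.

```r
# Index vector for an empty input: 1:length(x) counts down from 1 to 0.
raw <- character(0)
bad <- 1:length(raw)    # c(1, 0): two indices although raw is empty
good <- seq_along(raw)  # integer(0): no indices at all

# For non-empty input both forms agree.
tokens <- c("call", "(", ")")
same <- identical(1:length(tokens), seq_along(tokens))

print(bad)
print(good)
print(same)
```

A loop or setdiff() over `bad` would touch positions that do not exist, which is why seq_along() is the safer choice here.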

Ok. I could not find a way to add indention to the first token on a line only. Also, it does not seem to be a big performance issue to just remove the spaces immediately before every token that is not preceded by a new line in the serialisation. I think it's safe, but not very nice.

Can you give an example that fails without postprocessing?

I think it fails for every multi-line input that requires at least one level of indention and has more than one token per line. Since the indention is currently added before every token, all but the first token on a line end up with too many spaces before them.
Here, I drop lines 127 to 131 in nested.R, which are

newline <- which(raw == "\n")
token <- setdiff(1:length(raw), union(which(raw == ""), union(grep("^ +$", raw), newline)))
to_zero <- setdiff(token - 1, newline + 1)
raw[to_zero] <- ""

Using reprex, we get the following example:

library(magrittr)
indented_multi_line_random <- c(
  " call( ",
  " 1,",
  " call2( ",
  " 2, 3,",
  "call3(1, 2, 22),",
  " 5",
  "),",
  " 144",
  ")")
raw <- styler:::compute_parse_data_nested(indented_multi_line_random) %>%
  styler:::create_filler_nested() %>%
  styler:::indent_round_nested() %>%
  styler:::serialize_parse_data_nested_helper(pass_indent = 0) %>%
  unlist()
out <- raw %>%
  paste0(collapse = "") %>%
  strsplit("\n", fixed = TRUE) %>%
  .[[1L]]
out
#> [1] "call("
#> [2] " 1,"
#> [3] " call2 ("
#> [4] " 2 , 3 ,"
#> [5] " call3 ( 1 , 2 , 22 ) ,"
#> [6] " 5"
#> [7] " ),"
#> [8] " 144"
#> [9] ")"
cat(out, sep = "\n")
#> call(
#> 1,
#> call2 (
#> 2 , 3 ,
#> call3 ( 1 , 2 , 22 ) ,
#> 5
#> ),
#> 144
#> )
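
The effect of the post-processing lines from nested.R can also be tried on a toy vector without styler. This is only a sketch: the layout of raw (indent string, token text, spacing after the token) is an assumption made for the example, not the actual output of serialize_parse_data_nested_helper.

```r
# Toy stand-in for the unlisted output of the serialisation helper.
# Assumed layout per token: indent string, token text, spacing after the token.
raw <- c(
  "",   "call", "",
  "",   "(",    "\n",
  "  ", "1",    "",
  "  ", ",",    " ",
  "  ", "2",    "\n",
  "",   ")",    ""
)
before <- paste0(raw, collapse = "")

# Post-processing as quoted in the thread (with seq_along instead of 1:length).
newline <- which(raw == "\n")
token <- setdiff(
  seq_along(raw),
  union(which(raw == ""), union(grep("^ +$", raw), newline))
)
# Indent slots sit right before a token; keep only those that follow a newline.
to_zero <- setdiff(token - 1, newline + 1)
raw[to_zero] <- ""
after <- paste0(raw, collapse = "")

cat(before, "\n---\n", after, "\n", sep = "")
```

Before the clean-up, the indention shows up in front of "," and "2" as well; afterwards only the first token on each line keeps it.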

I see a problem with the current data structure: The line break information (newline variable) is given as newline after a token, but perhaps the information that a newline appears before a token is more useful. Can you replace newline with a new variable lag_newline = lag(newline) on the flat parse data (before nesting), and work with that? I hope this will eliminate the need for postprocessing.
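
A minimal sketch of the lag_newline idea on a hypothetical flat parse table, using base R only. The helper lag_default is made up for the example and stands in for dplyr::lag(newlines, default = 0).

```r
# Hypothetical flat parse data for:  call(\n  1, 2\n)
pd <- data.frame(
  text = c("call", "(", "1", ",", "2", ")"),
  newlines = c(0, 1, 0, 0, 1, 0),  # line breaks AFTER each token
  stringsAsFactors = FALSE
)

# Shift by one position so the value describes line breaks BEFORE a token.
# (Assumes a non-empty input; illustrative helper, not part of styler.)
lag_default <- function(x, default = 0) c(default, x[-length(x)])
pd$lag_newlines <- lag_default(pd$newlines)

print(pd)
```

With this column, the tokens "1" and ")" are directly marked as the first token on their line, which is exactly where indention has to be inserted.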

Ok, I tried that and it works. The question is how exactly it should be implemented. I see two possible approaches.

1. Use lag_newlines only (and drop newlines), and always insert new lines and indention before a token and spaces after it.

2. Use lag_newlines to determine whether there is indention before a token. Continue using newlines and spaces for inserting non-indention spacing after each token.

I think for now we can stick with the former.
However, the latter might be preferable because it is more flexible, in the sense that spacing information other than our computed indention can technically be stored. At the moment, we just set the spacing after a token to zero when a line break comes after the token (see R/modify_pd.R, line 20). Maybe the flexibility is not needed later and it just complicates things, so I suggest going with the former.

I agree that we should try relying only on lag_newlines. Do we still need the postprocessing now?

No, it's not necessary anymore. In the next step, I can check whether we need newlines at all; maybe we can get rid of it completely.
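
A rough sketch of how serialisation could look when only lag_newlines is used: line breaks and indention go before a token, spaces after it. The column names follow the thread, but the table and the function serialize_sketch are hypothetical and are not the code from this PR.

```r
# Hypothetical flat parse data for:  call(\n  1, 2\n)
pd <- data.frame(
  text = c("call", "(", "1", ",", "2", ")"),
  lag_newlines = c(0, 0, 1, 0, 0, 1),  # line breaks BEFORE each token
  indent = c(0, 0, 2, 2, 2, 0),        # indention level of each token
  spaces = c(0, 0, 0, 1, 0, 0),        # spaces AFTER each token
  stringsAsFactors = FALSE
)

serialize_sketch <- function(pd) {
  # Indention is only emitted for tokens that start a new line.
  before <- ifelse(
    pd$lag_newlines > 0,
    paste0(strrep("\n", pd$lag_newlines), strrep(" ", pd$indent)),
    ""
  )
  paste0(before, pd$text, strrep(" ", pd$spaces), collapse = "")
}

out <- serialize_sketch(pd)
cat(out, "\n", sep = "")
```

Because the indention is tied to lag_newlines, no clean-up pass over the serialised tokens is needed in this sketch.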

krlmlr · 2017-06-12T14:45:05Z

R/nested.R

+             pd_nested$spaces, pd_nested$newlines, pd_nested$indent),
+           function(terminal, text, child, spaces, newlines, indent) {
+             if (terminal) {
+               c(rep_char(" ", pass_indent), text,


Can text contain newlines?

Yes, in general I think it can. But if this if clause applies, we are at a terminal token, and I think its text cannot be a bare newline ("\n"). It could be a string like "'\n'" though, but then it would remain a string and not get evaluated.

I was thinking about newlines embedded in strings, like this:

"a
b"

But we just cannot apply indention on these without changing their semantics. So the point is moot.
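
A short base R check of this point, given as a sketch with made-up variable names: a string constant that spans two source lines is a single terminal token in the parse data, and indenting its second line changes the value of the string.

```r
# The string literal in this code spans two source lines.
code <- "x <- \"a\nb\""
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
str_token <- pd[pd$token == "STR_CONST", c("line1", "line2", "text")]
print(str_token)

# Re-indenting the continuation line would alter the string itself.
original <- eval(parse(text = "\"a\nb\""))
indented <- eval(parse(text = "\"a\n  b\""))
print(identical(original, indented))
```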

krlmlr · 2017-06-12T14:46:48Z

R/modify_pd.R

+indent_round <- function(pd, indent_by) {
+  start <- which(pd$token == "'('") + 1
+  stop <- which(pd$token == "')'") - 1
+  if (length(start) == 0 && length(stop) == 0) {


Please check logic.

I am not sure I understand what you mean. Probably it's enough to check for just one of them to be zero.

Ok, resolved that now differently anyways.

krlmlr · 2017-06-12T14:47:31Z

R/modify_pd.R

+#' Update indention information of parse data
+#'
+#' @param pd A nested or flat parse table that is already enhanced with
+#'   line break and space information via [create_filler] or


[create_filler()] gives nicer markup.

sorry for that one, should know it by now...

krlmlr · 2017-06-12T14:48:25Z

R/modify_pd.R

+
+#' @rdname update_indention
+indent_round <- function(pd, indent_by) {
+  start <- which(pd$token == "'('") + 1


Are there cases where start != 2 or stop != nrow(pd) ?

Well, do you mean stop != nrow(pd) - 1?
Probably not in the case of round brackets. However, I wanted to keep the function rather general, since later on, we probably want to use a closure that returns functions like indent_round, but also one that indents after operators such as %>%.
data_frame(x = 1, y = 2) %>%
__ do_one_thing()

Here, we want stop = nrow(pd), which is different from round brackets, where probably always stop == nrow(pd) - 1.
But anyway, I think you are right: in the end we might just use two different closures, i.e. one for brackets and one for operators such as %>%, + etc.

The thing here is rather that you need which since it might also be the case that there are no round brackets and you want indention to be zero.

I'd like to use a different handler for other expression types, but we can think about it when we're done with the indention of function calls.

krlmlr · 2017-06-12T14:49:24Z

R/modify_pd.R

+    pd$indent <- ifelse(1:nrow(pd) %in% start[1]:stop[1], indent_by, 0) *
+      lag(pd$newlines, default = 0)
+  }
+  # general, should maybe not go here.


Agreed, we could wrap this in a function that updates the spaces attribute.

ok, will do so.

Are you going to extract a function here?

yes, I will do that separately.

krlmlr · 2017-06-12T14:51:35Z

tests/testthat/test-indetion_round_brackets.R

+  code <- "a <- xyz(x, 22, if(x > 1) 33 else 4)"
+
+  back_and_forth <- code %>%
+    styler:::compute_parse_data_nested() %>%


You can omit styler::: .

ok, will do so.

krlmlr

Thanks. Maybe our data structure needs some tweaking?

krlmlr

I'm not sure, have you pushed new changes?

lorenzwalthert · 2017-06-13T20:45:33Z

No, I have not pushed yet, because things in this PR depended quite a bit on how to use lag_newlines, and I was not sure which approach you prefer. Now it's clear.

* use lag_newlines instead of post-processing to handle indention properly.
* use specific handler for indent_round(): (seq_along, new start / stop), remove dependency with new_lines.
* utility functions add_newlines and add_spaces due to use of lag_newlines.

lorenzwalthert · 2017-06-13T21:32:59Z

Regarding EOL stripping, I am not sure whether that is the best way to do it. I looked at how we can integrate these internal functions with make_transformer(), and I think we can use all (flat) functions from R/rules.R without writing nested versions of them (see this commit in another branch). Hence, maybe EOL stripping should just be one of them. Or maybe not, if you don't want to think of EOL stripping as a rule, although it can be thought of as one.

krlmlr

Thanks, looks great. Could you please look at my final comments and merge, and also create new issues for the comments not covered in this PR?

krlmlr · 2017-06-14T07:22:31Z

R/nested.R

+    unlist() %>%
+    paste0(collapse = "") %>%
+    strsplit("\n", fixed = TRUE) %>%
+    .[[1L]]


dplyr 0.7.0 has pull(), but we can address this in a separate PR

krlmlr · 2017-06-14T07:24:29Z

tests/testthat/test-indetion_round_brackets.R

+
+##  ............................................................................
+
+indented_multi_line_correct <- c(


I wonder if this is easier in example files (like in.R and out.R). Perhaps you can extend to support an arbitrary number of in.R files? Again, separate PR.

krlmlr · 2017-06-14T07:27:05Z

R/modify_pd.R

+
+#' @rdname update_indention
+indent_round <- function(pd, indent_by) {
+  if (any(pd$token == "')'")) {


Can we look only at pd$token[[2]] or pd$token[[nrow(pd)]]?

I don't think so, because we have some tibbles with only one row. Looking at the last token is probably also not a good idea, because we have two cases,
mycall(1, 2) and (1 + x),
which have their opening bracket at the second and the first position respectively. I think that's really the reason why I implemented the approach with which in the first place: to get the start right.
I think this function should really handle both cases, and it's easier to implement that in one function instead of two. Hence, I propose to go back to the which approach, at least for the start. I think the end is unambiguous.

I wasn't aware that this also covers parens when they are used to group arithmetic expressions. Would you mind adding a test?
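
To make the two cases concrete, here is a hedged sketch of locating the brackets with which() on a plain vector of token names. The function indent_round_sketch and its inputs are hypothetical and are not the implementation merged in this PR.

```r
# Returns one indent value per token of a single nest level.
indent_round_sketch <- function(token, indent_by = 2) {
  opening <- which(token == "'('")
  closing <- which(token == "')'")
  indent <- rep(0, length(token))
  if (length(opening) > 0 && length(closing) > 0) {
    start <- opening[1] + 1               # first token after the opening bracket
    stop <- closing[length(closing)] - 1  # last token before the closing bracket
    if (start <= stop) indent[seq(start, stop)] <- indent_by
  }
  indent
}

# mycall(1, 2): the opening bracket is the second token.
call_indent <- indent_round_sketch(c("expr", "'('", "expr", "','", "expr", "')'"))
# (1 + x): the opening bracket is the first token.
group_indent <- indent_round_sketch(c("'('", "expr", "')'"))
# A level without brackets gets no indention.
plain_indent <- indent_round_sketch(c("expr"))

print(call_indent)
print(group_indent)
print(plain_indent)
```

Searching for the opening bracket avoids hard-coding position 2, so function calls and arithmetic groupings go through the same code path.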

krlmlr · 2017-06-14T07:28:02Z

R/modify_pd.R

+  } else {
+    start <- stop <- 0
+  }
+  pd$indent <- ifelse(seq_len(nrow(pd)) %in% start:stop, indent_by, 0)


This might look clearer if we move this assignment to the if branch, and remove the else branch.

Not sure what you mean, @krlmlr. We need to assign a value to pd$indent anyway, since this column is used in the recursion of serialize_parse_data_nested_helper. Otherwise we would have to change that function so it can handle the case in which the column does not exist, which I think is not a good idea.

Oh, I see. Let's leave it for now.

krlmlr · 2017-06-14T07:28:32Z

R/modify_pd.R

+#' @rdname update_indention
+indent_round <- function(pd, indent_by) {
+  if (any(pd$token == "')'")) {
+    start <- 2


Please check boundaries: Do we really need to indent the second token, or perhaps only starting from the third?

see above, I think we need to explicitly find the start token because of the two scenarios.

krlmlr · 2017-06-14T07:34:37Z

R/modify_pd.R

+indent_round_nested <- function(pd) {
+  if (is.null(pd)) return(pd)
+  pd <- indent_round(pd, indent_by = 2)
+  pd$child <- map(pd$child, indent_round_nested)


I'd prefer mutate() over sub-assignment here and in the functions below and above. This should work well with dplyr 0.7.0. Separate PR?

Ok. In a pipe like this?

pd <- pd %>%
  indent_round(indent_by = 2) %>%
  mutate(child = map(child, indent_round_nested))

Yes. What exactly do you want to add to the style guide?

Avoiding sub assignment if possible. See here. I think this particular case is not yet covered. We won't be able to enforce that with styler though I think.


lorenzwalthert added 8 commits June 12, 2017 15:52

add serialize_parse_data_nested incl. vignette

f8b9d76

documenting utils / add newlines_and_spaces function

53b8b45

add indention function. This closes #20

d000190

update indent_round and indent_round_nested plus documentation. This …

29f6781

…improves on #20

internal tests for function calls.

531b0a0

adapt vignette to new serialization

60af320

documenting serialize_parse_data_nested

33acf82

documenting serialization

ea2ecb2

lorenzwalthert requested a review from krlmlr June 12, 2017 14:39

krlmlr reviewed Jun 12, 2017


lorenzwalthert force-pushed the master branch from b88ab46 to 4743382 Compare June 12, 2017 18:46

improve style

aa84511

lorenzwalthert force-pushed the master branch from 4743382 to aa84511 Compare June 12, 2017 18:55

krlmlr reviewed Jun 12, 2017


lorenzwalthert requested a review from krlmlr June 13, 2017 15:55

krlmlr reviewed Jun 13, 2017


lorenzwalthert added 2 commits June 13, 2017 23:23

do EOL stripping separately

607fc23

lorenzwalthert requested a review from krlmlr June 13, 2017 21:33

krlmlr reviewed Jun 14, 2017


This was referenced Jun 14, 2017

refine testing #23

Closed

Mutate instead of sub assignment tidyverse/style#25

Closed

lorenzwalthert added 3 commits June 14, 2017 13:20

adapt indent_round so it can handle function calls and arithmetric gr…

e6a362e

…oupings

add arithmetric grouping test

9c8e348

add arithmetic grouping test

8eee6ae

lorenzwalthert merged commit b3666f0 into r-lib:master Jun 14, 2017

lorenzwalthert mentioned this pull request Jun 24, 2017

Indent curly #49

Merged

