Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

styler executed via console and from within R does not always yield same result #187

Closed
lorenzwalthert opened this issue Sep 7, 2017 · 21 comments · Fixed by #188
Closed

Comments

@lorenzwalthert
Copy link
Collaborator

As discussed at krlmlr/dplyr@1c06efc, styler does not return the same prettified code when run from within R

styler::style_file("R/bench-compare.r")

and from the command line

R -q -e 'styler::style_file("R/bench-compare.r")'

For the majority of other files in this repo (dplyr package), the styling is identical.

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

I was able to reproduce the problem with a smaller code snippet

#
#
#
r <- function(y, s, g = 10) {
  b("", "")

  #
  q <- o(d(i), function(i) {
    d(op(t[[p]]), n(i = i))
  })
  f(calls) <- f(g)

  mb <- j(c(
    q(a::b), r,
    y(u = 1)
  ))
  k(b)
}

#
#

When removing any of the above tokens, the problem does not occur anymore (I tried a few). It's very strange. E.g. replacing r <- function(y, s, g = 10) with r <- function(y, s g) will give correct results.

@krlmlr
Copy link
Member

krlmlr commented Sep 7, 2017

Thanks. Can you confirm that this problem also occurs when starting R from the terminal and then running style_file()?

@lorenzwalthert
Copy link
Collaborator Author

Yes, that is the case

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

I think I managed to narrow down the problem further. The parse data is already different for both methods (at least the parents). The nested parse data is then different again since the nesting is done differently I guess. Looking into it further just now.

This is the flat parse data (interactive is the one run in R (which is correct), non_interactive the one that is run from the command line (which is not correct)).

> non_interactive
# A tibble: 147 x 12
   line1  col1 line2  col2    id parent       token terminal     text short token_before    token_after
   <int> <int> <int> <int> <int>  <int>       <chr>    <lgl>    <chr> <chr>        <chr>          <chr>
 1     1     1     1     1     1      0     COMMENT     TRUE        #     #                     COMMENT
 2     2     1     2     1     4      0     COMMENT     TRUE        #     #      COMMENT        COMMENT
 3     3     1     3     1     7      0     COMMENT     TRUE        #     #      COMMENT         SYMBOL
 4     4     1    18     1   227    235        expr    FALSE                        <NA>           <NA>
 5     4     1     4     1    10     12      SYMBOL     TRUE        r     r      COMMENT    LEFT_ASSIGN
 6     4     1     4     1    12    227        expr    FALSE                        <NA>           <NA>
 7     4     3     4     4    11    227 LEFT_ASSIGN     TRUE       <-    <-       SYMBOL       FUNCTION
 8     4     6    18     1   226    227        expr    FALSE                        <NA>           <NA>
 9     4     6     4    13    13    226    FUNCTION     TRUE function funct  LEFT_ASSIGN            '('
10     4    14     4    14    14    226         '('     TRUE        (     (     FUNCTION SYMBOL_FORMALS
# ... with 137 more rows
> interactive
# A tibble: 147 x 12
   line1  col1 line2  col2    id parent       token terminal     text short token_before    token_after
   <int> <int> <int> <int> <int>  <int>       <chr>    <lgl>    <chr> <chr>        <chr>          <chr>
 1     1     1     1     1     1   -227     COMMENT     TRUE        #     #                     COMMENT
 2     2     1     2     1     4   -227     COMMENT     TRUE        #     #      COMMENT        COMMENT
 3     3     1     3     1     7   -227     COMMENT     TRUE        #     #      COMMENT         SYMBOL
 4     4     1    18     1   227      0        expr    FALSE                        <NA>           <NA>
 5     4     1     4     1    10     12      SYMBOL     TRUE        r     r      COMMENT    LEFT_ASSIGN
 6     4     1     4     1    12    227        expr    FALSE                        <NA>           <NA>
 7     4     3     4     4    11    227 LEFT_ASSIGN     TRUE       <-    <-       SYMBOL       FUNCTION
 8     4     6    18     1   226    227        expr    FALSE                        <NA>           <NA>
 9     4     6     4    13    13    226    FUNCTION     TRUE function funct  LEFT_ASSIGN            '('
10     4    14     4    14    14    226         '('     TRUE        (     (     FUNCTION SYMBOL_FORMALS
# ... with 137 more rows

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

This is the nested parse data

> non_interactive
# A tibble: 5 x 14
  line1  col1 line2  col2    id parent   token terminal  text short token_before token_after internal
  <int> <int> <int> <int> <int>  <int>   <chr>    <lgl> <chr> <chr>        <chr>       <chr>    <lgl>
1     1     1     1     1     1      0 COMMENT     TRUE     #     #                  COMMENT     TRUE
2     2     1     2     1     4      0 COMMENT     TRUE     #     #      COMMENT     COMMENT     TRUE
3     3     1     3     1     7      0 COMMENT     TRUE     #     #      COMMENT      SYMBOL     TRUE
4    20     1    20     1   232      0 COMMENT     TRUE     #     #          '}'     COMMENT     TRUE
5    21     1    21     1   235      0 COMMENT     TRUE     #     #      COMMENT                 TRUE
# ... with 1 more variables: child <list>
> interactive
# A tibble: 6 x 14
  line1  col1 line2  col2    id parent   token terminal  text short token_before token_after internal
  <int> <int> <int> <int> <int>  <int>   <chr>    <lgl> <chr> <chr>        <chr>       <chr>    <lgl>
1     1     1     1     1     1   -227 COMMENT     TRUE     #     #                  COMMENT     TRUE
2     2     1     2     1     4   -227 COMMENT     TRUE     #     #      COMMENT     COMMENT     TRUE
3     3     1     3     1     7   -227 COMMENT     TRUE     #     #      COMMENT      SYMBOL     TRUE
4     4     1    18     1   227      0    expr    FALSE                     <NA>        <NA>     TRUE
5    20     1    20     1   232      0 COMMENT     TRUE     #     #          '}'     COMMENT     TRUE
6    21     1    21     1   235      0 COMMENT     TRUE     #     #      COMMENT                 TRUE
# ... with 1 more variables: child <list>

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

id 227 in non_interactive has parent 235, which is a comment. I think that's the problem.

@krlmlr
Copy link
Member

krlmlr commented Sep 7, 2017

So this is triggered by --interactive? 8.5 out of 10 for weirdness...

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

Not really. Even with the option --interactive, I can't reproduce what I get in an interactive session. Can you double check? I think simply -> undo -> save some times does not work, at least the git tab of RStudio sometimes is not getting it. That's why I also change one of the token texts to be sure.
I really wonder why utils::getParseData() assigns different IDs and parents depending ok whether ran interactively or not.

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 7, 2017

I think one (pragmatic) solution would be to make sure tokens have a parent that is not a terminal. If it's is a terminal, we can assign the parent of the parent to the token as the parent. That's may fit into the package for handling the inconsistency in the parse data (see #100). I just tried it and it works for the case at hand. Line / col values of the parse data are corrected with the approach outlined above.
Alternatively, we may attempt to find out the cause of the problem.

@lorenzwalthert
Copy link
Collaborator Author

I created #188, please have a look. It solves the specific case introduced in this issue.

@lorenzwalthert lorenzwalthert changed the title styler executed via console and from within R does not yield same result styler executed via console and from within R does not always yield same result Sep 7, 2017
@jimhester
Copy link
Member

Pretty sure this is a bug in the R parser, it is not initialized properly the first time parse is run https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16041.

lintr works around this by always running parse twice https://github.com/jimhester/lintr/blob/aa70430282852afa15a61cc69771e71dd32e1aec/R/get_source_expressions.R#L121-L129

I am pretty sure you don't see this interactively because the parser has to run on the input expression, and this bug only happens the very first time something is parsed.

If you have a small reproducible example consider including it in the bug report, if there is independent verification maybe the root cause could be sorted out.

@lorenzwalthert
Copy link
Collaborator Author

lorenzwalthert commented Sep 8, 2017

Thanks for looking at that @jimhester. Indeed, using two identical parse() statements prevents the problem from occurring, so

R -q -e 'styler::style_file("R/debug.r")'

gives the same result as running

styler::style_file("R/debug.r")

For the problematic case introduced above.

Regarding the reprex, if I get you right, I could save the above code snippet as file_causing_error.R and show:

R -q -e 'first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); all.equal(first, second)'
> first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE)); all.equal(first, second)
#> [1] "Attributes: < Component “srcfile”: Component “parseData”: Mean relative difference: 3.897872 >"
#> [2] "Component “parent”: Mean relative difference: 3.897872"                                        
> 

And in particular, show the offending values of "parent".

R -e 'first <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE))$parent; second <- utils::getParseData(parse(file = "file_causing_error.R", keep.source = TRUE))$parent; data.frame(first, second)[first != second,]'
#>   first second
#> 1     0   -227
#> 2     0   -227
#> 3     0   -227
#> 4   235      0

And submit a new bug report and make reference to Bug 16041.

@jimhester
Copy link
Member

I wouldn't submit a new bug report, rather comment on the existing one, I am pretty sure it is the same bug.

@lorenzwalthert
Copy link
Collaborator Author

Ok, I will do it that way then.

@lorenzwalthert
Copy link
Collaborator Author

As far as styler goes, I suggest to adapt @jimhester's solution and parse twice.

@lorenzwalthert
Copy link
Collaborator Author

If Kirill already has access to the R bug reporting system, maybe he could add my example to Jim Hester's bug report since I just realized that I need an R Core member to manually add me to the bug reporting system in order to comment on an existing bug report.

@krlmlr
Copy link
Member

krlmlr commented Sep 15, 2017

Thanks, catching up. Rather than posting to a four-year-old ticket I'd try to fix the error in R-core and submit a PR there. IIUC, I need to call parse() on the input below, both from an interactive session and from a session started with R -q -e?

#
#
#
r <- function(y, s, g = 10) {
  b("", "")

  #
  q <- o(d(i), function(i) {
    d(op(t[[p]]), n(i = i))
  })
  f(calls) <- f(g)

  mb <- j(c(
    q(a::b), r,
    y(u = 1)
  ))
  k(b)
}

#
#

@kalibera
Copy link

I've fixed 16041 in R-devel, 75178. Could you please check if this fixes also the problem you found using styler? Thanks

lorenzwalthert added a commit to lorenzwalthert/styler that referenced this issue Aug 24, 2018
@lorenzwalthert
Copy link
Collaborator Author

Ok, cool. Let's see.

@lorenzwalthert
Copy link
Collaborator Author

After having temporarily fixed a problem with our CI tool tic, we see the expected result:

  • All releases fail (3.1-3.5).
  • Devel passes.

https://travis-ci.org/lorenzwalthert/styler/builds/420091391

What is the best practice to deal with fixed bugs in base R? Circumvent them explicitly only depending on the R version? Or any time?

@kalibera
Copy link

Thanks, I am happy to hear it worked. How to deal with base R fixes, it depends. In this case I think using the workaround unnecessarily is undesirable due to performance overhead, so I think it would make sense to add a guard on specific R version. Even though not terribly pretty, you could for instead use the double parsing only when getRversion() <= "3.5.1" || R.version$`svn rev` < 75178.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants