Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement indentation_linter #1411

Merged
merged 36 commits into from
Oct 17, 2022
Merged

implement indentation_linter #1411

merged 36 commits into from
Oct 17, 2022

Conversation

AshesITR
Copy link
Collaborator

@AshesITR AshesITR commented Jun 18, 2022

supersedes #497
fixes #63

@AshesITR
Copy link
Collaborator Author

Putting this out here looking for feedback.
CC @dgkf

@AshesITR
Copy link
Collaborator Author

So far, every mis-indented line causes a lint.
That means if, e.g., en entire block is mis-indented, there will be tons of lints with identical messages.

  top_level_misindent <- function() {
    foo
    bar
  }

Causes four lints, one per line.

@AshesITR
Copy link
Collaborator Author

So, there were a ton of indentation lints in lintr itself.
One particular problem is the following with hanging indents:

lapply(1:10, function(x) {
  x^2
})

structure(1:10 %>%
            cumsum() %>%
            mean(), class = "meh")

Currently, the first code doesn't lint, because hanging indents are suppressed if there is a multi-line expression on the same line as the (.
This however causes the second piece of code to lint, because no hanging indent situation is detected.

Any thoughts on how hanging indents should be defined and detected and what exception there should be?

@MichaelChirico
Copy link
Collaborator

This one definitely seems susceptible to edge cases -- have you tried any .dev/compare_branches here to see what sort of patterns might be missed "in the wild"? (can only turn up false positives of course).

also, at a glance there's a lot going on & several loops -- I'm curious the performance relative to other default linters.

@AshesITR
Copy link
Collaborator Author

I've hand sifted all lints on lintr (which brought up quite a few edge cases that made it to the tests), but haven't run compare_branches yet.
That's definitely a good idea to do, but I'm also interested in preliminary feedback e.g. regarding the hanging indents.

Will try to get compare_branches to run.

Regarding performance, I'm not too worried because the XPath should produce relatively few hits on each expression and is written to avoid common nodes (such as expr), but will produce a benchmark when satisfied with the behavior.

@AshesITR
Copy link
Collaborator Author

I ran on the top 20 CRAN packages, yielding quite a few lints.

CLI has some odd formatted tryCatch() code, which doesn't expect a paren-indent because it's similar to the lapply example.
https://github.com/r-lib/cli/blob/76a2733b2a15893537c1ac95dee7dbd6bb885513/inst/examples/apps/news.R#L11-L26

Hanging indents of if conditions with logical conjunctions are expected to be indented by one more level, unlike here.
I can't find guidance in the tidyverse style guide on multi-line if conditions, just a hunch that they are discouraged alltogether.
https://github.com/r-lib/cli/blob/76a2733b2a15893537c1ac95dee7dbd6bb885513/R/ansi-hyperlink.R#L83-L84
Styler does something very ugly:

styler::style_text("
if (a &&
    b) {
  boo
}
")

# and

styler::style_text("
if (a &&
      b) {
  boo
}
")

#> if (a &&
#>   b) {
#>   boo
#> }

This one seems like it definitely shouldn't lint, but currently lints ll. 28-37:
https://github.com/r-lib/cli/blob/76a2733b2a15893537c1ac95dee7dbd6bb885513/inst/scripts/up.R#L22-L43

Ditto here, I think a hanging indent should trigger, although L232 should lint because the argument binds to || in the preceding line.
https://github.com/r-lib/cli/blob/76a2733b2a15893537c1ac95dee7dbd6bb885513/R/ansi.R#L231-L234

The huge number of lints produced will require careful examination.
For now, I'll try to get the lints on cli to be satisfying.

@AshesITR
Copy link
Collaborator Author

Satisfied with cli now.
ggplot2 features a style that triggers the hanging indent rule, causing lots of lints.
Maybe we need an option to allow unnamed arguments on the same line as the function call, followed by only named arguments on subsequent lines, to be block-indented instead of hanging-indented. WDYT @MichaelChirico?

R/scale-.R:801

#' @rdname ggplot2-ggproto
#' @format NULL
#' @usage NULL
#' @export
ScaleDiscrete <- ggproto("ScaleDiscrete", Scale,
  drop = TRUE,
  na.value = NA,
  n.breaks.cache = NULL,
  palette.cache = NULL,

  is_discrete = function() TRUE,

  train = function(self, x) {
    if (length(x) == 0) {
      return()
    }
    self$range$train(x, drop = self$drop, na.rm = !self$na.translate)
  }

# ...
)

@AshesITR
Copy link
Collaborator Author

Here's a current tally of lints:

     package    N
 1:        sf 8155 -- uses indent = 1
 2:       rgl 4015 -- uses indent = 4
 3:     rgeos 2295 -- uses indent = 4
 4:   ggplot2 1278
 5:   stringi 1228 --  uses indent = 4
 6:     vctrs  463
 7:     dplyr  297
 8:     rlang  263
 9:       cli  215
10:  magrittr  209
11:   stringr  152
12:    tibble  151
13:  devtools  113
14:  processx   89
15:      glue   81
16:      ragg   28
17:    pillar   15
18: lifecycle    5

Deduplicating consecutive lines:

      package   N
 1:   stringi 738
 2:   ggplot2 468
 3:     vctrs 137
 4:     dplyr 116
 5:       cli 108
 6:     rlang  72
 7:  processx  69
 8:  magrittr  65
 9:   stringr  63
10:      glue  34
11:    tibble  33
12:  devtools  26
13:      ragg  19
14:    pillar   4
15: lifecycle   3

Both stringi and ggplot2 feature function declarations of this form:

fun <- function(a, b)
{
  code
}

Since that style is caught by brace_linter(), indentation linter should let it slide.
Working in an exception for the hanging indent of function() s if they start with {.

@MichaelChirico
Copy link
Collaborator

Maybe suitable for follow-up but hard to discern from actual wrong indentation. These situations seem suited for # nolint start: indentation.

Agree it's tricky to always get right, but {styler} does have some logic that helps. Maybe the way to approach it is piecemeal -- if we can think of a good rule for detecting "false positives" in a certain case, add it, gradually improving.

@MichaelChirico
Copy link
Collaborator

These [testthat with multiline test description] seem like a false positive to me, i.e. shouldn't lint.

Agreed

@MichaelChirico
Copy link
Collaborator

These [space-designated code blocks in magrittr] are false positives but IMO should be treated by the code extractor in a similar fashion to #1043

Agreed, seems like a more general issue, no need to fix here I'd say

@MichaelChirico
Copy link
Collaborator

Regarding vctrs use of tab indents I think we should implicitly assume that "\t" is indent wide instead of 1.

Agreed, the linter should be useful even for packages using tabs-not-spaces, and \t==indent seems the easiest way to reconcile this without needing options.

But as it's not part of the style guide, happy to defer on that to a future PR.

@MichaelChirico
Copy link
Collaborator

{styler} butchers this (IMO):

Also agree. Are you using CRAN or dev {styler}? I know there's been a lot of improvements in dev. If dev, maybe worth an issue? IMO that should be edited to:

if (
  a &&
    b
) {
}

OTOH, I think {styler}'s current choice does stick to the letter of the guide (except maybe the trailing )?)... not sure if it's easy to fix automatically

@MichaelChirico
Copy link
Collaborator

do you have naming suggestions?

I like the idea of offering a few options. use_hanging_indent implies a boolean to me though, so how about hanging_indent_style = c(...)?

The 3 option names you gave sound good to me. Unless there's some universal reference to apply here, that's about as good as we'll be able to muster. Definitely a place for "See examples" in the docs & we use the examples you gave.

@AshesITR
Copy link
Collaborator Author

Okay, I've spotted a problem with the only remaining false-positive:

test_that("multi-line
           test describtion", {
})

hanging_call(looking,
             basically,
             identical = function() {
                but_with_a_need_for_hanging_indent
             })

Therefor, I think this is ready to merge now.
I've run multiple configurations of the linter against the testing packages to see if the adaptations turn out okay.
The result is the packages are surprisingly inconsistent in their indentation style 😆

Here is a tally of lints using default settings:

      package    N
 1:        sf 4710 -- 1416 with indent = 1
 2:       rgl 1646 -- 6584 with indent = 4
 3:     rgeos 1227 -- 428 with indent = 4
 4:   stringi 1023 -- 112 with indent = 4
 5:   ggplot2  514 -- 543 with hanging_indent_style = "never"
 6:       cli  271
 7:     vctrs  208
 8:     dplyr  126
 9:      glue  101
10:  processx   92
11:     rlang   89
12:   stringr   74
13:  magrittr   59
14:    tibble   33
15:  devtools   21
16:      ragg    8
17:    pillar    2
18: lifecycle    1

To get a gist of the consistency within "problematic" packages, I intersected the default lints with the customized parametrization to find only completely broken indentations.
Here are these counts where neither the default settings nor the style resulting in most of the default lints are valid indentations:

   package    N
1:     rgl 1318 -- some valid lints, some should disappear with hanging_indent_style = "always"
2:      sf 1005 -- extremely inconsistent indents (sometimes different levels in the same file, tabs and spaces mixed)
3:   rgeos  292 -- I see mostly commented code where the comments aren't indented properly
4: ggplot2  212 -- inconsistent indents; especially R/theme-defaults.r has no consistent indentation rules.
5: stringi  112 -- most should disappear with hanging_indent_style = "always"

@MichaelChirico are you okay with the propsal? Then I'll see to fixing our lints and get this merged.

@MichaelChirico
Copy link
Collaborator

LGTM. I think you are the master of indentation now. You'll need a cross & garlic to read r-devel source code from now on 😄

@MichaelChirico
Copy link
Collaborator

Okay, I've spotted a problem with the only remaining false-positive:

BTW, QQ here to check my understanding, the difference between the testthat example:

test_that("multi-line
           test describtion", {
})

and the comparison example:

hanging_call(looking,
             basically,
             identical = function() {
                but_with_a_need_for_hanging_indent
             })

is that identical= is a new argument on its own line, whereas { is coming following the previous argument. So I guess it's actually closer to the ggplot2 examples of indent styles, so would this be a closer comparison?

hanging_call(looking,
             basically, function() {
                but_with_a_need_for_hanging_indent
             })

@MichaelChirico MichaelChirico modified the milestones: 3.1.0, 3.0.2 Oct 17, 2022
@MichaelChirico MichaelChirico merged commit c224853 into main Oct 17, 2022
@MichaelChirico MichaelChirico deleted the feature/indentation-linter branch October 17, 2022 23:28
@MichaelChirico
Copy link
Collaborator

Great work here @AshesITR !! thanks for the final push to get it over the line.

Original issue = 6.5 years ago
Original implementation = 2.5 years ago
Started this PR = 4 months ago & 36 commits

🚀 🚀 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

indentation consistency
5 participants