unneeded_concatenation_linter() can throw on valid usages #1344

MichaelChirico · 2022-06-01T23:32:46Z

lint("c(matrix(1:10, 2, 5))\n", unneeded_concatenation_linter())
# <text>:1:1: style: Unneeded concatenation of a constant. Remove the "c" call.
# c(matrix(1:10, 1, 5))
# ^~~~~~~~~~~~~~~~~~~~~

c() can be used as here to concisely drop the dimensions on a matrix, so we can't just throw a lint on all c(<expr>) cases.

We have considered whether this usage should always be replaced by a more declarative version like as.vector() instead; perhaps we could do so with a linter parameter. In any case, this new lint marks a departure from 2.0.1 behavior.

Note that the new behavior is good in some cases, e.g. it newly finds c(1 - alpha / 2) and c(1:10), but I'm not sure there's a maintainable way to always catch these.

The text was updated successfully, but these errors were encountered:

AshesITR · 2022-06-02T05:26:34Z

So we should customize the lint message in that case?
Maybe for all /SYMBOL_FUNCTION_CALL cases?

MichaelChirico · 2022-06-02T05:28:14Z

Yes, I think a custom message would be good (for the optional case where we force as.vector). SYMBOL should be included too:

m <- matrix(1:10, 5, 2)
c(m)

MichaelChirico · 2022-06-02T18:30:08Z

Adding the note from ?c here:

c is sometimes used for its side effect of removing attributes except names, for example to turn an array into a vector. as.vector is a more intuitive way to do this, but also drops names. Note that methods other than the default are not required to do this (and they will almost certainly preserve a class attribute).

I don't totally follow this, but the main takeaway is people may be using c() to accomplish name-preserving as.vector(). That doesn't seem like great practice, but reassures me that allowing c(x) by default is the better approach.

MichaelChirico · 2022-06-02T18:50:20Z

Clarifying the bug -- the issue is that any constant in the expression throws off the linter -- otherwise, it knows to ignore c(x) already:

# at HEAD
lint("\nc(x)", unneeded_concatenation_linter())
lint("\nc(x / y)", unneeded_concatenation_linter())
lint("\nc(x / 2)", unneeded_concatenation_linter())
# <text>:2:1: style: Unneeded concatenation of a constant. Remove the "c" call.
# c(x / 2)
# ^~~~~~~~

AshesITR · 2022-06-02T22:16:23Z

Not sure I follow the c(x / 2) case.
I think it's reasonable to allow c(symbol) by default and to use a different message if we encounter a SYMBOL_FUNCTION_CALL, i.e. the c(matrix(...)) case.
But for other exprs, such as c(a / b), c(a + b), c(a:b), a lint should be thrown by default IMO.
If the intention is to strip attributes except for names (which I can't imagine a good use case for right now), the code could still be refactored by introducing a local variable to stress the intention of using c(symbol).

Or is there an example piece of code where this feature is really needed and as.vector() can't be used?

MichaelChirico · 2022-06-02T22:24:36Z

x <- matrix(1:10, 5, 2)
c(x / 2)

I don't think we should lint that. I don't think this is a good example to refactor either -- needlessly verbose:

x <- matrix(1:10, 5, 2)
x_half <- x / 2
c(x_half)

I don't know immediately of any examples where as.vector() can't be used, but both c() and as.vector() are generics, so it's easy to imagine there are cases, e.g. a c() method is defined but as.vector() is not.

So for now, I think we should be conservative and not lint by default.

MichaelChirico · 2022-06-02T22:35:54Z

Hid a comment about c(x:y) when x and y are factors that was mistaken. That case is the same as any other c(factor) usage.

AshesITR · 2022-06-02T22:43:24Z

x <- matrix(1:10, 5, 2)
c(x / 2)
I don't think we should lint that. I don't think this is a good example to refactor either -- needlessly verbose:

x <- (1:10) / 2 and x <- seq(0.5, 5, by = 0.5) do the same, and as.vector(x / 2) returns the same, so I think it should be linted.
Still failing to find a valid piece of code that would require c(symbol / 42) to work (i.e. needs names) and can't be trivially refactored to something nicer.

MichaelChirico · 2022-06-02T22:50:47Z

but we don't always have control of generating x. e.g. y <- other_pkg::foo(x); c(y / 2).

MichaelChirico · 2022-06-02T22:54:44Z

I still think we need to err on the side of caution -- we should always prefer avoiding false positives to false negatives.

Here, it's an unknown unknown -- we can't think of any examples where as.vector() won't work, but that doesn't prove there aren't any.

One approach to reassuring ourselves would be to check empirical usages with .dev/compare_branches and comb through that to find any edge cases we haven't considered. But I'm not particularly keen on doing that myself.

AshesITR · 2022-06-02T22:55:02Z

y <- other_pkg::foo(x) / 2; c(y)
I feel like we trade a rare false positive for many false negatives here.

MichaelChirico · 2022-06-02T23:02:54Z

allow_single_expression = FALSE in #1350 unlocks all of those for linting. I am still in favor of conservatism by default.

AshesITR · 2022-06-03T05:56:41Z

Hmm, okay.

MichaelChirico · 2022-06-03T06:06:28Z

As a compromise I will emphasize that we encourage changing the default in the NEWS item. Also sets us up to change the default in the future if we get more confidence/feedback about it after release.

MichaelChirico added false-positive code that shouldn't lint, but does regression code that used to work, but now doesn't labels Jun 1, 2022

MichaelChirico added this to the 3.0.0 milestone Jun 1, 2022

MichaelChirico mentioned this issue Jun 2, 2022

Release lintr 3.0.0 #824

Closed

21 tasks

MichaelChirico mentioned this issue Jun 2, 2022

unneeded_concatenation_linter improves accuracy, gains argument for c(x) lints #1350

Merged

This comment was marked as outdated.

Sign in to view

AshesITR closed this as completed in #1350 Jun 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unneeded_concatenation_linter() can throw on valid usages #1344

unneeded_concatenation_linter() can throw on valid usages #1344

MichaelChirico commented Jun 1, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022 •

edited

Loading

MichaelChirico commented Jun 2, 2022 •

edited

Loading

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022 •

edited

Loading

This comment was marked as outdated.

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022 •

edited

Loading

MichaelChirico commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 3, 2022

MichaelChirico commented Jun 3, 2022

unneeded_concatenation_linter() can throw on valid usages #1344

unneeded_concatenation_linter() can throw on valid usages #1344

Comments

MichaelChirico commented Jun 1, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022 • edited Loading

MichaelChirico commented Jun 2, 2022 • edited Loading

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022 • edited Loading

This comment was marked as outdated.

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022 • edited Loading

MichaelChirico commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 2, 2022

MichaelChirico commented Jun 2, 2022

AshesITR commented Jun 3, 2022

MichaelChirico commented Jun 3, 2022

MichaelChirico commented Jun 2, 2022 •

edited

Loading

MichaelChirico commented Jun 2, 2022 •

edited

Loading

MichaelChirico commented Jun 2, 2022 •

edited

Loading

AshesITR commented Jun 2, 2022 •

edited

Loading