Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unneeded_concatenation_linter() can throw on valid usages #1344

Closed
MichaelChirico opened this issue Jun 1, 2022 · 15 comments · Fixed by #1350
Closed

unneeded_concatenation_linter() can throw on valid usages #1344

MichaelChirico opened this issue Jun 1, 2022 · 15 comments · Fixed by #1350
Labels
false-positive code that shouldn't lint, but does regression code that used to work, but now doesn't
Milestone

Comments

@MichaelChirico
Copy link
Collaborator

lint("c(matrix(1:10, 2, 5))\n", unneeded_concatenation_linter())
# <text>:1:1: style: Unneeded concatenation of a constant. Remove the "c" call.
# c(matrix(1:10, 1, 5))
# ^~~~~~~~~~~~~~~~~~~~~

c() can be used as here to concisely drop the dimensions on a matrix, so we can't just throw a lint on all c(<expr>) cases.

We have considered whether this usage should always be replaced by a more declarative version like as.vector() instead; perhaps we could do so with a linter parameter. In any case, this new lint marks a departure from 2.0.1 behavior.

Note that the new behavior is good in some cases, e.g. it newly finds c(1 - alpha / 2) and c(1:10), but I'm not sure there's a maintainable way to always catch these.

@MichaelChirico MichaelChirico added false-positive code that shouldn't lint, but does regression code that used to work, but now doesn't labels Jun 1, 2022
@MichaelChirico MichaelChirico added this to the 3.0.0 milestone Jun 1, 2022
@MichaelChirico MichaelChirico mentioned this issue Jun 2, 2022
21 tasks
@AshesITR
Copy link
Collaborator

AshesITR commented Jun 2, 2022

So we should customize the lint message in that case?
Maybe for all /SYMBOL_FUNCTION_CALL cases?

@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Jun 2, 2022

Yes, I think a custom message would be good (for the optional case where we force as.vector). SYMBOL should be included too:

m <- matrix(1:10, 5, 2)
c(m)

@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Jun 2, 2022

Adding the note from ?c here:

c is sometimes used for its side effect of removing attributes except names, for example to turn an array into a vector. as.vector is a more intuitive way to do this, but also drops names. Note that methods other than the default are not required to do this (and they will almost certainly preserve a class attribute).

I don't totally follow this, but the main takeaway is people may be using c() to accomplish name-preserving as.vector(). That doesn't seem like great practice, but reassures me that allowing c(x) by default is the better approach.

@MichaelChirico
Copy link
Collaborator Author

Clarifying the bug -- the issue is that any constant in the expression throws off the linter -- otherwise, it knows to ignore c(x) already:

# at HEAD
lint("\nc(x)", unneeded_concatenation_linter())
lint("\nc(x / y)", unneeded_concatenation_linter())
lint("\nc(x / 2)", unneeded_concatenation_linter())
# <text>:2:1: style: Unneeded concatenation of a constant. Remove the "c" call.
# c(x / 2)
# ^~~~~~~~

@AshesITR
Copy link
Collaborator

AshesITR commented Jun 2, 2022

Not sure I follow the c(x / 2) case.
I think it's reasonable to allow c(symbol) by default and to use a different message if we encounter a SYMBOL_FUNCTION_CALL, i.e. the c(matrix(...)) case.
But for other exprs, such as c(a / b), c(a + b), c(a:b), a lint should be thrown by default IMO.
If the intention is to strip attributes except for names (which I can't imagine a good use case for right now), the code could still be refactored by introducing a local variable to stress the intention of using c(symbol).

Or is there an example piece of code where this feature is really needed and as.vector() can't be used?

@MichaelChirico
Copy link
Collaborator Author

MichaelChirico commented Jun 2, 2022

x <- matrix(1:10, 5, 2)
c(x / 2)

I don't think we should lint that. I don't think this is a good example to refactor either -- needlessly verbose:

x <- matrix(1:10, 5, 2)
x_half <- x / 2
c(x_half)

I don't know immediately of any examples where as.vector() can't be used, but both c() and as.vector() are generics, so it's easy to imagine there are cases, e.g. a c() method is defined but as.vector() is not.

So for now, I think we should be conservative and not lint by default.

@MichaelChirico

This comment was marked as outdated.

@MichaelChirico
Copy link
Collaborator Author

Hid a comment about c(x:y) when x and y are factors that was mistaken. That case is the same as any other c(factor) usage.

@AshesITR
Copy link
Collaborator

AshesITR commented Jun 2, 2022

x <- matrix(1:10, 5, 2)
c(x / 2)

I don't think we should lint that. I don't think this is a good example to refactor either -- needlessly verbose:

x <- (1:10) / 2 and x <- seq(0.5, 5, by = 0.5) do the same, and as.vector(x / 2) returns the same, so I think it should be linted.
Still failing to find a valid piece of code that would require c(symbol / 42) to work (i.e. needs names) and can't be trivially refactored to something nicer.

@MichaelChirico
Copy link
Collaborator Author

but we don't always have control of generating x. e.g. y <- other_pkg::foo(x); c(y / 2).

@MichaelChirico
Copy link
Collaborator Author

I still think we need to err on the side of caution -- we should always prefer avoiding false positives to false negatives.

Here, it's an unknown unknown -- we can't think of any examples where as.vector() won't work, but that doesn't prove there aren't any.

One approach to reassuring ourselves would be to check empirical usages with .dev/compare_branches and comb through that to find any edge cases we haven't considered. But I'm not particularly keen on doing that myself.

@AshesITR
Copy link
Collaborator

AshesITR commented Jun 2, 2022

y <- other_pkg::foo(x) / 2; c(y)
I feel like we trade a rare false positive for many false negatives here.

@MichaelChirico
Copy link
Collaborator Author

allow_single_expression = FALSE in #1350 unlocks all of those for linting. I am still in favor of conservatism by default.

@AshesITR
Copy link
Collaborator

AshesITR commented Jun 3, 2022

Hmm, okay.

@MichaelChirico
Copy link
Collaborator Author

As a compromise I will emphasize that we encourage changing the default in the NEWS item. Also sets us up to change the default in the future if we get more confidence/feedback about it after release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
false-positive code that shouldn't lint, but does regression code that used to work, but now doesn't
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants