Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duplicate_argument_linter should skip on mutate() calls #1345

Closed
MichaelChirico opened this issue Jun 1, 2022 · 4 comments · Fixed by #1550
Closed

duplicate_argument_linter should skip on mutate() calls #1345

MichaelChirico opened this issue Jun 1, 2022 · 4 comments · Fixed by #1550

Comments

@MichaelChirico
Copy link
Collaborator

MichaelChirico commented Jun 1, 2022

dplyr::mutate() allows sequential updates to variables, e.g.

x %>%
  dplyr::mutate(
    col = gsub("t", "", col),
    col = gsub("\\s+$", "xxx", col)
  )

but duplicate_argument_linter() sees col used twice and throws.

@MichaelChirico MichaelChirico mentioned this issue Jun 2, 2022
21 tasks
@AshesITR
Copy link
Collaborator

AshesITR commented Jun 2, 2022

mutate and transmute(?) could be added to the default exclude list.

OTOH I don't like this style and, at least on older versions of dbplyr, that backend actually didn't work with this kind of code, since reusing columns in SQL isn't possible. So I'm slightly in favor of linting this by default.

WDYT?

@MichaelChirico
Copy link
Collaborator Author

The problem IMO is every mutate() copies the table again which is quite wasteful in general. For dbplyr, it's not as wasteful since those backends are typically lazy.

We have a ConsecutiveMutateLinter() for precisely this, and it attempts a carveout for dbplyr by looking for library(dbplyr) in the script.

Maybe we should have @hadley or @DavisVaughan weigh in here about dplyr best practices.

@hadley
Copy link
Member

hadley commented Jun 2, 2022

I personally would convert the above to use pipe:

x |> 
  dplyr::mutate(
    col = col |> str_replace("t", "") |> str_replace("\\s+$", "xxx")
  )

But I don't thinking mutating a variable multiple times is an anti-pattern. dplyr copies the column involved, so it's not quite as expensive as you might think.

@MichaelChirico
Copy link
Collaborator Author

OK, looks like an allowlist is the way to go then. not tagging for 3.0.0 release either.

IndrajeetPatil added a commit that referenced this issue Sep 25, 2022
* Add exceptions for `duplicate_argument_linter()`

Closes #1345

* Change argument default instead

* address review comments

* Update test-duplicate_argument_linter.R

* more rewording of NEWS

* #basePipe

* wording in docs

* manual roxy

* vestigial

Co-authored-by: Michael Chirico <chiricom@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants