Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_csv(col_select = dplyr::select() variable) fails at variable declaration time #270

Closed
twest820 opened this issue Jul 7, 2022 · 5 comments · Fixed by #290
Closed

Comments

@twest820
Copy link

twest820 commented Jul 7, 2022

What I expected to work:

columnsOfInterest = c("Year", "ID1", starts_with("Tmin"))
bind_rows(read_csv("file1.csv", col_select = columnsOfInterest),
          read_csv("file2.csv", col_select = columnsOfInterest))

What actually happened:

Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
---
Backtrace:
 1. tidyselect::starts_with("Tmin")
 3. tidyselect::peek_vars(fn = "starts_with")
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error:
! `starts_with()` must be used within a *selecting* function.See <https://tidyselect.r-lib.org/reference/faq-selection-context.html>.
---
Backtrace:1. └─tidyselect::starts_with("Tmin")
 2.   ├─vars %||% peek_vars(fn = "starts_with")
 3.   └─tidyselect::peek_vars(fn = "starts_with")
 4.     └─rlang::abort(msg, call = NULL)

What the best workaround appears to be:

bind_rows(read_csv("file1.csv", col_select = c("Year", "ID1", starts_with("Tmin")),
          read_csv("file2.csv", col_select = c("Year", "ID1", starts_with("Tmin")))

It seems strange to force users loading a bunch of similar .csv files to repeatedly indicate col_select rather than declaring it once. As with many such things it's not a big deal once you figure out what to do, though I did find the linked FAQ and other readr and dplyr documentation to have no relevant content.

It'd be nice to be able to write more proper code instead of relying on find/replace to maintain many hardcoded instances of the same thing. This is, as usual, a trivialized repex—the actual c() statement is considerably more complex—and I certainly wasn't expecting which line of code a c() was declared in would influence whether the declaration was valid.

I'm putting this issue in tidyselect since it seems like this might be worth a look from the context of dplyr::select() mini-language design it appears this is the repo which covers that. From the backtrace it seems peek_vars() might be prematurely positioned.

@lionel-
Copy link
Member

lionel- commented Jul 8, 2022

From the backtrace it seems peek_vars() might be prematurely positioned.

This is true, but because you've called it too soon. As mentioned in the error message, selection helpers must be called in selection context. One way would be to make a selection helper, in this case:

columnsOfInterest <- function() {
  c(
    all_of(c("Year", "ID1")),
    starts_with("Tmin")
  )
}

bind_rows(
  read_csv("file1.csv", col_select = columnsOfInterest()),
  read_csv("file2.csv", col_select = columnsOfInterest())
)

This makes colsOfInterest() a selection helper that you then call in selection context.

However this will first need a fix in all_of() to return a vector of positions, like starts_with() does, otherwise this will be combining strings and numbers. Alternatively, we could also support lists, and then this would be:

columnsOfInterest <- function() {
  list(
    all_of(c("Year", "ID1")),
    starts_with("Tmin")
  )
}

@twest820
Copy link
Author

twest820 commented Jul 8, 2022

This is true, but because you've called it too soon.

The mini-language failing to compose with a routine readr coding pattern really isn't something an application programmer can be held at fault for. It seems there's a tidyverse design hole here and, from a black box perspective, the most obvious structural fix appears to be for tidyselect to stop assuming construction means use.

As mentioned in the error message, selection helpers must be called in selection context.

The error message and linked help make no effort to explain what a selection context is (a search gives no hits, too) or why tidyselect conflates construction and use. If the design intent was to require lazy construction it's curious the documentation doesn't seem to mention that at all.

@lionel-
Copy link
Member

lionel- commented Jul 8, 2022

There is no need to be unpleasant here.

It seems there's a tidyverse design hole here and, from a black box perspective, the most obvious structural fix appears to be for tidyselect to stop assuming construction means use.

This is only an appearance. There is no way for tidyselect to work the way that you want because it's a contradiction in semantics.

I'm keeping this issue open to look into making it easier to create selection helpers that compose multiple selections (cf my previous comment) and I'll take another look at this error message.

@twest820

This comment was marked as off-topic.

@r-lib r-lib locked as too heated and limited conversation to collaborators Jul 8, 2022
@hadley
Copy link
Member

hadley commented Aug 10, 2022

I think we could just add a little more documentation to faq-selection-context, to cover the case where you're extracting out repeated code.

hadley added a commit that referenced this issue Aug 12, 2022
hadley added a commit that referenced this issue Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants