where() with NA values #236

Closed
MyKo101 opened this issue Apr 22, 2021 · 2 comments · Fixed by #278

Comments

MyKo101 commented Apr 22, 2021

The where() function produces a relatively uninformative error when the predicate function returns NA. It would be useful for the user if this case were checked explicitly.

In this example, the function ~any(str_detect(.x,",")) returns TRUE or FALSE for some columns but NA for others, so the call below throws an uninformative error:

gss_cat %>%
  select(where(~any(str_detect(.x,","))))
Error: `where()` must be used with functions that return `TRUE` or `FALSE`.
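
For readers unsure why a logical predicate can yield NA at all, here is a minimal base R sketch of how any() handles missing values. The variable names are illustrative only and are not part of the original report.

```r
# Base R only: any() follows three-valued logic.
has_true <- any(c(TRUE, NA))                 # a single TRUE settles the answer
unknown  <- any(c(FALSE, NA))                # no TRUE seen, but the NA might be one
forced   <- any(c(FALSE, NA), na.rm = TRUE)  # NA is dropped before evaluating
all_na   <- any(c(NA, NA), na.rm = TRUE)     # nothing left to test, so FALSE

print(c(has_true, unknown, forced, all_na))
```

Only the second case produces NA, which is the value where() rejects.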

This can be rectified by passing na.rm = TRUE to any(), as below:

gss_cat %>%
  select(where(~any(str_detect(.x,","), na.rm = TRUE)))

I believe that having this throw an error is important, as it draws attention to the NA values, but since the returned value is still a logical, the error message should be more informative.
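
As a rough illustration of what a more informative check could look like, the sketch below wraps a predicate and reports NA results separately from non-logical ones. strict_predicate() and is_x() are hypothetical helpers written for this example; they are not part of dplyr or tidyselect, and this is not necessarily how the eventual fix works.

```r
# Hypothetical helper (not a dplyr/tidyselect function): wrap a predicate so
# that an NA result gets its own, more specific error message.
strict_predicate <- function(fn) {
  function(x) {
    out <- fn(x)
    if (!is.logical(out) || length(out) != 1L) {
      stop("Predicate must return a single logical value.", call. = FALSE)
    }
    if (is.na(out)) {
      stop("Predicate returned NA; handle missing values, e.g. with `na.rm = TRUE`.",
           call. = FALSE)
    }
    out
  }
}

# A predicate that can return NA when a column contains missing values.
is_x <- strict_predicate(function(x) any(x == "x"))

ok  <- is_x(c("x", NA))   # a TRUE is present, so any() returns TRUE
msg <- tryCatch(is_x(c(NA, NA)), error = function(e) conditionMessage(e))
print(msg)
```

The wrapper still fails on NA, which keeps attention on the missing values, but the message now says what went wrong.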

This is also in contrast to the select_if() version of this call:

gss_cat %>%
  select_if(~any(str_detect(.,",")))

which treats NA values the same as FALSE.

NeuronalMike commented Aug 20, 2021

I believe this is a limitation of the anonymous function you have used rather than an issue with the where() function.

fruit <- c("apple", "banana", "pear", "pinapple", NA)
stringr::str_detect(fruit, "a")
#> [1] TRUE TRUE TRUE TRUE   NA

Open to being wrong, though I do believe it's a limitation of the stringr::str_detect function. Maybe stringr or stringi has a better alternative?

hadley (Member) commented Aug 10, 2022

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  a = c("x", NA), 
  b = c(NA, NA), 
  c = c("y", "y")
)

df |> select(where(~ any(stringr::str_detect(.x, "x"))))
#> Error in `FUN()`:
#> ! `where()` must be used with functions that return `TRUE` or `FALSE`.

Created on 2022-08-10 by the reprex package (v2.0.1)

Interestingly the problem doesn't occur with grepl() because it appears to never match missing strings.
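
A small base R sketch of the grepl() behaviour described above, using the same kind of columns as the reprex. The variable names are illustrative.

```r
# grepl() returns FALSE, not NA, for missing inputs, so any() never sees an NA.
mixed  <- grepl("x", c("x", NA))
all_na <- grepl("x", c(NA, NA))

print(mixed)
print(any(all_na))
```

Because every element is either TRUE or FALSE, wrapping grepl() in any() always gives a value that where() accepts.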

hadley added a commit that referenced this issue Aug 10, 2022
hadley closed this as completed in 4b26df5 Aug 15, 2022