Factors with NA level #255

Open
ddsjoberg opened this issue Jun 3, 2024 · 1 comment

@ddsjoberg
Collaborator

I was thinking about a special (and annoying) case: a factor with an explicit level whose name is NA. I think the most common way to end up with one is forcats::fct_na_value_to_level(). We don't get an error when running ard_continuous(), but I think it will throw a wrench into the shuffle functions, because of all the assumptions we make about NA values.

What do you think we should do, @bzkrouse? I am fine with detecting a level without a name and returning an error.
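To make the distinction concrete, here is a minimal base R sketch of the difference between missing values and a missing level name. It uses addNA() as a stand-in for forcats::fct_na_value_to_level(); that substitution and the object names are assumptions made for illustration, not part of the original report.

```r
# Base R only. addNA() stands in for forcats::fct_na_value_to_level()
# (an assumption of this sketch).
x <- c("a", NA, "b", NA)

f_plain <- factor(x)       # NA stays a missing value, not a level
f_na    <- addNA(f_plain)  # NA becomes an explicit level whose name is NA

levels(f_plain)
levels(f_na)

# Once NA is a level, the elements are no longer flagged as missing...
any(is.na(f_plain))
any(is.na(f_na))

# ...but the level name is, and that is what a check could look for.
any(is.na(levels(f_na)))
```

Code that relies on is.na() of the values to find missing groups will therefore not see these observations as missing, which is likely where downstream assumptions break.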

set.seed(123456)

# Create a version of iris$Species that has missing entries.
sampled_Species <- sample(c(NA, "setosa", "virginica", "versicolor"), size = 150, replace = TRUE)

# By default, forcats::fct_na_value_to_level() converts missing values into an explicit level
# whose name is itself NA (a missing level name, not the string "NA").
na_Species <- forcats::fct_na_value_to_level(sampled_Species)

my_iris <- iris
my_iris$na_Species <- na_Species
levels(my_iris$na_Species)
#> [1] "setosa"     "versicolor" "virginica"  NA

cards::ard_continuous(
  my_iris,
  by = na_Species,
  variables = Sepal.Length
) |> 
  tail()
#> {cards} data frame: 6 x 10
#>       group1 group1_level     variable stat_name stat_label  stat
#> 1 na_Species           NA Sepal.Length        sd         SD 0.931
#> 2 na_Species           NA Sepal.Length    median     Median   6.4
#> 3 na_Species           NA Sepal.Length       p25  25th Per…   5.5
#> 4 na_Species           NA Sepal.Length       p75  75th Per…   6.7
#> 5 na_Species           NA Sepal.Length       min        Min   4.6
#> 6 na_Species           NA Sepal.Length       max        Max   7.9
#> ℹ 4 more variables: context, fmt_fn, warning, error

Created on 2024-06-02 with reprex v2.1.0
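If an error is returned for NA levels, one possible workaround on the user side is to give the level a real name before building the ARD. The sketch below does this in base R; the label "(Missing)" and the object names are arbitrary choices for illustration. With forcats, passing a `level` argument to fct_na_value_to_level() should have a similar effect.

```r
# Hypothetical workaround: rename the NA level so that every level has a name.
f <- addNA(factor(c("setosa", NA, "virginica")))
levels(f)

# Replace the missing level name with an explicit label
levels(f)[is.na(levels(f))] <- "(Missing)"
levels(f)

# The observations keep their group; only the level name changes
as.character(f)
```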

@ddsjoberg
Collaborator Author

We just need to add this function:

# Abort if any of the selected columns is a factor with an NA level
check_na_factor_levels <- function(data, variables) {
  walk(
    variables,
    \(variable) {
      # only factors are checked; the NA must be in the levels, not just the values
      if (is.factor(data[[variable]]) && any(is.na(levels(data[[variable]])))) {
        cli::cli_abort(
          "Factors with {.val {NA}} levels are not allowed, which are present in column {.val {variable}}.",
          call = get_cli_abort_call()
        )
      }
    }
  )
}
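For trying the idea outside the package, here is a minimal standalone sketch of the same check in base R. The name check_na_factor_levels_base() is hypothetical, and stop() replaces cli::cli_abort() and the package helpers, so the message formatting differs from the proposal above.

```r
# Hypothetical base R variant of the proposed check (not the package's code)
check_na_factor_levels_base <- function(data, variables) {
  for (variable in variables) {
    column <- data[[variable]]
    # Only factors are checked; the NA must be among the levels
    if (is.factor(column) && anyNA(levels(column))) {
      stop(
        sprintf(
          "Factors with NA levels are not allowed, which are present in column '%s'.",
          variable
        ),
        call. = FALSE
      )
    }
  }
  invisible(data)
}

df <- data.frame(
  ok  = factor(c("a", "b", "a")),
  bad = addNA(factor(c("a", NA, "b")))
)

# Passes silently: no NA level in `ok`
check_na_factor_levels_base(df, "ok")

# Errors on `bad`; the message is captured here instead of stopping the script
result <- tryCatch(
  check_na_factor_levels_base(df, c("ok", "bad")),
  error = function(e) conditionMessage(e)
)
result
```

Note that a factor that merely contains NA values, without an NA level, passes this check.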
