I was thinking about a special (and annoying) case where a factor has an explicit level, but that level does not have a name (the level itself is NA). I think the most common case is when users call forcats::fct_na_value_to_level(). We don't get an error when running ard_continuous(), BUT I think it'll throw a wrench into the shuffle functions (due to all the assumptions we make about NA values).
What do you think we should do? I am fine with detecting a level without a name and returning an error. @bzkrouse
set.seed(123456)

# Create a version of iris$Species that has missing entries.
sampled_Species <- sample(c(NA, "setosa", "virginica", "versicolor"), size = 150, replace = TRUE)

# By default, forcats::fct_na_value_to_level() turns missings into a level called `NA` that is actually
# a missing level name.
na_Species <- forcats::fct_na_value_to_level(sampled_Species)

my_iris <- iris
my_iris$na_Species <- na_Species

levels(my_iris$na_Species)
#> [1] "setosa"     "versicolor" "virginica"  NA

cards::ard_continuous(
  my_iris,
  by = na_Species,
  variables = Sepal.Length
) |>
  tail()
#> {cards} data frame: 6 x 10
#>       group1 group1_level     variable stat_name stat_label  stat
#> 1 na_Species           NA Sepal.Length        sd         SD 0.931
#> 2 na_Species           NA Sepal.Length    median     Median   6.4
#> 3 na_Species           NA Sepal.Length       p25  25th Per…   5.5
#> 4 na_Species           NA Sepal.Length       p75  75th Per…   6.7
#> 5 na_Species           NA Sepal.Length       min        Min   4.6
#> 6 na_Species           NA Sepal.Length       max        Max   7.9
#> ℹ 4 more variables: context, fmt_fn, warning, error
check_na_factor_levels <- function(data, variables) {
  walk(
    variables,
    \(variable) {
      if (is.factor(data[[variable]]) && any(is.na(levels(data[[variable]])))) {
        cli::cli_abort(
          "Factors with {.val {NA}} levels are not allowed, which are present in column {.val {variable}}.",
          call = get_cli_abort_call()
        )
      }
    }
  )
}
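To try the detection idea outside the package, here is a minimal base-R sketch of the same check. The names `has_na_level()` and `check_na_factor_levels_base()` are hypothetical and not part of cards, and `stop()` stands in for `cli::cli_abort()` so the snippet has no dependencies. It uses base R's `addNA()` to create a factor with an explicit NA level, which should hit the same condition as the forcats example above.

```r
# Hypothetical helper (not cards API): TRUE when a factor has NA among its levels.
has_na_level <- function(x) {
  is.factor(x) && anyNA(levels(x))
}

# Hypothetical base-R variant of the check: errors on the first offending column.
check_na_factor_levels_base <- function(data, variables) {
  for (variable in variables) {
    if (has_na_level(data[[variable]])) {
      stop(
        sprintf(
          "Factors with NA levels are not allowed, which are present in column '%s'.",
          variable
        ),
        call. = FALSE
      )
    }
  }
  invisible(data)
}

# `ok` has a missing value but no NA level; `bad` has NA as an explicit level.
df <- data.frame(
  ok = factor(c("a", "b", NA)),
  bad = addNA(factor(c("a", "b", NA)))
)

has_na_level(df$ok)
has_na_level(df$bad)

# Capture the error message rather than stopping the script.
result <- tryCatch(
  check_na_factor_levels_base(df, c("ok", "bad")),
  error = function(e) conditionMessage(e)
)
result
```

The point of checking `levels()` rather than the values is that an ordinary missing value (as in `ok`) should still pass; only a factor whose level set itself contains NA is rejected.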
Created on 2024-06-02 with reprex v2.1.0