Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recode.haven_labelled(... , .combine_value_labels) fails if first value is NA #151

Closed
torfason opened this issue Aug 30, 2023 · 1 comment · Fixed by #152
Closed

recode.haven_labelled(... , .combine_value_labels) fails if first value is NA #151

torfason opened this issue Aug 30, 2023 · 1 comment · Fixed by #152

Comments

@torfason
Copy link

torfason commented Aug 30, 2023

I found a tricky bug in the recode function, when using .combine_value_labels. If the first value in the vector is NA, the recoded level disappears, instead of being revalued or combined with the target level.

First, a reprex:

library(dplyr, warn.conflict = FALSE)
library(labelled)

labs <- c("Yes" = 1, "No" = 2, "Don't know" = 3, "Won't answer" = 4)

works <- labelled(c(1, 2, 3, 4, NA), labels = labs)
fails <- labelled(c(NA, 1, 2, 3, 4), labels = labs)

# Correctly recodes the "No" level to zero
recode(works, `2` = 0, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1]  1  0  3  4 NA
#> 
#> Labels:
#>  value        label
#>      1          Yes
#>      3   Don't know
#>      4 Won't answer
#>      0           No

# The "No" level disappears
recode(fails, `2` = 0, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1] NA  1  0  3  4
#> 
#> Labels:
#>  value        label
#>      1          Yes
#>      3   Don't know
#>      4 Won't answer

# Correctly combines "Don't know" and "Won't answer"
recode(works, `4` = 3, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1]  1  2  3  3 NA
#> 
#> Labels:
#>  value                     label
#>      1                       Yes
#>      2                        No
#>      3 Don't know / Won't answer

# "Won't answer" disappears
recode(fails, `4` = 3, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1] NA  1  2  3  3
#> 
#> Labels:
#>  value      label
#>      1        Yes
#>      2         No
#>      3 Don't know

Created on 2023-08-30 with reprex v2.0.2

Then, a potential fix:

It seems that the NA element messes with the logic at the middle of the function for choosing which labels to keep. Below is a version of the function that removes the NA values from ret and .x before doing that processing. It seems that this fixes the issue, although it might be worth reviewing the fix to ensure that it does not introduce any other issues.

library(labelled)

rcd <- function (.x, ..., .default = NULL, .missing = NULL, .keep_value_labels = TRUE,
    .combine_value_labels = FALSE, .sep = " / ")
{
    ret <- dplyr::recode(.x = unclass(.x), ..., .default = .default,
        .missing = .missing)
    if (mode(.x) == mode(ret)) {
        if (.keep_value_labels) {
            ret <- copy_labels(.x, ret)
        }
        if (.combine_value_labels) {
            ret <- copy_labels(.x, ret)
            # START OF MODIFIED PART
            ret_noNA <- ret[!is.na(ret)]
            .x_noNA <- .x[!is.na(.x)]
            old_vals <- unique(.x_noNA)
            new_vals <- c()
            for (o in old_vals) {
                new_vals <- c(new_vals, ret_noNA[.x_noNA == o][1])
            }
            # END OF MODIFIED PART
            original_labels <- val_labels(.x)
            for (v in unique(new_vals)) {
                combined_label <- names(original_labels[original_labels %in%
                  old_vals[new_vals == v]])
                if (length(combined_label) > 0)
                  val_label(ret, v) <- paste(combined_label,
                    collapse = .sep)
            }
            ret <- drop_unused_value_labels(ret)
        }
    }
    else {
        var_label(ret) <- var_label(.x)
        if (.keep_value_labels || .combine_value_labels)
            warning("The type of .x has been changed and value labels attributes",
                "have been lost.")
    }
    ret
}


labs <- c("Yes" = 1, "No" = 2, "Don't know" = 3, "Won't answer" = 4)

works <- labelled(c(1, 2, 3, 4, NA), labels = labs)
fails <- labelled(c(NA, 1, 2, 3, 4), labels = labs)

# Correctly recodes the "No" level to zero
rcd(works, `2` = 0, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1]  1  0  3  4 NA
#> 
#> Labels:
#>  value        label
#>      1          Yes
#>      3   Don't know
#>      4 Won't answer
#>      0           No

# The "No" level no longer disappears
rcd(fails, `2` = 0, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1] NA  1  0  3  4
#> 
#> Labels:
#>  value        label
#>      1          Yes
#>      3   Don't know
#>      4 Won't answer
#>      0           No

# Correctly combines "Don't know" and "Won't answer"
rcd(works, `4` = 3, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1]  1  2  3  3 NA
#> 
#> Labels:
#>  value                     label
#>      1                       Yes
#>      2                        No
#>      3 Don't know / Won't answer

# "Won't answer" no longer disappears
rcd(fails, `4` = 3, .combine_value_labels = TRUE)
#> <labelled<double>[5]>
#> [1] NA  1  2  3  3
#> 
#> Labels:
#>  value                     label
#>      1                       Yes
#>      2                        No
#>      3 Don't know / Won't answer

Created on 2023-08-30 with reprex v2.0.2

larmarange added a commit that referenced this issue Aug 31, 2023
when `.x` contains `NA` and `.combine_value_labels = TRUE`

fix #151
@larmarange
Copy link
Owner

Thanks for the identification of the bug.

It should be fixed in #152

larmarange added a commit that referenced this issue Aug 31, 2023
* fix in `recode.haven_labelled()`

when `.x` contains `NA` and `.combine_value_labels = TRUE`

fix #151

* additinal test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants