
Addresses bug where the number of missings in a row is not calcu… #239

Merged
merged 1 commit into master from fix-prop-miss on Oct 21, 2019


njtierney
Owner

… properly, resolves 238 and 232. Solution also made prop_miss_case 3 times faster.

Description

prop_miss_case(), which calculates the proportion of cases (rows) that contain at least one missing value, had code to guard against the edge cases where every row is complete or every row contains a missing value, but the logic of those guards was reversed.

This was hard to follow, so I wrote a new implementation using the rowSums(is.na(x)) pattern, which I find easier to understand than stats::complete.cases(). It is also about 3 times faster (the median time in the benchmark below is 3.2 times lower), which is a nice bonus.
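The rowSums(is.na(x)) pattern can be sketched in base R as follows. is.na() on a data frame returns a logical matrix, so rowSums() counts the missing values in each row; a row is a missing case when that count is above zero, and the mean of those flags is the proportion of missing cases. This is an illustration that assumes x is a data frame of ordinary atomic columns; the name prop_miss_case_sketch is hypothetical, and this is not necessarily the exact code merged in this PR.

```r
# Hypothetical sketch of the rowSums(is.na(x)) approach (not naniar's code).
prop_miss_case_sketch <- function(x) {
  # Number of missing values in each row
  n_miss_per_row <- rowSums(is.na(x))
  # Proportion of rows with at least one missing value
  mean(n_miss_per_row > 0)
}

df <- data.frame(
  x = c(NA, 2, 3, 4),
  y = c(1, NA, 3, 4),
  z = c(NA, 2, 3, 4)
)

prop_miss_case_sketch(df)                  # 0.5
# Same answer as the complete.cases() formulation
1 - mean(stats::complete.cases(df))        # 0.5
# Single all-missing row, as in the Example section
prop_miss_case_sketch(data.frame(x = NA))  # 1
```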

library(naniar)

# Test case: every row contains at least one missing value
bad_air_quality <- tibble::tribble(
  ~Ozone, ~Solar.R, ~Wind, ~Temp, ~Month, ~Day,
  NA,      190,   7.4,    67,      5,    1,
  36,       NA,     8,    72,      5,    2,
  12,      149,    NA,    74,      5,    3,
  18,      313,  11.5,    NA,      5,    4,
  NA,       NA,  14.3,    56,     NA,    5,
  28,       NA,  14.9,    66,      5,   NA,
  NA,      190,   7.4,    67,      5,    1,
  36,       NA,     8,    72,      5,    2,
  12,      149,    NA,    74,      5,    3,
  18,      313,  11.5,    NA,      5,    4,
  NA,       NA,  14.3,    56,     NA,    5,
  28,       NA,  14.9,    66,      5,   NA
)

old <- function(x){
  
  temp <- x %>%
    # which rows are complete?
    stats::complete.cases() %>%
    mean()
  
  # temp is the proportion of complete rows.
  # If every row is complete (temp == 1), no row has a missing value,
  # so return 0
  if (temp == 1) {
    return(0)
  }
  
  if (temp == 1) {
    # Never reached: this repeats the check above
    return(0)
  }
  
  return((1 - temp))
  
}

bm1 <- bench::mark(
  old = old(bad_air_quality),
  new = prop_miss_case(bad_air_quality)
)

bench:::autoplot.bench_mark(bm1)
#> Loading required namespace: tidyr

summary(bm1, relative = TRUE)
#> # A tibble: 2 x 6
#>   expression   min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 old         3.19   3.20      1         2.21     1   
#> 2 new         1      1         3.36      1        1.18

Created on 2019-10-21 by the reprex package (v0.3.0)

Related Issue

Closes #238 and #232

Example

naniar::prop_miss_case(tibble::tibble(x = NA))
#> [1] 1

Created on 2019-10-21 by the reprex package (v0.3.0)
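The guards in the old function were not needed in the first place. A minimal base-R sketch (the helper name prop_rows_with_na and the toy data frames are hypothetical, chosen for illustration) shows that 1 - mean(stats::complete.cases(x)) already returns the right value at both extremes: when every row has a missing value, complete.cases() is all FALSE, its mean is 0 and the result is 1; when no row has a missing value, the mean is 1 and the result is 0.

```r
# Hypothetical helper: proportion of rows (cases) with at least one NA,
# computed from the proportion of complete rows, with no special cases.
prop_rows_with_na <- function(x) {
  1 - mean(stats::complete.cases(x))
}

all_rows_missing  <- data.frame(a = c(NA, 1), b = c(2, NA))
no_rows_missing   <- data.frame(a = c(1, 2), b = c(3, 4))
some_rows_missing <- data.frame(a = c(NA, 1, 2, 3), b = c(1, 2, 3, 4))

prop_rows_with_na(all_rows_missing)   # 1
prop_rows_with_na(no_rows_missing)    # 0
prop_rows_with_na(some_rows_missing)  # 0.25
```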

Tests

Yes - see

NEWS + DESCRIPTION

Yes - both.

njtierney changed the title Addresses bug where the number of missings in a row is not calculated… Addresses bug where the number of missings in a row is not calcu… Oct 21, 2019
njtierney merged commit bccfe59 into master Oct 21, 2019
njtierney deleted the fix-prop-miss branch October 21, 2019 00:03
njtierney mentioned this pull request Oct 21, 2019
Development

Successfully merging this pull request may close these issues.

prop_miss_case when each row has missing value