
prop_miss_case when each row has missing value #238

Closed
emilelatour opened this issue Oct 19, 2019 · 4 comments · Fixed by #239

@emilelatour

Hey Nick!

Hope you're well!!

I was using the naniar package on a work data set that had at least one missing value in each observation, i.e., >= 1 missing value on every row of the data frame. I expected naniar::prop_miss_case to return 1.00, but instead it returns 0.00. The same happens with naniar::prop_complete_case, which returns 1 instead of 0. I recreated this with the reprex below. I think my intuition is correct and something odd might be going on.

Thanks for making the wonderful package!

Best,
Emile

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(naniar)
library(tibble)

bad_air_quality <- tibble::tribble(
  ~Ozone, ~Solar.R, ~Wind, ~Temp, ~Month, ~Day,
      NA,      190,   7.4,    67,      5,    1,
      36,       NA,     8,    72,      5,    2,
      12,      149,    NA,    74,      5,    3,
      18,      313,  11.5,    NA,      5,    4,
      NA,       NA,  14.3,    56,     NA,    5,
      28,       NA,  14.9,    66,      5,   NA,
      NA,      190,   7.4,    67,      5,    1,
      36,       NA,     8,    72,      5,    2,
      12,      149,    NA,    74,      5,    3,
      18,      313,  11.5,    NA,      5,    4,
      NA,       NA,  14.3,    56,     NA,    5,
      28,       NA,  14.9,    66,      5,   NA
  )

bad_air_quality %>% 
  naniar::vis_miss()

bad_air_quality %>% 
  summarise(
    n_missing = naniar::n_case_miss(.), 
    prop_missing = naniar::prop_miss_case(.), 
    n_complete = naniar::n_case_complete(.), 
    prop_complete = naniar::prop_complete_case(.)
  )
#> # A tibble: 1 x 4
#>   n_missing prop_missing n_complete prop_complete
#>       <int>        <dbl>      <int>         <dbl>
#> 1        12            0          0             1



naniar::prop_miss_case(bad_air_quality)
#> [1] 0
naniar::pct_miss_case(bad_air_quality)
#> [1] 0

Created on 2019-10-18 by the reprex package (v0.3.0)
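As a cross-check, the values the reprex should produce can be computed with base R alone. This is a minimal sketch and not naniar's implementation: it assumes a case counts as "missing" when any of its values is NA, uses stats::complete.cases() to flag rows with no NA, and rebuilds bad_air_quality as a plain data.frame (the reprex's 12 rows are the same six rows stacked twice). Since every row contains at least one NA, the expected proportion of missing cases is 1 and the expected proportion of complete cases is 0.

```r
# Minimal base-R cross-check (not naniar's implementation).
# Assumption: a row is a "missing case" if any of its values is NA.
one_copy <- data.frame(
  Ozone   = c(NA, 36, 12, 18, NA, 28),
  Solar.R = c(190, NA, 149, 313, NA, NA),
  Wind    = c(7.4, 8, NA, 11.5, 14.3, 14.9),
  Temp    = c(67, 72, 74, NA, 56, 66),
  Month   = c(5, 5, 5, 5, NA, 5),
  Day     = c(1, 2, 3, 4, 5, NA)
)

# The reprex data are these six rows repeated twice (12 rows in total).
bad_air_quality <- rbind(one_copy, one_copy)

# TRUE for rows with no NA in any column, FALSE otherwise.
complete <- stats::complete.cases(bad_air_quality)

expected <- data.frame(
  n_missing     = sum(!complete),   # rows with at least one NA
  prop_missing  = mean(!complete),  # share of such rows
  n_complete    = sum(complete),    # rows with no NA
  prop_complete = mean(complete)    # share of such rows
)
print(expected)
```

Here n_missing and n_complete agree with the reprex (12 and 0), while prop_missing and prop_complete come out as 1 and 0, the reverse of what naniar 0.4.2 reported.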

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/Los_Angeles         
#>  date     2019-10-18                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 3.6.0)
#>  backports     1.1.5   2019-10-02 [2] CRAN (R 3.6.0)
#>  callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.0)
#>  cli           1.1.0   2019-03-19 [2] CRAN (R 3.6.0)
#>  colorspace    1.4-1   2019-03-18 [2] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [2] CRAN (R 3.6.0)
#>  curl          4.2     2019-09-24 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [2] CRAN (R 3.6.0)
#>  devtools      2.2.1   2019-09-24 [2] CRAN (R 3.6.0)
#>  digest        0.6.21  2019-09-20 [1] CRAN (R 3.6.0)
#>  dplyr       * 0.8.3   2019-07-04 [2] CRAN (R 3.6.0)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 3.6.0)
#>  fansi         0.4.0   2018-10-05 [2] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [2] CRAN (R 3.6.0)
#>  ggplot2       3.2.1   2019-08-10 [2] CRAN (R 3.6.0)
#>  glue          1.3.1   2019-03-12 [2] CRAN (R 3.6.0)
#>  gtable        0.3.0   2019-03-25 [2] CRAN (R 3.6.0)
#>  highr         0.8     2019-03-20 [2] CRAN (R 3.6.0)
#>  htmltools     0.4.0   2019-10-04 [2] CRAN (R 3.6.0)
#>  httr          1.4.1   2019-08-05 [2] CRAN (R 3.6.0)
#>  knitr         1.25    2019-09-18 [2] CRAN (R 3.6.0)
#>  labeling      0.3     2014-08-23 [2] CRAN (R 3.6.0)
#>  lazyeval      0.2.2   2019-03-15 [2] CRAN (R 3.6.0)
#>  lifecycle     0.1.0   2019-08-01 [2] CRAN (R 3.6.0)
#>  magrittr      1.5     2014-11-22 [2] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [2] CRAN (R 3.6.0)
#>  mime          0.7     2019-06-11 [2] CRAN (R 3.6.0)
#>  munsell       0.5.0   2018-06-12 [2] CRAN (R 3.6.0)
#>  naniar      * 0.4.2   2019-02-15 [2] CRAN (R 3.6.0)
#>  pillar        1.4.2   2019-06-29 [2] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6   2019-10-09 [2] CRAN (R 3.6.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2   2018-10-29 [2] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [2] CRAN (R 3.6.0)
#>  processx      3.4.1   2019-07-18 [2] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [2] CRAN (R 3.6.0)
#>  purrr         0.3.2   2019-03-15 [2] CRAN (R 3.6.0)
#>  R6            2.4.0   2019-02-14 [2] CRAN (R 3.6.0)
#>  Rcpp          1.0.2   2019-07-25 [1] CRAN (R 3.6.0)
#>  remotes       2.1.0   2019-06-24 [2] CRAN (R 3.6.0)
#>  rlang         0.4.0   2019-06-25 [2] CRAN (R 3.6.0)
#>  rmarkdown     1.16    2019-10-01 [2] CRAN (R 3.6.0)
#>  rprojroot     1.3-2   2018-01-03 [2] CRAN (R 3.6.0)
#>  scales        1.0.0   2018-08-09 [2] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 3.6.0)
#>  stringi       1.4.3   2019-03-12 [2] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 3.6.0)
#>  testthat      2.2.1   2019-07-25 [2] CRAN (R 3.6.0)
#>  tibble      * 2.1.3   2019-06-06 [2] CRAN (R 3.6.0)
#>  tidyr         1.0.0   2019-09-11 [2] CRAN (R 3.6.0)
#>  tidyselect    0.2.5   2018-10-11 [2] CRAN (R 3.6.0)
#>  usethis       1.5.1   2019-07-04 [2] CRAN (R 3.6.0)
#>  utf8          1.1.4   2018-05-24 [2] CRAN (R 3.6.0)
#>  vctrs         0.2.0   2019-07-05 [2] CRAN (R 3.6.0)
#>  visdat        0.5.3   2019-02-15 [2] CRAN (R 3.6.0)
#>  withr         2.1.2   2018-03-15 [2] CRAN (R 3.6.0)
#>  xfun          0.10    2019-10-01 [2] CRAN (R 3.6.0)
#>  xml2          1.2.2   2019-08-09 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0   2018-07-25 [2] CRAN (R 3.6.0)
#>  zeallot       0.1.0   2018-01-28 [2] CRAN (R 3.6.0)
#> 
#> [1] /Users/latour/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
njtierney added the bug label Oct 19, 2019
@njtierney
Owner

Heya @emilelatour !

Thanks for the bug report :)

Can confirm that I get the same bug. I think this is also the problem in #232.

I'll try and get this sorted soon, thanks for taking the time to post a great issue.

Cheers,

Nick

@njtierney
Owner

Minimal replication (drawing from @earowang's #232):

library(naniar)
prop_miss_case(data.frame(x = NA))
#> [1] 0
n_case_complete(data.frame(x = NA))
#> [1] 0
prop_complete_case(data.frame(x = NA))
#> [1] 1

Created on 2019-10-19 by the reprex package (v0.3.0)
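For this one-cell case the correct answers follow directly from the definitions: the only row contains an NA, so it is a missing case, not a complete one. The helper below is hypothetical (expected_case_summary() is not part of naniar) and is only a base-R sketch, assuming the same "any NA in the row" definition of a missing case, that spells out the values naniar should return.

```r
# Hypothetical helper, NOT part of naniar: expected case-level summaries
# computed with base R, treating a row as missing if any value is NA.
expected_case_summary <- function(data) {
  complete <- stats::complete.cases(data)  # TRUE = row has no NA
  c(
    n_case_miss        = sum(!complete),
    n_case_complete    = sum(complete),
    prop_miss_case     = mean(!complete),
    prop_complete_case = mean(complete)
  )
}

# One row, one NA: expect 1 missing case, 0 complete cases.
expected_case_summary(data.frame(x = NA))

# For contrast, one row with no NA: expect 0 missing cases, 1 complete case.
expected_case_summary(data.frame(x = 1))
```

Comparing these against the reprex output shows that n_case_complete was already right (0), while prop_miss_case and prop_complete_case were swapped.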

@njtierney
Owner

Current behaviour:

# This tests against data in which every row has at least one missing value
bad_air_quality <- tibble::tribble(
  ~Ozone, ~Solar.R, ~Wind, ~Temp, ~Month, ~Day,
  NA,      190,   7.4,    67,      5,    1,
  36,       NA,     8,    72,      5,    2,
  12,      149,    NA,    74,      5,    3,
  18,      313,  11.5,    NA,      5,    4,
  NA,       NA,  14.3,    56,     NA,    5,
  28,       NA,  14.9,    66,      5,   NA,
  NA,      190,   7.4,    67,      5,    1,
  36,       NA,     8,    72,      5,    2,
  12,      149,    NA,    74,      5,    3,
  18,      313,  11.5,    NA,      5,    4,
  NA,       NA,  14.3,    56,     NA,    5,
  28,       NA,  14.9,    66,      5,   NA
)

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
library(naniar)

bad_air_quality %>%
  summarise(n_missing = n_case_miss(.),
            n_complete = n_case_complete(.),
            prop_missing = prop_miss_case(.),
            prop_complete = prop_complete_case(.))
#> # A tibble: 1 x 4
#>   n_missing n_complete prop_missing prop_complete
#>       <int>      <int>        <dbl>         <dbl>
#> 1        12          0            1             0

Created on 2019-10-21 by the reprex package (v0.3.0)

@emilelatour
Author

Awesome!! Thanks @njtierney !! Confirmed that this is fixed with my work data set!!
