miss_var_summary should order by most missing by default #163

Closed
njtierney opened this Issue May 18, 2018 · 1 comment

njtierney commented May 18, 2018

I think I wrongly assumed that miss_var_summary returns the variables ordered by most missing, because the two variables with missing values just so happen to be the first two columns in airquality:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

But this is not the case, as shuffling the columns shows:

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Wind          0     0                0
#> 2 Ozone        37    24.2             37
#> 3 Day           0     0               37
#> 4 Month         0     0               37
#> 5 Solar.R       7     4.58            44
#> 6 Temp          0     0               44

Created on 2018-05-18 by the reprex package (v0.2.0).
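
Until the default changes, one possible workaround is to sort the summary explicitly. The following is a minimal sketch, assuming the naniar and dplyr packages are installed; the seed value and the names `shuffled` and `sorted_summary` are illustrative only and not part of naniar.

```r
library(naniar)
library(dplyr)

# Arbitrary seed (an assumption) so the shuffled column order is reproducible
set.seed(163)

# Shuffle the column order so the input is no longer conveniently arranged
shuffled <- airquality[, sample(names(airquality))]

# Sort explicitly by the number of missing values, most missing first,
# instead of relying on the order in which the summary is returned
sorted_summary <- shuffled %>%
  miss_var_summary() %>%
  arrange(desc(n_miss))

sorted_summary
```

Sorting on `n_miss` after the fact does not depend on the column order of the input, so the result is the same however the data frame is arranged.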

njtierney closed this in 2d6ee79 on May 24, 2018

njtierney commented May 24, 2018

Here is the default behaviour now:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Day           0     0                0
#> 4 Month         0     0                0
#> 5 Wind          0     0                0
#> 6 Temp          0     0                0

Notice that n_miss_cumsum is calculated from the order of the variables in the input data, not from the sorted order shown in the output.

Created on 2018-05-24 by the reprex package (v0.2.0).
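
To illustrate why the cumulative sum can look out of step with the sorted rows, here is a rough base R sketch. `summarise_missing` is a hypothetical helper written for this illustration, not naniar's implementation, and the column order in `reordered` is an assumption chosen to show the effect.

```r
# Hypothetical helper (not naniar's code): count missings per column,
# take the cumulative sum in INPUT column order, then sort the rows.
summarise_missing <- function(data) {
  n_miss <- colSums(is.na(data))
  out <- data.frame(
    variable = names(n_miss),
    n_miss = as.integer(n_miss),
    # Cumulative sum follows the input column order
    n_miss_cumsum = as.integer(cumsum(n_miss)),
    stringsAsFactors = FALSE
  )
  # Sort by most missing; ties keep their input order
  out <- out[order(-out$n_miss), ]
  rownames(out) <- NULL
  out
}

# Assumed column order: the complete variables come before Ozone and Solar.R
reordered <- airquality[, c("Day", "Month", "Wind", "Temp", "Ozone", "Solar.R")]
result <- summarise_missing(reordered)
result
```

With this column order, Ozone and Solar.R sort to the top with cumulative sums of 37 and 44, while the four complete variables keep a cumulative sum of 0 because they preceded any missing values in the input.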
