miss_var_summary should order by most missing by default #163

Closed
njtierney opened this issue May 18, 2018 · 1 comment
@njtierney
Owner

I think I wrongly assumed that miss_var_summary returns variables ordered by most missing, because the two variables with missing values just so happen to be the first two variables in airquality:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

But once the columns are shuffled, we can see that this is not the case:

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Wind          0     0                0
#> 2 Ozone        37    24.2             37
#> 3 Day           0     0               37
#> 4 Month         0     0               37
#> 5 Solar.R       7     4.58            44
#> 6 Temp          0     0               44

Created on 2018-05-18 by the reprex package (v0.2.0).

@njtierney
Owner Author

Here is the default behaviour now:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Day           0     0                0
#> 4 Month         0     0                0
#> 5 Wind          0     0                0
#> 6 Temp          0     0                0

Notice that n_miss_cumsum is still calculated in the order the variables appear in the input data, not in the sorted order of the output.
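
To make that interaction concrete, here is a minimal base R sketch of a summary that takes the cumulative sum first and sorts afterwards. The helper name `sketch_miss_var_summary` is hypothetical and this is not naniar's actual implementation; it only assumes the four columns shown in the output above, and it uses a fixed column order in place of `sample()` so the result is reproducible.

```r
# Hypothetical helper for illustration; not naniar's actual implementation.
sketch_miss_var_summary <- function(data) {
  # Count missing values per column, in input column order.
  n_miss <- vapply(data, function(x) sum(is.na(x)), integer(1))

  out <- data.frame(
    variable = names(data),
    n_miss = unname(n_miss),
    pct_miss = unname(n_miss) / nrow(data) * 100,
    # The cumulative sum is taken before sorting, so it follows input order.
    n_miss_cumsum = cumsum(unname(n_miss)),
    stringsAsFactors = FALSE
  )

  # Sort by most missing; ties stay in their input order.
  out <- out[order(-out$n_miss), ]
  rownames(out) <- NULL
  out
}

# One possible shuffled column order (fixed here rather than sampled).
shuffled <- airquality[c("Day", "Month", "Wind", "Temp", "Ozone", "Solar.R")]
res <- sketch_miss_var_summary(shuffled)
print(res)

# If the running total should follow the displayed order instead,
# recompute it after sorting.
res_resorted <- transform(res, n_miss_cumsum = cumsum(n_miss))
print(res_resorted)
```

With this column order, the zero-missing variables come before Ozone in the input, so their running total is 0 even though they are printed after Ozone and Solar.R. Recomputing the cumulative sum on the sorted rows is one way to get a total that reads top to bottom.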

Created on 2018-05-24 by the reprex package (v0.2.0).
