miss_var_summary should order by most missing by default #163

njtierney · 2018-05-18T00:29:05Z

I think I wrongly assumed that miss_var_summary returns variables ordered by missingness, because the two variables with missing values just so happen to be the first two variables in airquality:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

But as we can see once the columns are shuffled, this is not the case:

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Wind          0     0                0
#> 2 Ozone        37    24.2             37
#> 3 Day           0     0               37
#> 4 Month         0     0               37
#> 5 Solar.R       7     4.58            44
#> 6 Temp          0     0               44

Created on 2018-05-18 by the reprex package (v0.2.0).
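
Until the default changes, one possible workaround is to sort the summary explicitly. This is a minimal sketch, assuming the `n_miss` column shown in the output above; the `set.seed()` call and the `sorted` name are only there for illustration.

```r
library(naniar)
library(dplyr)

set.seed(163)  # illustrative seed so the shuffled column order is reproducible

# Shuffle the columns, summarise, then sort by number of missings (descending)
sorted <- airquality %>%
  select(sample(names(.))) %>%
  miss_var_summary() %>%
  arrange(desc(n_miss))

sorted
```

Whatever order the columns arrive in, Ozone and Solar.R should come out on top, since they are the only variables with missing values.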


njtierney · 2018-05-24T06:18:27Z

Here is the default behaviour now, with variables ordered by most missing:

library(naniar)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44

airquality %>% 
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Day           0     0                0
#> 4 Month         0     0                0
#> 5 Wind          0     0                0
#> 6 Temp          0     0                0

Notice that n_miss_cumsum is still calculated from the order of the variables in the input, not from the sorted order shown, so it no longer increases down the table.

Created on 2018-05-24 by the reprex package (v0.2.0).
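
The behaviour noted above, where rows are sorted by `n_miss` while `n_miss_cumsum` keeps following the input order, can be sketched in base R. This is not naniar's implementation: `miss_summary_sketch` is a hypothetical helper written only to illustrate the idea, and the fixed column order in the usage lines is the one from the first shuffled example.

```r
# Hypothetical helper (not naniar's code): summarise missingness per variable,
# take the cumulative sum in input order, then sort by most missing.
miss_summary_sketch <- function(data) {
  # Number of missing values in each column, in input order
  n_miss <- unname(vapply(data, function(x) sum(is.na(x)), integer(1)))

  out <- data.frame(
    variable = names(data),
    n_miss = n_miss,
    pct_miss = n_miss / nrow(data) * 100,
    n_miss_cumsum = cumsum(n_miss),  # computed before sorting
    stringsAsFactors = FALSE
  )

  # Sort by most missing first; tied rows keep their input order
  out[order(-out$n_miss), , drop = FALSE]
}

# Same column order as the first shuffled example in this issue
shuffled <- airquality[, c("Wind", "Ozone", "Day", "Month", "Solar.R", "Temp")]
res <- miss_summary_sketch(shuffled)
res
```

Computing the cumulative sum after sorting instead would make it increase down the table, at the cost of no longer reflecting the column order of the input.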

njtierney closed this as completed in 2d6ee79 May 24, 2018
