miss_var_summary should order by most missing by default #163
Comments
Here is the default behaviour now:
library(naniar)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
miss_var_summary(airquality)
#> # A tibble: 6 x 4
#> variable n_miss pct_miss n_miss_cumsum
#> <chr> <int> <dbl> <int>
#> 1 Ozone 37 24.2 37
#> 2 Solar.R 7 4.58 44
#> 3 Wind 0 0 44
#> 4 Temp 0 0 44
#> 5 Month 0 0 44
#> 6 Day 0 0 44
airquality %>%
  select(sample(names(.))) %>%
  miss_var_summary()
#> # A tibble: 6 x 4
#> variable n_miss pct_miss n_miss_cumsum
#> <chr> <int> <dbl> <int>
#> 1 Ozone 37 24.2 37
#> 2 Solar.R 7 4.58 44
#> 3 Day 0 0 0
#> 4 Month 0 0 0
#> 5 Wind 0 0 0
#> 6 Temp 0 0 0
Notice that n_miss_cumsum is calculated from the order of the variables in the input.
Created on 2018-05-24 by the reprex package (v0.2.0).
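For illustration, here is a minimal base R sketch of the ordering this issue asks for: sort by the number of missings first, then take the cumulative sum in that sorted order. This is not naniar's implementation, and the helper name `miss_summary_sorted` is hypothetical. It only assumes the built-in `airquality` dataset.

```r
# Hypothetical helper (not part of naniar): summarise missings per variable,
# ordered by most missing, with the cumulative sum taken AFTER sorting.
miss_summary_sorted <- function(data) {
  n_miss <- colSums(is.na(data))            # count NAs in each column
  ord <- order(n_miss, decreasing = TRUE)   # most missing first
  out <- data.frame(
    variable = names(n_miss)[ord],
    n_miss = as.integer(n_miss[ord]),
    pct_miss = as.numeric(n_miss[ord]) / nrow(data) * 100,
    stringsAsFactors = FALSE
  )
  out$n_miss_cumsum <- cumsum(out$n_miss)   # cumulative sum in sorted order
  out
}

# Shuffle the columns to show the result does not depend on input order.
shuffled <- airquality[sample(names(airquality))]
res <- miss_summary_sorted(shuffled)
print(res)
```

Because the cumulative sum is computed after sorting, Ozone and Solar.R come first and n_miss_cumsum never decreases, whatever order the columns arrive in.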
I think I wrongly assumed that miss_var_summary returns the missings in order, because the variables with missing values just so happen to be the first two variables in airquality:
But as we can see, this is not the case:
Created on 2018-05-18 by the reprex package (v0.2.0).