The moder package determines single or multiple modes (most frequent
values). By default, its mode_*() functions check whether missing values
make this impossible, and return NA in this case. They have no
dependencies.
Mode functions fill a gap in measures of central tendency in R. mean()
and median() are built into the standard library, but there is a lack
of properly NA-sensitive functions for calculating the mode. Use moder
for this!
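For comparison, here is a common base R workaround that tabulates the values and picks the most frequent one. This is a minimal sketch and not part of moder; the names `counts` and `naive_mode` are chosen for illustration only:

```r
x <- c(1, 1, 2, NA, NA)

# table() leaves out NA by default (useNA = "no")
counts <- table(x)

# Pick the label with the highest count and convert it back to a number
naive_mode <- as.numeric(names(counts)[which.max(counts)])
naive_mode
#> [1] 1
```

The result is 1, even though two values are missing and either of them could shift the balance. The workaround gives no hint that the answer rests on ignoring the missing values.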
You can install moder like so:
install.packages("moder")
library(moder)
With no missing values, the mode is clear:
mode_first(c(7, 8, 8, 9, 9, 9))
#> [1] 9
But what if some values are missing? Maybe there are so many missings
that it’s impossible to tell which value is the most frequent one. If
both NAs below are secretly 2, then 2 is the (first) mode. Otherwise, 1
is. The mode is unclear, so the function returns NA:
mode_first(c(1, 1, 2, NA, NA))
#> [1] NA
Ignore NAs using na.rm = TRUE if there is a strong rationale for it:
mode_first(c(1, 1, 2, NA, NA), na.rm = TRUE)
#> [1] 1
The next example is different. Even if the NA stands in for 8, there
will only be three instances of 8 but four instances of 7. The mode
is 7, independent of the true value behind NA.
mode_first(c(7, 7, 7, 7, 8, 8, NA))
#> [1] 7
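The counting argument behind the last two examples can be sketched in base R. This is an illustration under simplifying assumptions, not moder's actual implementation: `mode_is_certain()` is a hypothetical helper, and it only checks whether the most frequent known value stays the single mode in every scenario, so it may treat ties differently than mode_first() does.

```r
# Hypothetical helper, not part of moder: TRUE if the most frequent known
# value remains the only mode whatever the NAs stand for.
mode_is_certain <- function(x) {
  n_na <- sum(is.na(x))
  known <- x[!is.na(x)]
  # Frequencies of the known values, largest first
  counts <- sort(as.vector(table(known)), decreasing = TRUE)
  if (length(counts) == 0L) return(FALSE)  # no known values at all
  top <- counts[1]
  # With a single known value, the NAs could still all be one unseen value
  runner_up <- if (length(counts) > 1L) counts[2] else 0L
  # Worst case: every NA hides the runner-up value
  top - runner_up > n_na
}

mode_is_certain(c(1, 1, 2, NA, NA))
#> [1] FALSE
mode_is_certain(c(7, 7, 7, 7, 8, 8, NA))
#> [1] TRUE
```

In the first vector, the lead of 1 over 2 is a single count, which two missing values can overturn. In the second, 7 leads 8 by two counts, and one missing value cannot close that gap.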
mode_all() captures multiple modes:
mode_all(c("a", "a", "b", "b", "c", "d", "e"))
#> [1] "a" "b"
If some values are missing but there would be multiple modes when
ignoring NAs, mode_all() returns NA. That’s because missings can
easily create an imbalance between the equally-frequent known values:
mode_all(c(1, 1, 2, 2, NA))
#> [1] NA
If NA masks either 1 or 2, that number is the (single) mode. As
before, if the mode depends on missing values, the function returns
NA. Yet na.rm = TRUE makes the function ignore this:
mode_all(c(1, 1, 2, 2, NA), na.rm = TRUE)
#> [1] 1 2
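To see why the default result for this vector is NA, one can fill in the missing value by hand and recompute the modes each time. The sketch below uses only base R; `modes_known()` is a hypothetical helper written for this illustration and is not a moder function.

```r
# Hypothetical helper, not part of moder: all most frequent values,
# computed after dropping any NAs
modes_known <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  counts <- tabulate(match(x, ux))  # frequency of each unique value
  ux[counts == max(counts)]
}

x <- c(1, 1, 2, 2, NA)

modes_known(replace(x, is.na(x), 1))  # the NA is secretly 1
#> [1] 1
modes_known(replace(x, is.na(x), 2))  # the NA is secretly 2
#> [1] 2
modes_known(replace(x, is.na(x), 3))  # the NA is some other value
#> [1] 1 2
```

The three scenarios give three different sets of modes, so the set of modes of the original vector cannot be determined.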
mode_single() is stricter than mode_first(): It returns NA by
default if there are multiple modes. Otherwise, it works the same way.
mode_single(c(3, 4, 4, 5, 5, 5))
#> [1] 5
mode_single(c("x", "x", "y", "y", "z"))
#> [1] NA
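The rule "one mode or NA" can be sketched in base R for the simple case of complete data. `single_mode_or_na()` is a hypothetical helper for illustration, not moder's implementation, and it assumes that the input has no missing values.

```r
# Hypothetical helper, not part of moder; assumes x has no missing values
single_mode_or_na <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))  # frequency of each unique value
  modes <- ux[counts == max(counts)]
  # x[NA_integer_] yields an NA of the same type as x
  if (length(modes) == 1L) modes else x[NA_integer_]
}

single_mode_or_na(c(3, 4, 4, 5, 5, 5))
#> [1] 5
single_mode_or_na(c("x", "x", "y", "y", "z"))
#> [1] NA
```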
mode_possible_min() and mode_possible_max() return the minimal and
maximal sets of modes that are possible given the missing value:
mode_possible_min(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a"
mode_possible_max(c("a", "a", "a", "b", "b", "c", NA))
#> [1] "a" "b"
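One way to make sense of these two results is to let the missing value stand in for each known value in turn and look at the modes in each scenario. This is a base R sketch for illustration, not how moder computes the results; `modes_known()` and `scenarios` are hypothetical names.

```r
# Hypothetical helper, not part of moder: all most frequent values,
# computed after dropping any NAs
modes_known <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  counts <- tabulate(match(x, ux))  # frequency of each unique value
  ux[counts == max(counts)]
}

x <- c("a", "a", "a", "b", "b", "c", NA)
candidates <- unique(x[!is.na(x)])

# Modes of x if the NA were replaced by each known value
scenarios <- lapply(candidates, function(v) {
  modes_known(replace(x, is.na(x), v))
})
names(scenarios) <- candidates
scenarios
#> $a
#> [1] "a"
#>
#> $b
#> [1] "a" "b"
#>
#> $c
#> [1] "a"
```

The smallest set of modes across these scenarios is "a", and the largest is "a" and "b". An unseen value behind the NA would also leave "a" as the only mode.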
Ken Williams’ mode functions on Stack Overflow were pivotal to moder.