Add max_unique parameter to all metadata functions #1

Closed
3 tasks done
lhdjung opened this issue Apr 25, 2023 · 0 comments

Owner

lhdjung commented Apr 25, 2023

Currently, only mode_is_trivial() and mode_count_range() have a max_unique parameter. I think all metadata functions should have one. The remaining ones are:

  • mode_count()
  • mode_frequency()
  • mode_frequency_range()

Why do they need the parameter? Consider x <- c(7, 7, 8, 8, NA). Assume that the NA is known to be either 7 or 8, i.e., max_unique = "known". The functions currently handle x too conservatively because they cannot tell whether NA represents one of the known values. If they knew that it does, they would work differently:

  • mode_count() would be able to conclude that NA breaks the tie so that the mode count is 1.
  • mode_frequency() would return 3 because whichever known value the NA represents would then occur three times.
  • mode_frequency_range() would know that the maximal frequency is the actual one and return c(3, 3); just like mode_count_range(x, max_unique = "known") already returns c(1, 1).
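The reasoning in this list can be checked by brute force. The following is only a base-R sketch, not the package's implementation: mode_summary() is a hypothetical helper, and the code assumes a single NA that must be one of the known values (the situation described by max_unique = "known").

```r
# Hypothetical helper, not part of the package: number of modes and their
# frequency for a vector without missing values.
mode_summary <- function(x) {
  counts <- table(x)
  top <- max(counts)
  c(mode_count = sum(counts == top), mode_frequency = as.integer(top))
}

x <- c(7, 7, 8, 8, NA)
known <- unique(x[!is.na(x)])

# Assumption: the NA is one of the known values, so each known value is
# tried as its replacement in turn.
completions <- sapply(known, function(value) {
  filled <- x
  filled[is.na(filled)] <- value
  mode_summary(filled)
})
colnames(completions) <- known
completions
```

Both columns agree: one mode with a frequency of 3, whichever value the NA stands for. This matches the results described above for mode_count(), mode_frequency(), and mode_frequency_range().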

However, this is just a corner case: a single value is missing, and all known values are equally frequent. max_unique doesn't matter to these functions otherwise, except for mode_frequency_range() if all known values are equal.
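To illustrate when the assumption changes the result, here is a rough base-R comparison. frequency_range(), known_only(), and with_unseen() are hypothetical helpers, not package functions. The sketch only covers numeric vectors with exactly one NA, and it models the unrestricted case by adding one value not seen in x to the candidates.

```r
# Hypothetical helper: range of the mode frequency over all ways to replace
# a single NA by one of `candidates`.
frequency_range <- function(x, candidates) {
  stopifnot(sum(is.na(x)) == 1)
  frequencies <- vapply(candidates, function(value) {
    filled <- x
    filled[is.na(filled)] <- value
    as.integer(max(table(filled)))
  }, integer(1))
  range(frequencies)
}

# Candidates if the NA must be a known value.
known_only <- function(x) unique(x[!is.na(x)])
# Candidates if the NA may also be some value not seen in x.
with_unseen <- function(x) c(known_only(x), max(x, na.rm = TRUE) + 1)

x_tie    <- c(7, 7, 8, 8, NA)  # single NA, known values tied
x_no_tie <- c(7, 7, 8, NA)     # single NA, no tie
x_equal  <- c(7, 7, NA)        # all known values equal

frequency_range(x_tie, with_unseen(x_tie))        # 2 3
frequency_range(x_tie, known_only(x_tie))         # 3 3
frequency_range(x_no_tie, with_unseen(x_no_tie))  # 2 3
frequency_range(x_no_tie, known_only(x_no_tie))   # 2 3
frequency_range(x_equal, with_unseen(x_equal))    # 2 3
frequency_range(x_equal, known_only(x_equal))     # 3 3
```

In this sketch, restricting the NA to known values narrows the range only for the tied vector and for the vector whose known values are all equal.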

I don't think that max_unique matters at all to mode_first(), mode_all(), and mode_single(), the functions that attempt to find actual modes. The metadata functions don't look for the modes themselves, which is why they can gain any information from vectors like x at all. In other words, mode_all() and friends would fail on x anyway, so max_unique wouldn't help them.

I have not thought about mode_possible_min() and mode_possible_max() in this context. Would they benefit from max_unique?
