return list of modes for a multimodal distribution instead of raising a StatisticsError #73142
Comments
return minimum of modes for a multimodal distribution instead of raising a StatisticsError |
What's the justification for this proposed change? Isn't it better to report the fact that there isn't an unambiguous result instead of returning a rather arbitrary one? |
A better choice would be to return a tuple of values (sliced from the Hope that's justifiable... |
On Tue, Dec 13, 2016 at 09:35:22AM +0000, Srikanth Anantharam wrote:
The current mode() function is designed for a very basic use-case, where The problem with dealing with multiple modes is that it's not easy to data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9] Assuming the sampling is fair, 8 is clearly the mode; but is it bimodal I have plans for introducing a binning function to collect data into Thanks for the suggestion. |
Please see the updated pull request PR 50, with the changes. |
data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]
is clearly unimodal with mode 8 data would have been bimodal if 4 repeated exactly the same (7) number of in which case the new patch in PR 50 would return a tuple |
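For reference, the frequency table behind this example can be checked with a short sketch. It uses only `collections.Counter`; the names `table`, `top_count` and `strict_modes` are illustrative and do not come from the `statistics` module or from PR 50.

```python
from collections import Counter

data = [1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9]

# Frequency table as (value, count) pairs, most frequent value first.
table = Counter(data).most_common()
top_count = table[0][1]

# In the strict "exactly tied" sense, a mode is any value whose count
# equals the highest count in the table.
strict_modes = [value for value, count in table if count == top_count]

print(top_count)     # count of the most frequent value
print(strict_modes)  # values tied for that count
```

Under the strict definition only 8 qualifies here, since it occurs 7 times and 4 occurs 3 times. Whether the smaller peak at 4 should count as a second mode is a judgment about the shape of the distribution, which a plain count comparison like this cannot make.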
On Tue, Dec 13, 2016 at 10:08:10AM +0000, Srikanth Anantharam wrote:
I'm rejecting that pull request. As I said, mode() intentionally return tuple(value for value, frequency in table) with no way for the caller to tell which values might be a mode and (By the way, even if this function behaviour was acceptable, which I I'm sorry that I have to reject this, I am interested in having better Thanks for your interest. |
On Tue, Dec 13, 2016 at 10:17:21AM +0000, Srikanth Anantharam wrote:
Bimodal distributions do not require both modes to be exactly the same You shouldn't take my example too literally. With such a small sample of |
Srikanth, when you reply by email, please remove the quotation of the previous message. On the web page, it is just noise. The only exception should be when you reply to a specific sentence and need to quote that sentence for context. In my particular experience, mode() is usually reserved for crudely describing unordered categorical data, where the concept of 'minimum' does not apply. Mode is useful for determining the winner of a vote (or other decision process), but in general, it is not a substitute for a more comprehensive look at a dataset. Problems with possibly returning a tuple of data items instead of a data item include:
>>> mode(((0,0), (0,0), (0,1)))
(0, 0)
So, while StatisticsError is a nuisance, so are the apparent alternatives. I think we should leave mode alone and close this. |
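The ambiguity can be illustrated with a rough sketch. `mode_or_tuple` is a hypothetical stand-in for a mode function that returns a tuple on ties; it is not the code from the pull request and not part of the `statistics` module.

```python
from collections import Counter

def mode_or_tuple(data):
    """Hypothetical variant: the single mode, or a tuple of modes on a tie."""
    counts = Counter(data)
    top = max(counts.values())
    modes = tuple(value for value, count in counts.items() if count == top)
    return modes[0] if len(modes) == 1 else modes

# One unambiguous mode, which happens to be a tuple itself.
unique = mode_or_tuple([(0, 1), (0, 1), (0, 0)])

# Two tied modes, 0 and 1, reported together as a tuple.
tied = mode_or_tuple([0, 0, 1, 1])

# Both calls hand back the same object shape and value.
print(unique, tied, unique == tied)
```

Both calls return `(0, 1)`, so a caller who sees only the result cannot tell "the mode is the item (0, 1)" apart from "the modes are 0 and 1".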
What makes the minimum mode better than the maximum? |
Please review the new PR with tests. |
The problem remains that the function can return a number or a list for input that is a list of numbers. This means the user will need to handle both possibilities every time, which is a heavy burden for such a simple function. SciPy's mode function does return the minimum mode when there is a tie, which as far as I can tell is an arbitrary choice. But in that context, since the input is almost always numerical, a minimum is at least well defined, which is not true for an input with a mix of types. For the general use case, the current behavior - raising an exception - in case of a tie conveys the most information. |
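To make the mixed-type point concrete, here is a small sketch. `min_mode` is a hypothetical helper written for this illustration; it is neither SciPy's implementation nor the proposed patch.

```python
from collections import Counter

def min_mode(data):
    """Hypothetical: return the smallest of the values tied for the top count."""
    counts = Counter(data)
    top = max(counts.values())
    return min(value for value, count in counts.items() if count == top)

# Purely numeric input: the minimum of the tied modes is well defined.
numeric_result = min_mode([1, 1, 2, 2, 3])

# Mixed-type input: the tied modes "red" and 3 cannot be ordered in Python 3.
error = None
try:
    min_mode(["red", "red", 3, 3])
except TypeError as exc:
    error = exc

print(numeric_result, type(error).__name__)
```

For numbers the tie is resolved, although arbitrarily. For categorical or mixed data the comparison itself fails, so the caller ends up with an exception anyway, only a less informative one than StatisticsError.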
Yes, the mode function could ALWAYS return a list, but that breaks backward compatibility, as does the currently proposed change. |
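A minimal sketch of the "always return a list" shape, assuming a new function under a new name so that existing callers of mode() are unaffected. The name `all_modes` is made up for this example.

```python
from collections import Counter

def all_modes(data):
    """Hypothetical: always return a list of the most frequent values."""
    counts = Counter(data)
    if not counts:
        return []  # assumption: empty input yields an empty list, not an error
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in input order.
    return [value for value, count in counts.items() if count == top]

print(all_modes([1, 1, 2]))     # one mode, still wrapped in a list
print(all_modes([1, 1, 2, 2]))  # two tied modes
print(all_modes([]))            # no data
```

The return type is uniform, so callers never need to branch on it. The cost is the one noted above: code written against the existing mode() expects a single item and would break if that function itself started returning a list.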
See the competing proposal and PR at https://bugs.python.org/issue35892 and #12089 |
I'm closing this issue in favour of Raymond's bpo-35892, thank you to everyone even if your PRs didn't get used, I appreciate your efforts. |