-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: stats: quartile coeff dispersion #13475
Conversation
I'd prefer to leave out the restrictions on quantile values, personally. The restriction seems unnecessary and it's simpler in documentation and code to leave out the restriction. If there is a strong reason to protect people from using the 20th and 80th percentiles, please correct me!
It is preferred to return a float than a 0d array when the axis of a 1d array gets consumed. That is the typical behavior of
Do you mean "should I expose them as parameters of this function?" It's OK to leave them out. For the sake of keeping the API simpler, I think it's preferable to leave them out. |
Nice start @YarivLevy81 ! Don't worry about some of the CI failures, as we're having issue in master right now due to the NumPy 1.20 release, but do check out the "scipy.scipy (Lint)" failure (PEP8). |
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
…evy81/scipy into quartile_coeff_dispersion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few other things to think about, but I think after this it will be very close!
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me after the suggestions are implemented (assuming CI passes).
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
|
@rkern I think this PR is in pretty good shape. Would you like to take a look? |
Actually, one thought after looking at the reference linked from Wikipedia - it might be desirable in the future to provide a way of computing a coefficient of dispersion confidence interval. If that's the case, please think about how that might be possible in a backwards-compatible way. One option is to have an optional argument that selects the option to return the confidence interval and, if that's the case, the function returns the confidence interval rather than just the sample statistic. Alternatively, other functions are beginning to return objects that have a method to evaluate the confidence interval if desired. (That might be overkill here, but if it might be desirable, it would be harder to do in a backwards-compatible way if we only return a scalar now.) Of course, there could always be a separate function - but in any case, it's something that deserves a little thought now. Update: email sent to developer mailing list. |
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@smurfit89 Does this address your proposal on gh-13385? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been taking another look at this today, and I think we need to address the discrepancy in quartile calculation conventions. Currently this function cannot replicate the result given in [1]. I submitted YarivLevy81#1 to begin to address this, but I think we need to take a closer look at the literature and see if there is consensus on how to define the quartiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the change in opinion @YarivLevy81. I didn't notice before that a sample had been added in the data used in tests to make the results match those in Wikipedia.
This actually might change my opinion on whether we should make this quantile_coeff_variation
instead of quartile_coeff_variation
. A quartile_coeff_variation
that uses a more standard convention for calculating quartiles may be more useful than a quantile_coeff_variation
which just uses np.quantile
(and doesn't agree with examples in textbooks, etc.)
Do you have access to textbooks, etc. in which this statistic is defined and examples are given?
The Wikipedia article refers to this article, which refers to a book by Zwillinger and Kokoska . Here is how they define quartiles:
where
Then they define the Coefficient of Quartile Variation:
I'm beginning to think we should follow that and not attempt to generalize to arbitrary quantiles at this time. What do you think?
…ipy into quartile_coeff_dispersion
This absolutely makes sense, I'll work to readjust it. |
@YarivLevy81 were you still interested in completing this? If not, I understand. |
Closing since I haven't heard back, but feel free to reopen if you're still interested! |
Reference issue
Closes #13385
What does this implement/fix?
Added implementation for quartile coefficient of dispersion, as requested in #13385.
Added test cases, documentation like requested in contributing guidelines.
Tried to generalise things as good as possible, and to have good code coverage - It's not a very complicated issue so I hope it's good enough.
Additional information
I wasn't sure about few things:
quantile_coeff_dispersion
with other values expect of 0.25, 0.5, 0.75. What are your thoughts?np.quantile
function to compute the quartiles, this function has more arguments and but I don't necessarily want to use all of them. What would be the best practice here?