ENH: stats: quartile coeff dispersion #13475

YarivLevy81 · 2021-01-31T17:04:21Z

Reference issue

What does this implement/fix?

Added an implementation of the quartile coefficient of dispersion, as requested in #13385.
Added test cases and documentation, as requested in the contributing guidelines.
Tried to generalise things as well as possible and to get good code coverage. It's not a very complicated issue, so I hope it's good enough.

Additional information

I wasn't sure about a few things:

We could possibly implement it as quantile_coeff_dispersion, accepting values other than 0.25, 0.5, 0.75. What are your thoughts?
My function returns a float or an array_like of floats. Is that problematic? Is it better to just return an array with a single value?
I'm using the np.quantile function to compute the quartiles. This function has more arguments, but I don't necessarily want to use all of them. What would be the best practice here?

scipy/stats/__init__.py

scipy/stats/morestats.py

scipy/stats/tests/test_morestats.py

mdhaber · 2021-01-31T21:40:14Z

We could possibly implement it as quantile_coeff_dispersion, accepting values other than 0.25, 0.5, 0.75. What are your thoughts?

I'd prefer to leave out the restrictions on quantile values, personally. The restriction seems unnecessary and it's simpler in documentation and code to leave out the restriction. If there is a strong reason to protect people from using the 20th and 80th percentiles, please correct me!
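To make the unrestricted version concrete, here is a minimal sketch, not the PR's implementation. The function name follows the quantile_coeff_dispersion suggestion above, but the signature, the `q` parameter, and the helper `_linear_quantile` are assumptions made for illustration. The helper reimplements in plain Python the linear interpolation that np.quantile uses by default, so the sketch has no dependencies.

```python
import math


def _linear_quantile(sorted_data, q):
    """Hypothetical helper: quantile by linear interpolation at q * (n - 1)."""
    pos = q * (len(sorted_data) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_data) - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


def quantile_coeff_dispersion(x, q=(0.25, 0.75)):
    """Sketch: (Q_high - Q_low) / (Q_high + Q_low) for any pair of quantiles."""
    q_low, q_high = q
    if not 0 <= q_low < q_high <= 1:
        raise ValueError("q must satisfy 0 <= q[0] < q[1] <= 1")
    data = sorted(x)
    if not data:
        raise ValueError("x must not be empty")
    low = _linear_quantile(data, q_low)
    high = _linear_quantile(data, q_high)
    # Undefined (ZeroDivisionError) when the two quantiles sum to zero.
    return (high - low) / (high + low)


# Quartiles by default; the 20th and 80th percentiles work the same way.
quartile_result = quantile_coeff_dispersion([1, 2, 3, 4, 5])
wide_result = quantile_coeff_dispersion([1, 2, 3, 4, 5], q=(0.2, 0.8))
print(quartile_result, wide_result)
```

With no restriction on `q`, the only validation needed is the ordering check, which keeps both the code and the documentation short.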

My function returns a float or an array_like of floats. Is that problematic? Is it better to just return an array with a single value?

It is preferable to return a float rather than a 0d array when the only axis of a 1d array gets consumed. That is the typical behavior of np.mean([1, 2, 3]), for example, unless keepdims=True, and I don't think we have functions with a keepdims option like NumPy does.
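A short illustration of the NumPy behavior referred to here, using only np.mean and np.ones; the variable names are arbitrary.

```python
import numpy as np

# Reducing the only axis of a 1-D input yields a NumPy scalar, not an array.
scalar = np.mean([1, 2, 3])

# keepdims=True keeps the reduced axis with length 1.
kept = np.mean([1, 2, 3], keepdims=True)

# Reducing one axis of a 2-D input still returns an array.
rows = np.mean(np.ones((2, 3)), axis=1)

print(type(scalar), np.ndim(scalar))
print(kept.shape, rows.shape)
```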

I'm using the np.quantile function to compute the quartiles. This function has more arguments, but I don't necessarily want to use all of them. What would be the best practice here?

Do you mean "should I expose them as parameters of this function?" It's OK to leave them out. For the sake of keeping the API simpler, I think it's preferable to leave them out.

mdhaber · 2021-01-31T21:47:31Z

Nice start @YarivLevy81! Don't worry about some of the CI failures, as we're having issues in master right now due to the NumPy 1.20 release, but do check out the "scipy.scipy (Lint)" failure (PEP8).

mdhaber

A few other things to think about, but I think after this it will be very close!

mdhaber

Looks good to me after the suggestions are implemented (assuming CI passes).

YarivLevy81 · 2021-02-09T07:23:13Z

Looks good to me after the suggestions are implemented (assuming CI passes).
All checks pass.

mdhaber · 2021-02-09T19:41:21Z

@rkern I think this PR is in pretty good shape. Would you like to take a look?

mdhaber · 2021-02-09T20:41:37Z

Actually, one thought after looking at the reference linked from Wikipedia - it might be desirable in the future to provide a way of computing a coefficient of dispersion confidence interval. If that's the case, please think about how that might be possible in a backwards-compatible way. One option is to have an optional argument that selects the option to return the confidence interval and, if that's the case, the function returns the confidence interval rather than just the sample statistic. Alternatively, other functions are beginning to return objects that have a method to evaluate the confidence interval if desired. (That might be overkill here, but if it might be desirable, it would be harder to do in a backwards-compatible way if we only return a scalar now.) Of course, there could always be a separate function - but in any case, it's something that deserves a little thought now.
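One way to picture the result-object option is the sketch below. Everything in it is hypothetical: the class name `QuartileDispersionResult`, its fields, and the `confidence_interval` signature are illustrative, and the interval is a plain percentile bootstrap used as a stand-in, not the method from the reference linked from Wikipedia. The quartiles come from the standard library's statistics.quantiles, and the data are assumed to be positive.

```python
import random
import statistics
from dataclasses import dataclass


def _qcd(data):
    """(Q3 - Q1) / (Q3 + Q1) using the 'inclusive' quartile convention."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


@dataclass(frozen=True)
class QuartileDispersionResult:
    """Hypothetical result object: the statistic plus an on-demand interval."""
    statistic: float
    data: tuple

    def confidence_interval(self, confidence_level=0.95,
                            n_resamples=2000, seed=0):
        # Percentile bootstrap (a stand-in technique): resample with
        # replacement, recompute the statistic, read off the tail percentiles.
        rng = random.Random(seed)
        n = len(self.data)
        stats = sorted(_qcd(rng.choices(self.data, k=n))
                       for _ in range(n_resamples))
        alpha = (1 - confidence_level) / 2
        low = stats[int(alpha * (n_resamples - 1))]
        high = stats[int((1 - alpha) * (n_resamples - 1))]
        return low, high


def quartile_coeff_dispersion(x):
    data = tuple(x)
    return QuartileDispersionResult(statistic=_qcd(data), data=data)


result = quartile_coeff_dispersion([2, 4, 6, 8, 10, 12, 14])
print(result.statistic)
print(result.confidence_interval())
```

Callers who only need the sample statistic read `result.statistic`, so an interval method could be added later without changing the return type.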

Update: email sent to developer mailing list.

mdhaber · 2021-02-18T20:24:01Z

@smurfit89 Does this address your proposal on gh-13385?

mdhaber

I've been taking another look at this today, and I think we need to address the discrepancy in quartile calculation conventions. Currently this function cannot replicate the result given in [1]. I submitted YarivLevy81#1 to begin to address this, but I think we need to take a closer look at the literature and see if there is consensus on how to define the quartiles.
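The size of the discrepancy is easy to see with the standard library, which offers two quartile conventions. In the sketch below the sample is illustrative rather than taken from the PR's tests, and `qcd` is a hypothetical helper. As far as I can tell, the "inclusive" method places quantiles at the same positions as np.quantile's default linear interpolation, while "exclusive" is one of several other conventions.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]


def qcd(q1, q3):
    """Quartile coefficient of dispersion from the two outer quartiles."""
    return (q3 - q1) / (q3 + q1)


# Same sample, two conventions for locating the quartiles.
q1_inc, _, q3_inc = statistics.quantiles(data, n=4, method="inclusive")
q1_exc, _, q3_exc = statistics.quantiles(data, n=4, method="exclusive")

print(q1_inc, q3_inc, qcd(q1_inc, q3_inc))  # 5.0 11.0 0.375
print(q1_exc, q3_exc, qcd(q1_exc, q3_exc))  # 4.0 12.0 0.5
```

The statistic changes from 0.375 to 0.5 on the same seven values, so a test can only match a published example if it uses the same quartile convention.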

mdhaber

Sorry for the change in opinion @YarivLevy81. I didn't notice before that a sample had been added in the data used in tests to make the results match those in Wikipedia.

This actually might change my opinion on whether we should make this quantile_coeff_variation instead of quartile_coeff_variation. A quartile_coeff_variation that uses a more standard convention for calculating quartiles may be more useful than a quantile_coeff_variation which just uses np.quantile (and doesn't agree with examples in textbooks, etc.)

Do you have access to textbooks, etc. in which this statistic is defined and examples are given?

The Wikipedia article refers to this article, which refers to a book by Zwillinger and Kokoska. Here is how they define quartiles:

where

Then they define the Coefficient of Quartile Variation:

I'm beginning to think we should follow that and not attempt to generalize to arbitrary quantiles at this time. What do you think?
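For orientation, the quartile coefficient of dispersion is commonly written in the form below. This is the widely used general form, not a reproduction of the book's equations, which may differ in notation or scaling (for example, by a constant factor).

```latex
\[
\mathrm{QCD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}
\]
```

Here $Q_1$ and $Q_3$ are the first and third quartiles of the sample, so the value depends directly on which quartile convention is used.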

YarivLevy81 · 2021-04-22T19:19:18Z

I'm beginning to think we should follow that and not attempt to generalize to arbitrary quantiles at this time. What do you think?

This absolutely makes sense, I'll work to readjust it.

mdhaber · 2022-02-19T10:05:45Z

@YarivLevy81 were you still interested in completing this? If not, I understand.

mdhaber · 2022-02-27T08:10:03Z

Closing since I haven't heard back, but feel free to reopen if you're still interested!

Yariv Levy added 4 commits January 31, 2021 17:58

Added implementation for quartile coefficient of dispersion (scipy#13385), added unit test cases, documentation

6558646

DOC: Fixed sphinx issues with description and return value

b09f0e8

STY: stats: PEP8 line breaks

d1fa266

DOC: Added .. versionadded:: 1.7.0

64a2c69

YarivLevy81 changed the title ~~ENH: stats: Quartile coeff dispersion (#13385)~~ ENH: stats: Quartile coeff dispersion Jan 31, 2021

YarivLevy81 changed the title ~~ENH: stats: Quartile coeff dispersion~~ ENH: stats: quartile coeff dispersion Jan 31, 2021

TST: stats: tests failing for versionadded commit

14a4e5d

rgommers added the enhancement and scipy.stats labels Jan 31, 2021

mdhaber reviewed Jan 31, 2021

scipy/stats/__init__.py Outdated

mdhaber reviewed Jan 31, 2021

scipy/stats/morestats.py Outdated

mdhaber reviewed Jan 31, 2021

Yariv Levy and others added 6 commits February 1, 2021 17:23

STY: flake8 line breaks, default values

5739983

DOC: return value description

0cd8485

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

MAINT: check shape instead of len(), remove redundant inner function

fabfc10

Merge branch 'quartile_coeff_dispersion' of https://github.com/YarivL…

21640a3

…evy81/scipy into quartile_coeff_dispersion

TST: unrestrict quartiles to 0 < q < 1

fe04815

TST: implemented assert_raises tests with context manager

224a999

mdhaber reviewed Feb 1, 2021

scipy/stats/morestats.py Outdated

Yariv Levy added 4 commits February 2, 2021 17:50

DOC: changing Q3, Q1 to Q_high and Q_low

b5f1a26

DOC: default values for 'interpolation', 'axis'

e05f7f8

ENH: enforcing q[0] < q[1] + documentation

2fb06ac

DOC: move to summary statistics section

c66e1c5

mdhaber requested changes Feb 4, 2021

mdhaber mentioned this pull request Feb 4, 2021

A Solid Foundation for Statistics in Python with SciPy mdhaber/scipy#26

Closed

Yariv Levy and others added 3 commits February 6, 2021 12:07

STY: familiar name for first parameter

260b6ae

DOC: default value of axis param

83752c4

DOC: unnecessary indentation in interpolation doc

7b2a560

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

mdhaber reviewed Feb 7, 2021

scipy/stats/tests/test_morestats.py Outdated

mdhaber reviewed Feb 7, 2021

scipy/stats/morestats.py

mdhaber reviewed Feb 7, 2021

scipy/stats/tests/test_morestats.py

mdhaber reviewed Feb 7, 2021

scipy/stats/morestats.py

mdhaber approved these changes Feb 7, 2021

YarivLevy81 and others added 2 commits February 8, 2021 21:13

STY: formatting of documentation, additional tets case for interpolation

88dfe00

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

TST: Fixed value of 2nd test in test_other_interpolation

5361ec5

mdhaber reviewed Feb 9, 2021

scipy/stats/tests/test_morestats.py

mdhaber requested a review from rkern February 9, 2021 19:41

mdhaber reviewed Feb 17, 2021

scipy/stats/morestats.py Outdated

DOC: reformatting quartile-quantile documentation

0f8e518

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>

This was referenced Mar 10, 2021

Modified z-score #13548

Closed

robust_variation aka robust version of coeffient of variation for scipy.stats #13385

Open

Merge branch 'master' into quartile_coeff_dispersion

86d84bf

mdhaber closed this Apr 1, 2021

mdhaber reopened this Apr 1, 2021

mdhaber mentioned this pull request Apr 3, 2021

DOC: stats: quartile_coeff_dispersion -> quantile_coeff_dispersion; address discrepancy w/ reference [1] YarivLevy81/scipy#1

Closed

mdhaber reviewed Apr 3, 2021

mdhaber requested changes Apr 3, 2021

Yariv Levy added 2 commits April 22, 2021 22:16

Merge branch 'master' of github.com:scipy/scipy into quartile_coeff_d…

b47e2de

…ispersion

Merge branch 'quartile_coeff_dispersion' of github.com:YarivLevy81/sc…

30398a7

…ipy into quartile_coeff_dispersion

mdhaber mentioned this pull request Aug 19, 2021

ENH: add d2 statistical function #14604

Closed

mdhaber closed this Feb 27, 2022
