Fix stats.scoreatpercentile #3186

Merged
merged 4 commits into from Jan 31, 2014

Projects

None yet

5 participants

Owner
rgommers commented Jan 4, 2014

Fix ndarray input for per parameter and empty input. Closes gh-1897 and gh-2861.

Coverage Status

Coverage remained the same when pulling 07bd6eb on rgommers:fix-scoreatpercentile into fd99d3f on scipy:master.

@josef-pkt josef-pkt and 1 other commented on an outdated diff Jan 4, 2014
scipy/stats/tests/test_stats.py
- assert_equal(stats.scoreatpercentile(x, [0, 100, 50]), [0, 3.5, 1.75])
+ expected = np.array([0, 3.5, 1.75])
+ res = stats.scoreatpercentile(x, [0, 100, 50])
+ assert_equal(res, expected)
+ assert_(isinstance(res, np.ndarray))
+ # Test with ndarray. Regression test for gh-2861
+ assert_equal(stats.scoreatpercentile(x, np.array([0, 100, 50])),
+ expected)
+ # Also test combination of 2-D array, axis not None and array-like per
+ res2 = stats.scoreatpercentile(np.arange(12).reshape((3,4)),
+ np.array([0, 1, 100, 100]), axis=1)
+ expected2 = array([[0, 4, 8],
+ [0.03, 4.03, 8.03],
+ [3, 7, 11],
+ [3, 7, 11]])
+ assert_equal(res2, expected2)
josef-pkt
josef-pkt Jan 4, 2014 Member

I'm not sure if the _equal is always guaranteed.
Floating point calculations could still introduce a noise, but maybe not in this case (1. * x + 0. * y == x) ? IIUC

rgommers
rgommers Jan 4, 2014 Owner

Oops, copying test case without thinking. Will change to assert_allclose.

@josef-pkt josef-pkt commented on the diff Jan 4, 2014
scipy/stats/stats.py
if limit:
a = a[(limit[0] <= a) & (a <= limit[1])]
- if per == 0:
josef-pkt
josef-pkt Jan 4, 2014 Member

maybe we could keep the shortcut if per is scalar,
I don't know how much it's used

rgommers
rgommers Jan 4, 2014 Owner

I thought about that, but I don't like that kind of "performance optimization" for corner cases in this kind of function (since they're almost never performance critical).

josef-pkt
josef-pkt Jan 4, 2014 Member

It avoids a full sort which is not so cheap. I think statsmodels is likely using it in a loop.
However, statsmodels now requires a numpy version that has percentile and we are slowly switching to it.

rgommers
rgommers Jan 4, 2014 Owner

In a loop with per=100 or per=0? Why not directly write data.min/max() then?

Member

looks good to me

Coverage Status

Coverage remained the same when pulling 7e72d97 on rgommers:fix-scoreatpercentile into fd99d3f on scipy:master.

Contributor

scoreatpercentile should be implementable in terms of numpy.percentile in numpy 1.9. That supports all the usual numpy features including extended axis.

Owner
rgommers commented Jan 9, 2014

@juliantaylor that should be done as soon as numpy 1.9 is the minimum supported version I think. I don't see a reason to now do

if np.__version__ < '1.9':
    ....
else:
    # use np.percentile
    ....
Contributor

its significantly faster as it uses partition instead of sort. but it can still wait a while 1.9 isn't even out yet :)

Member

My general opinion is that we should point users to numpy.percentile as soon as 1.9 is out and tell them it's much faster, and deprecate this function as soon as possible.

(I haven't checked whether there could be anything specific "stats" that would make it worth keeping. maybe nan handling or masked arrays?)

Owner
rgommers commented Jan 9, 2014

Point out that users should use np.percentile as soon as 1.9 is out: definitely. Deprecating this function: only once we require >=1.9. That'll take a few years probably.

Member
ev-br commented Jan 9, 2014

What's wrong with wrapping it with a deprecationwarning as soon as 1.9 is out? (given that DeprecationWarning is off by default with py2.7 anyway)

Owner
rgommers commented Jan 9, 2014

We should only generate DeprecationWarnings for things that are fixable by users once they see them. And many of those users can't upgrade numpy easily (or at all, if their numpy is installed by their admin, or they ship an older numpy version with some product, or ...).

In this case there's also no hurry, given that scoreatpercentile is a perfectly fine function.

Member

I don't think we need to add a DeprecationWarning already, but we can mention it in the docstring so users/developers can start to switch, and benefit from the faster numpy version if they have it, and don't have to change all their code when we actually deprecate.

"This function is dooomed."

Owner
rgommers commented Jan 9, 2014

Agreed, will do.

Owner

docstring updated

Owner

Can we merge this one now?

@ev-br ev-br merged commit 578106c into scipy:master Jan 31, 2014
Member
ev-br commented Jan 31, 2014

Thanks, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment