scoreatpercentile returns wrong value #972

isofer opened this Issue · 5 comments


scoreatpercentile of a Series returns the wrong value

In [1341]: a = np.random.rand(100)

In [1342]: b = pd.Series(a)

In [1343]: a[:10]

array([ 0.6131142 ,  0.65266141,  0.24583156,  0.70179786,  0.33361506,
        0.65042728,  0.70192276,  0.02727854,  0.65948894,  0.44326182])

In [1348]: scoreatpercentile(a,1) 
Out[1348]: 0.010388922650144839 #correct value

In [1349]: scoreatpercentile(b,1) 
Out[1349]: 0.65226593993834392 #incorrect value

In [1350]: scoreatpercentile(a,2)
Out[1350]: 0.011971896338709577 #correct value

In [1351]: scoreatpercentile(b,2)
Out[1351]: 0.25396815348880808 #incorrect value

I'm not sure whether this is a pandas issue or a scipy issue. I am aware of the quantile method, but I still wonder whether this can be fixed.


The problem is the semantics of integer indexes with pandas objects. Either pass b.values to scoreatpercentile or use b.quantile(0.02) (the quantile takes a fraction, so percentile 2 is 0.02). I think scipy.stats should be calling np.asarray on the input; you could raise an issue with them about it.
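The two workarounds suggested above can be sketched as follows. This is only an illustration under stated assumptions: np.percentile is used as a stand-in for scipy.stats.scoreatpercentile so that the snippet needs only numpy and pandas, and the seeded random data is made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed; illustrative data only
a = rng.random(100)
b = pd.Series(a)

# Workaround 1: pass the underlying ndarray, so any indexing done on it
# is positional. np.percentile stands in for scoreatpercentile here.
p1_from_values = np.percentile(b.values, 1)

# Workaround 2: use the Series method. quantile takes a fraction,
# not a percentage, so the 1st percentile is 0.01.
p1_from_quantile = b.quantile(0.01)

# Reference value computed on the plain array.
p1_reference = np.percentile(a, 1)

print(p1_from_values, p1_from_quantile, p1_reference)
```

All three values should agree, since both routes avoid label-based lookups on the sorted Series.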

@wesm wesm closed this

If you look at the scoreatpercentile code, the issue can be spotted quickly.
The input is sorted and afterwards indexed. Since b has an integer index, the issue here is label vs. positional indexing on the Series.

def scoreatpercentile(a, per, limit=()):
    values = np.sort(a,axis=0)
    if limit:
        values = values[(limit[0] <= values) & (values <= limit[1])]

    idx = per /100. * (values.shape[0] - 1)
    if (idx % 1 == 0):
        return values[idx]
    else:
        return _interpolate(values[int(idx)], values[int(idx) + 1], idx % 1)

If you give b a non-integer index, the issue does not show up.

In [54]: b.index = pandas.util.testing.makeStringIndex(100)

In [55]: stats.scoreatpercentile(b, 1)
Out[55]: 0.063875501677037982

In [56]: stats.scoreatpercentile(a, 1)
Out[56]: 0.063875501677037982
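The label-versus-position mismatch described above can be reproduced on a tiny Series. This is a minimal sketch, not the scipy code path: it uses Series.sort_values() to obtain a sorted Series that keeps its original labels (an assumption standing in for what np.sort returned for a Series at the time), and the four values are invented for the example.

```python
import pandas as pd

s = pd.Series([0.6, 0.1, 0.9, 0.3])  # default integer index 0..3

# Sorting reorders the values, but each value keeps its original label.
sorted_s = s.sort_values()
labels_after_sort = list(sorted_s.index)

# With an integer index, [] looks up by label, not by position.
by_label = sorted_s[0]          # element whose label is 0
by_position = sorted_s.iloc[0]  # first element in sorted order

print(labels_after_sort, by_label, by_position)
```

Code that expects values[0] to be the minimum gets the element labelled 0 instead, which is the kind of wrong result reported in this issue.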

I'll raise the issue in scipy.


This is another instance of Series not quite being array-like. scoreatpercentile can't call asarray because it has to deal with array-like matrices and masked arrays. Maybe that can change in the future if these go away (matrix likely won't). Just thinking out loud, but it might be worth considering whether you really want to preserve the new sorted index for the default integer index in a Series/DataFrame. Then again, it might not.
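The constraint mentioned above, that coercing the input with asarray would interfere with masked arrays, can be illustrated with plain numpy. This is a small sketch with made-up values; it shows general numpy behaviour, not the actual scipy implementation.

```python
import numpy as np

# Three values, the last one masked out as invalid.
masked = np.ma.masked_array([1.0, 2.0, 99.0], mask=[False, False, True])

plain = np.asarray(masked)    # base ndarray: the mask is dropped
kept = np.asanyarray(masked)  # subclass preserved, mask intact

plain_mean = plain.mean()  # includes the masked 99.0
kept_mean = kept.mean()    # ignores the masked entry

print(type(plain).__name__, plain_mean)
print(type(kept).__name__, kept_mean)
```

Stripping the subclass silently changes the statistics, which is one reason a function that supports masked arrays may avoid a blanket asarray call.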


the reply of a scipy developer to this issue:
