# numpy/numpy


# Enhancement to percentile function. #2970

Closed · wants to merge 8 commits · +182 −67

### 6 participants

This PR adds the `limit` and `interpolation` parameters to the percentile function in NumPy. This harmonizes NumPy's percentile function with SciPy's stats.scoreatpercentile function, as briefly discussed on the PR that introduced similar features into SciPy (scipy/scipy#374).
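For context, here is a sketch of what the two proposed keywords do, written with only operations available in released NumPy (the `limit` keyword itself never shipped, so the filtering is spelled out by hand):

```python
import numpy as np

x = np.arange(10)

# What the proposed limit=(2, 5) would do: drop values outside the
# bounds before computing the percentile.
lo, hi = 2, 5
filtered = x[(lo <= x) & (x <= hi)]   # array([2, 3, 4, 5])
print(np.percentile(filtered, 50))    # 3.5

# With linear interpolation the 45th percentile of 0..9 falls at
# fractional index 0.45 * 9 = 4.05, i.e. 4 + 0.05 * (5 - 4).
print(np.percentile(x, 45))           # 4.05
```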

 jjhelmus `Enhancement to percentile function.` `e5d0518`
referenced this pull request in scipy/scipy Merged

### Pull Request #374: ENH: stats.scoreatpercentile

`numpy/lib/function_base.py`

```diff
@@ -2994,7 +2994,9 @@ def median(a, axis=None, out=None, overwrite_input=False):
     # and check, use out array.
     return mean(sorted[indexer], axis=axis, out=out)
 
-def percentile(a, q, axis=None, out=None, overwrite_input=False):
+
+def percentile(a, q, limit=(), interpolation_method='fraction', axis=None,
```
njsmith (Owner) added a note on February 07, 2013: `limit=None` would be clearer I think.
commented on the diff
`numpy/lib/function_base.py`

```diff
@@ -3006,6 +3008,16 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
         Input array or object that can be converted to an array.
     q : float in range of [0,100] (or sequence of floats)
         Percentile to compute which must be between 0 and 100 inclusive.
+    limit : tuple, optional
+        Tuple of two scalars, the lower and upper limits within which to
+        compute the percentile.
```
njsmith (Owner) added a note on February 07, 2013: I have read this docstring and I still have no idea what `limit` does...
`numpy/lib/function_base.py`

```diff
@@ -3006,6 +3008,16 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
         Input array or object that can be converted to an array.
     q : float in range of [0,100] (or sequence of floats)
         Percentile to compute which must be between 0 and 100 inclusive.
+    limit : tuple, optional
+        Tuple of two scalars, the lower and upper limits within which to
+        compute the percentile.
+    interpolation : {'fraction', 'lower', 'higher'}, optional
+        This optional parameter specifies the interpolation method to use,
+        when the desired quantile lies between two data points `i` and `j`:
+        - fraction: `i + (j - i)*fraction`, where `fraction` is the
+          fractional part of the index surrounded by `i` and `j`.
```
njsmith (Owner) added a note on February 07, 2013: I think a clearer name for this would be `interpolation="linear"` -- has scipy nailed this down already, though?

jjhelmus added a note on February 08, 2013: The interpolation_method keyword and syntax was added to SciPy with commit scipy/scipy@9650e63 about a year ago; I think it would be acceptable to change this.
`numpy/lib/function_base.py`

```diff
@@ -3074,6 +3086,9 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
     """
     a = np.asarray(a)
 
+    if limit:
+        a = a[(limit[0] <= a) & (a <= limit[1])]
+
```
njsmith (Owner) added a note on February 07, 2013: This code does tell me what `limit` does, but I still have no intuitive sense of why this belongs in a percentile function. Can you give an example that shows why this is a natural thing people will want in their percentile function?
 jjhelmus `Syntax changes to percentile function.` `a44a251`

Added a test and additional documentation explaining the limit parameter. It filters `a` prior to calculating the percentile. A masked array does not accomplish the same thing:

```
In [7]: x = np.arange(10)

In [8]: np.percentile(x, 50, limit=(2, 5))
Out[8]: 3.5

In [9]: np.percentile([2, 3, 4, 5], 50)
Out[9]: 3.5

In [10]: np.percentile(np.ma.masked_outside(x, 2, 5), 50)
Out[10]: 4.5
```

Also, is this the expected/desired behavior for a masked array?
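One plausible reading of the 4.5 result (an observation about NumPy's conversion rules, not something stated in this thread): percentile converts its input to a plain array, which discards the mask, so the computation runs over all ten values; the routines in `np.ma` are the ones that honour the mask:

```python
import numpy as np

x = np.arange(10)
m = np.ma.masked_outside(x, 2, 5)   # mask everything outside [2, 5]

# Converting with asarray drops the mask, so percentile sees all of 0..9:
print(np.percentile(np.asarray(m), 50))   # 4.5, the median of 0..9

# np.ma.median honours the mask and sees only 2, 3, 4, 5:
print(np.ma.median(m))                    # 3.5
```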

`numpy/lib/function_base.py`

```diff
@@ -3006,6 +3008,17 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
         Input array or object that can be converted to an array.
     q : float in range of [0,100] (or sequence of floats)
         Percentile to compute which must be between 0 and 100 inclusive.
+    limit : tuple, optional
+        Tuple of two scalars, the lower and upper limits within which to
+        compute the percentile. Values outside of this range are ommitted from
+        the percentile calculation. None includes all values in calculation.
+    interpolation : {'linear', 'lower', 'higher'}, optional
+        This optional parameter specifies the interpolation method to use,
+        when the desired quantile lies between two data points `i` and `j`:
+        - linear: `i + (j - i) * fraction`, where `fraction` is the
+          fractional part of the index surrounded by `i` and `j`.
```
seberg (Collaborator) added a note on February 10, 2013: Don't know sphinx well, but I think the list may need to be indented (i.e. with two spaces) and probably `fractional` aligned with `linear`. Also it may need a blank line before the list? Maybe compare with `can_cast` or such.
`numpy/lib/function_base.py`

```diff
@@ -3111,16 +3128,29 @@ def _compute_qth_percentile(sorted, q, axis, out):
     indexer = [slice(None)] * sorted.ndim
     Nx = sorted.shape[axis]
     index = q*(Nx-1)
+
+    if int(index) != index:
+        # round fractional indices according to interpolation method
+        if interpolation == 'lower':
+            index = np.floor(index)
+        elif interpolation == 'higher':
+            index = np.ceil(index)
+        elif interpolation == 'linear':
+            pass  # keep index as fraction and interpolate
+        else:
+            raise ValueError("interpolation_method can only be 'linear', "
```
commented Collaborator

Looks good. Just wondering if something like `interpolation='mid_point'` or such should be added, since that is what R and Matlab use? Also wondering a bit why the function is not properly vectorized along `q`; it seems that would just be a matter of using fancy indexing instead of slices (I admit it forces a copy, but that should hardly matter).

To achieve that, reshaping the input `q = np.asarray(q); q.shape += (1,)` and then reducing along `axis+q.ndim-1` (since you may add new ones) should do it I think, plus it works for any `q` not just 1-d ones.

I don't think one can expect these kinds of functions to work for masked arrays. That would require a canonical way of getting the (number of) valid entries for such an object.

 jjhelmus `fixed percentile docs` `d627ba6` jjhelmus `added midpoint interpolation to percentile` `c55b583`

@seberg I'm not seeing the benefit to vectorize along `q`. The indexer needs to be a list of slice objects so either we do a list comprehension over `q`, or we have a comprehension to build up the indexer. Can you expand on the method you had in mind?

commented Collaborator

I think this is separate from this anyway (even if you are interested, it may be better in a different PR). The names and that limit argument are more interesting to the discussion here :).

I did not check it thoroughly, but what I meant was that you do not need to use slices; you can also use a fancy index. The slice is just `slice(n, n+1)` or `slice(n, n+2)`. That is (but for a copy) equivalent to `np.array([n], dtype=np.intp)` or `np.array([n, n+1], dtype=np.intp)`. But if you write it using such an array, you should be able to then add more dimensions to it to support arbitrary input shapes (and what I wrote on `q` would be how to get there).
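The slice-vs-fancy-index equivalence described above can be checked directly (a standalone illustration, not code from the PR):

```python
import numpy as np

a = np.array([1., 2., 3., 4., 5.])   # already sorted
n = 2

# Slice form used by the original implementation (a view):
by_slice = a[slice(n, n + 2)]

# Fancy-index form (a copy), same values:
by_fancy = a[np.array([n, n + 1], dtype=np.intp)]
print(by_slice, by_fancy)            # [3. 4.] [3. 4.]

# Unlike a slice, the index array can grow extra dimensions,
# e.g. one row of (i, i+1) pairs per requested quantile:
idx = np.array([[0, 1], [2, 3]], dtype=np.intp)
print(a[idx])                        # [[1. 2.] [3. 4.]]
```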

Yes, I see how it would be possible now, but I think that would be a different PR.

commented Collaborator

Sorry, I screwed that up, all of it... midpoint, if anything, would mean that the knots are at the mid point; it is still linear interpolation. I think I got fooled by some (probably also wrong) memory about boxplots. But the definition as it stands is wrong. R has a lot of other options of course, but if anyone thinks they matter to numpy, then interpolation might be a bad name. I don't know this stuff well enough anyway...
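For reference, the 'midpoint' option that eventually landed in NumPy simply averages the two bracketing points when the index is fractional. This sketch uses the `method` keyword of NumPy >= 1.22 (in the releases contemporary with this PR the keyword was `interpolation`):

```python
import numpy as np

x = np.arange(10)
# The 45th percentile falls at fractional index 4.05, between x[4] and x[5].
print(np.percentile(x, 45))                      # linear: 4.05
print(np.percentile(x, 45, method='midpoint'))   # (4 + 5) / 2 = 4.5
print(np.percentile(x, 45, method='lower'))      # 4
print(np.percentile(x, 45, method='higher'))     # 5
```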

(Aside on vectorization: http://mail.scipy.org/pipermail/scipy-user/2012-December/033904.html handles only 1d or 2d IIRC; it would need to be checked whether it's really faster if we have only a few q. I am not working on this.)

commented Collaborator

I vectorized it (based on the state in this PR) at: https://gist.github.com/seberg/4966984

I do not want to spam with a graphic, but for a negligible size of the other array (avoiding sorting cost), it is a bit more than a factor of two slower for a single value. For a sequence (list), the crossover point is at 3-4 items, after which it gets vastly faster. At least the old scipy 0.10.1's percentile is much faster than numpy's. The crossover with it + list comprehension is at about 7-8 items. So it does seem worth it, unless it is very typical to require only a single value.

On the vectorized solution (looks nice): statsmodels currently uses percentile mainly for 2 values of q (for the interquartile range), but if we know there is an efficient version for a large number of q, then we will start to use it in other places.

 jjhelmus `q-vectorized percentile function` `Based on seberg gist (https://gist.github.com/seberg/4966984)` `83e1524`

Adopted the vectorized version based on seberg's gist. Had to add a check for values in `i > a.shape[0]`, which occur when 100 is in `q`.

closed this
commented Owner

ping travisbot.

@seberg, do you think this is ready?

reopened this
commented Collaborator

Oh, forgot about this... I would say pretty much yes (but did not go over it again carefully right now).

What bothers me a bit is that limit argument, but I am fine with adding it for consistency with scipy (I don't quite see the use, to be honest). In its current form it does not, however, support vectorization of the haystack (`axis != None`), and it may have bugs for `axis=0`. If that is caught (or checked that it gives an error already), I am happy though.

commented Collaborator

Ah, one other thing. I guess this should probably be mentioned in the release notes, since the output changes from a list to an array?
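A quick check of the behaviour under discussion, as it works in released NumPy today (a sequence of percentiles does indeed yield an ndarray, not a list):

```python
import numpy as np

res = np.percentile([1, 2, 3, 4], [25, 75])
print(type(res).__name__)   # ndarray
print(res)                  # [1.75 3.25]
```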

`numpy/lib/function_base.py`

((53 lines not shown))

```diff
         weights = array(1)
         sumval = 1.0
     else:
-        indexer[axis] = slice(i, i+2)
-        j = i + 1
-        weights = array([(j - index), (index - i)], float)
-        wshape = [1]*sorted.ndim
-        wshape[axis] = 2
-        weights.shape = wshape
-        sumval = weights.sum()
-
+        i = index.astype(np.intp) + arange(2)
+        indexer = (i, Ellipsis)
+        weights = index - i[..., ::-1]
+        weights[..., 0] *= -1
+        weights.shape = weights.shape + (1,) * (sorted.ndim - 1)
+        sumval = weights.sum(i.ndim - 1)  # numerical accuracy reasons?
+        i[np.where(i > (len(sorted) - 1))] = len(sorted) - 1
```
commented Collaborator

:(, there is a small issue with the vectorization. I think this is the right way to vectorize it, but it is not compatible with some old cases:

```
In [2]: np.percentile([[1,2], [3,4]], [1,2,3,4], axis=0)
Out[2]:
[array([ 1.02,  2.02]),
 array([ 1.04,  2.04]),
 array([ 1.06,  2.06]),
 array([ 1.08,  2.08])]
```

while the vectorized version would put the haystack vectorization first (which feels more correct to me). For starters I guess one could swap the order to keep better compatibility, but I am wondering if it's worth it to try to change it for the future. (I doubt many even vectorize the haystack and needle at the same time right now, but just switching the axes order silently is certainly wrong.)

commented Collaborator

Never mind that; I guess the vectorization is compatible, even if I am not sure whether we should try to switch it around in the long run.

 jjhelmus `removed extraneous np.where function` `152fba2` jjhelmus `added release not about array output of np.percentile` `fc3dd2e`

I removed the extra np.where and documented the output type change in the release docs. I'm not sure what should be done with the haystack and needle vectorization.

commented Collaborator

Not sure either. I think the current way is compatible, but it feels wrong to me; I certainly would expect the haystack vectorization to come first. I doubt it really affects many (since it only applies if both are vectorized), but maybe we should have a transition period, possibly with a temporary keyword argument like histogram? If you still have patience, I think this should go to the mailing list.

Sent an email to the numpy list asking for input on this PR

commented Owner

@jjhelmus @seberg I didn't catch the email. What is the status of this?

commented Collaborator

To be honest, not sure. I guess we could just decide on something; more or less, @josef-pkt was the only one who had an opinion. He also suggested the plausible option of inserting the quantile's dimensions at the axis. Isn't there any function that already has such vectorization? np.searchsorted unfortunately doesn't ;).

To summarize the discussion on the mailing list with @josef-pkt and @seberg, the question is what the shape of the output of the following should be:

```
a = np.arange(24).reshape(2,3,4)
q = np.array([15., 30., 45., 60., 75.])
np.percentile(a, q, axis=-1)
```

`a`, the haystack, has a shape of (2, 3, 4), `q`, the needle, has a shape of (5,).

The options are:

- (2, 3, 5): haystack, `a`, dimensions first.
- (5, 2, 3): needle, `q`, dimension first.
- Third option: insert the `q` dimension at `axis`. This option was not liked.

The haystack dimension first version seems to be the most logical, and has the function acting like a reduceat, which might be considered the most "numpythonic".

The needle dimension first version is what is currently implemented. It allows for unrolling the percentiles, @josef-pkt liked this one the best.

I can see both ways, and if we can find a function that has similar vectorization I say we go with that.

The closest function I could find was np.take. It uses the third option: the indices are inserted at the axis dimension. Is it reasonable to think of np.percentile as performing "percentile" indexing along the given dimension?

commented Collaborator

That is true, that makes sense; somehow I missed that example. We may need a kwarg to be able to warn about a default change and allow the old method of `q`'s dimension coming first?

commented Collaborator

Yeah, but if axis=-1, inserting at the axis does too. Also I guess it simply isn't vectorized like generalized ufuncs (normal broadcasting), since, like in np.take, there is no broadcasting going on; instead it is "applied along the given axis". The new np.linalg.solve does something a little like it, but it is probably too complex anyway:

```
def percentile(arr, q):
    # axis = -1 of arr is eaten away, the other dimensions can broadcast

    # The core of the ufunc (result) are those dimensions that were
    # not broadcast
```

I.e. the gufunc signature would be `(Ellipsis1, N), (Ellipsis1, Ellipsis) -> (Ellipsis)`, where Ellipsis1 could not expand the first operand (but it can expand the second operand, in which case core_shape = ()).

 jjhelmus `np.percentile now inserts new axis at axis parameter` `1607c2f`

Percentile now inserts the `q` dimension at `axis`. This changes the behaviour of percentile rather dramatically; namely, an array is always returned.

One question is whether we should support q arrays with two or more dimensions. As written they are not supported, but no error is raised if they are passed. It might be possible to add support; weights_shape would need to be set to the shape of `a` with the shape of `q` replacing the `axis` dimension.
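For what it's worth, q arrays with two or more dimensions were eventually supported: in released NumPy today the result has `q.shape` prepended to the reduced shape of `a` (this is current behaviour, not something this PR implemented):

```python
import numpy as np

x = np.arange(10)
q = np.array([[25., 50.],
              [75., 100.]])    # a 2-D needle

res = np.percentile(x, q)      # reduces over all of x
print(res.shape)               # (2, 2)
print(res)                     # [[2.25 4.5 ] [6.75 9.  ]]
```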

commented Collaborator

The percentile function is related to my attempt at getting a partitioning/selection function into numpy, see
http://mail.scipy.org/pipermail/numpy-discussion/2013-May/066645.html
The code is here:
https://github.com/juliantaylor/numpy/tree/select-median

Percentile is much faster using iterative partitions than sorting once, even if you want many percentiles.

```
r = []
prev = 0
for p in sorted(percentiles):    # p: absolute index of the order statistic
    idx = p - prev               # position within the remaining tail
    data[prev:].partition(idx)   # in place on a view; partitions the tail
    r.append(data[p])            # prev + idx == p
    prev = p + 1
```

If the functionality is deemed worthwhile, this branch should probably be adapted and merged after the partition merge.

referenced this pull request in statsmodels/statsmodels Open

### Issue #596: add interquartile range, or scoreatpercentiles to statsmodels

I have a version of percentile which works with partition: https://github.com/jjhelmus/numpy/tree/percentile_with_partition. The version is messy (treat it like a private repo, I'll be rebasing); I'll clean it up when the partition PR is accepted.

referenced this pull request Merged

### Pull Request #3658: ENH: percentile function with additional parameters and vecorization

I rewrote the percentile function to use the new percentile functionality in PR #3658. It was easier to start from a clean fork than to work off this branch.

commented Owner

@jjhelmus This should be closed then?

@charris Yes, I'll close this PR. New discussion should begin in PR #3658.

closed this
referenced this pull request Merged

### Pull Request #3769: BUG: ensure percentile has same output structure as in 1.8

Showing 8 unique commits by 1 author.

Feb 07, 2013
`Enhancement to percentile function.` `e5d0518`
Feb 08, 2013
`Syntax changes to percentile function.` `a44a251`
Feb 15, 2013
`fixed percentile docs` `d627ba6`
`added midpoint interpolation to percentile` `c55b583`
Feb 19, 2013
`q-vectorized percentile function`
`Based on seberg gist (https://gist.github.com/seberg/4966984)`
`83e1524`
Apr 16, 2013
`removed extraneous np.where function` `152fba2`
`added release not about array output of np.percentile` `fc3dd2e`
May 13, 2013
`np.percentile now inserts new axis at axis parameter` `1607c2f`
```diff
@@ -22,6 +22,8 @@ Compatibility notes
 numpy.diag, np.diagonal, and the diagonal method of ndarrays return a view
 onto the original array, instead of producing a copy.
 
+numpy.percentile returns an array instead of a list.
+
 selecting multiple fields out of an array also produces a view.
 
 The hash function of numpy.void scalars has been changed. Previously the
```
```diff
@@ -15,13 +15,14 @@
 import numpy.core.numeric as _nx
 from numpy.core import linspace
 from numpy.core.numeric import ones, zeros, arange, concatenate, array, \
-    asarray, asanyarray, empty, empty_like, ndarray, around
+    asarray, asanyarray, empty, empty_like, ndarray, around, floor, \
+    ceil, take
 from numpy.core.numeric import ScalarType, dot, where, newaxis, intp, \
     integer, isscalar
 from numpy.core.umath import pi, multiply, add, arctan2, \
     frompyfunc, isnan, cos, less_equal, sqrt, sin, mod, exp, log10
 from numpy.core.fromnumeric import ravel, nonzero, choose, sort, mean
-from numpy.core.numerictypes import typecodes, number
+from numpy.core.numerictypes import typecodes, number, intp
 from numpy.core import atleast_1d, atleast_2d
 from numpy.lib.twodim_base import diag
 from _compiled_base import _insert, add_docstring
@@ -2994,7 +2995,9 @@ def median(a, axis=None, out=None, overwrite_input=False):
     # and check, use out array.
     return mean(sorted[indexer], axis=axis, out=out)
 
-def percentile(a, q, axis=None, out=None, overwrite_input=False):
+
+def percentile(a, q, limit=None, interpolation='linear', axis=None,
+               out=None, overwrite_input=False):
     """
     Compute the qth percentile of the data along the specified axis.
 
@@ -3006,9 +3009,21 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
         Input array or object that can be converted to an array.
     q : float in range of [0,100] (or sequence of floats)
         Percentile to compute which must be between 0 and 100 inclusive.
+    limit : tuple, optional
+        Tuple of two scalars, the lower and upper limits within which to
+        compute the percentile. Values outside of this range are ommitted from
+        the percentile calculation. None includes all values in calculation.
+    interpolation : {'linear', 'lower', 'higher', 'midpoint'}, optional
+        This optional parameter specifies the interpolation method to use,
+        when the desired quantile lies between two data points `i` and `j`:
+
+        * linear: `i + (j - i) * fraction`, where `fraction` is the
+          fractional part of the index surrounded by `i` and `j`.
+        * lower: `i`.
+        * higher: `j`.
     axis : int, optional
         Axis along which the percentiles are computed. The default (None)
-        is to compute the median along a flattened version of the array.
+        is to compute the percentiles along a flattened version of the array.
     out : ndarray, optional
         Alternative output array in which to place the result. It must
         have the same shape and buffer length as the expected output,
@@ -3024,7 +3039,7 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
 
     Returns
     -------
-    pcntile : ndarray
+    percentile : ndarray
         A new array holding the result (unless `out` is specified, in
         which case that array is returned instead). If the input contains
         integers, or floats of smaller precision than 64, then the output
@@ -3050,11 +3065,12 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
     array([[10,  7,  4],
            [ 3,  2,  1]])
     >>> np.percentile(a, 50)
-    3.5
-    >>> np.percentile(a, 0.5, axis=0)
+    array([3.5])
+    >>> np.percentile(a, 50, axis=0)
     array([ 6.5,  4.5,  2.5])
     >>> np.percentile(a, 50, axis=1)
-    array([ 7.,  2.])
+    array([[ 7.],
+           [2.]])
 
     >>> m = np.percentile(a, 50, axis=0)
     >>> out = np.zeros_like(m)
@@ -3065,20 +3081,20 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
 
     >>> b = a.copy()
     >>> np.percentile(b, 50, axis=1, overwrite_input=True)
-    array([ 7.,  2.])
+    array([[ 7.,
+           [2.]])
     >>> assert not np.all(a==b)
     >>> b = a.copy()
     >>> np.percentile(b, 50, axis=None, overwrite_input=True)
-    3.5
+    array([3.5])
 
     """
-    a = np.asarray(a)
+    a = asarray(a)
 
-    if q == 0:
-        return a.min(axis=axis, out=out)
-    elif q == 100:
-        return a.max(axis=axis, out=out)
+    if limit:  # filter a based on limits
+        a = a[(limit[0] <= a) & (a <= limit[1])]
 
+    # sort a
     if overwrite_input:
         if axis is None:
             sorted = a.ravel()
@@ -3091,43 +3107,47 @@ def percentile(a, q, axis=None, out=None, overwrite_input=False):
     if axis is None:
         axis = 0
 
-    return _compute_qth_percentile(sorted, q, axis, out)
-
-# handle sequence of q's without calling sort multiple times
-def _compute_qth_percentile(sorted, q, axis, out):
-    if not isscalar(q):
-        p = [_compute_qth_percentile(sorted, qi, axis, None)
-             for qi in q]
-
-        if out is not None:
-            out.flat = p
-
-        return p
-
+    q = atleast_1d(q)
     q = q / 100.0
-    if (q < 0) or (q > 1):
+    if (q < 0).any() or (q > 1).any():
         raise ValueError("percentile must be either in the range [0,100]")
 
-    indexer = [slice(None)] * sorted.ndim
     Nx = sorted.shape[axis]
-    index = q*(Nx-1)
-    i = int(index)
-    if i == index:
-        indexer[axis] = slice(i, i+1)
-        weights = array(1)
-        sumval = 1.0
+    indices = q * (Nx - 1)
+
+    # round fractional indices according to interpolation method
+    if interpolation == 'lower':
+        indices = floor(indices).astype(intp)
+    elif interpolation == 'higher':
+        indices = ceil(indices).astype(intp)
+    elif interpolation == 'linear':
+        pass  # keep index as fraction and interpolate
     else:
-        indexer[axis] = slice(i, i+2)
-        j = i + 1
-        weights = array([(j - index), (index - i)], float)
-        wshape = [1]*sorted.ndim
-        wshape[axis] = 2
-        weights.shape = wshape
-        sumval = weights.sum()
-
-    # Use add.reduce in both cases to coerce data type as well as
-    # check and use out array.
-    return add.reduce(sorted[indexer]*weights, axis=axis, out=out)/sumval
+        raise ValueError("interpolation can only be 'linear', 'lower' "
+                         "or 'higher'")
+
+    if indices.dtype == intp:  # take the points along axis
+        return take(sorted, indices, axis=axis, out=out)
+    else:  # weight the points above and below the indices
+        indices_below = floor(indices).astype(intp)
+        indices_above = indices_below + 1
+        indices_above[indices_above > Nx - 1] = Nx - 1
+
+        weights_above = indices - indices_below
+        weights_below = 1.0 - weights_above
+
+        weights_shape = [1, ] * sorted.ndim
+        weights_shape[axis] = len(indices)
+        weights_below.shape = weights_shape
+        weights_above.shape = weights_shape
+
+        x1 = take(sorted, indices_below, axis=axis) * weights_below
+        x2 = take(sorted, indices_above, axis=axis) * weights_above
+        if out is not None:
+            return add(x1, x2, out=out)
+        else:
+            return add(x1, x2)
 
 
 def trapz(y, x=None, dx=1.0, axis=-1):
     """
```
```diff
@@ -1383,26 +1383,119 @@ def compare_results(res, desired):
         assert_array_equal(res[i], desired[i])
 
 
-def test_percentile_list():
-    assert_equal(np.percentile([1, 2, 3], 0), 1)
-
-def test_percentile_out():
-    x = np.array([1, 2, 3])
-    y = np.zeros((3,))
-    p = (1, 2, 3)
-    np.percentile(x, p, out=y)
-    assert_equal(y, np.percentile(x, p))
-
-    x = np.array([[1, 2, 3],
-                  [4, 5, 6]])
-
-    y = np.zeros((3, 3))
-    np.percentile(x, p, axis=0, out=y)
-    assert_equal(y, np.percentile(x, p, axis=0))
-
-    y = np.zeros((3, 2))
-    np.percentile(x, p, axis=1, out=y)
-    assert_equal(y, np.percentile(x, p, axis=1))
+class TestScoreatpercentile(TestCase):
+
+    def test_basic(self):
+        x = np.arange(8) * 0.5
+        assert_equal(np.percentile(x, 0), 0.)
+        assert_equal(np.percentile(x, 100), 3.5)
+        assert_equal(np.percentile(x, 50), 1.75)
+
+    def test_2D(self):
+        x = np.array([[1, 1, 1],
+                      [1, 1, 1],
+                      [4, 4, 3],
+                      [1, 1, 1],
+                      [1, 1, 1]])
+        assert_array_equal(np.percentile(x, 50, axis=0), [[1, 1, 1]])
+
+    def test_limit(self):
+        x = np.arange(10)
+        assert_equal(np.percentile(x, 50, limit=(2, 5)), 3.5)
+        assert_equal(np.percentile([2, 3, 4, 5], 50), 3.5)
+
+        assert_equal(np.percentile(x, 50, limit=(-1, 8)), 4)
+        assert_equal(np.percentile([0, 1, 2, 3, 4, 5, 6, 7, 8], 50), 4)
+
+        assert_equal(np.percentile(x, 50, limit=(4, 11)), 6.5)
+        assert_equal(np.percentile([4, 5, 6, 7, 8, 9], 50, ), 6.5)
+
+    def test_linear(self):
+
+        # Test defaults
+        assert_equal(np.percentile(range(10), 50), 4.5)
+        assert_equal(np.percentile(range(10), 50, (2, 7)), 4.5)
+        assert_equal(np.percentile(range(100), 50, limit=(1, 8)), 4.5)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (10, 100)), 55)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (1, 10)), 5.5)
+
+        # explicitly specify interpolation_method 'fraction' (the default)
+        assert_equal(np.percentile(range(10), 50,
+                                   interpolation='linear'), 4.5)
+        assert_equal(np.percentile(range(10), 50, limit=(2, 7),
+                                   interpolation='linear'), 4.5)
+        assert_equal(np.percentile(range(100), 50, limit=(1, 8),
+                                   interpolation='linear'), 4.5)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (10, 100),
+                                   interpolation='linear'), 55)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (1, 10),
+                                   interpolation='linear'), 5.5)
+
+    def test_lower_higher(self):
+
+        # interpolation_method 'lower'/'higher'
+        assert_equal(np.percentile(range(10), 50,
+                                   interpolation='lower'), 4)
+        assert_equal(np.percentile(range(10), 50,
+                                   interpolation='higher'), 5)
+        assert_equal(np.percentile(range(10), 50, (2, 7),
+                                   interpolation='lower'), 4)
+        assert_equal(np.percentile(range(10), 50, limit=(2, 7),
+                                   interpolation='higher'), 5)
+        assert_equal(np.percentile(range(100), 50, (1, 8),
+                                   interpolation='lower'), 4)
+        assert_equal(np.percentile(range(100), 50, (1, 8),
+                                   interpolation='higher'), 5)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (10, 100),
+                                   interpolation='lower'), 10)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, limit=(10, 100),
+                                   interpolation='higher'), 100)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, (1, 10),
+                                   interpolation='lower'), 1)
+        assert_equal(np.percentile(np.array([1, 10, 100]), 50, limit=(1, 10),
+                                   interpolation='higher'), 10)
+
+    def test_sequence(self):
+        x = np.arange(8) * 0.5
+        assert_equal(np.percentile(x, [0, 100, 50]), [0, 3.5, 1.75])
+
+    def test_axis(self):
+        x = np.arange(12).reshape(3, 4)
+
+        assert_equal(np.percentile(x, (25, 50, 100)), [2.75, 5.5, 11.0])
+
+        r0 = [[2, 3, 4, 5], [4, 5, 6, 7], [8, 9, 10, 11]]
+        assert_equal(np.percentile(x, (25, 50, 100), axis=0), r0)
+
+        r1 = [[0.75, 1.5, 3], [4.75, 5.5, 7], [8.75, 9.5, 11]]
+        assert_equal(np.percentile(x, (25, 50, 100), axis=1), r1)
+
+    def test_exception(self):
+        assert_raises(ValueError, np.percentile, [1, 2], 56,
+                      interpolation='foobar')
+        assert_raises(ValueError, np.percentile, [1], 101)
+        assert_raises(ValueError, np.percentile, [1], -1)
+
+    def test_percentile_list(self):
+        assert_equal(np.percentile([1, 2, 3], 0), 1)
+
+    def test_percentile_out(self):
+        x = np.array([1, 2, 3])
+        y = np.zeros((3,))
+        p = (1, 2, 3)
+        np.percentile(x, p, out=y)
+        assert_equal(y, np.percentile(x, p))
+
+        x = np.array([[1, 2, 3],
+                      [4, 5, 6]])
+
+        y = np.zeros((3, 3))
+        np.percentile(x, p, axis=0, out=y)
+        assert_equal(y, np.percentile(x, p, axis=0))
+
+        y = np.zeros((2, 3))
+        np.percentile(x, p, axis=1, out=y)
+        assert_equal(y, np.percentile(x, p, axis=1))
 
 
 def test_median():
```