
Closes issue #586. Updates handling of nan behaviour in numpy.lib.function_base.median. #4287

Closed
empeeu wants to merge 24 commits

Conversation

@empeeu (Contributor) commented Feb 13, 2014

Please review and add to the main branch. I also included a unit test to help with the review process.

(also, I'm considering writing a nanmedian function which would ignore nans, similar to the nansum, nanmax, etc. series of functions.)

#Check if the array contains any nan's
ids = None
if axis is None or a.ndim == 1:
    if np.any(np.isnan(a)):
@jaimefrio (Member) commented on the diff:

This is probably not the best option performance-wise; consider the following:

In [14]: a = np.random.rand(1000)

In [15]: np.median(a)
Out[15]: 0.51385702922222753

In [16]: np.partition(a, (499, 500))[499:501].sum() / 2
Out[16]: 0.51385702922222753

In [17]: %timeit np.partition(a, (499, 500))[499:501].sum() / 2
10000 loops, best of 3: 21.9 µs per loop

In [18]: %timeit np.any(np.isnan(a))
100000 loops, best of 3: 14 µs per loop

In [19]: %timeit np.partition(a, (499, 500, -1))[499:501].sum() / 2
10000 loops, best of 3: 24.4 µs per loop

Calling np.isnan takes about 66% of the time of the np.partition call on this array, a significant slowdown. If, on the other hand, you add -1 to the list of indices to place in sorted position in the call to np.partition, as in In [19] above, the slowdown is much less noticeable, and because NaN is always placed at the end of an array when sorting, you can now check whether there is a NaN in the array by checking whether its last item is NaN after the call to partition.

You may want to check a wider variety of input sizes to confirm that the advantage holds.
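
A rough 1-D sketch of that suggestion (illustrative only, not the code in this PR; the function name and the return-nan policy are assumptions):

import numpy as np

def median_1d_with_nan_check(a):
    # Partition the middle element(s) and the maximum into place in one call.
    # Sorting places nan at the end, so after the partition the last slot
    # holds a nan if and only if the array contains one.
    a = np.asarray(a, dtype=np.float64).ravel().copy()
    n = a.size
    kth = (n // 2 - 1, n // 2, n - 1) if n % 2 == 0 else (n // 2, n - 1)
    a.partition(kth)
    if np.isnan(a[-1]):
        return np.nan
    if n % 2 == 0:
        return 0.5 * (a[n // 2 - 1] + a[n // 2])
    return a[n // 2]

median_1d_with_nan_check([1.0, np.nan, 3.0])    # nan
median_1d_with_nan_check([1.0, 2.0, 3.0, 4.0])  # 2.5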

@jaimefrio (Member):

Because of the performance penalty, it is probably also a good idea to only do something different if the dtype can hold a NAN, i.e. only for floating point types.

@charris (Member) commented Feb 13, 2014

I wonder if it would be better to have an explicit nanmedian function?

@njsmith (Member) commented Feb 13, 2014

@charris: nanmedian would also be good (see commit message), but IIRC this
is about median's output being flat out wrong currently in the presence of
NaNs.

@juliantaylor (Contributor):
I also don't like the performance impact of this. As @jaimefrio mentioned, nan is always the maximum in a sort, so it is better to just add the maximum index to the partition call and then check whether the last element is a nan; as the partition is iterative, the cost is not so high.
It can be reduced even further by specializing the maximum case; currently only the minimum (small kth) is specialized.

This also allows implementing nanmedian trivially, by simply shifting the index by the number of nans along the axis (a sketch of that idea follows below).

Note that this diff probably conflicts with #3908.
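
A minimal 1-D sketch of that index-shift idea (assumed names, not the PR's implementation):

import numpy as np

def nanmedian_1d(a):
    # nans sort to the end, so after partitioning the valid (non-nan) values
    # occupy the leading positions; the median of the valid values therefore
    # sits at an index shifted down by the nan count.
    a = np.asarray(a, dtype=np.float64).ravel().copy()
    n_valid = a.size - np.count_nonzero(np.isnan(a))
    if n_valid == 0:
        return np.nan
    if n_valid % 2:                       # odd number of valid values
        a.partition(n_valid // 2)
        return a[n_valid // 2]
    a.partition((n_valid // 2 - 1, n_valid // 2))
    return 0.5 * (a[n_valid // 2 - 1] + a[n_valid // 2])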

@empeeu (Contributor, Author) commented Feb 13, 2014

Ok. I will rewrite while keeping performance in mind, and I will check a wider variety of array sizes to see the performance impact. Adding the -1 index to the partition is brilliant! I wouldn't have guessed that isnan is so slow.

@jaimefrio Since I'm new at this, is there a simple way to check whether the dtype is floating point? Something that would catch float, float32, etc.?

@juliantaylor Shifting the index is exactly what I had in mind for nanmedian. However, it gets a little tricky with multi-dimensional arrays, because each slice along the axis might have a different number of nans. Adding a loop was my best idea so far, but I can't imagine that being efficient, so I'll take suggestions there. I'll also look at specializing maximum, though I'm not sure what that means.

@njsmith (Member) commented Feb 13, 2014

The types that can have NaNs are the same as those that return true from
np.issubdtype(a.dtype, np.inexact). See
http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html
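
A quick sketch of that gate (illustrative only; the helper name is made up):

import numpy as np

def can_hold_nan(dtype):
    # floating point and complex types are subtypes of np.inexact;
    # integer and boolean arrays can never contain a nan
    return np.issubdtype(dtype, np.inexact)

can_hold_nan(np.dtype(np.int64))       # False
can_hold_nan(np.dtype(np.float32))     # True
can_hold_nan(np.dtype(np.complex128))  # True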

Does the partition trick work for complex values? I worry that complex values with one nan component and one non-nan component might get sorted in with the rest of the values...

@juliantaylor (Contributor):
the sort order for complex is:

[R + Rj, R + nanj, nan + Rj, nan + nanj]

so one needs to select size - 3 as the kth element to catch all the nans

By maximum specialization I meant the selection code in selection.c.src; see dumbselect.
You can skip that for this PR if you are unfamiliar with C; it's an optimization we can add later.

I don't think there is a way to avoid an explicit loop for nanmedian, though maybe you can do a pseudo binary search. (searchsorted won't work as the partitioned array is not fully sorted)
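
One way to spell that explicit loop, sketched with np.apply_along_axis (names are illustrative, not either PR's implementation):

import numpy as np

def _nanmedian1d(v):
    # drop the nans in this slice, then take the ordinary median of the rest
    v = v[~np.isnan(v)]
    return np.median(v) if v.size else np.nan

def nanmedian_nd(a, axis=None):
    a = np.asarray(a, dtype=np.float64)
    if axis is None:
        return _nanmedian1d(a.ravel())
    # each slice along `axis` may hold a different number of nans,
    # so the slices are reduced one at a time
    return np.apply_along_axis(_nanmedian1d, axis, a)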

@empeeu (Contributor, Author) commented Feb 13, 2014

A procedural question (again, since I'm new here): should I close this pull request until I make the discussed changes? Or is there a way to update this request (avoiding duplicates) once I have a newer version?

@juliantaylor (Contributor):
You can continue to push to the branch and this PR will update.
If you amend or rebase you have to use git push -f.

@jaimefrio (Member):
I don't think np.nanmedian can avoid a call to np.isnan. The indices in the call to np.partition refer to positions in the array, not to positions in the relative ordering of unique items. So calling np.partition with -1 will make sure that the last item of the array is in its correct sorted position, but if there are duplicates of that largest value, there are no guarantees as to where those will end up:

>>> a = np.random.rand(100)
>>> a[np.random.randint(100, size=(10,))] = np.nan
>>> np.partition(a, -1)
array([ 0.15734533,  0.37424167,  0.2857852 ,  0.43048991,  0.02425887,
        ...
        0.90945659,  0.71758622,  0.99375881,         nan,  0.99419513,
               nan,         nan,         nan,  0.9975807 ,         nan,
               nan,         nan,         nan,         nan,         nan])

(You may have to repeat the above test several times, as it often does group all NANs at the end, although not always)

Because we cannot group all NaNs at the end this is irrelevant here, but I think that a binary search for a needle, on a haystack that has been partitioned around that needle, would return correct results. That application would also require searchsorted to take a multidimensional haystack, which it doesn't...

I am not sure if it would have much use outside of counting NANs, but you could run the quicksort partition algorithm with pivots supplied by the user, on an unsorted array, and return a partially sorted array, and the positions where those pivots would have to be inserted in the sorted array to preserve order, kind of a searchunsorted function. It may allow speeding up some of the set operations.

@juliantaylor (Contributor):
right, I was thinking of partial sort semantics which would be much slower than running isnan first, how dumb of me...

Also added a flag "check_nan" that can be turned off to regain original
performance.

Also added nanmedian function to calculate the median while ignoring nans.
@empeeu (Contributor, Author) commented Feb 17, 2014

A couple of updates: I tried a number of approaches to speed up the nan-checking in the median function, using this little benchmark script:

import timeit
import numpy as np
#%%
setup_even = lambda N: """
import numpy as np
import itertools
a = np.random.rand(%d)
a[a.size // 2] = np.nan
""" % N
setup_nonan = lambda N: """
import numpy as np
import itertools
a = np.random.rand(%d)
""" % N
stmt_min = """#min 
ids = np.isnan(np.min(a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_max = """#max 
ids = np.isnan(np.max(a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_sum = """#sum 
ids = np.isnan(np.sum(a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_dot = """#dot 
ids = np.isnan(np.dot(a, a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_any = """#any
ids = np.any(np.isnan(a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_iter = """#argwhere
ids = np.where(np.isnan(a))
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition((sz - 1) // 2)
"""
stmt_part = """#part 
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh, -1))
else:
    a.partition(((sz - 1) // 2, -1))
"""
stmt_base = """#base
sz = a.size
if sz % 2 == 0:
    szh = sz // 2
    a.partition((szh - 1, szh))
else:
    a.partition(((sz - 1) // 2))
"""
#%%
NN = np.logspace(1, 8)
NN = NN - NN % 2 #Make all of it even
even=[]
#odd=[]
nonan = []
stmts = [stmt_part, stmt_dot, stmt_sum, stmt_max, stmt_min, stmt_any, stmt_iter, stmt_base]
for N in NN:
    even.append([])
    #odd.append([])
    nonan.append([])
    setup_e = setup_even(N)
    setup_o = setup_nonan(N)
    number = int(10 ** (4 - np.log10(N) // 2))
    for stmt in stmts:
        even[-1].append(
            np.min(timeit.repeat(
                stmt=stmt, setup=setup_e, number=number, repeat=5)
            ) / number
        )
        nonan[-1].append(
            np.min(timeit.repeat(
                stmt=stmt, setup=setup_o, number=number, repeat=5)
            ) / number
        )
        print "N", N, stmt[:4], 'even', even[-1][-1], 'nonan', nonan[-1][-1]
#%%
even_arr = np.array(even)
nonan_arr = np.array(nonan)
import matplotlib.pyplot as plt
#%%
plt.figure(1)
plt.subplot(211)
plt.loglog(NN, (even_arr.T/ NN).T * 1e6)
plt.loglog(NN, (even_arr[:, -1]/ NN).T * 1e6, '--', linewidth=2)
plt.legend(['part', 'dot', 'sum', 'max', 'min', 'any', 'argwhere', 'base'], loc=0)
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Execution time per element $\mu s$')
plt.title('With nans')
plt.subplot(212)
plt.loglog(NN, (nonan_arr.T/ NN).T * 1e6)
plt.loglog(NN, (nonan_arr[:, -1]/ NN).T * 1e6, '--', linewidth=2)
plt.legend(['part', 'dot', 'sum', 'max', 'min', 'any', 'argwhere', 'base'], loc=0)
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Execution time per element $\mu s$')
plt.title('Without nans')
plt.savefig("Benchmarks.png")
plt.savefig("Benchmarks.svg")
#%%
plt.figure(2)
plt.semilogx(NN, (even_arr[:, [0, 3, -1]].T / (even_arr[:, -1].T)).T)
plt.legend(['part', 'max', 'base'], loc=0)
plt.ylim([0, 4])
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Fractional penalty $(t / t_b)$')
plt.savefig("Benchmarks2.png")
plt.savefig("Benchmarks2.svg")

Results are in the attached Benchmarks plot, with the normalized close-up in Benchmarks2.

So, while there was a cross-over array size where np.isnan(np.max(a)) was a little faster, I didn't implement a cross-over because I figured it would differ between machines, and it makes the implementation uglier. Because of the basically unavoidable performance hit, I added a flag that can be turned off to skip the nan check entirely. At the same time, I only check for nans for datatypes that can hold them (thanks @njsmith).

Okay, then I added a nanmedian function. Someone pointed out that scipy already had this, so I took some hints from their implementation, but mine seems to be considerably faster. To compare the performance of the original median, my new median, nanmedian, and scipy's nanmedian, I wrote this little script, where I put my development version in a 'numpy2' folder:

import timeit
import numpy as np
import os
os.chdir('Projects/numpy')
#%%
setup_even = lambda N: """
import numpy as np
import numpy2 as np2
from scipy.stats import nanmedian
a = np.random.rand(%d)
a[a.size // 2] = np.nan
""" % N
setup_nonan = lambda N: """
import numpy as np
import numpy2 as np2
from scipy.stats import nanmedian
a = np.random.rand(%d)
""" % N
stmt_nochk = """#no check
np2.median(a, check_nan=False)
"""
stmt_dochk = """#do check 
np2.median(a)
"""
stmt_nanmed = """#nan median
np2.nanmedian(a)
"""
stmt_spnanmed = """#sp nan median
nanmedian(a)
"""
stmt_base = """#base
np.median(a)
"""
#%%
NN = np.logspace(1, 8, 25)
NN = NN - NN % 2 #Make all of it even
even=[]
nan = []
stmts = [stmt_nochk, stmt_dochk, stmt_nanmed, stmt_spnanmed, stmt_base]
for N in NN:
    even.append([])    
    nan.append([]) 
    setup_e = setup_even(N)    
    setup_o = setup_nonan(N)    
    number = int(10 ** (4 - np.log10(N) // 2))
    for stmt in stmts:
        even[-1].append(
            np.min(timeit.repeat(
                stmt=stmt, setup=setup_e, number=number, repeat=5)
            ) / number
        )
        nan[-1].append(
            np.min(timeit.repeat(
                stmt=stmt, setup=setup_o, number=number, repeat=5)
            ) / number
        )
        print "N", N, stmt[:4], 'even', even[-1][-1], 'nan', nan[-1][-1]
#%%
even_arr = np.array(even)
nan_arr = np.array(nan)
import matplotlib.pyplot as plt
#%%
plt.figure(1)
plt.subplot(211)
plt.loglog(NN, (even_arr.T/ NN).T * 1e6)
plt.loglog(NN, (even_arr[:, -1]/ NN).T * 1e6, '--', linewidth=2)
plt.legend(['no check', 'check nan', 'nan median', 'sp nan median', 'base'], loc=0)
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Execution time per element $\mu s$')
plt.title('With nans')
plt.subplot(212)
plt.loglog(NN, (nan_arr.T/ NN).T * 1e6)
plt.loglog(NN, (nan_arr[:, -1]/ NN).T * 1e6, '--', linewidth=2)
plt.legend(['no check', 'check nan', 'nan median', 'sp nan median', 'base'], loc=0)
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Execution time per element $\mu s$')
plt.title('Without nans')
plt.savefig("Benchmarksbb.png")
plt.savefig("Benchmarksbb.svg")
#%%
plt.figure(2)
plt.semilogx(NN, (even_arr.T / (even_arr[:, -1].T)).T)
plt.legend(['no check', 'check nan', 'nan median', 'sp nan median', 'base'], loc=0)
plt.ylim([0, 4])
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Fractional penalty $(t / t_b)$')
plt.title('with nan')
plt.savefig("Benchmarksbb2a.png")
plt.savefig("Benchmarksbb2a.svg")
#%%
plt.figure(3)
plt.semilogx(NN, (nan_arr.T / (nan_arr[:, -1].T)).T)
plt.legend(['no check', 'check nan', 'nan median', 'sp nan median', 'base'], loc=0)
plt.ylim([0, 4])
plt.xlabel('Size of "a" (number of elements)')
plt.ylabel('Fractional penalty $(t / t_b)$')
plt.title('without nan')
plt.savefig("Benchmarksbb2b.png")
plt.savefig("Benchmarksbb2b.svg")

Results are in the attached plots (Benchmarksbb2a and Benchmarksbb2b).

So, with the flag set to False I seem to recover the performance of the original median function. What do you think?

(Oh, and I still need to update the comments on nanmedian)

@argriffing (Contributor):
This competes with #4307?

@empeeu (Contributor, Author) commented Feb 17, 2014

It looks like we both implemented a version of nanmedian, but I also dealt with the handling of NaNs in the plain-old median function. @dfreese, do you have any suggestions?

@juliantaylor (Contributor):

It would be best if nanmedian could go into a separate PR; it is independent of checking for nan in the regular median, and my tiny brain is getting confused :(

Also, this should be put on hold until #3908 is resolved; any change to median will conflict with it.
I imagine it could get tricky to get nan support into percentile, which is already very convoluted.
This may be an argument for not implementing median in terms of percentile, as it just makes modifying the more important function harder.

You aren't seeding the rng in your benchmarks, are you? Partition performance depends on the layout of the data array; for better benchmarks you should use the same random seed (see the sketch below).
Also, if we add a special max case to the partition code, the nan check should be almost free, so the benchmarks will be invalidated by it.
I'll try to add the specialization to master soon. Then meaningful comparisons can be made.
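
A sketch of what that seeding could look like in the timeit setup strings (the values here are arbitrary):

import timeit
import numpy as np

N = 100000
setup = """
import numpy as np
np.random.seed(1234)   # fixed seed: same data layout for every statement
a = np.random.rand(%d)
a[a.size // 2] = np.nan
""" % N
t = min(timeit.repeat("np.partition(a, (a.size // 2, -1))",
                      setup=setup, number=100, repeat=5)) / 100
print("%.1f us per loop" % (t * 1e6))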

@dfreese commented Feb 17, 2014

@empeeu, your implementation uses apply_along_axis, which was suggested for mine, so your version should probably be used going forward. I do have unit tests for nanmedian that could prove useful.

@juliantaylor (Contributor):

Here is the nan specialization I was talking about:

--- a/numpy/core/src/npysort/selection.c.src
+++ b/numpy/core/src/npysort/selection.c.src
@@ -322,6 +322,19 @@ int
         store_pivot(kth, kth, pivots, npiv);
         return 0;
     }
+    else if (kth == num - 1) {
+        npy_intp k;
+        npy_intp maxidx = low;
+        @type@ maxval = v[IDX(low)];
+        for (k = low + 1; k < num; k++) {
+            if (!@TYPE@_LT(v[IDX(k)], maxval)) {
+                maxidx = k;
+                maxval = v[IDX(k)];
+            }
+        }
+        SWAP(SORTEE(kth), SORTEE(maxidx));
+        return 0;
+    }

     /* dumb integer msb, float npy_log2 too slow for small parititions */
     {

That gets you the -1 partition almost for free. Before:

In [2]: np.random.seed(56); d = np.arange(200000); np.random.shuffle(d)
In [5]: %timeit np.partition(d, (d.size // 2))
1000 loops, best of 3: 1.91 ms per loop
In [6]: %timeit np.partition(d, (d.size // 2, -1))
100 loops, best of 3: 2.41 ms per loop

After:

In [2]: np.random.seed(56); d = np.arange(200000); np.random.shuffle(d)
In [5]: %timeit np.partition(d, (d.size // 2))
1000 loops, best of 3: 1.94 ms per loop
In [6]: %timeit np.partition(d, (d.size // 2, -1))
100 loops, best of 3: 2.03 ms per loop

@empeeu (Contributor, Author) commented Feb 18, 2014

@dfreese: I agree with juliantaylor that nanmedian should go in a separate PR, so I will remove it from this one. Since you have a PR open for nanmedian, why don't you take what you find useful from my nanmedian and combine it with yours? I'll just focus on median; nanmedian is all yours.

@juliantaylor Nice specialization. I ventured into the C code a little the other day but got lost, so I appreciate seeing where the partition code is (I was thinking of writing a count_nan function, so I went looking for count_nonzero). I'll benchmark to see what happens.

@dfreese commented Feb 18, 2014

@empeeu: Sure, that sounds good.

for ARGOUTVIEWM?_FARRAY[34] typemaps, the included require_fortran check was inverted -- failing if the resulting array had fortran ordering, not if it did not.  This modification inverts these checks.
@juliantaylor (Contributor):

The partition special case for the maximum element has been merged into master.

@charris (Member) commented Feb 22, 2014

@juliantaylor How about we put this in and then revisit #3908 to see if we can thrash out the issues.

@juliantaylor (Contributor):

I would prefer sorting out #3908 first; also, this PR is not ready to be merged yet.

… checking is now pretty much as fast as the old one. Hence, removed the check_nan flag from previous commits.
@empeeu (Contributor, Author) commented Feb 26, 2014

@juliantaylor You really know your stuff. Benchmark for the latest commit:
(final benchmark plot attached)

I was a little worried by how close they were, but I checked it, and it seems to be correct.

Waiting on 3908 now. Let me know if there's anything else to do here.

juliantaylor and others added 2 commits March 5, 2014 23:18
breaking the api breaks matplotlib build and pip installation.
Introduce npy_PyFile_Dup2 and npy_PyFile_DupClose2 as replacements
numpy.i bugfix: fortran ordering check
@charris (Member) commented Mar 5, 2014

@empeeu Needs a rebase.

charris and others added 15 commits March 5, 2014 16:40
BUG: restore api for file npy_PyFile_Dup and npy_PyFile_DupClose
Caused SciPy tests to fail when built with this NumPy.
BUG: Revert PR #4421: causes SciPy tests to fail
run manual apt-get update to pick up the latest py3 security update
DOC: Fixed documentation on lstsq function on when it return an empty re...
work around outdated travis boxes
ENH: Added an output argument for numpy.outer
Also added a flag "check_nan" that can be turned off to regain original
performance.

Also added nanmedian function to calculate the median while ignoring nans.
… checking is now pretty much as fast as the old one. Hence, removed the check_nan flag from previous commits.
@empeeu (Contributor, Author) commented Mar 8, 2014

Seems like I've completely messed up the rebase. I'm going to close this up and try again.
