ENH: improve nanmedian performance #3335

juliantaylor · 2014-02-15T13:33:13Z

remove unnecessary copy, an unnecessary sort and use inplace median on
the compressed copy.

juliantaylor · 2014-02-15T13:35:12Z

np.median has had the overwrite_input argument, and has sorted its input itself, since at least numpy 1.4.
What is the oldest supported numpy?

josef-pkt · 2014-02-15T13:40:39Z

scipy/stats/stats.py

    if x.size == 0:
        return np.nan
-    return np.median(x)
+    return np.median(x, overwrite_input=True)


About overwrite_input=True:
At this point, do we always have a copy of the original array without any NaNs? Does that hold if it is 1-D, or 2-D with no NaNs in a column?

juliantaylor · 2014-02-15

np.compress returns a copy without the NaNs, and this function only works on 1-D arrays.

pv · 2014-02-15

Numpy 1.5 is the oldest targeted version, so overwrite_input should be fine.
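For readers following the review thread, the pattern under discussion can be sketched as below. This is a minimal illustration, not the actual scipy source: the helper name `nanmedian_1d` is hypothetical, and the sketch assumes 1-D floating-point input. It shows why `overwrite_input=True` is safe here: `np.compress` has already produced a fresh copy, so the caller's array is never touched.

```python
import numpy as np

def nanmedian_1d(arr1d):
    """Median of a 1-D array, ignoring NaNs (hypothetical helper)."""
    arr1d = np.asarray(arr1d, dtype=float)
    # np.compress returns a new array holding only the non-NaN entries.
    x = np.compress(~np.isnan(arr1d), arr1d)
    if x.size == 0:
        return np.nan
    # x is a private copy, so np.median may reorder it in place
    # instead of making a second copy.
    return np.median(x, overwrite_input=True)

data = np.array([3.0, np.nan, 1.0, 2.0])
result = nanmedian_1d(data)
```

After the call, `data` still holds its original values in the original order; only the compressed copy was reordered.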

coveralls · 2014-02-15T14:13:09Z

Coverage remained the same when pulling 3ca2774 on juliantaylor:nanmedian into 6fa1a7c on scipy:master.

juliantaylor · 2014-02-15T14:33:10Z

As you normally have fewer NaNs than data in your arrays, it would be faster to just select the NaNs (np.where(np.isnan(x))) and move them to the end of the array with a little fancy indexing.

Though it's probably best to do a faster nanmedian in numpy and then reuse it in scipy.
I guess performance-sensitive users use bottleneck anyway.
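The NaN-moving idea from the comment above could look roughly like the following. This is only a sketch under stated assumptions, not code from this PR or from numpy: the function name `nanmedian_move_nans` is hypothetical, the input is assumed to be 1-D, and the function makes one working copy so the caller's data is left alone. Instead of compressing the whole array, it fills the NaN slots in the front of the array with the valid values found in the tail, then takes the median of the front part.

```python
import numpy as np

def nanmedian_move_nans(arr1d):
    """NaN-ignoring median that only touches the NaN positions (hypothetical)."""
    x = np.array(arr1d, dtype=float)       # working copy of the 1-D input
    nan_idx = np.where(np.isnan(x))[0]     # positions of the NaNs
    n_valid = x.size - nan_idx.size
    if n_valid == 0:
        return np.nan
    # Valid values sitting in the last len(nan_idx) slots.
    tail = x[n_valid:]
    tail_valid = tail[~np.isnan(tail)]
    # NaN slots inside the first n_valid positions; there are exactly
    # as many of these as there are valid values in the tail.
    head_nan = nan_idx[nan_idx < n_valid]
    x[head_nan] = tail_valid
    # The first n_valid entries are now NaN-free.
    return np.median(x[:n_valid], overwrite_input=True)

result = nanmedian_move_nans([np.nan, 5.0, 1.0, np.nan, 3.0, 2.0])
```

The amount of fancy-indexing work scales with the number of NaNs rather than with the array length, which is the point of the suggestion when NaNs are rare.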

ev-br · 2014-02-19T11:39:13Z

Re bottleneck: I'm actually wondering if we want to bundle bottleneck or (better) have it as a dependency.

rgommers · 2014-02-22T10:51:24Z

-1 for a new dependency, that costs much more than it's worth. Replacing some scipy functions with relevant parts of bottleneck may make sense.

rgommers · 2014-02-23T17:43:16Z

This looks correct, so merging for 0.14.x. Thanks Julian.

ENH: improve nanmedian performance

3ca2774

remove unnecessary copy, an unnecessary sort and use inplace median on the compressed copy.

josef-pkt reviewed Feb 15, 2014

rgommers added scipy.stats labels Feb 16, 2014

juliantaylor mentioned this pull request Feb 17, 2014

ENH: added functionality nanmedian to numpy numpy/numpy#4307

Merged

pv added the PR label Feb 19, 2014

rgommers added a commit that referenced this pull request Feb 23, 2014

Merge pull request #3335 from juliantaylor/nanmedian

dfd2e2b

ENH: improve nanmedian performance

rgommers merged commit dfd2e2b into scipy:master Feb 23, 2014

rgommers added this to the 0.14.0 milestone Feb 23, 2014
