Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: improve nanmedian performance #3335

Merged
merged 1 commit into from
Feb 23, 2014
Merged

Conversation

juliantaylor
Copy link
Contributor

remove unnecessary copy, an unnecessary sort and use inplace median on
the compressed copy.

remove unnecessary copy, an unnecessary sort and use inplace median on
the compressed copy.
@juliantaylor
Copy link
Contributor Author

median has the overwrite_input argument and sorts by itself since at least numpy 1.4
what is the oldest supported numpy?

if x.size == 0:
return np.nan
return np.median(x)
return np.median(x, overwrite_input=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

about overwrite_input=True
Do we have a copy of the original array without any nans and if it is 1d? or 2-d and non nans in a columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.compress returns a copy without the nans, this function only works on 1d arrays

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy 1.5 is the oldest targeted version, so overwrite_input should be fine

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 3ca2774 on juliantaylor:nanmedian into 6fa1a7c on scipy:master.

@juliantaylor
Copy link
Contributor Author

as normally you have less nans than data in your arrays it would be faster to just select the nans (np.where(np.isnan(x))) and move them to the end of the array with a little of fancy indexing.

Though its probably best to do a faster nanmedian in numpy and then reuse it in scipy.
I guess performance sensitive users use bottleneck anyway.

@ev-br
Copy link
Member

ev-br commented Feb 19, 2014

Re bottleneck: I'm actually wondering if we want to bundle bottleneck or (better) have it as a dependency.

@pv pv added the PR label Feb 19, 2014
@rgommers
Copy link
Member

-1 for a new dependency, that costs much more than it's worth. Replacing some scipy functions with relevant parts of bottleneck may make sense.

rgommers added a commit that referenced this pull request Feb 23, 2014
ENH: improve nanmedian performance
@rgommers rgommers merged commit dfd2e2b into scipy:master Feb 23, 2014
@rgommers
Copy link
Member

This looks correct, so merging for 0.14.x. Thanks Julian.

@rgommers rgommers added this to the 0.14.0 milestone Feb 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants