Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: adds np.nancumsum and np.nancumprod #7421

Merged
merged 1 commit into from
Mar 26, 2016
Merged

Conversation

pwolfram
Copy link
Contributor

This PR adds an implementation of nancumsum and nancumprod.

The actual function is a two-liner adapted from nansum and nanprod.

Its structure is adapted from PR: #5418 ( a minor typo in the doc string from this PR is fixed too)

@pwolfram
Copy link
Contributor Author

cc @shoyer and pydata/xarray#791

pwolfram added a commit to pwolfram/numpy that referenced this pull request Mar 16, 2016
@shoyer
Copy link
Member

shoyer commented Mar 16, 2016

Just a note -- since this is new API, somebody might ask you to write the numpy-discussions mailing list. But I think this should be pretty uncontroversial 👍.

@shoyer
Copy link
Member

shoyer commented Mar 16, 2016

Consistency checks with integer arrays are great, but these also need tests on arrays with NaNs.

@pwolfram
Copy link
Contributor Author

Thanks @shoyer, I accidentally left that out. I updated this to include tests on arrays with NaNs following nanprod's lead. Does this work?

@charris
Copy link
Member

charris commented Mar 16, 2016

Could you put the documentation commit in this PR as well? Note that you can keep adding commits to the PR, and later clean up the history with git rebase -i if needed.

@pwolfram
Copy link
Contributor Author

@charris, done. @shoyer, I've also added the testing capability as you requested. Thanks!


One is returned for slices that are all-NaN or empty.

.. versionadded:: 1.11.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already at the release candidate stage for 1.11, so this will make it into 1.12

@pwolfram
Copy link
Contributor Author

@charris and @shoyer, is there anything else that needs done on this PR?

Return the cumulative sum of array elements over a given axis treating Not a
Numbers (NaNs) as zero.

One is returned for slices that are all-NaN or empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is supposed to say zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg, thanks for finding that typo (should have been plural form of verb too, not singular-- same also for nancumprod)

@njsmith
Copy link
Member

njsmith commented Mar 17, 2016

@@ -548,7 +551,7 @@ def nanprod(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
Parameters
----------
a : array_like
Array containing numbers whose sum is desired. If `a` is not an
Array containing numbers whose product is desired. If `a` is not an
array, a conversion is attempted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, nans are treated like "one", not "zero" on line 542/545.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if I am not mistaken, axis, two lines below, accepts a tuple now. The docs should be updated according to np.prod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the .accumulate() method used by the cumxxx functions: only one axis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. By the way, would it make sense to apply the _ureduce function from numpy.lib.function_base to other places, like most of numpy.core.from_numeric? That way we could add multi-axis support without waiting for the gufunc redesign that seems to be coming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which functions do you have in mind? I don't see any obvious place where _ureduce could help...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptp, argmin, argmax. Also, the following might benefit as well, but the internal order of the array would matter: partition, argpartition, sort, argsort, searchsorted, cumsum, cumprod. I was even thinking that it might be worth allowing ravel to do a partial ravel along a subset of the axes (basically like _ureduce does), but that is probably going to require some major rewriting to do properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madphysicist, change made by 8fd1ee8 fixes the typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only that I made a mistake. Axis only accepts one value here. You were absolutely right.

Return the cumulative sum of array elements over a given axis treating Not a
Numbers (NaNs) as zero.

Zeros are returned for slices that are all-NaN or empty.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that the sum does not change when nans are encountered and that any leading nans are replaced by zeros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madphysicist, thanks, should be fixed by 41adc0f

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you pushed that commit yet?

@madphysicist
Copy link
Contributor

Aside from the comments about docs, LGTM. And by all means feel free to ignore the comments about functions other than your own.

@pwolfram
Copy link
Contributor Author

@madphysicist, I've pushed commits to address your comments (including for nansum and nanprod). Please let me know if you see any further room for improvement. Thanks!

@pwolfram
Copy link
Contributor Author

@madphysicist, actually there are more following a page refresh... hold on a bit. I'll let you know when the others are pushed.

nansum_along_axis : ndarray.
A new array holding the result is returned unless `out` is
specified, in which it is returned. The
result has the same size as `a`, and the same shape as `a` if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nitpick, but the line lengths look funny here. Also, _along_axis is inconsistent with the other functions. I think you should just leave it as nansum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed I wondered about that myself but didn't want to depart from convention. Fixed in 29cca2d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in 6ae90ec

@pwolfram
Copy link
Contributor Author

Changes made, thanks for the keen attention to detail and for helping me improve the code @shoyer and @seberg!

@shoyer
Copy link
Member

shoyer commented Mar 24, 2016

Reading PEP8 more carefully, it looks like I'm wrong about spacing around binary operators:
https://www.python.org/dev/peps/pep-0008/#other-recommendations

Hmm....

@pwolfram
Copy link
Contributor Author

@shoyer, reverted spacing back on * because I don't think this makes the code easier to read but left it for % because it adds clarity in accordance with the pep8 guidelines.

@@ -22,6 +22,18 @@
np.array([0.1042, -0.5954]),
np.array([0.1610, 0.1859, 0.3146])]

# Rows of _ndat with nans converted to ones
_rdat_ones = [np.array([0.6244, 1.0, 0.2692, 0.0116, 1.0, 0.1170]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably be all one big array -- it's only not one array for _rdat because _rdat is ragged

@shoyer
Copy link
Member

shoyer commented Mar 24, 2016

@pwolfram I think this is very close, though a test for negative axes would be nice. Otherwise this looks good to me. Sorry for leading your astray on pep8!

assert_almost_equal(res, tgt)
tgt = np.cumsum(_ndat_zeros,axis=axis)
res = np.nancumsum(_ndat, axis=axis)
assert_almost_equal(res, tgt)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer, was this what you were thinking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly

This PR adds an implementation of `nancumsum` and `nancumprod`.

The actual function is a two-liner adapted from `nansum`.

Its structure is adapted from PR: numpy#5418
res = nf(mat, axis=axis, out=resout)
assert_almost_equal(res, resout)
assert_almost_equal(res, tgt)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer, I also generalized the test here for consistency and greater coverage.

@pwolfram
Copy link
Contributor Author

@shoyer, please see above tests added over the negative axis. Thanks for helping refine this!

@pwolfram
Copy link
Contributor Author

Also, btw @shoyer, no worries about the pep8 confusion. I installed the plugin for vim and I have a clearer idea of formatting standards so it was certainly value-added!

@shoyer
Copy link
Member

shoyer commented Mar 24, 2016

OK, this looks good to me. I'll merge this once tests pass unless anyone speaks up with objections...

@pwolfram
Copy link
Contributor Author

Thanks @shoyer!

@shoyer shoyer added this to the 1.12.0 release milestone Mar 26, 2016
@shoyer shoyer merged commit ef389ee into numpy:master Mar 26, 2016
pwolfram added a commit to pwolfram/xarray that referenced this pull request Mar 28, 2016
Needed until numpy v1.12, see
numpy/numpy#7421
pwolfram added a commit to pwolfram/xarray that referenced this pull request Sep 20, 2016
Needed until numpy v1.12, see
numpy/numpy#7421
pwolfram added a commit to pwolfram/xarray that referenced this pull request Sep 20, 2016
pwolfram added a commit to pwolfram/xarray that referenced this pull request Sep 20, 2016
Needed until numpy v1.12, see
numpy/numpy#7421
pwolfram added a commit to pwolfram/xarray that referenced this pull request Sep 20, 2016
pwolfram added a commit to pwolfram/xarray that referenced this pull request Sep 20, 2016
pwolfram added a commit to pwolfram/xarray that referenced this pull request Oct 3, 2016
shoyer pushed a commit to pydata/xarray that referenced this pull request Oct 3, 2016
* Adds nancumsum, nancumprod for numpy compatability

Needed until numpy v1.12, see
numpy/numpy#7421

* Adds nancumsum, nancumprod to xarray functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants