Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add ndenumerate specialization for masked arrays #20020

Merged
merged 3 commits into from
May 11, 2022
Merged

ENH: add ndenumerate specialization for masked arrays #20020

merged 3 commits into from
May 11, 2022

Conversation

joukewitteveen
Copy link
Contributor

Using ndenumerate on a masked array does not yield ma.masked for masked values, but rather the actual value of the underlying data array. This makes sense from a performance perspective, but is potentially undesirable. Rather than filling the masked array before using ndenumerate and incurring a copy, we can use a simple ndenumerate wrapper specifically for masked arrays. This MR suggests such a specialization of ndenumerate.

An 'upcoming changes' message is still missing as this is this functionality is proposed without prior discussion and I want to know what people think first.

At least in 2011 I could find someone else wanting this functionality too: https://stackoverflow.com/questions/8620798/numpy-ndenumerate-for-masked-arrays/8621173.

Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .pyi addition looks good to me.
As far as the general concept is concerned: I would like to see the opinion of someone more involved with masked arrays.

Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is most definitely missing from this PR is a set of tests.
Three useful test cases would be masked arrays with all-False, all-True and mixed True/False masks.

@joukewitteveen
Copy link
Contributor Author

Ah, I thought the basic example in the docstring could maybe be sufficient, but you're right: there is no reason not to have some real tests. I struggled to recognize a coding style in the existing tests and did my best to write something sensible.

I have no idea why the CircleCI tests are failing.

Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RuntimeError: Missing SVML submodule! Run git submodule update --init to fix this.

The CircleCI error seems to be unrelated, though I'm wondering if it is a fluke or not.

In any case, I don't have any particularly strong feelings about masked arrays, but the new np.ma.ndenumerate implementation looks solid enough.

doc/release/upcoming_changes/20020.new_function.rst Outdated Show resolved Hide resolved
@mattip
Copy link
Member

mattip commented Oct 14, 2021

The CircleCI error seems to be unrelated, though I'm wondering if it is a fluke or not.

Maybe a rebase/merge off main would fix it?

@joukewitteveen
Copy link
Contributor Author

The CircleCI error seems to be unrelated, though I'm wondering if it is a fluke or not.

Maybe a rebase/merge off main would fix it?

That did the trick, thank you!

@mattip
Copy link
Member

mattip commented Nov 17, 2021

I think we should put this in. If it exists on ndarray, it should exist on ma.

@mattip mattip added the triage review Issue/PR to be discussed at the next triage meeting label Nov 17, 2021
@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Nov 17, 2021
@mattip mattip added this to the 1.23.0 release milestone Nov 17, 2021
@mattip mattip removed the triage review Issue/PR to be discussed at the next triage meeting label Nov 17, 2021
@mattip
Copy link
Member

mattip commented Nov 23, 2021

From the mailing list discussion:

This version skips all masked elements. An alternative could be to
return np.ma.masked for masked elements?

Would it be a bad idea to add a kwarg that specifies this behaviour (i.e. offering both alternatives)? Assuming people might need the masked items to be there under certain circumstances. Perhaps when zipping masked data with dense data?

@joukewitteveen
Copy link
Contributor Author

I added a commit on top of this MR with the additional kwarg. I hope this is in line with expectations.

@mattip
Copy link
Member

mattip commented Apr 18, 2022

Was there consensus around the compressed=True kwarg? Isn't np.ma.ndenumerate(a, compressed=False) equivalent to np.ndenumerate(a) ? If so maybe we could drop the kwarg and add a documentation note to use np.ndenumerate() to get that behavior

@joukewitteveen
Copy link
Contributor Author

Was there consensus around the compressed=True kwarg? Isn't np.ma.ndenumerate(a, compressed=False) equivalent to np.ndenumerate(a) ? If so maybe we could drop the kwarg and add a documentation note to use np.ndenumerate() to get that behavior

There was no consensus, just a request for the functionality. One of the key reasons to have the specialization is that with np.ndenumerate you'd get element == a.data[i], whereas with np.ma.ndenumerate you'd get element == a[i]. This makes a difference for masked elements, where only in the second case we'd have element == ma.masked.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label May 4, 2022
@seberg
Copy link
Member

seberg commented May 4, 2022

We discussed this briefly in the triage meeting, but did not want to decide on this. Maybe we should ping the mailing list once more on the exact addition (since it is new API). Not that I particularly expect a lot of input.

In general, the PR seems pretty much ready, and we should just make a decision to merge or not soon, though.

@charris
Copy link
Member

charris commented May 10, 2022

This sounds reasonable to me. @mattip Any reservations?

numpy/ma/extras.py Outdated Show resolved Hide resolved
@mattip
Copy link
Member

mattip commented May 10, 2022

Thanks for the change in the example. This looks good to me, I will leave it for a little while in case anyone else wants to take a look.

@mattip mattip merged commit 02d1204 into numpy:main May 11, 2022
@mattip
Copy link
Member

mattip commented May 11, 2022

Thanks @joukewitteveen

@InessaPawson InessaPawson added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels May 12, 2022
@InessaPawson
Copy link
Member

Hi-five on merging your first pull request to NumPy, @joukewitteveen! We hope you stick around! Your choices aren’t limited to programming – you can review pull requests, help us stay on top of new and old issues, develop educational material, work on our website, add or improve graphic design, create marketing materials, translate website content, write grant proposals, and help with other fundraising initiatives. For more info, check out: https://numpy.org/contribute
Also, consider joining our mailing list. This is a great way to connect with other cool people in our community and be part of important conversations that affect the development of NumPy: https://mail.python.org/mailman/listinfo/numpy-discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. component: numpy.ma masked arrays triaged Issue/PR that was discussed in a triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants