Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: MA ufuncs should set mask to False, not array([False]) #7350

Merged
merged 1 commit into from
Jun 13, 2016

Conversation

ahaldane
Copy link
Member

Example of the problem:

>>> m = np.ma.array([1])
>>> (m+1).mask
False
>>> (m/2).mask
array([False], dtype=bool)

(This PR is split off from #5706. It fixes a problem that turned up in later commits there)

@charris
Copy link
Member

charris commented Feb 26, 2016

Scalar/array scalar False rather than array?

@ahaldane
Copy link
Member Author

Right, this PR changes it so the second example gives just False.

@@ -2968,8 +2968,8 @@ def __array_wrap__(self, obj, context=None):
np.copyto(result, fill_value, where=d)
# Update the mask
if m is nomask:
if d is not nomask:
m = d
if any(d):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up calling MaskedArray.any(); maybe cleaner to just write d.any()?

More generally, for the common case of largish arrays with no domain problems, the whole domain now gets checked twice. Might it be an idea to combine with the copyto and the whole finding of fill_value? I.e.,

if d.any():
    try:
      ... get fill_value ...
    np.copyto(result, fill_value, where=d)
    if m is nomask:
        m = True if d.all() else d
    else:
        m = (m | d)

(Haven't thought deeply enough to be sure whether or not one could more the result = result.copy() inside the if-statement as well.)

@mhvk
Copy link
Contributor

mhvk commented Feb 26, 2016

It is good to look at things separately. See comment about possible speed-up.

if d is not nomask:
m = d
if any(d):
m = True if all(d) else d
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part necessary? Or is it OK to set the mask to a full array of True? It is not tested. (also, write as d.all()?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it is OK to set the mask of a full array of True to True. At the very least it would break assignments, since they check nomask/False explicitly but not allmasked or whatever you would call it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've been rethinking these lines. The original point of this PR was to turn an array of False into False.

But for an array of True taken care of here, I probably don't want to turn it into just True. It's possible I want to leave it as an array of True, but masked is also a possibility.

I'm also investigating the original problem that this PR is meant to fix (it was making it difficult to reimplement MaskedArray.var) to see if there is a better way to fix that altogether.

@ahaldane
Copy link
Member Author

ahaldane commented Mar 4, 2016

OK, I went for an even more drastic change: I removed the whole section of code which tries to update the out-of-domain data.

MaskedArray makes no guarantees about the data values at masked positions. We say so explicitly in the docs, and there are many operations currently that will clobber masked data. So why even bother trying to fill in those positions with the fill value?

I updated with @mhvk's comments about using d.any() instead of any(d), and I think you are both right that I shouldn't make an array of true into a single True.

Appveyor seems down. Tests pass on my system in python 2 though.

@seberg
Copy link
Member

seberg commented Mar 4, 2016

I am actually a bit confused about this right now with "masked" and "nomask" and all, I guess masked is only a singleton, and never _mask or whatever. That is to say, I am not 100% sure right now its wrong, but if you agree ;).

The branch can't be merged, you will have to do a rebase.

@ahaldane
Copy link
Member Author

ahaldane commented Mar 4, 2016

Rebased on master, checks pass now.

@seberg, yes, perhaps technically this isn't a bug. But I think this PR makes more sense, and also it is a performance/sanity boost for all the cases throughout the maskedarray code where there is a test of the form if mask is nomask. Those tests will all fail if the mask is actually an array of False, even though such a mask is equivalent to just nomask. Prior to this PR all ufuncs with a domain (ie half of the ufuncs) would return an array of False as the mask, instead of just nomask, causing all that code to go down the 'slow' path.

This PR makes my updated var and mean in #5706 much easier to implement since I can simply check if mask is nomask instead of also having to check for an array of False, when looking at the output of true_divide.

It this looks good I will squash the last commit, we can wait for tests to run, and then I will address your comments in #5706 :)

@mhvk
Copy link
Contributor

mhvk commented Mar 5, 2016

I've never been enamored of these attempts to keep values in place when masked, so in that sense am happy with removing all that code. It is a bit of a change in the way it works, though...

Otherwise, this looks all OK.

@njsmith
Copy link
Member

njsmith commented Mar 5, 2016

I think there might be people who use masked arrays as a way to say "I want
to apply this operation to only these items, while leaving the rest
unchanged"? Like where= but much older and more annoying and finicky. I
don't know enough about masked arrays to really have an opinion but fyi in
case this rings a bell or you want to investigate further.

On Fri, Mar 4, 2016 at 5:08 PM, Marten van Kerkwijk <
notifications@github.com> wrote:

I've never been enamored of these attempts to keep values in place when
masked, so in that sense am happy with removing all that code. It is a bit
of a change in the way it works, though...

Otherwise, this looks all OK.


Reply to this email directly or view it on GitHub
#7350 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

@ahaldane
Copy link
Member Author

ahaldane commented Mar 5, 2016

@njsmith , I think that's what's described in the MaskedArray docs:

Arithmetic and comparison operations are supported by masked arrays. As much as possible, invalid entries of a masked array are not processed, meaning that the corresponding data entries should be the same before and after the operation.

Warning:

We need to stress that this behavior may not be systematic, that masked data may be affected by the operation in some cases and therefore users should not rely on this data remaining unchanged.

The behavior promised is actually tested, in test_datafriendly_div and related tests.

My guess is the lines I removed are related. It's not exactly the same though - the promise above is to (sometimes) avoid modifying masked elements, while the lines I removed place the fill_value at out-of-domain masked data values. That latter behavior isn't documented or tested (or useful, in my opinion).


I actually don't see much point in promising to try to leave masked elements alone sometimes, but then in the next line telling users that promise is unreliable. I would rather remove such code and make no such promises. But this PR doesn't even go that far, since I am only removing different undocumented behavior.

So, I still favor removing the code. The behavior related to leaving masked values alone is unaffected.

@mhvk
Copy link
Contributor

mhvk commented Mar 5, 2016

@ahaldane - ah, yes, you're right, this is even more specific than trying to leave values in place. I now see no reason not to remove the lines.

@ahaldane
Copy link
Member Author

ahaldane commented Mar 7, 2016

Pinging this one. If we merge it, then I can rebase #5706 and continue work over there.

@mhvk
Copy link
Contributor

mhvk commented Mar 7, 2016

@ahaldane - sorry to go back on what I wrote before, but my sense would be, after all, to keep the filling lines. The reason is that I think we should either remove it altogether (i.e., not even construct the ufunc_fills dict and remove it also in the methods) or leave it in. I think for now it is better to just keep it out of this PR, keeping this one very simple.

@ahaldane
Copy link
Member Author

ahaldane commented Mar 7, 2016

That's fair, I updated it along those lines.

@mhvk
Copy link
Contributor

mhvk commented Mar 7, 2016

OK, as far as I can see, that makes it entirely uncontroversial.

@charris
Copy link
Member

charris commented Jun 13, 2016

Let's give it a shot. Thanks Allan. I note that #5706 has already been merged.

@ahaldane
Copy link
Member Author

Thanks Chuck

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants