ENH: Make no unshare mask future warnings less noisy #7363

seberg · 2016-02-28T17:38:08Z

Mostly a rebase of the other one, mostly in case anyone still cares (yeah, I can't stop myself from just not liking ):

I don't care about the refcount logic, but it reduces noise. Actually "fixes" the spurious warnings that those extra _sharedmask = False calls also fix in extras.py. Those warnings are caused by intermediate arrays without any user control.
I don't care about a MaskedArrayFutureWarning as its own class, I believe at the point where we feel that we have to make it easier to suppress a warning, there is probably something not quite right (the warning should be avoided not silenced). But it left it because it does not hurt.
Removed the "oldmask" logic. I think one warning per array should be enough in all cases and I doubt it will help with debugging possible issues much (or that anyone would bother anyway).
Left the pointer to setting _sharedmask = False it is messing with internals, so it is ugly and I don't mind removing it.
Current version is hit a lot by things that never call unshare_mask, i.e. field-indexing, hardmasks assign "masked". Putting the warning where the call is just does not break my brain and gets down the warning hits in the test suit to something like 6 (~28 without the refcount logic), from 65 hits (counting test failures when turning to error).

njsmith · 2016-02-28T18:41:05Z

doc/release/1.11.0-notes.rst

+  the mask of ``child`` is a view into the mask of ``ma_array``.
+  Also when creating ``ma_array = MaskedArray(data, mask)`` will
+  make ``ma_array`` view the data in ``mask`` and setting an item
+  will currently copy ``mask`` but in the future modify the original.


Few issues with this text:

it's in the wrong place --it's currently under "things that will happen in 1.12", but there's no way we're going to flip the switch on this in 1.12, it's way too early.

the description here is hard to understand i think. "Setting items of the array using the square bracket notation" to me sounds like __setitem__. I'm not 100% sure what the part about MaskedArray is saying either. Maybe: "Currently, taking a view of a masked array produces a confusing result. For example, if we write masked_view = masked_original[:], then masked_view's data array will be a view of masked_original's data array, so modifications to one array's data will also affect the other: masked_view[0] = 123; assert masked_original[0] == 123. But currently, the mask array is copied, rather than taking a view, so changes to one array's mask will not affect the other: masked_view[0] = np.ma.masked; assert masked_original[0] is not np.ma.masked. A similar situation happens when explicitly constructing a masked array using MaskedArray(data, mask) -- the returned array will have a view of data but a copy of mask. In the future, these cases will be normalized so that the data and mask arrays are treated the same way, and modifications to either will propagate between views. In 1.11, numpy will issue a whatever class it is warning whenever user code modifies the mask of a view. To silence these warnings, and make your code robust against the upcoming changes, you have two options: if you want to keep the current behavior, call masked_view.unshare_mask() before modifying the mask. If you want to get the future behavior early, do masked_view._sharedmask = False."

Also: do unshare_mask and/or _sharedmask work in older versions of numpy? If so then we should say that, because downstream will be working about how to handle both old numpy and 1.11 in the same code and that will make their lives easier.

Well, there is a twist. The mask is a view until you call setitem (or unshare_mask explicitly). I am trying to adapt your text a bit to be clear on this.

Updated, but frankly, I am not sure it can be understood. It is tricky business....

Yeah, that logic was there since a very long time (probably forever). but setting _sharedmask to false, might break explicit calls to unshare_mask, hmmpf, not hat I assume there are many, but still.

Could just document that unshare_mask is unreliable after this, I can't
imagine people will want to intentionally be using both. Could also make at
least the new version of unshare_mask copy unconditionally (or maybe it
already does this) in preparation for removing _sharedmask logic entirely.
On Feb 28, 2016 11:53, "seberg" notifications@github.com wrote:

In doc/release/1.11.0-notes.rst
#7363 (comment):

@@ -41,8 +41,21 @@ Future Changes
The following changes are scheduled for Numpy 1.12.0.

Support for Python 2.6, 3.2, and 3.3 will be dropped.
-* Slicing a MaskedArray will return views of both data and mask.

Currently the mask is returned as a copy.
+* When setting items of a MaskedArray using the square bracket

notation NumPy will in the future never copy the mask. Some

examples of this are child = ma_array[::2]; view[3] = 3, where

the mask of child is a view into the mask of ma_array.

Also when creating ma_array = MaskedArray(data, mask) will

make ma_array view the data in mask and setting an item

will currently copy mask but in the future modify the original.

Yeah, that logic was there since a very long time (probably forever). but
setting _sharedmask to false, might break explicit calls to unshare_mask,
hmmpf, not hat I assume there are many, but still.

—
Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/7363/files#r54355225.

Yeah, added a mention. No, it only copies when the _sharedmask is True. This is the case far more often then necessary, but I think it is reliable to always copy the mask if it may be necessary (to not propagate to parent). As you said before, there might be cases where unshare mask may be wanted, though I guess in those cases an unconditional copy is likely to be perfectly fine, so making it unconditional may be an option, hmmmm (we would just not call it unless also giving the warning otherwise).

njsmith · 2016-02-28T18:59:41Z

Overall:

the release notes text in this PR needs some fixing, but so does the text that's currently in 1.11, I think.
the warning logic here seems clearer than that in master -- I'm not sure what oldsharedmask is about, and with this PR I don't have to figure it out :-)
agreed that the refcnt check is klugey and optional. But I guess if it's a temporary measure that reliably increases the specificity of the warning messages then I'm +0.5.
the changes to extras.py seem obviously beneficial (not sure how we were getting away without these before? Are these functions not tested?)

So modulo comments above my impression is that something like this should probably be merged.

Cc @jakirkham - thoughts?

seberg · 2016-02-28T19:31:44Z

@njsmith we are "getting away" with it, because our test suit is broken when it comes to warning testing, especially FutureWarnings, see my suppressed warnings PR ;).

Incorporates Nathaniels suggestions for a longer explanation in the release notes.

seberg · 2016-03-01T18:46:14Z

Bah, one other annoying thing, though maybe we can ignore it. The mask is of course never a view if it is nomask.

jakirkham · 2016-03-01T19:03:02Z

So, I haven't had lots of time to take a close look. Though my impression after looking through is this looks very similar to the previous version ( #7187 ). My criticisms are pretty much the same.

Not a fan of reference counting as this could result in weird behavior in terms of performance and validity (of warning). Concerned as this will not raise warnings when there are views of views (and so on). This is what _oldsharedmask was for @njsmith. There seem to be some functions that are tweaking internal state _sharedmask, which seems at best risky. Though those could probably be easily fixed by calling unshare_mask instead. Finally, it looks like a change is being added to the covariance/correlation computation, which seems orthogonal to the warning (feel free to correct me). Maybe that is a separate desirable change, but it should probably be discussed first and independently of this change.

Though the big question for me right now is whether this change is needed. I haven't heard any complaints about the warning level on the initial issue. Though that doesn't mean that there are no complaints. So, my question is, are there still problems downstream? Are there problems here? If so, what are they?

jakirkham · 2016-03-01T19:04:31Z

Bah, one other annoying thing, though maybe we can ignore it. The mask is of course never a view if it is nomask.

This appears to already be addressed here thanks to the fact that the first branch rules that out ( https://github.com/numpy/numpy/pull/7363/files#diff-e85f023d9c07df014a190b485db72c3dR3229 ).

seberg · 2016-03-01T20:08:29Z

@jakirkham, it annoys me because it is inconsistent in the final behaviour. Your desired end result is, that mask changes tend to propagate back to the original array, but they won't in the case of nomask.

As to the rest, as I said, the reference counting is only a further filter, and yes, it is safe even for views of views, because that is what it checks. That only "self" holds a reference to the mask and the mask is not a view. If someone had a view into that same mask, they would also -- indirectly -- have a reference to it. Feel free to proof me wrong, but I predict unless you use wekrefs you won't find a loophole. You are right about bad code smell, but I will just claim my nose is better trained.

If you check carefully, all the tweaks in _sharedmask are for temporary arrays, and they are necessary (though probably not with the refcount in place) since they cause warnings that users cannot avoid, e.g. [1]. As for the removal of the unshare_mask calls, they are obviously not needed, since the mask is replaced in the following line, and all unshare mask does is replace the mask and set _sharedmask to False.

About _oldsharedmask, if you really believe it is necessary, could you please explain if it can possibly do more then keep warning for the same masked array? That part I can see, but I don't really see it as necessary (and I am not even sure I think it is not worse).

[1]:

# have import warnings; warnings.simplefilter("always") in place.
In [43]: m = np.ma.masked_greater(np.random.random((10, 10)), 0.5)
In [44]: np.ma.median(m, axis=0)
/home/sebastian/.local/lib/python2.7/site-packages/numpy/ma/core.py:3195: MaskedArrayFutureWarning: Currently, slicing will try to return a view of the data, but will return a copy of the mask. In the future, it will try to  return both as views. This means that using `__setitem__` will propagate values back through all masks that are present.
  MaskedArrayFutureWarning
Out[44]: 
masked_array(...)

In [45]: np.ma.corrcoef(m)
/home/sebastian/.local/lib/python2.7/site-packages/numpy/ma/core.py:3195: MaskedArrayFutureWarning: Currently, slicing will try to return a view of the data, but will return a copy of the mask. In the future, it will try to  return both as views. This means that using `__setitem__` will propagate values back through all masks that are present.
  MaskedArrayFutureWarning
+ extra warnings.
Out[45]: 
masked_array(...)

In [46]:  np.nanmedian([[np.nan], [np.nan]], axis=1)
/home/sebastian/.local/lib/python2.7/site-packages/numpy/ma/core.py:3195: MaskedArrayFutureWarning: Currently, slicing will try to return a view of the data, but will return a copy of the mask. In the future, it will try to  return both as views. This means that using `__setitem__` will propagate values back through all masks that are present.
  MaskedArrayFutureWarning
/home/sebastian/.local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:769: RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
/home/sebastian/.local/lib/python2.7/site-packages/numpy/lib/nanfunctions.py:769: RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
Out[46]: array([ nan,  nan])

jakirkham · 2016-03-01T21:38:38Z

it annoys me because it is inconsistent in the final behaviour. Your desired end result is, that mask changes tend to propagate back to the original array, but they won't in the case of nomask.

I thought you were musing over quieting the warning by a check for nomask. You are right. @mhvk raised this as well and IIRC suggested that we look into getting rid of that next.

As to the rest, as I said, the reference counting is only a further filter...

Yeah, still feel like ref counting is a bit sketchy. Not sure how much we can trust this given how masked arrays are constructed.

If you check carefully, all the tweaks in _sharedmask are for temporary arrays, and they are necessary (though probably not with the refcount in place) since they cause warnings that users cannot avoid...

Again haven't had lots of time to take a close look. It seems like a call to unshare_mask would have the same effect here and be less sketchy though. This is just my gut reaction.

About _oldsharedmask, if you really believe it is necessary, could you please explain if it can possibly do more then keep warning for the same masked array

When I have more time I'll try to give a better explanation. To say it simply and roughly, it tries to mimic what the future behavior will be. I suppose it could be tweaked more...

Though when these improvements were contemplated the question was raised about how accurate we really want to make the warning and whether more accurate might not be worse by lulling people into reliance on a nearly (not entirely) accurate warning. This brings me to my original question, which I am still not seeing the answer to. What problems are we still solving? If it is to silence a few false alarms in covariance, median, etc., maybe we should just silence those by following the recommend procedure from before namely calling unshared_mask in those places.

charris · 2016-03-16T02:31:52Z

What to do about this? Looking at Pauli's tests I see two occurances of MaskedArrayFutureWarning, one each in MatPlotLib and StatsModels, which doesn't seem much. Whether they are valid warnings, I don't know, but they vanish in comparison to the other warnings. @njsmith The _oldsharedmask thing is pretty straight forward once you figure out what is going on, which admittedly took me a while. A bit more explanation in the PR would probably have helped there. All in all, I'm inclined to leave things as they are, although I have some niggling doubts as to whether we really want to make the change.

There is mention of other independent parts of this PR that look desireable. Could someone expand on that?

jakirkham · 2016-03-16T02:52:54Z

Sorry, I was trying to be clever using _oldsharedmask, but it does make things a bit difficult to follow at first pass.

jakirkham · 2016-03-16T02:53:36Z

As for as the proposal presented here, below are my thoughts. Others may of course think differently.

Seem like good ideas:

The release notes write up seems nice.
The update warning text could be included.
The stacklevel flag for the warning could be added.

Subject of discussion:

Movement of the warning into the branch.

Seem like not good ideas:

Reference counting for warning determination.
Workarounds using "private" member variables.
Suggested use of "private" member variables.

njsmith · 2016-03-16T04:34:03Z

Movement of the warning into the branch.

My impression is that the main effect of this is to put the warning directly onto the same code path that is actually going to be changed in the future, which is usually the simplest and most accurate way to do things. Is that wrong? Can you elaborate on what the actual trade-offs are here?

Reference counting for warning determination.

It seems safe and accurate to me, but meh, whatever, I'm not so excited that I want to argue about it :-)

Workarounds using "private" member variables.
Suggested use of "private" member variables.

IIUC, what you're saying here is that you don't like the recommendation that users do marr._sharedmask = False if they want to silence the warning by opting-in to the future behavior early. (a) do I in fact understand correctly? (b) if I do, then do you have some other suggestion for how to do this? It's indeed unfortunate that it's an underscore member variable, but sometimes these things happen, and AFAIK it is safe and works correctly for all past and future versions of numpy, which is an extremely valuable property...

saimn · 2016-03-16T08:47:15Z

Hi,
I and a colleague were scratching our heads to try to get a mask shared with a masked array, so this upcoming change is a great news ! I just have one remark about this sentence in the changelog:

If you want to get the future behavior early, use masked_view._sharedmask = False

At first when reading this PR I thought we would have to wait for 1.12, and this sentence was not clear because of the meaning of "_sharedmask": setting it to False seems like it prevent the mask to be shared, though it is actually the contrary. So now that I read the code I understand, but maybe the sentence should be more explicit ?
Also it seems that this workaround can be compatible with at least 1.10 ?

seberg · 2016-03-16T10:28:56Z

If anyone wants me to do explicit fixup stuff here (i.e. remove the
refcounting check or whatever), I will do it.
My last argument: I have gh-7099, and posted the numbers here. I feel
it is easier to add this here and putting in <10 (ok, wihtout
refcounting check it was quite a bit worse) suppressions into our tests
than putting in almost 70. Could try to figure out module wide
suppressions, but I am not sure it works reliably. Plus it helps
others.

The general point whether we want it. Dunno, in general I think I
prefer the behaviour, but there are also inconsistencies with MAs that
do not have anything masked, so I would prefer if actual users like
@ahaldane spoke up.

ahaldane · 2016-03-16T17:25:54Z

I've been meaning to review this but just haven't had time! I'll take a look once I get a chance.

(by the way I wouldn't consider myself a MaskedArray user really... I got interested in MaskedArrays due to changes I wanted to make elsewhere in numpy, which snowballed)

charris · 2016-03-18T18:16:32Z

I'm going to stick with what we have for 1.11.0. If there are internal uses that raise warnings, we should deal with that at some point, but the number of warnings currently generated downstream don't strike me as excessive. I will try to update the release notes, but with some editing as I find the version here confusing, but otherwise an improvement as it offers a more extended explanation.

charris · 2016-03-18T22:09:00Z

Updated release notes in #7430, comments welcome.

charris · 2016-03-19T03:51:33Z

@seberg @jakirkham I'm thinking we might solve the nomask problem by making the _mask attribute a property and using an object that may, or may not, contain an actual mask. If no mask is contained the property getter will return nomask, otherwise an actual mask. Since a reference to that object can be retained in a view, the setter can set the mask in the view and it will show up in the original. As the maxim goes, "All problems in computer science can be solved by another level of indirection".

EDIT, We might want to control access with a lock, or maybe not, we don't bother to do that with ordinary views and I don't know if array assignments or property getters/setters are atomic.

seberg · 2016-03-19T17:32:09Z

Hehe, yeah but another level of indirection would make it even harder
to maintain ;). I am sure if you fixed up the confusing stuff in the
current text, it is good.

Ceterum censeo, that moving the place to warn has only advantages,
and those silencing _sharedmask = False things are "obviously correct"
(i.e. on temporaries) and really should be there.
But I really don't want to delay the release over this, though I always
hoped that you guys would rethink and simply agree....
But "obvious" in the eye of the beholder I guess ;).

ENH: Make no unshare mask future warnings less noisy

charris · 2016-03-19T18:33:35Z

I went and checked both Matplotlib and StatsModels for both versions and this PR does a better job. So merging. Thanks Sebastian.

jakirkham · 2016-03-19T18:40:46Z

I and a colleague were scratching our heads to try to get a mask shared with a masked array, so this upcoming change is a great news !

@saimn, that warms my heart. 😄

jakirkham · 2016-03-19T18:41:14Z

@charris, ok, if you think it is better, I trust your judgement.

charris · 2016-03-19T18:42:14Z

In particular, the StatsModel warning went away, which was good because it didn't seem associated with any test. For Matplotlib, the warning was in a different place and looked valid whereas the warning without this PR did not look correct. Not sure what the difference was there. Setting the stacklevel also helped.

charris · 2016-03-19T18:44:01Z

@jakirkham I figured the only valid way to settle the question was to gather some actual data and it does look like this PR performs better.

EDIT: I was hoping they would both work the same, but it didn't turn out that way.

jakirkham · 2016-03-19T18:45:36Z

No need to keep selling it. 😄 I have learned my lesson after the disaster that was my first attempt at this warning. 😉

ahaldane · 2016-03-19T18:48:32Z

A little late here since you just merged, but I was just looking through the history of changes here and this PR looks ok to me.

I was at first a little worried about the refcounting, but @seberg's comment that this is merely a further filter on the warning satisfied me: There are no cases where we should emit a warning but don't (it can't happen that the mask is really shared but appears unshared by refounting), although there are still some rare cases where we probably don't need to warn but do (eg if the users manually makes a reference to arr._mask, but doesn't write to it).

charris · 2016-03-19T18:52:15Z

@seberg Just to be sure, assignment to a masked data value can also change the mask, does this cover that case?

seberg · 2016-03-19T19:11:38Z

@charris sorry, a bit daft here, can you give an example? My logic here always was: Put the warning where the future change is and trust that it must be right.

charris · 2016-03-19T19:21:33Z

@SeberT Actually, looks like a miss...

In [1]: import warnings

In [2]: warnings.simplefilter('always')

In [3]: x = ma.array(eye(2), mask=1)

In [4]: x
Out[4]: 
masked_array(data =
 [[-- --]
 [-- --]],
             mask =
 [[ True  True]
 [ True  True]],
       fill_value = 1e+20)

In [5]: x[0,0] = 1

In [6]: x
Out[6]: 
masked_array(data =
 [[1.0 --]
 [-- --]],
             mask =
 [[False  True]
 [ True  True]],
       fill_value = 1e+20)

No warning issued.

charris · 2016-03-19T19:25:32Z

NVM, bad example...

EDIT: Looks OK.

seberg · 2016-03-19T19:28:31Z

Heh, almost difficult to get a good example:

In [1]: mask = np.ones((2, 2), dtype=bool)

In [2]: import warnings; warnings.simplefilter("always")

In [3]: x = np.ma.MaskedArray(np.eye(2), mask)

In [4]: x[0, 0] = 1
runtests.py:1: MaskedArrayFutureWarning: setting an item on a masked array which has a shared mask will not copy the mask and also change the original mask array in the future.
Check the NumPy 1.11 release notes for more information.
  #!/usr/bin/env python

seberg mentioned this pull request Feb 28, 2016

ENH: Make no unshare mask future warnings less noisy #7187

Closed

njsmith reviewed Feb 28, 2016
View reviewed changes

seberg force-pushed the masked-silence-2 branch from 6913d75 to 11ba029 Compare February 28, 2016 19:30

seberg force-pushed the masked-silence-2 branch from 11ba029 to c8c338b Compare February 28, 2016 20:29

charris added component: numpy.ma masked arrays 03 - Maintenance labels Feb 29, 2016

ENH: Make no unshare mask future warnings less noisy

ed083cf

Incorporates Nathaniels suggestions for a longer explanation in the release notes.

seberg force-pushed the masked-silence-2 branch from c8c338b to ed083cf Compare March 1, 2016 18:44

charris added a commit that referenced this pull request Mar 19, 2016

Merge pull request #7363 from seberg/masked-silence-2

a7c8600

ENH: Make no unshare mask future warnings less noisy

charris merged commit a7c8600 into numpy:master Mar 19, 2016

homu mentioned this pull request Mar 19, 2016

DOC: Update 1.11.0-notes. #7430

Merged

seberg deleted the masked-silence-2 branch March 19, 2016 19:11

jakirkham mentioned this pull request Sep 7, 2016

BUG, DEP: Fix masked arrays to properly edit views. ( #5558 ) #5580

Merged

ENH: Make no unshare mask future warnings less noisy #7363

ENH: Make no unshare mask future warnings less noisy #7363

Conversation

seberg commented Feb 28, 2016

njsmith Feb 28, 2016

Choose a reason for hiding this comment

njsmith Feb 28, 2016

Choose a reason for hiding this comment

seberg Feb 28, 2016

Choose a reason for hiding this comment

seberg Feb 28, 2016

Choose a reason for hiding this comment

seberg Feb 28, 2016

Choose a reason for hiding this comment

njsmith Feb 28, 2016

Choose a reason for hiding this comment

seberg Feb 28, 2016

Choose a reason for hiding this comment

njsmith commented Feb 28, 2016

seberg commented Feb 28, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

jakirkham commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

charris commented Mar 16, 2016

jakirkham commented Mar 16, 2016

jakirkham commented Mar 16, 2016

njsmith commented Mar 16, 2016

saimn commented Mar 16, 2016

seberg commented Mar 16, 2016

ahaldane commented Mar 16, 2016

charris commented Mar 18, 2016

charris commented Mar 18, 2016

charris commented Mar 19, 2016

seberg commented Mar 19, 2016

charris commented Mar 19, 2016

jakirkham commented Mar 19, 2016

jakirkham commented Mar 19, 2016

charris commented Mar 19, 2016

charris commented Mar 19, 2016

jakirkham commented Mar 19, 2016

ahaldane commented Mar 19, 2016

charris commented Mar 19, 2016

seberg commented Mar 19, 2016

charris commented Mar 19, 2016

charris commented Mar 19, 2016

seberg commented Mar 19, 2016