Skip to content

ma.empty/ones/zeros_like to copy mask properly #3404

Closed
wants to merge 3 commits into from

4 participants

@tomyun
tomyun commented Jun 6, 2013

As reported in #3145 and #2572, ma.empty_like does not copy, but merely assigns a reference to the original mask, leading to undesirable results.

This patch makes numpy.ma.empty_like() to work properly, in addition to newly introduced numpy.ma.ones_like() and numpy.ma.zeros_like().

However, numpy.empty_like() and siblings still do not handle masks correctly. This could be a work demanding more lower-level changes.

tomyun added some commits Jun 5, 2013
@tomyun tomyun Force empty/ones/zeros_like() to copy mask
As underlying `empty_like()` knows nothing about mask, it would be
`_convert2ma`'s job to properly set up the output mask. `copy` argument
is added to figure out which functions need this behavior.

`ones_like()` and `zeros_like()` which were mere aliases of regular
numpy functions now go through `_convert2ma` as well.

Hopefully this fixes #3145, #2572.
a5a3457
@tomyun tomyun Include ones_like, zeros_like public for import * 0908295
@tomyun tomyun Add a regression test for #3145 9543818
@njsmith njsmith commented on the diff Jun 6, 2013
numpy/ma/tests/test_regression.py
@@ -70,5 +70,17 @@ def test_ddof_corrcoef(self):
# ddof should not have an effect (it gets cancelled out)
assert_allclose(r0.data, r1.data)
+ def test_like_mask_copy(self):
+ """Issue gh-3145"""
+ m = [False, False, False, True]
+ a = np.ma.array([1, 2, 3, 4], mask=m)
+ a_empty = np.ma.empty_like(a)
+ a_ones = np.ma.ones_like(a)
+ a_zeros = np.ma.zeros_like(a)
+ a.mask = False
+ assert_array_equal(a_empty.mask, m)
+ assert_array_equal(a_ones.mask, m)
+ assert_array_equal(a_zeros.mask, m)
@njsmith
NumPy member
njsmith added a note Jun 6, 2013

So this says that all the *_like functions return arrays in which everything that was masked in the input, is masked in the output. This seems odd to me. Shouldn't e.g. ones_like give you an array whose contents is all-ones and whose mask is all-False?

@tomyun
tomyun added a note Jun 6, 2013

That might depend on the design of API. Yet, many users would expect *_like functions return arrays of same shape and mask. In either way, current implementation returns a new array with old mask, which is definitely confusing.

@njsmith
NumPy member
njsmith added a note Jun 14, 2013

Just to follow up, it looks like everyone on the mailing list prefers the API where the *_like functions return a new array with the mask set to nomask, rather than copying the input array's mask, and that's my intuition as well. Want to update the PR to do that? (Or if you have an argument for why it should be the other way around, bring that up on the list?)

Also Pierre GM made a suggestion for how the _convert2ma function might be changed that you might want to take a look at.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@seberg
NumPy member
seberg commented Jun 6, 2013

What I am wondering here is, if the mask should not be copied whenever in array_finalize the memory between the two arrays is not shared. Also in the case where it is shared we should create a view due to reshape bugs (there are some issues about this open). The latter I once started (but have not much clue if it is right nor the time to think much about it in seberg@master...safemask . Though in that code I used apparently used the owndata flag, which I think should likely be replaced with the may_share_memory check mentioned.

Just if you have some time and interested and use/know masked arrays. I think it might solve other smaller issues elsewhere, too. Though I don't know if someone might actually work with the fact that masks are not copied sometimes when the array is (but it seems weird to me).

@njsmith
NumPy member
@njsmith
NumPy member
njsmith commented Jul 17, 2013

A follow-up that won't be captured in the above thread, with a dissenting opinion:
http://mail.scipy.org/pipermail/numpy-discussion/2013-July/067136.html

@charris
NumPy member
charris commented Aug 16, 2013

Needs decision.

@charris
NumPy member
charris commented Mar 28, 2014

Well, that's no help ;) The majority voted for nomask, whereas Gregorio wants a mask copy and has code that uses that assumption. Hmm..., I think backward compatibility (almost) requires the copy, whereas changing the mask values would require deprecation. So I vote for this patch as is. Thoughts?

@charris
NumPy member
charris commented Mar 28, 2014

Or at least for empty_like, and having the new zeros_like and ones_like behave differently would be confusing. Of course, once they are all in, I suspect further changes are off the table.

@charris
NumPy member
charris commented Mar 28, 2014

Alternatively, we could use an ma specific keyword to determine which behavior to use, say clearmask=0 or nomask=0.

@njsmith njsmith added a commit to njsmith/numpy that referenced this pull request Oct 18, 2014
@njsmith njsmith BUG: copy inherited masks in MaskedArray.__array_finalize__
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in gh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in gh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
2745807
@njsmith
NumPy member
njsmith commented Oct 18, 2014

Closing in favor of #5203. The commentary in this PR is still relevant, but progress is stalled and the actual change currently in the branch is a subset of what's in #5203.

@njsmith njsmith closed this Oct 18, 2014
@njsmith njsmith added a commit to njsmith/numpy that referenced this pull request Oct 19, 2014
@njsmith njsmith BUG: copy inherited masks in MaskedArray.__array_finalize__
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in gh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in gh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
f4f38f6
@njsmith njsmith added a commit to njsmith/numpy that referenced this pull request Oct 21, 2014
@njsmith njsmith BUG: copy inherited masks in MaskedArray.__array_finalize__
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in gh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in gh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
3205c89
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.