Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializing a MaskedArray using mask=np.bool_(False) gives inconsistent mask result #13758

Closed
taldcroft opened this issue Jun 12, 2019 · 10 comments

Comments

@taldcroft
Copy link

Initializing a MaskedArray using mask=np.bool_(False) gives a different result than mask=False. Note that np.bool_(False) is what one gets for MaskedArray([1, 2]).mask.

Reproducing code example:

>>> import numpy as np
>>> np.ma.MaskedArray([1,2], mask=np.bool_(False)).mask
False

>>> np.ma.MaskedArray([1,2], mask=False).mask
array([False, False])

Numpy/Python version information:

1.16.4 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

@mhvk
Copy link
Contributor

mhvk commented Jun 12, 2019

The cause is that a test is done for whether the input mask is np.ma.nomask and...

np.ma.nomask is np.bool_(False)
# True

So, at least it is consistent that if you define ma = MaskedArray([1, 2]) and then do MaskedArray(ma.data, mask=ma.mask) you get the same result out.

But it might be good to document this...

@mattip
Copy link
Member

mattip commented Jun 12, 2019

I don't think we can change the np.ma.nomask value to a sentinal rather than np.bool_(0). Adding documentation to both the constant and creating a masked array (which could use more examples anyway) describing the pitfall would be good.

@taldcroft taldcroft changed the title Initializing a MaskedArray using mask=np.bool_(False) gives wrong mask result Initializing a MaskedArray using mask=np.bool_(False) gives inconsistent mask result Jun 12, 2019
@taldcroft
Copy link
Author

Thanks for the explanations! For me the root issue is really the MaskedArray behavior of the mask attribute not having the data shape by default and instead being a scalar False. But that ship has very long ago sailed..

@mhvk
Copy link
Contributor

mhvk commented Jun 12, 2019

But that ship has very long ago sailed..

Agreed that that was a design mistake.

Since all tests that we have are of the form self.mask is nomask, I think that we're stuck with nomask being a singleton, but I think that in principle we could break the link to np.bool_(False). If we then redefine nomask to be a singleton that just returns itself on slicing, and transforms into a full mask on setting, we could (perhaps) have our cake (of not using more memory than needed) and eat it (of looking like an array) too! Though somehow keeping a link between the data and the singleton would be tricky.

Anyway, for now definitely important to document it.

@taldcroft
Copy link
Author

The sliceable singleton seems like a pretty clever idea if you can make it work!

@miccoli
Copy link
Contributor

miccoli commented Jul 24, 2019

Please consider also issue #7781. The fact that nomask is a singleton has also the side-effect that in some cases the mask is not properly propagated between views/slices of the same masked array.

>>> import numpy as np
>>> a = np.ma.array(np.arange(10).reshape(2,-1))
>>> a
masked_array(
  data=[[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]],
  mask=False,
  fill_value=999999)
>>> a[0][0] = np.ma.masked  # does not work because mask is False
>>> a
masked_array(
  data=[[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]],
  mask=False,
  fill_value=999999)
>>> a[-1,-1] = np.ma.masked  # this obviously works
>>> a
masked_array(
  data=[[0, 1, 2, 3, 4],
        [5, 6, 7, 8, --]],
  mask=[[False, False, False, False, False],
        [False, False, False, False,  True]],
  fill_value=999999)
>>> a[0][0] = np.ma.masked  # now mask is an array, so this works as expected
>>> a
masked_array(
  data=[[--, 1, 2, 3, 4],
        [5, 6, 7, 8, --]],
  mask=[[ True, False, False, False, False],
        [False, False, False, False,  True]],
  fill_value=999999)

The fact that a[0][0] = np.ma.masked gives different results depending on the value of a.mask is troublesome

@panpiort8
Copy link
Contributor

As #13980 seems stale, can I make my own PR?

@taldcroft
Copy link
Author

@panpiort8 - sure!

@rgommers
Copy link
Member

rgommers commented May 4, 2020

This behavior was documented.

If we then redefine nomask to be a singleton that just returns itself on slicing, and transforms into a full mask on setting, we could (perhaps) have our cake (of not using more memory than needed) and eat it (of looking like an array) too! Though somehow keeping a link between the data and the singleton would be tricky.

This idea is possibly still actionable. It does go well beyond the scope of of this issue though, and it's not clear if it's worth the effort.

Removing good first issue, because the easy part was done. I'll leave this open, but pleaseclose it @mhvk if you don't think it's needed anymore.

@mhvk
Copy link
Contributor

mhvk commented May 4, 2020

The documentation part was the most important - the auto-conversion is a nice idea, but I think hard to get right and given that there are efforts to start afresh with MaskedArray, probably better to leave it to those. (Though it would probably be good to document how one actually wants a masked array to behave first....) [sorry for writing in the wrong place first!]

@mhvk mhvk closed this as completed May 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants