Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: / and /= behaves different in masked arrays with nan #20506

Open
scratchmex opened this issue Dec 2, 2021 · 1 comment · May be fixed by #20551
Open

BUG: / and /= behaves different in masked arrays with nan #20506

scratchmex opened this issue Dec 2, 2021 · 1 comment · May be fixed by #20551

Comments

@scratchmex
Copy link
Contributor

scratchmex commented Dec 2, 2021

Describe the issue:

I think the operators / and /= should behave the same as one is just a shorthand for the other. They do not with masked arrays and with nans as you see in the code example. The / auto masks the nan and /= does not.
We should see why __truedivide__ and __itruedivide__ are implemented different for MaskedArray class.

Reproduce the code example:

>> a = np.ma.array([1, np.nan])
>> b = [1, np.nan]

>>> a / b
[1.0 --]

>> a /= b
>>> a
[ 1. nan]

Error message:

No response

NumPy/Python version information:

this happens in main right now.
1.22.0.dev0+1977.g114d91919 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0]

@scratchmex scratchmex changed the title BUG: / and /= behaves different with nan BUG: / and /= behaves different in masked arrays with nan Dec 2, 2021
@WarrenWeckesser WarrenWeckesser added the component: numpy.ma masked arrays label Dec 4, 2021
@math2001
Copy link

math2001 commented Dec 9, 2021

The code that handles __itruediv__ and __truediv__ is very different unfortunately. Two different strategies are used to protect against division by zero, which have different side-effects on NaN, causing this bug.

__itruediv__. Mask bad values pre division

numpy/numpy/ma/core.py

Lines 4325 to 4341 in d565c4b

def __itruediv__(self, other):
"""
True divide self by other in-place.
"""
other_data = getdata(other)
dom_mask = _DomainSafeDivide().__call__(self._data, other_data)
other_mask = getmask(other)
new_mask = mask_or(other_mask, dom_mask)
# The following 3 lines control the domain filling
if dom_mask.any():
(_, fval) = ufunc_fills[np.true_divide]
other_data = np.where(dom_mask, fval, other_data)
self._mask |= new_mask
self._data.__itruediv__(np.where(self._mask, self.dtype.type(1),
other_data))
return self

The whole thing is in self._data.__itruediv__(np.where(self._mask, self.dtype.type(1), other_data)), noting that self._mask has been updated ORed with the other's mask, and the safe domain. Dividing by NaN is considered safe by _DomainSafeDivide, which results in NaN post-division. It isn't masked.

__truediv__. Filter out !umath.isfinite values post division

numpy/numpy/ma/core.py

Lines 1153 to 1164 in d565c4b

(da, db) = (getdata(a), getdata(b))
# Get the result
with np.errstate(divide='ignore', invalid='ignore'):
result = self.f(da, db, *args, **kwargs)
# Get the mask as a combination of the source masks and invalid
m = ~umath.isfinite(result)
m |= getmask(a)
m |= getmask(b)
# Apply the domain
domain = ufunc_domain.get(self.f, None)
if domain is not None:
m |= domain(da, db)

Problem is here: m = ~umath.isfinite(result). umath.isfinite(np.nan) is False, which gets negated, and so NaN is masked off.

Solving

I couldn't it explicitly in the docs, but best on the masked arrays' rationale, we should get a masked value, not NaN.

So the problem comes from _DomainSafeDivide:

numpy/numpy/ma/core.py

Lines 842 to 851 in d565c4b

def __call__(self, a, b):
# Delay the selection of the tolerance to here in order to reduce numpy
# import times. The calculation of these parameters is a substantial
# component of numpy's import time.
if self.tolerance is None:
self.tolerance = np.finfo(float).tiny
# don't call ma ufuncs from __array_wrap__ which would fail for scalars
a, b = np.asarray(a), np.asarray(b)
with np.errstate(invalid='ignore'):
return umath.absolute(a) * self.tolerance >= umath.absolute(b)

NaN > NaN is False (safe), but we want True (unsafe). So, we should extend the return like so: return <check_small_values> | isnan(a) | isnan(b)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants