Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: migration guide and errors for ndarray.ptp() should mention np.ma.ptp() #26530

Closed
megies opened this issue May 25, 2024 · 9 comments
Closed

Comments

@megies
Copy link
Contributor

megies commented May 25, 2024

Describe the issue:

EDIT: See below comment on np.ma.ptp()

np.ptp() changed its behavior (compared to the old ndarray.ptp()) when used on masked arrays with axis parameter if the expected result is a partially masked output array. It ignores the mask in the computation and also returns a masked array but without a proper mask.

Note that without the reshape and axis parameter in the below example it gets even a bit weirder. Old behavior was to return a scalar int64, new behavior is to return a masked array with a single item and no mask.

I believe this is a bug, since neither the maintenance_2.x changelog nor the issue tracker show anything about behavior changes for ptp.

Stumbled across this in our test suite when getting our downstream package obspy ready for numpy 2.0

Reproduce the code example:

import numpy as np

# example with reshape and axis
x = np.arange(12).reshape((3, 4))
x = np.ma.masked_less(x, 6)
print(x)
ptp = np.ptp(x, axis=1)
print(ptp)
print(ptp.mask)

# example with a 1d array and no axis parameter
x = np.arange(12)
x = np.ma.masked_less(x, 6)
print(x)
ptp = np.ptp(x)
print(ptp)
print(type(ptp))
print(ptp.mask)

Error message:

## Old output numpy 1.23.2

#### example with reshape and axis

[[-- -- -- --]
 [-- -- 6 7]
 [8 9 10 11]]

[-- 1 3]

[ True False False]


#### example with 1d array

[-- -- -- -- -- -- 6 7 8 9 10 11]

5

<class 'numpy.int64'>

Traceback (most recent call last):
  File "/tmp/npptp.py", line 9, in <module>
    print(ptp.mask)
AttributeError: 'numpy.int64' object has no attribute 'mask'


## New output numpy 2.0.0rc2

#### example with reshape and axis

[[-- -- -- --]
 [-- -- 6 7]
 [8 9 10 11]]

[3 3 3]

False

#### example with 1d array


[-- -- -- -- -- -- 6 7 8 9 10 11]
11
<class 'numpy.ma.MaskedArray'>
False

Python and NumPy Versions:

Above output compares 1.23.2 and 2.0.0rc2

Runtime Environment:

[{'numpy_version': '2.0.0rc2',
  'python': '3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) '
            '[GCC 12.3.0]',
  'uname': uname_result(system='Linux', node='mother', release='5.10.0-25-amd64', version='#1 SMP Debian 5.10.191-1 (2023-08-16)', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': ['AVX512_KNL', 'AVX512_KNM', 'AVX512_SPR']}},
 {'architecture': 'SkylakeX',
  'filepath': '/home/megies/miniconda3/envs/np2/lib/libopenblasp-r0.3.27.so',
  'internal_api': 'openblas',
  'num_threads': 8,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.27'}]
None

Context for the issue:

Well, it seems to make in ptp unusable on masked arrays and it would need major contortions to restore the old behavior (like filling masked values with the minimum along that axis and then also copying the mask over to the result), which doesn't seem to be what numpy wants. Masked arrays "just work" and this issue would mean a breach of that rule.

EDIT: When migrating code that at runtime encounters a mix of regular and masked arrays, just applying the proposed change from the migration guide ("ptp -- Use np.ptp(arr, ...) instead") will break users' codes if they don't realize that np.ma.ptp() is needed for masked arrays.

@megies megies changed the title BUG: np.ptp() changed behavior on masked arrays with axis parameter with v2.0.0rc2 BUG: np.ptp() changed behavior on masked arrays with v2.0.0rc2 May 25, 2024
@megies
Copy link
Contributor Author

megies commented May 25, 2024

I just realized that np.ma.ptp() exists and seems to do the old behavior on masked arrays.

So, then it seems that old behavior can be reproduced by doing something bloaty like this:

def good_old_ptp(x, *args, **kwargs):
    if np.ma.is_masked(x):
        return np.ma.ptp(x, *args, **kwargs)
    return np.ptp(x, *args, **kwargs)

In any case, if this goes like this into 2.0 unchanged, then I would at least change the shown message and changelog..

from..

ptp -- Use np.ptp(arr, ...) instead.

to..

ptp -- Use np.ptp(arr, ...) or np.ma.ptp(arr, ...) instead.

@megies megies changed the title BUG: np.ptp() changed behavior on masked arrays with v2.0.0rc2 BUG: if/else needed for new np.ptp() / np.ma.ptp() if potential masked arrays with v2.0.0rc2 May 25, 2024
@ngoldbaum ngoldbaum changed the title BUG: if/else needed for new np.ptp() / np.ma.ptp() if potential masked arrays with v2.0.0rc2 DOC: migration guide for ndarray.ptp() should mention np.ma.ptp() May 28, 2024
@ngoldbaum ngoldbaum changed the title DOC: migration guide for ndarray.ptp() should mention np.ma.ptp() DOC: migration guide and errors for ndarray.ptp() should mention np.ma.ptp() May 28, 2024
@ngoldbaum
Copy link
Member

I don't think np.ptp should grow the ability to deal with masked arrays, since that choice follows the pattern where the ufuncs exposed in the top-level namespace don't handle masked arrays but the ones in np.ma do. Instead, I updated the title and classified this as a doc issue. It was definitely missed that this adds a bit of subtlety for code dealing with masked arrays and the migration guide and AttributeError should mention this subtlety.

@ngoldbaum ngoldbaum added this to the 2.0.0 release milestone May 28, 2024
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label May 28, 2024
@megies
Copy link
Contributor Author

megies commented May 29, 2024

I don't think np.ptp should grow the ability to deal with masked arrays, since that choice follows the pattern where the ufuncs exposed in the top-level namespace don't handle masked arrays but the ones in np.ma do. Instead, I updated the title and classified this as a doc issue. It was definitely missed that this adds a bit of subtlety for code dealing with masked arrays and the migration guide and AttributeError should mention this subtlety.

I can follow that reasoning. In that case, personally I'd rather see an ValueError if a regular routine runs into a MaskedArray though. At least I would argue np.ptp() on a masked array should return a regular array then, not a MaskedArray without a mask.

where the ufuncs exposed in the top-level namespace don't handle masked arrays

However, please consider this:
np.mean() called on a MaskedArray behaves exactly like np.ma.mean() and not like a routine oblivious to what a MaskedArray is.

import numpy as np

x = np.arange(12).reshape((3, 4))
x = np.ma.masked_less(x, 6)
print(repr(x))
print(repr(np.mean(x, axis=1)))

Screenshot from 2024-05-29 12-52-05

Not trying to be a pain, just trying to help this be a good release. ❤️

I can and will add a helper function in our project that picks the correct routine to use from amongst np.ptp and np.ma.ptp depending upon the array type, so for me personally this is solved.

@ngoldbaum
Copy link
Member

Ah, I didn't realize it returned a masked array with no mask. That doesn't seem great.

I think mean works because it tries to check for a mean attribute on the operand.

np.ptp is also written in Python, so maybe it could grow a similar code path? That makes sense to do given this issue with masked arrays. My only doubt is that might add some additional overhead, especially for ndarray subclasses like astropy quantities that don't need special treatment.

PRs welcome but it would be nice to see a benchmark with e.g. a trivial ndarray subclass that doesn't define a ptp method to see if the overhead of checking for the attribute is significant for small arrays.

Also a case could be made for bringing back ndarray.ptp, I don't think this subtlety for masked arrays was caught when we decided to remove it.

Ping @rgommers do you have a take on what the best way forward here is?

@rgommers
Copy link
Member

np.ptp is also written in Python, so maybe it could grow a similar code path?

Please no. What np.mean and a few other reductions do is terrible, and we should get rid of that at some point.

The problem boils down to this:

>>> import numpy as np
>>> x = np.arange(12).reshape((3, 4))
... x = np.ma.masked_less(x, 6)
>>> np.asarray(x)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

What we should do is make np.asarray & co raise on masked array input. There is no way that this makes sense or is fixable.

@rgommers
Copy link
Member

Nothing here is specific to ptp, it's just a symptom of a large structural issue.

@ngoldbaum
Copy link
Member

OK, glad I checked. Ralf had a lot more context on this stuff than I do so I'll defer to him.

@megies
Copy link
Contributor Author

megies commented May 29, 2024

Thanks @ngoldbaum and @rgommers, looks like you got a good way forward figured out. And yes, for me it seems like one of the good ways forward to just raise on calling a non ma routine on a masked array. 👍

@rgommers
Copy link
Member

I opened gh-26669 for the structural fix. np.ptp(a_masked_array) was never a correct or supported thing to do, so I don't think there is a need to do anything last-minute for 2.0 here or leave this issue open.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jun 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants