Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

Open
mhvk opened this issue Oct 21, 2023 · 8 comments
Open

ENH: Ensure that output of np.clip has the same dtype as the main array #24976

mhvk opened this issue Oct 21, 2023 · 8 comments

Comments

@mhvk
Copy link
Contributor

mhvk commented Oct 21, 2023

Proposed new feature or change:

(Discussed in #23912 (comment))

The function np.clip arguably has surprising casting behaviour:

a = np.arange(5, dtype='u1')
np.clip(a, -1, 3)
# OverflowError with NEP 50
np.clip(a, np.int64(-1), np.int64(3))
# array([0, 1, 2, 3, 3])  
# int64 dtype with NEP 50
# (before NEP 50, both examples gave int16)

I would naively have expected for the output dtype to always be the same as the input one. That this does not happen is because internally np.clip calls a ufunc:

def _clip(a, min=None, max=None, out=None, **kwargs):
if min is None and max is None:
raise ValueError("One of max or min must be given")
if min is None:
return um.minimum(a, max, out=out, **kwargs)
elif max is None:
return um.maximum(a, min, out=out, **kwargs)
else:
return um.clip(a, min, max, out=out, **kwargs)

and these treat the arguments symmetrically.

It is possible to get the output dtype by setting out or dtype, but in the current implementation that still gives either the OverflowError or casting errors:

np.clip(a, np.int64(-1), np.int64(3), out=a)  # or dtype=a.dtype
# UFuncTypeError: Cannot cast ufunc 'clip' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'

adding casting="unsafe" gives the wrong answer, because -1 becomes 255.

I think it should be possible to make the np.clip function (probably not the ufunc) cast the min and max to a.dtype, but ensure that the valid ranges are respected (i.e., negative integers would become 0 if the dtype is unsigned). This would be similar to what was done in #24915, i.e., ideally we have the behaviour of np.clip be identical to

min = -1
max = 3
out = a.copy()
out[a<min] = min
out[a>max] = max
return out

(which still gives an out-of-bound error for min=-1 because of __setitem__, but works for min=np.int64(-1))

But perhaps this is more work than is warranted.

@asmeurer
Copy link
Member

asmeurer commented Jun 7, 2024

This is also what we decided for the array API https://data-apis.org/array-api/latest/API_specification/generated/array_api.clip.html#clip

But I'm a little unclear what the ideal behavior should when the min or max has a higher range than the input. Consider

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None)
128

the result is a (promoted) int16. If we downcast the result back to int8, we get -128:

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None).astype(np.int8)
-128

This is also what happens with the suggested out[a<min] = min behavior:

>>> a, min = np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16)
>>> out = a.copy()
>>> out[a<min] = min
>>> out
array(-128, dtype=int8)

Should this be considered the correct answer? It seems to me another possibility would be for clip to avoid wrapping and instead "clip", as it were, large values to iinfo(a.dtype).max (i.e., the above would return 127)?

For floats, downcasting just overflows to +/- inf, which is probably what you would want.

Or should we reconsider the decision in the array API, and the suggestion here, to make clip not perform type promotion?

@mhvk
Copy link
Contributor Author

mhvk commented Jun 8, 2024

@asmeurer - ah, never thought about the case when the minimum is larger than the largest value one can express... It might not be crazy to just error on that case...

Though perhaps we are overthinking it, and should try to fix just the python min/max weak promotion case. For that case, raising an error would be consistent with setting analogy, since that should eventually error too, at least according to the deprecation warning that is currently raised when setting an array element with an out-of-bound integer:

DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 256 to uint8 will fail in the future.

Maybe it is OK to use regular ufunc promotion when min, max have different dtype (or at least we can punt...).

@asmeurer
Copy link
Member

Interesting. I don't see that deprecation warning when I run out[a<min] = min. Is it supposed to show there?

I agree if that sort of thing already deprecated in other places then it makes sense to disallow it here too.

@asmeurer
Copy link
Member

Oh I see, that warning (now actually an error in NumPy 2.0) comes from setting an array with an out-of-bounds Python int. That's a very different thing that downcasting an array with a higher precision integer dtype. I expect that sort of thing is done all the time and isn't something NumPy would want to deprecate.

@mhvk
Copy link
Contributor Author

mhvk commented Jun 12, 2024

Yes, indeed, it is just python integers that are treated differently, and I wondered if the easiest solution would be to extend that to np.clip. It seems fairly reasonable to just change dtype if one passes in multiple arrays -- e.g., np.clip(input_u1, min_i2, max_i2) -- and then by analogy we should do the same thing for numpy scalars.

p.s. Note that this is different from what I suggested on top!

asmeurer added a commit to asmeurer/array-api that referenced this issue Jun 13, 2024
…the bounds of x

As discussed in today's consortium meeting.

See the discussion at numpy/numpy#24976.
@asmeurer
Copy link
Member

FWIW, we decided to make this behavior unspecified in the standard. data-apis/array-api#814

@jni
Copy link
Contributor

jni commented Jun 18, 2024

I just want to add a voice to this thread that np.clip as currently implemented (numpy 2.0.0) is quite hard to use:

>>> np.clip(np.array([0, 255], dtype=np.uint8), 0, 4550)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 2247, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/_methods.py", line 108, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 4550 out of bounds for uint8

I think most users in this scenario would expect to be able to use arrays of any type and Python ints and just have it work — there is nothing unsatisfiable about this expression, including keeping the input dtype.

When you start dealing with numpy scalars and so on, the story is indeed more complicated as noted in the above discussion, but the Python int scenario seems like an easy fix (as noted by @mhvk above) that would unlock a lot of uses already.

Sorry about the noise, I expect there will be lots in this repo this coming week. 😅 Thank you all! 🙏

@mhvk
Copy link
Contributor Author

mhvk commented Jun 18, 2024

Indeed, a fix at least for python ints like for comparisons seems the way forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants