ENH: Ensure that output of np.clip has the same dtype as the main array #24976

mhvk · 2023-10-21T15:50:34Z

Proposed new feature or change:

The function np.clip arguably has surprising casting behaviour:

import numpy as np
a = np.arange(5, dtype='u1')
np.clip(a, -1, 3)
# OverflowError with NEP 50
np.clip(a, np.int64(-1), np.int64(3))
# array([0, 1, 2, 3, 3])  
# int64 dtype with NEP 50
# (before NEP 50, both examples gave int16)

I would naively have expected the output dtype always to be the same as the input dtype. This does not happen because internally np.clip calls a ufunc (um.minimum, um.maximum or um.clip):

numpy/numpy/_core/_methods.py, lines 92 to 101 in d885b0b:

    def _clip(a, min=None, max=None, out=None, **kwargs):
        if min is None and max is None:
            raise ValueError("One of max or min must be given")
        if min is None:
            return um.minimum(a, max, out=out, **kwargs)
        elif max is None:
            return um.maximum(a, min, out=out, **kwargs)
        else:
            return um.clip(a, min, max, out=out, **kwargs)

These ufuncs treat all of their arguments symmetrically when deciding the output dtype, so a.dtype gets no special priority.
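A quick way to see the symmetric treatment is to compare the dtype that np.clip returns with what np.result_type reports for the same three operands. This is a minimal sketch; the exact dtype depends on the NumPy version (int64 under NEP 50, int16 under the older value-based casting), but in neither case is it the uint8 of the input.

```python
import numpy as np

a = np.arange(5, dtype=np.uint8)
lo, hi = np.int64(-1), np.int64(3)

# np.clip forwards to the `clip` ufunc, which promotes all three operands together,
# so the result takes the promoted dtype rather than a.dtype.
res = np.clip(a, lo, hi)
print(res, res.dtype)

# The same dtype that ordinary promotion picks for these operands.
print(np.result_type(a, lo, hi))
```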

One can request the desired output dtype by setting out or dtype, but in the current implementation that still gives either the OverflowError or a casting error:

np.clip(a, np.int64(-1), np.int64(3), out=a)  # or dtype=a.dtype
# UFuncTypeError: Cannot cast ufunc 'clip' output from dtype('int64') to dtype('uint8') with casting rule 'same_kind'

Adding casting="unsafe" gives the wrong answer, because -1 becomes 255.
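The sketch below contrasts forcing the uint8 loop via dtype=a.dtype with casting="unsafe" (the bounds are cast first, so -1 turns into 255 before any clipping happens) with one possible workaround: clip in the promoted dtype and cast back afterwards. The workaround is only an illustration under the assumption that the clipped values are known to fit in a.dtype; it is not something np.clip does itself.

```python
import numpy as np

a = np.arange(5, dtype=np.uint8)
lo, hi = np.int64(-1), np.int64(3)

# Forcing the uint8 loop: the int64 bounds are cast unsafely to uint8 first,
# so the lower bound -1 becomes 255 and the result is wrong.
wrong = np.clip(a, lo, hi, dtype=a.dtype, casting="unsafe")
print(wrong)

# Workaround: clip in the promoted dtype, then cast back. This is only safe here
# because every clipped value lies in [0, 3], which uint8 can represent.
right = np.clip(a, lo, hi).astype(a.dtype)
print(right)  # [0 1 2 3 3]
```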

I think it should be possible to make the np.clip function (probably not the ufunc) cast min and max to a.dtype while ensuring that the valid ranges are respected (i.e., negative integers would become 0 if the dtype is unsigned). This would be similar to what was done in #24915; ideally, the behaviour of np.clip would be identical to

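def clip_via_setitem(a, min, max):
    # Hypothetical reference semantics: copy `a`, then overwrite the
    # out-of-range elements via __setitem__, which keeps a.dtype.
    out = a.copy()
    out[a < min] = min
    out[a > max] = max
    return out

clip_via_setitem(a, -1, 3)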

(which still gives an out-of-bound error for min=-1 because of __setitem__, but works for min=np.int64(-1))
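A minimal sketch of the proposed semantics for integer arrays with scalar bounds. clip_keep_dtype is a hypothetical helper, not a NumPy function: it clamps each bound into np.iinfo(a.dtype) using exact Python integers, converts it to a.dtype, and only then calls np.clip, so the result always keeps a.dtype whether the bound was a Python int or a NumPy integer scalar.

```python
import numpy as np


def clip_keep_dtype(a, lo=None, hi=None):
    """Hypothetical helper: clip an integer array while always keeping a.dtype.

    Scalar bounds (Python or NumPy integers) are clamped into the range that
    a.dtype can represent and converted to a.dtype before calling np.clip, so
    a negative lower bound behaves like 0 for an unsigned dtype.
    """
    a = np.asarray(a)
    if a.dtype.kind not in "iu":
        raise TypeError("this sketch only handles integer dtypes")
    info = np.iinfo(a.dtype)

    def to_dtype(bound):
        if bound is None:
            return None
        # Clamp using exact Python ints so nothing can overflow on the way.
        return a.dtype.type(min(max(int(bound), info.min), info.max))

    lo, hi = to_dtype(lo), to_dtype(hi)
    if lo is None and hi is None:
        return a.copy()
    # Both bounds now share a.dtype, so ufunc promotion leaves the dtype unchanged.
    return np.clip(a, lo, hi)


a = np.arange(5, dtype=np.uint8)
print(clip_keep_dtype(a, -1, 3))                            # [0 1 2 3 3]
print(clip_keep_dtype(a, np.int64(-1), np.int64(3)).dtype)  # uint8
```

One consequence of saturating the bounds this way: if both lie beyond the same end of the range (say 400 and 500 for uint8), every element silently becomes 255 instead of raising.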

But perhaps this is more work than is warranted.


asmeurer · 2024-06-07T22:56:29Z

This is also what we decided for the array API https://data-apis.org/array-api/latest/API_specification/generated/array_api.clip.html#clip

But I'm a little unclear what the ideal behavior should be when min or max has a larger range than the input. Consider

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None)
128

The result is a (promoted) int16. If we downcast the result back to int8, we get -128:

>>> np.clip(np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16), None).astype(np.int8)
-128

This is also what happens with the suggested out[a<min] = min behavior:

>>> a, min = np.asarray(0, dtype=np.int8), np.asarray(128, dtype=np.int16)
>>> out = a.copy()
>>> out[a<min] = min
>>> out
array(-128, dtype=int8)

Should this be considered the correct answer? It seems to me another possibility would be for clip to avoid wrapping and instead "clip", as it were, large values to iinfo(a.dtype).max (i.e., the above would return 127)?
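The two candidate answers for this example, wrapping versus saturating, can be compared directly. This sketch computes in the promoted int16 dtype and then either downcasts directly (which wraps to -128) or first clips to the range of np.iinfo(np.int8) before downcasting (which saturates to 127).

```python
import numpy as np

x = np.asarray(0, dtype=np.int8)
lo = np.asarray(128, dtype=np.int16)

promoted = np.clip(x, lo, None)     # computed in the promoted dtype, int16
wrapped = promoted.astype(np.int8)  # a plain downcast wraps around: 128 -> -128

# Alternative: saturate to the target dtype's range before casting.
info = np.iinfo(np.int8)
saturated = np.clip(promoted, info.min, info.max).astype(np.int8)

print(int(promoted), int(wrapped), int(saturated))  # 128 -128 127
```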

For floats, downcasting just overflows to +/- inf, which is probably what you would want.

Or should we reconsider the decision in the array API, and the suggestion here, to make clip not perform type promotion?

mhvk · 2024-06-08T21:11:13Z

@asmeurer - ah, never thought about the case when the minimum is larger than the largest value one can express... It might not be crazy to just error on that case...

Though perhaps we are overthinking it, and should try to fix just the case of Python ints for min/max, which use weak promotion. For that case, raising an error would be consistent with the setting analogy, since setting should eventually error too, at least according to the deprecation warning that is currently raised when setting an array element with an out-of-bound integer:

DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 256 to uint8 will fail in the future.

Maybe it is OK to use regular ufunc promotion when min, max have different dtype (or at least we can punt...).

asmeurer · 2024-06-11T22:31:21Z

Interesting. I don't see that deprecation warning when I run out[a<min] = min. Is it supposed to show there?

I agree that if that sort of thing is already deprecated in other places, then it makes sense to disallow it here too.

asmeurer · 2024-06-11T22:34:46Z

Oh I see, that warning (now actually an error in NumPy 2.0) comes from setting an array with an out-of-bounds Python int. That's a very different thing than downcasting an array with a higher-precision integer dtype. I expect that sort of downcasting is done all the time and isn't something NumPy would want to deprecate.
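The distinction can be seen in a short sketch: assigning from an int16 array into an int8 array casts unsafely and wraps without complaint, whereas assigning an out-of-bound Python int is the case that NumPy 2.0 rejects (earlier versions emitted the DeprecationWarning quoted above and still wrapped).

```python
import numpy as np

out = np.zeros(1, dtype=np.int8)

# Assigning from a higher-precision integer array uses unsafe casting:
# 128 silently wraps to -128.
out[:] = np.asarray([128], dtype=np.int16)
print(out)  # [-128]

# Assigning an out-of-bound Python int is what NumPy 2.x turns into an error.
try:
    out[0] = 128
except OverflowError as exc:
    print("OverflowError:", exc)
```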

mhvk · 2024-06-12T06:03:08Z

Yes, indeed, it is just python integers that are treated differently, and I wondered if the easiest solution would be to extend that to np.clip. It seems fairly reasonable to just change dtype if one passes in multiple arrays -- e.g., np.clip(input_u1, min_i2, max_i2) -- and then by analogy we should do the same thing for numpy scalars.

p.s. Note that this is different from what I suggested on top!


asmeurer · 2024-06-13T22:30:39Z

FWIW, we decided to make this behavior unspecified in the standard. data-apis/array-api#814

jni · 2024-06-18T04:26:06Z

I just want to add a voice to this thread that np.clip as currently implemented (numpy 2.0.0) is quite hard to use:

>>> np.clip(np.array([0, 255], dtype=np.uint8), 0, 4550)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 2247, in clip
    return _wrapfunc(a, 'clip', a_min, a_max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jni/micromamba/envs/np2/lib/python3.12/site-packages/numpy/_core/_methods.py", line 108, in _clip
    return um.clip(a, min, max, out=out, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python integer 4550 out of bounds for uint8

I think most users in this scenario would expect to be able to use arrays of any type and Python ints and just have it work — there is nothing unsatisfiable about this expression, including keeping the input dtype.

When you start dealing with numpy scalars and so on, the story is indeed more complicated as noted in the above discussion, but the Python int scenario seems like an easy fix (as noted by @mhvk above) that would unlock a lot of uses already.

Sorry about the noise, I expect there will be lots in this repo this coming week. 😅 Thank you all! 🙏

mhvk · 2024-06-18T06:26:30Z

Indeed, a fix at least for python ints like for comparisons seems the way forward.

asmeurer · 2024-06-27T20:16:29Z

With the non-promoting behavior, one should also be aware that min <= x <= max might not actually hold when min or max are float64 and x is float32, because of rounding in the downcast:

>>> np.clip(np.asarray([0.0], dtype=np.float32), np.asarray([4.0311033624323596e-209], dtype=np.float64), np.asarray([1.0], dtype=np.float64))
array([4.03110336e-209])
>>> _.astype(np.float32)
array([0.], dtype=float32)

That's underflow, but the rounding could work against you in virtually any case:

>>> x = np.asarray([1.0], dtype=np.float32)
>>> min = np.asarray([1.00000001], dtype=np.float64)
>>> max = np.asarray([2.0], dtype=np.float64)
>>> np.clip(x, min, max).astype(np.float32)
array([1.], dtype=float32)
>>> min <= _
array([False])

Tbh, I'm starting to think this whole proposal is a bad idea and putting it in the standard was a mistake. Not type promoting implies downcasting, which just leads to too many weird behaviors. But if we decide to keep it in the array API and change NumPy, we should figure out reasonable behavior for ints and document the float behavior.

jni · 2024-06-28T01:05:07Z

I don't find those examples that bad tbh. The test is not whether they are smaller than min, it is whether they are smaller than the clipped-and-rounded min. Just as I shouldn't be surprised that 0.1 + 0.2 <= 0.3 evaluates to False, I should not be surprised by those edge cases. The only situation where I think clip should refuse to work is when the min and max are both outside the range of the array dtype, i.e., np.clip(np.array([9], dtype=np.uint8), 400, 500) should be an error.

asmeurer · 2024-07-01T21:10:12Z

Well note that you don't get this issue at all (rounding or integer overflow) if you just do type promotion. The original argument here is that type promotion on x is surprising, but for me this shows why it is necessary. The whole point of type promotion in general is that functions can produce a result that fits within the bounds of both input dtypes.

mhvk · 2024-07-02T11:14:12Z

> Well note that you don't get this issue at all (rounding or integer overflow) if you just do type promotion. The original argument here is that type promotion on x is surprising, but for me this shows why it is necessary. The whole point of type promotion in general is that functions can produce a result that fits within the bounds of both input dtypes.

I think I've gotten convinced about that too - I think the actionable part of this really just is to treat python ints specially (as we do for comparisons).
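A minimal sketch of what "treat python ints specially" might look like, written as a hypothetical wrapper (clip_pyint_aware is not a NumPy function). It assumes that an out-of-range Python int bound should simply be moved to the nearest dtype limit, follows jni's suggestion of raising when the bounds lie entirely outside the dtype range, and leaves NumPy scalars and arrays to regular ufunc promotion.

```python
import numpy as np


def clip_pyint_aware(a, lo=None, hi=None):
    """Hypothetical wrapper: special-case Python int bounds, as comparisons do.

    Only Python ints are touched; NumPy scalars and arrays are passed to np.clip
    unchanged and therefore follow the usual ufunc type promotion.
    """
    a = np.asarray(a)
    if a.dtype.kind in "iu":
        info = np.iinfo(a.dtype)
        # Bounds entirely beyond the representable range cannot be satisfied.
        if type(lo) is int and lo > info.max:
            raise ValueError(f"lower bound {lo} is above the maximum of {a.dtype}")
        if type(hi) is int and hi < info.min:
            raise ValueError(f"upper bound {hi} is below the minimum of {a.dtype}")
        # Otherwise an out-of-range Python int is moved to the nearest limit.
        if type(lo) is int:
            lo = max(lo, info.min)
        if type(hi) is int:
            hi = min(hi, info.max)
    if lo is None and hi is None:
        return a.copy()
    return np.clip(a, lo, hi)


u8 = np.array([0, 255], dtype=np.uint8)
print(clip_pyint_aware(u8, 0, 4550).tolist())  # [0, 255]
```

With NumPy scalar bounds such as np.int64(-1), this wrapper deliberately falls through to np.clip, so the result still takes the promoted dtype, in line with the conclusion that promotion is the right behavior for non-Python-int bounds.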

kgryte mentioned this issue Jan 11, 2024

Add clip to the specification data-apis/array-api#715

Merged

asmeurer mentioned this issue Apr 19, 2024

2023.12 support data-apis/array-api-tests#249

Open


asmeurer added a commit to asmeurer/array-api that referenced this issue Jun 13, 2024

Clarify that clip() behavior is undefined when min or max is outside …

5f2bb2e

…the bounds of x As discussed in today's consortium meeting. See the discussion at numpy/numpy#24976.

asmeurer mentioned this issue Jun 13, 2024

Clarify that clip() behavior is undefined when min or max is outside the bounds of x data-apis/array-api#814

Open

seberg mentioned this issue Jun 19, 2024

ENH: np.clip as currently implemented (numpy 2.0.0) is quite hard to use #26759

Open

asmeurer mentioned this issue Jun 27, 2024

Issues with test_clip data-apis/array-api-tests#276

Open
