
API: Switch to NEP 50 behavior by default #23912

Merged · 27 commits merged into numpy:main on Oct 24, 2023

Conversation

@seberg seberg commented Jun 9, 2023

This is a WIP, for discussion and visibility, for switching to NEP 50 behavior by default.

As of writing, it mainly fixes up the clear-cut test failures, plus one small hack to allow isnan(2**100) to pass. That may not be perfectly clean, but it seems pragmatic.
(The individual commits make sense on their own.)

There are some remaining tests that need fixing:

  • Should percentile(float32_arr, 30.) return float32 or float64? Currently it switches to float64, but we could add np.result_type to fix that.
    • Adding np.result_type isn't great; it would be nice to have something slightly cleaner, though I am not exactly sure what. (A function to prepare multiple arrays, which @mdhaber would like anyway? A way to mark an array as representing a Python scalar? That would have to be used with caution, though: it would be weird to use it beyond temporaries.)
  • can_cast(123, np.uint8) isn't well defined right now. I can live with that for the moment.
  • np.where() doesn't do weak promotion; np.concatenate() does, so it should not be super hard to fix.
  • np.add(1., 2., dtype=np.int64) should fail because of unsafe casting, but casting safety+scalars is still a bit of a mess.

I am not sure all of these are easy, but I will chip away at them. OTOH, some of these also look like things that might be OK as "known issues".

The main thing I would like is to figure out percentile: we won't get this right everywhere, but it would be good to know what our story is for such a function.
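For readers new to NEP 50, a minimal sketch of the user-visible change being discussed (illustration added for context, not part of the PR; it assumes NEP 50 semantics as described in the NEP, and exact messages may differ):

import numpy as np

# Under NEP 50, Python scalars are "weak": the NumPy dtype wins.
(np.float32(3) + 3.0).dtype    # float32 (legacy value-based promotion gave float64)

b = np.array([100], dtype=np.uint8)
b + 200                        # stays uint8; 100 + 200 wraps around to 44
# b + 1000 now raises OverflowError (1000 does not fit uint8); legacy
# promotion silently upcast the result to uint16 instead.

# The open questions from the list above, in code form:
# np.percentile(float32_arr, 30.)   -> float32 or float64?
# np.can_cast(123, np.uint8)        -> not well defined for a weak Python int
# np.add(1., 2., dtype=np.int64)    -> should fail as an unsafe cast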


# gh-10322 means that the type comes from arr - this may change
assert_equal(x_loc.dtype, float_small)
assert_equal(x_loc.dtype, float_large)
Member Author:

@eric-wieser do these histogram changes look right to you? What was previously rounded to the small bin now ends up with the large dtype because it is explicit, I guess.

Contributor:

FWIW, I think this is logical as long as one doesn't special-case histogram in treating its arguments asymmetrically.


seberg commented Jul 12, 2023

I really need some opinions on this. I hacked np.select() to just check for int, float, complex explicitly for now; that is OK in practice.

The np.percentile tests are more complicated, because we have to make a clear decision about what np.percentile(float32_arr, 30.) should return (I suspect the same applies to quantile).

This is an API decision that illustrates the difficulty with NEP 50: where do we draw the line and stop treating the float as "weakly typed"? For percentile it may yet be a bit simpler, but still:

  1. Maintain the weak scalar: this probably requires making sure that all math uses Python scalars (or checking up front and enforcing a Python float at the end).
    • I am really not sure how to design this correctly!
  2. Draw the line here: quantile/percentile uses q = np.asarray(q) and, unlike ufuncs, doesn't use weak promotion (but how do we teach what does and what doesn't?).
  3. For percentile/quantile, I guess it may make sense to ignore the interpolation precision. We do currently return a Fraction when q is a Fraction, though. For some things, casting q is probably wrong.

Ping @mhvk, but really everyone should have an opinion on this: it is probably the most important thing to figure out for NEP 50, and I think a lot of people want NEP 50.

@mattip added the "triage review" label (Issue/PR to be discussed at the next triage meeting) on Jul 12, 2023
@mhvk (Contributor) left a comment:

Looked through the whole PR. I think the main strike against the NEP 50 consequences is comparisons; really, np.uint*(anything) > -1 should just work. Is the idea right now to decide on that separately?

Looking through the errors: it should work to do np.can_cast(np.iinfo(whatever).min, whatever.dtype) -- maybe those attributes need to be of whatever.dtype?

On the percentile failures - I think I'm for option (1) here, ensuring that we keep the weak promotion that is being tested for. It will be really confusing otherwise!

NPY_ITER_READONLY | NPY_ITER_ALIGNED,
NPY_ITER_READONLY | NPY_ITER_ALIGNED
};
PyArray_Descr * common_dt = PyArray_ResultType(2, &op_in[0] + 2,
Contributor:

I realize this was here before, but while you are editing this, isn't &op_in[0] + 2 just the same as &op_in[2]?

@@ -4119,7 +4119,8 @@ def test_extension_incref_elide_stack(self):
def test_temporary_with_cast(self):
# check that we don't elide into a temporary which would need casting
d = np.ones(200000, dtype=np.int64)
Contributor:

Might as well change to np.ones(2, dtype=np.int64), or is the number so large on purpose here?

@pytest.mark.parametrize("inplace", [False, True])
def test_int_range_error(self, inplace):
# E.g. clipping uint with negative integers fails to promote
# (changed with NEP 50 and may be adaptable)
Contributor:

Yikes. Here, in principle it would be nicer for the code to be smarter and realize that -1 and 256 are just not possible, so those limits should be ignored. But I see the need to avoid the draw of value-based promotion...

Member Author:

There are possible solutions, but right now they need a fair bit of complexity (maybe some adaptation too, but not much). I think we need a ufunc implementation for (any_integer, pyinteger) and (pyinteger, anyinteger).
And either we allow access to the scalar values (if an abstract scalar) in one of the steps (promotion or loop selection), or (which is possible now) we put it into the inner-loop itself.

I.e. you can effectively make it a loop that uses the Python object for the scalar (and then deal with the rest in the loop itself).

That ain't pretty, although fortunately it would only be necessary for comparisons.

(maybe there are other ways to tweak it to be a bit more reasonable, not sure)

assert_(not -1 > np.array(1, dtype=dt1), "type %s failed" % (dt1,))
assert_(-1 != np.array(1, dtype=dt1), "type %s failed" % (dt1,))
# NEP 50 broke comparison of unsigned with -1 for the time being
# (this may be fixed in the future of course).
Contributor:

Another painful result, indeed.

# type resolution to kick in, which then fails):
with assert_raises(TypeError):
_rational_tests.test_add(a, np.uint16(2))
# This path used to go into legacy promotion, but doesn't now:
Contributor:

Maybe "This path with a scalar used ..."

# the test should check equivalence to Python (no overflow).
# If not, the test can be simplified a bit more eventually.
with pytest.raises(OverflowError, match="Python integer -1"):
np_comp(arr, -1)
Contributor:

Salvaging this would indeed be good.

# but promotion rules mean that we try to use a normal integer
# in the following case:
with pytest.raises(OverflowError):
np.equal(2**200, 2**200)
Contributor:

Ouch, these also are not great. One wonders about just delegating to standard operators/math here...

Member Author:

Hmmmmmm, some thoughts:

  • For comparisons, yes, I think we can just allow this by saying that PyInt + PyInt -> bool (i.e. a comparison loop) can just use object + object -> bool. For float or complex it doesn't really matter anyway (well we currently allow complex <, but whatever).
  • Beyond that, I think things may get tricky:
    • Some functions behave differently for objects or may not have an object loop at all, so I am not sure you can do this choice universally.
    • I might start wondering if we should do this more generally, which then diverges a bit into the whole problem of returning scalars vs. arrays.

Let's focus on the first part (comparisons). I am not sure how useful it is if we don't also do the general int64(2) < 2**200 implementation/hack, but I suppose it may at least remove some pressure from test suites that use NumPy asserts all over.

Contributor:

I think it makes sense to remove pain points in order of simplicity! Certainly, deferring PyInt, PyInt to O,O for comparisons seems logical. Let's move further discussion of comparisons out to the main thread.

return np.subtract(a, b, casting='unsafe', dtype=dt)
# signed to unsigned. The input may be negative python integers so
# ensure we pass in arrays with the initial dtype (related to NEP 50).
return np.subtract(np.asarray(a, dtype=dt), np.asarray(b, dtype=dt),
Contributor:

Isn't it easier to just do np.asanyarray(a - b, dtype=unsigned_dt) - I don't think we're getting any speed-up doing it another way.

Member Author (@seberg, Jul 13, 2023):

It doesn't work because if a or b is a negative Python integer, the conversion will raise an error (rather than being unsafe and "well defined").

EDIT: For clarification, the other one could be an int8 also, and then the mix doesn't work. (I think at least.)
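A minimal illustration of the failure mode described here (added for context, not part of the PR diff):

import numpy as np

# Converting a negative Python int straight to an unsigned dtype raises under NEP 50:
# np.asarray(-1, dtype=np.uint8)   ->  OverflowError
# An explicit unsafe cast stays well defined (it wraps around):
np.array(-1).astype(np.uint8)      # array(255, dtype=uint8)
# Hence the change above: convert the inputs using the initial dtype first, so a
# negative Python int is never converted directly to the unsigned dtype.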



mhvk commented Jul 12, 2023

I made a PR to your branch at seberg#46 - this fixes percentile and quantile. It all feels a bit fragile, but not horribly so, and it seems best for this PR to not change behaviour that logically is consistent with NEP 50. So, probably the fixes are for the better.
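For reference, the behaviour that the seberg#46 fix is meant to preserve (a sketch of the intended outcome, not a test from the PR):

import numpy as np

arr = np.linspace(0, 1, 11, dtype=np.float32)
np.percentile(arr, 30.).dtype    # float32: the Python-float q stays "weak"
np.quantile(arr, 0.3).dtype      # float32 as well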

Aside: it also fixes a bug for longdouble where the gamma factors were (I think) calculated in regular float rather than double precision.

mhvk commented Jul 12, 2023

Of course, you are right in https://github.com/seberg/numpy/pull/46/files#r1261584859 that this problem may be more general -- I guess that is an argument for not worrying so much and instead removing those dtype expectation tests. It does seem that for scalars, at least, one really should just not worry too much! What matters is array-scalar interaction where the output is an array too.

mhvk commented Jul 13, 2023

Looking again at NEP 50, the main rationale is that values should not determine the result type (correctly!). That also means that for comparisons there is a bit of leeway, since the result type is always bool. And I do think we need to avoid a situation where np.uint8(1) > -1 raises while np.uint8(1) > 1.0 works fine...

One option is to make the behaviour of the comparison operator different from the ufunc (as it is already), and for the operator try to ensure that an error is raised really only when necessary - i.e., let the operator check what the problem is if there is an OverflowError. This would make it more like the Python comparison operators.
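Spelled out, the asymmetry being described (illustration added for context; this is the behaviour at this stage of the PR, before the comparison fix discussed below):

import numpy as np

np.uint8(1) > 1.0    # works: the Python float is weak, the comparison returns False
np.uint8(1) > -1     # OverflowError at this point, because -1 cannot be converted
                     # to uint8 -- even though the answer is obviously True and the
                     # result dtype of a comparison is always bool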

seberg commented Jul 13, 2023

Right, comparisons are special and it is indeed very reasonable to allow uint8 > -1 (just not very straightforward, unfortunately), just like I added special uint64 < int64 comparisons.

Putting a special case into the operator is a good idea! It's the worse hack, but it should be a straightforward one, at least.

seberg commented Oct 4, 2023

I would like some input. I opened a "tracking issue" for this, but how should we approach/continue here?

I think we need to push forward soon and cut corners, because there are too many moving parts for me to keep track of, and I could probably reduce some of them if we disable legacy promotion. But that requires putting this in.

Right now, this works; the two main problems are (as mentioned many times):

  1. can_cast(123, "int8") doesn't yet work. But I am not sure it is used that much.
  2. Comparison ufuncs with unsigned > -1 and also isfinite(integer) failing. I am relatively confident we can find a solution for that, but of course merging this might mean downstream has to fix or skip a few tests for moderately good reasons (since we are likely to make things work again).

Of course there is more: many functions should in principle use weak-promotion rules but are not doing so yet due to early asarray() calls.

The only test failure (that isn't explicitly xfailed) is the can_cast documentation.

mhvk commented Oct 4, 2023

@seberg - Ideally we would have a PR that fixes the comparisons before merging this (I don't worry about can_cast). NumPy 2.0 changes are quite disruptive already, and I think we should try hard to avoid merging something that has known problems we intend to fix - it will just mean people fixing tests/rewriting code; most won't know the problems will be fixed, and in any case they want their tests against numpy-dev to pass so that those remain useful for finding other regressions.

seberg commented Oct 12, 2023

Well, I have given this a shot after some thought... It is still tricky. Aside from changing can_cast, there is now a big chunk of additional stuff:

  • I allow abstract DTypes (with small bug fixes) to register as loops (ArrayMethods). This is fine, you just need to take more care. It does muddy the waters a bit between what is promotion and what is not, if the result DType would also be abstract. (For this specific case it doesn't, but in general it does.)
  • A new resolve_descriptors_raw function, which does the same job as resolve_descriptors, but which gets passed the scalar argument (this can be expanded, but I am starting small).
    • This only happens when available, so there are some niche paths where this cannot be used (ufunc.at and the explicit resolution function right now); we can deal with those by keeping the previous rules (things will just raise when trying to convert).
    • The function also gets the actual array.dtype instances, whereas I normally ensure that we pass in dtypes already cast to the correct DType class. (The reason is that we don't define how to cast the abstract DTypes as of now, and even if we did, it would not be what we need unless the dtype instance actually carried the value information.)
  • There are new loops added (for the relevant combinations!), which:
    • Use the new resolve_descriptors_raw (a single function dealing with everything)
    • Return the current legacy loop when that makes sense.
    • Return an always true/false loop when that makes sense.
    • To achieve this, they employ a hack/trick: get_loop() needs to know the value, but only resolve_descriptors_raw() does right now, so this is encoded in whether we use the np.dtype("object") singleton or a fresh instance! (Yes, probably ugly, but it avoids channeling things to more places.)

Additionally, there is a promoter to say that Python integer comparisons can just use object dtype (in theory weird if the input then wasn't a Python int, but that is breaking promises).
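A sketch of the user-visible outcome these new comparison loops are meant to enable (summary illustration added for context, not a test from the PR):

import numpy as np

arr = np.arange(3, dtype=np.uint8)
arr > -1         # array([ True,  True,  True])  -- no OverflowError anymore
arr == 2**200    # array([False, False, False])  -- out-of-range Python ints are
                 # handled via the object-based comparison loop described above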


Why did we land on that?

  • @ngoldbaum likes the idea of having values available for promotion, because it allows things like units**234 and fixed_width_string_array * 8.
  • I could channel the info into the promotion step instead
    • The up-side would be a clean fallback to the normal loop and separation of concerns.
    • The down-sides are that we cannot cache such a promotion and that we still need to unpack the Python integer in the inner-loop (unless we accept very slow object loops).
  • Unpacking only in the inner-loop could work, but doesn't solve the issue that type-resolution will currently fail. Fixing that would only work if we promote to a new int > object loop always, but then it would have to be able to deal with the general case, which seems very unappealing.

In short, I don't love resolve_descriptors_raw, nor do I love the little hack, but it seemed like the more minimal thing. One funny thing about it: comparing with a Python integer is for the moment much faster than comparing with a NumPy scalar (if its dtype is larger).

The main alternative I still see is actually defining an AbstractPyIntDType(1234) instance, which cannot be attached to a real array but holds on to the value so that it is available in the resolution step. (On the one hand I like this; on the other hand it probably leads to having to partially define casts for those instances. As of now, we would need to rip out the legacy paths first, though; otherwise introducing this concept is going to be impossible.)


Admittedly, one argument might be that comparisons with -1 for uint were usually very slow anyway, because there was always at least a cast to int64 involved. In that sense, the value-based promotion approach leads to very slow comparisons, but that is equivalent to what we have now.

mhvk commented Oct 12, 2023

@seberg - the progress sounds good, but at least for me this PR is now far too large to review meaningfully. Would it at all be possible to split it up into pieces? E.g., at least in principle, it would seem that fixing the comparisons does not require enabling NEP 50 behaviour by default. Same for, say, the quantile and percentile changes.

In order to allow more incremental progress, is there already a CI run that has NPY_PROMOTION_STATE=weak? That would probably mean marking quite a few tests with xfail(weak_promotion) for now (or making the test dependent on the state where that is relevant), but it would perhaps help give a way forward, if only by making clear where the problems are.

seberg commented Oct 12, 2023

@mhvk I first planned to just put the new stuff into its own PR, but thought I would push it here because I wanted to test the full thing. I squashed the change, but only to make it easier to split out into gh-24915.
I would prefer not to try to add a full test run, TBH. I don't think it helps much with the difficult bigger things, which we can review well enough without trying to keep track of xfails.

This contains two fixes:
1. a somewhat hackish solution to keep allowing very large integers
   for unary ufuncs.  This is partially wrong, but keeps isnan and
   other functions working (reducing fallout in the tests quite a bit)
2. Generally adapt changes that now fail for the same reason; this is
   mainly due to comparisons like `-1 < uint8` now failing in general.
   That is fixable in principle, but it isn't super easy.

Neither is perfect; both are the pragmatic solutions at this time, until
they prove to be too problematic for some reason.

(Getting rid of going to object for large integers is tougher and needs
deeper work.  It certainly is possible but too difficult at this stage:
we need to remove some old branches first.
We could, and maybe should, get rid of the `uint` step though, it is even
more of a trap I think!)
@mhvk (Contributor) left a comment:

I think with the comparisons done, this is almost all OK. The one ufunc that still is a bit nasty is clip - I think it makes sense to use resolve_descriptors_raw for it too.

Otherwise, all small comments.


/* Get the result from the iterator object array */
ret = (PyObject*)NpyIter_GetOperandArray(iter)[0];
if (common_dt == NULL || op_dt[1] == NULL) {
Contributor:

Might as well move these failure checks up into individual checks right after each definition.

goto fail;
}
iter = NpyIter_MultiNew(4, op_in, flags,
NPY_KEEPORDER, NPY_UNSAFE_CASTING,
Contributor:

Keep indentation consistent?

# or object depending on the input value. So test those paths!
expected_dtype = np.result_type(np.array(val).dtype, np.array(0).dtype)
# If we only pass scalars (mainly python ones!), NEP 50 means
# that we get either the default integer
Contributor:

delete "either"

# Similar to last check in `test_basic`
x = (np.random.random(1000) * 255).astype("uint8")
with pytest.raises(OverflowError):
x.clip(-1, 10, out=x if inplace else None)
Contributor:

Hmm, should we change clip as well to use resolve_descriptors_raw?

Member Author:

You would have to do three-way dispatching (you have to check two arguments), which seems like a lot of trouble for a niche feature. It isn't even clear to me that the test is testing anything we expect users to do.
(I may be wrong, but I expect mostly floating-point uses, and this type of clip call is only relevant when you don't know the input integer dtype and also do not want to cast anyway.)

Contributor:

clip goes through Python code in _methods.py that already special-cases min and max being None:

def _clip(a, min=None, max=None, out=None, **kwargs):
    if min is None and max is None:
        raise ValueError("One of max or min must be given")
    if min is None:
        return um.minimum(a, max, out=out, **kwargs)
    elif max is None:
        return um.maximum(a, min, out=out, **kwargs)
    else:
        return um.clip(a, min, max, out=out, **kwargs)

I made a PR with a possible fix -- seberg#48 -- though at some level that just brings up the next question: if clip, why not also np.maximum and np.minimum (one-sided only, i.e., np.minimum(int_array, py_int) should error if py_int is less than the minimum possible given the dtype, but just pass through int_array if it is larger than the maximum possible).

In that respect, maybe editing the python function is the right call, since that already special-cases None as well.
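A rough sketch of what such Python-level special-casing could look like (purely illustrative and hypothetical; this is not the code from seberg#48, and the helper name is made up):

import numpy as np

def _prepare_int_bound(a, bound, which):
    # Hypothetical helper: drop an integer bound that cannot constrain `a` anyway,
    # so that e.g. uint8_array.clip(-1, 10) keeps working under NEP 50.
    if isinstance(bound, int) and a.dtype.kind in "iu":
        info = np.iinfo(a.dtype)
        if which == "min" and bound <= info.min:
            return None   # nothing in `a` can be below the dtype minimum
        if which == "max" and bound >= info.max:
            return None   # nothing in `a` can be above the dtype maximum
    return bound

a = np.array([5, 250], dtype=np.uint8)
_prepare_int_bound(a, -1, "min")   # -> None, so the lower bound is simply dropped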

Member Author:

Thanks, I was tempted to make it a follow-up if there is an actual complaint. Do you really think it is important?

Probably only a theoretical caveat, but all of these functions (for clip, I think it is a bad choice) return the result type of mixing everything. It is slightly surprising to get a result that can't hold all the values. Although, I have to admit, it isn't really problematic in practice.

Contributor:

After I wrote my change, I must admit I became unsure, especially after having to adjust the tests for numpy ints to not do in-place -- as you say, this makes very little sense! I'm definitely fine with just raising an issue that the clip dtype resolution needs more work and keeping things as you have them here.

@@ -422,13 +422,14 @@ def test_unsigned_signed_direct_comparison(

assert py_comp(arr, -1) == expected
assert np_comp(arr, -1) == expected


Contributor:

One empty line...

scalar = arr[0]
assert isinstance(scalar, np.integer)
# The Python operator here is mainly interesting:
assert py_comp(scalar, -1) == expected
assert np_comp(scalar, -1) == expected


Contributor:

Two empty lines between classes is good.

Member Author:

strange, probably some conflict resolution

q = np.asanyarray(q)
# Use dtype of array if possible (e.g., if q is a python int or float).
if isinstance(q, (int, float)) and a.dtype.kind == "f":
    q = np.asanyarray(q, dtype=np.result_type(a, q))
Contributor:

I'm confused about this: can it not just be dtype=a.dtype?

q = np.asanyarray(q)
# Use dtype of array if possible (e.g., if q is a python int or float).
if isinstance(q, (int, float)) and a.dtype.kind == "f":
    q = np.asanyarray(q, dtype=np.result_type(a, q))
Contributor:

As above, just dtype=a.dtype? I know I looked at this before, so probably there was a reason for this... If so, let's add it to the comment...

Member Author:

Nah, I think it was added for good sport probably. It might do byte-swaps, etc. but that doesn't matter much.

There may have been a reason: slightly saner user dtype support. But it seems so niche (and unlikely to affect anything), that I am happy to ignore it for now.

u8 | AR
u8 ^ AR
u8 & AR
i << AR
Contributor:

Bit of an illogical place for this. Move up, just after the i8 set?

@mhvk (Contributor) left a comment:

@seberg - This looked ready to go in (just one small comment, which you can ignore).

To see how things were, I thought I'd test it with astropy. It seems mostly fine -- there are just a few cases where we need to adjust to ensure casting works fine, mostly for float32. Quite a few of those were corner cases already, so no problem. And some work-arounds could be removed.

But I also ran into what I think is a similar problem to what we had with integers for comparisons: large_py_float > float32 raises an OverflowError when it should just be True. I think we should deal with this... (note that I could adjust the piece of code that fails, but it seems wrong in the same way the integers were wrong).


seberg commented Oct 21, 2023

Hmmmm, interesting and also just some notes for now because I have to sort my thoughts to make up my mind:

  • You are seeing a warning turned into an error; np.errstate(over="ignore") will hide the warning (which means end-users won't notice this much).
  • There are some interesting considerations:
    1. By casting float64 to float32, we lose precision, which changes == (and thus also <).
    2. By casting an out-of-bounds float64 to float32 we get an infinity: this is important because inf == inf, so we change the behaviour at the boundary.

The last point means that the warning actually warns about relevant behavior (although the typical user is unlikely to understand it, of course).

Fixing 2. to get rid of the warning seems like a dubious choice to me unless we also fix 1. which means we introduce explicit float32 to float64 comparisons (and float16, etc.).
OTOH, I am not sure that fixing 1. is even desired because it means that np.float32(3.1) == 3.1 fails in some cases.
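To make the two points concrete (illustration added for context, assuming NEP 50 casts the weak Python float to float32 here):

import numpy as np

# 1. Precision: casting the Python float to float32 changes equality.
np.float32(3.1) == 3.1    # True under NEP 50: 3.1 is rounded to float32 on both sides

# 2. Range: an out-of-range Python float overflows to inf when cast to float32,
#    which is what the new overflow warning is about.
np.float32(1.0) < 1e300   # True (1e300 becomes inf as float32), but a cast-overflow
                          # warning is emitted along the way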

For integers it was a clear-cut thing where it just makes sense to allow it as user API and the difficulty is technical. But here it is fuzzier than that!

What does everyone think?


seberg commented Oct 21, 2023

@ngoldbaum can you have a (last?) look over this? I would ping everyone else too, but I am not sure who wants to look at it.

@ngoldbaum (Member):

Sure, I'll take a look on Monday.


mhvk commented Oct 21, 2023

The actual comparison that failed (warned) is

header[kw] > np.finfo(np.float32).max

where header[kw] is a float. Getting a warning here is surprising at first! But, as you say, it is logical: while in this case the code gives the "right" answer by casting the Python float to float32 and letting it become inf, the fact that one gets True for == np.float32(np.inf) might not be immediately expected. On the other hand, as you note, we certainly want 3.1 == np.float32(3.1), so the cast to float32 is logical in general. The question then is whether it would make sense to treat float64 values that are unrepresentable just because they are outside the float32 range differently from those that are unrepresentable because of precision.

I think I am now convinced that in the end that would require guessing what the user wants, which makes this unsolvable (and hard to explain).

The other solution to my specific example would be to make all the attributes of finfo return regular float. But I think I can see cases where it would be really handy to have the exact representation, so better to just keep it as is. After all, in astropy I can just cast the right side to float and solve the issue.

Anyway, for this PR we clearly do not need to solve this. What seems most important is to document this example in perhaps a general casting page...
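For reference, the pattern and the simple workaround mentioned above (illustration added for context; finfo attributes are NumPy scalars of the corresponding dtype):

import numpy as np

bound = np.finfo(np.float32).max    # a float32 scalar, not a Python float
1e300 > bound                       # promotes to float32: 1e300 overflows to inf,
                                    # triggering the warning discussed above
1e300 > float(bound)                # casting to a Python float sidesteps the issue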

…tins)

But move the error check to be higher up and thus more directly
relevant.  If `PyArray_DescrFromType` could have failed, the code
would have been incorrect anyway.
@mhvk (Contributor) left a comment:

It would be great for @ngoldbaum to have a last look, but in the meantime let me explicitly approve, since I think this is ready to go in...

@ngoldbaum (Member) left a comment:

I saw a few places that could be clarified a bit and I have a couple questions.

are handled in C/C++ functions. If you are not careful with ``dtype``
assignments, you can get unwanted overflow, as such
are handled in C/C++ functions.
When values do not fit and you are using a ``dtype`` NumPy may raise an
Member:

Suggested change
When values do not fit and you are using a ``dtype`` NumPy may raise an
When values do not fit and you are using a ``dtype``, NumPy may raise an


>>> a = np.array([127, 128, 129], dtype=np.int8)
>>> a
>>> np.array([127, np.int64(128), np.int64(129)], dtype=np.int8)
Member:

I feel like this is enough of a corner case that it'll be confusing more often than helpful to include this in the "basics of array creation" docs. I'm not saying it needs to be added for this PR, but it would probably be more useful to have a new subsection in reference/arrays.ndarray.rst that goes into how rounding and overflow work in detail with the NEP 50 semantics. For now I would just delete this second example you've added.

Member Author:

OK, removed it. I agree, I don't like it much; I just didn't really want to completely remove the note about overflows being possible.

False

>>> np.can_cast('<i8', '>u4', 'unsafe')
True
Member:

Should we leave some examples behind that demonstrate the new semantics for numpy scalars?

Member Author:

It just raises a TypeError right now, so not sure it is very useful?
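A minimal illustration of the behaviour referred to here (added for context):

import numpy as np

# np.can_cast(123, np.uint8) now raises TypeError: under NEP 50 a Python int has no
# fixed dtype, so can_cast no longer inspects its value.
np.can_cast(np.int8, np.uint8)    # dtype-to-dtype queries still work (-> False)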

Member:

Ah, that wasn't clear, never mind.

@@ -3252,8 +3252,9 @@ array_set_datetimeparse_function(PyObject *NPY_UNUSED(self),
NPY_NO_EXPORT PyObject *
PyArray_Where(PyObject *condition, PyObject *x, PyObject *y)
{
PyArrayObject *arr, *ax, *ay;
PyArrayObject *arr, *ax, *ay = NULL;
Member:

Might as well initialize all of these to NULL, leaving it like this is a little confusing

Member Author:

Added, although I can't say I prefer it when it isn't needed for cleanup (the "uninitialized" warnings work pretty well anyway).

if (common_dt == NULL) {
goto fail;
}
PyArray_Descr * op_dt[4] = {common_dt, PyArray_DescrFromType(NPY_BOOL),
Member:

Couldn't hurt to add a /* creating a built in dtype like this cannot fail */ above this line.

Member Author:

OK, we really should probably just have a name for it...

ay = (PyArrayObject*)PyArray_FROM_O(y);
if (ax == NULL || ay == NULL) {
if (ay == NULL) {
goto fail;
Member:

Sorry for writing this with awkward control flow in the first place... 🙃

* isnan). This is not ideal, but pragmatic...
* We should eventually have special loops for isnan and once
* we do, we may just deprecate all remaining ones (e.g.
* `negative(2**100)` not working as it is an object.)
Member:

It would be nice if this comment mentioned it was a TODO from the NEP 50 work, for context

Member Author:

I have it in the tracking issue...

Member:

I'm thinking of a reader who doesn't have context about NEP 50 and wouldn't be aware of a tracking issue.

Member Author:

Ah, sorry, somehow was thinking about creating an issue.


ngoldbaum commented Oct 24, 2023

Merging with a quick fixup for the linter. Thanks @seberg!

If you're looking at this PR because of CI breakage, take a look at NEP 50: this PR turns on NEP 50 semantics by default. The result of some binary operations may end up in lower precision than they used to and some operations that used to succeed will now raise errors.

If you are finding it difficult to update your code to adapt to these new semantics or have any other questions, please feel free to open an issue.

@ngoldbaum ngoldbaum merged commit 5abbb3a into numpy:main Oct 24, 2023
35 of 42 checks passed
tylerjereddy added a commit to tylerjereddy/scipy that referenced this pull request Oct 24, 2023
* 3 test failures show up with latest NumPy:
https://github.com/scipy/scipy/actions/runs/6632037087/job/18016857190?pr=19419

* almost certainly because of NEP50 adoption/changes:
numpy/numpy#23912

* it looks to me like the only special handling we need
here is for floats, so `can_cast` may have been overkill
anyway, although if you follow the control flow/docstrings,
it gets a bit confusing with some things claiming to use `np.intp`
and others Python `int`...

[skip cirrus] [skip circle]