ENH: Add Array API 2023.12 version support #26724

mtsokol · 2024-06-17T14:37:29Z

This PR upgrades NumPy's Array API support and the array-api-tests suite to the 2023.12 version:

- Adds np.cumulative_sum and np.cumulative_prod.
- Adds min and max keyword arguments to np.clip, and allows passing None for both a_min and a_max.
- Adds a device keyword argument to np.astype.

mtsokol · 2024-06-19T08:38:57Z

Hi @seberg,

This PR will add support for 2023.12 Array API standard that NumPy aims to implement.

This PR still needs Fixes for 2023.12 tests data-apis/array-api-tests#262 to be merged first.
In ENH: Add unstack() #26579 we will get unstack.
Here I added min and max keyword args to clip, while still supporting a_min and a_max. I see in the GitHub search that there are a few repos that pass a_min and a_max as keyword args:
- Should we deprecate passing a_min/a_max by keyword?
- Currently np.clip disallows passing None for both a_min and a_max, i.e. np.clip(arr). The array API allows it, and it is just an identity operation, def id(x): return x. What is the most effective way to implement an identity with an out argument in NumPy?
```
def identity(x, out=None):
    if out is None:
        return x
    else:
        # TODO: move x contents to out
        ...
        return out
```
- https://data-apis.org/array-api/latest/API_specification/generated/array_api.clip.html
Here I added device keyword argument to astype.
Here I implemented cumulative_sum that is the same as cumsum but has an additional include_initial keyword argument. For the initial version I implemented it in the Python layer by concatenating the cumsum result with the zeros array.
- My implementation disallows include_initial=True and out=array. The standard doesn't require it and I think it might be a bit tricky to implement as out shape would need to be different from the input array shape. Concatenation is omitted by default as the default is include_initial=False. WDYT?
- https://data-apis.org/array-api/latest/API_specification/generated/array_api.cumulative_sum.html

With these items merged we will be 2023.12 compliant for the 2.1.0 release.

mhvk

Some in-line comments. Main ones are that any new or changed API should be properly documented, including a changelog entry.

For cumulative_sum, arguably a better way would be to implement include_initial - though that is obviously a lot more work, changing the implementation of accumulate for the ufuncs. But it would imply one can actually include any initial.

mhvk · 2024-06-19T09:05:41Z

numpy/_core/_methods.py

@@ -98,8 +98,8 @@ def _count_reduce_items(arr, axis, keepdims=False, where=True):

 def _clip(a, min=None, max=None, out=None, **kwargs):
    if min is None and max is None:
-        raise ValueError("One of max or min must be given")
-
+        # TODO: is there a better way to return identity?


This seems fine.

But it implies a (small) API change, which perhaps is worth a changelog entry?

But then out argument is ignored. But what if a user passed it? Throw an error?

I wonder about something like:

def identity(x, out=None):
    if out is None:
        return x
    else:
        # TODO: move x contents to out
        ...
        return out

Ah, yes, you are right. And perhaps in general one would expect a copy, like for np.minimum and np.maximum. In that case, how about return np.positive(x, out=out)?

np.positive will work here - thank you!
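The resolution reached in this thread can be illustrated with a minimal sketch. It assumes only that np.positive behaves like any other ufunc with respect to out; the variable names are illustrative and not taken from the PR.

```python
import numpy as np

x = np.array([1, 2, 3])

# Without `out`, the ufunc returns a new array holding the same values,
# matching the "one would expect a copy" behaviour of np.minimum/np.maximum.
copy = np.positive(x)

# With `out`, the values are written into the supplied buffer,
# and that same buffer object is returned.
buf = np.empty_like(x)
res = np.positive(x, out=buf)
```

This covers both branches of the identity(x, out=None) question in a single call, without a separate code path for out.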

mhvk · 2024-06-19T09:07:30Z

numpy/_core/fromnumeric.py



 @array_function_dispatch(_clip_dispatcher)
-def clip(a, a_min, a_max, out=None, **kwargs):
+def clip(a, a_min=_NotPassed, a_max=_NotPassed, out=None, *, min=_NotPassed,


Maybe just use np._NoValue - that is meant for cases like these (and looks better in the documentation!)

mhvk · 2024-06-19T09:08:13Z

numpy/_core/fromnumeric.py

+    Array API compatible cumulative_sum
+    """
+    if out is not None and include_initial:
+        raise ValueError("not supported")


The error message should be a bit clearer about what is not supported.

Yes, right, still WIP.

OK, yes, should have seen that this is a draft!

Updated, done!

mhvk · 2024-06-19T09:08:29Z

numpy/_core/fromnumeric.py

+        raise ValueError("not supported")
+
+    if x.ndim >= 2 and axis is None:
+        raise ValueError("requires axis arg")


Again, expand the error message.

mhvk · 2024-06-19T09:10:32Z

numpy/_core/fromnumeric.py

+        initial_shape = list(x.shape)
+        initial_shape[axis] = 1
+        res = np.concat(
+            [np.zeros(initial_shape, dtype=res.dtype), res], axis=axis


Hmm, seems a bit of a missed opportunity to actually allow initial to be passed on too. But easy to adjust if that were to happen.

Sure! Let's keep additive and multiplicative identity as initial for cumulative_sum and cumulative_prod for now.

If the plan is to update accumulate, or add a separate exclusive scan method, I would include this behavior there. I see cumulative_sum/cumsum as a shorthand for the common sum.accumulate, where more advanced cases should be accessed by using the ufunc methods.

And anyways, you can always do cumulative_sum(x, include_initial=True) + initial.

I think that in this PR we can ship include_initial in cumulative_sum only to make it to 2.1.0 release with 2023.12 compatibility. Updating accumulate ufunc to also accept include_initial could be addressed separately as ufunc enhancement.
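As a rough illustration of the concatenation approach and of the "+ initial" remark above, here is a sketch built on np.cumsum. The function cumulative_sum_sketch is a hypothetical stand-in, not the code merged in this PR, and it uses np.concatenate so that it also runs on NumPy versions without np.concat.

```python
import numpy as np


def cumulative_sum_sketch(x, axis=0, include_initial=False):
    """Illustrative stand-in for the discussed behaviour, not NumPy's code."""
    x = np.asarray(x)
    res = np.cumsum(x, axis=axis)
    if include_initial:
        # Prepend a slice of zeros (the additive identity) along `axis`.
        initial_shape = list(x.shape)
        initial_shape[axis] = 1
        zeros = np.zeros(initial_shape, dtype=res.dtype)
        res = np.concatenate([zeros, res], axis=axis)
    return res


x = np.array([1, 2, 3])
with_zero = cumulative_sum_sketch(x, include_initial=True)

# An arbitrary starting value can be applied afterwards by broadcasting.
shifted = with_zero + 10
```

The result has one more element along axis than the input, which is why combining include_initial=True with a same-shaped out was initially rejected.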

mhvk · 2024-06-19T09:11:18Z

numpy/_core/fromnumeric.py

+def cumulative_sum(x, /, *, axis=None, dtype=None, out=None,
+                   include_initial=False):
+    """
+    Array API compatible cumulative_sum


Since this is presumably going to be exposed in the main namespace, this needs a proper docstring, including the differences with regular cumsum.

numpy/_core/tests/test_numeric.py

mattip · 2024-06-26T18:20:01Z

@mhvk could you take another look?

mhvk

Thanks, some comments inline, but a larger one on passing in out and include_initial: this is not too hard to do, with the following replacement,

def _cumulative_func(x, func, axis, dtype, out, include_initial):
    x_ndim = ndim(x)
    if axis is None:
        if x_ndim >= 2:
            raise ValueError("For arrays which have more than one dimension "
                             "``axis`` argument is required.")
        axis = 0

    if out is not None and include_initial:
        item = [slice(None)] * x_ndim
        item[axis] = slice(1, None)
        func.accumulate(x, axis=axis, dtype=dtype, out=out[tuple(item)])
        item[axis] = 0
        out[tuple(item)] = func.identity
        return out

    res = func.accumulate(x, axis=axis, dtype=dtype, out=out)
    if include_initial:
        initial_shape = list(x.shape)
        initial_shape[axis] = 1
        res = np.concat(
            [np.full_like(res, func.identity, shape=initial_shape), res],
            axis=axis
        )

    return res
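The out branch of the suggestion above can be tried in isolation. The following sketch assumes a caller who pre-allocates out with one extra element along axis; the array values are made up for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
axis = 1

# `out` needs one extra element along `axis` to hold the initial value.
out_shape = list(x.shape)
out_shape[axis] += 1
out = np.empty(out_shape, dtype=x.dtype)

# Accumulate into every position except the first one along `axis` ...
item = [slice(None)] * x.ndim
item[axis] = slice(1, None)
np.add.accumulate(x, axis=axis, out=out[tuple(item)])

# ... then fill the first position with the ufunc identity (0 for add).
item[axis] = 0
out[tuple(item)] = np.add.identity
```

Because the accumulation writes directly into a view of out, no concatenation or temporary result array is needed on this path.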

mhvk · 2024-06-27T12:13:20Z

.github/workflows/linux.yml

@@ -243,6 +243,7 @@ jobs:
        python -m pip install -r requirements/build_requirements.txt
        python -m pip install -r requirements/test_requirements.txt
        python -m pip install -r array-api-tests/requirements.txt
+        python -m pip install --upgrade hypothesis


Is this necessary? Why not set a minimum version instead? (just curiosity, but the changes to the workflow seem unrelated to the PR)

In test_requirements.txt there's hypothesis==6.81.1. array-api-tests suite requires newer version so I updated it to the latest hypothesis==6.104.1.

@ngoldbaum I see that you updated the version a year ago. Is there a reason not to use >= here?

@mhvk I removed that --upgrade line.

At one point NumPy used a bot to keep the test dependencies regularly updated, but it caused a lot of noise on forks of NumPy due to a bug and we decided to turn that off. That means the test requirements are only updated as people notice they are out of date. Please feel free to update the hypothesis version in the test requirements and if it doesn't affect anything else in CI it should be fine.

We do want pinned versions though to make it less likely that an upstream version change will suddenly break our CI.

mhvk · 2024-06-27T12:14:15Z

doc/release/upcoming_changes/26724.new_feature.rst

+  compatible alternatives for `numpy.cumsum` and `numpy.cumprod`.
+* `numpy.clip` now supports ``max`` and ``min`` keyword arguments which are meant
+  to replace ``a_min`` and ``a_max``. Also, for ``np.clip(a)`` or ``np.clip(a, None, None)``
+  an identity will be returned instead of raising an error.


Maybe "a copy of the input array will be ..."

mhvk · 2024-06-27T12:16:06Z

numpy/_core/fromnumeric.py

-    return (a, a_min, a_max)
+def _clip_dispatcher(a, a_min=None, a_max=None, out=None, *, min=None,
+                     max=None, **kwargs):
+    return (a,)


This seems wrong: there is no reason one couldn't dispatch on the minimum and maximum too (they can be arrays), so I think this should become return (a, a_min, a_max, min, max)

Right, I made it (a, a_min, a_max, out, min, max) as other functions here also put out here. Done!

mhvk · 2024-06-27T12:17:53Z

numpy/_core/fromnumeric.py

@@ -2283,6 +2292,19 @@ def clip(a, a_min, a_max, out=None, **kwargs):
    array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8])

    """
+    if a_min is np._NoValue and a_max is np._NoValue:
+        a_min = None if min == np._NoValue else min


Replace == with is here and on the next line.

mhvk · 2024-06-27T12:19:52Z

numpy/_core/fromnumeric.py

+        raise ValueError("Passing ``out`` and ``include_initial=True`` "
+                         "at the same time is not supported.")
+
+    x = asarray(x)


Might as well use np.asanyarray(x) here, so that for astropy I don't have to override these new functions!

Alternatively, omit this whole line and use if np.ndim(x) >= 2 below.

I used np.atleast_1d(x), since func.accumulate throws an error for scalars. Internally atleast_1d uses asanyarray. I also used np.ndim(x) here anyway.
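The difference that matters for subclass authors can be shown with a small sketch. Tagged is a hypothetical ndarray subclass defined only for this illustration.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass used only for illustration."""


sub = np.arange(3).view(Tagged)

as_base = np.asarray(sub)     # converts to a plain ndarray
as_any = np.asanyarray(sub)   # passes the subclass through
as_1d = np.atleast_1d(sub)    # also passes the subclass through

# Scalars are promoted to 1-D, so accumulate has an axis to work along.
scalar_1d = np.atleast_1d(5)
running = np.add.accumulate(scalar_1d)
```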

mhvk · 2024-06-27T12:33:08Z

numpy/_core/fromnumeric.py

+        for more details.
+    include_initial : bool, optional
+        Boolean indicating whether to include the initial value (zeros) as
+        the first value in the output. ``include_initial=True`` changes


Why is this constraint necessary?

Constraint removed.

mhvk · 2024-06-27T12:34:01Z

numpy/_core/fromnumeric.py

+
+    Examples
+    --------
+    >>> a = np.array([1,2,3,4,5,6])


Do insert spaces after the , in the examples.

mhvk · 2024-06-27T12:35:10Z

numpy/_core/fromnumeric.py

+    1000000.0050000029
+
+    """
+    initial_func = np.zeros if include_initial else None


With my suggestion, the full implementation becomes,

return _cumulative_func(x, um.add, axis, dtype, out, include_initial)

mhvk · 2024-06-27T12:35:59Z

numpy/_core/fromnumeric.py

@@ -2643,6 +2665,197 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue):
                                  keepdims=keepdims, where=where)


+def _cumulative_func(x, func, axis, dtype, out, initial_func):
+    if out is not None and initial_func is not None:


This is not really necessary, since we know where the output is placed (see below).

mhvk · 2024-06-27T12:37:12Z

numpy/_core/fromnumeric.py

+
+    res = func(x, axis=axis, dtype=dtype, out=out)
+
+    if initial_func:


This would become if include_initial with my suggestion.

mhvk

A leftover...

mhvk · 2024-06-27T13:10:37Z

numpy/_core/numeric.py

@@ -2630,6 +2635,11 @@ def astype(x, dtype, /, *, copy = True):
        raise TypeError(
            f"Input should be a NumPy array. It is a {type(x)} instead."
        )
+    if device not in ["cpu", None]:


Nitpick: make it device is not None and device != "cpu" to speed up the 99.999999% of the cases where the default is used.
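The suggested form of the check can be sketched on its own. The helper check_device and its error message are hypothetical and are not the code or wording used in NumPy.

```python
def check_device(device=None):
    """Hypothetical helper mirroring the suggested check; not NumPy's code."""
    # `is not None` short-circuits, so the default path costs one identity
    # test and never builds a list or compares strings.
    if device is not None and device != "cpu":
        raise ValueError(f'Only "cpu" is supported, got: {device!r}')
    return device
```

Both None and "cpu" pass through unchanged; any other value raises.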

mtsokol · 2024-06-28T09:56:58Z

@mhvk Thank you for a thorough review! I applied all your comments.

mhvk

Looks good but for a few very minor nits. Maybe squash commits while you are at it?

mhvk · 2024-06-29T15:15:12Z

numpy/_core/fromnumeric.py

@@ -2643,6 +2665,202 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue):
                                  keepdims=keepdims, where=where)


+def _cumulative_func(x, func, axis, dtype, out, include_initial):
+    x = np.atleast_1d(x)
+    x_ndim = ndim(x)


Good catch on atleast_1d. But now using ndim is no longer required, so I would just replace with x.ndim

mhvk · 2024-06-29T15:16:06Z

doc/release/upcoming_changes/26724.new_feature.rst

@@ -0,0 +1,6 @@
+* `numpy.cumulative_sum` and `numpy.cumulative_prod` were added as Array API
+  compatible alternatives for `numpy.cumsum` and `numpy.cumprod`.


Is it useful to mention the differences? I.e., that one cannot pass in an initial, but can include_initial in the result?

I added one line about include_initial functionality. Done!

mhvk · 2024-06-29T15:17:34Z

numpy/_core/fromnumeric.py

+
+    Examples
+    --------
+    >>> a = np.array([1,2,3])


While you are at it, one more space after comma...

mhvk · 2024-06-29T15:17:42Z

numpy/_core/fromnumeric.py

+
+    The cumulative product for each column (i.e., over the rows) of ``b``:
+
+    >>> b = np.array([[1,2,3], [4,5,6]])


And here too.

mhvk

Thanks!

NumPy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724

This breaks two of our tests and I've chosen to skip those tests for now:

1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. I don't believe this failure is identifying any unexpected behavior, and we test many other dtypes properly so I'm not concerned about skipping this test.
2. The second failure was caused by the fact that the approach we took in google#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX.

Fixes google#22251

NumPy recently merged support for the 2023.12 revision of the Array API: numpy/numpy#26724

This breaks two of our tests:

1. The first breakage was caused by differences in how numpy and JAX cast negative floats to `uint8`. Specifically `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`. We don't make any promises about consistency with casting floats to ints, noting that this can even be backend dependent. To fix our test, we now only generate positive inputs when the output dtype is unsigned.
2. The second failure was caused by the fact that the approach we took in google#20550 to support backwards compatibility and the Array API for `clip` differs from the one used in numpy/numpy#26724. Again, the behavior is consistent, but it produces a different signature. I've skipped checking `clip`'s signature, but we should revisit it once the `a_min` and `a_max` parameters have been removed from JAX.

Fixes google#22251

mtsokol added the 01 - Enhancement label Jun 17, 2024

mtsokol self-assigned this Jun 17, 2024

mtsokol force-pushed the array-api-2023 branch 3 times, most recently from 862c8ec to 005e41e Compare June 18, 2024 14:52

mhvk reviewed Jun 19, 2024

mtsokol marked this pull request as ready for review June 19, 2024 13:21

mtsokol requested a review from mhvk June 19, 2024 13:21

mtsokol changed the title ~~[WIP] Add Array API 2023.12 version support~~ Add Array API 2023.12 version support Jun 19, 2024

mtsokol changed the title ~~Add Array API 2023.12 version support~~ ENH: Add Array API 2023.12 version support Jun 19, 2024

mtsokol added this to the 2.1.0 release milestone Jun 20, 2024

mhvk reviewed Jun 27, 2024

mhvk reviewed Jun 29, 2024

Add Array API 2023.12 version support

b6fcc19

mtsokol force-pushed the array-api-2023 branch from ef143d7 to b6fcc19 Compare June 30, 2024 21:40

mhvk approved these changes Jul 1, 2024

mhvk merged commit 2efae2b into numpy:main Jul 1, 2024
67 of 68 checks passed

mtsokol deleted the array-api-2023 branch July 1, 2024 07:46

This was referenced Jul 3, 2024

BUG: declare np.cumulative_prod and np.cumulative_sum as subclass-safe and test them (fix incompatibility with NumPy 2.1) astropy/astropy#16663

Merged

BUG: fix new incompatibilities with NumPy 2.1 yt-project/unyt#512

Open

dfm mentioned this pull request Jul 3, 2024

Fix compatibility with nightly numpy google/jax#22257

Merged

