Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add Array API 2023.12 version support #26724

Merged
merged 1 commit into from
Jul 1, 2024
Merged

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Jun 17, 2024

This PR upgrades Array API and array-api-tests suite to 2023.12 version:

  • Adds np.cumulative_sum and np.cumulative_prod.
  • Adds min and max keyword arguments to np.clip. Allows passing Nones for both a_min and a_max.
  • Adds device keyword argument to np.astype.

@mtsokol mtsokol self-assigned this Jun 17, 2024
@mtsokol mtsokol force-pushed the array-api-2023 branch 3 times, most recently from 862c8ec to 005e41e Compare June 18, 2024 14:52
@mtsokol
Copy link
Member Author

mtsokol commented Jun 19, 2024

Hi @seberg,

This PR will add support for 2023.12 Array API standard that NumPy aims to implement.

  • This PR still needs Fixes for 2023.12 tests data-apis/array-api-tests#262 to be merged first.
  • In ENH: Add unstack() #26579 we will get unstack.
  • Here I added min and max keyword args to clip, while still supporting a_min and a_max. I see in the GitHub search that there are a few repos that pass a_min and a_max as keyword args:
    • Should we deprecate passing a_min/a_max by keyword?
    • Currently np.clip disallows passing both a_min/a_max as None i.e. np.clip(arr). The array API allows it and it's just an identity operation def id(x): return x. What is the most effective way for an identity with out argument in NumPy?
    def identity(x, out=None):
        if out is None:
            return x
        else:
            # TODO: move x contents to out
            ...
            return out
  • Here I added device keyword argument to astype.
  • Here I implemented cumulative_sum that is the same as cumsum but has an additional include_initial keyword argument. For the initial version I implemented it in the Python layer by concatenating the cumsum result with the zeros array.

With these items merged we will be 2023.12 compliant for 2.1.0 release.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some in-line comments. Main ones are that any new or changed API should be properly documented, including a changelog entry.

For cumulative_sum, arguably a better way would be to implement include_initial - though that is obviously a lot more work, changing the implementation of accumulate for the ufuncs. But it would imply one can actually include any initial.

@@ -98,8 +98,8 @@ def _count_reduce_items(arr, axis, keepdims=False, where=True):

def _clip(a, min=None, max=None, out=None, **kwargs):
if min is None and max is None:
raise ValueError("One of max or min must be given")

# TODO: is there a better way to return identity?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine.

But it implies a (small) API change, which perhaps is worth a changelog entry?

Copy link
Member Author

@mtsokol mtsokol Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then out argument is ignored. But what if a user passed it? Throw an error?

I wonder about something like:

def identity(x, out=None):
    if out is None:
        return x
    else:
        # TODO: move x contents to out
        ...
        return out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, you are right. And perhaps in general one would expect a copy, like for np.minimum and np.maximum. In that case, how about return np.positive(x, out=out)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.positive will work here - thank you!



@array_function_dispatch(_clip_dispatcher)
def clip(a, a_min, a_max, out=None, **kwargs):
def clip(a, a_min=_NotPassed, a_max=_NotPassed, out=None, *, min=_NotPassed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use np._NoValue - that is meant for cases like these (and looks better in the documentation!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Array API compatible cumulative_sum
"""
if out is not None and include_initial:
raise ValueError("not supported")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should be a bit clearer about what is not supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, right, still WIP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes, should have seen that this is a draft!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, done!

raise ValueError("not supported")

if x.ndim >= 2 and axis is None:
raise ValueError("requires axis arg")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, expand the error message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

initial_shape = list(x.shape)
initial_shape[axis] = 1
res = np.concat(
[np.zeros(initial_shape, dtype=res.dtype), res], axis=axis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, seems a bit of a missed opportunity to actually allow initial to be passed on too. But easy to adjust if that were to happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Let's keep additive and multiplicative identity as initial for cumulative_sum and cumulative_prod for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to update accumulate, or add a separate exclusive scan method, I would include this behavior there. I see cumulative_sum/cumsum as a shorthand for the common sum.accumulate, where more advanced cases should be accessed by using the ufunc methods.

And anyways, you can always do cumulative_sum(x, include_initial=True) + initial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in this PR we can ship include_initial in cumulative_sum only to make it to 2.1.0 release with 2023.12 compatibility. Updating accumulate ufunc to also accept include_initial could be addressed separately as ufunc enhancement.

def cumulative_sum(x, /, *, axis=None, dtype=None, out=None,
include_initial=False):
"""
Array API compatible cumulative_sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is presumably going to be exposed in the main namespace, this needs a proper docstring, including the differences with regular cumsum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

numpy/_core/tests/test_numeric.py Show resolved Hide resolved
@mtsokol mtsokol marked this pull request as ready for review June 19, 2024 13:21
@mtsokol mtsokol requested a review from mhvk June 19, 2024 13:21
@mtsokol mtsokol changed the title [WIP] Add Array API 2023.12 version support Add Array API 2023.12 version support Jun 19, 2024
@mtsokol mtsokol changed the title Add Array API 2023.12 version support ENH: Add Array API 2023.12 version support Jun 19, 2024
@mtsokol mtsokol added this to the 2.1.0 release milestone Jun 20, 2024
@mattip
Copy link
Member

mattip commented Jun 26, 2024

@mhvk could you take another look?

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, some comments inline, but a larger one on passing in out and include_initial: this is not too hard to do, with the following replacement,

def _cumulative_func(x, func, axis, dtype, out, include_initial):
    x_ndim = ndim(x)
    if axis is None:
        if x_ndim >= 2:
            raise ValueError("For arrays which have more than one dimension "
                             "``axis`` argument is required.")
        axis = 0

    if out is not None and include_initial:
        item = [slice(None)] * x_ndim
        item[axis] = slice(1, None)
        func.accumulate(x, axis=axis, dtype=dtype, out=out[tuple(item)])
        item[axis] = 0
        out[tuple(item)] = func.identity
        return out

    res = func.accumulate(x, axis=axis, dtype=dtype, out=out)
    if include_initial:
        initial_shape = list(x.shape)
        initial_shape[axis] = 1
        res = np.concat(
            [np.full_like(res, func.identity, shape=initial_shape), res],
            axis=axis
        )

    return res

@@ -243,6 +243,7 @@ jobs:
python -m pip install -r requirements/build_requirements.txt
python -m pip install -r requirements/test_requirements.txt
python -m pip install -r array-api-tests/requirements.txt
python -m pip install --upgrade hypothesis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Why not set a minimum version instead? (just coriosity, but the changes to the workflow seem unrelated to the PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In test_requirements.txt there's hypothesis==6.81.1. array-api-tests suite requires newer version so I updated it to the latest hypothesis==6.104.1.

@ngoldbaum I see that you updated the version a year ago. Is there a reason not to use >= here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhvk I removed that --upgrade line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point NumPy used a bot to keep the test dependencies regularly updated, but it caused a lot of noise on forks of NumPy due to a bug and we decided to turn that off. That means the test requirements are only updated as people notice they are out of date. Please feel free to update the hypothesis version in the test requirements and if it doesn't affect anything else in CI it should be fine.

We do want pinned versions though to make it less likely that an upstream version change will suddenly break our CI.

compatible alternatives for `numpy.cumsum` and `numpy.cumprod`.
* `numpy.clip` now supports ``max`` and ``min`` keyword arguments which are meant
to replace ``a_min`` and ``a_max``. Also, for ``np.clip(a)`` or ``np.clip(a, None, None)``
an identity will be returned instead of raising an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "a copy of the input array will be ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

return (a, a_min, a_max)
def _clip_dispatcher(a, a_min=None, a_max=None, out=None, *, min=None,
max=None, **kwargs):
return (a,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong: there is no reason one couldn't dispatch on the minimum and maximum too (they can be arrays), so I think this should become return (a, a_min, a_max, min, max)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I made it (a, a_min, a_max, out, min, max) as other functions here also put out here. Done!

@@ -2283,6 +2292,19 @@ def clip(a, a_min, a_max, out=None, **kwargs):
array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8])

"""
if a_min is np._NoValue and a_max is np._NoValue:
a_min = None if min == np._NoValue else min
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace == with is here and on the next line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

raise ValueError("Passing ``out`` and ``include_initial=True`` "
"at the same time is not supported.")

x = asarray(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use np.asanyarray(x) here, so that for astropy I don't have to override these new functions!

Alternatively, omit this whole line and use if np.ndim(x) >= 2 below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used np.atleast1d(x) as for scalars func.accumulate throws an error. Internally atleast1d uses asanyarray. Used np.ndim(x) here also anyway.

for more details.
include_initial : bool, optional
Boolean indicating whether to include the initial value (zeros) as
the first value in the output. ``include_initial=True`` changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this constraint necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constraint removed.


Examples
--------
>>> a = np.array([1,2,3,4,5,6])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do insert spaces after the , in the examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

1000000.0050000029

"""
initial_func = np.zeros if include_initial else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my suggestion, the full implementation becomes,

return _cumulative_func(x, um.add, axis, dtype, out, include_initial)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -2643,6 +2665,197 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue):
keepdims=keepdims, where=where)


def _cumulative_func(x, func, axis, dtype, out, initial_func):
if out is not None and initial_func is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really necessary, since we know where the output is placed (see below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


res = func(x, axis=axis, dtype=dtype, out=out)

if initial_func:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would become if include_initial with my suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A leftover...

@@ -2630,6 +2635,11 @@ def astype(x, dtype, /, *, copy = True):
raise TypeError(
f"Input should be a NumPy array. It is a {type(x)} instead."
)
if device not in ["cpu", None]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: make it device is not None and device != "cpu" to speed up the 99.999999% of the cases where the default is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@mtsokol
Copy link
Member Author

mtsokol commented Jun 28, 2024

@mhvk Thank you for a thorough review! I applied all your comments.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but for a few very minor nits. Maybe squash commits while you are at it?

@@ -2643,6 +2665,202 @@ def all(a, axis=None, out=None, keepdims=np._NoValue, *, where=np._NoValue):
keepdims=keepdims, where=where)


def _cumulative_func(x, func, axis, dtype, out, include_initial):
x = np.atleast_1d(x)
x_ndim = ndim(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on atleast_1d. But now using ndim is no longer required, so I would just replace with x.ndim

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,6 @@
* `numpy.cumulative_sum` and `numpy.cumulative_prod` were added as Array API
compatible alternatives for `numpy.cumsum` and `numpy.cumprod`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to mention the differences? I.e., that one cannot pass in an initial, but can include_initial in the result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one line about include_initial functionality. Done!


Examples
--------
>>> a = np.array([1,2,3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at it, one more space after comma...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


The cumulative product for each column (i.e., over the rows) of ``b``:

>>> b = np.array([[1,2,3], [4,5,6]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mhvk mhvk merged commit 2efae2b into numpy:main Jul 1, 2024
67 of 68 checks passed
@mtsokol mtsokol deleted the array-api-2023 branch July 1, 2024 07:46
dfm added a commit to dfm/jax that referenced this pull request Jul 3, 2024
Numpy recently merged support for the 2023.12 revision of the Array API:
numpy/numpy#26724

This breaks two of our tests and I've chosen to skip those tests for
now:

1. The first breakage was caused by differences in how numpy and JAX
   cast negative floats to `uint8`. Specifically
   `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas
   `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`.
   We don't make any promises about consistency with casting floats to
   ints, noting that this can even be backend dependent. I don't believe
   this failure is identifying any unexpected behavior, and we test many
   other dtypes properly so I'm not concerned about skipping this test.

2. The second failure was caused by the fact that the approach we took
   in google#20550 to support backwards compatibility and the Array API for
   `clip` differs from the one used in numpy/numpy#26724. Again, the
   behavior is consistent, but it produces a different signature. I've
   skipped checking `clip`'s signature, but we should revisit it once
   the `a_min` and `a_max` parameters have been removed from JAX.

Fixes google#22251
dfm added a commit to dfm/jax that referenced this pull request Jul 3, 2024
Numpy recently merged support for the 2023.12 revision of the Array API:
numpy/numpy#26724

This breaks two of our tests:

1. The first breakage was caused by differences in how numpy and JAX
   cast negative floats to `uint8`. Specifically
   `np.float32(-1).astype(np.uint8)` returns `np.uint8(255)` whereas
   `jnp.float32(-1).astype(jnp.uint8)` produces `Array(0, dtype=uint8)`.
   We don't make any promises about consistency with casting floats to
   ints, noting that this can even be backend dependent. To fix our
   test, we now only generate positive inputs when the output dtype is
   unsigned.

2. The second failure was caused by the fact that the approach we took
   in google#20550 to support backwards compatibility and the Array API for
   `clip` differs from the one used in numpy/numpy#26724. Again, the
   behavior is consistent, but it produces a different signature. I've
   skipped checking `clip`'s signature, but we should revisit it once
   the `a_min` and `a_max` parameters have been removed from JAX.

Fixes google#22251
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants